Life and computing: Linux Tips (with Linux Commands)

References: Command Line Tricks For Data Scientists

find

Find files by name

$ find <search_dir> -name <file_name>
Example: $ find ./ -name aaa.txt
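A minimal sketch you can run safely, assuming a throwaway directory from mktemp (all file names here are hypothetical). Quote wildcard patterns: an unquoted `*.csv` may be expanded by the shell before find ever sees it. `-iname` is supported by GNU and BSD find but is not part of POSIX.

```shell
# Create a throwaway directory tree (hypothetical file names)
dir=$(mktemp -d)
mkdir -p "$dir/sub"
touch "$dir/aaa.txt" "$dir/sub/aaa.txt" "$dir/sub/notes.csv"

# Exact name match, searched recursively: prints both aaa.txt paths
find "$dir" -name aaa.txt

# Wildcard pattern, regular files only; the quotes keep the shell from expanding it
find "$dir" -type f -name '*.csv'

# -iname ignores case, so 'AAA.txt' also matches both aaa.txt files
find "$dir" -iname 'AAA.txt'
```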

grep

Search recursively for files that contain a specific string

$ grep -r <search_string> <target_path>
Example: $ grep -r "aaa" ./*
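Two flags that are handy with a recursive search, shown on hypothetical throwaway files: `-l` lists only the matching file names, and `-n` adds line numbers. Passing a directory such as `.` instead of `./*` also searches hidden files, since the shell glob `*` skips dotfiles.

```shell
# Throwaway files (hypothetical names and contents)
dir=$(mktemp -d)
printf 'aaa here\nnothing\n' > "$dir/one.txt"
printf 'no match\n' > "$dir/two.txt"

# -l: print only the names of files that contain a match
grep -rl "aaa" "$dir"

# -n: print each matching line as file:line_number:text
grep -rn "aaa" "$dir"
```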

Count how many lines of a file contain a given word (grep -c counts matching lines, not total occurrences)

$ grep -c <word> filename.csv
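A small sketch of the difference between matching lines and total occurrences, on a hypothetical throwaway file. `grep -o` (available in GNU and BSD grep, not required by POSIX) prints each match on its own line, so piping it into `wc -l` counts every occurrence.

```shell
f=$(mktemp)
printf 'apple apple\nbanana\napple pie\n' > "$f"

# -c counts matching LINES: 2 (line 1 has two matches but counts once)
grep -c apple "$f"

# -o prints every match on its own line, so wc -l counts OCCURRENCES: 3
grep -o apple "$f" | wc -l

# -w matches whole words only ("pineapple" would not be counted): 2
grep -cw apple "$f"
```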

Highlight matches in grep output with color

$ alias grep="grep --color=auto"

nautilus

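Install the nautilus-open-terminal extension, which adds an "Open in Terminal" item to the Nautilus file manager's context menu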

$ sudo apt-get install nautilus-open-terminal

iconv

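Convert the character encoding of a text file. iconv writes the converted text to standard output, so redirect it into the output file:

$ iconv -f ISO-8859-1 -t UTF-8 input.txt > output.txt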
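A round-trip check, assuming a throwaway directory and hypothetical file names. iconv treats every file name argument as input, which is why the output goes through a redirect. The sample writes the Latin-1 byte for é with an octal escape so it does not depend on your terminal's encoding.

```shell
dir=$(mktemp -d)
# "café" in ISO-8859-1: é is the single byte 0xE9 (octal 351)
printf 'caf\351\n' > "$dir/input.txt"

# Convert and redirect standard output into the new file
iconv -f ISO-8859-1 -t UTF-8 "$dir/input.txt" > "$dir/output.txt"

# Inspect the bytes: in UTF-8, é becomes 0xC3 0xA9
od -An -tx1 "$dir/output.txt"
```

glibc's iconv also accepts `-o output.txt` in place of the redirect.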

List all supported encodings

$ iconv -l

Silently discard characters that cannot be converted

$ iconv -c -f UTF-8 -t ASCII input.txt > output.txt
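A sketch comparing conversion with and without `-c`, using a hypothetical throwaway file. One assumption to flag: some iconv versions still return a non-zero exit status after `-c` drops input, so the example guards that call with `|| true`.

```shell
f=$(mktemp)
# "café" in UTF-8: é is the two bytes 0xC3 0xA9, which have no ASCII equivalent
printf 'caf\303\251\n' > "$f"

# Without -c, iconv stops with an error when it reaches é
iconv -f UTF-8 -t ASCII "$f" > /dev/null 2>&1 || echo "conversion failed"

# With -c, é is dropped and the rest is kept: prints "caf"
# (|| true because some iconv versions exit non-zero even with -c)
iconv -c -f UTF-8 -t ASCII "$f" || true
```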

head

By default, head prints the first 10 lines.
To print only the first 3 lines:

$ head -n 3 filename.txt
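head pairs well with tail to pull out an arbitrary range of lines. A sketch on generated sample data (the numbered lines are just placeholders):

```shell
f=$(mktemp)
seq 1 20 > "$f"    # 20 numbered lines as sample data

head -n 3 "$f"                  # lines 1-3
head -n 5 "$f" | tail -n 2      # lines 4-5: take the first 5, keep the last 2
tail -n +2 "$f" | head -n 3     # lines 2-4: skip line 1 (e.g. a CSV header)
```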

To print only a given number of bytes (here, the first 4):

$ head -c 4 filename.txt
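Note that `-c` counts bytes, not characters, so it can cut a multibyte UTF-8 character in half. A small sketch showing both cases:

```shell
# Plain ASCII: 4 bytes are 4 characters, prints "abcd"
printf 'abcdefgh\n' | head -c 4

# "café" in UTF-8: é takes 2 bytes, so 4 bytes are "caf" plus the first byte of é
# (od shows the bytes 63 61 66 c3)
printf 'caf\303\251' | head -c 4 | od -An -tx1
```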

To preview the first lines with every comma replaced by a pipe:

$ head mydata.csv | sed 's/,/|/g'

tr

Convert tabs in a file to commas

$ cat tab_delimited.txt | tr "\\t" "," > comma_delimited.csv
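tr only reads standard input, so a redirect can replace `cat file |`. A sketch with hypothetical sample data; keep in mind tr swaps every tab, including any tab inside a quoted field.

```shell
dir=$(mktemp -d)
printf 'id\tname\n1\tkim\n' > "$dir/tab_delimited.txt"   # hypothetical sample

# Same conversion, reading the file through a redirect instead of cat
tr '\t' ',' < "$dir/tab_delimited.txt" > "$dir/comma_delimited.csv"

cat "$dir/comma_delimited.csv"   # id,name then 1,kim
```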

Convert uppercase letters in a file to lowercase (using character ranges)

$ cat filename.csv | tr '[A-Z]' '[a-z]'
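The same conversion can be written with POSIX character classes, which read more clearly than explicit ranges. A quick sketch in both directions:

```shell
# Uppercase to lowercase with classes: prints "hello world"
printf 'Hello WORLD\n' | tr '[:upper:]' '[:lower:]'

# And the reverse: prints "HELLO WORLD"
printf 'Hello WORLD\n' | tr '[:lower:]' '[:upper:]'
```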

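Translate whole character classes. For example, this pipeline counts word frequencies in README.md: it turns every punctuation and whitespace character into a newline, lowercases the text, drops empty lines (grep .), and lists the words from most to least frequent.

$ cat README.md | tr "[:punct:][:space:]" "\n" | tr "[:upper:]" "[:lower:]" | grep . | sort | uniq -c | sort -nr

Character classes: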
[:alnum:] all letters and digits
[:alpha:] all letters
[:blank:] all horizontal whitespace
[:cntrl:] all control characters
[:digit:] all digits
[:graph:] all printable characters, not including space
[:lower:] all lower case letters
[:print:] all printable characters, including space
[:punct:] all punctuation characters
[:space:] all horizontal or vertical whitespace
[:upper:] all upper case letters
[:xdigit:] all hexadecimal digits
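The classes above also work with tr's `-d` (delete), `-s` (squeeze repeats), and `-c` (complement the set) flags. A few one-line sketches:

```shell
# -d deletes every character in the class: prints "abc"
printf 'a1b2c3\n' | tr -d '[:digit:]'

# -s with two sets: blanks become spaces, then runs of spaces collapse to one: "a b c"
printf 'a   b\t\tc\n' | tr -s '[:blank:]' ' '

# -c complements the set: delete everything except letters, digits and newlines: "xyz"
printf 'x-y_z!\n' | tr -cd '[:alnum:]\n'
```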

wc

wc stands for "word count."
Count the lines in a file:

$ wc -l myfile.txt
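`wc -l` counts newline characters, so a last line without a trailing newline is not counted. A sketch on a hypothetical throwaway file, with awk as a line counter that does include the unterminated last line:

```shell
f=$(mktemp)
printf 'first\nsecond\nthird' > "$f"   # note: no newline after "third"

wc -l "$f"                  # reports 2, followed by the file name
wc -l < "$f"                # reading from stdin prints the number only: 2
awk 'END { print NR }' "$f" # awk counts the unterminated last line too: 3
```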

Count the words in a file:

$ wc -w myfile.txt

Count the characters in a file (use -c to count bytes instead):

$ wc -m myfile.txt
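Characters and bytes differ once the text contains multibyte UTF-8 characters. A sketch on a hypothetical one-line file; the `-m` result depends on the current locale, so treat that comment as the expected value under a UTF-8 locale only.

```shell
f=$(mktemp)
printf 'caf\303\251 ok\n' > "$f"   # "café ok" in UTF-8

wc -l < "$f"   # 1 line
wc -w < "$f"   # 2 words
wc -c < "$f"   # 9 bytes: é takes 2 bytes in UTF-8
wc -m < "$f"   # 8 characters in a UTF-8 locale (may report 9 under LC_ALL=C)
```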

split

Split a file into chunks of 500 lines each; the pieces are named new_filename_aa, new_filename_ab, and so on

$ split -l 500 filename.csv new_filename_
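A sketch on generated data in a throwaway directory, so you can see how the chunks come out (1,200 lines split into 500, 500, and 200):

```shell
dir=$(mktemp -d)
seq 1 1200 > "$dir/filename.csv"      # 1,200 placeholder lines

# 500 lines per chunk; names are the prefix plus aa, ab, ac, ...
split -l 500 "$dir/filename.csv" "$dir/new_filename_"

wc -l "$dir"/new_filename_*           # 500, 500 and 200 lines
```

GNU split also accepts `--additional-suffix=.csv` to give the chunks an extension directly. For a CSV, only the first chunk keeps the header line.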

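Then add an extension to every file in the current directory. Note that this renames every regular file below it, subdirectories included: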

$ find . -type f -exec mv '{}' '{}'.csv \;
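The find command renames everything it reaches, and running it twice produces names like `x.csv.csv`. A narrower sketch, assuming the chunks still carry the `new_filename_` prefix from the split example (the directory and the extra file are hypothetical):

```shell
dir=$(mktemp -d)
touch "$dir/new_filename_aa" "$dir/new_filename_ab" "$dir/keep.txt"

# Rename only files with the split prefix; other files are left alone
for f in "$dir"/new_filename_*; do
  [ -f "$f" ] || continue        # skip if the glob matched nothing
  case $f in
    *.csv) ;;                    # already has the extension, leave it
    *) mv "$f" "$f.csv" ;;
  esac
done

ls "$dir"   # lists keep.txt, new_filename_aa.csv, new_filename_ab.csv
```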

sed

A stream editor that processes its input line by line.

s/old/new/g (replace every occurrence on each line)
s/old/new/ (replace only the first occurrence on each line)
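Because sed works line by line, the substitution flags apply to each line separately. A sketch on a hypothetical two-line file, including the numeric flag that targets the Nth match on each line:

```shell
f=$(mktemp)
printf 'a a a\na a a\n' > "$f"

sed 's/a/b/' "$f"     # first match on EACH line:      b a a / b a a
sed 's/a/b/g' "$f"    # every match on each line:      b b b / b b b
sed 's/a/b/2' "$f"    # only the 2nd match per line:   a b a / a b a
```

To change only the very first match in the whole file, GNU sed (as an extension) accepts `sed '0,/old/s//new/'`.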

awk

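Created in 1977 at Bell Labs by Alfred Aho, Peter Weinberger, and Brian Kernighan; the name comes from their initials.
Useful for text processing, arithmetic, and string manipulation.
Tutorial 1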
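A short sketch of both uses on a hypothetical CSV (the names and scores are made-up sample data): arithmetic over a column, and awk's built-in string functions.

```shell
f=$(mktemp)
printf 'name,score\nkim,90\nlee,75\npark,85\n' > "$f"   # hypothetical sample CSV

# Arithmetic: sum and average of column 2, skipping the header (NR > 1)
awk -F, 'NR > 1 { sum += $2; n++ } END { print sum, sum / n }' "$f"   # 250 83.3333

# Strings: upper-case the name and print its length
awk -F, 'NR > 1 { print toupper($1), length($1) }' "$f"   # KIM 3 / LEE 3 / PARK 4
```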

Monday, November 3, 2014
