Awk is a programming language designed for quickly manipulating whitespace-delimited data. Although you can achieve all its functionality with Perl, awk is simpler in many practical cases.
Why awk? You can replace a pipeline of 'stuff | grep | sed | cut...' with a single call to awk. For a simple script, most of the run time goes into starting each of these programs, so doing it all in one process is usually noticeably faster. This is ideal for something like an openbox pipe menu, where you want to generate output on the fly. You can use awk to make a neat one-liner for a quick job in the terminal, or build an awk section into a shell script. There are plenty of tutorials online, so here I will only show a few examples that cover most of a bioinformatician's daily uses of awk.
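As a rough illustration of the pipeline point, here is a sketch on made-up data. The file name, the columns, and the particular grep/sed/cut chain are all invented for the example, not taken from a real script. Both commands pick the rows whose second column is A and print columns 1 and 3:

```shell
# Hypothetical sample data: name, group, score (space-delimited)
tmp=$(mktemp -d)
printf 'alice A 12\nbob B 7\ncarol A 30\ndave A 3\n' > "$tmp/sample.txt"

# Pipeline version: three separate processes
pipeline_out=$(grep ' A ' "$tmp/sample.txt" | sed 's/  */ /g' | cut -d' ' -f1,3)

# Single awk call: filter on column 2, print columns 1 and 3
awk_out=$(awk '$2=="A"{print $1,$3}' "$tmp/sample.txt")

echo "$awk_out"
rm -rf "$tmp"
```

The awk version is also stricter: it tests column 2 exactly, whereas the grep pattern would match " A " anywhere in the line.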
choose rows where column 3 is larger than column 5:
awk '$3>$5' input.txt > output.txt
extract columns 2, 4 and 5 (the second form writes tab-separated output):
awk '{print $2,$4,$5}' input.txt > output.txt
awk 'BEGIN{OFS="\t"}{print $2,$4,$5}' input.txt
show rows 20 through 80 (inclusive):
awk 'NR>=20&&NR<=80' input.txt > output.txt
calculate the average of column 2:
awk '{x+=$2}END{print x/NR}' input.txt
print rows matching a regex (egrep):
awk '/^test[0-9]+/' input.txt
calculate the sum of columns 2 and 3, then either append it to the row or replace the first column with it:
awk '{print $0,$2+$3}' input.txt
awk '{$1=$2+$3;print}' input.txt
join two files on column 1:
awk 'BEGIN{while((getline<"file1.txt")>0)l[$1]=$0}$1 in l{print $0"\t"l[$1]}' file2.txt > output.txt
count the occurrences of each value in column 2 (like sort | uniq -c; the output order is not guaranteed):
awk '{l[$2]++}END{for (x in l) print x,l[x]}' input.txt
keep only the first row for each value in column 2 (like uniq on that column, but the input need not be sorted):
awk '!($2 in l){print;l[$2]=1}' input.txt
count how often each word occurs (a per-word tally, unlike the single total from wc -w):
awk '{for(i=1;i<=NF;++i)c[$i]++}END{for (x in c) print x,c[x]}' input.txt
deal with simple CSV (no quoted fields containing commas):
awk -F, '{print $1,$2}' input.csv
substitution (sed is simpler in this case):
awk '{sub(/test/, "no", $0);print}' input.txt
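The array-based one-liners above (join, count, uniq) are the hardest to read at a glance, so here is a small worked run of the join. The two files and their contents are made up for illustration. The BEGIN block loads every line of file1.txt into the array l keyed by its first column, then each line of file2.txt whose first column is a key in l is printed with the matching file1.txt line appended:

```shell
# Hypothetical sample files, tab-delimited
tmp=$(mktemp -d)
printf 'g1\tkinase\ng2\tligase\n' > "$tmp/file1.txt"
printf 'g2\t5.1\ng3\t0.4\ng1\t2.0\n' > "$tmp/file2.txt"

# Same join one-liner as above, run inside the temp directory
# so the hard-coded name "file1.txt" resolves
join_out=$(cd "$tmp" && awk 'BEGIN{while((getline<"file1.txt")>0)l[$1]=$0}$1 in l{print $0"\t"l[$1]}' file2.txt)

echo "$join_out"
# g2  5.1  g2  ligase     (fields are tab-separated in the real output)
# g1  2.0  g1  kinase
rm -rf "$tmp"
```

The g3 row is dropped because it has no match in file1.txt, and the output follows the order of file2.txt, so neither file needs to be sorted.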
OK, now here's where to read this stuff properly explained.
Two thorough tutorials:
http://www.gnu.org/software/gawk/manual/gawk.html
http://www.grymoire.com/Unix/Awk.html
A famous list of useful one-liners - though they're short, many are quite tricky:
http://www.pement.org/awk/awk1line.txt
And some nice explanations of those one-liners. After reading this you'll have a pretty good grasp!
http://www.catonmat.net/blog/awk-one-li … -part-one/
http://www.catonmat.net/blog/ten-awk-ti … -pitfalls/