BOL: Bash script to split a multifasta file!

By Neel 1194 days ago

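#Using awk, we can easily split a multifasta file (multi.fa) into chunks of N sequences each (here, N=500) with the following one-liner. Each chunk is named after the index of its first sequence (chunk0.fa, chunk500.fa, ...):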

awk '/^>/ { if (n % 500 == 0) { if (file) close(file); file = sprintf("chunk%d.fa", n) } n++ } { print > file }' multi.fa
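#The same idea can be written as a small shell function, which is easier to read and reuse. This is only a sketch: the function name split_fasta and its arguments (input file, chunk size, output prefix) are my own assumptions and not part of any tool. It closes each chunk before opening the next one, and it skips any lines that appear before the first header.

```shell
# split_fasta INPUT [CHUNK_SIZE] [PREFIX]
# Hypothetical helper (name and arguments are illustrative only): writes
# CHUNK_SIZE sequences per file to PREFIX0.fa, PREFIX<CHUNK_SIZE>.fa, ...
split_fasta() {
    input=$1
    size=${2:-500}
    prefix=${3:-chunk}
    awk -v size="$size" -v prefix="$prefix" '
        /^>/ {
            # Start a new output file every "size" headers.
            if (n % size == 0) {
                if (out != "") close(out)
                out = sprintf("%s%d.fa", prefix, n)
            }
            n++
        }
        # Skip anything before the first header; otherwise copy the line.
        out != "" { print > out }
    ' "$input"
}
```

#Usage: split_fasta multi.fa 500 chunk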

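#Or, to split the file (here, multi.fasta) into about 10 chunks instead of fixing the chunk size, count the header lines with grep and derive the chunk size from that count:

awk -v chunksize="$(grep -c "^>" multi.fasta)" 'BEGIN { chunksize = int(chunksize/10) + 1 } /^>/ { if (n % chunksize == 0) { if (file) close(file); file = sprintf("chunk%d.fa", n) } n++ } { print > file }' multi.fasta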
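#To see what chunk size the command above ends up using, the arithmetic can be pulled out into a helper. This is a minimal sketch; the name fasta_chunk_size and its second argument (the target number of files) are assumptions made for illustration.

```shell
# fasta_chunk_size INPUT [NUM_FILES]
# Hypothetical helper: prints the per-file sequence count used above,
# int(total / NUM_FILES) + 1, where total is the number of header lines.
fasta_chunk_size() {
    input=$1
    num_files=${2:-10}
    total=$(grep -c '^>' "$input")
    # Shell integer division truncates, which matches int() in awk.
    echo $(( total / num_files + 1 ))
}
```

#For a file with 25 sequences and the default of 10 files this prints 3, so the split produces 9 chunk files rather than exactly 10. The formula gives at most 10 files, not always 10.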

#Another great solution is GenomeTools (gt), available at http://genometools.org/, which provides a simple command for this:

gt splitfasta -numfiles 10 multi.fasta
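#Whichever method you use, a quick sanity check is to compare the number of header lines before and after the split. The helper below is a sketch; the name count_fasta_records is an assumption, and the file glob in the usage line must be adjusted to whatever names your splitter produced.

```shell
# count_fasta_records FILE...
# Hypothetical helper: prints the total number of header lines (">")
# found across all files given as arguments.
count_fasta_records() {
    awk '/^>/ { n++ } END { print n + 0 }' "$@"
}
```

#Usage: [ "$(count_fasta_records multi.fasta)" -eq "$(count_fasta_records chunk*.fa)" ] && echo "split OK"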
