Bash script to handle Multifasta files

By BioStar 1357 days ago
# Convert all lowercase residues to uppercase in a FASTA sequence file
# (headers are truncated at the first space)
$ awk 'BEGIN{FS=" "}{if(!/>/){print toupper($0)}else{print $1}}' input.fasta > output.fasta

# Rearrange FASTA sequences according to their length (shortest first)
$ awk '/^>/ {printf("%s%s\t",(N>0?"\n":""), $0);N++;next;} {printf("%s",$0);} END {printf("\n");}' input.fasta | \
  awk -F '\t' '{printf("%d\t%s\n",length($2),$0);}' | \
  sort -k1,1n | cut -f 2- | tr "\t" "\n" > output.fasta

# Add '>' at the beginning of headers in a FASTA file
# (any line containing "_" is treated as a header)
$ awk '{if ($0 ~/_/) {printf ">";} print $0; }' input.fasta > output.fasta

# Match FASTA headers in two different multi-FASTA files
# (prints every line of input2.fasta that also occurs in input1.fasta)
$ awk 'NR==FNR{a[$0];next}$0 in a{print $0}' input1.fasta input2.fasta

# Merge all FASTA files in a directory into a single FASTA file
$ awk '1' *.fa > all.fa
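
To see how the length-sorting and uppercase one-liners behave, here is a minimal sketch that runs them on a toy input. The file names (demo.fasta, sorted.fasta, upper.fasta) and the three records are made up for illustration and are not part of the original post.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical toy input: three records, the first wrapped over two lines.
cat > demo.fasta <<'EOF'
>seq1 first record
acgtacgt
ACGT
>seq2 second record
acg
>seq3 third record
ACGTAC
EOF

# Sort records by sequence length:
#   1. linearize each record to "header<TAB>sequence" on one line,
#   2. prefix each line with the sequence length,
#   3. sort numerically on that length, drop it, and split header/sequence again.
awk '/^>/ {printf("%s%s\t", (N>0 ? "\n" : ""), $0); N++; next}
     {printf("%s", $0)}
     END {printf("\n")}' demo.fasta |
  awk -F '\t' '{printf("%d\t%s\n", length($2), $0)}' |
  sort -k1,1n | cut -f 2- | tr '\t' '\n' > sorted.fasta

# Uppercase the residues; header lines are reduced to their first field.
awk 'BEGIN{FS=" "}{if(!/>/){print toupper($0)}else{print $1}}' sorted.fasta > upper.fasta

cat upper.fasta
```

With this input the records come out as seq2 (3 residues), seq3 (6) and seq1 (12). Note that the sorted output is no longer line-wrapped: each sequence sits on a single line, and the uppercase step keeps only the part of each header before the first space.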