BOL: Bash script to handle Multifasta files

By BioStar 2028 days ago

#Convert all lowercase residues to uppercase in a FASTA sequence file

$ awk 'BEGIN{FS=" "}{if(!/^>/){print toupper($0)}else{print $1}}' input.fasta > output.fasta
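Note that the header branch prints only `$1`, so anything after the first space in a header is dropped. A minimal sketch of the same idea on a throwaway file, to make that visible; the function name `upper_fasta` and the sample record are made up for illustration:

```shell
# upper_fasta is a hypothetical helper name, not part of the original snippet.
# Header lines (starting with '>') are cut to their first field;
# all other lines are converted to uppercase.
upper_fasta() {
    awk '/^>/ { print $1; next } { print toupper($0) }' "$1"
}

# Made-up sample record with a description after the sequence ID.
demo=$(mktemp)
printf '>seq1 sample description\nacgtNNacgt\n' > "$demo"
upper_fasta "$demo"
# >seq1
# ACGTNNACGT
rm -f "$demo"
```

If the full header should be kept, printing `$0` instead of `$1` in the header branch would do it.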

#Rearrange FASTA sequences according to their length

$ awk '/^>/ {printf("%s%s\t",(N>0?"\n":""), $0);N++;next;} {printf("%s",$0);} END {printf("\n");}' input.fasta |\
>awk -F '\t' '{printf("%d\t%s\n",length($2),$0);}' |\
>sort -k1,1n | cut -f 2- |tr "\t" "\n" > output.fasta
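The pipeline works in three stages: put each record on one tab-separated line, prefix it with the sequence length and sort numerically, then strip the prefix and turn the tab back into a newline. A small sketch on made-up data, assuming headers contain no tab characters; `sort_fasta_by_length` is a hypothetical name used only here:

```shell
# sort_fasta_by_length is a hypothetical wrapper around the pipeline.
sort_fasta_by_length() {
    # Stage 1: one record per line, "header<TAB>sequence".
    awk '/^>/ { printf("%s%s\t", (N > 0 ? "\n" : ""), $0); N++; next }
         { printf("%s", $0) }
         END { printf("\n") }' "$1" |
    # Stage 2: prefix each record with its sequence length.
    awk -F '\t' '{ printf("%d\t%s\n", length($2), $0) }' |
    # Stage 3: sort by length, drop the length column, restore the newline.
    sort -k1,1n | cut -f 2- | tr '\t' '\n'
}

# Made-up input: the first sequence is wrapped over two lines.
demo=$(mktemp)
printf '>long\nACGT\nACGT\n>short\nAC\n' > "$demo"
sort_fasta_by_length "$demo"
# >short
# AC
# >long
# ACGTACGT
rm -f "$demo"
```

As the output shows, wrapped sequences come out joined on a single line.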

#Add ‘>’ at the beginning of headers in a FASTA file (header lines are taken to be those containing an underscore)

$ awk '{if ($0 ~/_/) {printf ">";} print $0; }' input.fasta > output.fasta

#Match FASTA headers in two different multi-FASTA files

$ awk 'NR==FNR{a[$0];next}$0 in a{print $0}' input1.fasta input2.fasta
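Because the lookup is done on whole lines, a sequence line that happens to be identical in both files is reported as well. A sketch of a variant that only stores and tests lines starting with `>`; the name `common_headers` and the sample records are assumptions for illustration, and the first file is assumed to be non-empty (the `NR == FNR` test relies on that):

```shell
# common_headers is a hypothetical helper: print headers of file 2
# that also occur, character for character, in file 1.
common_headers() {
    awk 'NR == FNR { if (/^>/) seen[$0]; next }   # first file: remember headers
         /^>/ && ($0 in seen)                      # second file: print shared headers
        ' "$1" "$2"
}

# Made-up inputs: header >seq2 is shared; sequence GGCC is also shared
# but is not a header, so it is not reported.
f1=$(mktemp); f2=$(mktemp)
printf '>seq1\nACGT\n>seq2\nGGCC\n' > "$f1"
printf '>seq2\nGGCC\n>seq3\nTTAA\n' > "$f2"
common_headers "$f1" "$f2"
# >seq2
rm -f "$f1" "$f2"
```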

#Merge all FASTA files in a directory into a single FASTA file

$ awk '1' *.fa > all.fa
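One reason to use `awk '1'` here rather than plain concatenation: awk ends every line it prints with a newline, so a file whose last line lacks one does not run into the header of the next file. A small sketch in a scratch directory; `merge_fasta` and the file names are made up for illustration:

```shell
# merge_fasta is a hypothetical wrapper; it prints every line of every input file.
merge_fasta() {
    awk '1' "$@"
}

dir=$(mktemp -d)
printf '>a\nAC' > "$dir/a.fa"      # deliberately no trailing newline
printf '>b\nGT\n' > "$dir/b.fa"
merge_fasta "$dir"/*.fa
# >a
# AC
# >b
# GT
rm -rf "$dir"
```

Writing the merged output outside the directory being globbed (or under a different extension) keeps it from matching `*.fa` on a later run.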
