Question: How to separate FASTA sequences by their name in files?

Manshi Raghubanshi
4123 days ago

I have one big file containing many FASTA sequences, and I would like to separate the sequences into different files according to their names. Can anyone please help me do this? Thanks for your time.

Sample FASTA sequences from the file:

>SBI_03055_PP_HisKa|Streptomyces bingchenggensis BCW-1
MSSSDAARTGSAEGARDSKRVRTRRTLRDWIVDVCCISLAALFSLTASESMATDPSVSDE
ALFADLMAGVVACLALWLRRRRPVELALVLLAAGVVSYYVAGPLLVALFTVAVHRPLRTV
AWVGGAALAQIFAAPAVHPDPDLDYIGDVLLGALLVSGAIGWGMFVRSRRLLLESLRERA
ARAEAEAALRAERTQRLTRERIAREMHDVLAHRLSLLSVHAGALEYRADASPQEVAEAAG
VIRSSAHQALQDLREVIGVLRAPDSDATAEGSPPDRPQPTLADLPRLVEESRRAGMRVTL
SDEAGVAGADA
>SBI_03056_RR_NarL|Streptomyces bingchenggensis BCW-1
VSSSPPESTDPAPAPTPTPPAPDPAPTPTPGSGPSLTPIRLLVVDDDPLVRAGLRLMLGG
ASSGIEIVGEASDGAEVAALVDRHSPDVVLMDIRMPTVDGLTATEQLRQREPAPEVVVLT
TFNADEHVLRALRAGAAGFVLKDTPPADLVAAVRRVAAGEPVLSPTVTQQLIEHVAGSGR
DARQERARALLDQLNDREREVAVAVGEGKSNAEISAGLFMSVATVKTHVSRILTKLDLNN
RVQIALLAHDAGLLE

Answers

Hi Manshi, this script assumes your file contains only two types of sequences: those with _PP_ and those with _RR_ in their names. Sequences matching _RR_ are written to abc1.fasta and those matching _PP_ to abc2.fasta.

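#!/usr/bin/perl
use strict;
use warnings;

open(my $in,   '<', 'input.fasta') or die "Can't open input.fasta: $!";
open(my $out1, '>', 'abc1.fasta')  or die "Can't open abc1.fasta: $!";
open(my $out2, '>', 'abc2.fasta')  or die "Can't open abc2.fasta: $!";

$/ = ">";    # read one FASTA record at a time, using ">" as the record separator

while (my $record = <$in>) {
    chomp $record;                        # strip the trailing ">" separator
    next unless $record =~ /\S/;          # skip the empty chunk before the first header
    my ($header) = split /\n/, $record;   # match against the header line only
    if    ($header =~ /_RR_/) { print $out1 ">$record"; }   # change _RR_ to any other tag
    elsif ($header =~ /_PP_/) { print $out2 ">$record"; }
}

close $in; close $out1; close $out2;

# Warning: this script has not been tested.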

Use Shell to convert multifasta file into fasta file
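
One way this could look in the shell is the awk-based sketch below. It is only an illustration, not the method the answer linked to. It assumes that the grouping tag is the third underscore-separated field of each header (PP or RR in the sample sequences above). The function name `split_fasta_by_tag` and the output names `<tag>.fasta` and `untagged.fasta` are hypothetical choices made for this example.

```shell
# Split a multi-FASTA file into one output file per name tag.
# Assumption (not from the original post): the tag is the third
# underscore-separated field of the header, e.g. "PP" in
# ">SBI_03055_PP_HisKa|...". Output goes to <tag>.fasta in the
# current directory; existing files with those names are overwritten.
split_fasta_by_tag() {
    awk '
        /^>/ {
            # Header line: derive the output file from the third "_" field.
            n = split(substr($0, 2), parts, "_")
            tag = (n >= 3) ? parts[3] : ""
            gsub(/[^A-Za-z0-9]/, "", tag)   # keep only filename-safe characters
            out = (tag != "") ? (tag ".fasta") : "untagged.fasta"
        }
        # Write header and sequence lines to the current output file,
        # skipping anything that appears before the first header.
        out != "" { print > out }
    ' "$1"
}
```

Calling `split_fasta_by_tag input.fasta` on the sample from the question would write the first sequence to PP.fasta and the second to RR.fasta. Unlike a script with hard-coded patterns, this needs no change when new tags appear, but it does depend on every header following the same underscore layout.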