BOL: Extract fasta sequence from a multifasta file with fasta header Ids

By Abhimanyu Singh 2890 days ago

#!/usr/bin/perl

use strict;
use warnings;

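# Usage: perl <script.pl> <list_of_ids_one_per_line> <fasta> <outfile>
# Writes every record of <fasta> whose complete header line (the text
# after '>') matches a line of the ID list to <outfile>.

die "Usage: perl $0 <list_of_ids_one_per_line> <fasta> <outfile>\n"
    unless @ARGV == 3;
my ($list, $fasta, $out) = @ARGV;

# Load the wanted IDs into a hash for constant-time lookups.
my %select;
open my $list_fh, '<', $list or die "Cannot open $list: $!\n";
while (my $line = <$list_fh>) {
    $line =~ s/\s+\z//;          # strip newline, Windows "\r" and trailing spaces
    $line =~ s/\A>//;            # accept IDs written with a leading '>'
    next unless length $line;    # skip blank lines
    $select{$line} = 1;
}
close $list_fh;

open my $out_fh,   '>', $out   or die "Cannot write $out: $!\n";
open my $fasta_fh, '<', $fasta or die "Cannot open $fasta: $!\n";
{
    # Read one whole record per iteration: records are separated by "\n>".
    local $/ = "\n>";
    while (my $record = <$fasta_fh>) {
        chomp $record;                        # remove the trailing "\n>"
        $record =~ s/\A>//;                   # only the first record keeps its '>'
        $record =~ s/\s+\z//;                 # the last record ends in a plain newline
        my ($id) = $record =~ /\A([^\n]*)/;   # header = first line of the record
        $id =~ s/\s+\z//;                     # drop a Windows "\r" from the header
        print {$out_fh} ">$record\n" if exists $select{$id};
    }
}
close $fasta_fh;
close $out_fh or die "Cannot close $out: $!\n";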
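Setting $/ (Perl's input record separator) to "\n>" makes each read return a whole FASTA record rather than a single line. Below is a minimal, self-contained sketch of that record-splitting step, run against an in-memory filehandle; the sub select_records and the sample sequences are hypothetical, and, as in the script, the whole header line serves as the ID.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the FASTA records read from $fh whose full
# header line (text after '>') is a key of %$wanted.
sub select_records {
    my ($fh, $wanted) = @_;
    my @hits;
    local $/ = "\n>";                        # record separator: newline followed by '>'
    while (my $record = <$fh>) {
        chomp $record;                       # chomp removes the current $/, i.e. "\n>"
        $record =~ s/\A>//;                  # only the first record keeps its leading '>'
        $record =~ s/\s+\z//;                # the last record may end in a plain newline
        my ($id) = $record =~ /\A([^\n]*)/;  # header = first line of the record
        push @hits, ">$record\n" if exists $wanted->{$id};
    }
    return @hits;
}

# Hypothetical sample data, read from an in-memory filehandle.
my $fasta = ">seq1\nACGT\nACGT\n>seq2\nTTTT\n>seq3\nGGGG\n";
open my $fh, '<', \$fasta or die "Cannot open in-memory file: $!";
my @out = select_records($fh, { seq1 => 1, seq3 => 1 });
close $fh;
print @out;
```

With this sample, the seq1 and seq3 records are printed and seq2 is skipped. The same sub works on a real file if you pass a handle opened with open my $fh, '<', $path instead of the in-memory string.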

Comments

- Rahul Nayak (@rahul), 2282 days ago
For a few IDs, list them directly in the one-liner (replace id1 id2 with your own IDs):
perl -ne 'if(/^>(\S+)/){$id=$1;$c=grep{$_ eq $id}qw(id1 id2)}print if $c' fasta.file
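If you want to extract a large number of sequences, the identifiers are most likely in a separate file. Assuming ids.file holds one sequence identifier per line, this one-liner does the job:
perl -ne 'if(/^>(\S+)/){$c=$i{$1}}$c?print:chomp;$i{$_}=1 if @ARGV' ids.file fasta.file
How it works: -n loops over the lines of ids.file and then fasta.file. While ids.file is being read, fasta.file is still in @ARGV, so each line is chomped and stored as a key of %i. Once fasta.file is reached, @ARGV is empty and nothing more is stored; each header line sets $c to whether its first word is in %i, and every line is printed while $c is true. Unlike the script above, both one-liners match only the first word of the header (\S+), so the ID file needs just the accession, not the full header line.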
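The flag logic of the one-liners above can be expanded into a readable form. This is a minimal sketch, not Rahul's code: the sub filter_by_first_word and the sample IDs and sequences are hypothetical. As in the one-liners, a header's ID is its first whitespace-delimited word, and lines are kept until the next header turns the flag off.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper mirroring the one-liners' flag logic.
# $lines: array ref of FASTA lines; $wanted: hash ref of IDs.
sub filter_by_first_word {
    my ($lines, $wanted) = @_;
    my $keep = 0;    # true while we are inside a wanted record
    my @out;
    for my $line (@$lines) {
        if ($line =~ /^>(\S+)/) {
            # Each header decides whether it and the following lines are kept.
            $keep = exists $wanted->{$1};
        }
        push @out, $line if $keep;
    }
    return @out;
}

# Load IDs one per line, as ids.file would supply them (sample data).
my $ids_text = "id1\nid3\n";
open my $ids_fh, '<', \$ids_text or die "Cannot open in-memory IDs: $!";
my %wanted;
while (my $id = <$ids_fh>) {
    chomp $id;
    $wanted{$id} = 1 if length $id;
}
close $ids_fh;

my @fasta = (">id1 first\n", "ACGT\n", ">id2\n", "TTTT\n", ">id3 third\n", "GGGG\n");
print filter_by_first_word(\@fasta, \%wanted);
```

With the sample data this prints the id1 and id3 records, headers included; id2 is skipped. A line-at-a-time flag avoids changing $/, at the cost of testing every line for a header.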
