BOL: Extract fasta sequence from a multifasta file with fasta header Ids

Abhimanyu Singh — Tue, 13 Jun 2017 07:49:37 -0500

#!/usr/bin/perl

use strict;
use warnings;

# Usage: perl <script.pl> <id_list> <multifasta_file> <output_file>

my $list = shift @ARGV;
my $fasta = shift @ARGV;
my $out = shift @ARGV;
my %select;

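# Read the wanted headers, one per line; a leading '>' is optional
open LIST, '<', $list or die "Cannot open $list: $!";
while (<LIST>) {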
    chomp;
    s/>//g;
    $select{$_} = 1;
}
close LIST;

$/ = "\n>";
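open OUT, '>', $out or die "Cannot open $out: $!";
open FASTA, '<', $fasta or die "Cannot open $fasta: $!";
# With $/ set to "\n>", each read returns one complete FASTA record
while (<FASTA>) {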
    s/>//g;
    my ($id) = split (/\n/, $_);
    print OUT ">$_" if (defined $select{$id});
}
close FASTA;
close OUT;
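
To see what the "\n>" record separator does, here is a minimal, self-contained sketch that applies the same idea to an in-memory FASTA string. The sub name extract_records and the sample data are made up for illustration; as in the script above, a record is kept only when its complete header line (without the '>') is in the list.

```perl
use strict;
use warnings;

# Hypothetical helper: return the FASTA records whose full header line
# (without the leading '>') is a key of %$wanted.
sub extract_records {
    my ($fasta_text, $wanted) = @_;
    my @hits;

    # Read the string through an in-memory filehandle, one record per read.
    open my $fh, '<', \$fasta_text or die "Cannot open in-memory FASTA: $!";
    local $/ = "\n>";    # each read ends at the '>' that starts the next record
    while (my $record = <$fh>) {
        chomp $record;              # strips the trailing "\n>" separator, if present
        $record =~ s/\s+\z//;       # the last record ends in a plain newline instead
        $record =~ s/^>//;          # only the first record still has its leading '>'
        my ($header) = split /\n/, $record;
        push @hits, ">$record\n" if exists $wanted->{$header};
    }
    close $fh;
    return @hits;
}

# Made-up sample data
my $fasta   = ">seq1 first\nACGT\nTTGA\n>seq2\nGGCC\n>seq3\nAAAA\n";
my %wanted  = ('seq1 first' => 1, 'seq3' => 1);
my @records = extract_records($fasta, \%wanted);
print @records;
```

Because the lookup uses the whole header line, an entry such as "seq1" alone would not select ">seq1 first"; the ID list has to contain the headers exactly as they appear in the FASTA file.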

Comment by Rahul Nayak

Rahul Nayak — Mon, 11 Feb 2019 06:07:39 -0600

To extract the sequences whose IDs are given directly on the command line (here id1 and id2):

perl -ne 'if(/^>(\S+)/){$c=grep{/^$1$/}qw(id1 id2)}print if $c' fasta.file

If you want to extract a large number of sequences, you most likely have the sequence identifiers in a separate file. Assuming one sequence identifier per line in the file ids.file, you can use this one-liner:

perl -ne 'if(/^>(\S+)/){$c=$i{$1}}$c?print:chomp;$i{$_}=1 if @ARGV' ids.file fasta.file
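
The one-liner is dense, so here is a rough expanded sketch of the same logic. While perl is still reading ids.file, @ARGV holds the remaining file name, so each chomped line is stored as a key of %i; once fasta.file is being read, @ARGV is empty and only the header test decides what gets printed. The sub name filter_fasta_lines and the sample data below are hypothetical, and the IDs are passed in as a hash rather than read from a file.

```perl
use strict;
use warnings;

# Hypothetical helper: keep the lines of every record whose ID (the first
# whitespace-delimited token after '>') is a key of %$ids.
sub filter_fasta_lines {
    my ($ids, @lines) = @_;
    my $keep = 0;    # plays the role of $c in the one-liner
    my @out;
    for my $line (@lines) {
        if ($line =~ /^>(\S+)/) {
            # Header line: decide once for the whole record
            $keep = exists $ids->{$1};
        }
        push @out, $line if $keep;
    }
    return @out;
}

# Made-up sample data
my %ids   = map { $_ => 1 } qw(id2);
my @fasta = (">id1 sample one\n", "ACGT\n",
             ">id2 sample two\n", "GGCC\n", "TTAA\n");
my @kept  = filter_fasta_lines(\%ids, @fasta);
print @kept;
```

Unlike the script at the top of the thread, this matches only the first token of the header, so anything after the first whitespace in the header line is ignored when comparing IDs.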