<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Bash script to extract sequences by IDs]]></title>
	<link>https://bioinformaticsonline.com/snippets/view/43735/bash-script-to-extract-sequence-by-ids?</link>
	<atom:link href="https://bioinformaticsonline.com/snippets/view/43735/bash-script-to-extract-sequence-by-ids?" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/snippets/view/43735/bash-script-to-extract-sequence-by-ids</guid>
	<pubDate>Tue, 01 Feb 2022 23:20:42 -0600</pubDate>
	<link>https://bioinformaticsonline.com/snippets/view/43735/bash-script-to-extract-sequence-by-ids</link>
	<title><![CDATA[Bash script to extract sequences by IDs]]></title>
	<description><![CDATA[<code>Use a Perl one-liner, grep and seqtk subseq to extract the FASTA sequences whose headers contain the desired gene IDs:

# Create test input:

cat &gt; in.fasta &lt;&lt;EOF
&gt;BGI_novel_T016697 Solyc03g033550.3.1
CTGACGTATACAATTAAGCCGCG
&gt;BGI_novel_T016313 Solyc03g025570.2.1
TTCAAGTGTTAGTTTCACATCAT
&gt;BGI_novel_T018109 Solyc03g080075.1.1
GCAAGGGAAAGAAGTATTACTAG
&gt;BGI_novel_T016817 BGI_novel_G001220
GCCCAAGTCATAGGTAGTGCCTG
&gt;BGI_novel_T016141 Solyc03g007600.3.1
ACGTACGTACGTACGTACGTACG
EOF

cat &gt; gene_ids.txt &lt;&lt;EOF
Solyc03g033550.3.1
Solyc03g080075.1.1
Solyc00g256710.2.1
Solyc01g010890.3.1
EOF

# Extract ids and gene ids into a tsv file:
perl -lne &#039;@f = /^&gt;(\S+)\s+(\S+)/ and print join &quot;\t&quot;, @f;&#039; in.fasta &gt; ids_gene_ids.tsv

# Select ids that correspond to the desired gene ids
# (-F: match gene ids as fixed strings, not regexes; -w: match whole words only):
grep -Fwf gene_ids.txt ids_gene_ids.tsv | cut -f1 &gt; ids.selected.txt

# Extract the fasta sequences that correspond to the desired gene ids:
seqtk subseq in.fasta ids.selected.txt &gt; out.fasta

cat out.fasta
Output:

&gt;BGI_novel_T016697 Solyc03g033550.3.1
CTGACGTATACAATTAAGCCGCG
&gt;BGI_novel_T018109 Solyc03g080075.1.1
GCAAGGGAAAGAAGTATTACTAG
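
A caveat on the grep step: without -F, grep -f treats every gene ID as a regular expression (each dot matches any character) and also selects longer IDs that merely contain it. The minimal sketch below compares plain -f with -F -w (fixed strings, whole words only) on a hypothetical two-line table; T1, T2 and the demo_* file names are made up for illustration.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical id/gene-id table: the second gene id only adds a trailing digit.
printf 'T1\tSolyc03g033550.3.1\nT2\tSolyc03g033550.3.10\n' > demo_ids.tsv
printf 'Solyc03g033550.3.1\n' > demo_gene_ids.txt

# Plain -f: the pattern is a regex and also matches inside the longer id,
# so both T1 and T2 are selected.
grep -f demo_gene_ids.txt demo_ids.tsv | cut -f1 > demo_loose.txt

# -F (fixed string) plus -w (whole word): the match must not be followed by
# another letter, digit or underscore, so only T1 is selected.
grep -Fwf demo_gene_ids.txt demo_ids.tsv | cut -f1 > demo_strict.txt

cat demo_loose.txt demo_strict.txt
```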
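
Only two records are returned because two of the four requested gene IDs (Solyc00g256710.2.1 and Solyc01g010890.3.1) do not occur in in.fasta. A sketch for listing such IDs, assuming the gene ID is the second column of the id/gene-id table produced by the Perl step (the demo_* file names are hypothetical):

```shell
#!/usr/bin/env bash
set -eu

# id/gene-id table in the layout of ids_gene_ids.tsv (three records only).
printf 'BGI_novel_T016697\tSolyc03g033550.3.1\nBGI_novel_T016313\tSolyc03g025570.2.1\nBGI_novel_T018109\tSolyc03g080075.1.1\n' > demo_ids_gene_ids.tsv
printf 'Solyc03g033550.3.1\nSolyc03g080075.1.1\nSolyc00g256710.2.1\nSolyc01g010890.3.1\n' > demo_gene_ids.txt

# Gene ids that actually appear in the FASTA headers, one per line.
cut -f2 demo_ids_gene_ids.tsv > demo_found.txt

# -F fixed strings, -x whole-line match, -v invert: requested ids with no
# exact match. grep exits 1 when nothing is missing, hence "|| true".
grep -Fxvf demo_found.txt demo_gene_ids.txt > demo_missing.txt || true

cat demo_missing.txt
```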
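
If seqtk is not available, the selection can also be sketched with awk alone. This is a swapped-in alternative, not the seqtk method above; it assumes the gene ID is the second whitespace-separated word of each header, as in the test input, and uses hypothetical demo_* files. It compares gene IDs exactly and keeps multi-line sequences together with their header.

```shell
#!/usr/bin/env bash
set -eu

# Small input in the same ">id gene_id" header layout; T1 spans two lines.
printf '>T1 Solyc03g033550.3.1\nACGT\nTTGA\n>T2 Solyc03g025570.2.1\nGGGG\n>T3 Solyc03g080075.1.1\nCCCC\n' > demo.fasta
printf 'Solyc03g033550.3.1\nSolyc03g080075.1.1\n' > demo_gene_ids.txt

# First file: store wanted gene ids as array keys.
# Second file: at each header, keep the record only if its 2nd word is wanted;
# the bare "keep" pattern then prints the header and all its sequence lines.
awk 'NR == FNR { want[$1]; next }
     /^>/      { keep = ($2 in want) }
     keep' demo_gene_ids.txt demo.fasta > demo_out.fasta

cat demo_out.fasta
```

The NR == FNR test is only true while awk reads the first file, so gene_ids must be non-empty for this idiom to work.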
Note that seqtk can be installed, for example, with conda (conda install -c bioconda seqtk).</code>]]></description>
	<dc:creator>Abhimanyu Singh</dc:creator>
</item>

</channel>
</rss>