Bash script to extract sequences by IDs

Use a Perl one-liner, grep, and seqtk subseq to extract the desired FASTA sequences:

    # Create test input:
    cat > in.fasta <<EOF
    >BGI_novel_T016697 Solyc03g033550.3.1
    CTGACGTATACAATTAAGCCGCG
    >BGI_novel_T016313 Solyc03g025570.2.1
    TTCAAGTGTTAGTTTCACATCAT
    >BGI_novel_T018109 Solyc03g080075.1.1
    GCAAGGGAAAGAAGTATTACTAG
    >BGI_novel_T016817 BGI_novel_G001220
    GCCCAAGTCATAGGTAGTGCCTG
    >BGI_novel_T016141 Solyc03g007600.3.1
    ACGTACGTACGTACGTACGTACG
    EOF

    cat > gene_ids.txt <<EOF
    Solyc03g033550.3.1
    Solyc03g080075.1.1
    Solyc00g256710.2.1
    Solyc01g010890.3.1
    EOF

    # Extract ids and gene ids into a tsv file:
    perl -lne '@f = /^>(\S+)\s+(\S+)/ and print join "\t", @f;' in.fasta > ids_gene_ids.tsv

    # Select ids that correspond to the desired gene ids.
    # -F treats each gene id as a literal string (so "." is not a regex wildcard);
    # -w requires whole-word matches (so an id does not match a longer id that starts with it):
    grep -F -w -f gene_ids.txt ids_gene_ids.tsv | cut -f1 > ids.selected.txt

    # Extract the fasta sequences that correspond to the desired gene ids:
    seqtk subseq in.fasta ids.selected.txt > out.fasta

    cat out.fasta

Output:

    >BGI_novel_T016697 Solyc03g033550.3.1
    CTGACGTATACAATTAAGCCGCG
    >BGI_novel_T018109 Solyc03g080075.1.1
    GCAAGGGAAAGAAGTATTACTAG

The code is indented here for display; remove the leading spaces before running it, because the heredoc terminator EOF must start at the beginning of its line.

Note that seqtk can be installed, for example, using conda.
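If seqtk is not available, the same selection can probably be done with awk alone. The following is a minimal sketch, not part of the original answer. It assumes the gene id is always the second whitespace-separated field of the header line and that gene_ids.txt is not empty. The output name out_awk.fasta is made up for illustration, and the test input is a shortened copy of the example data.

```shell
# Shortened copy of the example input (same file names as above).
cat > in.fasta <<'EOF'
>BGI_novel_T016697 Solyc03g033550.3.1
CTGACGTATACAATTAAGCCGCG
>BGI_novel_T016313 Solyc03g025570.2.1
TTCAAGTGTTAGTTTCACATCAT
>BGI_novel_T018109 Solyc03g080075.1.1
GCAAGGGAAAGAAGTATTACTAG
EOF

cat > gene_ids.txt <<'EOF'
Solyc03g033550.3.1
Solyc03g080075.1.1
Solyc00g256710.2.1
EOF

# First file (NR == FNR): store each non-blank gene id as a key in "wanted".
# Second file: on every header line, set "keep" depending on whether the
# second field (assumed to be the gene id) is a wanted key.
# The bare "keep" pattern prints header and sequence lines while it is true.
awk '
    NR == FNR { if (NF) wanted[$1]; next }
    /^>/      { keep = ($2 in wanted) }
    keep
' gene_ids.txt in.fasta > out_awk.fasta   # out_awk.fasta: hypothetical output name

cat out_awk.fasta
```

Because the lookup is an exact match on the whole gene id, this avoids partial matches, and it also handles sequences wrapped over several lines, since every line after a selected header is printed until the next header.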