<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Python script to extract a protein sequence from a genome using a General Feature Format (GFF) file!]]></title>
	<link>https://bioinformaticsonline.com/snippets/view/44448/python-script-to-extract-a-protein-sequence-from-a-genome-using-a-general-feature-format-gff-file?</link>
	<atom:link href="https://bioinformaticsonline.com/snippets/view/44448/python-script-to-extract-a-protein-sequence-from-a-genome-using-a-general-feature-format-gff-file?" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/snippets/view/44448/python-script-to-extract-a-protein-sequence-from-a-genome-using-a-general-feature-format-gff-file</guid>
	<pubDate>Thu, 01 Feb 2024 01:48:49 -0600</pubDate>
	<link>https://bioinformaticsonline.com/snippets/view/44448/python-script-to-extract-a-protein-sequence-from-a-genome-using-a-general-feature-format-gff-file</link>
	<title><![CDATA[Python script to extract a protein sequence from a genome using a General Feature Format (GFF) file!]]></title>
	<description><![CDATA[<code># You need two inputs: the genome sequence in FASTA format and the matching GFF file. The GFF file describes the features in the genome (such as genes), including their locations and annotations.

# Outline of the steps:

# 1. Parse the GFF file to extract the gene locations.
# 2. Use the gene locations to extract the corresponding DNA sequences from the genome FASTA file.
# 3. Translate the DNA sequences into protein sequences.

# Simple example using Python and Biopython


from Bio import SeqIO

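def extract_protein_sequence(gff_file, genome_fasta, gene_id):
    # Step 1: Parse the GFF file and record seqid, start, end and strand of each gene
    gene_locations = {}
    with open(gff_file, &#039;r&#039;) as gff:
        for line in gff:
            fields = line.rstrip(&#039;\n&#039;).split(&#039;\t&#039;)
            # Skip comments, blank lines and anything without the 9 GFF columns
            if line.startswith(&#039;#&#039;) or len(fields) != 9:
                continue
            if fields[2] == &#039;gene&#039;:
                # Assumes the first attribute is ID=...; a separate name is used
                # so that the gene_id argument is not overwritten
                feature_id = fields[8].split(&#039;;&#039;)[0].split(&#039;=&#039;)[1]
                gene_locations[feature_id] = (fields[0], int(fields[3]), int(fields[4]), fields[6])

    if gene_id not in gene_locations:
        raise KeyError(f&#039;Gene {gene_id} not found in {gff_file}&#039;)

    # Step 2: Extract the DNA sequence from the matching genome record
    # (GFF coordinates are 1-based and inclusive)
    seqid, gene_start, gene_end, strand = gene_locations[gene_id]
    genome = SeqIO.to_dict(SeqIO.parse(genome_fasta, &#039;fasta&#039;))
    gene_dna_sequence = genome[seqid].seq[gene_start - 1:gene_end]
    if strand == &#039;-&#039;:
        # Genes on the minus strand are read from the reverse complement
        gene_dna_sequence = gene_dna_sequence.reverse_complement()

    # Step 3: Translate the DNA sequence into a protein sequence.
    # Translating the whole gene span is only meaningful for intron-free genes;
    # for genes with introns, join the CDS features before translating.
    gene_protein_sequence = gene_dna_sequence.translate()

    return gene_protein_sequence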
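
# A minimal sketch, not part of the original snippet: parse_gff_attributes is a
# hypothetical helper that turns GFF3 column 9 ("key=value" pairs separated by
# ";") into a dict, so the ID can be looked up by name instead of assuming it is
# the first attribute. It assumes GFF3-style attributes; GTF/GFF2 files use a
# different attribute syntax and would need another parser.
from urllib.parse import unquote

def parse_gff_attributes(attribute_field):
    attributes = {}
    for pair in attribute_field.strip().split(";"):
        # Skip empty chunks (e.g. after a trailing ";") and malformed entries
        if "=" not in pair:
            continue
        key, value = pair.split("=", 1)
        # GFF3 percent-encodes reserved characters such as ";" and "="
        attributes[key.strip()] = unquote(value.strip())
    return attributes

# Usage: parse_gff_attributes("Name=abc;ID=gene1").get("ID") gives "gene1"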

# Example usage: replace the placeholder paths and gene ID with your own
if __name__ == &#039;__main__&#039;:
    gff_file = &#039;path/to/your/file.gff&#039;
    genome_fasta = &#039;path/to/your/genome.fasta&#039;
    gene_id_to_extract = &#039;your_gene_id&#039;

    protein_sequence = extract_protein_sequence(gff_file, genome_fasta, gene_id_to_extract)
    print(protein_sequence)</code>]]></description>
	<dc:creator>LEGE</dc:creator>
</item>

</channel>
</rss>