<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Python script to download COVID-19 genomes!]]></title>
	<link>https://bioinformaticsonline.com/snippets/view/43000/python-script-to-download-covid-genome?</link>
	<atom:link href="https://bioinformaticsonline.com/snippets/view/43000/python-script-to-download-covid-genome?" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/snippets/view/43000/python-script-to-download-covid-genome</guid>
	<pubDate>Fri, 26 Mar 2021 07:01:29 -0500</pubDate>
	<link>https://bioinformaticsonline.com/snippets/view/43000/python-script-to-download-covid-genome</link>
	<title><![CDATA[Python script to download COVID-19 genomes!]]></title>
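	<description><![CDATA[<code>#!/usr/bin/env python3

# Download the publicly available &quot;complete&quot; SARS-CoV-2 genomes listed by NCBI GenBank.
# https://www.gisaid.org/ hosts more (possibly ~1200 at the time of writing),
# but access requires registration.

import json
import os

import requests
import yaml
from Bio import Entrez

# NCBI asks E-utilities users to identify themselves; replace this placeholder.
Entrez.email = &quot;your.name@example.com&quot;

SEQ_LIST_URL = &quot;https://www.ncbi.nlm.nih.gov/core/assets/genbank/files/ncov-sequences.yaml&quot;

# yaml.load() without a Loader is deprecated (an error in PyYAML 6); use safe_load.
resp = requests.get(SEQ_LIST_URL)
resp.raise_for_status()
seqs = yaml.safe_load(resp.text)[&#039;genbank-sequences&#039;]
print(&quot;got %d sequences&quot; % len(seqs))

allseq = {}
for x in seqs:
    # Keep only records annotated as complete genomes.
    if x.get(&#039;gene-region&#039;) == &quot;complete&quot;:
        nm = x[&#039;accession&#039;]
        print(&quot;downloading&quot;, nm)
        handle = Entrez.efetch(db=&#039;nucleotide&#039;, id=nm, rettype=&#039;fasta&#039;, retmode=&#039;text&#039;)
        # Drop the FASTA header line and join the wrapped sequence lines.
        dna = handle.read().split(&quot;\n&quot;)[1:]
        handle.close()
        allseq[nm] = &#039;&#039;.join(dna)

# Save {accession: sequence} as JSON, creating the output directory if needed.
os.makedirs(&quot;data&quot;, exist_ok=True)
with open(&quot;data/allseq.json&quot;, &quot;w&quot;) as f:
    json.dump(allseq, f)</code>]]></description>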
	<dc:creator>Surabhi Chaudhary</dc:creator>
</item>

</channel>
</rss>