<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Ancient whole genome duplication (WGD) detection tools !]]></title>
	<link>https://bioinformaticsonline.com/blog/view/42936/ancient-whole-genome-duplication-wgd-detection-tools?</link>
	<atom:link href="https://bioinformaticsonline.com/blog/view/42936/ancient-whole-genome-duplication-wgd-detection-tools?" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/42936/ancient-whole-genome-duplication-wgd-detection-tools</guid>
	<pubDate>Sun, 07 Mar 2021 00:32:44 -0600</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/42936/ancient-whole-genome-duplication-wgd-detection-tools</link>
	<title><![CDATA[Ancient whole genome duplication (WGD) detection tools !]]></title>
	<description><![CDATA[<p>There are two approaches to detecting an ancient WGD: collinearity analysis and analysis of the Ks distribution. Ks is the average number of synonymous substitutions per synonymous site; its counterpart, Ka, is the average number of non-synonymous substitutions per non-synonymous site.</p><p>Several articles describing WGD analysis workflows have already been published. I searched for the keyword "wgd pipeline" and found the following:</p><p><strong>GenoDup: https://github.com/MaoYafei/GenoDup-Pipeline</strong><br /><strong>https://peerj.com/articles/6303/</strong><br /><strong>WGDdetector: https://github.com/yongzhiyang2012/WGDdetector</strong><br /><strong>https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-2670-3</strong><br /><strong>wgd: https://github.com/arzwa/wgd</strong><br /><strong>https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-1142-2#Sec1</strong><br /><strong>https://bmcbiol.biomedcentral.com/articles/10.1186/s12915-017-0399-x</strong><br /><strong>GeNoGAP: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-1142-2</strong><br /><strong>https://bmcbiol.biomedcentral.com/articles/10.1186/s12915-017-0399-x</strong><br /><strong>https://github.com/dfguan/purge_dups</strong><br /><strong>https://www.biorxiv.org/content/10.1101/2020.01.24.917997v1</strong></p><p>This article introduces the usage of wgd.</p><p>At the time of writing, wgd cannot be installed directly with bioconda, so installation is a little troublesome because it has many dependencies. 
wgd depends on the following software:</p><p><strong>BLAST</strong><br /><strong>MCL</strong><br /><strong>MUSCLE/MAFFT/PRANK</strong><br /><strong>PAML</strong><br /><strong>PhyML/FastTree</strong><br /><strong>i-ADHoRe</strong></p><p>The good news is that most of these dependencies can be installed with bioconda:</p><blockquote><p>conda create -n wgd python=3.5 blast mcl muscle mafft prank paml fasttree cmake libpng mpi=1.0=mpich<br />conda activate wgd</p></blockquote><p>Here mpi=1.0=mpich is selected because i-ADHoRe depends on mpich. If openmpi is installed instead, i-ADHoRe fails with an error such as: error while loading shared libraries: libmpi_cxx.so.40: cannot open shared object file: No such file or directory</p><p>After that, installing wgd itself is much simpler:</p><blockquote><p>git clone https://github.com/arzwa/wgd.git<br />cd wgd<br />pip install .</p></blockquote><p>Alternatively, pip can install it directly from GitHub without a local clone:</p><blockquote><p>pip install git+https://github.com/arzwa/wgd.git</p></blockquote><p>For i-ADHoRe, you need to register at http://bioinformatics.psb.ugent.be/webtools/i-adhore/licensing/ and agree to the license before you can download i-ADHoRe-3.0.</p><p>Since my miniconda3 is installed under ~/opt/, the install prefix is ~/opt/miniconda3/envs/wgd/</p><blockquote><p>tar -zxvf i-adhore-3.0.01.tar.gz<br />cd i-adhore-3.0.01<br />mkdir -p build &amp;&amp; cd build<br />cmake .. -DCMAKE_INSTALL_PREFIX=~/opt/miniconda3/envs/wgd/<br />make -j 4<br />make install</p></blockquote><p>We take the sugarcane genome of Saccharum spontaneum L. as an example. 
The plant is octoploid, and the assembly used here contains 32 chromosomes, 4 sets of 8 (4 x 8 = 32).</p><p><strong>Download the CDS and GFF annotation files for the tutorial</strong></p><blockquote><p><strong>mkdir -p wgd_tutorial &amp;&amp; cd wgd_tutorial</strong><br /><strong>wget http://www.life.illinois.edu/ming/downloads/Spontaneum_genome/Sspon.v20190103.cds.fasta.gz</strong><br /><strong>wget http://www.life.illinois.edu/ming/downloads/Spontaneum_genome/Sspon.v20190103.gff3.gz</strong><br /><strong>gunzip *.gz</strong></p></blockquote><p>First run conda activate wgd to enter the analysis environment, then start the analysis.</p><p>Step 1: Use wgd mcl to identify homologous genes in the genome</p><blockquote><p>wgd mcl -n 20 --cds --mcl -s Sspon.v20190103.cds.fasta -o Sspon_cds.out</p></blockquote><p>Step 2: Use wgd ksd to build the Ks distribution</p><blockquote><p>wgd ksd --n_threads 80 Sspon_cds.out/Sspon.v20190103.cds.fasta.blast.tsv.mcl Sspon.v20190103.cds.fasta</p></blockquote><p>Step 3: If the genome assembly is of good quality, wgd syn can be used for collinearity analysis. It finds the collinear blocks in the genome and their corresponding anchor points</p><blockquote><p>wgd syn --feature gene --gene_attribute ID \<br /> -ks wgd_ksd/Sspon.v20190103.cds.fasta.ks.tsv \<br /> Sspon.v20190103.gff3 Sspon_cds.out/Sspon.v20190103.cds.fasta.blast.tsv.mcl</p></blockquote><p>For reference, wgd has 9 sub-modules:</p><ul>
<li><span>kde: KDE fitting to the Ks distribution</span></li>
<li><span>ksd: Ks distribution construction</span></li>
<li><span>mcl: all-vs-all BLASTP comparison followed by MCL clustering</span></li>
<li><span>mix: mixture modeling of the Ks distribution</span></li>
<li><span>pre: preprocess the CDS file</span></li>
<li><span>syn: collinearity analysis with i-ADHoRe 3.0, using a GFF file</span></li>
<li><span>viz: draw histogram and density plot</span></li>
<li><span>wf1: standard Ks analysis workflow for the whole-genome paranome; calls mcl, ksd and syn</span></li>
<li><span>wf2: standard Ks analysis workflow for one-vs-one orthologs; calls mcl and ksd</span></li>
</ul>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>

</channel>
</rss>