<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/34488?offset=370</link>
	<atom:link href="https://bioinformaticsonline.com/related/34488?offset=370" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/news/view/36405/earth-biogenome-project</guid>
	<pubDate>Wed, 25 Apr 2018 07:48:56 -0500</pubDate>
	<link>https://bioinformaticsonline.com/news/view/36405/earth-biogenome-project</link>
	<title><![CDATA[Earth BioGenome Project]]></title>
	<description><![CDATA[<p><span>The central goal of the Earth BioGenome Project is to understand the evolution and organization of life on our planet by sequencing and functionally annotating the genomes of 1.5 million known species of eukaryotes, a massive group that includes plants, animals, fungi and other organisms whose cells have a nucleus that houses their chromosomal DNA. To date, the genomes of less than 0.2 percent of eukaryotic species have been sequenced.&nbsp;</span></p><p><span>More at&nbsp;https://www.ucdavis.edu/news/earth-biogenome-project-aims-sequence-dna-all-complex-life</span></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/36897/gmcloser-closing-gaps-in-assemblies-accurately-with-a-likelihood-based-selection-of-contig-or-long-read-alignments</guid>
	<pubDate>Mon, 11 Jun 2018 05:43:44 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/36897/gmcloser-closing-gaps-in-assemblies-accurately-with-a-likelihood-based-selection-of-contig-or-long-read-alignments</link>
	<title><![CDATA[GMcloser: closing gaps in assemblies accurately with a likelihood-based selection of contig or long-read alignments]]></title>
	<description><![CDATA[GMcloser uses likelihood-based classifiers calculated from the alignment statistics between scaffolds, contigs and paired-end reads to correctly assign contigs or long reads to gap regions of scaffolds, thereby achieving accurate and efficient gap closure. We demonstrate with sequencing data from various organisms that the gap-closing accuracy of GMcloser is 3–100-fold higher than those of other available tools, with similar efficiency.
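<p>The Python sketch below illustrates what a likelihood-based classifier over alignment statistics can look like. It is not GMcloser's implementation. Several choices are hypothetical:</p>
<ul>
<li>the example features (an identity bin and an overhang bin);</li>
<li>the naive-Bayes independence assumption;</li>
<li>the pseudocount;</li>
<li>the zero threshold.</li>
</ul>
<p>Each candidate contig or long-read alignment is scored by the log-likelihood ratio of its binned features. The two likelihood models are learned from alignments known to be correct and alignments known to be incorrect. The best-scoring candidate above the threshold is then picked to fill the gap.</p>

```python
import math
from collections import Counter

# Hypothetical sketch of a likelihood-based classifier for gap-filling
# candidates; NOT GMcloser's own code. Features are assumed to be discrete
# (binned) alignment statistics that are conditionally independent (naive Bayes).


def train(examples, pseudocount=1.0):
    """Learn per-feature likelihood tables from labelled alignments.

    examples: list of (features, is_true), where features is a tuple such as
    (identity_bin, overhang_bin). Returns a function mapping a feature tuple
    to its log-likelihood ratio log P(f | true) - log P(f | false).
    """
    n_features = len(examples[0][0])
    counts = {True: [Counter() for _ in range(n_features)],
              False: [Counter() for _ in range(n_features)]}
    class_sizes = Counter()
    seen_values = [set() for _ in range(n_features)]
    for features, is_true in examples:
        class_sizes[is_true] += 1
        for i, value in enumerate(features):
            counts[is_true][i][value] += 1
            seen_values[i].add(value)

    def llr(features):
        score = 0.0
        for i, value in enumerate(features):
            # Laplace smoothing; the extra slot covers values unseen in training.
            k = len(seen_values[i]) + 1
            p_true = (counts[True][i][value] + pseudocount) / (class_sizes[True] + pseudocount * k)
            p_false = (counts[False][i][value] + pseudocount) / (class_sizes[False] + pseudocount * k)
            score += math.log(p_true / p_false)
        return score

    return llr


def pick_gap_filler(candidates, llr, threshold=0.0):
    """Return the id of the highest-scoring candidate whose LLR exceeds
    threshold, or None if no candidate is convincing enough."""
    best_id, best_score = None, threshold
    for cand_id, features in candidates.items():
        score = llr(features)
        if score > best_score:
            best_id, best_score = cand_id, score
    return best_id
```

<p>In use, <code>train</code> would be fitted on alignments whose correctness is known. <code>pick_gap_filler</code> would then be called once per gap with the candidate alignments spanning it. Summing log ratios instead of multiplying raw likelihoods keeps the score numerically stable when many features are combined.</p>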
<p>Address of the bookmark: <a href="https://academic.oup.com/bioinformatics/article/31/23/3733/209212" rel="nofollow">https://academic.oup.com/bioinformatics/article/31/23/3733/209212</a></p>]]></description>
	<dc:creator>Shruti Paniwala</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/news/view/37905/phased-human-genome-assembly</guid>
	<pubDate>Mon, 08 Oct 2018 09:10:54 -0500</pubDate>
	<link>https://bioinformaticsonline.com/news/view/37905/phased-human-genome-assembly</link>
	<title><![CDATA[Phased Human Genome Assembly!]]></title>
	<description><![CDATA[<p>The new publicly available assembly (PacBio&nbsp;<a href="https://www.globenewswire.com/Tracker?data=IM2cKfZgtHafORdb9VSstujBjyW-aIzFILCtXNAkcY_yqVmxdjvG01R_FZQC7zLxs-alqquXwsW6MG98G9-g-ym8Nue2pmUZMtkIg3FIat2mYbJ-z2Ra367GlinbO13x" target="_blank" title=""><span style="text-decoration: underline;">HG00733</span></a>) has the fewest gaps of any human genome assembly, with more than half of the genome contained in gapless sequence at least 27 Mb long. The primary contig assembly is 2.89 Gb long and consists of 865 contigs that were assembled with PacBio data generated with the company&rsquo;s Sequel<span>&reg;</span>&nbsp;System. Using the&nbsp;<a href="https://www.globenewswire.com/Tracker?data=jOa6mE1Y5r8VbU1CaCgx1A0HsoVzJ7waxOiDKgvmKL6cwJq_eH4nWrGj2vLkNpxHl1-5CH4htDB4113PXT8WU60hvHQ-KKpvAwQwveEGvz3N4d0q7QHSa_X97LW8_9xEiYqfsc4d24ca-IpVYZsf7Ue-XL7fSIIZw_EHK-F96t1aaQNRcD-z1PP5qvlZbVwX" target="_blank" title=""><span style="text-decoration: underline;">FALCON-Unzip assembler</span></a>, maternal and paternal haplotypes were resolved over more than 80% of the genome. Maternal and paternal haplotype blocks were then further phased using Hi-C technology and the&nbsp;<a href="https://www.globenewswire.com/Tracker?data=jOa6mE1Y5r8VbU1CaCgx1IrQmRcKvNQm83FLTqQE6OGzutM-fEggnm4Z-nsniK0D_YmDKS_UKWE0NHtHbgvbL973Y2-9NhrWhYKizXQ4lpiTvlqPf1UZdjqVs7BDjISgDnovv8foYw8es8jQzAg5Xfq1CH36NOnWQgA_X04XSvyEEEj0q801Im6cV5M5K4eL15vb_ZgUayccOvDY_fc6lxxPAAAyA4h16-zUN44Y81KdujciCrJrv5xynMIXEjRsaIKCf6eCX_Q1j_uZlN5TD0MVr6HulTYG8lGgyL0x-eQ=" target="_blank" title=""><span style="text-decoration: underline;">FALCON-Phase method</span></a> developed in collaboration with Phase Genomics. 
The genome was then&nbsp;<em>de novo</em>&nbsp;scaffolded using Phase Genomics&rsquo;&nbsp;<a href="https://www.globenewswire.com/Tracker?data=4wcqEWHJpCHRJARQkC0oVkYT9htT14iVebujxcW1nMpAjmigHGQ46ObCGetRfyaZm1ADIHaV1-30B9izTAhjJ-efhFlxorUxs08kdV-9AAzQyuHJ9S7wxnRRnyegsTZd" target="_blank" title=""><span style="text-decoration: underline;">Proximo Hi-C platform</span></a>, resulting in the first chromosome-scale diploid assembly of a single individual accomplished with only two technologies. More specific details about the assembly are included on the PacBio blog.</p><p>The data are available using NCBI accession IDs: BioProject: (<a href="https://www.globenewswire.com/Tracker?data=YZtCuhY2wu5H0yIso9jtUufPXbwyHh1QOZ1jBggGpK5NtXaU_JGC9X39F3uHZ96uVmu6hW5OB2Qq805hUEW2OhSNCm630yFiEF6_nsAwYB0=" target="_blank" title=""><span style="text-decoration: underline;">PRJNA483067</span></a>), assembly: [<a href="https://www.globenewswire.com/Tracker?data=CEXZ7E56JOsRgfH4Wq3r5LVbv4QH_UIekV9idYBys9l8K7pFft824jmYWNzJqK7lQ9fMbaAtbURpm8gM7zqUbpPUrydFwrkJGGtG-NBHctjyjddiFY-p06xZPm2mHXE2" target="_blank" title=""><span style="text-decoration: underline;">RBJD00000000</span></a>] and sequence data (<a href="https://www.globenewswire.com/Tracker?data=pELP2RpqTqTRaPF9yN1N7GZYlQmTxpY0aW-B8xaNw6iyD-Lylw7X3UzMDK3YS4AIYgLtD13em2XsbzOwKhXuNbI4Ks6-LSyXl1_yVdFoB0U=" target="_blank" title=""><span style="text-decoration: underline;">SRP155659</span></a>).</p><p><span>Additional Resources</span></p><ul>
<li><a href="http://globenewswire.com/Tracker?data=zXpdadphSgIAIEWeq46yRPm5-TU0H7wTkL48ue4I9GsaHd5mJyMb9PgXgAsElREkLOCOdWdJ8uW9DHB-LyQ7xhzbd97Qis6CuAlqD0ubGgY%3D" target="_blank" title=""><span style="text-decoration: underline;">Interactive map</span></a>&nbsp;showcasing global initiatives underway to generate reference-quality human genome assemblies for diverse populations</li>
<li><a href="http://globenewswire.com/Tracker?data=EQ8NIaaa8k1Nw1MPRJYIHYrqgsDy92kU8W0siJdGQhq5IJ0dcb890PFFm-C1SrAlFf0xkxUVRxZefFK5ebhoIzmS-6OjR1G9sTxOkCOwRHCAZWmHL-e7uGSuZYcw1VsDp8AeDWO0RwcepMMB6hAoR6BBCJDiJVVZtdFlWBn2uxs%3D" target="_blank" title=""><span style="text-decoration: underline;">BioReport Podcast</span></a>&nbsp;on the value of ethnic-specific reference genomes</li>
<li><em>Nature Reviews Genetics</em>&nbsp;paper from NHGRI:&nbsp;<a href="http://globenewswire.com/Tracker?data=dffu-wPD_JX1_KVeCA6VFy-kP1tlAUbn7d85saXD59dnnJfT2BE3N_Rbm6kT4BvifA_XEs49ioa75cy4HyFi90RA_LRa2QFF6Y4mr-dcoMucljZw0K4JNDZuwWkWPE51cVC2Lqq3E3C1aZ8un6Bq3i-OO_NiVH0hh23hUw4wC84%3D" target="_blank" title=""><span style="text-decoration: underline;">Prioritizing&nbsp;diversity&nbsp;in human genomics research</span></a></li>
<li>Article in&nbsp;<em>The Journal of Precision Medicine</em>: &ldquo;<a href="http://globenewswire.com/Tracker?data=yokLqO2TCBLCdj6uZl-GYbqcGMWBerBYjSPrLMumNrWF2p5XlXq9yl5p-1b5xx3Ckfn5ZjQWkdhxLttbiNae5gccUCP-9RWPUqvTu9MuU9zgJ1c8e14lAladCuEOiVZ2oVRiqssPtLu9hgQWw4ad5EUxZemevsHE4BHC6IiFmMZ6DS6ApwZu-IonFgCFBIcjWOpitQthDASosfaqkMi9LsKgLU9F0WGVJDDOzHXpddhjfCUdEEJ7xC1p8uh9TSiCZgZV6XPlUJSe8n0C_9TtOw%3D%3D" target="_blank" title=""><span style="text-decoration: underline;">Minority Report &ndash; Ethnic Diversity and the Real Promise for Precision Medicine</span></a>&rdquo;</li>
<li>Article&nbsp;in&nbsp;<em>Bio-IT World</em>: &ldquo;<a href="http://globenewswire.com/Tracker?data=rLp1pKetctTPitNEnRjOVDZ3Cvw3FUdL6_ybXncvhjR4ksOrX3y6HUK8WtLlKHT7XZzq_woUjZ-uw20YNvsP0GZAmy5lVqETt27oBLi02wFtTH_6ubELIHtBu8vfVyKnqKp-YhosFG5K7y0RUtzmNjOAlCYPAeVXabn2a2AiSePxUXA_tSy_g79hjYm63x9dPN9oFQGYedOsyHD_ls8DKw%3D%3D" target="_blank" title=""><span style="text-decoration: underline;">Genomic Data Standards Are a Necessity</span></a>&rdquo;</li>
<li>NHGRI Project Award:&nbsp;<a href="http://globenewswire.com/Tracker?data=FbqTEeRffJ88lFryYX6MiOefXvIXFdZDAyW4nrFoYNHaJyMEYIcb7I4BIcEQmxzsKOjrlf9F8irfRJeJLOqG8KFsl-kvkhakUkg3BfYdKGnpLzKYyWbUFR0aKMeEXirHBi7oDLEUSDO45qxANwxyee-pqZXfzAIwF1Wcuaf7EIzNqRqmBUJ3TyNyI05lwAo9gDKmApMnJo5VxPj5P_6rY8lisuv1PNSAh_kJPOuhVBk%3D" target="_blank" title=""><span style="text-decoration: underline;">High Quality Human and Non-Human Primate Genome Assemblies</span></a></li>
</ul><p>More details are available on the PacBio website:</p><ul>
<li>Blog post:&nbsp;<a href="http://globenewswire.com/Tracker?data=ycj-ujgsKzVyljNa11buVmIS5tk9B733VsFZEw77nBXo-IkBvcoG16dN9vuTiY3nm2G5dJZS5Iva3w_znrEtJVDuU8cVlFpozY2ibinKwrMGxkXZVSqW8_uD8fbySRjM5Q_cjuPU22ARFSSLCc9vHJx9WHnb9Rza-qPbuWgewa0rWWStq2fQY5mLpeaQf5fcDJnyQkvDAMI3fauXdzyThg%3D%3D" target="_blank" title=""><span style="text-decoration: underline;">Data Release: Highest-Quality, Most Contiguous Individual Human Genome Assembly to Date</span></a></li>
<li>Blog post:&nbsp;<a href="http://globenewswire.com/Tracker?data=GlZZ9nyp5mDSjJPPfhVD1-dZ_W2l8s0eAUox3TQs949zyGjzO7dx9xodyvyqerdqPC-G3ZhdPEs9xNhJwflrwgHPYQL3kTofprKHBBq3O4gn9E75YUBweJw9b6tTE89sMLUQzF-vRNNDjero3mibm_uG-fSHoYBTm2ZlyEmwzZ5E9tXVd5_RjG0Xnej2E0scA0SncEItAF6Q7vdOydTV_Yr9yYT2TmKY5jtyAt6ZrNGn3McqfV9mMRkR-8dYJLqrQln9JiEkWTwUae6Blj56HyjyXKl6Dfa_CyNuy4r-EWU%3D" target="_blank" title=""><span style="text-decoration: underline;">For Reference-Grade Human Genome Assemblies, SMRT Sequencing Yields Optimal Results</span></a></li>
<li>Webinar: &nbsp;<a href="http://globenewswire.com/Tracker?data=xlnfDwMNLGZZvtexJYsUgMe-DV8HNrYx2QqjwIjfj40dToVtqrBi-gvhknHZmIe8GV_3WU3_9LIlP6GzG3ZoajnDIpwECzdMV5Vyy8Ast4Y2AiHJckf7rBhZVEU4_mV4JB0k3I9XjN2jHK8Cp5uBxyIWWqPdI6qBBdCYYhYLXUTkKpaZEV98oCfC5ET2Q7OSwUM7NieKa75yzMHwaPEYwg%3D%3D" target="_blank" title=""><span style="text-decoration: underline;">Assembling High-Quality Human Reference Genomes for Global Populations</span></a></li>
<li>FALCON-Phase&nbsp;<a href="http://globenewswire.com/Tracker?data=4Z9LDdRq3w2zYFQXEFGmz6u-Vrbfh96syfzrQMKhegLRo2PUvk7s3Xz_y1o--NuTLoCQMrHsqOEBUHIL1IPeOmhyf6Eqwdp8dv8xYo9gSVI%3D" target="_blank" title=""><span style="text-decoration: underline;">press release</span></a>&nbsp;and article&nbsp;<a href="http://globenewswire.com/Tracker?data=4Z9LDdRq3w2zYFQXEFGmz9Ts_IJqHWWrKd33x_ldJEU9mSKXpcVTTi9ioY0kVqrbrXHeCKDf4TdPnAoPJaGBK3YeZtYp-nXZacgyPESZ1XboSUZEJ9rIhDyW7bTLL5HN" target="_blank" title=""><span style="text-decoration: underline;">preprint</span></a></li>
<li>PacBio research focus webpage about&nbsp;<a href="http://globenewswire.com/Tracker?data=E-zzUkw4N01KR4muPun47qg4HX8ToDvLS4sX953hLM2wRyQZ2upkLR4WidyXTFDRLWQORpqxnkbD-CNzsOJyIfH8mJPbrLwRf04J4yjuNdem-Fulc8QIT3OCi4wx5LpqgC2ymLE0rYX5UOpbFPBgvA%3D%3D" target="_blank" title=""><span style="text-decoration: underline;">Human Population Genetics</span></a></li>
</ul><p>&nbsp;Ref:&nbsp;https://stockguru.com/2018/10/08/pacific-biosciences-releases-highest-quality-most-contiguous-individual-human-genome-assembly-to-date/</p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/38063/referee-genome-assembly-quality-scores</guid>
	<pubDate>Sun, 04 Nov 2018 16:44:30 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/38063/referee-genome-assembly-quality-scores</link>
	<title><![CDATA[Referee: Genome assembly quality scores]]></title>
	<description><![CDATA[<p>Modern genome sequencing technologies provide a succinct measure of quality at each position in every read. However, all of this information is lost in the assembly process. Referee summarizes the quality information from the reads that map to a site in an assembled genome to calculate a quality score for each position in the genome assembly.</p>
<p>We accomplish this by first calculating genotype likelihoods for every site. For a given site in a diploid genome, there are 10 possible genotypes (AA, AC, AG, AT, CC, CG, CT, GG, GT, TT). Referee takes as input the genotype likelihoods calculated for all 10 genotypes given the called reference base at each position.</p>
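<p>Referee itself takes precomputed genotype likelihoods as input. The Python sketch below shows where such likelihoods can come from and one way to turn them into a per-site score. It rests on a simple model that is not necessarily the one used by Referee or by any particular variant caller. Its assumptions are:</p>
<ul>
<li>each read base comes from either chromosome with probability 1/2;</li>
<li>its error rate follows from its Phred quality, with errors spread evenly over the other three bases;</li>
<li>all 10 genotypes have a flat prior;</li>
<li>the score is the Phred-scaled probability that the site is not homozygous for the reference base.</li>
</ul>

```python
import math
from itertools import combinations_with_replacement

# The 10 unordered diploid genotypes: AA, AC, AG, AT, CC, CG, CT, GG, GT, TT.
GENOTYPES = ["".join(g) for g in combinations_with_replacement("ACGT", 2)]


def base_prob(read_base, allele, qual):
    """P(observed base | true allele), assuming a Phred error model with
    errors spread evenly over the three other bases."""
    err = 10 ** (-qual / 10)
    return 1 - err if read_base == allele else err / 3


def genotype_log_likelihoods(pileup):
    """Natural-log likelihood of each genotype given a pileup of
    (read_base, phred_quality) pairs at one site."""
    lls = {}
    for gt in GENOTYPES:
        a1, a2 = gt
        ll = 0.0
        for base, qual in pileup:
            # Assumption: each read samples either chromosome with probability 1/2.
            ll += math.log(0.5 * base_prob(base, a1, qual) + 0.5 * base_prob(base, a2, qual))
        lls[gt] = ll
    return lls


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def site_quality(pileup, ref_base):
    """Phred-scaled probability that the site is NOT homozygous for ref_base.

    Hypothetical scoring with a flat prior over the 10 genotypes; Referee's
    own formula may differ.
    """
    lls = genotype_log_likelihoods(pileup)
    ref_gt = ref_base + ref_base
    total = logsumexp(list(lls.values()))
    not_ref = logsumexp([ll for gt, ll in lls.items() if gt != ref_gt])
    # Convert the natural-log posterior of "not ref/ref" into a Phred score.
    return -10 * (not_ref - total) / math.log(10)
```

<p>For example, 20 matching reads at Q30 give a high score. A pileup split evenly between A and G at a site whose reference base is A scores close to zero, which flags the site for filtering.</p>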
<h3>Referee is a program to calculate a quality score for every position in a genome assembly. This allows for easy filtering of low quality sites for any downstream analysis.</h3>
<p>https://github.com/gwct/referee</p><p>Address of the bookmark: <a href="https://gwct.github.io/referee/#" rel="nofollow">https://gwct.github.io/referee/#</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/38758/roary-the-pan-genome-pipeline</guid>
	<pubDate>Tue, 22 Jan 2019 05:52:07 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/38758/roary-the-pan-genome-pipeline</link>
	<title><![CDATA[Roary: the Pan Genome Pipeline]]></title>
	<description><![CDATA[<p><span>Roary is a high-speed, stand-alone pan-genome pipeline that takes annotated assemblies in GFF3 format (produced by Prokka (Seemann, 2014)) and calculates the pan genome. Using a standard desktop PC, it can analyse datasets with thousands of samples without compromising the quality of the results, something that is computationally infeasible with existing methods. A set of 128 samples can be analysed in under 1 hour using 1 GB of RAM and a single processor. Performing the same analysis with existing methods would take weeks and hundreds of GB of RAM. Roary is not intended for metagenomics or for comparing extremely diverse sets of genomes.</span></p><p>Address of the bookmark: <a href="https://sanger-pathogens.github.io/Roary/" rel="nofollow">https://sanger-pathogens.github.io/Roary/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/39269/ragoo-fast-reference-guided-scaffolding-of-genome-assembly-contigs</guid>
	<pubDate>Wed, 17 Apr 2019 19:45:22 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/39269/ragoo-fast-reference-guided-scaffolding-of-genome-assembly-contigs</link>
	<title><![CDATA[RaGOO: Fast Reference-Guided Scaffolding of Genome Assembly Contigs]]></title>
	<description><![CDATA[<p>Alonge M, Soyk S, Ramakrishnan S, Wang X, Goodwin S, Sedlazeck FJ, Lippman ZB, Schatz MC:&nbsp;<a href="https://www.biorxiv.org/content/early/2019/01/13/519637">Fast and accurate reference-guided scaffolding of draft genomes</a>.&nbsp;<em>bioRxiv</em>&nbsp;2019.</p>
<p>RaGOO is a tool for coalescing genome assembly contigs into pseudochromosomes via minimap2 alignments to a closely related reference genome. The tool focuses on practicality and offers the following features:</p>
<ol>
<li>Good performance. On a MacBook Pro using Arabidopsis data, pseudochromosome construction takes less than a minute and the whole pipeline with SV calling takes ~2 minutes.</li>
<li>Intact ordering and orienting of contigs.</li>
<li><a href="https://github.com/malonge/RaGOO/wiki/Breaking-Chimeric-Contigs">Chimeric contig correction</a></li>
<li><a href="https://github.com/malonge/RaGOO/wiki/GFF-File-Lift-Over">GFF lift-over</a></li>
<li><a href="https://github.com/malonge/RaGOO/wiki/Calling-Structural-Variants">Structural variant calling with an integrated version of Assemblytics</a></li>
<li>Confidence scores associated with the grouping, localization, and orientation for each contig.</li>
</ol><p>Address of the bookmark: <a href="https://github.com/malonge/RaGOO" rel="nofollow">https://github.com/malonge/RaGOO</a></p>]]></description>
	<dc:creator>BioJoker</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/40549/mgse-mapping-based-genome-size-estimation</guid>
	<pubDate>Fri, 17 Jan 2020 02:11:43 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/40549/mgse-mapping-based-genome-size-estimation</link>
	<title><![CDATA[MGSE: Mapping-based Genome Size Estimation]]></title>
	<description><![CDATA[<p>MGSE uses files generated in genome sequencing projects to predict genome size. It requires two inputs: a FASTA file containing a highly contiguous assembly, and a BAM file with all available reads mapped to this assembly. The script construct_cov_file.py (https://doi.org/10.1186/s12864-018-5360-z) generates a COV file from the (sorted) BAM file; MGSE can also do this directly. MGSE then uses the COV file to calculate the coverage in the provided reference regions and the total number of mapped bases, and estimates the genome size from these two values. Providing accurate reference regions is crucial for this estimate.</p><p>Address of the bookmark: <a href="https://github.com/bpucker/MGSE" rel="nofollow">https://github.com/bpucker/MGSE</a></p>]]></description>
	<dc:creator>Shruti Paniwala</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/41691/genobuntu-package-for-next-generation-sequencing-and-genome-assembly</guid>
	<pubDate>Mon, 18 May 2020 16:47:56 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/41691/genobuntu-package-for-next-generation-sequencing-and-genome-assembly</link>
	<title><![CDATA[Genobuntu: Package for Next Generation Sequencing and Genome Assembly]]></title>
	<description><![CDATA[<div>
<p>Genobuntu is a software package containing more than 70 tools and packages oriented towards NGS. In its current version, Genobuntu supports pre-assembly tools, genome assemblers and post-assembly tools.<br><br>Commonly used biological software and example script files for different assembly pipelines are also provided, and the example scripts can be adapted to suit one&rsquo;s experimental needs. Genobuntu aims to reduce the time and effort needed to build software workstations, and it can also serve as a good teaching resource in a classroom setting.<br><br>Genobuntu therefore offers a well-tailored environment for both novices and experts working in the field of genome assembly.</p>
</div>
<div>
<h3>Features</h3>
<ul>
<li>Velvet</li>
<li>MiB</li>
<li>SSAKE</li>
<li>EULER</li>
<li>VCAKE</li>
<li>ABySS</li>
<li>ALLPATHS</li>
<li>Celera</li>
<li>SHARCGS</li>
<li>IDBA</li>
<li>TAIPAN</li>
<li>Edena</li>
<li>SOAPdenovo</li>
<li>Maq</li>
<li>IDBA-UD</li>
<li>No. of Reads present in the Ref. Seq.</li>
<li>ART NGS Reads Simulator</li>
<li>HiTEC, FastQC</li>
<li>Minimum Description Length</li>
<li>SOAPaligner</li>
<li>Sequence Read Archive (SRA) Toolkit</li>
</ul>
</div><p>Address of the bookmark: <a href="https://sourceforge.net/projects/genobuntu/" rel="nofollow">https://sourceforge.net/projects/genobuntu/</a></p>]]></description>
	<dc:creator>BioStar</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/42633/protocol-for-de-novo-genome-assembly-using-illumina-reads</guid>
	<pubDate>Sat, 16 Jan 2021 21:42:11 -0600</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/42633/protocol-for-de-novo-genome-assembly-using-illumina-reads</link>
	<title><![CDATA[Protocol for De novo Genome Assembly using Illumina Reads]]></title>
	<description><![CDATA[<p>In this protocol, we describe a de novo assembly method for small to medium-sized genomes.</p><p><strong>What is de novo genome assembly?<br /></strong>Genome assembly is the process of taking a large number of short DNA sequences and putting them back together to reconstruct the original chromosomes from which the DNA came. De novo genome assembly assumes no prior knowledge of the length, structure or composition of the source DNA sequence. In a genome sequencing experiment, the DNA of the target organism is broken into millions of small fragments, which are then read on a sequencing machine. Depending on the sequencing system used, these "reads" range from 20 to 1000 nucleotide base pairs (bp) in length. Illumina-style short-read sequencing usually produces reads 36 to 150 bp long. These reads can be either &ldquo;single-end&rdquo;, as described above, or &ldquo;paired-end&rdquo;, where both ends of each DNA fragment are read.</p><p><strong>Why genome assembly?</strong><br />Knowing the DNA sequence of an organism is useful both in basic research into how and why organisms live and in applied fields. Because DNA is central to all living things, a DNA sequence can be useful in virtually any kind of biological research. For example, in medicine it can be used to identify and diagnose genetic disorders and, eventually, to improve therapies for them. Similarly, studying pathogen genomes can lead to treatments for infectious diseases.</p><p><strong>Raw NGS data</strong><br />Reads can be stored as plain sequences in a FASTA file or, together with their per-base quality scores, in a FASTQ file.&nbsp;FASTQ is the most common read file format, since it is what the Illumina sequencing pipeline produces. The rest of this protocol assumes FASTQ input.</p><p><strong>The protocol in a nutshell:</strong> <br />Obtain the read file(s) from the sequencing machine(s). <br />Inspect the reads to get an idea of what you have and what their quality is like. 
<br />If required, clean up and quality-trim the raw data. <br />Choose an appropriate parameter set for the assembly. <br />Assemble the data into contigs/scaffolds. <br />Evaluate the resulting assembly and assess its quality.</p><p><strong>Read Quality Control:</strong><br />Check read quality with FastQC.<br />Script<br />https://bioinformaticsonline.com/snippets/view/42540/install-fastqc-using-conda</p><p>Quality trimming/cleanup of read files.<br />This step trims adapters, barcodes and other contaminants from the reads.<br />Script<br />https://bioinformaticsonline.com/snippets/view/42542/trimmomatic-command</p><p><strong>Genome Assembly:</strong><br />This part of the protocol assembles the quality-trimmed reads into draft contigs, here with SPAdes.</p><blockquote><p>spades.py -1 illumina_R1.fastq.gz -2 illumina_R2.fastq.gz --careful --cov-cutoff auto -o result_of_spades_assembly_all_illumina</p></blockquote><p>A wide range of short-read assemblers is available, each with its own strengths and weaknesses. <br /><em>Some of the available assemblers include:</em><br />Velvet<br />SOAPdenovo<br />MIRA<br />ALLPATHS</p><p>The next step is to decide whether the draft contigs are suitable for the rest of the study and how to proceed with them. Keep in mind that these are draft contigs and may contain mis-assemblies.</p>
<p><strong>Mis-assembly checking and assembly metric tools:</strong><br />QUAST - Quality assessment tool for genome assembly http://bioinf.spbau.ru/quast<br />Mauve assembly metrics - http://code.google.com/p/ngopt/wiki/How_To_Score_Genome_Assemblies_with_Mauve<br />InGAP-SV - https://sites.google.com/site/nextgengenomics/ingap and http://ingap.sourceforge.net/<br />inGAP is also useful for finding structural variants between genomes from read mappings.</p><p><strong>Genome finishing tools:</strong><br />Semi-automated gap fillers:<br />GapFiller - http://www.baseclear.com/landingpages/basetools-a-wide-range-of-bioinformatics-solutions/gapfiller/</p><p>IMAGE (V2) - http://sourceforge.net/apps/mediawiki/image2/index.php?title=Main_Page</p><p><strong>Genome visualisers and editors:</strong><br />Artemis - http://www.sanger.ac.uk/resources/software/artemis/<br />IGV - http://www.broadinstitute.org/igv/</p><p><strong>Automated and semi-automated annotation tools:</strong><br />Prokka - https://github.com/tseemann/prokka<br />RAST - http://www.nmpdr.org/FIG/wiki/view.cgi/FIG/RapidAnnotationServer<br />JCVI Annotation Service - http://www.jcvi.org/cms/research/projects/annotation-service/</p><p><strong>Frequently used commands for this analysis are listed at:</strong></p><p>https://bioinformaticsonline.com/blog/view/38765/list-of-tools-frequently-used-while-genome-assembly<br />https://bioinformaticsonline.com/pages/view/42275/frequent-parameters-for-bioinformatics-tools</p>]]></description>
	<dc:creator>BioStar</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/43315/genome-assembly-workshop-2020</guid>
	<pubDate>Wed, 25 Aug 2021 04:30:32 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/43315/genome-assembly-workshop-2020</link>
	<title><![CDATA[Genome Assembly Workshop 2020]]></title>
	<description><![CDATA[<p><span>Our team offers custom bioinformatics services to academic and private organizations. We have a strong academic background with a focus on cutting-edge, open-source software. We replicate standard analysis pipelines (best practices) when appropriate and develop novel applications and pipelines when needed. In either case, we always emphasize biological interpretation of the data.</span></p>
<p><span>More at&nbsp;https://ucdavis-bioinformatics-training.github.io/</span></p><p>Address of the bookmark: <a href="https://ucdavis-bioinformatics-training.github.io/2020-Genome_Assembly_Workshop/snakemake/snakemake_intro" rel="nofollow">https://ucdavis-bioinformatics-training.github.io/2020-Genome_Assembly_Workshop/snakemake/snakemake_intro</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>

</channel>
</rss>