<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/38886?offset=10</link>
	<atom:link href="https://bioinformaticsonline.com/related/38886?offset=10" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/38008/quast-lg-versatile-genome-assembly-evaluation</guid>
	<pubDate>Thu, 25 Oct 2018 10:46:55 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/38008/quast-lg-versatile-genome-assembly-evaluation</link>
	<title><![CDATA[QUAST-LG: Versatile genome assembly evaluation]]></title>
	<description><![CDATA[<p>QUAST-LG-a tool that compares large genomic de novo assemblies against reference sequences and computes relevant quality metrics. Since genomes generally cannot be reconstructed completely due to complex repeat patterns and low coverage regions, we introduce a concept of upper bound assembly for a given genome and set of reads, and compute theoretical limits on assembly correctness and completeness. Using QUAST-LG, we show how close the assemblies are to the theoretical optimum, and how far this optimum is from the finished reference.</p>
<h4>AVAILABILITY AND IMPLEMENTATION:</h4>
<p>http://cab.spbu.ru/software/quast-lg</p><p>Address of the bookmark: <a href="http://cab.spbu.ru/software/quast-lg/" rel="nofollow">http://cab.spbu.ru/software/quast-lg/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/41501/hicanu-accurate-assembly-of-segmental-duplications-satellites-and-allelic-variants-from-high-fidelity-long-reads</guid>
	<pubDate>Fri, 27 Mar 2020 22:49:31 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/41501/hicanu-accurate-assembly-of-segmental-duplications-satellites-and-allelic-variants-from-high-fidelity-long-reads</link>
	<title><![CDATA[HiCanu: accurate assembly of segmental duplications, satellites, and allelic variants from high-fidelity long reads]]></title>
	<description><![CDATA[<p><span>HiCanu, a significant modification of the Canu assembler designed to leverage the full potential of HiFi reads via homopolymer compression, overlap-based error correction, and aggressive false overlap filtering.&nbsp;</span></p>
<p>More at&nbsp;<a href="https://www.biorxiv.org/content/10.1101/2020.03.14.992248v3?fbclid=IwAR2PaN4GLjvAZpWmCE2q0EWk2dtwY7wiKxVlXn9PPG7OBSP06PP2gcCrv3A">https://www.biorxiv.org/content/10.1101/2020.03.14.992248v3</a></p><p>Address of the bookmark: <a href="https://github.com/marbl/canu" rel="nofollow">https://github.com/marbl/canu</a></p>]]></description>
	<dc:creator>BioStar</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/43728/short-read-assembly-using-spades</guid>
	<pubDate>Mon, 31 Jan 2022 07:18:16 -0600</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/43728/short-read-assembly-using-spades</link>
	<title><![CDATA[Short-read assembly using Spades !]]></title>
	<description><![CDATA[<h2 id="short-read-assembly-a-comparison">If we only had Illumina reads, we could also assemble these using the tool Spades.</h2><p>You can try this here, or try it later on your own data.</p><h2 id="get-data">Get data</h2><p>We will use the same Illumina data as we used above:</p><ul>
<li>illumina_R1.fastq.gz: the Illumina forward reads</li>
<li>illumina_R2.fastq.gz: the Illumina reverse reads</li>
</ul><h2 id="assemble">Assemble</h2><p>Run Spades:</p><div><pre>spades.py -1 illumina_R1.fastq.gz -2 illumina_R2.fastq.gz --careful --cov-cutoff auto -o spades_assembly_all_illumina
</pre></div><ul>
<li><code>-1</code>&nbsp;is input file of forward reads</li>
<li><code>-2</code>&nbsp;is input file of reverse reads</li>
<li><code>--careful</code>&nbsp;minimizes mismatches and short indels</li>
<li><code>--cov-cutoff auto</code>&nbsp;computes the coverage threshold (rather than the default setting, &ldquo;off&rdquo;)</li>
<li><code>-o</code>&nbsp;is the output directory</li>
</ul><h2 id="results">Results</h2><p>Move into the output directory and look at the contigs:</p><div><pre>infoseq contigs.fasta</pre></div>]]></description>
	<dc:creator>Abhimanyu Singh</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/27076/ale-a-generic-assembly-likelihood-evaluation-framework-for-assessing-the-accuracy-of-genome-and-metagenome-assemblies</guid>
	<pubDate>Tue, 26 Apr 2016 03:38:43 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/27076/ale-a-generic-assembly-likelihood-evaluation-framework-for-assessing-the-accuracy-of-genome-and-metagenome-assemblies</link>
	<title><![CDATA[ALE: a Generic Assembly Likelihood Evaluation Framework for Assessing the Accuracy of Genome and Metagenome Assemblies]]></title>
	<description><![CDATA[<p>Assembly Likelihood Evaluation (ALE) framework that overcomes these limitations, systematically evaluating the accuracy of an assembly in a reference-independent manner using rigorous statistical methods. This framework is comprehensive, and integrates read quality, mate pair orientation and insert length (for paired-end reads), sequencing coverage, read alignment and k-mer frequency. ALE pinpoints synthetic errors in both single and metagenomic assemblies, including single-base errors, insertions/deletions, genome rearrangements and chimeric assemblies presented in metagenomes. At the genome level with real-world data, ALE identifies three large misassemblies from the Spirochaeta smaragdinae finished genome, which were all independently validated by Pacific Biosciences sequencing. At the single-base level with Illumina data, ALE recovers 215 of 222 (97%) single nucleotide variants in a training set from a GC-rich Rhodobacter sphaeroides genome. Using real Pacific Biosciences data, ALE identifies 12 of 12 synthetic errors in a Lambda Phage genome, surpassing even Pacific Biosciences' own variant caller, EviCons. In summary, the ALE framework provides a comprehensive, reference-independent and statistically rigorous measure of single genome and metagenome assembly accuracy, which can be used to identify misassemblies or to optimize the assembly process.</p>
<p>More at&nbsp;http://www.ncbi.nlm.nih.gov/pubmed/23303509</p><p>Address of the bookmark: <a href="http://sc932.github.io/ALE/about.html" rel="nofollow">http://sc932.github.io/ALE/about.html</a></p>]]></description>
	<dc:creator>Neel</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/27438/hagfish-assess-an-assembly-through-creative-use-of-coverage-plots</guid>
	<pubDate>Fri, 20 May 2016 19:08:17 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/27438/hagfish-assess-an-assembly-through-creative-use-of-coverage-plots</link>
	<title><![CDATA[Hagfish - assess an assembly through creative use of coverage plots]]></title>
	<description><![CDATA[<p>Hagfish is a tool that is to be used in data analysis of Next Generation Sequencing (NGS) experiments. Hagfish builds on the concept of coverage plots and aims to assist (amongst others) in quality control of&nbsp;<em style="font-size: 12.8px;">de novo</em>&nbsp;genome assembly or identification of structural variation in a genome re-sequencing experiment.</p>
<p>Hagfish requires a reference sequence and a&nbsp;<span>paired end</span>&nbsp;re-sequencing data set. Hagfish has more power the larger the insert size of the paired end library is.</p>
<p>Quick links:&nbsp;<a href="https://github.com/mfiers/hagfish/wiki/Install">Installation</a>,<a href="https://github.com/mfiers/hagfish/wiki/Operation">Operation</a>,&nbsp;<a href="https://github.com/mfiers/hagfish/wiki/ReadMappers">Read mappers</a>,&nbsp;<a href="https://github.com/mfiers/hagfish/wiki/Scripts">Hagfish scripts</a>,&nbsp;<a href="https://github.com/mfiers/hagfish/wiki/Plots">Hagfish plots</a></p><p>Address of the bookmark: <a href="https://github.com/mfiers/hagfish" rel="nofollow">https://github.com/mfiers/hagfish</a></p>]]></description>
	<dc:creator>Abhi</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/31139/pbsuite-software-for-long-read-sequencing-data-from-pacbio</guid>
	<pubDate>Mon, 27 Feb 2017 09:54:47 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/31139/pbsuite-software-for-long-read-sequencing-data-from-pacbio</link>
	<title><![CDATA[PBSuite: Software for Long-Read Sequencing Data from PacBio]]></title>
	<description><![CDATA[<p><span>PBJelly - the genome upgrading tool.&nbsp;</span><br><span>PBHoney - the structural variation discovery tool&nbsp;</span><br><br><span>Both are contained within the PBSuite code found in downloads.</span><br><br><span>----- PBJelly -----</span><br><span>Read The Paper&nbsp;</span><br><a href="http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0047768" target="_blank">http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0047768</a><br><br><span>PBJelly is a highly automated pipeline that aligns long sequencing reads (such as PacBio RS reads or long 454 reads in fasta format) to high-confidence draft assembles. PBJelly fills or reduces as many captured gaps as possible to produce upgraded draft genomes.&nbsp;</span><br><br><span>----- PBHoney -----</span><br><span>Read The Paper</span><br><a href="http://www.biomedcentral.com/1471-2105/15/180/abstract" target="_blank">http://www.biomedcentral.com/1471-2105/15/180/abstract</a><br><br><span>PBHoney is an implementation of two variant-identification approaches designed to exploit the high mappability of long reads (i.e., greater than 10,000 bp). PBHoney considers both intra-read discordance and soft-clipped tails of long reads to identify structural variants.</span></p><p>Address of the bookmark: <a href="https://sourceforge.net/projects/pb-jelly/" rel="nofollow">https://sourceforge.net/projects/pb-jelly/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/32709/cabog-celera-assembler-with-best-overlap-graph</guid>
	<pubDate>Mon, 15 May 2017 05:04:39 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/32709/cabog-celera-assembler-with-best-overlap-graph</link>
	<title><![CDATA[CABOG: Celera Assembler with Best Overlap Graph]]></title>
	<description><![CDATA[<p>CABOG (Celera Assembler with Best Overlap Graph) is scientific software for&nbsp;<a href="http://bioinformatics.oxfordjournals.org/content/24/24/2818.abstract">DNA research</a>. CABOG has been a critical component of many genome sequencing projects. CABOG operates on small genomes such as bacterial as well as large genomes such as mammalian. CABOG is an extension of the Celera Assembler software that was originally developed at&nbsp;<a href="http://www.celera.com/">Celera</a>&nbsp;for the 2001 publication of the first draft human genome sequence. The software was released to the public domain in 2004. Its open source&nbsp;<a href="http://wgs-assembler.sf.net/">repository</a>&nbsp;on Source Forge is an internet resource for scientists around the world.&nbsp;</p>
<p>CABOG is one of many software programs called genome assemblers. These programs exist to overcome the fundamental limitation of all sequencing machines, namely, that they read out very few DNA letters at a time. These programs reconstruct genomes that are billions of letters long from the hundreds of letters per read that modern sequencers provide. What these programs do is often described as a scaled up version of a family solving a jigsaw puzzle.</p>
<p>The CABOG software was the first to accomplish many scientific goals. It was the first to assemble the genome of a multicellular organism (<em>Drosophila melanogaster</em>, 2000). It was the first to assemble both parental haplotypes of one human genome (J. Craig Venter, 2007). It was the first to assemble environmental sequence from the oceans (Sargasso Sea in 2004 and Global Ocean Sampling in 2007). It was first to combine reads from first-generation Sanger sequencing machines and second-generation pyrosequencing machines (Marine microbes, 2006). Today, CABOG is one of the leading assembly programs for data sets that include paired end data from the Roche 454 line of sequencing machines.</p><p>Address of the bookmark: <a href="http://www.jcvi.org/cms/research/projects/cabog/overview/" rel="nofollow">http://www.jcvi.org/cms/research/projects/cabog/overview/</a></p>]]></description>
	<dc:creator>Abhimanyu Singh</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/42633/protocol-for-de-novo-genome-assembly-using-illumina-reads</guid>
	<pubDate>Sat, 16 Jan 2021 21:42:11 -0600</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/42633/protocol-for-de-novo-genome-assembly-using-illumina-reads</link>
	<title><![CDATA[Protocol for De novo Genome Assembly using Illumina Reads]]></title>
	<description><![CDATA[<p>In this protocol, we address and describe the de novo assembly method for small to medium-sized genomes.</p><p><strong>What is de novo genome assembly?<br /></strong>The method of taking a large number of short DNA sequences and placing them back together to create a reflection of the original chromosomes from which the DNA originated relates to genome assembly. No previous knowledge of the source DNA sequence length, structure or composition is inferred by De novo genome assemblies. The DNA of the target organism is split up into millions of tiny parts and read on a sequencing computer in a genome sequencing experiment. Depending on the sequencing system used, these "reads" range from 20 to 1000 nucleotide base pairs (bp) in length. Usually, length reads of 36 - 150 bp are produced for Illumina style short read sequencing. These reads can be either &ldquo;single ended&rdquo; as described above or &ldquo;paired end.&rdquo;</p><p><strong>Why genome assembly?</strong><br />In basic research into why and how they live, as well as in applied topics, identifying the DNA sequence of an organism is useful. Awareness of a DNA sequence may be useful in virtually any biological research because of the relevance of DNA to living things. For example, it may be used in medicine to classify, diagnose and eventually improve genetic disorder therapies. Similarly, pathogens study can lead to treatments for infectious diseases.</p><p><strong>Raw NGS data</strong><br />Reads can be saved as a Fasta file as text or in a FastQ file with their attributes.&nbsp;FastQ is the most common read file format since this is what the Illumina sequencing pipeline creates. This will henceforth be the subject of our conversation.</p><p><strong>In a nutshell the protocol:</strong> <br />Get the sequence file(s) read from the sequencing machine (s). <br />Look at the readings - have an idea of what you have and what the standard is like. <br />If required, raw data cleanup/quality trimming. <br />Choose an adequate parameter set for assembly. <br />Assemble the data into scaffolds/contigs. <br />Examine the assembly performance and determine the efficiency of the assembly.</p><p><strong>Read Quality Control:</strong><br />Check the qualiy with fastQC.<br />Script<br />https://bioinformaticsonline.com/snippets/view/42540/install-fastqc-using-conda</p><p>Quality trimming/cleanup of read files.<br />This function trims adapters, barcodes and other contaminants from the reads.<br />Script<br />https://bioinformaticsonline.com/snippets/view/42542/trimmomatic-command</p><p><strong>Genome Assembly:</strong><br />The object of this portion of the protocol is to explain the method of assembling the reads trimmed by quality into draft contigs.</p><blockquote><p>spades.py -1 illumina_R1.fastq.gz -2 illumina_R2.fastq.gz --careful --cov-cutoff auto -o result_of_spades_assembly_all_illumina</p></blockquote><p>A significant range of short-read assemblers are available. Everyone with strengths and disadvantages of their own. <br /><em>Some of the assemblers available include:</em><br />Velvet<br />SOAP-denovo<br />MIRA<br />ALLPATHS</p><p>Next step is to assess the suitability and what to do with a draft package of contiguous details for the remainder of the study now.&nbsp;Few stuff you can note about the contigs you just created:&nbsp;They're the draft Contigs. Any mis-assemblies can occur.</p><p><strong>Mis-assembly checking and assembly metric tools:</strong><br />QUAST - Quality assessment tool for genome assembly http://bioinf.spbau.ru/quast<br />Mauve assembly metrics - http://code.google.com/p/ngopt/wiki/How_To_Score_Genome_Assemblies_with_Mauve<br />InGAP-SV - https://sites.google.com/site/nextgengenomics/ingap and http://ingap.sourceforge.net/<br />inGAP is also useful for finding structural variants between genomes from read mappings.</p><p><strong>Genome finishing tools:</strong><br />Semi-automated gap fillers:<br />Gap filler - http://www.baseclear.com/landingpages/basetools-a-wide-range-of-bioinformatics-solutions/gapfiller/</p><p>IMAGE (V2) - http://sourceforge.net/apps/mediawiki/image2/index.php?title=Main_Page</p><p><strong>Genome visualisers and editors:</strong><br />Artemis - http://www.sanger.ac.uk/resources/software/artemis/<br />IGV - http://www.broadinstitute.org/igv/</p><p><strong>Automated and semi automated annotation tools:</strong><br />Prokka - https://github.com/tseemann/prokka<br />RAST - http://www.nmpdr.org/FIG/wiki/view.cgi/FIG/RapidAnnotationServer<br />JCVI Annotation Service - http://www.jcvi.org/cms/research/projects/annotation-service/</p><p><strong>Frequent command use for the analysis are at:</strong></p><p>https://bioinformaticsonline.com/blog/view/38765/list-of-tools-frequently-used-while-genome-assembly<br />https://bioinformaticsonline.com/pages/view/42275/frequent-parameters-for-bioinformatics-tools</p>]]></description>
	<dc:creator>BioStar</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/39624/cogent-a-tool-for-reconstructing-the-coding-genome-using-high-quality-full-length-transcriptome-sequences</guid>
	<pubDate>Tue, 18 Jun 2019 05:33:04 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/39624/cogent-a-tool-for-reconstructing-the-coding-genome-using-high-quality-full-length-transcriptome-sequences</link>
	<title><![CDATA[Cogent: a tool for reconstructing the coding genome using high-quality full-length transcriptome sequences.]]></title>
	<description><![CDATA[<div id="yui_3_14_1_1_1560853173251_3865">Cogent is a tool that identifies gene&nbsp;families and reconstructs the coding genome using high-quality transcriptome data without a reference genome, and can be used to check&nbsp;assemblies&nbsp;for the presence of&nbsp;these known coding sequences.</div>
<div>&nbsp;</div>
<div>
<p>Cogent is a tool for reconstructing the coding genome using high-quality full-length transcriptome sequences. It is designed to be used on&nbsp;<a href="https://github.com/PacificBiosciences/cDNA_primer/wiki">Iso-Seq data</a>&nbsp;and in cases where there is no reference genome or the ref genome is highly incomplete.</p>
<p>See a&nbsp;<a href="https://www.dropbox.com/s/mn6hwhguh0pqceu/20160106_Cogent_developers_conference_slides_Cuttlefish.pdf?dl=0">recent presentation</a>&nbsp;on Cogent being applied to the Cuttlefish Iso-Seq data.</p>
<p><a href="https://www.dropbox.com/s/kz0gi7qg0w82k9a/20161026_Cogent_manuscript_forGitHub.pdf?dl=0">Cogent preliminary draft paper (updated 2016Dec version)</a>,&nbsp;<a href="https://www.dropbox.com/s/37412o8glvnfhf9/20161026_Cogent_ManuscriptPlusSupplement_forGitHub.pdf?dl=0">Supplementary</a></p>
<p>Please see&nbsp;<a href="https://github.com/Magdoll/Cogent/wiki">wiki</a>&nbsp;for details on usage.</p>
</div><p>Address of the bookmark: <a href="https://github.com/Magdoll/Cogent" rel="nofollow">https://github.com/Magdoll/Cogent</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/37211/jbrowse-embeddable-genome-browser-built-completely-with-javascript-and-html5</guid>
	<pubDate>Fri, 29 Jun 2018 09:19:56 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/37211/jbrowse-embeddable-genome-browser-built-completely-with-javascript-and-html5</link>
	<title><![CDATA[JBrowse: Embeddable genome browser built completely with JavaScript and HTML5]]></title>
	<description><![CDATA[JBrowse is a fast, embeddable genome browser built completely with JavaScript and HTML5, with optional run-once data formatting tools written in Perl.

Headline Features:
Fast, smooth scrolling and zooming. Explore your genome with unparalleled speed.
Scales easily to multi-gigabase genomes and deep-coverage sequencing.
Quickly open and view data files on your computer without uploading them to any server.
Supports GFF3, BED, FASTA, Wiggle, BigWig, BAM, VCF (with either .tbi or .idx index), REST, and more.  BAM, BigBed, BigWig, and VCF data are displayed directly from chunks of the compressed binary files, no conversion needed.
Includes an optional “faceted” track selector (see demo) suitable for large installations with thousands of tracks.
Very light server resource requirements. In fact, JBrowse has no back-end server code, just tools for formatting data files to be read directly over HTTP. Serve huge datasets from a single low-cost cloud instance.
Can run as a stand-alone app on OSX and Windows using the Electron platform
Highly extensible plugin architecture, with a large plugin registry of existing examples here https://gmod.github.io/jbrowse-registry

https://jbrowse.org/<p>Address of the bookmark: <a href="https://github.com/GMOD/jbrowse" rel="nofollow">https://github.com/GMOD/jbrowse</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>

</channel>
</rss>