<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/38012?offset=0</link>
	<atom:link href="https://bioinformaticsonline.com/related/38012?offset=0" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/37962/wtdbg2-a-de-novo-sequence-assembler-for-long-noisy-reads-produced-by-pacbio-or-oxford-nanopore</guid>
	<pubDate>Fri, 19 Oct 2018 08:48:43 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/37962/wtdbg2-a-de-novo-sequence-assembler-for-long-noisy-reads-produced-by-pacbio-or-oxford-nanopore</link>
	<title><![CDATA[Wtdbg2: a de novo sequence assembler for long noisy reads produced by PacBio or Oxford Nanopore]]></title>
	<description><![CDATA[<p><span>Wtdbg2 is a&nbsp;</span><em>de novo</em><span>&nbsp;sequence assembler for long noisy reads produced by PacBio or Oxford Nanopore Technologies (ONT). It assembles raw reads without error correction and then builds the consensus from intermediate assembly output. Wtdbg2 is able to assemble the human and even the 32Gb&nbsp;</span><a href="https://www.nature.com/articles/nature25458">Axolotl</a><span>&nbsp;genome at a speed tens of times faster than&nbsp;</span><a href="https://github.com/marbl/canu">CANU</a><span>&nbsp;and&nbsp;</span><a href="https://github.com/PacificBiosciences/FALCON">FALCON</a><span>&nbsp;while producing contigs of comparable base accuracy.</span></p><p>Address of the bookmark: <a href="https://github.com/ruanjue/wtdbg2" rel="nofollow">https://github.com/ruanjue/wtdbg2</a></p>]]></description>
	<dc:creator>Neel</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/38892/wtdbg2-a-fuzzy-bruijn-graph-approach-to-long-noisy-reads-assembly</guid>
	<pubDate>Mon, 04 Feb 2019 04:53:47 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/38892/wtdbg2-a-fuzzy-bruijn-graph-approach-to-long-noisy-reads-assembly</link>
	<title><![CDATA[wtdbg2: A fuzzy Bruijn graph approach to long noisy reads assembly]]></title>
	<description><![CDATA[<p><span>Wtdbg2 is a&nbsp;</span><em>de novo</em><span>&nbsp;sequence assembler for long noisy reads produced by PacBio or Oxford Nanopore Technologies (ONT). It assembles raw reads without error correction and then builds the consensus from intermediate assembly output.&nbsp;</span></p>
<pre>./wtdbg2 -x rs -g 4.6m -t 16 -i reads.fa.gz -fo prefix
./wtpoa-cns -t 16 -i prefix.ctg.lay.gz -fo prefix.ctg.fa</pre><p>Address of the bookmark: <a href="https://github.com/ruanjue/wtdbg2" rel="nofollow">https://github.com/ruanjue/wtdbg2</a></p>]]></description>
	<dc:creator>BioStar</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/38304/lordfast-sensitive-and-fast-alignment-search-tool-for-long-noisy-read-sequencing-data</guid>
	<pubDate>Tue, 27 Nov 2018 04:43:57 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/38304/lordfast-sensitive-and-fast-alignment-search-tool-for-long-noisy-read-sequencing-data</link>
	<title><![CDATA[lordFAST: sensitive and Fast Alignment Search Tool for LOng noisy Read sequencing Data]]></title>
	<description><![CDATA[<p><span>lordFAST is a sensitive tool for mapping long reads with high error rates. lordFAST is specially designed for aligning reads from PacBio sequencing technology but provides the user the ability to change alignment parameters depending on the reads and application.</span></p>
<p>lordFAST is a novel long-read mapper specifically designed to align reads generated by PacBio, and potentially other single-molecule sequencing (SMS) technologies, to a reference. It not only has higher sensitivity than the available alternatives, it is also among the fastest and has a very low memory footprint.</p>
<p>Address of the bookmark: <a href="https://github.com/vpc-ccg/lordfast" rel="nofollow">https://github.com/vpc-ccg/lordfast</a></p>]]></description>
	<dc:creator>BioJoker</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/34416/miniasm-very-fast-olc-based-de-novo-assembler-for-noisy-long-reads</guid>
	<pubDate>Mon, 27 Nov 2017 07:58:49 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/34416/miniasm-very-fast-olc-based-de-novo-assembler-for-noisy-long-reads</link>
	<title><![CDATA[miniasm: very fast OLC-based de novo assembler for noisy long reads]]></title>
	<description><![CDATA[<p>Miniasm is a very fast OLC-based&nbsp;<em>de novo</em>&nbsp;assembler for noisy long reads. It takes all-vs-all read self-mappings (typically produced by&nbsp;<a href="https://github.com/lh3/minimap">minimap</a>) as input and outputs an assembly graph in the&nbsp;<a href="https://github.com/pmelsted/GFA-spec/blob/master/GFA-spec.md">GFA</a>&nbsp;format. Unlike mainstream assemblers, miniasm does not have a consensus step. It simply concatenates pieces of read sequences to generate the final&nbsp;<a href="http://wgs-assembler.sourceforge.net/wiki/index.php/Celera_Assembler_Terminology">unitig</a>&nbsp;sequences. Thus the per-base error rate is similar to that of the raw input reads.</p>
<p>Miniasm is still at an early stage of development. It has only been tested on a dozen PacBio and Oxford Nanopore (ONT) bacterial data sets. Including the mapping step, it takes about 3 minutes to assemble a bacterial genome. Under the default settings, miniasm assembles 9 out of 12 PacBio data sets and 3 out of 4 ONT data sets into a single contig. The 12 PacBio data sets are&nbsp;<a href="https://github.com/PacificBiosciences/DevNet/wiki/E.-coli-Bacterial-Assembly">PacBio E. coli sample</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS473430">ERS473430</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS544009">ERS544009</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS554120">ERS554120</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS605484">ERS605484</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS617393">ERS617393</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS646601">ERS646601</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS659581">ERS659581</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS670327">ERS670327</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS685285">ERS685285</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS743109">ERS743109</a>&nbsp;and a&nbsp;<a href="https://github.com/PacificBiosciences/DevNet/wiki/E.-coli-20kb-Size-Selected-Library-with-P6-C4/ce0533c1d2a957488594f0b29da61ffa3e4627e8">deprecated PacBio E. coli data set</a>. ONT data are acquired from the&nbsp;<a href="http://lab.loman.net/2015/09/24/first-sqk-map-006-experiment/">Loman Lab</a>.</p>
<p>For a&nbsp;<em>C. elegans</em>&nbsp;<a href="https://github.com/PacificBiosciences/DevNet/wiki/C.-elegans-data-set">PacBio data set</a>&nbsp;(only 40X coverage is used, not the whole data set), miniasm finishes the assembly, including read overlapping, in ~10 minutes with 16 CPUs. The total assembly size is 105Mb; the N50 is 1.94Mb. In comparison,&nbsp;<a href="https://github.com/PacificBiosciences/Bioinformatics-Training/wiki/HGAP">HGAP3</a>&nbsp;produces a 104Mb assembly with an N50 of 1.61Mb.&nbsp;<a href="http://lh3lh3.users.sourceforge.net/download/ce-miniasm.png">This dotter plot</a>&nbsp;gives a global view of the miniasm assembly (on the X axis) and the HGAP3 assembly (on Y). They are broadly comparable. Of course, the HGAP3 consensus sequences are much more accurate. In addition, on the whole data set (assembled in ~30 min), the miniasm N50 is reduced to 1.79Mb. Miniasm still needs improvements.</p>
<p>Miniasm confirms that, at least for high-coverage bacterial genomes, it is possible to generate long contigs from raw PacBio or ONT reads without error correction. It also shows that&nbsp;<a href="https://github.com/lh3/minimap">minimap</a>&nbsp;can be used as a read overlapper, even though it is probably not as sensitive as more sophisticated overlappers such as&nbsp;<a href="https://github.com/marbl/MHAP">MHAP</a>&nbsp;and&nbsp;<a href="https://github.com/thegenemyers/DALIGNER">DALIGNER</a>. Coupled with long-read error correctors and consensus tools, miniasm may also be useful to produce high-quality assemblies.</p>
<p>Minimap and miniasm are ultrafast tools for (i) mapping and (ii) assembly. Designed for long, noisy reads, they do not have a correction or consensus step, and therefore the resulting assemblies are contiguous (i.e. long) but very noisy (i.e. full of errors).</p>
<p>We start with an all-against-all comparison:</p>
<div>
<pre><code>minimap -Sw5 -L100 -m0 -t8 reads.fq reads.fq | gzip -1 &gt; reads.paf.gz
</code></pre>
</div>
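<p>To sanity-check the overlaps before assembly, the PAF output can be summarized with standard tools. The sketch below assumes the standard PAF column layout, in which column 11 is the alignment block length; the function name <code>paf_summary</code> is made up for illustration and is not part of minimap.</p>
<div>
<pre><code># Summarize a gzipped PAF file: number of overlaps and mean block length.
# Assumption: standard PAF columns (column 11 = alignment block length).
paf_summary() {
  gzip -dc "$1" | awk -F'\t' '
    { n++; total += $11 }
    END {
      if (n == 0) { print "overlaps=0"; exit }
      printf "overlaps=%d mean_block_len=%.1f\n", n, total / n
    }'
}
</code></pre>
</div>
<p>It can be called as <code>paf_summary reads.paf.gz</code>. A very low overlap count may suggest that coverage is too thin for a contiguous assembly.</p>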
<p>Then we can assemble:</p>
<div>
<pre><code>miniasm -f reads.fq reads.paf.gz &gt; reads.gfa
</code></pre>
</div>
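<p>The GFA output is a tab-separated text file whose first column gives the record type (for example <code>S</code> for segment sequences). As a quick check, assuming that layout, the record types can be tallied; <code>gfa_record_counts</code> is a hypothetical helper name, not a miniasm command.</p>
<div>
<pre><code># Count GFA records by type (first tab-separated column), sorted by type.
# Assumption: one record per line; blank lines are skipped.
gfa_record_counts() {
  awk -F'\t' 'NF { count[$1]++ } END { for (t in count) print t, count[t] }' "$1" | LC_ALL=C sort
}
</code></pre>
</div>
<p>Running <code>gfa_record_counts reads.gfa</code> prints one line per record type; the count on the <code>S</code> line should match the number of sequences extracted in the next step.</p>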
<p>Convert GFA to FASTA:</p>
<div>
<pre><code>awk <span>'/^S/{print "&gt;"$2"\n"$3}'</span> reads.gfa | fold &gt; reads.fa
</code></pre>
</div>
<p>Then count the contigs:</p>
<div>
<pre><code>grep <span>"&gt;"</span> reads.fa | wc -l</code></pre>
</div>
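<p>Beyond the contig count, total assembly size and N50 are common contiguity measures. N50 is the contig length at which the running sum of lengths, taken from longest to shortest, first reaches half of the total. A minimal sketch, assuming a plain FASTA file such as <code>reads.fa</code> (sequences may span several lines); <code>fasta_n50</code> is an illustrative name, not an existing tool.</p>
<div>
<pre><code># Print total size and N50 of the sequences in a FASTA file.
fasta_n50() {
  # First pass: emit one length per record, joining wrapped sequence lines.
  awk '/^&gt;/ { if (len) print len; len = 0; next }
       { len += length($0) }
       END { if (len) print len }' "$1" |
  sort -rn |
  # Second pass: walk lengths from longest to shortest until half the total is reached.
  awk '{ lens[NR] = $1; total += $1 }
       END {
         for (i = 1; i &lt;= NR; i++) {
           acc += lens[i]
           if (acc * 2 &gt;= total) { printf "total=%d N50=%d\n", total, lens[i]; exit }
         }
       }'
}
</code></pre>
</div>
<p>For example, <code>fasta_n50 reads.fa</code> reports both values on one line, which makes it easy to compare runs with different minimap or miniasm settings.</p>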
<p>The complete workflow, from downloading sample data to the layout step:</p>
<pre># Download sample PacBio data from the PBcR website
wget -O- http://www.cbcb.umd.edu/software/PBcR/data/selfSampleData.tar.gz | tar zxf -
ln -s selfSampleData/pacbio_filtered.fastq reads.fq
# Install minimap and miniasm (requiring gcc and zlib)
git clone https://github.com/lh3/minimap &amp;&amp; (cd minimap &amp;&amp; make)
git clone https://github.com/lh3/miniasm &amp;&amp; (cd miniasm &amp;&amp; make)
# Overlap
minimap/minimap -Sw5 -L100 -m0 -t8 reads.fq reads.fq | gzip -1 &gt; reads.paf.gz
# Layout
miniasm/miniasm -f reads.fq reads.paf.gz &gt; reads.gfa</pre><p>Address of the bookmark: <a href="https://github.com/lh3/miniasm" rel="nofollow">https://github.com/lh3/miniasm</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/37563/colormap-correcting-long-reads-by-mapping-short-reads</guid>
	<pubDate>Mon, 20 Aug 2018 14:17:05 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/37563/colormap-correcting-long-reads-by-mapping-short-reads</link>
	<title><![CDATA[CoLoRMap: Correcting Long Reads by Mapping short reads]]></title>
	<description><![CDATA[<p><span>Second-generation sequencing technologies paved the way to an exceptional increase in the number of sequenced genomes, both prokaryotic and eukaryotic. However, short reads are difficult to assemble and often lead to highly fragmented assemblies. Recent developments in long-read sequencing methods offer a promising way to address this issue. However, so far long reads are characterized by a high error rate, and assembling from long reads requires a high depth of coverage. This motivates the development of hybrid approaches that leverage the high quality of short reads to correct errors in long reads. We introduce CoLoRMap, a hybrid method for correcting noisy long reads, such as the ones produced by PacBio sequencing technology, using high-quality Illumina paired-end reads mapped onto the long reads. Our algorithm is based on two novel ideas: using a classical shortest path algorithm to find a sequence of overlapping short reads that minimizes the edit score to a long read, and extending corrected regions by local assembly of unmapped mates of mapped short reads. Our results on bacterial, fungal and insect data sets show that CoLoRMap compares well with existing hybrid correction methods. The source code of CoLoRMap is freely available for non-commercial use at https://github.com/sfu-compbio/colormap</span></p>
<p><span>Contact: ehaghshe@sfu.ca or cedric.chauve@sfu.ca</span></p><p>Address of the bookmark: <a href="https://github.com/sfu-compbio/colormap" rel="nofollow">https://github.com/sfu-compbio/colormap</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/26587/last</guid>
	<pubDate>Wed, 09 Mar 2016 14:27:01 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/26587/last</link>
	<title><![CDATA[LAST]]></title>
	<description><![CDATA[<p style="text-align: center;"><img src="http://last.cbrc.jp/lastwebfig.png" alt="sketch of  similar regions in sequences" style="border: 0px;"></p>
<p>LAST can:</p>
<ul>
<li>Handle <strong>big</strong> sequence data, e.g.:
<ul>
<li>Compare two vertebrate genomes</li>
<li>Align billions of DNA reads to a genome</li>
</ul>
</li>
<li>Indicate the <a href="http://lastweb.cbrc.jp/about.html">reliability</a> of each aligned column.</li>
<li>Use sequence quality data <a href="http://nar.oxfordjournals.org/content/38/7/e100.abstract">properly</a>.</li>
<li>Compare DNA to proteins, with frameshifts.</li>
<li>Compare PSSMs to sequences.</li>
<li>Calculate the likelihood of chance similarities between random sequences.</li>
<li>Do split and spliced alignment.</li>
<li><a href="http://last.cbrc.jp/doc/last-train.html">Train</a> alignment parameters for unusual kinds of sequence (e.g. nanopore).</li>
</ul><p>Address of the bookmark: <a href="http://last.cbrc.jp/" rel="nofollow">http://last.cbrc.jp/</a></p>]]></description>
	<dc:creator>Archana Malhotra</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/34618/mashmap-a-fast-and-approximate-software-for-mapping-long-reads-pacbioont-or-assembly-to-reference-genomes</guid>
	<pubDate>Tue, 12 Dec 2017 17:23:31 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/34618/mashmap-a-fast-and-approximate-software-for-mapping-long-reads-pacbioont-or-assembly-to-reference-genomes</link>
	<title><![CDATA[MashMap: a fast and approximate software for mapping long reads (PacBio/ONT) or assembly to reference genome(s)]]></title>
	<description><![CDATA[<p><span>MashMap is a fast and approximate software for mapping long reads (PacBio/ONT) or assembly to reference genome(s). It maps a query sequence against a reference region if and only if its estimated alignment identity is above a specified threshold. It does not compute the alignments explicitly, but rather estimates a&nbsp;</span><em>k</em><span>-mer based&nbsp;</span><a href="https://en.wikipedia.org/wiki/Jaccard_index">Jaccard similarity</a><span>&nbsp;using a combination of&nbsp;</span><a href="http://www.cs.princeton.edu/courses/archive/spr05/cos598E/bib/p76-schleimer.pdf">Winnowing</a><span>&nbsp;and&nbsp;</span><a href="https://en.wikipedia.org/wiki/MinHash">MinHash</a><span>. This is then converted to an estimate of sequence identity using the&nbsp;</span><a href="http://mash.readthedocs.org/">Mash</a><span>&nbsp;distance. An appropriate&nbsp;</span><em>k</em><span>-mer sampling rate is automatically determined given minimum local alignment length and identity thresholds. The efficiency of the algorithm improves as both of these thresholds are increased.</span></p><p>Address of the bookmark: <a href="https://github.com/marbl/MashMap" rel="nofollow">https://github.com/marbl/MashMap</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/37776/rhat-a-seed-and-extension-based-noisy-long-read-alignment-tool</guid>
	<pubDate>Sun, 23 Sep 2018 05:12:22 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/37776/rhat-a-seed-and-extension-based-noisy-long-read-alignment-tool</link>
	<title><![CDATA[rHAT: a seed-and-extension-based noisy long read alignment tool]]></title>
	<description><![CDATA[<p><span>rHAT is a seed-and-extension-based noisy long read alignment tool. It is suitable for aligning third-generation sequencing reads, which are long and have relatively high error rates, especially PacBio's Single Molecule Real-Time (SMRT) sequencing reads.</span></p><p>Address of the bookmark: <a href="https://github.com/dfguan/rHAT" rel="nofollow">https://github.com/dfguan/rHAT</a></p>]]></description>
	<dc:creator>Abhimanyu Singh</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/33461/graphmap-a-highly-sensitive-and-accurate-mapper-for-long-error-prone-reads</guid>
	<pubDate>Wed, 07 Jun 2017 04:18:16 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/33461/graphmap-a-highly-sensitive-and-accurate-mapper-for-long-error-prone-reads</link>
	<title><![CDATA[GraphMap - A highly sensitive and accurate mapper for long, error-prone reads]]></title>
	<description><![CDATA[<p>GraphMap - A highly sensitive and accurate mapper for long, error-prone reads http://www.nature.com/ncomms/2016/160415/ncomms11307/full/ncomms11307.html</p>
<p><strong>Features</strong></p>
<ul>
<li>Mapping position agnostic to alignment parameters.</li>
<li>Consistently very high sensitivity and precision across different error profiles, rates and sequencing technologies even with default parameters.</li>
<li>Circular genome handling to resolve coverage drops near ends of the genome.</li>
<li>E-value.</li>
<li>Meaningful mapping quality.</li>
<li>Various alignment strategies (semiglobal bit-vector and Gotoh, anchored).</li>
<li>Overlapping of reads for de novo assembly.</li>
<li>Transcriptome mapping through internal construction of a transcriptome from a given genomic reference and a GTF file.</li>
<li>...and much more.</li>
</ul>
<p>GraphMap is also used as an overlapper in a new de novo genome assembly project called Ra (https://github.com/mariokostelac/ra-integrate).<br>Ra attempts to create de novo assemblies from raw nanopore and PacBio reads without requiring error correction, for which a highly sensitive overlapper is required.</p>
<p>Currently, development of a new spliced-alignment mode for mapping RNA-seq reads is under way.<br>Description of the current effort as well as how to reach the experimental implementation can be found here: doc/rnaseq.md.</p><p>Address of the bookmark: <a href="https://github.com/isovic/graphmap" rel="nofollow">https://github.com/isovic/graphmap</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/36755/minialign-fast-and-accurate-alignment-tool-for-pacbio-and-nanopore-long-reads</guid>
	<pubDate>Thu, 24 May 2018 08:33:26 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/36755/minialign-fast-and-accurate-alignment-tool-for-pacbio-and-nanopore-long-reads</link>
	<title><![CDATA[minialign: fast and accurate alignment tool for PacBio and Nanopore long reads]]></title>
	<description><![CDATA[<p>Minialign is a little bit fast and moderately accurate nucleotide sequence alignment tool designed for PacBio and Nanopore long reads. It is built on three key algorithms: the minimizer-based index of the minimap overlapper, array-based seed chaining, and SIMD-parallel Smith-Waterman-Gotoh extension.</p><p>Address of the bookmark: <a href="https://github.com/ocxtal/minialign" rel="nofollow">https://github.com/ocxtal/minialign</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>

</channel>
</rss>