<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: miniasm: very fast OLC-based de novo assembler for noisy long reads]]></title>
	<link>https://bioinformaticsonline.com/bookmarks/view/34416/miniasm-very-fast-olc-based-de-novo-assembler-for-noisy-long-reads?</link>
	<atom:link href="https://bioinformaticsonline.com/bookmarks/view/34416/miniasm-very-fast-olc-based-de-novo-assembler-for-noisy-long-reads?" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/34416/miniasm-very-fast-olc-based-de-novo-assembler-for-noisy-long-reads</guid>
	<pubDate>Mon, 27 Nov 2017 07:58:49 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/34416/miniasm-very-fast-olc-based-de-novo-assembler-for-noisy-long-reads</link>
	<title><![CDATA[miniasm: very fast OLC-based de novo assembler for noisy long reads]]></title>
	<description><![CDATA[<p>Miniasm is a very fast OLC-based&nbsp;<em>de novo</em>&nbsp;assembler for noisy long reads. It takes all-vs-all read self-mappings (typically produced by&nbsp;<a href="https://github.com/lh3/minimap">minimap</a>) as input and outputs an assembly graph in the&nbsp;<a href="https://github.com/pmelsted/GFA-spec/blob/master/GFA-spec.md">GFA</a>&nbsp;format. Unlike mainstream assemblers, miniasm does not have a consensus step. It simply concatenates pieces of read sequences to generate the final&nbsp;<a href="http://wgs-assembler.sourceforge.net/wiki/index.php/Celera_Assembler_Terminology">unitig</a>&nbsp;sequences. Thus the per-base error rate is similar to that of the raw input reads.</p>
<p>Miniasm is still at an early stage of development. It has only been tested on a dozen PacBio and Oxford Nanopore (ONT) bacterial data sets. Including the mapping step, it takes about 3 minutes to assemble a bacterial genome. Under the default settings, miniasm assembles 9 out of 12 PacBio data sets and 3 out of 4 ONT data sets into a single contig. The 12 PacBio data sets are&nbsp;<a href="https://github.com/PacificBiosciences/DevNet/wiki/E.-coli-Bacterial-Assembly">PacBio E. coli sample</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS473430">ERS473430</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS544009">ERS544009</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS554120">ERS554120</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS605484">ERS605484</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS617393">ERS617393</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS646601">ERS646601</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS659581">ERS659581</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS670327">ERS670327</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS685285">ERS685285</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS743109">ERS743109</a>&nbsp;and a&nbsp;<a href="https://github.com/PacificBiosciences/DevNet/wiki/E.-coli-20kb-Size-Selected-Library-with-P6-C4/ce0533c1d2a957488594f0b29da61ffa3e4627e8">deprecated PacBio E. coli data set</a>. The ONT data were acquired from the&nbsp;<a href="http://lab.loman.net/2015/09/24/first-sqk-map-006-experiment/">Loman Lab</a>.</p>
<p>For a&nbsp;<em>C. elegans</em>&nbsp;<a href="https://github.com/PacificBiosciences/DevNet/wiki/C.-elegans-data-set">PacBio data set</a>&nbsp;(only 40X coverage is used, not the whole data set), miniasm finishes the assembly, including read overlapping, in ~10 minutes with 16 CPUs. The total assembly size is 105Mb; the N50 is 1.94Mb. In comparison,&nbsp;<a href="https://github.com/PacificBiosciences/Bioinformatics-Training/wiki/HGAP">HGAP3</a>&nbsp;produces a 104Mb assembly with an N50 of 1.61Mb.&nbsp;<a href="http://lh3lh3.users.sourceforge.net/download/ce-miniasm.png">This dotter plot</a>&nbsp;gives a global view of the miniasm assembly (on the X axis) and the HGAP3 assembly (on the Y axis). They are broadly comparable. Of course, the HGAP3 consensus sequences are much more accurate. In addition, on the whole data set (assembled in ~30 min), the miniasm N50 drops to 1.79Mb. Miniasm still needs improvement.</p>
<p>Miniasm confirms that, at least for high-coverage bacterial genomes, it is possible to generate long contigs from raw PacBio or ONT reads without error correction. It also shows that&nbsp;<a href="https://github.com/lh3/minimap">minimap</a>&nbsp;can be used as a read overlapper, even though it is probably not as sensitive as more sophisticated overlappers such as&nbsp;<a href="https://github.com/marbl/MHAP">MHAP</a>&nbsp;and&nbsp;<a href="https://github.com/thegenemyers/DALIGNER">DALIGNER</a>. Coupled with long-read error correctors and consensus tools, miniasm may also be useful for producing high-quality assemblies.</p>
<p>Minimap and miniasm are ultrafast tools for (i) mapping and (ii) assembly. Designed for long, noisy reads, they do not have a correction or consensus step, and therefore the resulting assemblies are contiguous (i.e. long) but very noisy (i.e. full of errors).</p>
<p>We start with an all-against-all comparison of the reads:</p>
<div>
<pre><code>minimap -Sw5 -L100 -m0 -t8 reads.fq reads.fq | gzip -1 &gt; reads.paf.gz
</code></pre>
</div>
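<p>Before assembling, it can help to sanity-check the overlap file. The following is a minimal sketch, not part of minimap or miniasm: <code>paf_summary</code> is a hypothetical helper name, and it assumes the standard PAF column layout, in which columns 1 and 6 hold the query and target read names and column 11 holds the alignment block length.</p>
<div>
<pre><code># paf_summary FILE.paf.gz: count overlaps and report their mean block length
paf_summary() {
  gzip -dc "$1" | awk -F'\t' '
    # ignore any record that maps a read onto itself
    $1 != $6 { n++; bases += $11 }
    END {
      if (n) printf "overlaps=%d mean_block_len=%.1f\n", n, bases / n
      else   print "overlaps=0"
    }'
}
</code></pre>
</div>
<p>Running <code>paf_summary reads.paf.gz</code> prints one summary line. A count of zero suggests that the overlap step found nothing to assemble.</p>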
<p>Then we can assemble the reads:</p>
<div>
<pre><code>miniasm -f reads.fq reads.paf.gz &gt; reads.gfa
</code></pre>
</div>
<p>Convert GFA to FASTA:</p>
<div>
<pre><code>awk '/^S/{print "&gt;"$2"\n"$3}' reads.gfa | fold &gt; reads.fa
</code></pre>
</div>
<p>And then count the contigs:</p>
<div>
<pre><code>grep -c "^&gt;" reads.fa</code></pre>
</div>
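<p>The assembly size and N50 figures quoted above can be estimated directly from the GFA file. This is a rough sketch under stated assumptions: <code>gfa_stats</code> is a hypothetical helper name, every <code>S</code> line is treated as one contig with its sequence in the third tab-separated field, and N50 is taken as the length of the contig at which the running sum of lengths, sorted from longest to shortest, first reaches half of the total.</p>
<div>
<pre><code># gfa_stats FILE.gfa: print contig count, total length and N50 of the S lines
gfa_stats() {
  awk -F'\t' '$1 == "S" { print length($3) }' "$1" | sort -rn |
  awk '{ len[NR] = $1; total += $1 }
       END {
         # walk from the longest contig down until half the total is covered
         for (i = 1; i in len; i++) {
           acc += len[i]
           if (acc * 2 >= total) { n50 = len[i]; break }
         }
         printf "contigs=%d total=%d N50=%d\n", NR, total, n50
       }'
}
</code></pre>
</div>
<p>Calling <code>gfa_stats reads.gfa</code> prints a single line with the three values. Lengths are in bases, so divide by one million to compare with the Mb figures in the text.</p>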
<pre># Download sample PacBio from the PBcR website
wget -O- http://www.cbcb.umd.edu/software/PBcR/data/selfSampleData.tar.gz | tar zxf -
ln -s selfSampleData/pacbio_filtered.fastq reads.fq
# Install minimap and miniasm (requiring gcc and zlib)
git clone https://github.com/lh3/minimap &amp;&amp; (cd minimap &amp;&amp; make)
git clone https://github.com/lh3/miniasm &amp;&amp; (cd miniasm &amp;&amp; make)
# Overlap
minimap/minimap -Sw5 -L100 -m0 -t8 reads.fq reads.fq | gzip -1 &gt; reads.paf.gz
# Layout
miniasm/miniasm -f reads.fq reads.paf.gz &gt; reads.gfa</pre><p>Address of the bookmark: <a href="https://github.com/lh3/miniasm" rel="nofollow">https://github.com/lh3/miniasm</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>

</channel>
</rss>