<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: CANU: Assembling Large Genomes with Single-Molecule Sequencing and Locality Sensitive Hashing.]]></title>
	<link>https://bioinformaticsonline.com/bookmarks/view/27090/canu-assembling-large-genomes-with-single-molecule-sequencing-and-locality-sensitive-hashing?</link>
	<atom:link href="https://bioinformaticsonline.com/bookmarks/view/27090/canu-assembling-large-genomes-with-single-molecule-sequencing-and-locality-sensitive-hashing?" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/27090/canu-assembling-large-genomes-with-single-molecule-sequencing-and-locality-sensitive-hashing</guid>
	<pubDate>Tue, 26 Apr 2016 11:38:10 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/27090/canu-assembling-large-genomes-with-single-molecule-sequencing-and-locality-sensitive-hashing</link>
	<title><![CDATA[CANU: Assembling Large Genomes with Single-Molecule Sequencing and Locality Sensitive Hashing.]]></title>
	<description><![CDATA[<p>Canu is a fork of the&nbsp;<a href="http://wgs-assembler.sourceforge.net/wiki/index.php?title=Main_Page" title="Celera Assembler">Celera Assembler</a>&nbsp;designed for high-noise single-molecule sequencing (such as the PacBio RSII or Oxford Nanopore MinION). The software is currently at alpha level; feel free to use it and report any issues you encounter.</p>
<p>Canu is a hierarchical assembly pipeline which runs in four steps:</p>
<ul>
<li>Detect overlaps in high-noise sequences using&nbsp;<a href="https://github.com/marbl/MHAP" title="MHAP">MHAP</a></li>
<li>Generate corrected sequence consensus</li>
<li>Trim corrected sequences</li>
<li>Assemble trimmed corrected sequences</li>
</ul>
<p>Read the&nbsp;<a href="http://canu.readthedocs.org/" title="docs">documentation</a></p>
<p>New release https://github.com/marbl/canu/releases</p><p>Address of the bookmark: <a href="https://github.com/marbl/canu" rel="nofollow">https://github.com/marbl/canu</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink='true'>https://bioinformaticsonline.com/bookmarks/view/27090/canu-assembling-large-genomes-with-single-molecule-sequencing-and-locality-sensitive-hashing#item-annotation-3463</guid>
	<pubDate>Mon, 06 Aug 2018 09:49:40 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/27090/canu-assembling-large-genomes-with-single-molecule-sequencing-and-locality-sensitive-hashing#item-annotation-3463</link>
	<title><![CDATA[Comment by Rahul Nayak]]></title>
	<description><![CDATA[<p>➜ bin git:(master) ✗ ./canu</p>
<p>usage: canu [-version] [-citation] \<br> [-correct | -trim | -assemble | -trim-assemble] \<br> [-s &lt;assembly-specifications-file&gt;] \<br> -p &lt;assembly-prefix&gt; \<br> -d &lt;assembly-directory&gt; \<br> genomeSize=&lt;number&gt;[g|m|k] \<br> [other-options] \<br> [-pacbio-raw |<br> -pacbio-corrected |<br> -nanopore-raw |<br> -nanopore-corrected] file1 file2 ...</p>
<p>example: canu -d run1 -p godzilla genomeSize=1g -nanopore-raw reads/*.fasta.gz</p>
<p><br> To restrict canu to only a specific stage, use:<br> -correct - generate corrected reads<br> -trim - generate trimmed reads<br> -assemble - generate an assembly<br> -trim-assemble - generate trimmed reads and then assemble them</p>
<p>The assembly is computed in the -d &lt;assembly-directory&gt;, with output files named<br> using the -p &lt;assembly-prefix&gt;. This directory is created if needed. It is not<br> possible to run multiple assemblies in the same directory.</p>
<p>The genome size should be your best guess of the haploid genome size of what is being<br> assembled. It is used primarily to estimate coverage in reads, NOT as the desired<br> assembly size. Fractional values are allowed: '4.7m' equals '4700k' equals '4700000'</p>
<p>Some common options:<br> useGrid=string<br> - Run under grid control (true), locally (false), or set up for grid control<br> but don't submit any jobs (remote)<br> rawErrorRate=fraction-error<br> - The allowed difference in an overlap between two raw uncorrected reads. For lower<br> quality reads, use a higher number. The defaults are 0.300 for PacBio reads and<br> 0.500 for Nanopore reads.<br> correctedErrorRate=fraction-error<br> - The allowed difference in an overlap between two corrected reads. Assemblies of<br> low coverage or data with biological differences will benefit from a slight increase<br> in this. Defaults are 0.045 for PacBio reads and 0.144 for Nanopore reads.<br> gridOptions=string<br> - Pass string to the command used to submit jobs to the grid. Can be used to set<br> maximum run time limits. Should NOT be used to set memory limits; Canu will do<br> that for you.<br> minReadLength=number<br> - Ignore reads shorter than 'number' bases long. Default: 1000.<br> minOverlapLength=number<br> - Ignore read-to-read overlaps shorter than 'number' bases long. Default: 500.<br> A full list of options can be printed with '-options'. All options can be supplied in<br> an optional spec file with the -s option.</p>
<p>Reads can be either FASTA or FASTQ format, uncompressed, or compressed with gz, bz2 or xz.<br> Reads are specified by the technology they were generated with, and any processing performed:<br> -pacbio-raw &lt;files&gt; Reads are straight off the machine.<br> -pacbio-corrected &lt;files&gt; Reads have been corrected.<br> -nanopore-raw &lt;files&gt;<br> -nanopore-corrected &lt;files&gt;</p>
<p>Complete documentation at http://canu.readthedocs.org/en/latest/</p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink='true'>https://bioinformaticsonline.com/bookmarks/view/27090/canu-assembling-large-genomes-with-single-molecule-sequencing-and-locality-sensitive-hashing#item-annotation-3376</guid>
	<pubDate>Tue, 22 May 2018 07:57:45 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/27090/canu-assembling-large-genomes-with-single-molecule-sequencing-and-locality-sensitive-hashing#item-annotation-3376</link>
	<title><![CDATA[Comment by Poonam Mahapatra]]></title>
	<description><![CDATA[<p><a href="https://canu.readthedocs.io/en/latest/">Canu</a>&nbsp;is one of the best de novo assemblers available for long reads - it&rsquo;s a fork and updated version of the Celera assembler that was used to assemble the human genome.</p>
<p>It is quite a complex beast that has HPC integration built in - though you can turn this off. However, large assembly jobs are best run in parallel, making HPC integration essential. This can get tough if your cluster has a non-standard configuration.</p>
<p>Run canu without any options to get help:</p>
<div>
<div>
<pre><code>canu
</code></pre>
</div>
</div>
<p>This produces:</p>
<div>
<div>
<pre><code>usage: canu [-version] \
            [-correct | -trim | -assemble | -trim-assemble] \
            [-s &lt;assembly-specifications-file&gt;] \
             -p &lt;assembly-prefix&gt; \
             -d &lt;assembly-directory&gt; \
             genomeSize=&lt;number&gt;[g|m|k] \
            [other-options] \
            [-pacbio-raw | -pacbio-corrected | -nanopore-raw | -nanopore-corrected] *fastq

  By default, all three stages (correct, trim, assemble) are computed.
  To compute only a single stage, use:
    -correct       - generate corrected reads
    -trim          - generate trimmed reads
    -assemble      - generate an assembly
    -trim-assemble - generate trimmed reads and then assemble them

  The assembly is computed in the (created) -d &lt;assembly-directory&gt;, with most
  files named using the -p &lt;assembly-prefix&gt;.

  The genome size is your best guess of the genome size of what is being assembled.
  It is used mostly to compute coverage in reads.  Fractional values are allowed: '4.7m'
  is the same as '4700k' and '4700000'

  A full list of options can be printed with '-options'.  All options
  can be supplied in an optional spec file.

  Reads can be either FASTA or FASTQ format, uncompressed, or compressed
  with gz, bz2 or xz.  Reads are specified by the technology they were
  generated with:
    -pacbio-raw         &lt;files&gt;
    -pacbio-corrected   &lt;files&gt;
    -nanopore-raw       &lt;files&gt;
    -nanopore-corrected &lt;files&gt;

Complete documentation at http://canu.readthedocs.org/en/latest/
</code></pre>
</div>
</div>
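<p>The help text says fractional genome sizes are allowed and that '4.7m', '4700k' and '4700000' mean the same thing. A minimal sketch of that arithmetic follows. The function name genome_size_to_bases is made up for illustration and is not part of canu, and this is not necessarily how canu parses the value internally.</p>
<div>
<div>
<pre><code># Illustrative only: expand a genomeSize value such as 4.7m into bases.
# genome_size_to_bases is a hypothetical helper, not a canu command.
genome_size_to_bases() {
    size="$1"
    # Pick a multiplier from the g/m/k suffix and strip the suffix.
    case "$size" in
        *g) mult=1000000000; num="${size%g}" ;;
        *m) mult=1000000;    num="${size%m}" ;;
        *k) mult=1000;       num="${size%k}" ;;
        *)  mult=1;          num="$size" ;;
    esac
    # awk handles the fractional part; %.0f rounds to whole bases.
    awk -v n="$num" -v m="$mult" 'BEGIN { printf "%.0f\n", n * m }'
}

genome_size_to_bases 4.7m
genome_size_to_bases 4700k
genome_size_to_bases 4700000
</code></pre>
</div>
</div>
<p>All three calls print 4700000, which matches the equivalence stated in the help text.</p>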
<p>Canu has three stages which it runs in order:</p>
<ul>
<li>Correct</li>
<li>Trim</li>
<li>Assemble</li>
</ul>
<p>By default canu runs these one after the other, but they can be run individually.</p>
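<p>As a rough sketch, running the stages one at a time might look like the script below. The -correct, -trim and -assemble flags and the -nanopore-raw / -nanopore-corrected options come from the help text above. Everything else is an assumption for illustration: the reads file reads.fasta, the prefix and directory demo, the run helper, and the intermediate file names (demo.correctedReads.fasta.gz and demo.trimmedReads.fasta.gz), which you should check against what actually appears in your output directory. With DRY_RUN left at its default the script only prints the commands.</p>
<div>
<div>
<pre><code># Sketch only: run the three Canu stages separately.
# DRY_RUN=1 (the default here) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# 1. Correct the raw reads (reads.fasta is a placeholder input).
run canu -correct -p demo -d demo genomeSize=4.8m useGrid=false \
    -nanopore-raw reads.fasta

# 2. Trim the corrected reads (output file name is assumed).
run canu -trim -p demo -d demo genomeSize=4.8m useGrid=false \
    -nanopore-corrected demo/demo.correctedReads.fasta.gz

# 3. Assemble the trimmed reads (output file name is assumed).
run canu -assemble -p demo -d demo genomeSize=4.8m useGrid=false \
    -nanopore-corrected demo/demo.trimmedReads.fasta.gz
</code></pre>
</div>
</div>
<p>Set DRY_RUN=0 to actually execute the commands once the paths match your data.</p>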
<p>An example &ldquo;full pipeline&rdquo; command would be:</p>
<div>
<div>
<pre><code>canu <span>-p</span> meta <span>\</span>
     <span>-d</span> meta <span>\</span>
     <span>genomeSize</span><span>=</span>40m <span>\</span>
     <span>useGrid</span><span>=</span><span>false</span> <span>\</span>
     <span>-nanopore-raw</span> /vol_b/public_data/minion_brown_metagenome/brown_metagenome.2D.10.fasta
</code></pre>
</div>
</div>
<p>This puts output in directory meta with prefix &ldquo;meta&rdquo;. We estimate the genome size, tell canu NOT to use HPC (as we don&rsquo;t have one for porecamp) and give it some ONT data as fasta.</p>
<p>This runs pretty quickly but doesn&rsquo;t assemble anything. It&rsquo;s a low coverage synthetic metagenome, so no surprise. It does produce corrected reads though! These could be used in the metagenomics practical (hint!)</p>
<p>Now try the E. coli subset:</p>
<div>
<div>
<pre><code>canu <span>-p</span> ecoli <span>\</span>
     <span>-d</span> ecoli <span>\</span>
     <span>genomeSize</span><span>=</span>4.8m <span>\</span>
     <span>useGrid</span><span>=</span><span>false</span> <span>\</span>
     <span>-nanopore-raw</span> /vol_b/public_data/minion_ecoli_sample/ecoli_sample.template.fasta
</code></pre>
</div>
</div>
<p>This one will take a bit longer ;)</p>]]></description>
	<dc:creator>Poonam Mahapatra</dc:creator>
</item>

</channel>
</rss>