<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/34931?offset=30</link>
	<atom:link href="https://bioinformaticsonline.com/related/34931?offset=30" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/36533/mecat-fast-mapping-error-correction-and-de-novo-assembly-for-single-molecule-sequencing-reads</guid>
	<pubDate>Fri, 11 May 2018 05:07:45 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/36533/mecat-fast-mapping-error-correction-and-de-novo-assembly-for-single-molecule-sequencing-reads</link>
	<title><![CDATA[MECAT: fast mapping, error correction, and de novo assembly for single-molecule sequencing reads]]></title>
	<description><![CDATA[<p>MECAT is an ultra-fast Mapping, Error Correction and de novo Assembly Tools for single molecula sequencing (SMRT) reads. MECAT employs novel alignment and error correction algorithms that are much more efficient than the state of art of aligners and error correction tools. MECAT can be used for effectively de novo assemblying large genomes. For example, on a 32-thread computer with 2.0 GHz CPU , MECAT takes 9.5 days to assemble a human genome based on 54x SMRT data, which is 40 times faster than the current&nbsp;<a href="http://cbcb.umd.edu/software/pbcr/mhap/">PBcR-Mhap pipeline</a>. MECAT performance were compared with&nbsp;<a href="http://cbcb.umd.edu/software/pbcr/mhap/">PBcR-Mhap pipeline</a>,&nbsp;<a href="https://github.com/PacificBiosciences/falcon">FALCON</a>&nbsp;and&nbsp;<a href="http://canu.readthedocs.io/en/latest/">Canu(v1.3)</a>&nbsp;in five real datasets. The quality of assembled contigs produced by MECAT is the same or better than that of the&nbsp;<a href="http://cbcb.umd.edu/software/pbcr/mhap/">PBcR-Mhap pipeline</a>&nbsp;and&nbsp;<a href="https://github.com/PacificBiosciences/falcon">FALCON</a>.&nbsp;</p>
<p>https://www.nature.com/articles/nmeth.4432</p><p>Address of the bookmark: <a href="https://github.com/xiaochuanle/MECAT" rel="nofollow">https://github.com/xiaochuanle/MECAT</a></p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/37554/finishersca-repeat-aware-tool-for-upgrading-de-novo-assembly-using-long-reads</guid>
	<pubDate>Mon, 20 Aug 2018 04:08:50 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/37554/finishersca-repeat-aware-tool-for-upgrading-de-novo-assembly-using-long-reads</link>
	<title><![CDATA[FinisherSC:a repeat-aware tool for upgrading de novo assembly using long reads]]></title>
	<description><![CDATA[<p><br>Here is the command to run the tool:</p>
<pre><code>python finisherSC.py destinedFolder mummerPath
</code></pre>
<p>If you are running on server computer and would like to use multiple threads, then the following commands can generate 20 threads to run FinisherSC.</p>
<pre><code>python finisherSC.py -par 20 destinedFolder mummerPath
</code></pre>
<p>Sometimes, if the names of raw reads and contigs consists of special characters/formats, FinisherSC/MUMmer may not parse them correctly. In that case, you want to have a quick renaming of the names of contigs/reads in contigs.fasta or raw_reads.fasta using the following command.</p>
<pre><code>    perl -pe 's/&gt;[^\$]*$/"&gt;Seg" . ++$n ."\n"/ge' raw_reads.fasta &gt; newRaw_reads.fasta
    cp newRaw_reads.fasta raw_reads.fasta
    perl -pe 's/&gt;[^\$]*$/"&gt;Seg" . ++$n ."\n"/ge' contigs.fasta &gt; newContigs.fasta
    cp newContigs.fasta contigs.fasta</code></pre><p>Address of the bookmark: <a href="https://github.com/kakitone/finishingTool" rel="nofollow">https://github.com/kakitone/finishingTool</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/39213/flye-fast-and-accurate-de-novo-assembler-for-single-molecule-sequencing-reads</guid>
	<pubDate>Tue, 02 Apr 2019 21:54:55 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/39213/flye-fast-and-accurate-de-novo-assembler-for-single-molecule-sequencing-reads</link>
	<title><![CDATA[Flye: Fast and accurate de novo assembler for single molecule sequencing reads]]></title>
	<description><![CDATA[<p><span>Flye is a de novo assembler for single molecule sequencing reads, such as those produced by PacBio and Oxford Nanopore Technologies. It is designed for a wide range of datasets, from small bacterial projects to large mammalian-scale assemblies. The package represents a complete pipeline: it takes raw PB / ONT reads as input and outputs polished contigs. Flye also includes a special mode for metagenome assembly.</span></p><p>Address of the bookmark: <a href="https://github.com/fenderglass/Flye" rel="nofollow">https://github.com/fenderglass/Flye</a></p>]]></description>
	<dc:creator>BioJoker</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/44529/contigextender-a-new-approach-to-improving-de-novo-sequence-assembly-for-viral-metagenomics-data</guid>
	<pubDate>Wed, 08 May 2024 07:32:45 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/44529/contigextender-a-new-approach-to-improving-de-novo-sequence-assembly-for-viral-metagenomics-data</link>
	<title><![CDATA[ContigExtender: a new approach to improving de novo sequence assembly for viral metagenomics data]]></title>
	<description><![CDATA[<p dir="auto">ContigExtender, was developed to extend contigs, complementing de novo assembly. ContigExtender employs a novel recursive Overlap Layout Candidates (r-OLC) strategy that explores multiple extending paths to achieve longer and highly accurate contigs. ContigExtender is effective for extending contigs significantly in in silico synthesized and real metagenomics datasets.</p>
<p dir="auto">More at&nbsp;https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7953547/</p>
<p dir="auto"><a href="https://camo.githubusercontent.com/72dc78177cd84dd0c667a2922a9fd984fb548b5ec94b11f9a547211a4adba3b1/68747470733a2f2f692e696d6775722e636f6d2f7734516944496a2e706e67" target="_blank"><img src="https://camo.githubusercontent.com/72dc78177cd84dd0c667a2922a9fd984fb548b5ec94b11f9a547211a4adba3b1/68747470733a2f2f692e696d6775722e636f6d2f7734516944496a2e706e67" alt="extension process" title="extension process" style="border: 0px;"></a></p><p>Address of the bookmark: <a href="https://github.com/dengzac/contig-extender" rel="nofollow">https://github.com/dengzac/contig-extender</a></p>]]></description>
	<dc:creator>LEGE</dc:creator>
</item>

<item>
  <guid isPermaLink='true'>https://bioinformaticsonline.com/researchlabs/view/32713/salzberg-lab</guid>
  <pubDate>Mon, 15 May 2017 05:14:01 -0500</pubDate>
  <link></link>
  <title><![CDATA[Salzberg lab]]></title>
  <description><![CDATA[
<p>We are a computational biology lab that develops novel methods for analysis of DNA and RNA sequences. Our research includes software for aligning and assembling RNA-seq data, whole-genome assembly, and microbiome analysis. We work closely with biomedical scientists to apply these methods to current problems arising in a broad spectrum of biological and medical research areas. We’re also part of the Center for Computational Biology, a group of 20+ faculty members and their labs at Johns Hopkins working on computational, statistical, and mathematical methods that can turn massive genomic data sets into biologically and clinically useful information.</p>

<p>https://salzberg-lab.org/</p>
]]></description>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/36476/flye-fast-and-accurate-de-novo-assembler-for-single-molecule-sequencing-reads</guid>
	<pubDate>Fri, 04 May 2018 19:16:22 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/36476/flye-fast-and-accurate-de-novo-assembler-for-single-molecule-sequencing-reads</link>
	<title><![CDATA[Flye: Fast and accurate de novo assembler for single molecule sequencing reads]]></title>
	<description><![CDATA[<p><span>Flye is a de novo assembler for long and noisy reads, such as those produced by PacBio and Oxford Nanopore Technologies. The algorithm uses an A-Bruijn graph to find the overlaps between reads and does not require them to be error-corrected. After the initial assembly, Flye performs an extra repeat classification and analysis step to improve the structural accuracy of the resulting sequence. The package also includes a polisher module, which produces the final assembly of high nucleotide-level quality.</span></p><p>Address of the bookmark: <a href="https://github.com/fenderglass/Flye" rel="nofollow">https://github.com/fenderglass/Flye</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/36592/lachesis-genome-assembly-with-hi-c-based-contact-probability-maps-lachesis</guid>
	<pubDate>Mon, 14 May 2018 04:26:30 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/36592/lachesis-genome-assembly-with-hi-c-based-contact-probability-maps-lachesis</link>
	<title><![CDATA[LACHESIS: Genome Assembly with Hi-C-based Contact Probability Maps (LACHESIS)]]></title>
	<description><![CDATA[<p>LACHESIS is method that exploits contact probability map data (e.g. from Hi-C) for chromosome-scale&nbsp;<em>de novo</em>&nbsp;genome assembly.</p>
<p>Further information about LACHESIS, including source code, documentation and a user's guide are available at:&nbsp;<a href="http://shendurelab.github.io/LACHESIS/">http://shendurelab.github.io/LACHESIS</a>.</p>
<p>Manuscript describing LACHESIS was published as: Burton JN#, Adey A, Patwardhan RP, Qiu R, Kitzman JO, Shendure J#.&nbsp;<em>Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions.</em>&nbsp;Nature Biotechnology 2013 Dec;31(12):1119-25. doi:&nbsp;<a href="http://dx.doi.org/10.1038/nbt.2727">10.1038/nbt.272</a>. PubMed PMID:&nbsp;<a href="http://www.ncbi.nlm.nih.gov/pubmed/24185095">24185095</a>.</p>
<p>&nbsp;</p>
<p>http://shendurelab.github.io/LACHESIS/</p><p>Address of the bookmark: <a href="http://shendurelab.github.io/LACHESIS/" rel="nofollow">http://shendurelab.github.io/LACHESIS/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/view/1926</guid>
	<pubDate>Sun, 11 Aug 2013 11:42:32 -0500</pubDate>
	<link>https://bioinformaticsonline.com/view/1926</link>
	<title><![CDATA[Want to Know which genome assembler rule the world ?]]></title>
	<description><![CDATA[<p><span><strong>Assemblathon 2</strong>: evaluating de novo methods of genome assembly&nbsp;</span></p><p><span><a href="http://www.gigasciencejournal.com/content/2/1/10/abstract">http://www.gigasciencejournal.com/content/2/1/10/abstract</a></span></p><p><span><a href="http://blogs.nature.com/news/2013/07/genome-assembly-contest-prompts-soul-searching.html">http://blogs.nature.com/news/2013/07/genome-assembly-contest-prompts-soul-searching.html</a></span></p><p><a href="http://assemblathon.org/post/44431915644/feedback-and-analysis-of-the-assemblathon-2-p">http://assemblathon.org/post/44431915644/feedback-and-analysis-of-the-assemblathon-2-p</a></p><p>&nbsp;</p>]]></description>
	<dc:creator>Rahul Agarwal</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/33847/omega2-metagenome-assembly-pipeline</guid>
	<pubDate>Mon, 10 Jul 2017 05:56:07 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/33847/omega2-metagenome-assembly-pipeline</link>
	<title><![CDATA[Omega2: metagenome assembly pipeline]]></title>
	<description><![CDATA[<p><span>Omega found overlaps between reads using a prefix/suffix hash table. The overlap graph of reads was simplified by removing transitive edges and trimming short branches. Unitigs were generated based on minimum cost flow analysis of the overlap graph and then merged to contigs and scaffolds using mate-pair information. In comparison with three de Bruijn graph assemblers (SOAPdenovo, IDBA-UD and MetaVelvet), Omega provided comparable overall performance on a HiSeq 100-bp dataset and superior performance on a MiSeq 300-bp dataset. In comparison with Celera on the MiSeq dataset, Omega provided more continuous assemblies overall using a fraction of the computing time of existing overlap-layout-consensus assemblers. This indicates Omega can more efficiently assemble longer Illumina reads, and at deeper coverage, for metagenomic datasets.</span></p><p>Address of the bookmark: <a href="http://omega.omicsbio.org/" rel="nofollow">http://omega.omicsbio.org/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/34416/miniasm-very-fast-olc-based-de-novo-assembler-for-noisy-long-reads</guid>
	<pubDate>Mon, 27 Nov 2017 07:58:49 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/34416/miniasm-very-fast-olc-based-de-novo-assembler-for-noisy-long-reads</link>
	<title><![CDATA[miniasm: very fast OLC-based de novo assembler for noisy long reads]]></title>
	<description><![CDATA[<p>Miniasm is a very fast OLC-based&nbsp;<em>de novo</em>&nbsp;assembler for noisy long reads. It takes all-vs-all read self-mappings (typically by&nbsp;<a href="https://github.com/lh3/minimap">minimap</a>) as input and outputs an assembly graph in the&nbsp;<a href="https://github.com/pmelsted/GFA-spec/blob/master/GFA-spec.md">GFA</a>&nbsp;format. Different from mainstream assemblers, miniasm does not have a consensus step. It simply concatenates pieces of read sequences to generate the final&nbsp;<a href="http://wgs-assembler.sourceforge.net/wiki/index.php/Celera_Assembler_Terminology">unitig</a>&nbsp;sequences. Thus the per-base error rate is similar to the raw input reads.</p>
<p>So far miniasm is in early development stage. It has only been tested on a dozen of PacBio and Oxford Nanopore (ONT) bacterial data sets. Including the mapping step, it takes about 3 minutes to assemble a bacterial genome. Under the default setting, miniasm assembles 9 out of 12 PacBio datasets and 3 out of 4 ONT datasets into a single contig. The 12 PacBio data sets are&nbsp;<a href="https://github.com/PacificBiosciences/DevNet/wiki/E.-coli-Bacterial-Assembly">PacBio E. coli sample</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS473430">ERS473430</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS544009">ERS544009</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS554120">ERS554120</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS605484">ERS605484</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS617393">ERS617393</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS646601">ERS646601</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS659581">ERS659581</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS670327">ERS670327</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS685285">ERS685285</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS743109">ERS743109</a>&nbsp;and a&nbsp;<a href="https://github.com/PacificBiosciences/DevNet/wiki/E.-coli-20kb-Size-Selected-Library-with-P6-C4/ce0533c1d2a957488594f0b29da61ffa3e4627e8">deprecated PacBio E. coli data set</a>. ONT data are acquired from the&nbsp;<a href="http://lab.loman.net/2015/09/24/first-sqk-map-006-experiment/">Loman Lab</a>.</p>
<p>For a&nbsp;<em>C. elegans</em>&nbsp;<a href="https://github.com/PacificBiosciences/DevNet/wiki/C.-elegans-data-set">PacBio data set</a>&nbsp;(only 40X are used, not the whole dataset), miniasm finishes the assembly, including reads overlapping, in ~10 minutes with 16 CPUs. The total assembly size is 105Mb; the N50 is 1.94Mb. In comparison, the&nbsp;<a href="https://github.com/PacificBiosciences/Bioinformatics-Training/wiki/HGAP">HGAP3</a>produces a 104Mb assembly with N50 1.61Mb.&nbsp;<a href="http://lh3lh3.users.sourceforge.net/download/ce-miniasm.png">This dotter plot</a>&nbsp;gives a global view of the miniasm assembly (on the X axis) and the HGAP3 assembly (on Y). They are broadly comparable. Of course, the HGAP3 consensus sequences are much more accurate. In addition, on the whole data set (assembled in ~30 min), the miniasm N50 is reduced to 1.79Mb. Miniasm still needs improvements.</p>
<p>Miniasm confirms that at least for high-coverage bacterial genomes, it is possible to generate long contigs from raw PacBio or ONT reads without error correction. It also shows that&nbsp;<a href="https://github.com/lh3/minimap">minimap</a>&nbsp;can be used as a read overlapper, even though it is probably not as sensitive as the more sophisticated overlapers such as&nbsp;<a href="https://github.com/marbl/MHAP">MHAP</a>&nbsp;and&nbsp;<a href="https://github.com/thegenemyers/DALIGNER">DALIGNER</a>. Coupled with long-read error correctors and consensus tools, miniasm may also be useful to produce high-quality assemblies.</p>
<p>Minimap and miniasm are ultrafast tools for (i) mapping and (ii) assembly. Designed for long, noisy reads, they do not have a correction or consensus step, and therefore the resulting assemblies are contiguous (i.e. long) but very noisy (i.e. full of errors)</p>
<p>We start with an all against all comparison:</p>
<div>
<pre><code>minimap -Sw5 -L100 -m0 -t8 reads.fq reads.fq | gzip -1 &gt; reads.paf.gz
</code></pre>
</div>
<p>Then we can assemble</p>
<div>
<pre><code>miniasm -f reads.fq reads.paf.gz &gt; reads.gfa
</code></pre>
</div>
<p>Convert GFA to FASTA:</p>
<div>
<pre><code>awk <span>'/^S/{print "&gt;"$2"\n"$3}'</span> reads.gfa | fold &gt; reads.fa
</code></pre>
</div>
<p>And then count how many contigs:</p>
<div>
<pre><code>grep <span>"&gt;"</span> reads.fa | wc -l</code></pre>
</div>
<p>&nbsp;</p>
<pre><span><span>#</span> Download sample PacBio from the PBcR website</span>
wget -O- http://www.cbcb.umd.edu/software/PBcR/data/selfSampleData.tar.gz <span>|</span> tar zxf -
ln -s selfSampleData/pacbio_filtered.fastq reads.fq
<span><span>#</span> Install minimap and miniasm (requiring gcc and zlib)</span>
git clone https://github.com/lh3/minimap <span>&amp;&amp;</span> (cd minimap <span>&amp;&amp;</span> make)
git clone https://github.com/lh3/miniasm <span>&amp;&amp;</span> (cd miniasm <span>&amp;&amp;</span> make)
<span><span>#</span> Overlap</span>
minimap/minimap -Sw5 -L100 -m0 -t8 reads.fq reads.fq <span>|</span> gzip -1 <span>&gt;</span> reads.paf.gz
<span><span>#</span> Layout</span>
miniasm/miniasm -f reads.fq reads.paf.gz <span>&gt;</span> reads.gfa</pre><p>Address of the bookmark: <a href="https://github.com/lh3/miniasm" rel="nofollow">https://github.com/lh3/miniasm</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>

</channel>
</rss>