<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/37737?offset=20</link>
	<atom:link href="https://bioinformaticsonline.com/related/37737?offset=20" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/37554/finishersca-repeat-aware-tool-for-upgrading-de-novo-assembly-using-long-reads</guid>
	<pubDate>Mon, 20 Aug 2018 04:08:50 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/37554/finishersca-repeat-aware-tool-for-upgrading-de-novo-assembly-using-long-reads</link>
	<title><![CDATA[FinisherSC: a repeat-aware tool for upgrading de novo assembly using long reads]]></title>
	<description><![CDATA[<p>Here is the command to run the tool:</p>
<pre><code>python finisherSC.py destinedFolder mummerPath
</code></pre>
<p>If you are running on a server and would like to use multiple threads, the following command runs FinisherSC with 20 threads:</p>
<pre><code>python finisherSC.py -par 20 destinedFolder mummerPath
</code></pre>
<p>If the names of the raw reads or contigs contain special characters or unusual formats, FinisherSC/MUMmer may not parse them correctly. In that case, quickly rename the sequences in contigs.fasta and raw_reads.fasta with the following Perl one-liners, which replace each header with Seg1, Seg2, and so on:</p>
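<p>For readers who prefer Python, the renaming done by the Perl one-liners below can be sketched as follows. This is our own illustration, not part of FinisherSC: the helper names <code>rename_fasta_headers</code> and <code>rename_fasta_file</code> are hypothetical, and unlike the Perl regex this sketch only rewrites lines that start with <code>&gt;</code>.</p>
<pre><code># Hypothetical helpers (not part of FinisherSC): rename FASTA headers to Seg1, Seg2, ...
def rename_fasta_headers(lines, prefix="Seg"):
    """Yield FASTA lines, replacing each header with prefix + running number.

    Sequence lines are passed through unchanged.
    """
    n = 0
    for line in lines:
        if line.startswith(">"):
            n += 1
            yield ">" + prefix + str(n) + "\n"
        else:
            yield line


def rename_fasta_file(src, dst, prefix="Seg"):
    """Write a copy of src to dst with renamed headers."""
    with open(src) as fin, open(dst, "w") as fout:
        fout.writelines(rename_fasta_headers(fin, prefix))
</code></pre>
<p>For example, <code>rename_fasta_file("contigs.fasta", "newContigs.fasta")</code> produces the same kind of output as the second Perl command; the result is then copied back over contigs.fasta as in the commands below.</p>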
<pre><code>perl -pe 's/&gt;[^\$]*$/"&gt;Seg" . ++$n . "\n"/ge' raw_reads.fasta &gt; newRaw_reads.fasta
cp newRaw_reads.fasta raw_reads.fasta
perl -pe 's/&gt;[^\$]*$/"&gt;Seg" . ++$n . "\n"/ge' contigs.fasta &gt; newContigs.fasta
cp newContigs.fasta contigs.fasta</code></pre><p>Address of the bookmark: <a href="https://github.com/kakitone/finishingTool" rel="nofollow">https://github.com/kakitone/finishingTool</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/40516/nextdenovo-string-graph-based-de-novo-assembler-for-tgs-long-reads</guid>
	<pubDate>Sun, 05 Jan 2020 04:08:29 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/40516/nextdenovo-string-graph-based-de-novo-assembler-for-tgs-long-reads</link>
	<title><![CDATA[NextDenovo: string graph-based de novo assembler for TGS long reads]]></title>
	<description><![CDATA[<p>NextDenovo is a string graph-based<span>&nbsp;</span><em>de novo</em><span>&nbsp;</span>assembler for TGS long reads. It uses a "correct-then-assemble" strategy similar to Canu, but requires significantly fewer computing resources and less storage. After assembly, the per-base accuracy is about 97-98%; to further improve single-base accuracy, use<span>&nbsp;</span><a href="https://github.com/Nextomics/NextPolish">NextPolish</a>.</p>
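<p>To put the 97-98% figure in perspective, per-base accuracy is often expressed as a Phred-scaled quality value, QV = -10 log10(error rate). The short Python sketch below does that conversion and estimates the number of residual errors; the 100 Mb genome size is an arbitrary example, not a NextDenovo result.</p>
<pre><code>import math


def phred_qv(accuracy):
    """Phred-scaled quality value for a per-base accuracy between 0 and 1."""
    return -10.0 * math.log10(1.0 - accuracy)


def expected_errors(genome_size, accuracy):
    """Expected number of erroneous bases in an assembly of genome_size bp."""
    return genome_size * (1.0 - accuracy)


# Illustrative 100 Mb genome (an arbitrary example size)
for acc in (0.97, 0.98):
    print(f"accuracy {acc:.0%}: QV {phred_qv(acc):.1f}, "
          f"about {expected_errors(100_000_000, acc):,.0f} errors per 100 Mb")
# accuracy 97%: QV 15.2, about 3,000,000 errors per 100 Mb
# accuracy 98%: QV 17.0, about 2,000,000 errors per 100 Mb
</code></pre>
<p>Millions of residual errors per 100 Mb illustrate why a separate polishing step such as NextPolish is recommended.</p>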
<p>NextDenovo contains two core modules: NextCorrect and NextGraph. NextCorrect can be used to correct TGS long reads with approximately 15% sequencing errors, and NextGraph can be used to construct a string graph from the corrected reads. It also contains a modified version of<span>&nbsp;</span><a href="https://github.com/lh3/minimap2">minimap2</a><span>&nbsp;</span>that handles NextDenovo's input and output and produces more sensitive and accurate dovetail overlaps, as well as some useful utilities (see<span>&nbsp;</span><a href="https://github.com/Nextomics/NextDenovo/blob/master/doc/UTILITY.md">here</a><span>&nbsp;</span>for more details).</p><p>Address of the bookmark: <a href="https://github.com/Nextomics/NextDenovo" rel="nofollow">https://github.com/Nextomics/NextDenovo</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/41501/hicanu-accurate-assembly-of-segmental-duplications-satellites-and-allelic-variants-from-high-fidelity-long-reads</guid>
	<pubDate>Fri, 27 Mar 2020 22:49:31 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/41501/hicanu-accurate-assembly-of-segmental-duplications-satellites-and-allelic-variants-from-high-fidelity-long-reads</link>
	<title><![CDATA[HiCanu: accurate assembly of segmental duplications, satellites, and allelic variants from high-fidelity long reads]]></title>
	<description><![CDATA[<p><span>HiCanu is a significant modification of the Canu assembler, designed to leverage the full potential of HiFi reads via homopolymer compression, overlap-based error correction, and aggressive false-overlap filtering.</span></p>
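<p>Homopolymer compression collapses every run of identical bases into a single base, so reads that differ only in homopolymer length (a common HiFi error mode) become identical before overlapping. The Python sketch below illustrates the idea only; it is not HiCanu's code, and the function name <code>homopolymer_compress</code> is our own.</p>
<pre><code>from itertools import groupby


def homopolymer_compress(seq):
    """Collapse each run of identical bases to a single base.

    Returns the compressed sequence and the original run lengths, so that
    positions in the compressed sequence can be mapped back to the read.
    """
    runs = [(base, len(list(group))) for base, group in groupby(seq)]
    compressed = "".join(base for base, _ in runs)
    lengths = [length for _, length in runs]
    return compressed, lengths


# Two reads that differ only in homopolymer lengths
a, _ = homopolymer_compress("ACCCGTTTTA")
b, _ = homopolymer_compress("ACCGTTTA")
print(a, b, a == b)  # ACGTA ACGTA True
</code></pre>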
<p>More at&nbsp;<a href="https://www.biorxiv.org/content/10.1101/2020.03.14.992248v3?fbclid=IwAR2PaN4GLjvAZpWmCE2q0EWk2dtwY7wiKxVlXn9PPG7OBSP06PP2gcCrv3A">https://www.biorxiv.org/content/10.1101/2020.03.14.992248v3</a></p><p>Address of the bookmark: <a href="https://github.com/marbl/canu" rel="nofollow">https://github.com/marbl/canu</a></p>]]></description>
	<dc:creator>BioStar</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/31137/finishersc-a-repeat-aware-and-scalable-tool-for-upgrading-de-novo-assembly-using-long-reads</guid>
	<pubDate>Mon, 27 Feb 2017 09:49:45 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/31137/finishersc-a-repeat-aware-and-scalable-tool-for-upgrading-de-novo-assembly-using-long-reads</link>
	<title><![CDATA[FinisherSC: a repeat-aware and scalable tool for upgrading de novo assembly using long reads]]></title>
	<description><![CDATA[<p><span>FinisherSC is a repeat-aware and scalable tool for upgrading&nbsp;</span><em>de novo</em><span>&nbsp;assembly using long reads. Experiments with real data suggest that FinisherSC can provide longer and higher-quality contigs than existing tools while maintaining high concordance.</span></p><p>Address of the bookmark: <a href="http://kakitone.github.io/finishingTool/" rel="nofollow">http://kakitone.github.io/finishingTool/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/32190/dbg2olcefficient-assembly-of-large-genomes-using-long-erroneous-reads-of-the-third-generation-sequencing-technologies</guid>
	<pubDate>Wed, 19 Apr 2017 10:09:51 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/32190/dbg2olcefficient-assembly-of-large-genomes-using-long-erroneous-reads-of-the-third-generation-sequencing-technologies</link>
	<title><![CDATA[DBG2OLC: Efficient Assembly of Large Genomes Using Long Erroneous Reads of the Third Generation Sequencing Technologies]]></title>
	<description><![CDATA[<p>DBG2OLC: Efficient Assembly of Large Genomes Using Long Erroneous Reads of the Third Generation Sequencing Technologies</p>
<p>Our work is published in Scientific Reports:</p>
<p>Ye, C. et al. DBG2OLC: Efficient Assembly of Large Genomes Using Long Erroneous Reads of the Third Generation Sequencing Technologies. Sci. Rep. 6, 31900; doi: 10.1038/srep31900 (2016).</p>
<p><a href="http://www.nature.com/articles/srep31900">http://www.nature.com/articles/srep31900</a></p>
<p>The manual can be downloaded from:</p>
<p><a href="https://github.com/yechengxi/DBG2OLC/raw/master/Manual.docx">https://github.com/yechengxi/DBG2OLC/raw/master/Manual.docx</a></p>
<p>To use precompiled versions, please go to:</p>
<p><a href="https://github.com/yechengxi/DBG2OLC/tree/master/compiled">https://github.com/yechengxi/DBG2OLC/tree/master/compiled</a></p>
<p>Address of the bookmark: <a href="https://github.com/yechengxi/DBG2OLC" rel="nofollow">https://github.com/yechengxi/DBG2OLC</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/36478/the-marvel-assembler</guid>
	<pubDate>Fri, 04 May 2018 19:18:41 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/36478/the-marvel-assembler</link>
	<title><![CDATA[The MARVEL assembler]]></title>
	<description><![CDATA[<p><span>MARVEL consists of a set of tools that facilitate the overlapping, patching, correction and assembly of noisy (and less noisy) long reads.</span></p>
<p>The assembly process can be summarized as follows:</p>
<ol>
<li>overlap</li>
<li>patch reads</li>
<li>overlap (again)</li>
<li>scrubbing</li>
<li>assembly graph construction and touring</li>
<li>optional read correction</li>
<li>fasta file creation</li>
</ol><p>Address of the bookmark: <a href="https://github.com/schloi/MARVEL" rel="nofollow">https://github.com/schloi/MARVEL</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/37759/pandaseq-is-a-program-to-align-illumina-reads-optionally-with-pcr-primers-embedded-in-the-sequence-and-reconstruct-an-overlapping-sequence</guid>
	<pubDate>Fri, 21 Sep 2018 10:19:52 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/37759/pandaseq-is-a-program-to-align-illumina-reads-optionally-with-pcr-primers-embedded-in-the-sequence-and-reconstruct-an-overlapping-sequence</link>
	<title><![CDATA[PANDASEQ is a program to align Illumina reads, optionally with PCR primers embedded in the sequence, and reconstruct an overlapping sequence.]]></title>
	<description><![CDATA[<p>Development packages for zlib and libbz2 are needed, as well as a standard compiler environment. On Ubuntu, this can be installed via:</p>
<pre><code>sudo apt-get install build-essential libtool automake zlib1g-dev libbz2-dev pkg-config
</code></pre>
<p>On macOS, the Apple Developer tools and Fink (or MacPorts or Homebrew) must be installed; then run:</p>
<pre><code>sudo fink install bzip2-dev pkgconfig</code></pre><p>Address of the bookmark: <a href="https://github.com/neufeld/pandaseq" rel="nofollow">https://github.com/neufeld/pandaseq</a></p>]]></description>
	<dc:creator>BioStar</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/32943/npscarf-scaffolding-and-completing-assemblies-in-real-time-fashion</guid>
	<pubDate>Tue, 23 May 2017 04:53:29 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/32943/npscarf-scaffolding-and-completing-assemblies-in-real-time-fashion</link>
	<title><![CDATA[npScarf: Scaffolding and Completing Assemblies in Real-time Fashion]]></title>
	<description><![CDATA[<p><em>npScarf</em>&nbsp;(jsa.np.npscarf) is a program that scaffolds and completes draft genome assemblies in real time using Oxford Nanopore sequencing. The pipeline can run on a computing cluster as well as on a laptop computer for microbial datasets. It also facilitates the real-time analysis of positional information such as gene ordering and the detection of genes from mobile elements (plasmids and genomic islands).</p>
<p>Complete paper at&nbsp;<a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5321748/">https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5321748/</a></p><p>Address of the bookmark: <a href="https://github.com/mdcao/npScarf" rel="nofollow">https://github.com/mdcao/npScarf</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/43722/crossmap-program-for-genome-coordinates-conversion-between-different-assemblies</guid>
	<pubDate>Tue, 25 Jan 2022 17:59:32 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/43722/crossmap-program-for-genome-coordinates-conversion-between-different-assemblies</link>
	<title><![CDATA[CrossMap: program for genome coordinates conversion between different assemblies]]></title>
	<description><![CDATA[<p><span>CrossMap is a program for converting genome coordinates between&nbsp;</span><em>different assemblies</em><span>&nbsp;(such as&nbsp;</span><a href="http://www.ncbi.nlm.nih.gov/assembly/2928/">hg18 (NCBI36)</a><span>&nbsp;&lt;=&gt;&nbsp;</span><a href="http://www.ncbi.nlm.nih.gov/assembly/2758/">hg19 (GRCh37)</a><span>). It supports commonly used file formats including&nbsp;</span><a href="https://samtools.github.io/hts-specs/SAMv1.pdf">BAM</a><span>,&nbsp;</span><a href="https://en.wikipedia.org/wiki/CRAM_(file_format)">CRAM</a><span>,&nbsp;</span><a href="https://en.wikipedia.org/wiki/SAM_(file_format)">SAM</a><span>,&nbsp;</span><a href="https://genome.ucsc.edu/goldenPath/help/wiggle.html">Wiggle</a><span>,&nbsp;</span><a href="https://genome.ucsc.edu/goldenPath/help/bigWig.html">BigWig</a><span>,&nbsp;</span><a href="https://genome.ucsc.edu/FAQ/FAQformat.html#format1">BED</a><span>,&nbsp;</span><a href="https://genome.ucsc.edu/FAQ/FAQformat.html#format3">GFF</a><span>,&nbsp;</span><a href="https://genome.ucsc.edu/FAQ/FAQformat.html#format4">GTF</a><span>,&nbsp;</span><a href="https://docs.gdc.cancer.gov/Data/File_Formats/MAF_Format/">MAF</a><span>,&nbsp;</span><a href="https://samtools.github.io/hts-specs/VCFv4.2.pdf">VCF</a><span>, and&nbsp;</span><a href="https://sites.google.com/site/gvcftools/home/about-gvcf">gVCF</a><span>.</span></p><p>Address of the bookmark: <a href="http://crossmap.sourceforge.net/" rel="nofollow">http://crossmap.sourceforge.net/</a></p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/34416/miniasm-very-fast-olc-based-de-novo-assembler-for-noisy-long-reads</guid>
	<pubDate>Mon, 27 Nov 2017 07:58:49 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/34416/miniasm-very-fast-olc-based-de-novo-assembler-for-noisy-long-reads</link>
	<title><![CDATA[miniasm: very fast OLC-based de novo assembler for noisy long reads]]></title>
	<description><![CDATA[<p>Miniasm is a very fast OLC-based&nbsp;<em>de novo</em>&nbsp;assembler for noisy long reads. It takes all-vs-all read self-mappings (typically produced by&nbsp;<a href="https://github.com/lh3/minimap">minimap</a>) as input and outputs an assembly graph in the&nbsp;<a href="https://github.com/pmelsted/GFA-spec/blob/master/GFA-spec.md">GFA</a>&nbsp;format. Unlike mainstream assemblers, miniasm does not have a consensus step. It simply concatenates pieces of read sequences to generate the final&nbsp;<a href="http://wgs-assembler.sourceforge.net/wiki/index.php/Celera_Assembler_Terminology">unitig</a>&nbsp;sequences. Thus the per-base error rate is similar to that of the raw input reads.</p>
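<p>To make "simply concatenates pieces of read sequences" concrete, here is a toy Python sketch of that layout idea. It assumes exact, already-known overlap lengths between consecutive reads; real miniasm derives read coordinates from the minimap mappings and its assembly graph, and <code>concatenate_path</code> is a hypothetical name.</p>
<pre><code>def concatenate_path(reads, path):
    """Build a unitig by concatenating read pieces along a layout path.

    reads: dict mapping read name to sequence.
    path:  list of (read_name, overlap_with_previous) tuples; the first
           read's overlap is 0. Each later read contributes only the bases
           past its overlap with the previous read. No consensus is computed.
    """
    pieces = [reads[name][overlap:] for name, overlap in path]
    return "".join(pieces)


# Toy reads (made up): each read's first 4 bases match the previous read's last 4
reads = {
    "r1": "ACGTACGTTA",
    "r2": "GTTAGGCATC",
    "r3": "CATCTTGACA",
}
print(concatenate_path(reads, [("r1", 0), ("r2", 4), ("r3", 4)]))
# ACGTACGTTAGGCATCTTGACA
</code></pre>
<p>Because there is no consensus step, a sequencing error inside any contributed piece is copied straight into the unitig, which is why the per-base error rate matches that of the raw reads.</p>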
<p>So far, miniasm is at an early stage of development. It has only been tested on a dozen PacBio and Oxford Nanopore (ONT) bacterial data sets. Including the mapping step, it takes about 3 minutes to assemble a bacterial genome. Under the default settings, miniasm assembles 9 out of 12 PacBio data sets and 3 out of 4 ONT data sets into a single contig. The 12 PacBio data sets are&nbsp;<a href="https://github.com/PacificBiosciences/DevNet/wiki/E.-coli-Bacterial-Assembly">PacBio E. coli sample</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS473430">ERS473430</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS544009">ERS544009</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS554120">ERS554120</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS605484">ERS605484</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS617393">ERS617393</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS646601">ERS646601</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS659581">ERS659581</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS670327">ERS670327</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS685285">ERS685285</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS743109">ERS743109</a>&nbsp;and a&nbsp;<a href="https://github.com/PacificBiosciences/DevNet/wiki/E.-coli-20kb-Size-Selected-Library-with-P6-C4/ce0533c1d2a957488594f0b29da61ffa3e4627e8">deprecated PacBio E. coli data set</a>. The ONT data were obtained from the&nbsp;<a href="http://lab.loman.net/2015/09/24/first-sqk-map-006-experiment/">Loman Lab</a>.</p>
<p>For a&nbsp;<em>C. elegans</em>&nbsp;<a href="https://github.com/PacificBiosciences/DevNet/wiki/C.-elegans-data-set">PacBio data set</a>&nbsp;(only 40X coverage was used, not the whole data set), miniasm finishes the assembly, including read overlapping, in ~10 minutes with 16 CPUs. The total assembly size is 105 Mb and the N50 is 1.94 Mb. In comparison,&nbsp;<a href="https://github.com/PacificBiosciences/Bioinformatics-Training/wiki/HGAP">HGAP3</a>&nbsp;produces a 104 Mb assembly with an N50 of 1.61 Mb.&nbsp;<a href="http://lh3lh3.users.sourceforge.net/download/ce-miniasm.png">This dot plot</a>&nbsp;gives a global view of the miniasm assembly (on the X axis) against the HGAP3 assembly (on the Y axis). The two are broadly comparable, although the HGAP3 consensus sequences are much more accurate. On the whole data set (assembled in ~30 minutes), the miniasm N50 drops to 1.79 Mb. Miniasm still needs improvement.</p>
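<p>N50, the statistic used above to compare miniasm and HGAP3, is the contig length L such that contigs of length L or longer together contain at least half of the assembled bases. A small Python sketch of one common way to compute it (the toy lengths are made up):</p>
<pre><code>def n50(lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list).

    Contigs are sorted from longest to shortest and their lengths summed
    until the running total reaches at least half of the assembly size.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


print(n50([5, 4, 3, 2, 1]))  # 4, because 5 + 4 = 9 covers at least half of 15
</code></pre>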
<p>Miniasm confirms that, at least for high-coverage bacterial genomes, it is possible to generate long contigs from raw PacBio or ONT reads without error correction. It also shows that&nbsp;<a href="https://github.com/lh3/minimap">minimap</a>&nbsp;can be used as a read overlapper, even though it is probably not as sensitive as more sophisticated overlappers such as&nbsp;<a href="https://github.com/marbl/MHAP">MHAP</a>&nbsp;and&nbsp;<a href="https://github.com/thegenemyers/DALIGNER">DALIGNER</a>. Coupled with long-read error correctors and consensus tools, miniasm may also be useful for producing high-quality assemblies.</p>
<p>Minimap and miniasm are ultrafast tools for (i) mapping and (ii) assembly. Designed for long, noisy reads, they have no correction or consensus step, so the resulting assemblies are contiguous (i.e. long) but very noisy (i.e. full of errors).</p>
<p>We start with an all-against-all comparison:</p>
<div>
<pre><code>minimap -Sw5 -L100 -m0 -t8 reads.fq reads.fq | gzip -1 &gt; reads.paf.gz
</code></pre>
</div>
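<p>The all-vs-all overlaps are written in PAF, a tab-separated format whose first 12 columns are fixed: query name, length, start and end; strand; target name, length, start and end; number of matching bases; alignment block length; and mapping quality. A minimal Python sketch for inspecting <code>reads.paf.gz</code> follows; the function and field names are our own, not part of minimap.</p>
<pre><code>import gzip
from collections import namedtuple

PAF_FIELDS = ["qname", "qlen", "qstart", "qend", "strand",
              "tname", "tlen", "tstart", "tend", "nmatch", "alnlen", "mapq"]
INT_FIELDS = {"qlen", "qstart", "qend", "tlen", "tstart", "tend",
              "nmatch", "alnlen", "mapq"}
PafRecord = namedtuple("PafRecord", PAF_FIELDS)


def parse_paf_line(line):
    """Parse the 12 mandatory PAF columns; optional SAM-style tags are ignored."""
    cols = line.rstrip("\n").split("\t")[:12]
    values = [int(v) if f in INT_FIELDS else v for f, v in zip(PAF_FIELDS, cols)]
    return PafRecord(*values)


def read_paf(path):
    """Yield PafRecord objects from a plain or gzip-compressed PAF file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            if line.strip():
                yield parse_paf_line(line)
</code></pre>
<p>For example, <code>sum(1 for r in read_paf("reads.paf.gz") if r.qname != r.tname)</code> counts the overlaps between distinct reads.</p>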
<p>Then we can assemble:</p>
<div>
<pre><code>miniasm -f reads.fq reads.paf.gz &gt; reads.gfa
</code></pre>
</div>
<p>Convert GFA to FASTA:</p>
<div>
<pre><code>awk '/^S/{print "&gt;"$2"\n"$3}' reads.gfa | fold &gt; reads.fa
</code></pre>
</div>
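<p>The awk command keeps column 2 (segment name) and column 3 (sequence) of every <code>S</code> line in the GFA, and <code>fold</code> wraps the sequence at 80 columns by default. A rough Python equivalent that also reports the number of contigs is sketched below; <code>gfa_to_fasta</code> and <code>convert_gfa</code> are hypothetical names.</p>
<pre><code>def gfa_to_fasta(gfa_lines, width=80):
    """Turn GFA segment (S) lines into FASTA records wrapped at `width` columns.

    Header (H), link (L) and all other record types are skipped.
    """
    records = []
    for line in gfa_lines:
        cols = line.rstrip("\n").split("\t")
        if cols[0] != "S":
            continue
        name, seq = cols[1], cols[2]
        wrapped = [seq[i:i + width] for i in range(0, len(seq), width)]
        records.append(">" + name + "\n" + "\n".join(wrapped) + "\n")
    return records


def convert_gfa(gfa_path, fasta_path):
    """Write FASTA records for all segments and return how many were written."""
    with open(gfa_path) as fin:
        records = gfa_to_fasta(fin)
    with open(fasta_path, "w") as fout:
        fout.writelines(records)
    return len(records)
</code></pre>
<p>Calling <code>convert_gfa("reads.gfa", "reads.fa")</code> should return the same number that the <code>grep</code> count reports.</p>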
<p>Then count the contigs:</p>
<div>
<pre><code>grep "&gt;" reads.fa | wc -l</code></pre>
</div>
<pre><code># Download sample PacBio reads from the PBcR website
wget -O- http://www.cbcb.umd.edu/software/PBcR/data/selfSampleData.tar.gz | tar zxf -
ln -s selfSampleData/pacbio_filtered.fastq reads.fq
# Install minimap and miniasm (requires gcc and zlib)
git clone https://github.com/lh3/minimap &amp;&amp; (cd minimap &amp;&amp; make)
git clone https://github.com/lh3/miniasm &amp;&amp; (cd miniasm &amp;&amp; make)
# Overlap
minimap/minimap -Sw5 -L100 -m0 -t8 reads.fq reads.fq | gzip -1 &gt; reads.paf.gz
# Layout
miniasm/miniasm -f reads.fq reads.paf.gz &gt; reads.gfa</code></pre><p>Address of the bookmark: <a href="https://github.com/lh3/miniasm" rel="nofollow">https://github.com/lh3/miniasm</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>

</channel>
</rss>