<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/32905?offset=110</link>
	<atom:link href="https://bioinformaticsonline.com/related/32905?offset=110" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/38563/hecil-a-hybrid-error-correction-algorithm-for-long-reads-with-iterative-learning</guid>
	<pubDate>Tue, 01 Jan 2019 12:01:00 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/38563/hecil-a-hybrid-error-correction-algorithm-for-long-reads-with-iterative-learning</link>
	<title><![CDATA[HECIL: A Hybrid Error Correction Algorithm for Long Reads with Iterative Learning]]></title>
	<description><![CDATA[<p><span>HECIL&mdash;Hybrid Error Correction with Iterative Learning&mdash;a hybrid error correction framework that determines a correction policy for erroneous long reads, based on optimal combinations of decision weights obtained from short read alignments.&nbsp;</span></p>
<p><span><span>HECIL&rsquo;s core algorithm by introducing an iterative learning paradigm that enhances the correction policy at each iteration by incorporating knowledge gathered from previous iterations via data-driven confidence metrics assigned to prior corrections.</span></span></p><p>Address of the bookmark: <a href="https://github.com/NDBL/HECIL" rel="nofollow">https://github.com/NDBL/HECIL</a></p>]]></description>
	<dc:creator>Abhimanyu Singh</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/36516/metassembler-merging-and-optimizing-de-novo-genome-assemblies</guid>
	<pubDate>Tue, 08 May 2018 04:52:33 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/36516/metassembler-merging-and-optimizing-de-novo-genome-assemblies</link>
	<title><![CDATA[Metassembler: merging and optimizing de novo genome assemblies]]></title>
	<description><![CDATA[<p><span>Metassembler combines multiple whole genome de novo assemblies into a combined consensus assembly using the best segments of the individual assemblies.</span></p>
<p><span><span>Genome assembly projects typically run multiple algorithms in an attempt to find the single best assembly, although those assemblies often have complementary, if untapped, strengths and weaknesses. We present our metassembler algorithm that merges multiple assemblies of a genome into a single superior sequence.&nbsp;</span></span></p><p>Address of the bookmark: <a href="https://sourceforge.net/projects/metassembler/?source=directory" rel="nofollow">https://sourceforge.net/projects/metassembler/?source=directory</a></p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/44591/yamp-yet-another-metagenomic-pipeline</guid>
	<pubDate>Sat, 06 Jul 2024 04:26:00 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/44591/yamp-yet-another-metagenomic-pipeline</link>
	<title><![CDATA[YAMP: Yet Another Metagenomic Pipeline]]></title>
	<description><![CDATA[<p><span>YAMP is constructed on&nbsp;</span><a href="https://www.nextflow.io/docs/latest/index.html">Nextflow</a><span>, a framework based on the dataflow programming model, which allows writing workflows that are highly parallel, easily portable (including on distributed systems), and very flexible and customisable, characteristics which have been inherited by YAMP. New modules can be added easily and the existing ones can be customised -- even though we have already provided default parameters deriving from our own experience.</span></p><p>Address of the bookmark: <a href="https://github.com/alesssia/YAMP" rel="nofollow">https://github.com/alesssia/YAMP</a></p>]]></description>
	<dc:creator>BioStar</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/44589/sourmash-quickly-search-compare-and-analyze-genomic-and-metagenomic-data-sets</guid>
	<pubDate>Sat, 06 Jul 2024 04:24:06 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/44589/sourmash-quickly-search-compare-and-analyze-genomic-and-metagenomic-data-sets</link>
	<title><![CDATA[sourmash: Quickly search, compare, and analyze genomic and metagenomic data sets.]]></title>
	<description><![CDATA[<p dir="auto">sourmash is a k-mer analysis multitool, and we aim to provide stable, robust programmatic and command-line APIs for a variety of sequence comparisons. Some of our special sauce includes:</p>
<ul dir="auto">
<li><code>FracMinHash</code>&nbsp;sketching, which enables accurate comparisons (including ANI) between data sets of different sizes</li>
<li><code>sourmash gather</code>, a combinatorial k-mer approach for more accurate metagenomic profiling</li>
</ul>
<p dir="auto">Please see the&nbsp;<a href="https://sourmash.readthedocs.io/en/latest/publications.html#sourmash-fundamentals">sourmash publications</a>&nbsp;for details.</p>
<p dir="auto">The name is a riff off of&nbsp;<a href="https://github.com/marbl/Mash">Mash</a>, combined with @ctb's love of whiskey. (<a href="https://en.wikipedia.org/wiki/Sour_mash">Sour mash</a>&nbsp;is used in making whiskey.)</p>
<p dir="auto">Maintainers:&nbsp;<a href="mailto:titus@idyll.org">C. Titus Brown</a>&nbsp;(<a href="http://github.com/ctb">@ctb</a>),&nbsp;<a href="mailto:luiz@sourmash.bio">Luiz C. Irber, Jr</a>&nbsp;(<a href="http://github.com/luizirber">@luizirber</a>), and&nbsp;<a href="mailto:tessa@sourmash.bio">N. Tessa Pierce-Ward</a>&nbsp;(<a href="http://github.com/bluegenes">@bluegenes</a>).</p>
<p dir="auto">sourmash was initially developed by the&nbsp;<a href="http://ivory.idyll.org/lab/">Lab for Data-Intensive Biology</a>&nbsp;at the&nbsp;<a href="http://www.vetmed.ucdavis.edu/">UC Davis School of Veterinary Medicine</a>, and now includes contributions from the global research and developer community.</p><p>Address of the bookmark: <a href="https://github.com/sourmash-bio/sourmash" rel="nofollow">https://github.com/sourmash-bio/sourmash</a></p>]]></description>
	<dc:creator>BioStar</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/35033/bbsplit-read-binning-tool-for-metagenomes-and-contaminated-libraries</guid>
	<pubDate>Wed, 03 Jan 2018 00:25:27 -0600</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/35033/bbsplit-read-binning-tool-for-metagenomes-and-contaminated-libraries</link>
	<title><![CDATA[BBSplit: Read Binning Tool for Metagenomes and Contaminated Libraries]]></title>
	<description><![CDATA[<p>BBSplit internally uses BBMap to map reads to multiple genomes at once, and determine which genome they match best. This is different than with ordinary mapping. If a genome (say, human) contains an exact repeat somewhere, reads mapping to it will be mapped ambiguously. But if you want to determine whether reads are mouse or human, it does not matter whether they map ambiguously within human, only whether they are ambiguous between human and mouse. BBSplit tracks this additional ambiguity information and decides how to use it based on the &ldquo;ambig2&rdquo; flag. The normal use of BBSplit is like Seal, either quantifying how many reads go to each reference, or splitting the reads into multiple output files, one per reference. BBSplit can only be run using references indexed with BBSplit, as they contain additional information regarding which sequences came from which reference file.</p><p><span>BBSplit is a tool that bins reads by mapping to multiple references simultaneously, using&nbsp;</span><a href="http://seqanswers.com/forums/showthread.php?t=41057" target="_blank">BBMap</a><span>. The reads go to the bin of the reference they map to best. There are also disambiguation options, such that reads that map to multiple references can be binned with all of them, none of them, one of them, or put in a special "ambiguous" file for each of them. Paired reads will always be kept together.</span><br /><br /><span>For example, if you had a library of something that was contaminated with e.coli and salmonella, you could do this:</span><br /><br /><strong>bbsplit.sh in=reads.fq ref=ecoli.fa,salmonella.fa basename=out_%.fq outu=clean.fq int=t</strong><br /><br /><span>This will produce 3 output files:</span><br /><strong>out_ecoli.fq</strong><span>&nbsp;(ecoli reads)</span><br /><strong>out_salmonella.fq</strong><span>&nbsp;(salmonella reads)</span><br /><strong>clean.fq</strong><span>&nbsp;(unmapped reads)</span><br /><br /><span>In this case, "int=t" means that the input file is paired and interleaved. For single-end reads you would leave that out. For paired reads in 2 files, you would do this:</span><br /><strong>bbsplit.sh in1=reads1.fq in2=reads2.fq ref=ecoli.fa,salmonella.fa basename=out_%.fq outu1=clean1.fq outu2=clean2.fq</strong></p><p><strong><span>BBSplit is available here:</span><br /><a href="https://sourceforge.net/projects/bbmap/" target="_blank">https://sourceforge.net/projects/bbmap/</a></strong></p><p><span>The sensitivity can be raised to be equivalent to BBMap with these flags: "minratio=0.56 minhits=1 maxindel=16000"</span></p>]]></description>
	<dc:creator>Poonam Mahapatra</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/26919/pear-a-fast-and-accurate-illumina-paired-end-read-merger</guid>
	<pubDate>Wed, 06 Apr 2016 13:27:23 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/26919/pear-a-fast-and-accurate-illumina-paired-end-read-merger</link>
	<title><![CDATA[PEAR: a fast and accurate Illumina Paired-End reAd mergeR]]></title>
	<description><![CDATA[<p><strong>PEAR</strong>&nbsp;is an ultrafast, memory-efficient and highly accurate pair-end read merger. It is fully parallelized and can run with as low as just a few kilobytes of memory.</p>
<p>PEAR evaluates all possible paired-end read overlaps and without requiring the target fragment size as input. In addition, it implements a statistical test for minimizing false-positive results. Together with a highly optimized implementation, it can merge millions of paired end reads within a couple of minutes on a standard desktop computer.</p>
<p>More at&nbsp;http://www.exelixis-lab.org/web/software/pear</p>
<p>Paper:&nbsp;http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3933873/</p><p>Address of the bookmark: <a href="http://www.exelixis-lab.org/web/software/pear" rel="nofollow">http://www.exelixis-lab.org/web/software/pear</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/37574/simlord-a-read-simulator-for-third-generation-sequencing-reads</guid>
	<pubDate>Wed, 22 Aug 2018 10:40:27 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/37574/simlord-a-read-simulator-for-third-generation-sequencing-reads</link>
	<title><![CDATA[SimLoRD: A read simulator for third generation sequencing reads]]></title>
	<description><![CDATA[<p>SimLoRD is a read simulator for third generation sequencing reads and is currently focused on the Pacific Biosciences SMRT error model.</p>
<p>Reads are simulated from both strands of a provided or randomly generated reference sequence.</p>
<div id="rst-header-features">
<ul>
<li>The reference can be read from a FASTA file or randomly generated with a given GC content. It can consist of several chromosomes, whose structure is respected when drawing reads. (Simulation of genome rearrangements may be incorporated at a later stage.)</li>
<li>The read lengths can be determined in four ways: drawing from a log-normal distribution (typical for genomic DNA), sampling from an existing FASTQ file (typical for RNA), sampling from a a text file with integers (RNA), or using a fixed length</li>
<li>Quality values and number of passes depend on fragment length.</li>
<li>Provided subread error probabilities are modified according to number of passes</li>
<li>Outputs reads in FASTQ format and alignments in SAM format</li>
</ul>
</div><p>Address of the bookmark: <a href="https://bitbucket.org/genomeinformatics/simlord/" rel="nofollow">https://bitbucket.org/genomeinformatics/simlord/</a></p>]]></description>
	<dc:creator>Aaryan Lokwani</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/44527/alvis-a-tool-for-contig-and-read-alignment-visualisation-and-chimera-detection</guid>
	<pubDate>Wed, 08 May 2024 07:02:55 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/44527/alvis-a-tool-for-contig-and-read-alignment-visualisation-and-chimera-detection</link>
	<title><![CDATA[Alvis: a tool for contig and read ALignment VISualisation and chimera detection]]></title>
	<description><![CDATA[<p><span>Alvis, a simple command line tool that can generate visualisations for a number of common alignment analysis tasks. Alvis is a fast and portable tool that accepts input in a variety of alignment formats and will output production ready vector images. Additionally, Alvis will highlight potentially chimeric reads or contigs, a common source of misassemblies.</span></p>
<p>More at&nbsp;https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-021-04056-0</p><p>Address of the bookmark: <a href="https://github.com/SR-Martin/alvis" rel="nofollow">https://github.com/SR-Martin/alvis</a></p>]]></description>
	<dc:creator>LEGE</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/34418/spades-hybrid-genome-assembly</guid>
	<pubDate>Mon, 27 Nov 2017 08:05:40 -0600</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/34418/spades-hybrid-genome-assembly</link>
	<title><![CDATA[SPAdes hybrid genome assembly]]></title>
	<description><![CDATA[<p>When you have both Illumina and Nanopore data, then SPAdes remains a good option for hybrid assembly - SPAdes was used to produce the&nbsp;<a href="https://gigascience.biomedcentral.com/articles/10.1186/s13742-015-0101-6">B fragilis assembly</a>&nbsp;by Mick Watson&rsquo;s group.</p><p>Again, running spades.py will show you the options:</p><div><pre><code>spades.py
</code></pre></div><p>This produces:</p><div><pre><code>SPAdes genome assembler v3.10.1

Usage: /usr/local/SPAdes-3.10.1-Linux/bin/spades.py [options] -o &lt;output_dir&gt;

Basic options:
-o      &lt;output_dir&gt;    directory to store all the resulting files (required)
--sc                    this flag is required for MDA (single-cell) data
--meta                  this flag is required for metagenomic sample data
--rna                   this flag is required for RNA-Seq data
--plasmid               runs plasmidSPAdes pipeline for plasmid detection
--iontorrent            this flag is required for IonTorrent data
--test                  runs SPAdes on toy dataset
-h/--help               prints this usage message
-v/--version            prints version

Input data:
--12    &lt;filename&gt;      file with interlaced forward and reverse paired-end reads
-1      &lt;filename&gt;      file with forward paired-end reads
-2      &lt;filename&gt;      file with reverse paired-end reads
-s      &lt;filename&gt;      file with unpaired reads
--pe&lt;#&gt;-12      &lt;filename&gt;      file with interlaced reads for paired-end library number &lt;#&gt; (&lt;#&gt; = 1,2,..,9)
--pe&lt;#&gt;-1       &lt;filename&gt;      file with forward reads for paired-end library number &lt;#&gt; (&lt;#&gt; = 1,2,..,9)
--pe&lt;#&gt;-2       &lt;filename&gt;      file with reverse reads for paired-end library number &lt;#&gt; (&lt;#&gt; = 1,2,..,9)
--pe&lt;#&gt;-s       &lt;filename&gt;      file with unpaired reads for paired-end library number &lt;#&gt; (&lt;#&gt; = 1,2,..,9)
--pe&lt;#&gt;-&lt;or&gt;    orientation of reads for paired-end library number &lt;#&gt; (&lt;#&gt; = 1,2,..,9; &lt;or&gt; = fr, rf, ff)
--s&lt;#&gt;          &lt;filename&gt;      file with unpaired reads for single reads library number &lt;#&gt; (&lt;#&gt; = 1,2,..,9)
--mp&lt;#&gt;-12      &lt;filename&gt;      file with interlaced reads for mate-pair library number &lt;#&gt; (&lt;#&gt; = 1,2,..,9)
--mp&lt;#&gt;-1       &lt;filename&gt;      file with forward reads for mate-pair library number &lt;#&gt; (&lt;#&gt; = 1,2,..,9)
--mp&lt;#&gt;-2       &lt;filename&gt;      file with reverse reads for mate-pair library number &lt;#&gt; (&lt;#&gt; = 1,2,..,9)
--mp&lt;#&gt;-s       &lt;filename&gt;      file with unpaired reads for mate-pair library number &lt;#&gt; (&lt;#&gt; = 1,2,..,9)
--mp&lt;#&gt;-&lt;or&gt;    orientation of reads for mate-pair library number &lt;#&gt; (&lt;#&gt; = 1,2,..,9; &lt;or&gt; = fr, rf, ff)
--hqmp&lt;#&gt;-12    &lt;filename&gt;      file with interlaced reads for high-quality mate-pair library number &lt;#&gt; (&lt;#&gt; = 1,2,..,9)
--hqmp&lt;#&gt;-1     &lt;filename&gt;      file with forward reads for high-quality mate-pair library number &lt;#&gt; (&lt;#&gt; = 1,2,..,9)
--hqmp&lt;#&gt;-2     &lt;filename&gt;      file with reverse reads for high-quality mate-pair library number &lt;#&gt; (&lt;#&gt; = 1,2,..,9)
--hqmp&lt;#&gt;-s     &lt;filename&gt;      file with unpaired reads for high-quality mate-pair library number &lt;#&gt; (&lt;#&gt; = 1,2,..,9)
--hqmp&lt;#&gt;-&lt;or&gt;  orientation of reads for high-quality mate-pair library number &lt;#&gt; (&lt;#&gt; = 1,2,..,9; &lt;or&gt; = fr, rf, ff)
--nxmate&lt;#&gt;-1   &lt;filename&gt;      file with forward reads for Lucigen NxMate library number &lt;#&gt; (&lt;#&gt; = 1,2,..,9)
--nxmate&lt;#&gt;-2   &lt;filename&gt;      file with reverse reads for Lucigen NxMate library number &lt;#&gt; (&lt;#&gt; = 1,2,..,9)
--sanger        &lt;filename&gt;      file with Sanger reads
--pacbio        &lt;filename&gt;      file with PacBio reads
--nanopore      &lt;filename&gt;      file with Nanopore reads
--tslr  &lt;filename&gt;      file with TSLR-contigs
--trusted-contigs       &lt;filename&gt;      file with trusted contigs
--untrusted-contigs     &lt;filename&gt;      file with untrusted contigs

Pipeline options:
--only-error-correction runs only read error correction (without assembling)
--only-assembler        runs only assembling (without read error correction)
--careful               tries to reduce number of mismatches and short indels
--continue              continue run from the last available check-point
--restart-from  &lt;cp&gt;    restart run with updated options and from the specified check-point ('ec', 'as', 'k&lt;int&gt;', 'mc')
--disable-gzip-output   forces error correction not to compress the corrected reads
--disable-rr            disables repeat resolution stage of assembling

Advanced options:
--dataset       &lt;filename&gt;      file with dataset description in YAML format
-t/--threads    &lt;int&gt;           number of threads
                                [default: 16]
-m/--memory     &lt;int&gt;           RAM limit for SPAdes in Gb (terminates if exceeded)
                                [default: 250]
--tmp-dir       &lt;dirname&gt;       directory for temporary files
                                [default: &lt;output_dir&gt;/tmp]
-k              &lt;int,int,...&gt;   comma-separated list of k-mer sizes (must be odd and
                                less than 128) [default: 'auto']
--cov-cutoff    &lt;float&gt;         coverage cutoff value (a positive float number, or 'auto', or 'off') [default: 'off']
--phred-offset  &lt;33 or 64&gt;      PHRED quality offset in the input reads (33 or 64)
                                [default: auto-detect]
</code></pre></div><p>As you can see this is also a &ldquo;pipeline&rdquo; of tools that can be switched on or off. SPAdes takes quite a long time, so for the purposes of this practical, something like this may suffice:</p><div><pre><code>spades.py -t 4 <span>\</span>
          -m 32 <span>\</span>
          -k 31,51,71 <span>\</span>
          --only-assembler <span>\</span>
          -1 miseq.1.fastq -2 miseq.2.fastq <span>\</span>
          --nanopore minion.fastq <span>\</span>
          -o hybrid_assembly
</code></pre></div><p>In turn, these parameters mean</p><ul>
<li>use 4 threads</li>
<li>max memory is 32Gb</li>
<li>use 3 kmer values to build the de bruijn graph(s) - 31, 51 and 71</li>
<li>only run the assembler, not the correction algorithm (for speed)</li>
<li>read 1 and read 2 of the MiSeq data</li>
<li>the nanopore data</li>
<li>put the output in folder &ldquo;hybrid_assembly&rdquo;</li>
</ul>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/34707/string-graph-based-genome-assembly-software-and-tools</guid>
	<pubDate>Tue, 19 Dec 2017 17:17:38 -0600</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/34707/string-graph-based-genome-assembly-software-and-tools</link>
	<title><![CDATA[String graph based genome assembly software and tools !]]></title>
	<description><![CDATA[<p>In&nbsp;<a href="https://en.wikipedia.org/wiki/Graph_theory" title="Graph theory">graph theory</a>, a&nbsp;<strong>string graph</strong>&nbsp;is an&nbsp;<a href="https://en.wikipedia.org/wiki/Intersection_graph" title="Intersection graph">intersection graph</a>&nbsp;of&nbsp;<a href="https://en.wikipedia.org/wiki/Curve" title="Curve">curves</a>&nbsp;in the plane; each curve is called a "string".&nbsp; String graphs were first proposed by E. W. Myers in a&nbsp;<a href="http://bioinformatics.oxfordjournals.org/content/21/suppl_2/ii79.full.pdf+html">2005 publication</a>.&nbsp;In&nbsp;recent&nbsp;<a href="http://genome.cshlp.org/content/early/2012/01/22/gr.126953.111">Genome Research paper</a>&nbsp;describing an innovative approach for assembling large genomes from NGS data caught our attention for several reasons. i) it give different "string graph" prospective of long lasting genome assembly problem ii) the&nbsp;paper is coauthored by Jared Simpson, the developer of&nbsp;<a href="http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2694472/">ABySS assembler</a>&nbsp;and Richard Durbin. iii)&nbsp;Simpson-Durbin algorithm is that it does not rely on de Bruijn graphs, and instead employs a different graph construction approach called &lsquo;string graph&rsquo;.</p><p>Following are the genome assembly tools based on string graph:</p><p>1.SGA (String Graph Assembler)&nbsp;https://github.com/jts/sga</p><p>Assembles large genomes from high coverage short read data. SGA is designed as a modular set of programs, which are used to form an assembly pipeline. SGA implements a set of assembly algorithms based on the FM-index. As the FM-index is a compressed data structure, the algorithms are very memory efficient. The SGA assembly has three distinct phases. The first phase corrects base calling errors in the reads. The second phase assembles contigs from the corrected reads. The third phase uses paired end and/or mate pair data to build scaffolds from the contigs. The output of this software is a PDF report that allows the properties of the genome and data quality to be visually explored. By providing more information to the user at the start of an assembly project, this software will help increase awareness of the factors that make a given assembly easy or difficult, assist in the selection of software and parameters and help to troubleshoot an assembly if it runs into problems.</p><p>2.&nbsp;SAGE: String-overlap Assembly of GEnomes&nbsp;https://github.com/lucian-ilie/SAGE2</p><p>SAGE, for de novo genome assembly. As opposed to most assemblers, which are de Bruijn graph based, SAGE uses the string-overlap graph. SAGE builds upon great existing work on string-overlap graph and maximum likelihood assembly, bringing an important number of new ideas, such as the efficient computation of the transitive reduction of the string overlap graph, the use of (generalized) edge multiplicity statistics for more accurate estimation of read copy counts, and the improved use of mate pairs and min-cost flow for supporting edge merging. The assemblies produced by SAGE for several short and medium-size genomes compared favourably with those of existing leading assemblers.</p><p>3. FSG: Fast String Graph</p><p>The new integrated assembler has been assessed on a standard benchmark, showing that fast string graph (FSG) is significantly faster than SGA while maintaining a moderate use of main memory, and showing practical advantages in running FSG on multiple threads. Moreover, we have studied the effect of coverage rates on the running times.</p><p>4.&nbsp;&nbsp;BASE&nbsp;https://github.com/dhlbh/BASE</p><p>It enhances the classic seed-extension approach by indexing the reads efficiently to generate adaptive seeds that have high probability to appear uniquely in the genome. Such seeds form the basis for BASE to build extension trees and then to use reverse validation to remove the branches based on read coverage and paired-end information, resulting in high-quality consensus sequences of reads sharing the seeds. Such consensus sequences are then extended to contigs.&nbsp;BASE is a practically efficient tool for constructing contig, with significant improvement in quality for long NGS reads. It is relatively easy to extend BASE to include scaffolding.</p><p>5.&nbsp;Fermi&nbsp;https://github.com/lh3/fermi/</p><p>Fermi is a de novo assembler with a particular focus on assembling Illumina&nbsp;short sequence reads from a mammal-sized genome. In addition to the role of a&nbsp;typical assembler, fermi also aims to preserve heterozygotes which are often&nbsp;collapsed by other assemblers. Its ultimate goal is to find a minimal set of&nbsp;unitigs to represent all the information in raw reads.</p><p>If you want to learn about String Graph assembler, please read the following papers -</p><p>i)&nbsp;<a href="http://bioinformatics.oxfordjournals.org/content/21/suppl_2/ii79.full.pdf+html">The Fragment Assembly String Graph - E. W. Myers</a></p><p>This paper describes the String Graph concept.</p><p>ii)&nbsp;<a href="http://bioinformatics.oxfordjournals.org/content/26/12/i367.full#ref-20">Efficient construction of an assembly string graph using the FM-index - Jared T. Simpson and Richard Durbin</a></p><p>This earlier paper from Simpson and Durbin</p><p>iii)&nbsp;<a href="http://genome.cshlp.org/content/early/2012/01/22/gr.126953.111">Efficient de novo assembly of large genomes using compressed data structures - Jared T. Simpson and Richard Durbin</a></p><p>&nbsp;</p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>

</channel>
</rss>