<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/36597?offset=120</link>
	<atom:link href="https://bioinformaticsonline.com/related/36597?offset=120" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/37957/base-a-practical-de-novo-assembler-for-large-genomes-using-long-ngs-reads</guid>
	<pubDate>Fri, 19 Oct 2018 07:25:21 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/37957/base-a-practical-de-novo-assembler-for-large-genomes-using-long-ngs-reads</link>
	<title><![CDATA[BASE: a practical de novo assembler for large genomes using long NGS reads]]></title>
	<description><![CDATA[<p><span>BASE is a new&nbsp;</span><em>de novo</em><span>&nbsp;assembler. It enhances the classic seed-extension approach by indexing the reads efficiently to generate adaptive seeds that have a high probability of appearing uniquely in the genome. BASE uses these seeds to build extension trees, then applies reverse validation to prune branches based on read coverage and paired-end information, yielding high-quality consensus sequences of the reads that share each seed. These consensus sequences are then extended into contigs.</span></p><p>Address of the bookmark: <a href="https://github.com/dhlbh/BASE" rel="nofollow">https://github.com/dhlbh/BASE</a></p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/37962/wtdbg2-a-de-novo-sequence-assembler-for-long-noisy-reads-produced-by-pacbio-or-oxford-nanopore</guid>
	<pubDate>Fri, 19 Oct 2018 08:48:43 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/37962/wtdbg2-a-de-novo-sequence-assembler-for-long-noisy-reads-produced-by-pacbio-or-oxford-nanopore</link>
	<title><![CDATA[Wtdbg2: a de novo sequence assembler for long noisy reads produced by PacBio or Oxford Nanopore]]></title>
	<description><![CDATA[<p><span>Wtdbg2 is a&nbsp;</span><em>de novo</em><span>&nbsp;sequence assembler for long noisy reads produced by PacBio or Oxford Nanopore Technologies (ONT). It assembles raw reads without error correction and then builds the consensus from intermediate assembly output. Wtdbg2 is able to assemble the human genome and even the 32&nbsp;Gb&nbsp;</span><a href="https://www.nature.com/articles/nature25458">Axolotl</a><span>&nbsp;genome at a speed tens of times faster than&nbsp;</span><a href="https://github.com/marbl/canu">CANU</a><span>&nbsp;and&nbsp;</span><a href="https://github.com/PacificBiosciences/FALCON">FALCON</a><span>&nbsp;while producing contigs of comparable base accuracy.</span></p><p>Address of the bookmark: <a href="https://github.com/ruanjue/wtdbg2" rel="nofollow">https://github.com/ruanjue/wtdbg2</a></p>]]></description>
	<dc:creator>Neel</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/33306/ancestral-sequence-reconstruction-asr-or-ancestral-genesequence-reconstructionresurrection-tools-to-study-molecular-evolution</guid>
	<pubDate>Tue, 30 May 2017 04:20:05 -0500</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/33306/ancestral-sequence-reconstruction-asr-or-ancestral-genesequence-reconstructionresurrection-tools-to-study-molecular-evolution</link>
	<title><![CDATA[Ancestral sequence reconstruction (ASR) or ancestral gene/sequence reconstruction/resurrection tools to study molecular evolution]]></title>
	<description><![CDATA[<p><span><strong>Ancestral sequence reconstruction</strong><span>&nbsp;(</span><strong>ASR</strong><span>) &ndash; also known as&nbsp;</span><strong>ancestral gene</strong><span>/</span><strong>sequence reconstruction</strong><span>/</span><strong>resurrection</strong><span>&nbsp;&ndash; is a technique used in the study of&nbsp;</span>molecular evolution<span>. The method consists of the synthesis of an ancestral&nbsp;</span>gene<span>&nbsp;and expression of the corresponding ancestral&nbsp;</span>protein<span>.&nbsp;</span><sup id="cite_ref-thornton_1-0"><a href="https://en.wikipedia.org/wiki/Ancestral_sequence_reconstruction#cite_note-thornton-1"></a></sup><span>The idea of protein 'resurrection' was suggested in 1963 by Pauling and Zuckerkandl.</span><sup id="cite_ref-2"><a href="https://en.wikipedia.org/wiki/Ancestral_sequence_reconstruction#cite_note-2"></a></sup><span>&nbsp;Some early efforts were made in the eighties-nineties, led by the laboratory of&nbsp;</span>Steven A. 
Benner<span>, showing the potential of this technique &ndash; one that only started to be fulfilled in the post-genomic era.</span><sup id="cite_ref-3"><a href="https://en.wikipedia.org/wiki/Ancestral_sequence_reconstruction#cite_note-3"></a></sup><span>&nbsp;Thanks to improved algorithms and better sequencing and synthesis techniques, the method was developed further in the early 2000s to allow the resurrection of a greater variety of genes, including much more ancient ones.</span><sup id="cite_ref-4"><a href="https://en.wikipedia.org/wiki/Ancestral_sequence_reconstruction#cite_note-4"></a></sup><span>&nbsp;Over the last decade, ancestral protein resurrection has developed as a strategy to reveal the mechanisms and dynamics of protein evolution.&nbsp;</span></span></p><p><img src="https://upload.wikimedia.org/wikipedia/commons/thumb/e/e4/ASR_phylogeny.png/510px-ASR_phylogeny.png" alt="image" width="610" height="435" style="border: 0px;"></p><p><span>The following is a list of&nbsp;</span><strong style="font-size: 12.8px;">ancestral sequence reconstruction</strong><span>&nbsp;(</span><strong style="font-size: 12.8px;">ASR</strong><span>) tools:&nbsp;</span></p><p><a href="http://www.bx.psu.edu/miller_lab/car/" target="_blank" title="To inferCars official website"><span>inferCars</span></a></p><p><span><span><span><span><span>Reconstructs contiguous regions of an ancestral genome. Given information about adjacencies between conserved segments in each modern species, the goal is to infer segment order in the ancestral genome. The problem is formalized using graph theory, which gives it a clean and precise statement. The algorithm identifies a most parsimonious scenario for the history of each individual adjacency, although the whole-genome prediction is not guaranteed to optimize traditional measures like the number of breakpoints. 
Weights on the graph edges model the reliability of each adjacency.</span></span></span></span></span></p><p><span><span><a href="http://paleogenomics.irmacs.sfu.ca/ANGES/" target="_blank" title="To ANGES official website">ANGES</a>:&nbsp;</span><a href="http://paleogenomics.irmacs.sfu.ca/ANGES/" target="_blank" title="To ANGES official website">reconstructing ANcestral GEnomeS maps</a></span></p><p><span><span><span><span><span><span>A suite of Python programs for reconstructing ancestral genome maps by comparing the organization of related extant genomes. ANGES can reconstruct ancestral genome maps for multichromosomal linear genomes and unichromosomal circular genomes. It implements methods inspired by techniques developed to compute physical maps of extant genomes.</span></span></span></span></span></span></p><p><a href="http://virulence.molgen.mpg.de/cocos/" target="_blank" title="To Cocos official website"><span>Cocos</span></a></p><p><span><span><span><span><span><span><span>Constructs phylogenies of multi-domain proteins. Given a species tree and domain phylogenies, the procedure infers the composition of ancestral multi-domain proteins. Cocos implements and extends an algorithmic approach suggested by Behzadi and Vingron in an easy-to-use program. The method could also be applied to the reconstruction of partial homologous units such as bacterial operons or protein complexes.</span></span></span></span></span></span></span></p><p><a href="https://github.com/msrosenberg/MySSP" target="_blank" title="To MySSP official website"><span>MySSP</span></a></p><p><span><span><span><span><span><span><span><span>Constructs an initial DNA sequence at the root of the tree and simulates evolution across the tree using a variety of common models of DNA evolution. MySSP is a program for the simulation of DNA sequence evolution across a phylogenetic tree. 
It is designed for large-scale studies, including simulation of multiple replicates and outputs sequences into NEXUS, MEGA, or FASTA formats. MySSP has a fairly simple graphical user interface (GUI) for basic use, but also has a specialized batch script interpreter to allow for more complicated or large-scale simulations.</span></span></span></span></span></span></span></span></p><p><span><span><a href="http://www.cs.cmu.edu/~ckingsf/software/parana/" target="_blank" title="To PARANA official website">PARANA</a>:&nbsp;</span><a href="http://www.cs.cmu.edu/~ckingsf/software/parana/" target="_blank" title="To PARANA official website">Parsimonious Ancestral Reconstruction And Network Analysis</a></span></p><p><span><span><span><span><span><span><span><span><span>Performs parsimony based inference of ancestral biological networks. Given multiple extant networks and phylogenetic information relating extant nodes, PARANA finds a parsimonious set of ancestral interaction events (edge gains and losses) which explain the extant networks. The framework adopted by PARANA is able to represent network evolution under models that support gene duplication and loss and independent interaction gain and loss. The method works on both directed and undirected networks and can incorporate asymmetric interaction gain and loss costs. 
In contrast to previous approaches, PARANA does not require knowing the relative ordering of unrelated duplication events and thus works on phylogenetic trees even where branch lengths are not provided.</span></span></span></span></span></span></span></span></span></p><p><span><span><a href="http://www-labs.iro.umontreal.ca/~mabrouk/" target="_blank" title="To GapAdj official website">GapAdj</a>:&nbsp;</span><a href="http://www-labs.iro.umontreal.ca/~mabrouk/" target="_blank" title="To GapAdj official website">Gapped Adjacencies</a></span></p><p><span><span><span><span><span><span><span><span><span><span>A synteny-based method that is flexible enough to handle a model of evolution involving whole genome duplication events, in addition to rearrangements, gene insertions, and losses. Ancestral relationships between markers are defined in terms of Gapped Adjacencies, i.e. pairs of markers separated by up to a given number of markers. It improves on a previous method restricted to direct adjacencies, which showed high accuracy for adjacency prediction but was overly conservative, i.e. it generated a large number of contiguous ancestral regions (CARs).</span></span></span></span></span></span></span></span></span></span></p><p><a href="http://ancestors.bioinfo.uqam.ca/"><span>Ancestors</span></a></p><p><span><span><span><span><span><span><span><span><span><span><span>A web server allowing one to easily and quickly perform the last three steps of the ancestral genome reconstruction procedure. Ancestors implements several alignment algorithms, an indel maximum likelihood solver and a context-dependent maximum likelihood substitution inference algorithm. 
The results presented by the server include the posterior probabilities for the last two steps of the ancestral genome reconstruction and the expected error rate of each ancestral base prediction.</span></span></span></span></span></span></span></span></span></span></span></p><p><a href="http://bioinfo.lifl.fr/procars/" target="_blank" title="To ProCARs official website"><span>ProCARs</span></a></p><p>Reconstructs ancestral gene orders as contiguous ancestral regions (CARs) with a progressive homology-based method. ProCARs takes as input a phylogenetic tree (branch lengths are not needed) with a marked ancestor and a block file. The method iteratively detects and assembles ancestral adjacencies, while allowing some micro-rearrangements of synteny blocks at the extremities of the progressively assembled CARs. It starts with a set of blocks as the initial set of CARs and iteratively detects potential ancestral adjacencies between the extremities of CARs, building up the CARs progressively by adding, at each step, the new non-conflicting adjacencies that induce the least homoplasy. In some additional internal steps, the species tree is used to compute a score for the remaining conflicting adjacencies and to detect other reliable adjacencies, in order to reach completely assembled ancestral genomes.</p><p><a href="http://fastml.tau.ac.il/" target="_blank" title="To FastML official website"><span>FastML</span></a></p><p>A user-friendly tool for the reconstruction of ancestral sequences. FastML implements various novel features that differentiate it from existing tools: (i) FastML uses an indel-coding method, in which each gap, possibly spanning multiple sites, is coded as binary data. FastML then reconstructs ancestral indel states assuming a continuous-time Markov process. 
FastML provides the most likely ancestral sequences, integrating both indels and characters; (ii) FastML accounts for uncertainty in ancestral states: it provides not only the posterior probabilities for each character and indel at each sequence position, but also a sample of ancestral sequences from this posterior distribution, and a list of the k-most likely ancestral sequences; (iii) FastML implements a large array of evolutionary models, which makes it generic and applicable for nucleotide, protein and codon sequences; and (iv) a graphical representation of the results is provided, including, for example, a graphical logo of the inferred ancestral sequences.</p><p><a href="http://rth.dk/resources/maxAlike/" target="_blank" title="To maxAlike official website"><span>maxAlike</span></a></p><p>Reconstructs a genomic sequence for a specific taxon based on sequence homologs in other species. The input is a multiple sequence alignment and a phylogenetic tree that also contains the target species. For this target species, the algorithm computes nucleotide probabilities at each sequence position. Consensus sequences are then reconstructed based on a certain confidence level.</p><p><span><span><a href="http://www.geneorder.org/server.php" target="_blank" title="To MLGO official website">MLGO</a>:&nbsp;</span><a href="http://www.geneorder.org/server.php" target="_blank" title="To MLGO official website">Maximum Likelihood for Gene Order Analysis</a></span></p><p>A web tool for the reconstruction of phylogeny and/or ancestral genomes from gene-order data. MLGO was designed for analysis of large-scale genomic changes including not only rearrangements but also gene insertions, deletions and duplications. MLGO can be used to infer a phylogeny from genome rearrangement and gene order data, and can also obtain an estimation of ancestral genomes, given an input tree. 
MLGO takes advantage of a binary encoding of gene-order data, supports a fairly general model of genomic evolution (rearrangements plus duplications, insertions, and losses of genomic regions), and fits within the maximum-likelihood framework.</p><p>Image reference: Wikipedia</p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/44513/mike-an-ultrafast-assembly-and-alignment-free-approach-for-phylogenetic-tree-construction</guid>
	<pubDate>Mon, 08 Apr 2024 06:19:52 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/44513/mike-an-ultrafast-assembly-and-alignment-free-approach-for-phylogenetic-tree-construction</link>
	<title><![CDATA[MIKE: an ultrafast, assembly-, and alignment-free approach for phylogenetic tree construction]]></title>
	<description><![CDATA[<p><span>MIKE (MinHash-based&nbsp;</span><em>k</em><span>-mer algorithm) is designed for the swift calculation of the Jaccard coefficient directly from raw sequencing reads and enables the construction of phylogenetic trees from the resulting Jaccard coefficients. Simulation results highlight the superior speed of MIKE compared to existing state-of-the-art methods. We used MIKE to reconstruct a phylogenetic tree, incorporating 238 yeast, 303&nbsp;</span><em>Zea</em><span>, 141&nbsp;</span><em>Ficus</em><span>, 67&nbsp;</span><em>Oryza</em><span>, and 43&nbsp;</span><em>Saccharum spontaneum</em><span>&nbsp;samples. MIKE demonstrated accurate performance across varying evolutionary scales, reproductive modes, and ploidy levels, proving itself as a powerful tool for phylogenetic tree construction.</span></p><p>Address of the bookmark: <a href="https://github.com/Argonum-Clever2/mike" rel="nofollow">https://github.com/Argonum-Clever2/mike</a></p>]]></description>
	<dc:creator>Abhi</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/34246/unicycler-hybrid-assembly-pipeline-for-bacterial-genomes</guid>
	<pubDate>Fri, 10 Nov 2017 03:58:27 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/34246/unicycler-hybrid-assembly-pipeline-for-bacterial-genomes</link>
	<title><![CDATA[Unicycler: Hybrid assembly pipeline for bacterial genomes]]></title>
	<description><![CDATA[<p><span>Unicycler is an assembly pipeline for bacterial genomes. It can assemble&nbsp;</span><a href="http://www.illumina.com/">Illumina</a><span>-only read sets where it functions as a&nbsp;</span><a href="http://cab.spbu.ru/software/spades/">SPAdes</a><span>-optimiser. It can also assemble long-read-only sets (</span><a href="http://www.pacb.com/">PacBio</a><span>&nbsp;or&nbsp;</span><a href="https://nanoporetech.com/">Nanopore</a><span>) where it runs a&nbsp;</span><a href="https://github.com/lh3/miniasm">miniasm</a><span>+</span><a href="https://github.com/isovic/racon">Racon</a><span>&nbsp;pipeline. For the best possible assemblies, give it both Illumina reads&nbsp;</span><em>and</em><span>&nbsp;long reads, and it will conduct a hybrid assembly.</span></p><p>Address of the bookmark: <a href="https://github.com/rrwick/Unicycler" rel="nofollow">https://github.com/rrwick/Unicycler</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/34418/spades-hybrid-genome-assembly</guid>
	<pubDate>Mon, 27 Nov 2017 08:05:40 -0600</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/34418/spades-hybrid-genome-assembly</link>
	<title><![CDATA[SPAdes hybrid genome assembly]]></title>
	<description><![CDATA[<p>When you have both Illumina and Nanopore data, then SPAdes remains a good option for hybrid assembly - SPAdes was used to produce the&nbsp;<a href="https://gigascience.biomedcentral.com/articles/10.1186/s13742-015-0101-6">B fragilis assembly</a>&nbsp;by Mick Watson&rsquo;s group.</p><p>Again, running spades.py will show you the options:</p><div><pre><code>spades.py
</code></pre></div><p>This produces:</p><div><pre><code>SPAdes genome assembler v3.10.1

Usage: /usr/local/SPAdes-3.10.1-Linux/bin/spades.py [options] -o &lt;output_dir&gt;

Basic options:
-o      &lt;output_dir&gt;    directory to store all the resulting files (required)
--sc                    this flag is required for MDA (single-cell) data
--meta                  this flag is required for metagenomic sample data
--rna                   this flag is required for RNA-Seq data
--plasmid               runs plasmidSPAdes pipeline for plasmid detection
--iontorrent            this flag is required for IonTorrent data
--test                  runs SPAdes on toy dataset
-h/--help               prints this usage message
-v/--version            prints version

Input data:
--12    &lt;filename&gt;      file with interlaced forward and reverse paired-end reads
-1      &lt;filename&gt;      file with forward paired-end reads
-2      &lt;filename&gt;      file with reverse paired-end reads
-s      &lt;filename&gt;      file with unpaired reads
--pe&lt;#&gt;-12      &lt;filename&gt;      file with interlaced reads for paired-end library number &lt;#&gt; (&lt;#&gt; = 1,2,..,9)
--pe&lt;#&gt;-1       &lt;filename&gt;      file with forward reads for paired-end library number &lt;#&gt; (&lt;#&gt; = 1,2,..,9)
--pe&lt;#&gt;-2       &lt;filename&gt;      file with reverse reads for paired-end library number &lt;#&gt; (&lt;#&gt; = 1,2,..,9)
--pe&lt;#&gt;-s       &lt;filename&gt;      file with unpaired reads for paired-end library number &lt;#&gt; (&lt;#&gt; = 1,2,..,9)
--pe&lt;#&gt;-&lt;or&gt;    orientation of reads for paired-end library number &lt;#&gt; (&lt;#&gt; = 1,2,..,9; &lt;or&gt; = fr, rf, ff)
--s&lt;#&gt;          &lt;filename&gt;      file with unpaired reads for single reads library number &lt;#&gt; (&lt;#&gt; = 1,2,..,9)
--mp&lt;#&gt;-12      &lt;filename&gt;      file with interlaced reads for mate-pair library number &lt;#&gt; (&lt;#&gt; = 1,2,..,9)
--mp&lt;#&gt;-1       &lt;filename&gt;      file with forward reads for mate-pair library number &lt;#&gt; (&lt;#&gt; = 1,2,..,9)
--mp&lt;#&gt;-2       &lt;filename&gt;      file with reverse reads for mate-pair library number &lt;#&gt; (&lt;#&gt; = 1,2,..,9)
--mp&lt;#&gt;-s       &lt;filename&gt;      file with unpaired reads for mate-pair library number &lt;#&gt; (&lt;#&gt; = 1,2,..,9)
--mp&lt;#&gt;-&lt;or&gt;    orientation of reads for mate-pair library number &lt;#&gt; (&lt;#&gt; = 1,2,..,9; &lt;or&gt; = fr, rf, ff)
--hqmp&lt;#&gt;-12    &lt;filename&gt;      file with interlaced reads for high-quality mate-pair library number &lt;#&gt; (&lt;#&gt; = 1,2,..,9)
--hqmp&lt;#&gt;-1     &lt;filename&gt;      file with forward reads for high-quality mate-pair library number &lt;#&gt; (&lt;#&gt; = 1,2,..,9)
--hqmp&lt;#&gt;-2     &lt;filename&gt;      file with reverse reads for high-quality mate-pair library number &lt;#&gt; (&lt;#&gt; = 1,2,..,9)
--hqmp&lt;#&gt;-s     &lt;filename&gt;      file with unpaired reads for high-quality mate-pair library number &lt;#&gt; (&lt;#&gt; = 1,2,..,9)
--hqmp&lt;#&gt;-&lt;or&gt;  orientation of reads for high-quality mate-pair library number &lt;#&gt; (&lt;#&gt; = 1,2,..,9; &lt;or&gt; = fr, rf, ff)
--nxmate&lt;#&gt;-1   &lt;filename&gt;      file with forward reads for Lucigen NxMate library number &lt;#&gt; (&lt;#&gt; = 1,2,..,9)
--nxmate&lt;#&gt;-2   &lt;filename&gt;      file with reverse reads for Lucigen NxMate library number &lt;#&gt; (&lt;#&gt; = 1,2,..,9)
--sanger        &lt;filename&gt;      file with Sanger reads
--pacbio        &lt;filename&gt;      file with PacBio reads
--nanopore      &lt;filename&gt;      file with Nanopore reads
--tslr  &lt;filename&gt;      file with TSLR-contigs
--trusted-contigs       &lt;filename&gt;      file with trusted contigs
--untrusted-contigs     &lt;filename&gt;      file with untrusted contigs

Pipeline options:
--only-error-correction runs only read error correction (without assembling)
--only-assembler        runs only assembling (without read error correction)
--careful               tries to reduce number of mismatches and short indels
--continue              continue run from the last available check-point
--restart-from  &lt;cp&gt;    restart run with updated options and from the specified check-point ('ec', 'as', 'k&lt;int&gt;', 'mc')
--disable-gzip-output   forces error correction not to compress the corrected reads
--disable-rr            disables repeat resolution stage of assembling

Advanced options:
--dataset       &lt;filename&gt;      file with dataset description in YAML format
-t/--threads    &lt;int&gt;           number of threads
                                [default: 16]
-m/--memory     &lt;int&gt;           RAM limit for SPAdes in Gb (terminates if exceeded)
                                [default: 250]
--tmp-dir       &lt;dirname&gt;       directory for temporary files
                                [default: &lt;output_dir&gt;/tmp]
-k              &lt;int,int,...&gt;   comma-separated list of k-mer sizes (must be odd and
                                less than 128) [default: 'auto']
--cov-cutoff    &lt;float&gt;         coverage cutoff value (a positive float number, or 'auto', or 'off') [default: 'off']
--phred-offset  &lt;33 or 64&gt;      PHRED quality offset in the input reads (33 or 64)
                                [default: auto-detect]
</code></pre></div><p>As you can see this is also a &ldquo;pipeline&rdquo; of tools that can be switched on or off. SPAdes takes quite a long time, so for the purposes of this practical, something like this may suffice:</p><div><pre><code>spades.py -t 4 <span>\</span>
          -m 32 <span>\</span>
          -k 31,51,71 <span>\</span>
          --only-assembler <span>\</span>
          -1 miseq.1.fastq -2 miseq.2.fastq <span>\</span>
          --nanopore minion.fastq <span>\</span>
          -o hybrid_assembly
</code></pre></div><p>In turn, these parameters mean:</p><ul>
<li>use 4 threads</li>
<li>limit memory to 32 Gb</li>
<li>use 3 k-mer values to build the de Bruijn graph(s): 31, 51 and 71</li>
<li>only run the assembler, not the read error correction step (for speed)</li>
<li>read 1 and read 2 of the MiSeq data</li>
<li>the nanopore data</li>
<li>put the output in folder &ldquo;hybrid_assembly&rdquo;</li>
</ul>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/34707/string-graph-based-genome-assembly-software-and-tools</guid>
	<pubDate>Tue, 19 Dec 2017 17:17:38 -0600</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/34707/string-graph-based-genome-assembly-software-and-tools</link>
	<title><![CDATA[String graph based genome assembly software and tools !]]></title>
	<description><![CDATA[<p>In genome assembly, a&nbsp;<strong>string graph</strong>&nbsp;is an overlap graph of reads from which contained reads and transitive edges have been removed. This is distinct from the&nbsp;<a href="https://en.wikipedia.org/wiki/Graph_theory" title="Graph theory">graph theory</a>&nbsp;notion of a string graph, which is an&nbsp;<a href="https://en.wikipedia.org/wiki/Intersection_graph" title="Intersection graph">intersection graph</a>&nbsp;of&nbsp;<a href="https://en.wikipedia.org/wiki/Curve" title="Curve">curves</a>&nbsp;in the plane, each curve being called a "string". Assembly string graphs were first proposed by E. W. Myers in a&nbsp;<a href="http://bioinformatics.oxfordjournals.org/content/21/suppl_2/ii79.full.pdf+html">2005 publication</a>.&nbsp;A&nbsp;<a href="http://genome.cshlp.org/content/early/2012/01/22/gr.126953.111">Genome Research paper</a>&nbsp;describing an innovative approach for assembling large genomes from NGS data caught our attention for several reasons: i) it gives a different, "string graph" perspective on the long-standing genome assembly problem; ii) the paper is coauthored by Jared Simpson, the developer of the&nbsp;<a href="http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2694472/">ABySS assembler</a>, and Richard Durbin; iii) the Simpson-Durbin algorithm does not rely on de Bruijn graphs and instead employs a different graph construction approach called the &lsquo;string graph&rsquo;.</p><p>The following genome assembly tools are based on string graphs:</p><p>1.&nbsp;SGA (String Graph Assembler)&nbsp;https://github.com/jts/sga</p><p>Assembles large genomes from high coverage short read data. SGA is designed as a modular set of programs, which are used to form an assembly pipeline. SGA implements a set of assembly algorithms based on the FM-index. As the FM-index is a compressed data structure, the algorithms are very memory efficient. The SGA assembly has three distinct phases. The first phase corrects base calling errors in the reads. The second phase assembles contigs from the corrected reads. The third phase uses paired end and/or mate pair data to build scaffolds from the contigs. 
The output of this software is a PDF report that allows the properties of the genome and data quality to be visually explored. By providing more information to the user at the start of an assembly project, this software will help increase awareness of the factors that make a given assembly easy or difficult, assist in the selection of software and parameters and help to troubleshoot an assembly if it runs into problems.</p><p>2.&nbsp;SAGE: String-overlap Assembly of GEnomes&nbsp;https://github.com/lucian-ilie/SAGE2</p><p>SAGE is a de novo genome assembler. As opposed to most assemblers, which are de Bruijn graph based, SAGE uses the string-overlap graph. SAGE builds upon existing work on string-overlap graphs and maximum likelihood assembly and adds a number of new ideas, such as the efficient computation of the transitive reduction of the string overlap graph, the use of (generalized) edge multiplicity statistics for more accurate estimation of read copy counts, and the improved use of mate pairs and min-cost flow for supporting edge merging. The assemblies produced by SAGE for several short and medium-size genomes compared favourably with those of existing leading assemblers.</p><p>3.&nbsp;FSG: Fast String Graph</p><p>FSG has been assessed on a standard benchmark, which shows that it is significantly faster than SGA while maintaining a moderate use of main memory, and that there are practical advantages to running FSG on multiple threads. The authors also studied the effect of coverage rates on running times.</p><p>4.&nbsp;BASE&nbsp;https://github.com/dhlbh/BASE</p><p>It enhances the classic seed-extension approach by indexing the reads efficiently to generate adaptive seeds that have a high probability of appearing uniquely in the genome. 
Such seeds form the basis for BASE to build extension trees and then to use reverse validation to remove the branches based on read coverage and paired-end information, resulting in high-quality consensus sequences of reads sharing the seeds. Such consensus sequences are then extended to contigs.&nbsp;BASE is a practically efficient tool for constructing contigs, with significant improvement in quality for long NGS reads. It is relatively easy to extend BASE to include scaffolding.</p><p>5.&nbsp;Fermi&nbsp;https://github.com/lh3/fermi/</p><p>Fermi is a de novo assembler with a particular focus on assembling Illumina&nbsp;short sequence reads from a mammal-sized genome. In addition to the role of a&nbsp;typical assembler, fermi also aims to preserve heterozygotes, which are often&nbsp;collapsed by other assemblers. Its ultimate goal is to find a minimal set of&nbsp;unitigs to represent all the information in raw reads.</p><p>To learn more about string graph assemblers, read the following papers:</p><p>i)&nbsp;<a href="http://bioinformatics.oxfordjournals.org/content/21/suppl_2/ii79.full.pdf+html">The Fragment Assembly String Graph - E. W. Myers</a></p><p>This paper describes the string graph concept.</p><p>ii)&nbsp;<a href="http://bioinformatics.oxfordjournals.org/content/26/12/i367.full#ref-20">Efficient construction of an assembly string graph using the FM-index - Jared T. Simpson and Richard Durbin</a></p><p>This earlier paper from Simpson and Durbin describes how to construct the string graph using the FM-index.</p><p>iii)&nbsp;<a href="http://genome.cshlp.org/content/early/2012/01/22/gr.126953.111">Efficient de novo assembly of large genomes using compressed data structures - Jared T. Simpson and Richard Durbin</a></p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/35762/genome-assembly-stats-plotting</guid>
	<pubDate>Wed, 28 Feb 2018 03:45:39 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/35762/genome-assembly-stats-plotting</link>
	<title><![CDATA[Genome assembly stats plotting]]></title>
	<description><![CDATA[<p>A&nbsp;<em>de novo</em>&nbsp;genome assembly can be summarised by a number of metrics, including:</p>
<ul>
<li>Overall assembly length</li>
<li>Number of scaffolds/contigs</li>
<li>Length of longest scaffold/contig</li>
<li>Scaffold/contig N50 and N90</li>
<li>Assembly base composition, in particular percentage GC and percentage Ns</li>
<li>CEGMA completeness</li>
<li>Scaffold/contig length/count distribution</li>
</ul>
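<p>As a rough illustration of the N50 and N90 metrics in the list, here is a minimal Python sketch. It is not part of assembly-stats; the function name <code>nx</code> and the contig lengths are made up for the example. It assumes the usual definition: Nx is the length of the shortest sequence in the smallest set of longest sequences that together cover at least x% of the total assembly length.</p>
<pre><code>from bisect import bisect_left
from itertools import accumulate


def nx(lengths, percent=50):
    """Return the Nx value for a list of scaffold/contig lengths.

    Hypothetical helper for illustration only, not an assembly-stats function.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)      # longest sequence first
    cumulative = list(accumulate(ordered))       # running total of bases
    threshold = cumulative[-1] * percent / 100   # bases that must be covered
    # Index of the first sequence whose running total reaches the threshold.
    return ordered[bisect_left(cumulative, threshold)]


contigs = [100, 80, 60, 40, 20]  # made-up contig lengths, total 300
print(nx(contigs, 50), nx(contigs, 90))  # prints "80 40"
</code></pre>
<p>With these made-up lengths, the two longest contigs (100 + 80) already cover half of the 300 bases, so N50 is 80; reaching 90% needs the four longest contigs, so N90 is 40.</p>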
<p>assembly-stats supports two widely used presentations of these values, tabular and cumulative length plots, and introduces an additional circular plot that summarises most commonly used assembly metrics in a single visualisation. Each of these presentations is generated using javascript from a common (JSON) data structure, allowing toggling between alternative views, and each can be applied to a single or multiple assemblies to allow direct comparison of alternate assemblies.</p>
<p>Tabular presentation allows direct comparison of exact values between assemblies; its limitations lie in the necessary omission of distributions and the challenge of interpreting ratios of values that may vary by several orders of magnitude.</p><p>Address of the bookmark: <a href="https://github.com/rjchallis/assembly-stats" rel="nofollow">https://github.com/rjchallis/assembly-stats</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/37396/converting-a-vcf-into-a-fasta-given-some-reference</guid>
	<pubDate>Fri, 20 Jul 2018 10:03:53 -0500</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/37396/converting-a-vcf-into-a-fasta-given-some-reference</link>
	<title><![CDATA[Converting a VCF into a FASTA given some reference !]]></title>
	<description><![CDATA[<p>Samtools/BCFtools (Heng Li) provides a Perl script,&nbsp;<a href="https://github.com/lh3/samtools/blob/master/bcftools/vcfutils.pl"><code>vcfutils.pl</code></a>, which does this through its&nbsp;<code>vcf2fq</code>&nbsp;function (lines 469-528).</p><p>Others have modified this script to convert indels as well, e.g.&nbsp;<a href="https://github.com/gringer/bioinfscripts/blob/master/vcf2fq.pl">this version</a>&nbsp;by David Eccles:</p><pre><code>./vcf2fq.pl -f &lt;input.fasta&gt; &lt;all-site.vcf&gt; &gt; &lt;output.fastq&gt;</code></pre><p>https://github.com/gringer/bioinfscripts/blob/master/vcf2fq.pl</p><p>https://github.com/lh3/samtools/blob/master/bcftools/vcfutils.pl</p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/38210/skesa-strategic-k-mer-extension-for-scrupulous-assemblies</guid>
	<pubDate>Wed, 14 Nov 2018 04:45:41 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/38210/skesa-strategic-k-mer-extension-for-scrupulous-assemblies</link>
	<title><![CDATA[SKESA: strategic k-mer extension for scrupulous assemblies]]></title>
	<description><![CDATA[<p><span>SKESA is a de Bruijn graph-based de novo assembler designed for assembling reads of microbial genomes sequenced using Illumina. Comparison with SPAdes and MegaHit shows that SKESA produces assemblies that have high sequence quality and contiguity, handles low-level contamination in reads, is fast, and produces an identical assembly for the same input when assembled multiple times with the same or different compute resources. </span></p>
<p><span>Source code for SKESA is freely available at&nbsp;</span><span><a href="https://github.com/ncbi/SKESA/releases"><span>https://github.com/ncbi/SKESA/releases</span></a></span><span>.</span></p>
<p>Research Paper&nbsp;@ <a href="https://genomebiology.biomedcentral.com/articles/10.1186/s13059-018-1540-z">Link</a></p>
<p><span><span>The SKESA algorithm is as follows:</span><br></span></p>
<p><span><img src="https://media.springernature.com/lw785/springer-static/image/art%3A10.1186%2Fs13059-018-1540-z/MediaObjects/13059_2018_1540_Fig4_HTML.png" alt="image" width="785" height="984" style="border: 0px; border: 0px;"></span></p><p>Address of the bookmark: <a href="https://github.com/ncbi/SKESA/releases" rel="nofollow">https://github.com/ncbi/SKESA/releases</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>

</channel>
</rss>