<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/36012?offset=210</link>
	<atom:link href="https://bioinformaticsonline.com/related/36012?offset=210" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/34707/string-graph-based-genome-assembly-software-and-tools</guid>
	<pubDate>Tue, 19 Dec 2017 17:17:38 -0600</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/34707/string-graph-based-genome-assembly-software-and-tools</link>
	<title><![CDATA[String graph based genome assembly software and tools !]]></title>
	<description><![CDATA[<p>In&nbsp;<a href="https://en.wikipedia.org/wiki/Graph_theory" title="Graph theory">graph theory</a>, a&nbsp;<strong>string graph</strong>&nbsp;is an&nbsp;<a href="https://en.wikipedia.org/wiki/Intersection_graph" title="Intersection graph">intersection graph</a>&nbsp;of&nbsp;<a href="https://en.wikipedia.org/wiki/Curve" title="Curve">curves</a>&nbsp;in the plane; each curve is called a "string".&nbsp; For genome assembly, the string graph was first proposed by E. W. Myers in a&nbsp;<a href="http://bioinformatics.oxfordjournals.org/content/21/suppl_2/ii79.full.pdf+html">2005 publication</a>.&nbsp;A&nbsp;recent&nbsp;<a href="http://genome.cshlp.org/content/early/2012/01/22/gr.126953.111">Genome Research paper</a>&nbsp;describing an innovative approach for assembling large genomes from NGS data caught our attention for several reasons: i) it gives a different "string graph" perspective on the long-standing genome assembly problem; ii) the&nbsp;paper is coauthored by Jared Simpson, the developer of the&nbsp;<a href="http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2694472/">ABySS assembler</a>, and Richard Durbin; iii)&nbsp;the Simpson-Durbin algorithm does not rely on de Bruijn graphs, and instead employs a different graph construction approach called the &lsquo;string graph&rsquo;.</p><p>The following genome assembly tools are based on string graphs:</p><p>1.&nbsp;SGA (String Graph Assembler)&nbsp;https://github.com/jts/sga</p><p>Assembles large genomes from high coverage short read data. SGA is designed as a modular set of programs, which are used to form an assembly pipeline. SGA implements a set of assembly algorithms based on the FM-index. As the FM-index is a compressed data structure, the algorithms are very memory efficient. The SGA assembly has three distinct phases. The first phase corrects base calling errors in the reads. The second phase assembles contigs from the corrected reads. The third phase uses paired end and/or mate pair data to build scaffolds from the contigs. 
The output of this software is a PDF report that allows the properties of the genome and data quality to be visually explored. By providing more information to the user at the start of an assembly project, this software will help increase awareness of the factors that make a given assembly easy or difficult, assist in the selection of software and parameters and help to troubleshoot an assembly if it runs into problems.</p><p>2.&nbsp;SAGE: String-overlap Assembly of GEnomes&nbsp;https://github.com/lucian-ilie/SAGE2</p><p>SAGE is an assembler for de novo genome assembly. As opposed to most assemblers, which are de Bruijn graph based, SAGE uses the string-overlap graph. SAGE builds upon existing work on the string-overlap graph and maximum likelihood assembly, introducing a number of new ideas, such as the efficient computation of the transitive reduction of the string-overlap graph, the use of (generalized) edge multiplicity statistics for more accurate estimation of read copy counts, and the improved use of mate pairs and min-cost flow for supporting edge merging. The assemblies produced by SAGE for several short and medium-size genomes compared favourably with those of existing leading assemblers.</p><p>3.&nbsp;FSG: Fast String Graph</p><p>FSG has been assessed on a standard benchmark, which showed that it is significantly faster than SGA while maintaining a moderate use of main memory, with practical advantages when running FSG on multiple threads. Its authors also studied the effect of coverage rates on the running times.</p><p>4.&nbsp;BASE&nbsp;https://github.com/dhlbh/BASE</p><p>It enhances the classic seed-extension approach by indexing the reads efficiently to generate adaptive seeds that have a high probability of appearing uniquely in the genome. 
Such seeds form the basis for BASE to build extension trees and then to use reverse validation to remove the branches based on read coverage and paired-end information, resulting in high-quality consensus sequences of reads sharing the seeds. Such consensus sequences are then extended to contigs.&nbsp;BASE is a practically efficient tool for constructing contigs, with significant improvement in quality for long NGS reads. It is relatively easy to extend BASE to include scaffolding.</p><p>5.&nbsp;Fermi&nbsp;https://github.com/lh3/fermi/</p><p>Fermi is a de novo assembler with a particular focus on assembling Illumina&nbsp;short sequence reads from a mammal-sized genome. In addition to the role of a&nbsp;typical assembler, fermi also aims to preserve heterozygotes which are often&nbsp;collapsed by other assemblers. Its ultimate goal is to find a minimal set of&nbsp;unitigs to represent all the information in raw reads.</p><p>If you want to learn about string graph assemblers, please read the following papers:</p><p>i)&nbsp;<a href="http://bioinformatics.oxfordjournals.org/content/21/suppl_2/ii79.full.pdf+html">The Fragment Assembly String Graph - E. W. Myers</a></p><p>This paper describes the string graph concept.</p><p>ii)&nbsp;<a href="http://bioinformatics.oxfordjournals.org/content/26/12/i367.full#ref-20">Efficient construction of an assembly string graph using the FM-index - Jared T. Simpson and Richard Durbin</a></p><p>This is the earlier paper from Simpson and Durbin, on constructing the string graph with the FM-index.</p><p>iii)&nbsp;<a href="http://genome.cshlp.org/content/early/2012/01/22/gr.126953.111">Efficient de novo assembly of large genomes using compressed data structures - Jared T. Simpson and Richard Durbin</a></p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/35384/mgcv-the-microbial-genomic-context-viewer-for-comparative-genome-analysis</guid>
	<pubDate>Mon, 29 Jan 2018 04:55:46 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/35384/mgcv-the-microbial-genomic-context-viewer-for-comparative-genome-analysis</link>
	<title><![CDATA[MGcV: the microbial genomic context viewer for comparative genome analysis]]></title>
	<description><![CDATA[<p><span>MGcV is an interactive web-based visualization tool tailored to facilitate small-scale genome analysis. To start using MGcV:</span></p>
<ol>
<li>Supply your genes/genomic segments/phylogenetic tree of interest in the input-box by
<ul>
<li>selecting the type of identifier and pasting identifiers (one per line)</li>
<li><em><strong>or</strong></em>&nbsp;by using the&nbsp;<a>gene ID search tool</a></li>
<li><em><strong>or</strong></em>&nbsp;with the&nbsp;<a>BLAST search tool</a></li>
</ul>
</li>
<li>Click "Visualize context".</li>
</ol>
<p><span>Consult the&nbsp;</span><a href="http://mgcv.cmbi.ru.nl/help.html" target="_blank">documentation</a><span>&nbsp;to learn more about MGcV.</span></p><p>Address of the bookmark: <a href="http://mgcv.cmbi.ru.nl/" rel="nofollow">http://mgcv.cmbi.ru.nl/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/41158/carefully-opt-for-human-reference-genome</guid>
	<pubDate>Tue, 18 Feb 2020 07:43:32 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/41158/carefully-opt-for-human-reference-genome</link>
	<title><![CDATA[Carefully opt for human reference genome]]></title>
	<description><![CDATA[<p><a href="http://lh3.github.io/2017/11/13/which-human-reference-genome-to-use" target="_blank">Heng Li described several issues with the human reference genomes distributed by various resources</a> and suggests using the following compressed FASTA file as the hg38/GRCh38 human reference genome.</p>
<p>If you map reads to GRCh38 or hg38, use the following:</p>
<div>
<div>
<pre><code>ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz
</code></pre>
</div>
</div>
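<p>A quick sanity check on whichever reference you end up with is to list its sequence names and see whether any ALT contigs are present. The sketch below is only an illustration: it assumes UCSC-style names in which ALT contigs end in <code>_alt</code> (for example <code>chr1_KI270762v1_alt</code>), and the helper names <code>open_text</code>, <code>fasta_names</code> and <code>split_alt_contigs</code> are hypothetical, not part of any tool mentioned here.</p>

```python
import gzip
import io


def open_text(path):
    """Open a plain or gzip-compressed FASTA file in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def fasta_names(handle):
    """Yield the sequence name (first word) of every FASTA header line."""
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:  # skip headers that carry no name at all
                yield fields[0]


def split_alt_contigs(names):
    """Split names into (primary, alt) lists.

    Assumption: ALT contigs follow the UCSC convention of ending in '_alt'.
    """
    primary, alt = [], []
    for name in names:
        (alt if name.endswith("_alt") else primary).append(name)
    return primary, alt


# Tiny in-memory FASTA used purely for demonstration.
demo = io.StringIO(
    ">chr1 demo record\nACGT\n>chr1_KI270762v1_alt\nACGT\n>chrM\nAC\n"
)
primary, alt = split_alt_contigs(fasta_names(demo))
print("primary:", primary)
print("alt:", alt)
```

<p>To check a downloaded file instead, pass <code>open_text("GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz")</code> to <code>fasta_names</code>; for the analysis set named above, the <code>alt</code> list would be expected to come back empty.</p>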
<p>There are several other versions of GRCh37/GRCh38. What&rsquo;s wrong with them? Heng Li&rsquo;s post gives a collection of potential issues.</p>
<p>More at http://lh3.github.io/2017/11/13/which-human-reference-genome-to-use</p><p>Address of the bookmark: <a href="http://lh3.github.io/2017/11/13/which-human-reference-genome-to-use" rel="nofollow">http://lh3.github.io/2017/11/13/which-human-reference-genome-to-use</a></p>]]></description>
	<dc:creator>biogeek</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/36516/metassembler-merging-and-optimizing-de-novo-genome-assemblies</guid>
	<pubDate>Tue, 08 May 2018 04:52:33 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/36516/metassembler-merging-and-optimizing-de-novo-genome-assemblies</link>
	<title><![CDATA[Metassembler: merging and optimizing de novo genome assemblies]]></title>
	<description><![CDATA[<p><span>Metassembler combines multiple whole genome de novo assemblies into a combined consensus assembly using the best segments of the individual assemblies.</span></p>
<p><span><span>Genome assembly projects typically run multiple algorithms in an attempt to find the single best assembly, although those assemblies often have complementary, if untapped, strengths and weaknesses. We present our metassembler algorithm that merges multiple assemblies of a genome into a single superior sequence.&nbsp;</span></span></p><p>Address of the bookmark: <a href="https://sourceforge.net/projects/metassembler/?source=directory" rel="nofollow">https://sourceforge.net/projects/metassembler/?source=directory</a></p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/file/view/36945/download-blasr-13-version</guid>
	<pubDate>Fri, 15 Jun 2018 03:01:20 -0500</pubDate>
	<link>https://bioinformaticsonline.com/file/view/36945/download-blasr-13-version</link>
	<title><![CDATA[Download blasr 1.3 version]]></title>
	<description><![CDATA[<p>DOWNLOAD LINK: https://github.com/BioInf-Wuerzburg/proovread/raw/master/util/blasr-1.3.1/blasr</p><p>I'm running "OPERA-LG_v2.0.5/bin/preprocess_reads.pl" and have the following error:</p><p>fail to open file './temporarySam'</p><p><br />[bwa_aln_core] write to the disk... 0.09 sec<br />[bwa_aln_core] 70778880 sequences have been processed.<br />[bwa_aln_core] calculate SA coordinate... 161.35 sec<br />[bwa_aln_core] write to the disk... 0.06 sec<br />[bwa_aln_core] 70989574 sequences have been processed.<br />[main] Version: 0.7.15-r1140<br />[main] CMD: bwa aln -t 30 all_p_ctg.fa -<br />[main] Real time: 2402.523 sec; CPU: 53429.488 sec<br />[E::hts_open_format] Failed to open file temporarySam<br />samtools sort: can't open "temporarySam": No such file or directory<br />[bwa_aln_core] convert to sequence coordinate... 1.00 sec<br />[bwa_aln_core] refine gapped alignments... 6.07 sec<br />[bwa_aln_core] print alignments... PREPROCESS:<br />Fastq format is recognized<br />[Thu Jun 14 18:16:47 2018] Building bwa index...<br />bwa index -p all_p_ctg.fa /home/urbe/Tools/OPERA-LG_v2.0.6/all_p_ctg.fa<br />[Thu Jun 14 18:18:35 2018] Finding the SA coordinates of the reads using BWA aln...<br />[Thu Jun 14 18:58:37 2018] Generate alignments of reads using bwa sampe...<br />bwa samse -n 1 all_p_ctg.fa read.sai - | grep '\(^@\|XT:A:U\)' | /usr/local/bin/samtools view -S -h -b -F 0x4 - | /usr/local/bin/samtools sort -@ 20 -no - temporarySam &gt; FALCON-Unzip-Scaff.bam<br />Mapping long-reads using blasr...<br />/home/urbe/Tools/SSpace/SSPACE-LongRead_v1-1/blasr -nproc 40 -m 1 -minMatch 5 -bestn 10 -noSplitSubreads -advanceExactMatches 1 -nCandidates 1 -maxAnchorsPerPosition 1 -sdpTupleSize 7 /media/urbe/MyDDrive/ONTdata/allONT/allONT.fasta /home/urbe/Tools/OPERA-LG_v2.0.6/all_p_ctg.fa | cut -d ' ' -f1-5,7-12 | sed 's/ /\t/g' &gt; FALCON-Unzip-Scaff.map<br />sh: 1: /home/urbe/Tools/SSpace/SSPACE-LongRead_v1-1/blasr: Permission denied<br />Sorting 
mapping results...<br />sort -k1,1 -k9,9g FALCON-Unzip-Scaff.map &gt; FALCON-Unzip-Scaff.map.sort<br />Analyzing sorted results...<br />Extracting linking information...<br />i3 2000 5000<br />i2 1000 2000<br />i4 5000 15000<br />i0 -200 300<br />i5 15000 40000<br />i1 300 1000<br />Repeat detection...<br />/home/urbe/Tools/OPERA-LG_v2.0.6/bin//filter_conflicting_edge.pl pairedEdges_i0 contig_length.dat 100 2<br />Illegal division by zero at /home/urbe/Tools/OPERA-LG_v2.0.6/bin//filter_conflicting_edge.pl line 93.<br />readline() on closed filehandle FILE at bin/OPERA-long-read.pl line 250.<br />rm anchor_contig_info.dat contig_length.dat filtered_edges.dat filtered_edges_cov.dat *.sai<br />rm: cannot remove 'anchor_contig_info.dat': No such file or directory<br />mv FALCON-Unzip-Scaff.bam FALCON-Unzip-Scaff-with-repeat.bam<br />/home/urbe/Tools/OPERA-LG_v2.0.6/bin//filter_repeat.pl FALCON-Unzip-Scaff-with-repeat.bam repeat.dat | /usr/local/bin/samtools view - -h -S -b &gt; FALCON-Unzip-Scaff.bam<br />rm FALCON-Unzip-Scaff-with-repeat.bam<br />/home/urbe/Tools/OPERA-LG_v2.0.6/bin/OPERA-LG config &gt; log<br />Analyzing 1 library: FALCON-Unzip-Scaff.bam<br />min library mean : 0<br />minimum contig length is 500<br />Current library: 1 out of 7<br />Analyzing file: pairedEdges_no_repeat_i0<br />Analyzing file: pairedEdges_no_repeat_i1<br />Analyzing file: pairedEdges_no_repeat_i2<br />Analyzing file: pairedEdges_no_repeat_i3<br />Analyzing file: pairedEdges_no_repeat_i4<br />Analyzing file: pairedEdges_no_repeat_i5<br />ln -s results/scaffoldSeq.fasta scaffoldSeq.fasta</p><p>To resolve this, try downloading blasr version 1.3 above and re-run :)</p>]]></description>
	<dc:creator>Jit</dc:creator>
	<enclosure url="https://bioinformaticsonline.com/file/download/36945" length="0" type="inode/x-empty" />
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/37396/converting-a-vcf-into-a-fasta-given-some-reference</guid>
	<pubDate>Fri, 20 Jul 2018 10:03:53 -0500</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/37396/converting-a-vcf-into-a-fasta-given-some-reference</link>
	<title><![CDATA[Converting a VCF into a FASTA given some reference !]]></title>
	<description><![CDATA[<p>Samtools/BCFtools (Heng Li) provides a Perl script,&nbsp;<a href="https://github.com/lh3/samtools/blob/master/bcftools/vcfutils.pl"><code>vcfutils.pl</code></a>, which does this with its&nbsp;<code>vcf2fq</code>&nbsp;function (lines 469-528).</p><p>This script has been modified by others to convert InDels as well, e.g.&nbsp;<a href="https://github.com/gringer/bioinfscripts/blob/master/vcf2fq.pl">this version</a>&nbsp;by David Eccles:</p><pre><code>./vcf2fq.pl -f &lt;input.fasta&gt; &lt;all-site.vcf&gt; &gt; &lt;output.fastq&gt;</code></pre><p>https://github.com/gringer/bioinfscripts/blob/master/vcf2fq.pl</p><p>https://github.com/lh3/samtools/blob/master/bcftools/vcfutils.pl</p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/41673/lr-gapcloser-a-tiling-path-based-gap-closer-that-uses-long-reads-to-complete-genome-assembly</guid>
	<pubDate>Thu, 14 May 2020 15:09:52 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/41673/lr-gapcloser-a-tiling-path-based-gap-closer-that-uses-long-reads-to-complete-genome-assembly</link>
	<title><![CDATA[LR_Gapcloser: a tiling path-based gap closer that uses long reads to complete genome assembly]]></title>
	<description><![CDATA[<p>LR_Gapcloser is a gap-closing tool that uses long reads from the studied species. The long reads can be downloaded from a public read archive (for instance, the NCBI SRA database) or be your own data. They are then fragmented and aligned to the scaffolds using the BWA-MEM algorithm in the BWA package. A compiled bwa is provided in the package, so the user does not need to install bwa. LR_Gapcloser uses the alignments to find long reads that bridge a gap, and then fills the genomic gap with the original long-read sequence.</p><p>Address of the bookmark: <a href="https://github.com/CAFS-bioinformatics/LR_Gapcloser" rel="nofollow">https://github.com/CAFS-bioinformatics/LR_Gapcloser</a></p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/37840/long-read-assembly-workshop</guid>
	<pubDate>Thu, 04 Oct 2018 17:23:18 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/37840/long-read-assembly-workshop</link>
	<title><![CDATA[Long read assembly workshop !]]></title>
	<description><![CDATA[<p>This is a tutorial for a workshop on long-read (PacBio) genome assembly.</p>
<p>It demonstrates how to use long PacBio sequencing reads to assemble a bacterial genome, and includes additional steps for circularising, trimming, finding plasmids, and correcting the assembly with short-read Illumina data.</p>
<p>&nbsp;Please comment if you know of any other long-read assembly tutorial.</p><p>Address of the bookmark: <a href="http://sepsis-omics.github.io/tutorials/modules/cmdline_assembly_v2/" rel="nofollow">http://sepsis-omics.github.io/tutorials/modules/cmdline_assembly_v2/</a></p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/38413/genobuntu-a-software-package-containing-more-than-70-software-and-packages-oriented-towards-ngs-and-genome-assembly</guid>
	<pubDate>Tue, 11 Dec 2018 05:15:57 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/38413/genobuntu-a-software-package-containing-more-than-70-software-and-packages-oriented-towards-ngs-and-genome-assembly</link>
	<title><![CDATA[Genobuntu: A software package containing more than 70 software and packages oriented towards NGS and genome assembly]]></title>
	<description><![CDATA[<p><span>Genobuntu is a software package containing more than 70 software tools and packages oriented towards NGS. In its current version, Genobuntu supports pre-assembly tools, genome assemblers and post-assembly tools.&nbsp;</span><br><br><span>Commonly used biological software and example script files for different assembly pipelines are also provided, and the example script files can be updated to suit one&rsquo;s experimental needs. Genobuntu attempts to reduce the amount of time and energy needed to build software workstations, and it can also act as a good teaching resource in a classroom setting.&nbsp;</span></p>
<p>https://sourceforge.net/projects/genobuntu/</p><p>Address of the bookmark: <a href="https://sourceforge.net/projects/genobuntu/" rel="nofollow">https://sourceforge.net/projects/genobuntu/</a></p>]]></description>
	<dc:creator>BioStar</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/38526/versatile-genome-assembly-evaluation-with-quast-lg</guid>
	<pubDate>Fri, 21 Dec 2018 22:06:31 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/38526/versatile-genome-assembly-evaluation-with-quast-lg</link>
	<title><![CDATA[Versatile genome assembly evaluation with QUAST-LG]]></title>
	<description><![CDATA[<p>QUAST-LG is an extension of&nbsp;<a href="http://cab.spbu.ru/software/quast/">QUAST</a>&nbsp;intended for evaluating large-scale genome assemblies (up to mammalian-size).</p>
<p>QUAST-LG&nbsp;is included in the QUAST&nbsp;package starting from version 5.0.0 (<a href="https://sourceforge.net/projects/quast/files/latest/download?source=files">download the latest release</a>). Run QUAST as usual and do not forget to add the&nbsp;<span>--large</span>&nbsp;option to your command!</p>
<p>A short list of the new features (see&nbsp;<a href="http://cab.spbu.ru/files/quast/latest-docs/CHANGES.txt">CHANGES</a>&nbsp;for all):</p>
<ul>
<li>Significant speedup achieved by both&nbsp;the use of a new fast aligner (<a href="https://github.com/lh3/minimap2">minimap2</a>) and the refactoring of the alignment analysis&nbsp;modules</li>
<li>New k-mer-based completeness and correctness metrics</li>
<li>BUSCO added for enhanced reference-free analysis</li>
<li>The concept of upper bound&nbsp;assembly (theoretical limits on the assembly&nbsp;completeness and&nbsp;contiguity for a given genome and set of reads)</li>
</ul><p>Address of the bookmark: <a href="http://cab.spbu.ru/software/quast-lg/" rel="nofollow">http://cab.spbu.ru/software/quast-lg/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>

</channel>
</rss>