<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/43728?offset=90</link>
	<atom:link href="https://bioinformaticsonline.com/related/43728?offset=90" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/36880/jvarkit-java-utilities-for-bioinformatics</guid>
	<pubDate>Fri, 08 Jun 2018 09:31:55 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/36880/jvarkit-java-utilities-for-bioinformatics</link>
	<title><![CDATA[Jvarkit : Java utilities for Bioinformatics]]></title>
	<description><![CDATA[A collection of Java toolkits for bioinformatics work:

Jvarkit : Java utilities for Bioinformatics<p>Address of the bookmark: <a href="http://lindenb.github.io/jvarkit/" rel="nofollow">http://lindenb.github.io/jvarkit/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/36800/genomemapper-simultaneous-alignment-of-short-reads-against-multiple-genomes</guid>
	<pubDate>Fri, 25 May 2018 09:29:44 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/36800/genomemapper-simultaneous-alignment-of-short-reads-against-multiple-genomes</link>
	<title><![CDATA[GenomeMapper: Simultaneous alignment of short reads against multiple genomes]]></title>
	<description><![CDATA[GenomeMapper is a short read mapping tool designed for accurate read alignments. It quickly aligns millions of reads with either ungapped or gapped alignments. It can be used to align against multiple genomes simultaneously or against a single reference. If you are unsure which GenomeMapper version is appropriate, you might want to use the latter.

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2768987/<p>Address of the bookmark: <a href="http://1001genomes.org/software/genomemapper.html" rel="nofollow">http://1001genomes.org/software/genomemapper.html</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/37241/remilo-reference-assisted-misassembly-detection-algorithm-using-short-and-long-reads</guid>
	<pubDate>Fri, 06 Jul 2018 04:27:49 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/37241/remilo-reference-assisted-misassembly-detection-algorithm-using-short-and-long-reads</link>
	<title><![CDATA[ReMILO: reference assisted misassembly detection algorithm using short and long reads.]]></title>
	<description><![CDATA[ReMILO is a reference-assisted misassembly detection algorithm that uses both short reads and PacBio SMRT long reads. ReMILO aligns the initial short reads to both the contigs and the reference genome, and then constructs a novel data structure called the red-black multipositional de Bruijn graph to detect misassemblies. In addition, ReMILO aligns the contigs to long reads and finds their differences from the long reads to detect more misassemblies.<p>Address of the bookmark: <a href="https://github.com/songc001/remilo" rel="nofollow">https://github.com/songc001/remilo</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/43260/bioinformatics-tools-for-telomere-to-telomere-assembly</guid>
	<pubDate>Tue, 17 Aug 2021 13:17:09 -0500</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/43260/bioinformatics-tools-for-telomere-to-telomere-assembly</link>
	<title><![CDATA[Bioinformatics tools for telomere to telomere assembly !]]></title>
	<description><![CDATA[<p>●&nbsp;<a href="https://github.com/arangrhie/merfin" target="_blank">Merfin</a>&nbsp;&ndash; k-mer-based assembly and variant calling evaluation for improved consensus accuracy (Arang Rhie)<br />●&nbsp;<a href="https://www.biorxiv.org/content/10.1101/2020.11.11.378133v1" target="_blank">PanGenie</a>&nbsp;&ndash; algorithm that leverages a pangenome reference built from haplotype-resolved genome assemblies in conjunction with k-mer count information from raw, short-read sequencing data to genotype a wide spectrum of genetic variation (Tobias Marschall)<br />●&nbsp;<a href="https://github.com/ConesaLab/SQANTI3" target="_blank">SQANTI3</a>&nbsp;&ndash; an automated pipeline for the classification of long-read transcripts that can assess the quality of data and the preprocessing pipeline (Roc&iacute;o Amor&iacute;n de Heged&uuml;s&nbsp;<a href="https://twitter.com/rocioadh" target="_blank">@rocioadh</a>)<br />●&nbsp;<a href="https://github.com/GenomeRIK/tama" target="_blank">tama</a>&nbsp;(Transcriptome Annotation by Modular Algorithms) &ndash; software designed for processing Iso-Seq data and other long-read transcriptome data (Richard Kuo&nbsp;<a href="https://twitter.com/GenomeRIK" target="_blank">@GenomeRIK</a>)<br />●&nbsp;<a href="https://github.com/PacificBiosciences/pbAA" target="_blank">pbaa</a>&nbsp;(PacBio Amplicon Analysis) &ndash; separates complex mixtures of amplicon targets from genomic samples to cluster and generate high-quality consensus sequences from HiFi reads (Zev Kronenberg&nbsp;<a href="https://twitter.com/zevkronenberg" target="_blank">@zevkronenberg</a>)<br />●&nbsp;<a href="https://github.com/yuanyuan929/bellerophon" target="_blank">bellerophon</a>&nbsp;&ndash; analyzes MHC typing and other low-complexity gene amplicon data; performs allele calling while detecting polymorphic sites within the sequences and removing potential chimeric sequence variants (Yuanyuan Cheng&nbsp;<a href="https://twitter.com/Yuanyuan929" 
target="_blank">@Yuanyuan929</a>)<br />●&nbsp;<a href="https://github.com/amwenger/svpack" target="_blank">svpack</a>&nbsp;&ndash; tools for filtering, comparing, and annotating structural variant (SV) calls in VCF format (Aaron Wenger)<br />●&nbsp;<a href="https://github.com/AntonBankevich/jumboDB" target="_blank">JumboDB</a>&nbsp;&ndash; tool for de Bruijn graph construction (Anton Bankevich&nbsp;<a href="https://twitter.com/AntonBankevich" target="_blank">@AntonBankevich</a>)<br />●&nbsp;<a href="https://github.com/ksahlin/ultra" target="_blank">uLTRA</a>&nbsp;&ndash; tool for splice alignment of long transcriptomic reads to a genome, guided by a database of exon annotations. (Kristoffer Sahlin&nbsp;<a href="https://twitter.com/krsahlin" target="_blank">@krsahlin</a>)<br />●&nbsp;<a href="https://www.biorxiv.org/content/10.1101/2021.01.25.428044v1.full.pdf" target="_blank">LeafGo</a>&nbsp;&ndash; workflow to rapidly produce high-quality de novo plant genomes (Luca Ermini&nbsp;<a href="https://twitter.com/ermini_luca" target="_blank">@ermini_luca</a>)</p><p>Reference:</p><p>https://www.pacb.com/blog/young-investigators-share-stellar-science-career-advice-and-bioinformatics-tools-at-smrt-leiden-2021/</p><p>&nbsp;</p>]]></description>
	<dc:creator>BioStar</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/34416/miniasm-very-fast-olc-based-de-novo-assembler-for-noisy-long-reads</guid>
	<pubDate>Mon, 27 Nov 2017 07:58:49 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/34416/miniasm-very-fast-olc-based-de-novo-assembler-for-noisy-long-reads</link>
	<title><![CDATA[miniasm: very fast OLC-based de novo assembler for noisy long reads]]></title>
	<description><![CDATA[<p>Miniasm is a very fast OLC-based&nbsp;<em>de novo</em>&nbsp;assembler for noisy long reads. It takes all-vs-all read self-mappings (typically by&nbsp;<a href="https://github.com/lh3/minimap">minimap</a>) as input and outputs an assembly graph in the&nbsp;<a href="https://github.com/pmelsted/GFA-spec/blob/master/GFA-spec.md">GFA</a>&nbsp;format. Different from mainstream assemblers, miniasm does not have a consensus step. It simply concatenates pieces of read sequences to generate the final&nbsp;<a href="http://wgs-assembler.sourceforge.net/wiki/index.php/Celera_Assembler_Terminology">unitig</a>&nbsp;sequences. Thus the per-base error rate is similar to the raw input reads.</p>
<p>So far miniasm is at an early development stage. It has only been tested on a dozen PacBio and Oxford Nanopore (ONT) bacterial data sets. Including the mapping step, it takes about 3 minutes to assemble a bacterial genome. Under the default setting, miniasm assembles 9 out of 12 PacBio datasets and 3 out of 4 ONT datasets into a single contig. The 12 PacBio data sets are&nbsp;<a href="https://github.com/PacificBiosciences/DevNet/wiki/E.-coli-Bacterial-Assembly">PacBio E. coli sample</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS473430">ERS473430</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS544009">ERS544009</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS554120">ERS554120</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS605484">ERS605484</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS617393">ERS617393</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS646601">ERS646601</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS659581">ERS659581</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS670327">ERS670327</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS685285">ERS685285</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS743109">ERS743109</a>&nbsp;and a&nbsp;<a href="https://github.com/PacificBiosciences/DevNet/wiki/E.-coli-20kb-Size-Selected-Library-with-P6-C4/ce0533c1d2a957488594f0b29da61ffa3e4627e8">deprecated PacBio E. coli data set</a>. ONT data are acquired from the&nbsp;<a href="http://lab.loman.net/2015/09/24/first-sqk-map-006-experiment/">Loman Lab</a>.</p>
<p>For a&nbsp;<em>C. elegans</em>&nbsp;<a href="https://github.com/PacificBiosciences/DevNet/wiki/C.-elegans-data-set">PacBio data set</a>&nbsp;(only 40X are used, not the whole dataset), miniasm finishes the assembly, including read overlapping, in ~10 minutes with 16 CPUs. The total assembly size is 105Mb; the N50 is 1.94Mb. In comparison,&nbsp;<a href="https://github.com/PacificBiosciences/Bioinformatics-Training/wiki/HGAP">HGAP3</a>&nbsp;produces a 104Mb assembly with N50 1.61Mb.&nbsp;<a href="http://lh3lh3.users.sourceforge.net/download/ce-miniasm.png">This dotter plot</a>&nbsp;gives a global view of the miniasm assembly (on the X axis) and the HGAP3 assembly (on Y). They are broadly comparable. Of course, the HGAP3 consensus sequences are much more accurate. In addition, on the whole data set (assembled in ~30 min), the miniasm N50 is reduced to 1.79Mb. Miniasm still needs improvements.</p>
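<p>The N50 values quoted above can be read as follows: sort the contigs from longest to shortest and walk down the list until the running total reaches half of the assembly size; the length of the contig reached at that point is the N50. A minimal Python sketch of that definition (the function name <code>n50</code> and the toy lengths are illustrative and not part of miniasm):</p>
<pre><code>from bisect import bisect_left
from itertools import accumulate


def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    # Running total of assembled bases, longest contig first.
    cumulative = list(accumulate(ordered))
    # First position where the running total reaches half the assembly size.
    index = bisect_left(cumulative, cumulative[-1] / 2)
    return ordered[index]


# Toy example: 14 bases in total; the two longest contigs (5 + 4) pass the halfway mark.
print(n50([2, 3, 4, 5]))  # prints 4
</code></pre>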
<p>Miniasm confirms that at least for high-coverage bacterial genomes, it is possible to generate long contigs from raw PacBio or ONT reads without error correction. It also shows that&nbsp;<a href="https://github.com/lh3/minimap">minimap</a>&nbsp;can be used as a read overlapper, even though it is probably not as sensitive as the more sophisticated overlappers such as&nbsp;<a href="https://github.com/marbl/MHAP">MHAP</a>&nbsp;and&nbsp;<a href="https://github.com/thegenemyers/DALIGNER">DALIGNER</a>. Coupled with long-read error correctors and consensus tools, miniasm may also be useful to produce high-quality assemblies.</p>
<p>Minimap and miniasm are ultrafast tools for (i) mapping and (ii) assembly. Designed for long, noisy reads, they do not have a correction or consensus step, and therefore the resulting assemblies are contiguous (i.e. long) but very noisy (i.e. full of errors).</p>
<p>We start with an all-against-all comparison:</p>
<div>
<pre><code>minimap -Sw5 -L100 -m0 -t8 reads.fq reads.fq | gzip -1 &gt; reads.paf.gz
</code></pre>
</div>
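<p>The overlaps from this step end up in <code>reads.paf.gz</code>. Assuming that file follows the PAF layout its extension suggests (twelve mandatory tab-separated columns, optionally followed by tags), a small Python sketch can load it for inspection. The names <code>Overlap</code>, <code>parse_paf_line</code> and <code>read_overlaps</code> are illustrative and not part of minimap:</p>
<pre><code>import gzip
from collections import namedtuple

# The twelve mandatory PAF columns, in order (assumed layout).
Overlap = namedtuple(
    "Overlap",
    "qname qlen qstart qend strand tname tlen tstart tend matches block_len mapq",
)

# Columns holding text (names and strand); the remaining nine are integers.
TEXT_COLUMNS = (0, 4, 5)


def parse_paf_line(line):
    """Turn one PAF line into an Overlap record, ignoring optional tags."""
    fields = line.rstrip("\n").split("\t")
    values = [
        field if index in TEXT_COLUMNS else int(field)
        for index, field in enumerate(fields[:12])
    ]
    return Overlap(*values)


def read_overlaps(path):
    """Read every overlap from a gzip-compressed PAF file."""
    with gzip.open(path, "rt") as handle:
        return [parse_paf_line(line) for line in handle if line.strip()]


# Made-up record, for illustration only.
record = parse_paf_line("r1\t1000\t10\t900\t+\tr2\t1200\t0\t890\t700\t900\t255")
print(record.qname, record.tname, record.block_len)  # prints r1 r2 900
</code></pre>
<p>Calling <code>read_overlaps("reads.paf.gz")</code> would then give a list that can be filtered, for example by overlap length, before assembly.</p>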
<p>Then we can assemble:</p>
<div>
<pre><code>miniasm -f reads.fq reads.paf.gz &gt; reads.gfa
</code></pre>
</div>
<p>Convert GFA to FASTA:</p>
<div>
<pre><code>awk '/^S/{print "&gt;"$2"\n"$3}' reads.gfa | fold &gt; reads.fa
</code></pre>
</div>
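<p>As a cross-check on the awk one-liner above, the same S-lines can be read in Python. This is only a sketch: it assumes tab-separated GFA segment lines of the form <code>S</code>, name, sequence, and the function names and example lines are illustrative rather than part of miniasm:</p>
<pre><code>def segments_from_lines(lines):
    """Map segment name to sequence for every S-line of a GFA file."""
    segments = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Segment lines look like: S, name, sequence, optional tags.
        if fields[0] == "S":
            segments[fields[1]] = fields[2]
    return segments


def read_segments(path):
    """Read the segments of the GFA file at the given path."""
    with open(path) as handle:
        return segments_from_lines(handle)


# Made-up lines, for illustration only.
example = [
    "H\tVN:Z:1.0\n",
    "S\tutg000001l\tACGTACGT\n",
    "a\tutg000001l\t0\tread1:1-8\t+\t8\n",
]
print(segments_from_lines(example))  # prints {'utg000001l': 'ACGTACGT'}
</code></pre>
<p>The number of keys returned by <code>read_segments("reads.gfa")</code> corresponds to the number of FASTA records the awk command writes.</p>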
<p>And then count how many contigs:</p>
<div>
<pre><code>grep "&gt;" reads.fa | wc -l</code></pre>
</div>
<pre># Download sample PacBio from the PBcR website
wget -O- http://www.cbcb.umd.edu/software/PBcR/data/selfSampleData.tar.gz | tar zxf -
ln -s selfSampleData/pacbio_filtered.fastq reads.fq
# Install minimap and miniasm (requiring gcc and zlib)
git clone https://github.com/lh3/minimap &amp;&amp; (cd minimap &amp;&amp; make)
git clone https://github.com/lh3/miniasm &amp;&amp; (cd miniasm &amp;&amp; make)
# Overlap
minimap/minimap -Sw5 -L100 -m0 -t8 reads.fq reads.fq | gzip -1 &gt; reads.paf.gz
# Layout
miniasm/miniasm -f reads.fq reads.paf.gz &gt; reads.gfa</pre><p>Address of the bookmark: <a href="https://github.com/lh3/miniasm" rel="nofollow">https://github.com/lh3/miniasm</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/34618/mashmap-a-fast-and-approximate-software-for-mapping-long-reads-pacbioont-or-assembly-to-reference-genomes</guid>
	<pubDate>Tue, 12 Dec 2017 17:23:31 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/34618/mashmap-a-fast-and-approximate-software-for-mapping-long-reads-pacbioont-or-assembly-to-reference-genomes</link>
	<title><![CDATA[MashMap: a fast and approximate software for mapping long reads (PacBio/ONT) or assembly to reference genome(s)]]></title>
	<description><![CDATA[<p><span>MashMap is a fast and approximate software for mapping long reads (PacBio/ONT) or assembly to reference genome(s). It maps a query sequence against a reference region if and only if its estimated alignment identity is above a specified threshold. It does not compute the alignments explicitly, but rather estimates a&nbsp;</span><em>k</em><span>-mer based&nbsp;</span><a href="https://en.wikipedia.org/wiki/Jaccard_index">Jaccard similarity</a><span>&nbsp;using a combination of&nbsp;</span><a href="http://www.cs.princeton.edu/courses/archive/spr05/cos598E/bib/p76-schleimer.pdf">Winnowing</a><span>&nbsp;and&nbsp;</span><a href="https://en.wikipedia.org/wiki/MinHash">MinHash</a><span>. This is then converted to an estimate of sequence identity using the&nbsp;</span><a href="http://mash.readthedocs.org/">Mash</a><span>&nbsp;distance. An appropriate&nbsp;</span><em>k</em><span>-mer sampling rate is automatically determined given minimum local alignment length and identity thresholds. The efficiency of the algorithm improves as both of these thresholds are increased.</span></p><p>Address of the bookmark: <a href="https://github.com/marbl/MashMap" rel="nofollow">https://github.com/marbl/MashMap</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/37223/chopstitch-exon-annotation-and-splice-graph-construction-using-transcriptome-assembly-and-whole-genome-sequencing-data</guid>
	<pubDate>Tue, 03 Jul 2018 04:14:52 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/37223/chopstitch-exon-annotation-and-splice-graph-construction-using-transcriptome-assembly-and-whole-genome-sequencing-data</link>
	<title><![CDATA[ChopStitch: exon annotation and splice graph construction using transcriptome assembly and whole genome sequencing data]]></title>
	<description><![CDATA[ChopStitch is a new method for finding putative exons and constructing splice graphs using an assembled transcriptome and whole genome shotgun sequencing (WGSS) data. ChopStitch identifies exon-exon boundaries in de novo assembled RNA-seq data with the help of a Bloom filter that represents the k-mer spectrum of WGSS reads. The algorithm also detects base substitutions in transcript sequences corresponding to sequencing or assembly errors, haplotype variations, or putative RNA editing events. The primary output of our tool is a FASTA file containing putative exons. Further, exon edges are interrogated for alternative exon-exon boundaries to detect transcript isoforms, which are reported as splice graphs in dot output format.<p>Address of the bookmark: <a href="https://github.com/bcgsc/ChopStitch" rel="nofollow">https://github.com/bcgsc/ChopStitch</a></p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/37554/finishersca-repeat-aware-tool-for-upgrading-de-novo-assembly-using-long-reads</guid>
	<pubDate>Mon, 20 Aug 2018 04:08:50 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/37554/finishersca-repeat-aware-tool-for-upgrading-de-novo-assembly-using-long-reads</link>
	<title><![CDATA[FinisherSC: a repeat-aware tool for upgrading de novo assembly using long reads]]></title>
	<description><![CDATA[<p>Here is the command to run the tool:</p>
<pre><code>python finisherSC.py destinedFolder mummerPath
</code></pre>
<p>If you are running on a server and would like to use multiple threads, the following command runs FinisherSC with 20 threads.</p>
<pre><code>python finisherSC.py -par 20 destinedFolder mummerPath
</code></pre>
<p>Sometimes, if the names of raw reads and contigs contain special characters or unusual formats, FinisherSC/MUMmer may not parse them correctly. In that case, you can quickly rename the contigs/reads in contigs.fasta or raw_reads.fasta using the following commands.</p>
<pre><code>    perl -pe 's/&gt;[^\$]*$/"&gt;Seg" . ++$n ."\n"/ge' raw_reads.fasta &gt; newRaw_reads.fasta
    cp newRaw_reads.fasta raw_reads.fasta
    perl -pe 's/&gt;[^\$]*$/"&gt;Seg" . ++$n ."\n"/ge' contigs.fasta &gt; newContigs.fasta
    cp newContigs.fasta contigs.fasta</code></pre><p>Address of the bookmark: <a href="https://github.com/kakitone/finishingTool" rel="nofollow">https://github.com/kakitone/finishingTool</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/38413/genobuntu-a-software-package-containing-more-than-70-software-and-packages-oriented-towards-ngs-and-genome-assembly</guid>
	<pubDate>Tue, 11 Dec 2018 05:15:57 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/38413/genobuntu-a-software-package-containing-more-than-70-software-and-packages-oriented-towards-ngs-and-genome-assembly</link>
	<title><![CDATA[Genobuntu: A software package containing more than 70 software and packages oriented towards NGS and genome assembly]]></title>
	<description><![CDATA[<p><span>Genobuntu is a software package containing more than 70 programs and packages oriented towards NGS. In its current version, Genobuntu supports pre-assembly tools, genome assemblers, and post-assembly tools.&nbsp;</span><br><br><span>Commonly used biological software and example script files for different assembly pipelines are also provided; the example script files can be updated to suit one&rsquo;s experimental needs. Genobuntu attempts to reduce the amount of time and energy needed to build software workstations, and it can also act as a good teaching resource in a classroom setting.&nbsp;</span></p>
<p>Address of the bookmark: <a href="https://sourceforge.net/projects/genobuntu/" rel="nofollow">https://sourceforge.net/projects/genobuntu/</a></p>]]></description>
	<dc:creator>BioStar</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/38801/genome-assembly-forensics-finding-the-elusive-mis-assembly</guid>
	<pubDate>Sat, 26 Jan 2019 18:02:01 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/38801/genome-assembly-forensics-finding-the-elusive-mis-assembly</link>
	<title><![CDATA[Genome assembly forensics: finding the elusive mis-assembly]]></title>
	<description><![CDATA[<p><span>We present the first collection of tools aimed at automated genome assembly validation. This work formalizes several mechanisms for detecting mis-assemblies, and describes their implementation in our automated validation pipeline, called&nbsp;</span><em>amosvalidate</em><span>. We demonstrate the application of our pipeline in both bacterial and eukaryotic genome assemblies, and highlight several assembly errors in both draft and finished genomes. The software described is compatible with common assembly formats and is released, open-source, at&nbsp;</span><a href="http://amos.sourceforge.net/" target="_blank">http://amos.sourceforge.net</a><span>.</span></p>
<p>https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2397507/&nbsp;</p>
<p>Address of the bookmark: <a href="http://amos.sourceforge.net/wiki/index.php/AMOS" rel="nofollow">http://amos.sourceforge.net/wiki/index.php/AMOS</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>

</channel>
</rss>