<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/36525?offset=270</link>
	<atom:link href="https://bioinformaticsonline.com/related/36525?offset=270" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/35057/ectools-long-read-correction-and-other-correction-tools</guid>
	<pubDate>Fri, 05 Jan 2018 04:02:22 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/35057/ectools-long-read-correction-and-other-correction-tools</link>
	<title><![CDATA[ECTOOLS: Long Read Correction and other Correction tools]]></title>
	<description><![CDATA[<p>Long Read Correction and other Correction tools</p>
<p>This package is a loose collection of scripts. To run the correction<br>routine see the section below. Descriptions of the other scripts<br>are at the bottom of this file.</p>
<p>Contact: gurtowsk@cshl.edu</p>
<p>In short, the correction algorithm takes as input the unitigs from a short read assembly and uses them to correct long read data. More background information for the algorithm can be found:<br>http://schatzlab.cshl.edu/presentations/2013-06-18.PBUserMeeting.pdf</p><p>Address of the bookmark: <a href="https://github.com/jgurtowski/ectools" rel="nofollow">https://github.com/jgurtowski/ectools</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/36842/gap-filling-or-contigs-extensions-tools</guid>
	<pubDate>Fri, 01 Jun 2018 08:07:32 -0500</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/36842/gap-filling-or-contigs-extensions-tools</link>
	<title><![CDATA[Gap filling or Contigs extensions tools !]]></title>
	<description><![CDATA[
<p>There are many tools to perform gap filling using Illumina short reads, for example "GapFiller: a de novo assembly approach to fill the gap within paired reads" or "Toward almost closed genomes with GapFiller". There are also some tools like GAPresolution that can help to perform local re-assemblies using 454 reads. We used GAPresolution but it is not a very good software, it is useful only in some specific situations.</p>

<p>Take a look at the PRICE software from the DeRisi lab. Its meant to do something very similar. http://derisilab.ucsf.edu/index.php?page=software</p>

<p>You could also look at SSPACE (http://www.baseclear.com/landingpages/basetools-a-wide-range-of-bioinformatics-solutions/sspacev12/), ATLAS tools (http://www.hgsc.bcm.tmc.edu/content/bcm-hgsc-software), and SCARPA (http://compbio.cs.toronto.edu/hapsembler/scarpa.html).</p>

<p>See the PAGIT protocol: http://www.sanger.ac.uk/resources/software/pagit/ </p>

<p>In particular, take a look at the IMAGE tool: http://genomebiology.com/2010/11/4/R41 </p>

<p>Also SOAPdenovo has ha function for scaffolding. Not sure about ABYSS</p>

<p>Here there is a useful explanation of several tools.</p>

<p>https://bioinformaticsonline.com/search?q=scaffolding&amp;entity_type=object&amp;entity_subtype=bookmarks&amp;offset=0&amp;search_type=entities</p>

<p>I could be wrong, but the above answers to your hypothetical scenario appear to miss the point that you aren't interested in assembling the full genome, just the 100 kb part you're interested in. I suggest the following algorithm:</p>

<p>1. Start with the initial assembly C0 of the contigs you have identified as overlapping your region of interest, and the set S of reads those contigs contain. Let C = C0.</p>

<p>2. Repeat:<br />a. Identify paired-end reads (not in C) for which one or both ends align within, or extending, contigs in C.<br />b. Identify unpaired reads that align extending these new paired-end reads.<br />c. Construct a new assembly C' from C and the new reads identified in (a) and (b).<br />d. Trim C' so it does not extend more than 100 kb to either end of C0. Set C = C'.<br />e. Let S' denote the reads that contribute to C'. If S' does not contain any reads not present in S, stop. Otherwise, Set S = S'.</p>

<p>3. If you don't have a complete assembly of the region of interest, generate an STS for each end of each contig, probe a library for clones including these STSes, subclone these clones into a paired-end sequencing vector, and generate paired-end reads for this library; then try steps (1) and (2) again, adding these new sequencing reads to what you had before.</p>

<p>4. If your average sequencing depth for the region of interest exceeds 25 or so without filling all gaps, it is likely that the remaining gaps represent sequences that are not getting cloned in your sequencing vectors. Try different sequencing vectors.</p>
]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/38743/molinspiration-broad-range-of-cheminformatics-software-tools-supporting-molecule-manipulation</guid>
	<pubDate>Sun, 20 Jan 2019 05:32:40 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/38743/molinspiration-broad-range-of-cheminformatics-software-tools-supporting-molecule-manipulation</link>
	<title><![CDATA[molinspiration: broad range of cheminformatics software tools supporting molecule manipulation]]></title>
	<description><![CDATA[<p><span>Molinspiration offers&nbsp;</span><a href="https://www.molinspiration.com/products.html">broad range of cheminformatics software tools</a><span>&nbsp;supporting molecule manipulation and processing, including SMILES and SDfile conversion, normalization of molecules, generation of tautomers, molecule fragmentation, calculation of various molecular properties needed in QSAR, molecular modelling and drug design, high quality molecule depiction, molecular database tools supporting substructure and similarity searches. Our products support also fragment-based virtual screening, bioactivity prediction and data visualization. Molinspiration tools are written in Java, therefore can be used practically on any computer platform.</span></p><p>Address of the bookmark: <a href="https://www.molinspiration.com/" rel="nofollow">https://www.molinspiration.com/</a></p>]]></description>
	<dc:creator>BioJoker</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/41996/wgd%E2%80%94simple-command-line-tools-for-the-analysis-of-ancient-whole-genome-duplications</guid>
	<pubDate>Thu, 23 Jul 2020 05:49:45 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/41996/wgd%E2%80%94simple-command-line-tools-for-the-analysis-of-ancient-whole-genome-duplications</link>
	<title><![CDATA[wgd—simple command line tools for the analysis of ancient whole-genome duplications]]></title>
	<description><![CDATA[<p><span>wgd is a easy to use command-line tool for<span>&nbsp;</span></span><em>K</em><sub>S</sub><span><span>&nbsp;</span>distribution construction named wgd. The wgd suite provides commonly used<span>&nbsp;</span></span><em>K</em><sub>S</sub><span><span>&nbsp;</span>and colinearity analysis workflows together with tools for modeling and visualization, rendering these analyses accessible to genomics researchers in a convenient manner.</span></p>
<p><a href="https://academic.oup.com/bioinformatics/article/35/12/2153/5162749">https://academic.oup.com/bioinformatics/article/35/12/2153/5162749</a></p><p>Address of the bookmark: <a href="https://github.com/arzwa/wgd" rel="nofollow">https://github.com/arzwa/wgd</a></p>]]></description>
	<dc:creator>LEGE</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/43084/frequently-used-bioinformatics-tools-for-viral-genome-analysis</guid>
	<pubDate>Wed, 23 Jun 2021 07:40:41 -0500</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/43084/frequently-used-bioinformatics-tools-for-viral-genome-analysis</link>
	<title><![CDATA[Frequently used bioinformatics tools for viral genome analysis !]]></title>
	<description><![CDATA[<p><strong>IVA: accurate de novo assembly of RNA virus genomes.</strong><br /> Hunt M, Gall A, Ong SH, Brener J, Ferns B, Goulder P, Nastouli E, Keane JA, Kellam P, Otto TD.<br /> Bioinformatics. 2015 Jul 15;31(14):2374-6. doi: <a href="http://bioinformatics.oxfordjournals.org/content/31/14/2374.long">10.1093/bioinformatics/btv120</a>. Epub 2015 Feb 28.</p><p><a href="http://www.nature.com/nmeth/journal/v9/n1/full/nmeth.1814.html"><strong>Adapter sequences</strong></a>:<br /> <strong>Optimal enzymes for amplifying sequencing libraries.</strong><br /> Quail, M. a et al. Nat. Methods 9, 10-1 (2012).</p><p><a href="http://genome.cshlp.org/content/early/2012/01/12/gr.131383.111"><strong>GAGE</strong></a>:<br /> <strong>GAGE: A critical evaluation of genome assemblies and assembly algorithms.</strong><br /> Salzberg, S. L. et al. Genome Res. 22, 557-67 (2012).</p><p><a href="http://www.biomedcentral.com/1471-2105/14/160"><strong>KMC</strong></a>:<br /> <strong>Disk-based k-mer counting on a PC.</strong><br /> Deorowicz, S., Debudaj-Grabysz, A. &amp; Grabowski, S. BMC Bioinformatics 14, 160 (2013).</p><p><a href="http://genomebiology.com/2014/15/3/R46"><strong>Kraken</strong></a>:<br /> <strong>Kraken: ultrafast metagenomic sequence classification using exact alignments.</strong><br /> Wood, D. E. &amp; Salzberg, S. L. Genome Biol. 15, R46 (2014).</p><p><a href="http://genomebiology.com/2004/5/2/r12"><strong>MUMmer</strong></a>:<br /> <strong>Versatile and open software for comparing large genomes.</strong><br /> Kurtz, S. et al. Genome Biol. 5, R12 (2004).</p><p><strong>R</strong>:<br /> <strong>R: A language and environment for statistical computing.</strong><br /> R Core Team (2013). R Foundation for Statistical Computing, Vienna, Austria. URL <a href="http://www.R-project.org/">http://www.R-project.org/</a>.</p><p><a href="http://nar.oxfordjournals.org/content/39/9/e57"><strong>RATT</strong></a>:<br /> <strong>RATT: Rapid Annotation Transfer Tool.</strong><br /> Otto, T. D., Dillon, G. P., Degrave, W. S. &amp; Berriman, M. Nucleic Acids Res. 39, e57 (2011).</p><p><a href="http://bioinformatics.oxfordjournals.org/content/25/16/2078.abstract"><strong>SAMtools</strong></a>:<br /> <strong>The Sequence Alignment/Map format and SAMtools.</strong><br /> Li, H. et al. Bioinformatics 25, 2078-9 (2009).</p><p><a href="http://bioinformatics.oxfordjournals.org/content/early/2014/04/12/bioinformatics.btu170"><strong>Trimmomatic</strong></a>:<br /> <strong>Trimmomatic: A flexible trimmer for Illumina Sequence Data.</strong><br /> Bolger, A. M., Lohse, M. &amp; Usadel, B. Bioinformatics 1-7 (2014).</p>]]></description>
	<dc:creator>Neel</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/44551/bioinformatic-tools-for-pathogens-informatics-at-cvr</guid>
	<pubDate>Sat, 08 Jun 2024 15:59:46 -0500</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/44551/bioinformatic-tools-for-pathogens-informatics-at-cvr</link>
	<title><![CDATA[Bioinformatic tools for pathogens informatics at CVR]]></title>
	<description><![CDATA[<div><div><div><div><div><p>Novel sequencing and analytical approaches focused on studying viruses and virus-host interactions. Below you will find summaries and links to a number of bioinformatic tools that have been developed @ CVR.</p></div><div><h3><a href="http://giffordlabcvr.github.io/DIGS-tool/" target="_blank" title="DIGS">DIGS</a></h3></div><div><p>The database-integrated genome-screening (DIGS) tool provides a framework for implementing automated in silico screening of sequence databases using BLAST in combination with a relational database (MySQL).</p></div><div><h3><a href="https://bioinformatics.cvr.ac.uk/software/discvr/" target="" title="DisCVR">DisCVR</a></h3></div><div><p>DisCVR is a Diagnostic tool for detecting known human viruses in clinical samples from Next-Generation Sequencing (NGS) data. The tool uses a simple and straightforward Graphical User Interface and is optimized on Windows OS without compromising speed and accuracy.</p></div><div><h3><a href="http://josephhughes.github.io/DiversiTools/" target="_blank" title="DiversiTools">DiversiTools</a></h3></div><div><p>DiversiTools is a computational tool that is specifically tailored towards viral HTS data sets and the analysis of the underlying viral populations that they represent. It was initially developed in collaboration with a number of virologists interested in characterising the intra-host diversity of viral populations and studying their evolution across transmission chains at the micro-evolutionary scale.</p></div><div><h3><a href="http://glue-tools.cvr.gla.ac.uk/" target="_blank" title="GLUE">GLUE</a></h3></div><div><p>GLUE is a flexible data-centric bioinformatics environment for virus sequence data, with a focus on virus evolution and genomic variation. GLUE has been applied to a range of viruses. A GLUE-based resource focused on Hepatitis C virus is HCV-GLUE.</p></div><div><h3><a href="https://bioinformatics.cvr.ac.uk/tanoti/" target="_blank" title="Tanoti">Tanoti</a></h3></div><div><p>Tanoti is a BLAST guided reference based short read aligner. It is developed for maximising alignment in highly variable next generation sequence data sets (Illumina).</p></div><div><h3><a href="https://bioinformatics.cvr.ac.uk/victree/" target="_blank" title="VicTREE">ViCTree</a></h3></div><div><p>ViCTree is a bioinformatic framework that automatically selects new candidate virus sequences from GenBank, generates multiple sequence alignments, calculates a maximum likelihood phylogeny and integrates the sequences into the existing phylogenetic trees.&nbsp;<span>For more information click&nbsp;</span><a href="https://bioinformatics.cvr.ac.uk/victree_web/" target="_blank">here</a>.</p></div></div></div></div></div><div><div><div><div><div><h3><a href="https://bioinformatics.cvr.ac.uk/software/viral-host-predictor/" target="" title="Viral Host Predictor">Viral Host Predictor</a></h3></div><div><p>Viral Host Predictor provides a fast and simple way to predict the hosts and vectors of RNA viruses from viral sequences.</p></div><div><h3><a href="https://github.com/salvocamiolo/GRACy/releases/tag/v0.4.4" target="_blank" title="GRACy">GRACy</a></h3></div><div><p>GRACy is a bioinformatic tool designed for the analysis of Illumina data originated from Human cytomegalovirus samples. GRACy can be used to perform read quality filtering, genotyping, de novo assembly, variant detection, annotation and data submission to public database.</p></div><div><h3><a href="https://github.com/salvocamiolo/LoReTTA/releases/tag/v0.1" target="_blank" title="LoReTTA">LoReTTA</a></h3></div><div><p>LoReTTA (Long Read Template Targeted Assembler) is a reference assisted de novo assembler specifically designed to deal with PacBio reads generated from viral genomes.&nbsp;</p></div><div><h3><a href="https://bioinformatics.cvr.ac.uk/software/bingleseq/" target="" title="BingleSeq">BingleSeq</a></h3></div><div><p>BingleSeq is a R-package enables the user-friendly analysis of count tables obtained by both Bulk RNA-Seq and single-cell RNA-Seq protocols. The development of BingleSeq focused on providing a flexible and intuitive user experience.</p></div></div></div></div></div>]]></description>
	<dc:creator>Abhi</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/43954/elgg-installation-steps</guid>
	<pubDate>Wed, 07 Sep 2022 00:43:53 -0500</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/43954/elgg-installation-steps</link>
	<title><![CDATA[Elgg Installation steps !]]></title>
	<description><![CDATA[<p>Elgg is an open source social networking engine that allows the creation of social environments such as campus social networks and internal collaborative platforms for organizations. Elgg offers a number of social networking features including microblogging, messaging, file-sharing and groups. This tutorial will guide you through the process of installing Elgg on a Ubuntu 18.04 VPS.</p><h2 id="Prerequisites">Prerequisites</h2><ul>
<li>A fresh Vultr Cloud Compute instance with Ubuntu 18.04 and root access.</li>
</ul><h2 id="Step_1__Install_Apache__MySQL__and_PHP">Step 1: Install Apache, MySQL, and PHP</h2><p>Elgg requires MySQL, PHP, and a web server. Before you can install Elgg, you will need to install the Apache web server, MySQL, and PHP.</p><p>Update the repository list.</p><pre><code>apt-get update
</code></pre><p>Install the Apache web server.</p><pre><code>apt-get install apache2 -y
</code></pre><p>Install MySQL.</p><pre><code>apt-get install mysql-server -y
</code></pre><p>Complete the MySQL installation by executing the following command.</p><pre><code>/usr/bin/mysql_secure_installation
</code></pre><p>During the installation, you will be asked to enter a root password. Enter a secure password. This will be the MySQL root password.</p><pre><code>Would you like to setup VALIDATE PASSWORD plugin? [Y/N] N
New password: password
Re-enter new password: password
Remove anonymous users? [Y/N] Y
Disallow root login remotely? [Y/N] Y
Remove test database and access to it? [Y/N] Y
Reload privilege tables now? [Y/N] Y
</code></pre><p>Install PHP 7.2, as well as the PHP modules required by Elgg.</p><pre><code>apt-get install php7.2 libapache2-mod-php7.2 php7.2-common php7.2-sqlite3 php7.2-curl php7.2-intl php7.2-mbstring php7.2-xmlrpc php7.2-mysql php7.2-gd php7.2-xml php7.2-cli php7.2-zip -y
</code></pre><h2 id="Step_2__Create_a_MySQL_database_for_Elgg">Step 2: Create a MySQL database for Elgg</h2><p>Elgg will require a MySQL database. Log into the MySQL console.</p><pre><code>mysql -u root -p
</code></pre><p>When prompted for a password, enter the MySQL root password you set in step 1. Once you are logged in to the MySQL console, create a new database.</p><pre><code>CREATE DATABASE elgg;
</code></pre><p>Create a new MySQL user and grant it privileges to the newly created database. You can replace&nbsp;<code>username</code>&nbsp;and&nbsp;<code>password</code>&nbsp;with the username and password of your choice.</p><pre><code>GRANT ALL PRIVILEGES on elgg.* to 'username'@'localhost' identified by 'password';
FLUSH PRIVILEGES;
</code></pre><p>Exit the MySQL console.</p><pre><code>exit
</code></pre><h2 id="Step_3__Download_and_Install_Elgg">Step 3: Download and Install Elgg</h2><p>Download the latest version of Elgg.</p><pre><code>cd /var/www/html
rm -r index.html
wget https://elgg.org/download/elgg-2.3.7.zip
</code></pre><p>Unzip the downloaded archive and move the files to the root of the Apache web server.</p><pre><code>apt install unzip
unzip elgg-2.3.7.zip
mv ./elgg-2.3.7/* . &amp;&amp; rm elgg-2.3.7.zip &amp;&amp; rm -r elgg-2.3.7
</code></pre><p>Create a data directory for Elgg.</p><pre><code>sudo mkdir -p /var/www/html/data
</code></pre><p>Set the appropriate file permissions.</p><pre><code>sudo chown -R www-data:www-data /var/www/html/
sudo chmod -R 755 /var/www/html/
</code></pre><h2 id="Step_4__Configure_Apache_for_Elgg">Step 4: Configure Apache for Elgg</h2><p>Elgg requires the Apache rewrite module. Enable the Apache rewrite module.</p><pre><code>sudo a2enmod rewrite
</code></pre><p>Create an Apache configuration file for the Elgg installation.</p><pre><code>sudo nano /etc/apache2/sites-available/elgg.conf
</code></pre><p>Paste the following snippet to the file, replacing&nbsp;<code>example.com</code>&nbsp;with your own domain name.</p><pre><code>&lt;VirtualHost *:80&gt;
     DocumentRoot /var/www/html/
     ServerName example.com
     &lt;Directory /var/www/html/&gt;
          Options FollowSymlinks
          AllowOverride All
          Require all granted
     &lt;/Directory&gt;
     ErrorLog ${APACHE_LOG_DIR}/error.log
     CustomLog ${APACHE_LOG_DIR}/access.log combined
&lt;/VirtualHost&gt;
</code></pre><p>Enable the configuration and restart the Apache server.</p><pre><code> sudo a2ensite elgg.conf
 sudo systemctl restart apache2.service</code></pre>]]></description>
	<dc:creator>Abhi</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/33847/omega2-metagenome-assembly-pipeline</guid>
	<pubDate>Mon, 10 Jul 2017 05:56:07 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/33847/omega2-metagenome-assembly-pipeline</link>
	<title><![CDATA[Omega2: metagenome assembly pipeline]]></title>
	<description><![CDATA[<p><span>Omega found overlaps between reads using a prefix/suffix hash table. The overlap graph of reads was simplified by removing transitive edges and trimming short branches. Unitigs were generated based on minimum cost flow analysis of the overlap graph and then merged to contigs and scaffolds using mate-pair information. In comparison with three de Bruijn graph assemblers (SOAPdenovo, IDBA-UD and MetaVelvet), Omega provided comparable overall performance on a HiSeq 100-bp dataset and superior performance on a MiSeq 300-bp dataset. In comparison with Celera on the MiSeq dataset, Omega provided more continuous assemblies overall using a fraction of the computing time of existing overlap-layout-consensus assemblers. This indicates Omega can more efficiently assemble longer Illumina reads, and at deeper coverage, for metagenomic datasets.</span></p><p>Address of the bookmark: <a href="http://omega.omicsbio.org/" rel="nofollow">http://omega.omicsbio.org/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/34416/miniasm-very-fast-olc-based-de-novo-assembler-for-noisy-long-reads</guid>
	<pubDate>Mon, 27 Nov 2017 07:58:49 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/34416/miniasm-very-fast-olc-based-de-novo-assembler-for-noisy-long-reads</link>
	<title><![CDATA[miniasm: very fast OLC-based de novo assembler for noisy long reads]]></title>
	<description><![CDATA[<p>Miniasm is a very fast OLC-based&nbsp;<em>de novo</em>&nbsp;assembler for noisy long reads. It takes all-vs-all read self-mappings (typically by&nbsp;<a href="https://github.com/lh3/minimap">minimap</a>) as input and outputs an assembly graph in the&nbsp;<a href="https://github.com/pmelsted/GFA-spec/blob/master/GFA-spec.md">GFA</a>&nbsp;format. Different from mainstream assemblers, miniasm does not have a consensus step. It simply concatenates pieces of read sequences to generate the final&nbsp;<a href="http://wgs-assembler.sourceforge.net/wiki/index.php/Celera_Assembler_Terminology">unitig</a>&nbsp;sequences. Thus the per-base error rate is similar to the raw input reads.</p>
<p>So far miniasm is in early development stage. It has only been tested on a dozen of PacBio and Oxford Nanopore (ONT) bacterial data sets. Including the mapping step, it takes about 3 minutes to assemble a bacterial genome. Under the default setting, miniasm assembles 9 out of 12 PacBio datasets and 3 out of 4 ONT datasets into a single contig. The 12 PacBio data sets are&nbsp;<a href="https://github.com/PacificBiosciences/DevNet/wiki/E.-coli-Bacterial-Assembly">PacBio E. coli sample</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS473430">ERS473430</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS544009">ERS544009</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS554120">ERS554120</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS605484">ERS605484</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS617393">ERS617393</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS646601">ERS646601</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS659581">ERS659581</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS670327">ERS670327</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS685285">ERS685285</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS743109">ERS743109</a>&nbsp;and a&nbsp;<a href="https://github.com/PacificBiosciences/DevNet/wiki/E.-coli-20kb-Size-Selected-Library-with-P6-C4/ce0533c1d2a957488594f0b29da61ffa3e4627e8">deprecated PacBio E. coli data set</a>. ONT data are acquired from the&nbsp;<a href="http://lab.loman.net/2015/09/24/first-sqk-map-006-experiment/">Loman Lab</a>.</p>
<p>For a&nbsp;<em>C. elegans</em>&nbsp;<a href="https://github.com/PacificBiosciences/DevNet/wiki/C.-elegans-data-set">PacBio data set</a>&nbsp;(only 40X are used, not the whole dataset), miniasm finishes the assembly, including reads overlapping, in ~10 minutes with 16 CPUs. The total assembly size is 105Mb; the N50 is 1.94Mb. In comparison, the&nbsp;<a href="https://github.com/PacificBiosciences/Bioinformatics-Training/wiki/HGAP">HGAP3</a>produces a 104Mb assembly with N50 1.61Mb.&nbsp;<a href="http://lh3lh3.users.sourceforge.net/download/ce-miniasm.png">This dotter plot</a>&nbsp;gives a global view of the miniasm assembly (on the X axis) and the HGAP3 assembly (on Y). They are broadly comparable. Of course, the HGAP3 consensus sequences are much more accurate. In addition, on the whole data set (assembled in ~30 min), the miniasm N50 is reduced to 1.79Mb. Miniasm still needs improvements.</p>
<p>Miniasm confirms that at least for high-coverage bacterial genomes, it is possible to generate long contigs from raw PacBio or ONT reads without error correction. It also shows that&nbsp;<a href="https://github.com/lh3/minimap">minimap</a>&nbsp;can be used as a read overlapper, even though it is probably not as sensitive as the more sophisticated overlapers such as&nbsp;<a href="https://github.com/marbl/MHAP">MHAP</a>&nbsp;and&nbsp;<a href="https://github.com/thegenemyers/DALIGNER">DALIGNER</a>. Coupled with long-read error correctors and consensus tools, miniasm may also be useful to produce high-quality assemblies.</p>
<p>Minimap and miniasm are ultrafast tools for (i) mapping and (ii) assembly. Designed for long, noisy reads, they do not have a correction or consensus step, and therefore the resulting assemblies are contiguous (i.e. long) but very noisy (i.e. full of errors)</p>
<p>We start with an all against all comparison:</p>
<div>
<pre><code>minimap -Sw5 -L100 -m0 -t8 reads.fq reads.fq | gzip -1 &gt; reads.paf.gz
</code></pre>
</div>
<p>Then we can assemble</p>
<div>
<pre><code>miniasm -f reads.fq reads.paf.gz &gt; reads.gfa
</code></pre>
</div>
<p>Convert GFA to FASTA:</p>
<div>
<pre><code>awk <span>'/^S/{print "&gt;"$2"\n"$3}'</span> reads.gfa | fold &gt; reads.fa
</code></pre>
</div>
<p>And then count how many contigs:</p>
<div>
<pre><code>grep <span>"&gt;"</span> reads.fa | wc -l</code></pre>
</div>
<p>&nbsp;</p>
<pre><span><span>#</span> Download sample PacBio from the PBcR website</span>
wget -O- http://www.cbcb.umd.edu/software/PBcR/data/selfSampleData.tar.gz <span>|</span> tar zxf -
ln -s selfSampleData/pacbio_filtered.fastq reads.fq
<span><span>#</span> Install minimap and miniasm (requiring gcc and zlib)</span>
git clone https://github.com/lh3/minimap <span>&amp;&amp;</span> (cd minimap <span>&amp;&amp;</span> make)
git clone https://github.com/lh3/miniasm <span>&amp;&amp;</span> (cd miniasm <span>&amp;&amp;</span> make)
<span><span>#</span> Overlap</span>
minimap/minimap -Sw5 -L100 -m0 -t8 reads.fq reads.fq <span>|</span> gzip -1 <span>&gt;</span> reads.paf.gz
<span><span>#</span> Layout</span>
miniasm/miniasm -f reads.fq reads.paf.gz <span>&gt;</span> reads.gfa</pre><p>Address of the bookmark: <a href="https://github.com/lh3/miniasm" rel="nofollow">https://github.com/lh3/miniasm</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/34618/mashmap-a-fast-and-approximate-software-for-mapping-long-reads-pacbioont-or-assembly-to-reference-genomes</guid>
	<pubDate>Tue, 12 Dec 2017 17:23:31 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/34618/mashmap-a-fast-and-approximate-software-for-mapping-long-reads-pacbioont-or-assembly-to-reference-genomes</link>
	<title><![CDATA[MashMap: a fast and approximate software for mapping long reads (PacBio/ONT) or assembly to reference genome(s)]]></title>
	<description><![CDATA[<p><span>MashMap is a fast and approximate software for mapping long reads (PacBio/ONT) or assembly to reference genome(s). It maps a query sequence against a reference region if and only if its estimated alignment identity is above a specified threshold. It does not compute the alignments explicitly, but rather estimates a&nbsp;</span><em>k</em><span>-mer based&nbsp;</span><a href="https://en.wikipedia.org/wiki/Jaccard_index">Jaccard similarity</a><span>&nbsp;using a combination of&nbsp;</span><a href="http://www.cs.princeton.edu/courses/archive/spr05/cos598E/bib/p76-schleimer.pdf">Winnowing</a><span>&nbsp;and&nbsp;</span><a href="https://en.wikipedia.org/wiki/MinHash">MinHash</a><span>. This is then converted to an estimate of sequence identity using the&nbsp;</span><a href="http://mash.readthedocs.org/">Mash</a><span>&nbsp;distance. An appropriate&nbsp;</span><em>k</em><span>-mer sampling rate is automatically determined given minimum local alignment length and identity thresholds. The efficiency of the algorithm improves as both of these thresholds are increased.</span></p><p>Address of the bookmark: <a href="https://github.com/marbl/MashMap" rel="nofollow">https://github.com/marbl/MashMap</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>

</channel>
</rss>