<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/41501?offset=60</link>
	<atom:link href="https://bioinformaticsonline.com/related/41501?offset=60" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/42633/protocol-for-de-novo-genome-assembly-using-illumina-reads</guid>
	<pubDate>Sat, 16 Jan 2021 21:42:11 -0600</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/42633/protocol-for-de-novo-genome-assembly-using-illumina-reads</link>
	<title><![CDATA[Protocol for De novo Genome Assembly using Illumina Reads]]></title>
	<description><![CDATA[<p>In this protocol, we describe the de novo assembly method for small to medium-sized genomes.</p>
<p><strong>What is de novo genome assembly?<br /></strong>Genome assembly is the process of taking a large number of short DNA sequences and putting them back together to create a representation of the original chromosomes from which the DNA originated. De novo genome assembly assumes no prior knowledge of the length, layout or composition of the source DNA sequence. In a genome sequencing experiment, the DNA of the target organism is broken up into millions of small pieces and read on a sequencing machine. Depending on the sequencing system used, these "reads" range from 20 to 1000 nucleotide base pairs (bp) in length. Illumina-style short-read sequencing usually produces reads of 36 to 150 bp. These reads can be either &ldquo;single-ended&rdquo; as described above or &ldquo;paired-end.&rdquo;</p>
<p><strong>Why genome assembly?</strong><br />Determining the DNA sequence of an organism is useful in basic research into why and how organisms live, as well as in applied subjects. Because of the importance of DNA to living things, knowledge of a DNA sequence may be useful in practically any biological research. For example, in medicine it may be used to identify, diagnose and eventually improve therapies for genetic disorders. Similarly, the study of pathogens can lead to treatments for infectious diseases.</p>
<p><strong>Raw NGS data</strong><br />Reads can be stored as plain sequence text in a FASTA file or, together with their per-base quality scores, in a FASTQ file.&nbsp;FASTQ is the most common read file format because it is what the Illumina sequencing pipeline produces. It is the format this protocol focuses on from here on.</p>
<p><strong>In a nutshell the protocol:</strong> <br />Obtain the sequence read file(s) from the sequencing machine(s). <br />Look at the reads: get an understanding of what you have and what the quality is like. <br />Clean up and quality-trim the raw data if necessary. <br />Choose an appropriate set of assembly parameters. <br />Assemble the data into contigs/scaffolds. <br />Examine the assembly output and assess its quality.</p>
<p><strong>Read Quality Control:</strong><br />Check the read quality with FastQC.<br />Script<br />https://bioinformaticsonline.com/snippets/view/42540/install-fastqc-using-conda</p>
<p>Quality trimming/cleanup of read files.<br />This step removes adapters, barcodes and other contaminants from the reads.<br />Script<br />https://bioinformaticsonline.com/snippets/view/42542/trimmomatic-command</p>
<p><strong>Genome Assembly:</strong><br />The aim of this section is to describe how to assemble the quality-trimmed reads into draft contigs.</p>
<blockquote><p>spades.py -1 illumina_R1.fastq.gz -2 illumina_R2.fastq.gz --careful --cov-cutoff auto -o result_of_spades_assembly_all_illumina</p></blockquote>
<p>A wide range of short-read assemblers is available, each with its own strengths and weaknesses. <br /><em>Some of the assemblers available include:</em><br />Velvet<br />SOAPdenovo<br />MIRA<br />ALLPATHS</p>
<p>The next step is to assess the quality of the draft set of contigs and decide what to do with it for the rest of the analysis.&nbsp;A few things to note about the contigs you have just produced:&nbsp;they are draft contigs, and they may contain mis-assemblies.</p>
<p><strong>Mis-assembly checking and assembly metric tools:</strong><br />QUAST - Quality assessment tool for genome assembly http://bioinf.spbau.ru/quast<br />Mauve assembly metrics - http://code.google.com/p/ngopt/wiki/How_To_Score_Genome_Assemblies_with_Mauve<br />InGAP-SV - https://sites.google.com/site/nextgengenomics/ingap and http://ingap.sourceforge.net/<br />inGAP is also useful for finding structural variants between genomes from read mappings.</p>
<p><strong>Genome finishing tools:</strong><br />Semi-automated gap fillers:<br />GapFiller - http://www.baseclear.com/landingpages/basetools-a-wide-range-of-bioinformatics-solutions/gapfiller/</p>
<p>IMAGE (V2) - http://sourceforge.net/apps/mediawiki/image2/index.php?title=Main_Page</p>
<p><strong>Genome visualisers and editors:</strong><br />Artemis - http://www.sanger.ac.uk/resources/software/artemis/<br />IGV - http://www.broadinstitute.org/igv/</p>
<p><strong>Automated and semi-automated annotation tools:</strong><br />Prokka - https://github.com/tseemann/prokka<br />RAST - http://www.nmpdr.org/FIG/wiki/view.cgi/FIG/RapidAnnotationServer<br />JCVI Annotation Service - http://www.jcvi.org/cms/research/projects/annotation-service/</p>
<p><strong>Frequently used commands for the analysis are listed at:</strong></p>
<p>https://bioinformaticsonline.com/blog/view/38765/list-of-tools-frequently-used-while-genome-assembly<br />https://bioinformaticsonline.com/pages/view/42275/frequent-parameters-for-bioinformatics-tools</p>]]></description>
	<dc:creator>BioStar</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/37737/rebaler-program-for-conducting-reference-based-assemblies-using-long-reads</guid>
	<pubDate>Tue, 18 Sep 2018 07:52:41 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/37737/rebaler-program-for-conducting-reference-based-assemblies-using-long-reads</link>
	<title><![CDATA[Rebaler: program for conducting reference-based assemblies using long reads.]]></title>
	<description><![CDATA[<p>Rebaler is a program for conducting reference-based assemblies using long reads. It relies mainly on&nbsp;<a href="https://github.com/lh3/minimap2">minimap2</a>&nbsp;for alignment and&nbsp;<a href="https://github.com/isovic/racon">Racon</a>&nbsp;for making consensus sequences.</p>
<p>I made Rebaler for bacterial genomes (specifically for the task of&nbsp;<a href="https://github.com/rrwick/Basecalling-comparison">testing basecallers</a>). It should in principle work for non-bacterial genomes as well, but I haven't tested it.</p><p>Address of the bookmark: <a href="https://github.com/rrwick/Rebaler" rel="nofollow">https://github.com/rrwick/Rebaler</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/42946/aligngraph2-similar-genome-assisted-reassembly-pipeline-for-pacbio-long-reads</guid>
	<pubDate>Sun, 14 Mar 2021 09:42:47 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/42946/aligngraph2-similar-genome-assisted-reassembly-pipeline-for-pacbio-long-reads</link>
	<title><![CDATA[AlignGraph2: similar genome-assisted reassembly pipeline for PacBio long reads]]></title>
	<description><![CDATA[<p><span>AlignGraph2 is the second version of&nbsp;</span><a href="https://github.com/baoe/AlignGraph">AlignGraph</a><span>&nbsp;for PacBio long reads. It extends and refines contigs assembled from the long reads with a published genome similar to the sequencing genome.</span></p>
<p><span>More at&nbsp;https://academic.oup.com/bib/advance-article-abstract/doi/10.1093/bib/bbab022/6146772</span></p><p>Address of the bookmark: <a href="https://github.com/huangs001/AlignGraph2" rel="nofollow">https://github.com/huangs001/AlignGraph2</a></p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/32129/lordec-a-hybrid-error-correction-program-for-long-pacbio-reads</guid>
	<pubDate>Mon, 10 Apr 2017 04:16:09 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/32129/lordec-a-hybrid-error-correction-program-for-long-pacbio-reads</link>
	<title><![CDATA[LoRDEC: a hybrid error correction program for long, PacBio reads]]></title>
	<description><![CDATA[<p>LoRDEC is a program to correct sequencing errors in long reads from 3rd generation sequencing with high error rate, and is especially intended for PacBio reads. It uses a hybrid strategy, meaning that it uses two sets of reads: the reference read set, whose error rate is assumed to be small, and the PacBio read set, which is then corrected using the reference set. Typically, the reference set contains Illumina reads.</p>
<p><br> Usually, errors in PacBio reads include many insertions and deletions, and comparatively fewer substitutions. LoRDEC can correct errors of all these types.<br> After correction, a larger portion of each PacBio read is usable for detecting regions of similarity with other sequences, for aligning the reads to the contigs of an assembly, etc.</p>
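<p>To make the hybrid idea more concrete, here is a minimal sketch in Python. It is not LoRDEC's implementation. It assumes toy reads, a tiny k, and two hypothetical helper names (<code>build_dbg</code>, <code>weak_positions</code>). It builds a plain de Bruijn graph from the short reads and flags the k-mers of a long read that the graph does not support, which is roughly where a corrector would go looking for an alternative path.</p>

```python
from collections import defaultdict


def build_dbg(reads, k):
    """Map each (k-1)-mer node to the set of (k-1)-mers that follow it."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def weak_positions(long_read, graph, k):
    """Start positions of long-read k-mers that have no edge in the graph."""
    weak = []
    for i in range(len(long_read) - k + 1):
        kmer = long_read[i:i + k]
        if kmer[1:] not in graph.get(kmer[:-1], set()):
            weak.append(i)
    return weak


# Toy stand-ins for the accurate short reads (assumed data, k kept tiny).
short_reads = ["ACGTACGGA", "GTACGGATT"]
graph = build_dbg(short_reads, k=4)

# A long read with one substituted base; prints [1, 2, 3, 4]
print(weak_positions("ACGTTCGGATT", graph, k=4))
```

<p>A single wrong base disturbs every k-mer that overlaps it, so unsupported k-mers show up as a contiguous run. Real data would need a much larger k and an abundance threshold on the short-read k-mers.</p>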
<p>Why is LoRDEC different?</p>
<ul>
<li>It is efficient and can process large read data sets, including those from eukaryotic or vertebrate species, on a standard computing server, and even works on desktop/laptop computers.</li>
<li>It adopts a novel graph-based approach: it builds a succinct De Bruijn Graph (DBG) representing the short reads, and seeks a corrective sequence for each erroneous region of a long read by traversing chosen paths in the graph.</li>
</ul><p>Address of the bookmark: <a href="http://www.atgc-montpellier.fr/lordec/" rel="nofollow">http://www.atgc-montpellier.fr/lordec/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/40460/sviper-swipe-your-structural-variants-called-on-long-ontpacbio-reads-with-short-exact-illumina-reads</guid>
	<pubDate>Sun, 22 Dec 2019 03:48:28 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/40460/sviper-swipe-your-structural-variants-called-on-long-ontpacbio-reads-with-short-exact-illumina-reads</link>
	<title><![CDATA[SViper: Swipe your Structural Variants called on long (ONT/PacBio) reads with short exact (Illumina) reads.]]></title>
	<description><![CDATA[<p>Call sviper</p>
<pre><code>~$ ./sviper -s short-reads.bam -l long-reads.bam -r ref.fa -c variants.vcf -o polished_variants
</code></pre>
<p>This will output a&nbsp;<code>polished_variants.vcf</code>&nbsp;file, that contains all the refined variants.</p>
<p>Sometimes it is helpful to look at the polished sequence, e.g. with the IGV browser. In that case you want SViper to output the polished and aligned sequences in a bam file via the option&nbsp;<code>--output-polished-bam</code>:</p>
<pre><code>~$ ./sviper -s short-reads.bam -l long-reads.bam -r ref.fa -c variants.vcf -o polished_variants --output-polished-bam</code></pre><p>Address of the bookmark: <a href="https://github.com/smehringer/SViper" rel="nofollow">https://github.com/smehringer/SViper</a></p>]]></description>
	<dc:creator>Neel</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/37840/long-read-assembly-workshop</guid>
	<pubDate>Thu, 04 Oct 2018 17:23:18 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/37840/long-read-assembly-workshop</link>
	<title><![CDATA[Long read assembly workshop !]]></title>
	<description><![CDATA[<p>This is a tutorial for a workshop on long-read (PacBio) genome assembly.</p>
<p>It demonstrates how to use long PacBio sequencing reads to assemble a bacterial genome, and includes additional steps for circularising, trimming, finding plasmids, and correcting the assembly with short-read Illumina data.</p>
<p>Please comment if you know of any other long-read assembly tutorials.</p><p>Address of the bookmark: <a href="http://sepsis-omics.github.io/tutorials/modules/cmdline_assembly_v2/" rel="nofollow">http://sepsis-omics.github.io/tutorials/modules/cmdline_assembly_v2/</a></p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/40994/biological-databases</guid>
	<pubDate>Wed, 12 Feb 2020 01:16:29 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/40994/biological-databases</link>
	<title><![CDATA[Biological databases !]]></title>
	<description><![CDATA[<p>Nowadays there are a lot of genomics databases available around the world. This bookmark collects their links in one place.</p>
<p>ftp://ftp.ncbi.nih.gov/genomes/</p>
<p>https://hgdownload.soe.ucsc.edu/downloads.html</p><p>Address of the bookmark: <a href="ftp://ftp.ncbi.nih.gov/genomes/" rel="nofollow">ftp://ftp.ncbi.nih.gov/genomes/</a></p>]]></description>
	<dc:creator>BioStar</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/43728/short-read-assembly-using-spades</guid>
	<pubDate>Mon, 31 Jan 2022 07:18:16 -0600</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/43728/short-read-assembly-using-spades</link>
	<title><![CDATA[Short-read assembly using Spades !]]></title>
	<description><![CDATA[<h2 id="short-read-assembly-a-comparison">If we only had Illumina reads, we could also assemble these using the tool Spades.</h2><p>You can try this here, or try it later on your own data.</p><h2 id="get-data">Get data</h2><p>We will use the same Illumina data as we used above:</p><ul>
<li>illumina_R1.fastq.gz: the Illumina forward reads</li>
<li>illumina_R2.fastq.gz: the Illumina reverse reads</li>
</ul><h2 id="assemble">Assemble</h2><p>Run Spades:</p><div><pre>spades.py -1 illumina_R1.fastq.gz -2 illumina_R2.fastq.gz --careful --cov-cutoff auto -o spades_assembly_all_illumina
</pre></div><ul>
<li><code>-1</code>&nbsp;is input file of forward reads</li>
<li><code>-2</code>&nbsp;is input file of reverse reads</li>
<li><code>--careful</code>&nbsp;minimizes mismatches and short indels</li>
<li><code>--cov-cutoff auto</code>&nbsp;computes the coverage threshold (rather than the default setting, &ldquo;off&rdquo;)</li>
<li><code>-o</code>&nbsp;is the output directory</li>
</ul><h2 id="results">Results</h2><p>Move into the output directory and look at the contigs:</p><div><pre>infoseq contigs.fasta</pre></div>]]></description>
	<dc:creator>Abhimanyu Singh</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/27076/ale-a-generic-assembly-likelihood-evaluation-framework-for-assessing-the-accuracy-of-genome-and-metagenome-assemblies</guid>
	<pubDate>Tue, 26 Apr 2016 03:38:43 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/27076/ale-a-generic-assembly-likelihood-evaluation-framework-for-assessing-the-accuracy-of-genome-and-metagenome-assemblies</link>
	<title><![CDATA[ALE: a Generic Assembly Likelihood Evaluation Framework for Assessing the Accuracy of Genome and Metagenome Assemblies]]></title>
	<description><![CDATA[<p>The Assembly Likelihood Evaluation (ALE) framework systematically evaluates the accuracy of an assembly in a reference-independent manner using rigorous statistical methods. This framework is comprehensive, and integrates read quality, mate pair orientation and insert length (for paired-end reads), sequencing coverage, read alignment and k-mer frequency. ALE pinpoints synthetic errors in both single and metagenomic assemblies, including single-base errors, insertions/deletions, genome rearrangements and chimeric assemblies presented in metagenomes. At the genome level with real-world data, ALE identifies three large misassemblies from the Spirochaeta smaragdinae finished genome, which were all independently validated by Pacific Biosciences sequencing. At the single-base level with Illumina data, ALE recovers 215 of 222 (97%) single nucleotide variants in a training set from a GC-rich Rhodobacter sphaeroides genome. Using real Pacific Biosciences data, ALE identifies 12 of 12 synthetic errors in a Lambda Phage genome, surpassing even Pacific Biosciences' own variant caller, EviCons. In summary, the ALE framework provides a comprehensive, reference-independent and statistically rigorous measure of single genome and metagenome assembly accuracy, which can be used to identify misassemblies or to optimize the assembly process.</p>
<p>More at&nbsp;http://www.ncbi.nlm.nih.gov/pubmed/23303509</p><p>Address of the bookmark: <a href="http://sc932.github.io/ALE/about.html" rel="nofollow">http://sc932.github.io/ALE/about.html</a></p>]]></description>
	<dc:creator>Neel</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/27438/hagfish-assess-an-assembly-through-creative-use-of-coverage-plots</guid>
	<pubDate>Fri, 20 May 2016 19:08:17 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/27438/hagfish-assess-an-assembly-through-creative-use-of-coverage-plots</link>
	<title><![CDATA[Hagfish - assess an assembly through creative use of coverage plots]]></title>
	<description><![CDATA[<p>Hagfish is a tool for the data analysis of Next Generation Sequencing (NGS) experiments. Hagfish builds on the concept of coverage plots and aims to assist, among other things, in quality control of&nbsp;<em style="font-size: 12.8px;">de novo</em>&nbsp;genome assembly or identification of structural variation in a genome re-sequencing experiment.</p>
<p>Hagfish requires a reference sequence and a&nbsp;<span>paired end</span>&nbsp;re-sequencing data set. Hagfish has more power the larger the insert size of the paired end library is.</p>
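<p>The idea behind a coverage plot can be sketched in a few lines of Python. This is an illustration only, not Hagfish code. It assumes that each read pair has already been mapped and reduced to a 0-based, half-open (start, end) span on the reference. The function name <code>coverage</code> and the example spans are hypothetical.</p>

```python
def coverage(ref_length, intervals):
    """Per-base coverage from 0-based, half-open (start, end) intervals."""
    # Difference array: +1 where a span begins, -1 just past where it ends.
    diff = [0] * (ref_length + 1)
    for start, end in intervals:
        diff[start] += 1
        diff[end] -= 1
    # A running sum over the difference array gives the depth at each base.
    depth, running = [], 0
    for delta in diff[:ref_length]:
        running += delta
        depth.append(running)
    return depth


# Hypothetical insert spans (leftmost to rightmost base of each read pair).
pairs = [(0, 6), (2, 8), (4, 10)]
print(coverage(10, pairs))  # prints [1, 1, 2, 2, 3, 3, 2, 2, 1, 1]
```

<p>Plotting such a depth vector along the reference is what lets dips, peaks and abrupt breaks stand out as candidate misassemblies or structural variants.</p>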
<p>Quick links:&nbsp;<a href="https://github.com/mfiers/hagfish/wiki/Install">Installation</a>,<a href="https://github.com/mfiers/hagfish/wiki/Operation">Operation</a>,&nbsp;<a href="https://github.com/mfiers/hagfish/wiki/ReadMappers">Read mappers</a>,&nbsp;<a href="https://github.com/mfiers/hagfish/wiki/Scripts">Hagfish scripts</a>,&nbsp;<a href="https://github.com/mfiers/hagfish/wiki/Plots">Hagfish plots</a></p><p>Address of the bookmark: <a href="https://github.com/mfiers/hagfish" rel="nofollow">https://github.com/mfiers/hagfish</a></p>]]></description>
	<dc:creator>Abhi</dc:creator>
</item>

</channel>
</rss>