<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/37957?offset=30</link>
	<atom:link href="https://bioinformaticsonline.com/related/37957?offset=30" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/42633/protocol-for-de-novo-genome-assembly-using-illumina-reads</guid>
	<pubDate>Sat, 16 Jan 2021 21:42:11 -0600</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/42633/protocol-for-de-novo-genome-assembly-using-illumina-reads</link>
	<title><![CDATA[Protocol for De novo Genome Assembly using Illumina Reads]]></title>
	<description><![CDATA[<p>In this protocol, we address and describe the de novo assembly method for small to medium-sized genomes.</p><p><strong>What is de novo genome assembly?<br /></strong>The method of taking a large number of short DNA sequences and placing them back together to create a reflection of the original chromosomes from which the DNA originated relates to genome assembly. No previous knowledge of the source DNA sequence length, structure or composition is inferred by De novo genome assemblies. The DNA of the target organism is split up into millions of tiny parts and read on a sequencing computer in a genome sequencing experiment. Depending on the sequencing system used, these "reads" range from 20 to 1000 nucleotide base pairs (bp) in length. Usually, length reads of 36 - 150 bp are produced for Illumina style short read sequencing. These reads can be either &ldquo;single ended&rdquo; as described above or &ldquo;paired end.&rdquo;</p><p><strong>Why genome assembly?</strong><br />In basic research into why and how they live, as well as in applied topics, identifying the DNA sequence of an organism is useful. Awareness of a DNA sequence may be useful in virtually any biological research because of the relevance of DNA to living things. For example, it may be used in medicine to classify, diagnose and eventually improve genetic disorder therapies. Similarly, pathogens study can lead to treatments for infectious diseases.</p><p><strong>Raw NGS data</strong><br />Reads can be saved as a Fasta file as text or in a FastQ file with their attributes.&nbsp;FastQ is the most common read file format since this is what the Illumina sequencing pipeline creates. This will henceforth be the subject of our conversation.</p><p><strong>In a nutshell the protocol:</strong> <br />Get the sequence file(s) read from the sequencing machine (s). <br />Look at the readings - have an idea of what you have and what the standard is like. <br />If required, raw data cleanup/quality trimming. <br />Choose an adequate parameter set for assembly. <br />Assemble the data into scaffolds/contigs. <br />Examine the assembly performance and determine the efficiency of the assembly.</p><p><strong>Read Quality Control:</strong><br />Check the qualiy with fastQC.<br />Script<br />https://bioinformaticsonline.com/snippets/view/42540/install-fastqc-using-conda</p><p>Quality trimming/cleanup of read files.<br />This function trims adapters, barcodes and other contaminants from the reads.<br />Script<br />https://bioinformaticsonline.com/snippets/view/42542/trimmomatic-command</p><p><strong>Genome Assembly:</strong><br />The object of this portion of the protocol is to explain the method of assembling the reads trimmed by quality into draft contigs.</p><blockquote><p>spades.py -1 illumina_R1.fastq.gz -2 illumina_R2.fastq.gz --careful --cov-cutoff auto -o result_of_spades_assembly_all_illumina</p></blockquote><p>A significant range of short-read assemblers are available. Everyone with strengths and disadvantages of their own. <br /><em>Some of the assemblers available include:</em><br />Velvet<br />SOAP-denovo<br />MIRA<br />ALLPATHS</p><p>Next step is to assess the suitability and what to do with a draft package of contiguous details for the remainder of the study now.&nbsp;Few stuff you can note about the contigs you just created:&nbsp;They're the draft Contigs. Any mis-assemblies can occur.</p><p><strong>Mis-assembly checking and assembly metric tools:</strong><br />QUAST - Quality assessment tool for genome assembly http://bioinf.spbau.ru/quast<br />Mauve assembly metrics - http://code.google.com/p/ngopt/wiki/How_To_Score_Genome_Assemblies_with_Mauve<br />InGAP-SV - https://sites.google.com/site/nextgengenomics/ingap and http://ingap.sourceforge.net/<br />inGAP is also useful for finding structural variants between genomes from read mappings.</p><p><strong>Genome finishing tools:</strong><br />Semi-automated gap fillers:<br />Gap filler - http://www.baseclear.com/landingpages/basetools-a-wide-range-of-bioinformatics-solutions/gapfiller/</p><p>IMAGE (V2) - http://sourceforge.net/apps/mediawiki/image2/index.php?title=Main_Page</p><p><strong>Genome visualisers and editors:</strong><br />Artemis - http://www.sanger.ac.uk/resources/software/artemis/<br />IGV - http://www.broadinstitute.org/igv/</p><p><strong>Automated and semi automated annotation tools:</strong><br />Prokka - https://github.com/tseemann/prokka<br />RAST - http://www.nmpdr.org/FIG/wiki/view.cgi/FIG/RapidAnnotationServer<br />JCVI Annotation Service - http://www.jcvi.org/cms/research/projects/annotation-service/</p><p><strong>Frequent command use for the analysis are at:</strong></p><p>https://bioinformaticsonline.com/blog/view/38765/list-of-tools-frequently-used-while-genome-assembly<br />https://bioinformaticsonline.com/pages/view/42275/frequent-parameters-for-bioinformatics-tools</p>]]></description>
	<dc:creator>BioStar</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/38055/ancestral-genomes-a-resource-for-reconstructed-ancestral-genes-and-genomes-across-the-tree-of-life</guid>
	<pubDate>Fri, 02 Nov 2018 08:16:27 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/38055/ancestral-genomes-a-resource-for-reconstructed-ancestral-genes-and-genomes-across-the-tree-of-life</link>
	<title><![CDATA[Ancestral Genomes: a resource for reconstructed ancestral genes and genomes across the tree of life]]></title>
	<description><![CDATA[<p><span>&nbsp;Ancestral Genomes (</span><a href="http://ancestralgenomes.org/" target="">http://ancestralgenomes.org</a><span>) is a resource for comprehensive reconstructions of these &lsquo;fossil genomes&rsquo;. Comprehensive sets of protein-coding genes have been reconstructed for 78 genomes of now-extinct species that were the common ancestors of extant species from across the tree of life.&nbsp;</span></p><p>Address of the bookmark: <a href="http://ancestralgenomes.org/" rel="nofollow">http://ancestralgenomes.org/</a></p>]]></description>
	<dc:creator>Abhimanyu Singh</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/35619/tallymer-method-to-compute-k-mer-frequencies-and-its-application-to-annotate-large-repetitive-plant-genomes</guid>
	<pubDate>Thu, 15 Feb 2018 10:21:02 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/35619/tallymer-method-to-compute-k-mer-frequencies-and-its-application-to-annotate-large-repetitive-plant-genomes</link>
	<title><![CDATA[Tallymer: method to compute K-mer frequencies and its application to annotate large repetitive plant genomes]]></title>
	<description><![CDATA[<p>Tallymer is based on enhanced suffix arrays. This gives a much larger flexibility concerning the choice of the&nbsp;<span>k</span>-mer size. Tallymer can process large data sizes of several billion bases. We used it in a variety of applications to study the genomes of maize and other plant species. In particular, Tallymer was used to index a set whole genome shotgun sequences from maize (B73) (total size 10<sup>9</sup>&nbsp;bp).&nbsp;<br>Tallymer was effective in a variety of applications to aid genome annotation in maize, despite limitations imposed by the relatively low coverage of sequence available.</p>
<p>A manual can be found&nbsp;<a href="https://www.zbh.uni-hamburg.de/fileadmin/gi/tallymer/tallymer.pdf" target="_blank" title="tallymer.pdf (111 KB)">here</a>.</p><p>Address of the bookmark: <a href="https://www.zbh.uni-hamburg.de/forschung/arbeitsgruppe-genominformatik/software/tallymer.html" rel="nofollow">https://www.zbh.uni-hamburg.de/forschung/arbeitsgruppe-genominformatik/software/tallymer.html</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/36800/genomemapper-simultaneous-alignment-of-short-reads-against-multiple-genomes</guid>
	<pubDate>Fri, 25 May 2018 09:29:44 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/36800/genomemapper-simultaneous-alignment-of-short-reads-against-multiple-genomes</link>
	<title><![CDATA[GenomeMapper: Simultaneous alignment of short reads against multiple genomes]]></title>
	<description><![CDATA[GenomeMapper is a short read mapping tool designed for accurate read alignments. It quickly aligns millions of reads either with ungapped or gapped alignments. It can be used to align against multiple genomes simulanteously or against a single reference. If you are unsure which one is the appropriate GenomeMapper, you might want to use the latter

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2768987/<p>Address of the bookmark: <a href="http://1001genomes.org/software/genomemapper.html" rel="nofollow">http://1001genomes.org/software/genomemapper.html</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/41592/refka-a-fast-and-efficient-long-read-genome-assembly-approach-for-large-and-complex-genomes</guid>
	<pubDate>Fri, 01 May 2020 03:00:40 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/41592/refka-a-fast-and-efficient-long-read-genome-assembly-approach-for-large-and-complex-genomes</link>
	<title><![CDATA[RefKA: A fast and efficient long-read genome assembly approach for large and complex genomes]]></title>
	<description><![CDATA[<p><span>RefKA, a reference-based approach for long read genome assembly. This approach relies on breaking up a closely related reference genome into bins, aligning k-mers unique to each bin with PacBio reads, and then assembling each bin in parallel followed by a final bin-stitching step.</span></p>
<p>&nbsp;</p><p>Address of the bookmark: <a href="https://github.com/AppliedBioinformatics/RefKA" rel="nofollow">https://github.com/AppliedBioinformatics/RefKA</a></p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/36884/halc-high-throughput-algorithm-for-long-read-error-correction</guid>
	<pubDate>Fri, 08 Jun 2018 10:47:41 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/36884/halc-high-throughput-algorithm-for-long-read-error-correction</link>
	<title><![CDATA[HALC: High throughput algorithm for long read error correction]]></title>
	<description><![CDATA[HALC, a high throughput algorithm for long read error correction. HALC aligns the long reads to short read contigs from the same species with a relatively low identity requirement so that a long read region can be aligned to at least one contig region, including its true genome region’s repeats in the contigs sufficiently similar to it (similar repeat based alignment approach)

HALC was able to obtain 6.7-41.1% higher throughput than the existing algorithms while maintaining comparable accuracy. The HALC corrected long reads can thus result in 11.4-60.7% longer assembled contigs than the existing algorithms.<p>Address of the bookmark: <a href="https://github.com/lanl001/halc" rel="nofollow">https://github.com/lanl001/halc</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/37643/lorma-a-tool-for-correcting-sequencing-errors-in-long-reads</guid>
	<pubDate>Thu, 06 Sep 2018 16:21:01 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/37643/lorma-a-tool-for-correcting-sequencing-errors-in-long-reads</link>
	<title><![CDATA[LoRMA: A tool for correcting sequencing errors in long reads]]></title>
	<description><![CDATA[<p><span>An error correction method that uses long reads only. The method consists of two phases: first, we use an iterative alignment-free correction method based on de Bruijn graphs with increasing length of&nbsp;</span><em>k</em><span>-mers, and second, the corrected reads are further polished using long-distance dependencies that are found using multiple alignments. According to our experiments, the proposed method is the most accurate one relying on long reads only for read sets with high coverage. Furthermore, when the coverage of the read set is at least 75&times;, the throughput of the new method is at least 20% higher.</span></p>
<blockquote>
<p><span>conda install -c atgc-montpellier lorma</span></p>
</blockquote><p>Address of the bookmark: <a href="https://gite.lirmm.fr/lorma/lorma-releases/wikis/home" rel="nofollow">https://gite.lirmm.fr/lorma/lorma-releases/wikis/home</a></p>]]></description>
	<dc:creator>Abhimanyu Singh</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/40792/haslr-a-tool-for-rapid-genome-assembly-of-long-sequencing-reads</guid>
	<pubDate>Fri, 31 Jan 2020 05:50:15 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/40792/haslr-a-tool-for-rapid-genome-assembly-of-long-sequencing-reads</link>
	<title><![CDATA[HASLR: a tool for rapid genome assembly of long sequencing reads]]></title>
	<description><![CDATA[<p><span>HASLR is a tool for rapid genome assembly of long sequencing reads. HASLR is a hybrid tool which means it requires long reads generated by Third Generation Sequencing technologies (such as PacBio or Oxford Nanopore) together with Next Generation Sequencing reads (such as Illumina) from the same sample.&nbsp;</span></p><p>Address of the bookmark: <a href="https://github.com/vpc-ccg/haslr" rel="nofollow">https://github.com/vpc-ccg/haslr</a></p>]]></description>
	<dc:creator>LEGE</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/35055/jabba-hybrid-error-correction-for-long-sequencing-reads</guid>
	<pubDate>Fri, 05 Jan 2018 03:58:14 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/35055/jabba-hybrid-error-correction-for-long-sequencing-reads</link>
	<title><![CDATA[Jabba: Hybrid Error Correction for Long Sequencing Reads]]></title>
	<description><![CDATA[<p>Jabba is a hybrid error correction tool to correct third generation (PacBio / ONT) sequencing data, using second generation (Illumina) data.</p>
<p>Input</p>
<p>Jabba takes as input a concatenated de Bruijn graph and a set of sequences:</p>
<p>the de Bruijn graph should appear in fasta format with 1 entry per node, the meta information should be in the format:<br>&gt;NODE <br>the set of sequences should be in fasta or fastq format. These sequences will be corrected (e.g. PacBio reads). The corrections will be written to a file Jabba fasta.<br>The output is a file in fasta format with corrections of the long reads, and additionally a file in the input format containing uncorrected reads.</p>
<p>https://github.com/biointec/jabba/wiki</p>
<p>https://almob.biomedcentral.com/articles/10.1186/s13015-016-0075-7</p><p>Address of the bookmark: <a href="https://github.com/biointec/jabba" rel="nofollow">https://github.com/biointec/jabba</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/26453/stacks</guid>
	<pubDate>Wed, 24 Feb 2016 15:52:30 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/26453/stacks</link>
	<title><![CDATA[Stacks]]></title>
	<description><![CDATA[<p>Stacks is a software pipeline for building loci from short-read sequences, such as those generated on the Illumina platform. Stacks was developed to work with restriction enzyme-based data, such as RAD-seq, for the purpose of building genetic maps and conducting population genomics and phylogeography.</p>
<p>More at http://catchenlab.life.illinois.edu/stacks/</p><p>Address of the bookmark: <a href="http://catchenlab.life.illinois.edu/stacks/" rel="nofollow">http://catchenlab.life.illinois.edu/stacks/</a></p>]]></description>
	<dc:creator>Jitendra Narayan</dc:creator>
</item>

</channel>
</rss>