<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/40940?offset=230</link>
	<atom:link href="https://bioinformaticsonline.com/related/40940?offset=230" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/32129/lordec-a-hybrid-error-correction-program-for-long-pacbio-reads</guid>
	<pubDate>Mon, 10 Apr 2017 04:16:09 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/32129/lordec-a-hybrid-error-correction-program-for-long-pacbio-reads</link>
	<title><![CDATA[LoRDEC: a hybrid error correction program for long, PacBio reads]]></title>
	<description><![CDATA[<p>LoRDEC is a program to correct sequencing errors in long reads from third-generation sequencing, which have a high error rate; it is especially intended for PacBio reads. It uses a hybrid strategy, meaning that it uses two sets of reads: the reference read set, whose error rate is assumed to be small, and the PacBio read set, which is then corrected using the reference set. Typically, the reference set contains Illumina reads.</p>
<p>Errors in PacBio reads are usually dominated by insertions and deletions, with comparatively fewer substitutions. LoRDEC can correct all of these error types.</p>
<p>After correction, a larger portion of each PacBio read is usable for detecting regions of similarity with other sequences, for aligning reads to the contigs of an assembly, and so on.</p>
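<p>To make the graph-based idea more concrete, the Python sketch below performs a toy version of hybrid correction with a de Bruijn graph built from short reads. It assumes a tiny k-mer size, ignores reverse complements, and treats the positions of the two "solid" anchor k-mers in the long read as already known. The function names are hypothetical and this is not LoRDEC's code: LoRDEC weighs candidate paths against the original long-read region, whereas this sketch simply takes the shortest bridging path found by breadth-first search.</p>

```python
from collections import deque


def build_kmer_set(short_reads, k):
    """Collect every k-mer in the short reads. These are the nodes of an
    implicit de Bruijn graph; edges join k-mers that overlap by k-1 bases."""
    kmers = set()
    for read in short_reads:
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    return kmers


def successors(kmer, kmers):
    """Yield graph neighbours: drop the first base, append A/C/G/T."""
    for base in "ACGT":
        nxt = kmer[1:] + base
        if nxt in kmers:
            yield nxt


def bridge(source, target, kmers, max_steps):
    """Breadth-first search from the source k-mer to the target k-mer.
    Returns the spelled sequence (source ... target) or None."""
    queue = deque([(source, source)])
    seen = {source}
    while queue:
        node, spelled = queue.popleft()
        if node == target:
            return spelled
        if len(spelled) - len(source) >= max_steps:
            continue  # path too long, do not extend it further
        for nxt in successors(node, kmers):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, spelled + nxt[-1]))
    return None


def correct_region(long_read, left, right, kmers, k, max_steps=50):
    """Replace the stretch between the solid k-mer starting at `left` and the
    solid k-mer starting at `right` with a path found in the graph."""
    source = long_read[left:left + k]
    target = long_read[right:right + k]
    path = bridge(source, target, kmers, max_steps)
    if path is None:
        return long_read  # no bridge found: leave the region uncorrected
    return long_read[:left] + path + long_read[right + k:]
```

<p>For example, with short reads covering ACGTTGCAAGGT and k = 4, the long read ACGTTGACAAGGT (one inserted A) is corrected back to ACGTTGCAAGGT by bridging from the solid k-mer ACGT to AGGT.</p>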
<p>Why is LoRDEC different?</p>
<ul>
<li>It is efficient and can process large read data sets, including those from eukaryotic species such as vertebrates, on a typical computing server, and even works on desktop or laptop computers.</li>
<li>It adopts a novel graph-based approach: it builds a succinct de Bruijn graph (DBG) representing the short reads, and seeks a corrective sequence for each erroneous region of a long read by traversing chosen paths in the graph.</li>
</ul><p>Address of the bookmark: <a href="http://www.atgc-montpellier.fr/lordec/" rel="nofollow">http://www.atgc-montpellier.fr/lordec/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/40893/quorum-an-error-corrector-for-illumina-reads</guid>
	<pubDate>Tue, 04 Feb 2020 23:26:55 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/40893/quorum-an-error-corrector-for-illumina-reads</link>
	<title><![CDATA[QuorUM: An Error Corrector for Illumina Reads]]></title>
	<description><![CDATA[<p><span>We produce trimmed and error-corrected reads that result in assemblies with longer contigs and fewer errors. We compared QuorUM against several published error correctors and found that it is the best performer in most metrics we use. QuorUM is efficiently implemented, making use of current multi-core computing architectures, and it is suitable for large data sets (1 billion bases checked and corrected per day per core).</span></p><p>Address of the bookmark: <a href="http://www.genome.umd.edu/" rel="nofollow">http://www.genome.umd.edu/</a></p>]]></description>
	<dc:creator>BioStar</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/44758/the-ifs-and-buts-of-ngs-quality-control-and-trimming</guid>
	<pubDate>Thu, 02 Jan 2025 20:11:07 -0600</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/44758/the-ifs-and-buts-of-ngs-quality-control-and-trimming</link>
	<title><![CDATA[The &quot;Ifs&quot; and &quot;Buts&quot; of NGS Quality Control and Trimming]]></title>
	<description><![CDATA[<p>Next-Generation Sequencing (NGS) has revolutionized biological research, providing vast amounts of data for a wide range of applications. However, the reliability of NGS analyses heavily depends on the quality of raw sequencing data. Quality control (QC) and trimming are critical preprocessing steps that can make or break your downstream analyses. In this blog, we explore the "ifs" (why you should perform QC and trimming) and the "buts" (challenges or considerations) of this vital step in NGS workflows.</p><h3><strong>The "Ifs" of NGS QC and Trimming</strong></h3><ol>
<li>
<p><strong>Ensures Data Integrity</strong><br />If you want to minimize errors in downstream analyses, QC and trimming remove low-quality reads and bases, ensuring high-confidence data. This step is essential for reliable variant calling, assembly, and other applications.</p>
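<p>As an illustration of quality trimming, the Python sketch below removes low-quality bases from the 3' end using the running-sum method from BWA, which the Cutadapt documentation describes for its own quality trimming. It assumes Phred+33 encoded qualities and a cutoff of 20. The function names are hypothetical, and real tools add many options on top of this.</p>

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string (Phred+33 by default) into integers."""
    return [ord(ch) - offset for ch in quality_string]


def quality_trim_index(quals, cutoff):
    """Return how many bases to keep after 3' quality trimming.

    Walking in from the 3' end, accumulate (cutoff - q). The cut goes where
    this running sum is largest; the walk stops once the sum turns negative,
    meaning the bases seen so far are good on balance."""
    running = 0
    best = 0
    keep = len(quals)
    for i in reversed(range(len(quals))):
        running += cutoff - quals[i]
        if running < 0:
            break
        if running > best:
            best = running
            keep = i
    return keep


def trim_read(seq, qual, cutoff=20):
    """Trim a read and its quality string together."""
    keep = quality_trim_index(phred_scores(qual), cutoff)
    return seq[:keep], qual[:keep]
```

<p>For example, <code>trim_read("ACGTACG", "IIIII##")</code> keeps "ACGTA", because the two trailing bases have quality 2.</p>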
</li>
<li>
<p><strong>Removes Contaminants</strong><br />If adapter sequences or contaminants are present in the raw reads, trimming can eliminate them. This prevents issues like misalignment or incorrect biological interpretations, ensuring cleaner data for analysis.</p>
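<p>A minimal sketch of 3' adapter removal in Python, using exact matching only: a full adapter occurrence cuts the read at that point, and otherwise a partial adapter at the very end of the read is removed. Production trimmers such as Cutadapt also tolerate mismatches and indels. The function name and the 3-base minimum overlap are assumptions; AGATCGGAAGAGC, the start of the common Illumina adapter, is used only as an example.</p>

```python
def trim_adapter(read, adapter, min_overlap=3):
    """Remove a 3' adapter from a read using exact matching.

    1. If the whole adapter occurs in the read, cut the read where it starts.
    2. Otherwise remove the longest read suffix that equals an adapter prefix,
       provided it is at least min_overlap bases long."""
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    longest = min(len(adapter) - 1, len(read))
    for overlap in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:overlap]):
            return read[:len(read) - overlap]
    return read
```

<p>Requiring a minimum overlap avoids trimming reads that end in one or two bases matching the adapter by chance.</p>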
</li>
<li>
<p><strong>Improves Mapping and Assembly</strong><br />If your goal is better alignment to a reference genome or improved de novo assembly, trimming low-quality bases and adapters is critical. High-quality reads map more efficiently and generate more accurate assemblies.</p>
</li>
<li>
<p><strong>Reduces Computational Load</strong><br />If you want to save computational resources, trimming reduces the dataset size, which speeds up processing and analysis. Clean datasets mean less computational time spent on processing low-quality data.</p>
</li>
<li>
<p><strong>Prepares for Standardized Analyses</strong><br />If your project involves multiple datasets, QC and trimming ensure uniformity across them. This standardization makes comparisons valid and reproducible, particularly in large collaborative studies.</p>
</li>
</ol><h3><strong>The "Buts" of NGS QC and Trimming</strong></h3><ol>
<li>
<p><strong>Risk of Over-Trimming</strong><br />But excessive trimming can lead to the loss of informative sequences, reducing read depth and potentially discarding biologically relevant data. This is especially critical in studies with limited sequencing depth.</p>
</li>
<li>
<p><strong>Bias Introduction</strong><br />But trimming algorithms might introduce biases, especially if they inadvertently remove sequences with specific biological patterns. This can skew results and compromise biological insights.</p>
</li>
<li>
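<p><strong>Loss of Context in Paired-End Reads</strong><br />But if trimming shortens one read of a pair below the minimum length and that read is discarded while its mate is kept, the pairing information is lost, and the R1 and R2 files can fall out of sync unless the tool processes both mates together. This complicates downstream analyses that rely on paired-end data, such as structural variant detection.</p>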
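<p>One way to keep pairs intact is to filter both mates together and route a surviving mate whose partner failed to a separate "unpaired" output, similar in spirit to the paired and unpaired outputs of Trimmomatic's paired-end mode. This Python sketch assumes reads are already trimmed and uses a hypothetical minimum length of 30.</p>

```python
def filter_pairs(pairs, min_len=30):
    """Apply a minimum-length filter without breaking read pairing.

    pairs: iterable of (read1, read2) sequences, already trimmed.
    Returns (kept, orphans): pairs where both mates pass stay together;
    a passing mate whose partner failed goes to `orphans`, so the paired
    R1/R2 output stays in sync."""
    kept, orphans = [], []
    for r1, r2 in pairs:
        ok1, ok2 = len(r1) >= min_len, len(r2) >= min_len
        if ok1 and ok2:
            kept.append((r1, r2))
        elif ok1:
            orphans.append(r1)
        elif ok2:
            orphans.append(r2)
    return kept, orphans
```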
</li>
<li>
<p><strong>Time and Resource Intensive</strong><br />But running QC and trimming for large datasets can be computationally expensive and time-consuming. As sequencing depth increases, preprocessing becomes a bottleneck in the analysis pipeline.</p>
</li>
<li>
<p><strong>Variable Standards</strong><br />But the criteria for trimming (e.g., quality threshold, minimum read length) can vary between tools and datasets. This variability may affect reproducibility and comparability of results across studies.</p>
</li>
</ol><h3><strong>Balancing the "Ifs" and "Buts"</strong></h3><p>To maximize the benefits of QC and trimming while mitigating the challenges, consider the following best practices:</p><ul>
<li>
<p><strong>Use QC Tools Wisely:</strong> Start with tools like <strong>FastQC</strong> to identify quality issues in your raw data. Visualizing quality metrics helps tailor your trimming parameters.</p>
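<p>The per-position quality profile that guides trimming parameters can be sketched in a few lines of Python. This is a simplified stand-in for FastQC's "Per base sequence quality" view: FastQC plots full distributions (and bins positions for long reads), whereas this hypothetical helper computes only the mean Phred+33 score at each position.</p>

```python
def per_position_mean_quality(qual_strings, offset=33):
    """Mean Phred score at each read position across many reads.
    Reads may differ in length; each position averages only the reads
    that reach it."""
    totals, counts = [], []
    for qual in qual_strings:
        for i, ch in enumerate(qual):
            if i == len(totals):  # first read long enough to reach position i
                totals.append(0)
                counts.append(0)
            totals[i] += ord(ch) - offset
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]
```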
</li>
<li>
<p><strong>Choose Reliable Trimming Tools:</strong> Tools like <strong>Trimmomatic</strong>, <strong>Cutadapt</strong>, and <strong>BBDuk</strong> offer adaptive and customizable trimming options. Select one that aligns with your dataset and project goals.</p>
</li>
<li>
<p><strong>Set Reasonable Parameters:</strong> Avoid over-trimming by setting quality thresholds and minimum read lengths that balance data retention and quality improvement.</p>
</li>
<li>
<p><strong>Test Downstream Effects:</strong> Validate the impact of QC and trimming on downstream analyses, such as alignment efficiency, variant calling accuracy, or assembly quality.</p>
</li>
<li>
<p><strong>Document Your Workflow:</strong> Maintain detailed records of the parameters and tools used for QC and trimming. This ensures reproducibility and enables better troubleshooting.</p>
</li>
</ul><h3><strong>Conclusion</strong></h3><p>NGS quality control and trimming are essential steps to ensure reliable and accurate data for analysis. While the "ifs" highlight the clear benefits of these steps, the "buts" remind us of the potential pitfalls. By adopting best practices and carefully balancing these considerations, you can optimize your preprocessing workflow and unlock the full potential of your sequencing data.</p>]]></description>
	<dc:creator>BioStar</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/31205/yasra-reference-based-assembler</guid>
	<pubDate>Wed, 01 Mar 2017 08:32:45 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/31205/yasra-reference-based-assembler</link>
	<title><![CDATA[YASRA: Reference based assembler]]></title>
	<description><![CDATA[<p>YASRA (Yet Another Short Read Assembler) performs comparative assembly of short reads using a reference genome, which can differ substantially from the genome being sequenced. Mapping reads to reference genomes makes use of LASTZ (Harris et al.), a pairwise sequence aligner compatible with BLASTZ. Special scoring sets were derived to improve performance, in both runtime and quality, for 454 and Illumina sequence reads.</p>
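<p>Once reads have been placed on a reference, a consensus for the sequenced genome can be called column by column. The Python sketch below shows a generic majority-vote pileup consensus; it is not YASRA's procedure, which works from gapped LASTZ alignments and is more involved. It assumes ungapped placements given as hypothetical (start, read) pairs.</p>

```python
from collections import Counter


def pileup_consensus(ref_len, placements, fill="N"):
    """Majority-vote consensus from ungapped read placements.

    placements: list of (start, read), where start is the 0-based reference
    position of the read's first base. Columns without coverage get `fill`."""
    columns = [Counter() for _ in range(ref_len)]
    for start, read in placements:
        for offset, base in enumerate(read):
            pos = start + offset
            if 0 <= pos < ref_len:  # ignore bases hanging off the reference
                columns[pos][base] += 1
    return "".join(
        col.most_common(1)[0][0] if col else fill for col in columns
    )
```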
<p>YASRA uses LASTZ to align the sequences to the reference genome (the released version is available from <a href="http://bx.psu.edu/miller_lab">http://bx.psu.edu/miller_lab</a> and newer versions from <a href="http://www.bx.psu.edu/%7Ersharris/lastz/newer">http://www.bx.psu.edu/~rsharris/lastz/newer</a>). Before installing YASRA, install LASTZ (the newest version from <a href="http://www.bx.psu.edu/%7Ersharris/lastz/newer">http://www.bx.psu.edu/~rsharris/lastz/newer</a>) and add the LASTZ binary to your executable search path (PATH).</p><p>Address of the bookmark: <a href="https://github.com/aakrosh/YASRA" rel="nofollow">https://github.com/aakrosh/YASRA</a></p>]]></description>
	<dc:creator>Abhimanyu Singh</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/41158/carefully-opt-for-human-reference-genome</guid>
	<pubDate>Tue, 18 Feb 2020 07:43:32 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/41158/carefully-opt-for-human-reference-genome</link>
	<title><![CDATA[Carefully opt for human reference genome]]></title>
	<description><![CDATA[<p><a href="http://lh3.github.io/2017/11/13/which-human-reference-genome-to-use" target="_blank">Heng Li described several issues with the human reference genomes distributed by various resources</a> and recommends the following compressed FASTA file as the hg38/GRCh38 human reference genome.</p>
<p>If you map reads to GRCh38 or hg38, use the following:</p>
<div>
<div>
<pre><code>ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz
</code></pre>
</div>
</div>
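<p>The <code>ucsc_ids</code> part of the path indicates UCSC-style sequence names (such as chr1) rather than GenBank accessions. A quick way to check what you actually downloaded is to list the FASTA headers; the Python helpers below are a hypothetical sketch that reads plain or gzip-compressed FASTA.</p>

```python
import gzip


def fasta_names(lines):
    """Return sequence names (text after '>' up to the first whitespace)."""
    return [line[1:].split()[0] for line in lines if line.startswith(">")]


def fasta_names_from_file(path):
    """Open a plain or .gz FASTA file in text mode and list its names."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return fasta_names(handle)
```

<p>Because this is the "no_alt" analysis set, you would expect no names ending in <code>_alt</code>, the UCSC suffix for ALT contigs.</p>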
<p>There are several other versions of GRCh37/GRCh38. What&rsquo;s wrong with them? Heng Li&rsquo;s post lists a collection of potential issues; read it in full at <a href="http://lh3.github.io/2017/11/13/which-human-reference-genome-to-use">http://lh3.github.io/2017/11/13/which-human-reference-genome-to-use</a>.</p><p>Address of the bookmark: <a href="http://lh3.github.io/2017/11/13/which-human-reference-genome-to-use" rel="nofollow">http://lh3.github.io/2017/11/13/which-human-reference-genome-to-use</a></p>]]></description>
	<dc:creator>biogeek</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/27333/satsuma-highly-sensitive-whole-genome-synteny-alignments</guid>
	<pubDate>Fri, 13 May 2016 05:25:26 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/27333/satsuma-highly-sensitive-whole-genome-synteny-alignments</link>
	<title><![CDATA[SATSUMA : Highly sensitive whole-genome synteny alignments.]]></title>
	<description><![CDATA[<p>Satsuma is a whole-genome synteny alignment program. It takes two genomes, computes alignments, and then keeps only the parts that are orthologous, i.e. following the conserved order and orientation of features, such as protein coding genes, non-coding genes, or neutral sequences. Satsuma does not require any pre-processing, such as repeat masking, since it will automatically detect ambiguous mappings.<br> <br> Satsuma has parallelization built-in and is designed to run on multi-core architectures. The run-time for aligning two bird-size genomes (~1.2 Gb) is around two days on 24 CPUs. <br> <br> You can find the manual <a href="http://satsuma.sourceforge.net/manual.html">here</a>.<br> Download the latest source code from <a href="https://sourceforge.net/projects/satsuma/">here.</a><br> Stable versions can also be downloaded from the <a href="https://www.broadinstitute.org/science/programs/genome-biology/spines">Broad Institute's</a> web site.<br> <br> An incomplete list of questions and answers (yes, these have really been asked by our users! Please feel free to add your own by e-mailing us) is <a href="http://satsuma.sourceforge.net/faq.html">here</a>.<br> <br> If you use Satsuma in your research, please cite:<br> <a href="http://bioinformatics.oxfordjournals.org/content/26/9/1145.long">Grabherr, M. G., Russell, P., Meyer, M., Mauceli, E., Alf&ouml;ldi, J., Di Palma, F., &amp; Lindblad-Toh, K. (2010). Genome-wide synteny through highly sensitive sequence alignment: Satsuma. Bioinformatics, 26(9), 1145-51</a>.</p>
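<p>The "conserved order" criterion can be illustrated with a generic collinearity filter: sort same-strand alignment blocks by their position in one genome, then keep the longest run whose positions in the other genome also increase (a longest-increasing-subsequence problem). This Python sketch is a textbook technique, not Satsuma's own procedure; the block coordinates are hypothetical.</p>

```python
from bisect import bisect_left


def collinear_chain(blocks):
    """Keep the largest subset of alignment blocks in conserved order.

    blocks: list of (query_start, target_start) for same-strand hits.
    Returns the blocks of a longest chain whose target positions strictly
    increase when ordered by query position."""
    blocks = sorted(blocks)
    tails = []     # tails[k]: smallest last target pos of a chain of length k+1
    tail_idx = []  # index into `blocks` of that chain's last block
    prev = [-1] * len(blocks)
    for i, (_, t) in enumerate(blocks):
        k = bisect_left(tails, t)
        if k == len(tails):
            tails.append(t)
            tail_idx.append(i)
        else:
            tails[k] = t
            tail_idx[k] = i
        prev[i] = tail_idx[k - 1] if k > 0 else -1
    chain = []
    i = tail_idx[-1] if tail_idx else -1
    while i != -1:  # walk predecessor links back from the longest chain
        chain.append(blocks[i])
        i = prev[i]
    return chain[::-1]
```

<p>Reverse-strand blocks would be handled separately, for example by negating their target coordinates so that a decreasing run becomes an increasing one.</p>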
<p><strong>Tutorial: <a href="http://evomics.org/learning/genomics/satsuma/">http://evomics.org/learning/genomics/satsuma/</a></strong></p><p>Address of the bookmark: <a href="http://satsuma.sourceforge.net/" rel="nofollow">http://satsuma.sourceforge.net/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/28119/kraken-ultrafast-metagenomic-sequence-classification-using-exact-alignments</guid>
	<pubDate>Mon, 27 Jun 2016 11:01:44 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/28119/kraken-ultrafast-metagenomic-sequence-classification-using-exact-alignments</link>
	<title><![CDATA[Kraken: ultrafast metagenomic sequence classification using exact alignments]]></title>
	<description><![CDATA[<p>Kraken is an ultrafast and highly accurate program for assigning taxonomic labels to metagenomic DNA sequences. Previous programs designed for this task have been relatively slow and computationally expensive, forcing researchers to use faster abundance estimation programs, which only classify small subsets of metagenomic data. Using exact alignment of <em>k</em>-mers, Kraken achieves classification accuracy comparable to the fastest BLAST program. In its fastest mode, Kraken classifies 100 base pair reads at a rate of over 4.1 million reads per minute, 909 times faster than Megablast and 11 times faster than the abundance estimation program MetaPhlAn. Kraken is available at <a href="http://ccb.jhu.edu/software/kraken/" target="pmc_ext">http://ccb.jhu.edu/software/kraken/</a>.</p>
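<p>The Kraken paper describes mapping every database k-mer to the lowest common ancestor (LCA) of the genomes that contain it, then assigning a read to the taxon at the end of the highest-weighted root-to-leaf path among the taxa its k-mers hit. The Python sketch below is a toy version of that scheme. It assumes a tiny k, a single-rooted parent-pointer taxonomy, no canonical (reverse-complement) k-mers, and a plain dictionary instead of Kraken's compact database; all names and data are hypothetical.</p>

```python
from collections import Counter
from functools import reduce


def root_path(taxon, parent):
    """Taxa from `taxon` up to the root (root's parent is None)."""
    path = []
    while taxon is not None:
        path.append(taxon)
        taxon = parent.get(taxon)
    return path


def lca(a, b, parent):
    """Lowest common ancestor of two taxa."""
    ancestors = set(root_path(a, parent))
    while b not in ancestors:
        b = parent[b]
    return b


def build_db(genomes, k, parent):
    """Map each k-mer to the LCA of all taxa whose genomes contain it."""
    db = {}
    for taxon, seq in genomes:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            db[kmer] = lca(db[kmer], taxon, parent) if kmer in db else taxon
    return db


def classify(read, db, k, parent):
    """Score each hit taxon by the k-mer hits on its path to the root and
    return the best one (the LCA of tied taxa), or None if nothing matched."""
    kmers = (read[i:i + k] for i in range(len(read) - k + 1))
    hits = Counter(db[km] for km in kmers if km in db)
    if not hits:
        return None
    scores = {t: sum(hits[a] for a in root_path(t, parent)) for t in hits}
    best = max(scores.values())
    tied = [t for t, s in scores.items() if s == best]
    return reduce(lambda a, b: lca(a, b, parent), tied)
```

<p>Because exact k-mer lookups replace alignment, the cost per read is a handful of hash-table queries, which is the source of the speed reported above.</p>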
<p>Classification results can be explored interactively with Krona: <a href="https://sourceforge.net/p/krona/home/krona/">https://sourceforge.net/p/krona/home/krona/</a></p><p>Address of the bookmark: <a href="http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4053813/" rel="nofollow">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4053813/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/33479/novelseq-novel-sequence-insertion-detection</guid>
	<pubDate>Fri, 09 Jun 2017 04:31:30 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/33479/novelseq-novel-sequence-insertion-detection</link>
	<title><![CDATA[NovelSeq: Novel Sequence Insertion Detection]]></title>
	<description><![CDATA[<p><span>The NovelSeq framework is designed to detect novel sequence insertions using high throughput paired-end whole genome sequencing data.</span></p>
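<p>Novel-insertion callers built on paired-end data, NovelSeq among them as we understand its paper, start from pairs in which neither mate maps to the reference (orphans, candidates for assembling the inserted sequence) and pairs in which only one mate maps (one-end anchored, which locate the insertion). The Python sketch below sorts SAM records into those groups using the standard FLAG bits 0x4 (segment unmapped) and 0x8 (mate unmapped); the function names and category labels are hypothetical.</p>

```python
READ_UNMAPPED = 0x4  # SAM FLAG bit: this segment is unmapped
MATE_UNMAPPED = 0x8  # SAM FLAG bit: the next segment in the template is unmapped


def pair_category(flag):
    """Categorise a paired-end SAM record by its FLAG value."""
    read_unmapped = bool(flag & READ_UNMAPPED)
    mate_unmapped = bool(flag & MATE_UNMAPPED)
    if read_unmapped and mate_unmapped:
        return "orphan"
    if read_unmapped or mate_unmapped:
        return "one_end_anchored"
    return "both_mapped"


def collect_insertion_evidence(records):
    """records: iterable of (read_name, flag). Returns read names by category."""
    groups = {"orphan": set(), "one_end_anchored": set(), "both_mapped": set()}
    for name, flag in records:
        groups[pair_category(flag)].add(name)
    return groups
```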
<p>Project page: <a href="http://novelseq.sourceforge.net/Home">http://novelseq.sourceforge.net/Home</a></p>
<p>Paper: <a href="https://www.ncbi.nlm.nih.gov/pubmed/20385726">https://www.ncbi.nlm.nih.gov/pubmed/20385726</a></p><p>Address of the bookmark: <a href="http://novelseq.sourceforge.net/Home" rel="nofollow">http://novelseq.sourceforge.net/Home</a></p>]]></description>
	<dc:creator>Neel</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/37257/asar-advanced-metagenomic-sequence-analysis-in-r</guid>
	<pubDate>Mon, 09 Jul 2018 05:20:50 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/37257/asar-advanced-metagenomic-sequence-analysis-in-r</link>
	<title><![CDATA[ASAR: Advanced metagenomic Sequence Analysis in R]]></title>
	<description><![CDATA[<p><span>An interactive data analysis tool for selection, aggregation and visualization of metagenomic data is presented. It offers functional analysis using the SEED hierarchy, as well as pathway diagrams based on KEGG Orthology, built on MG-RAST annotation results.</span></p>
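<p>Aggregation over a functional hierarchy amounts to summing per-function read counts up to a chosen level. ASAR itself is written in R; the Python sketch below only illustrates the idea, and its input format (a dictionary of counts and a dictionary of SEED-style level tuples) is a hypothetical stand-in, not ASAR's or MG-RAST's data format.</p>

```python
from collections import defaultdict


def aggregate_by_level(counts, hierarchy, level):
    """Sum per-function read counts up to one level of a functional hierarchy.

    counts: {function_name: read_count}
    hierarchy: {function_name: (level1, level2, level3)}
    Functions missing from the hierarchy are pooled under 'unclassified'."""
    totals = defaultdict(int)
    for function, n in counts.items():
        path = hierarchy.get(function)
        key = path[level] if path else "unclassified"
        totals[key] += n
    return dict(totals)
```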
<p><span>The manual is available at <a href="https://askarbek-orakov.github.io/ASAR/">https://askarbek-orakov.github.io/ASAR/</a></span></p><p>Address of the bookmark: <a href="https://github.com/Askarbek-orakov/ASAR" rel="nofollow">https://github.com/Askarbek-orakov/ASAR</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/41991/sequence-ontology-bioinformatics-analysis-soba-tool-to-provide-a-simple-statistical-and-graphical-summary-of-an-annotated-genome</guid>
	<pubDate>Wed, 22 Jul 2020 10:11:13 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/41991/sequence-ontology-bioinformatics-analysis-soba-tool-to-provide-a-simple-statistical-and-graphical-summary-of-an-annotated-genome</link>
	<title><![CDATA[Sequence Ontology Bioinformatics Analysis (SOBA) tool to provide a simple statistical and graphical summary of an annotated genome]]></title>
	<description><![CDATA[<p><span>We have developed the Sequence Ontology Bioinformatics Analysis (SOBA) tool to provide a simple statistical and graphical summary of an annotated genome. We envisage its use during annotation jamborees, in genome comparison, and by developers for rapid feedback during annotation software development and testing. SOBA also provides annotation consistency feedback to ensure correct use of terminology within annotations, and guides users to add new terms to the Sequence Ontology when required. SOBA is available at <a href="http://www.sequenceontology.org/cgi-bin/soba.cgi">http://www.sequenceontology.org/cgi-bin/soba.cgi</a>.</span></p>
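<p>One of the simplest summaries of this kind is a count of features by type (column 3 of a GFF3 file), together with a list of types that are not in an expected vocabulary. The Python sketch below assumes GFF3 input and a caller-supplied set of accepted Sequence Ontology term names; it is a hypothetical illustration, not SOBA's implementation.</p>

```python
from collections import Counter


def feature_type_counts(gff_lines):
    """Count features by type (column 3) in GFF3 text.

    Comment and directive lines starting with '#' are skipped, and parsing
    stops at an embedded ##FASTA section."""
    counts = Counter()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 9:  # a GFF3 feature line has nine columns
            counts[fields[2]] += 1
    return counts


def unrecognised_types(counts, known_terms):
    """Feature types not found in the accepted term set, sorted by name."""
    return sorted(t for t in counts if t not in known_terms)
```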
<p><span>More at <a href="https://pubmed.ncbi.nlm.nih.gov/20494974/">https://pubmed.ncbi.nlm.nih.gov/20494974/</a></span></p><p>Address of the bookmark: <a href="http://www.sequenceontology.org/cgi-bin/soba.cgi" rel="nofollow">http://www.sequenceontology.org/cgi-bin/soba.cgi</a></p>]]></description>
	<dc:creator>BioStar</dc:creator>
</item>

</channel>
</rss>