<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/36867?offset=280</link>
	<atom:link href="https://bioinformaticsonline.com/related/36867?offset=280" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/44758/the-ifs-and-buts-of-ngs-quality-control-and-trimming</guid>
	<pubDate>Thu, 02 Jan 2025 20:11:07 -0600</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/44758/the-ifs-and-buts-of-ngs-quality-control-and-trimming</link>
	<title><![CDATA[The &quot;Ifs&quot; and &quot;Buts&quot; of NGS Quality Control and Trimming]]></title>
	<description><![CDATA[<p>Next-Generation Sequencing (NGS) has revolutionized biological research, providing vast amounts of data for a wide range of applications. However, the reliability of NGS analyses heavily depends on the quality of raw sequencing data. Quality control (QC) and trimming are critical preprocessing steps that can make or break your downstream analyses. In this blog, we explore the "ifs" (why you should perform QC and trimming) and the "buts" (challenges or considerations) of this vital step in NGS workflows.</p><h3><strong>The "Ifs" of NGS QC and Trimming</strong></h3><ol>
<li>
<p><strong>Ensures Data Integrity</strong><br />If you want to minimize errors in downstream analyses, QC and trimming remove low-quality reads and bases, ensuring high-confidence data. This step is essential for reliable variant calling, assembly, and other applications.</p>
</li>
<li>
<p><strong>Removes Contaminants</strong><br />If adapter sequences or contaminants are present in the raw reads, trimming can eliminate them. This prevents issues like misalignment or incorrect biological interpretations, ensuring cleaner data for analysis.</p>
</li>
<li>
<p><strong>Improves Mapping and Assembly</strong><br />If your goal is better alignment to a reference genome or improved de novo assembly, trimming low-quality bases and adapters is critical. High-quality reads map more efficiently and generate more accurate assemblies.</p>
</li>
<li>
<p><strong>Reduces Computational Load</strong><br />If you want to save computational resources, trimming reduces the dataset size, which speeds up processing and analysis. Clean datasets mean less computational time spent on processing low-quality data.</p>
</li>
<li>
<p><strong>Prepares for Standardized Analyses</strong><br />If your project involves multiple datasets, QC and trimming ensure uniformity across them. This standardization makes comparisons valid and reproducible, particularly in large collaborative studies.</p>
</li>
</ol><h3><strong>The "Buts" of NGS QC and Trimming</strong></h3><ol>
<li>
<p><strong>Risk of Over-Trimming</strong><br />But excessive trimming can lead to the loss of informative sequences, reducing read depth and potentially discarding biologically relevant data. This is especially critical in studies with limited sequencing depth.</p>
</li>
<li>
<p><strong>Bias Introduction</strong><br />But trimming algorithms might introduce biases, especially if they inadvertently remove sequences with specific biological patterns. This can skew results and compromise biological insights.</p>
</li>
<li>
<p><strong>Loss of Context in Paired-End Reads</strong><br />But if trimming shortens one read of a pair so much that it is discarded, its mate is left orphaned and the pairing information is lost. This complicates downstream analyses that rely on paired-end data, such as structural variant detection.</p>
</li>
<li>
<p><strong>Time and Resource Intensive</strong><br />But running QC and trimming for large datasets can be computationally expensive and time-consuming. As sequencing depth increases, preprocessing becomes a bottleneck in the analysis pipeline.</p>
</li>
<li>
<p><strong>Variable Standards</strong><br />But the criteria for trimming (e.g., quality threshold, minimum read length) can vary between tools and datasets. This variability may affect reproducibility and comparability of results across studies.</p>
</li>
</ol><h3><strong>Balancing the "Ifs" and "Buts"</strong></h3><p>To maximize the benefits of QC and trimming while mitigating the challenges, consider the following best practices:</p><ul>
<li>
<p><strong>Use QC Tools Wisely:</strong> Start with tools like <strong>FastQC</strong> to identify quality issues in your raw data. Visualizing quality metrics helps tailor your trimming parameters.</p>
</li>
<li>
<p><strong>Choose Reliable Trimming Tools:</strong> Tools like <strong>Trimmomatic</strong>, <strong>Cutadapt</strong>, and <strong>BBDuk</strong> offer adaptive and customizable trimming options. Select one that aligns with your dataset and project goals.</p>
</li>
<li>
<p><strong>Set Reasonable Parameters:</strong> Avoid over-trimming by setting quality thresholds and minimum read lengths that balance data retention and quality improvement.</p>
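<p>As a rough illustration of how a quality threshold and a minimum read length interact, here is a minimal Python sketch. It assumes Phred+33 encoded qualities and a simple "trim from the 3' end" rule; the function names and the default values (20 and 36) are hypothetical choices for this example, not the algorithm or defaults of any particular trimming tool.</p>

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def trim_3prime(sequence, quality_string, min_quality=20, min_length=36):
    """Trim low-quality bases from the 3' end, then apply a length filter.

    Returns (sequence, quality_string) for the trimmed read, or None when
    the trimmed read is shorter than min_length and should be discarded.
    Both thresholds are illustrative assumptions.
    """
    scores = phred_scores(quality_string)
    end = len(scores)
    # Walk back from the 3' end while the base quality is below the threshold.
    while end > 0 and min_quality > scores[end - 1]:
        end -= 1
    # Discard reads that became too short to be useful.
    if min_length > end:
        return None
    return sequence[:end], quality_string[:end]
```

<p>For example, <code>trim_3prime("ACGTACGTAC", "IIIIIIII##", min_length=5)</code> drops the two trailing low-quality bases, while raising <code>min_length</code> to 9 discards the read entirely. This is how an overly strict setting erodes read depth.</p>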
</li>
<li>
<p><strong>Test Downstream Effects:</strong> Validate the impact of QC and trimming on downstream analyses, such as alignment efficiency, variant calling accuracy, or assembly quality.</p>
</li>
<li>
<p><strong>Document Your Workflow:</strong> Maintain detailed records of the parameters and tools used for QC and trimming. This ensures reproducibility and enables better troubleshooting.</p>
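<p>One lightweight way to keep such records is to write the settings of each step to a small JSON file next to the results. The sketch below is only an assumption about what such a record might contain; the function names, field names and file layout are hypothetical and should be adapted to your own pipeline.</p>

```python
import json
from datetime import datetime, timezone


def make_qc_record(tool, version, parameters, input_files):
    """Collect the settings of one QC or trimming step in a plain dict."""
    return {
        "tool": tool,
        "version": version,
        "parameters": dict(parameters),
        "input_files": list(input_files),
        # Timestamp in UTC so records from different machines are comparable.
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


def save_qc_record(record, path):
    """Write the record as indented, key-sorted JSON so it can be diffed."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(record, handle, indent=2, sort_keys=True)
        handle.write("\n")
```

<p>A call such as <code>save_qc_record(make_qc_record("example-trimmer", "0.0.0", {"min_quality": 20, "min_length": 36}, ["sample_R1.fastq.gz"]), "qc_record.json")</code> would then leave a small file that can be committed alongside the analysis scripts. The tool name and version here are placeholders.</p>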
</li>
</ul><h3><strong>Conclusion</strong></h3><p>NGS quality control and trimming are essential steps to ensure reliable and accurate data for analysis. While the "ifs" highlight the clear benefits of these steps, the "buts" remind us of the potential pitfalls. By adopting best practices and carefully balancing these considerations, you can optimize your preprocessing workflow and unlock the full potential of your sequencing data.</p>]]></description>
	<dc:creator>BioStar</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/39104/hipstr-haplotype-inference-and-phasing-for-short-tandem-repeats</guid>
	<pubDate>Thu, 07 Mar 2019 21:13:06 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/39104/hipstr-haplotype-inference-and-phasing-for-short-tandem-repeats</link>
	<title><![CDATA[HipSTR: Haplotype inference and phasing for Short Tandem Repeats]]></title>
	<description><![CDATA[<p><span>HipSTR</span>&nbsp;was specifically developed to deal with the errors that affect STR genotyping, such as PCR stutter, in the hope of obtaining more robust STR genotypes. In particular, it accomplishes this by:</p>
<ol>
<li>Learning locus-specific PCR stutter models using an&nbsp;<a href="http://en.wikipedia.org/wiki/Expectation-maximization_algorithm">EM algorithm</a></li>
<li>Mining candidate STR alleles from population-scale sequencing data</li>
<li>Employing a specialized hidden Markov model to align reads to candidate alleles while accounting for STR artifacts</li>
<li>Utilizing phased SNP haplotypes to genotype and phase STRs</li>
</ol><p>Address of the bookmark: <a href="https://github.com/tfwillems/HipSTR" rel="nofollow">https://github.com/tfwillems/HipSTR</a></p>]]></description>
	<dc:creator>BioJoker</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/34398/ont-assembly-and-illumina-polishing-pipeline</guid>
	<pubDate>Thu, 23 Nov 2017 10:13:42 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/34398/ont-assembly-and-illumina-polishing-pipeline</link>
	<title><![CDATA[ONT assembly and Illumina polishing pipeline]]></title>
	<description><![CDATA[<p>This pipeline performs the following steps:</p>
<ul>
<li>Assembly of nanopore reads using&nbsp;<a href="http://canu.readthedocs.io/">Canu</a>.</li>
<li>Polish canu contigs using&nbsp;<a href="https://github.com/isovic/racon">racon</a>&nbsp;(<em>optional</em>).</li>
<li>Map a paired-end Illumina dataset onto the contigs obtained in the previous steps using&nbsp;<a href="http://bio-bwa.sourceforge.net/">BWA</a>&nbsp;mem.</li>
<li>Perform correction of contigs using&nbsp;<a href="https://github.com/broadinstitute/pilon/wiki">pilon</a>&nbsp;and the Illumina dataset.</li>
</ul><p>Address of the bookmark: <a href="https://github.com/nanoporetech/ont-assembly-polish" rel="nofollow">https://github.com/nanoporetech/ont-assembly-polish</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/34501/dnapipete-de-novo-assembly-annotation-pipeline-for-transposable-elements</guid>
	<pubDate>Sat, 02 Dec 2017 18:25:44 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/34501/dnapipete-de-novo-assembly-annotation-pipeline-for-transposable-elements</link>
	<title><![CDATA[dnaPipeTE: de-novo assembly &amp; annotation Pipeline for Transposable Elements]]></title>
	<description><![CDATA[<p>dnaPipeTE (for de-novo assembly &amp; annotation Pipeline for Transposable Elements) is a pipeline designed to find, annotate and quantify Transposable Elements in small samples of NGS datasets. It is particularly useful for quantifying the proportion of TEs in newly sequenced genomes, since it does not require genome assembly and works on small datasets (&lt; 1X).</p>
<ul>
<li>
<p>dnaPipeTE is developed by Cl&eacute;ment Goubert, Laurent Modolo and the TREEP team of the LBBE:&nbsp;<a href="http://lbbe.univ-lyon1.fr/-Equipe-Elements-transposables-.html?lang=en">http://lbbe.univ-lyon1.fr/-Equipe-Elements-transposables-.html?lang=en</a></p>
</li>
<li>
<p>You can find the original publication in GBE here:&nbsp;<a href="https://academic.oup.com/gbe/article/7/4/1192/533768">https://academic.oup.com/gbe/article/7/4/1192/533768</a></p>
</li>
</ul>
<p><a href="https://github.com/clemgoub/dnaPipeTE/blob/dev/dnaPipefront.png" target="_blank"><img src="https://github.com/clemgoub/dnaPipeTE/raw/dev/dnaPipefront.png" alt="Front" style="border: 0px;"></a><em>output examples of quantification and TE landscape (relative age) produced by dnaPipeTE</em></p>
<p>Address of the bookmark: <a href="https://github.com/clemgoub/dnaPipeTE" rel="nofollow">https://github.com/clemgoub/dnaPipeTE</a></p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/34914/ra-assembler-a-de-novo-dna-assembler-for-third-generation-sequencing-data</guid>
	<pubDate>Wed, 27 Dec 2017 20:36:54 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/34914/ra-assembler-a-de-novo-dna-assembler-for-third-generation-sequencing-data</link>
	<title><![CDATA[Ra assembler - a de novo DNA assembler for third generation sequencing data]]></title>
	<description><![CDATA[<p>Integration of the Ra assembler, a de novo DNA assembler for third generation sequencing data developed at the Faculty of Electrical Engineering and Computing (FER), the Ruder Boskovic Institute (RBI) and the Genome Institute of Singapore (GIS).</p>
<p>Ra has been in development since 2014 in the form of several separate components that used to be run individually.<br>This project aims to make Ra easier to use by integrating the components into a complete de novo assembly tool.</p>
<p>Unlike other state-of-the-art assemblers,&nbsp;<span>Ra does not have an error correction step.</span>&nbsp;Instead, it relies on detecting overlaps using a very sensitive and specific overlapper ("graphmap -w owler",&nbsp;<a href="https://github.com/isovic/graphmap">https://github.com/isovic/graphmap</a>) and constructing and reducing an overlap graph (Ra layout,&nbsp;<a href="https://github.com/mariokostelac/ra">https://github.com/mariokostelac/ra</a>).</p><p>Address of the bookmark: <a href="https://github.com/mariokostelac/ra-integrate/" rel="nofollow">https://github.com/mariokostelac/ra-integrate/</a></p>]]></description>
	<dc:creator>biogeek</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/36257/aligngraph-algorithm-for-secondary-de-novo-genome-assembly-guided-by-closely-related-references</guid>
	<pubDate>Tue, 17 Apr 2018 16:21:20 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/36257/aligngraph-algorithm-for-secondary-de-novo-genome-assembly-guided-by-closely-related-references</link>
	<title><![CDATA[AlignGraph: algorithm for secondary de novo genome assembly guided by closely related references]]></title>
	<description><![CDATA[<p>AlignGraph is software that extends and joins contigs or scaffolds by reassembling them with the help of a reference genome from a closely related organism.</p>
<p>Using AlignGraph</p>
<pre><code>AlignGraph --read1 reads_1.fa --read2 reads_2.fa --contig contigs.fa --genome genome.fa --distanceLow distanceLow --distanceHigh distanceHigh --extendedContig extendedContigs.fa --remainingContig remainingContigs.fa [--kMer k --insertVariation insertVariation --coverage coverage --part p --fastMap --ratioCheck --iterativeMap --misassemblyRemoval --resume]</code></pre>
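<p>When AlignGraph is called from a script, it can help to assemble this long command line programmatically. The following is a minimal Python sketch that only uses flags shown in the usage line above. The function name and its arguments are hypothetical, only two of the optional flags are exposed, and the function builds the argument list without running anything.</p>

```python
import shlex


def build_aligngraph_command(read1, read2, contig, genome,
                             distance_low, distance_high,
                             extended_contig, remaining_contig,
                             k_mer=None, fast_map=False):
    """Return an AlignGraph invocation as a list of arguments.

    Only flags that appear in the usage line are emitted. The caller is
    responsible for choosing sensible values; nothing is executed here.
    """
    command = [
        "AlignGraph",
        "--read1", read1,
        "--read2", read2,
        "--contig", contig,
        "--genome", genome,
        "--distanceLow", str(distance_low),
        "--distanceHigh", str(distance_high),
        "--extendedContig", extended_contig,
        "--remainingContig", remaining_contig,
    ]
    # Optional flags are appended only when explicitly requested.
    if k_mer is not None:
        command += ["--kMer", str(k_mer)]
    if fast_map:
        command.append("--fastMap")
    return command


def as_shell_string(command):
    """Quote the argument list so it can be logged or pasted into a shell."""
    return shlex.join(command)
```

<p>The resulting list can be passed to <code>subprocess.run</code>, and <code>as_shell_string</code> gives a copy-pasteable form for a log file. Any numeric values supplied for the distance bounds are the user's own choice; none are suggested here.</p>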
<p>Address of the bookmark: <a href="https://github.com/baoe/AlignGraph" rel="nofollow">https://github.com/baoe/AlignGraph</a></p>]]></description>
	<dc:creator>Manisha Mishra</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/37221/asplice-a-scalable-and-memory-efficient-algorithm-for-de-novo-transcriptome-assembly</guid>
	<pubDate>Tue, 03 Jul 2018 04:09:46 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/37221/asplice-a-scalable-and-memory-efficient-algorithm-for-de-novo-transcriptome-assembly</link>
	<title><![CDATA[ASplice: a scalable and memory-efficient algorithm for de novo transcriptome assembly]]></title>
	<description><![CDATA[<p>With the increased availability of de novo assembly algorithms, it is feasible to study the entire transcriptomes of non-model organisms. While algorithms designed specifically for transcriptome assembly from high-throughput sequencing data are available, they are very memory-intensive, which limits their application to small data sets with few libraries.</p>
<p>Texas A&amp;M University researchers developed a transcriptome assembly algorithm that recovers alternatively spliced isoforms and expression levels while utilizing as many RNA-Seq libraries as possible, containing hundreds of gigabases of data. New techniques were developed so that computations can be performed on a computing cluster with a moderate amount of physical memory.</p>
<p>Availability: a software program that implements the algorithm is available at http://faculty.cse.tamu.edu/shsze/asplice.</p>
<p>Sze SH, Pimsler ML, Tomberlin JK, Jones CD, Tarone AM. (2017) A scalable and memory-efficient algorithm for de novo transcriptome assembly of non-model organisms. BMC Genomics 18(Suppl 4):387.</p><p>Address of the bookmark: <a href="http://faculty.cse.tamu.edu/shsze/asplice/" rel="nofollow">http://faculty.cse.tamu.edu/shsze/asplice/</a></p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/38008/quast-lg-versatile-genome-assembly-evaluation</guid>
	<pubDate>Thu, 25 Oct 2018 10:46:55 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/38008/quast-lg-versatile-genome-assembly-evaluation</link>
	<title><![CDATA[QUAST-LG: Versatile genome assembly evaluation]]></title>
	<description><![CDATA[<p>QUAST-LG is a tool that compares large genomic de novo assemblies against reference sequences and computes relevant quality metrics. Since genomes generally cannot be reconstructed completely due to complex repeat patterns and low coverage regions, we introduce a concept of upper bound assembly for a given genome and set of reads, and compute theoretical limits on assembly correctness and completeness. Using QUAST-LG, we show how close the assemblies are to the theoretical optimum, and how far this optimum is from the finished reference.</p>
<h4>AVAILABILITY AND IMPLEMENTATION:</h4>
<p>http://cab.spbu.ru/software/quast-lg</p><p>Address of the bookmark: <a href="http://cab.spbu.ru/software/quast-lg/" rel="nofollow">http://cab.spbu.ru/software/quast-lg/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/38212/megahit-an-ultra-fast-single-node-solution-for-large-and-complex-metagenomics-assembly-via-succinct-de-bruijn-graph</guid>
	<pubDate>Wed, 14 Nov 2018 04:50:27 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/38212/megahit-an-ultra-fast-single-node-solution-for-large-and-complex-metagenomics-assembly-via-succinct-de-bruijn-graph</link>
	<title><![CDATA[MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph]]></title>
	<description><![CDATA[<p><span>MEGAHIT is a single node assembler for large and complex metagenomics NGS reads, such as soil. It makes use of succinct&nbsp;</span><em>de Bruijn</em><span>&nbsp;graph (SdBG) to achieve low memory assembly. MEGAHIT can&nbsp;</span><span>optionally</span><span>&nbsp;utilize a CUDA-enabled GPU to accelerate its SdBG construction. The GPU-accelerated version of MEGAHIT has been tested on NVIDIA GTX680 (4G memory) and Tesla K40c (12G memory) with CUDA 5.5, 6.0 and 6.5. MEGAHIT v1.0 or greater also supports IBM Power PC and has been tested on IBM POWER8.</span></p>
<p><span>https://academic.oup.com/bioinformatics/article/31/10/1674/177884</span></p><p>Address of the bookmark: <a href="https://github.com/voutcn/megahit" rel="nofollow">https://github.com/voutcn/megahit</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/38618/canu-genome-assembly-parameters</guid>
	<pubDate>Mon, 07 Jan 2019 08:40:37 -0600</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/38618/canu-genome-assembly-parameters</link>
	<title><![CDATA[CANU genome assembly parameters !]]></title>
	<description><![CDATA[<p>Choose appropriate parameters for Canu and run it. The assembly will take about an hour. You can use two cores (parameter&nbsp;<code>-maxThreads=2</code>). Since we compute on a single Amazon server, also switch off the option to compute on a cluster with&nbsp;<code>useGrid=false</code>. For your own project, these settings should be discussed with a local computing guru. Parameters in square brackets&nbsp;<code>[]</code>&nbsp;are optional; the symbol&nbsp;<code>|</code>&nbsp;stands for "or".</p><pre><code>usage:   canu [-correct | -trim | -assemble | -trim-assemble] \
              [-s ] \
               -p  \
               -d  \
               genomeSize=[g|m|k] \
               -maxThreads=2 \
               useGrid=false \
              [other-options] \
               read_file.fastq.gz
</code></pre><p>A default&nbsp;<code>Canu</code>&nbsp;run usually produces a high-quality assembly; an example of a command that was used for testing can be found below. However, there are still many parameters that can be tweaked. For example, depending on whether we want to assemble haplotypes separately or smash them together, we can alter the error correction process.</p><pre><code>canu -p test_asmbl \
     -d asm_test3 \
     genomeSize=2m \
     -maxThreads=2 useGrid=false \
     -pacbio-raw ~/pacbio/dna/sample_reads.fastq.gz</code></pre><p>There is a brilliant&nbsp;<a href="http://canu.readthedocs.io/en/latest/faq.html#what-parameters-can-i-tweak">section in the documentation</a>&nbsp;about parameter tweaking.</p><p>The output directory will contain many files. The most interesting ones are:</p><ul>
<li><code>*.correctedReads.fasta.gz</code>&nbsp;: file containing the input sequences after correction, trimmed and split based on consensus evidence.</li>
<li><code>*.trimmedReads.fastq</code>&nbsp;: file containing the sequences after correction and final trimming</li>
<li><code>*.layout</code>&nbsp;: file containing information about read inclusion in the final assembly</li>
<li><code>*.gfa</code>&nbsp;: file containing the assembly graph by Canu</li>
<li><code>*.contigs.fasta</code>&nbsp;: file containing everything that could be assembled and is part of the primary assembly</li>
</ul><p>Basic assembly statistics can be read from the reports generated by the assembler, or calculated using standard UNIX command-line tools.</p><p>More at&nbsp;https://canu.readthedocs.io/en/latest/faq.html</p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>

</channel>
</rss>