<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/40208?offset=490</link>
	<atom:link href="https://bioinformaticsonline.com/related/40208?offset=490" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/44758/the-ifs-and-buts-of-ngs-quality-control-and-trimming</guid>
	<pubDate>Thu, 02 Jan 2025 20:11:07 -0600</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/44758/the-ifs-and-buts-of-ngs-quality-control-and-trimming</link>
	<title><![CDATA[The &quot;Ifs&quot; and &quot;Buts&quot; of NGS Quality Control and Trimming]]></title>
	<description><![CDATA[<p>Next-Generation Sequencing (NGS) has revolutionized biological research, providing vast amounts of data for a wide range of applications. However, the reliability of NGS analyses heavily depends on the quality of raw sequencing data. Quality control (QC) and trimming are critical preprocessing steps that can make or break your downstream analyses. In this blog, we explore the "ifs" (why you should perform QC and trimming) and the "buts" (challenges or considerations) of this vital step in NGS workflows.</p><h3><strong>The "Ifs" of NGS QC and Trimming</strong></h3><ol>
<li>
<p><strong>Ensures Data Integrity</strong><br />If you want to minimize errors in downstream analyses, QC and trimming remove low-quality reads and bases, ensuring high-confidence data. This step is essential for reliable variant calling, assembly, and other applications.</p>
</li>
<li>
<p><strong>Removes Contaminants</strong><br />If adapter sequences or contaminants are present in the raw reads, trimming can eliminate them. This prevents issues like misalignment or incorrect biological interpretations, ensuring cleaner data for analysis.</p>
</li>
<li>
<p><strong>Improves Mapping and Assembly</strong><br />If your goal is better alignment to a reference genome or improved de novo assembly, trimming low-quality bases and adapters is critical. High-quality reads map more efficiently and generate more accurate assemblies.</p>
</li>
<li>
<p><strong>Reduces Computational Load</strong><br />If you want to save computational resources, trimming reduces the dataset size, which speeds up processing and analysis. Clean datasets mean less computational time spent on processing low-quality data.</p>
</li>
<li>
<p><strong>Prepares for Standardized Analyses</strong><br />If your project involves multiple datasets, QC and trimming ensure uniformity across them. This standardization makes comparisons valid and reproducible, particularly in large collaborative studies.</p>
</li>
</ol><h3><strong>The "Buts" of NGS QC and Trimming</strong></h3><ol>
<li>
<p><strong>Risk of Over-Trimming</strong><br />But excessive trimming can lead to the loss of informative sequences, reducing read depth and potentially discarding biologically relevant data. This is especially critical in studies with limited sequencing depth.</p>
</li>
<li>
<p><strong>Bias Introduction</strong><br />But trimming algorithms might introduce biases, especially if they inadvertently remove sequences with specific biological patterns. This can skew results and compromise biological insights.</p>
</li>
<li>
<p><strong>Loss of Context in Paired-End Reads</strong><br />But trimming can shorten one read of a pair so much that it falls below the minimum length and is discarded, leaving its mate orphaned and the pairing information lost. This complicates downstream analyses that rely on paired-end data, such as structural variant detection.</p>
</li>
<li>
<p><strong>Time and Resource Intensive</strong><br />But running QC and trimming for large datasets can be computationally expensive and time-consuming. As sequencing depth increases, preprocessing becomes a bottleneck in the analysis pipeline.</p>
</li>
<li>
<p><strong>Variable Standards</strong><br />But the criteria for trimming (e.g., quality threshold, minimum read length) can vary between tools and datasets. This variability may affect reproducibility and comparability of results across studies.</p>
</li>
</ol><h3><strong>Balancing the "Ifs" and "Buts"</strong></h3><p>To maximize the benefits of QC and trimming while mitigating the challenges, consider the following best practices:</p><ul>
<li>
<p><strong>Use QC Tools Wisely:</strong> Start with tools like <strong>FastQC</strong> to identify quality issues in your raw data. Visualizing quality metrics helps tailor your trimming parameters.</p>
</li>
<li>
<p><strong>Choose Reliable Trimming Tools:</strong> Tools like <strong>Trimmomatic</strong>, <strong>Cutadapt</strong>, and <strong>BBDuk</strong> offer adaptive and customizable trimming options. Select one that aligns with your dataset and project goals.</p>
</li>
<li>
<p><strong>Set Reasonable Parameters:</strong> Avoid over-trimming by setting quality thresholds and minimum read lengths that balance data retention and quality improvement.</p>
</li>
<li>
<p><strong>Test Downstream Effects:</strong> Validate the impact of QC and trimming on downstream analyses, such as alignment efficiency, variant calling accuracy, or assembly quality.</p>
</li>
<li>
<p><strong>Document Your Workflow:</strong> Maintain detailed records of the parameters and tools used for QC and trimming. This ensures reproducibility and enables better troubleshooting.</p>
</li>
</ul><h3><strong>Conclusion</strong></h3><p>NGS quality control and trimming are essential steps to ensure reliable and accurate data for analysis. While the "ifs" highlight the clear benefits of these steps, the "buts" remind us of the potential pitfalls. By adopting best practices and carefully balancing these considerations, you can optimize your preprocessing workflow and unlock the full potential of your sequencing data.</p>]]></description>
	<dc:creator>BioStar</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/37835/variantbam-filtering-and-profiling-of-next-generational-sequencing-data-using-region-specific-rules</guid>
	<pubDate>Thu, 04 Oct 2018 16:30:44 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/37835/variantbam-filtering-and-profiling-of-next-generational-sequencing-data-using-region-specific-rules</link>
	<title><![CDATA[VariantBam: Filtering and profiling of next-generational sequencing data using region-specific rules]]></title>
	<description><![CDATA[<p>VariantBam is a tool to extract or count specific sets of sequencing reads from next-generation sequencing files. To save money, disk space and I/O, one may not want to store an entire BAM on disk. In many cases, it is more efficient to store only those read pairs or reads that intersect some region around the variant locations. Alternatively, if your scientific question is focused on only one aspect of the data (e.g. breakpoints), many reads can be removed without losing the information relevant to the problem.</p>
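<p>The idea of keeping only reads near variant locations can be sketched in a few lines of plain Python. This is an illustration only and does not use VariantBam or its rule syntax: the function name <code>reads_near_variants</code>, the tuple layouts, and the <code>pad</code> window size are all assumptions made for the example.</p>
<pre>
```python
def reads_near_variants(reads, variants, pad=100):
    """Return names of reads overlapping a padded window around any variant.

    reads    -- iterable of (name, chrom, start, end), end exclusive (assumed layout)
    variants -- iterable of (chrom, pos) (assumed layout)
    pad      -- hypothetical window half-width in bases
    """
    # Build the padded windows once, grouped by chromosome.
    windows = {}
    for chrom, pos in variants:
        windows.setdefault(chrom, []).append((max(0, pos - pad), pos + pad))

    kept = []
    for name, chrom, start, end in reads:
        # A half-open read [start, end) overlaps the closed window [lo, hi]
        # when it starts at or before hi and ends after lo.
        if any(hi >= start and end > lo for lo, hi in windows.get(chrom, [])):
            kept.append(name)
    return kept


example_reads = [
    ("r1", "chr1", 950, 1050),    # spans the variant at chr1:1000
    ("r2", "chr1", 5000, 5100),   # far from any variant
    ("r3", "chr2", 950, 1050),    # same coordinates, different chromosome
]
kept_names = reads_near_variants(example_reads, [("chr1", 1000)], pad=100)
```
</pre>
<p>A linear scan over windows is enough for a sketch; a real filter over a whole BAM would more likely use an interval index.</p>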
<p>Address of the bookmark: <a href="https://github.com/broadinstitute/VariantBam" rel="nofollow">https://github.com/broadinstitute/VariantBam</a></p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/40544/ngs-bits-short-read-sequencing-tools</guid>
	<pubDate>Thu, 16 Jan 2020 23:14:00 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/40544/ngs-bits-short-read-sequencing-tools</link>
	<title><![CDATA[ngs-bits - Short-read sequencing tools]]></title>
	<description><![CDATA[<p>Binaries of&nbsp;<em>ngs-bits</em>&nbsp;are available via Bioconda. Alternatively,&nbsp;<em>ngs-bits</em>&nbsp;can be built from sources:</p>
<ul>
<li><span>Binaries</span>&nbsp;for&nbsp;<a href="https://github.com/imgag/ngs-bits/blob/master/doc/install_bioconda.md">Linux/macOS</a></li>
<li>From&nbsp;<span>sources</span>&nbsp;for&nbsp;<a href="https://github.com/imgag/ngs-bits/blob/master/doc/install_unix.md">Linux/macOS</a></li>
<li>From&nbsp;<span>sources</span>&nbsp;for&nbsp;<a href="https://github.com/imgag/ngs-bits/blob/master/doc/install_win.md">Windows</a></li>
</ul><p>Address of the bookmark: <a href="https://github.com/imgag/ngs-bits" rel="nofollow">https://github.com/imgag/ngs-bits</a></p>]]></description>
	<dc:creator>Neel</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/11609/bioinformatician%E2%80%99s-pocket-reference</guid>
	<pubDate>Sun, 08 Jun 2014 09:56:58 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/11609/bioinformatician%E2%80%99s-pocket-reference</link>
	<title><![CDATA[Bioinformatician’s Pocket Reference !!]]></title>
	<description><![CDATA[<p><span>It is amusing how the brains of bioinformaticians work! Spending days learning a new programming language feels like more fun than a five-minute conversation with the neighbours (unless under special circumstances!) in our own mother tongue. Today every bioinformatician keeps more than a few languages and core IT toolkits on their plate. It has become mandatory to be able to mould different code snippets into our own custom workflows, so keeping syntax at our fingertips has become essential. Although Google is the best way to get a syntax problem solved, it is not a bad idea to keep reference sheets on our smartphones or to stick printed sheets on the back of the door, the old-fashioned way!</span></p><p>Address of the bookmark: <a href="http://infoplatter.wordpress.com/2014/04/06/bioinformaticians-pocket-reference/" rel="nofollow">http://infoplatter.wordpress.com/2014/04/06/bioinformaticians-pocket-reference/</a></p>]]></description>
	<dc:creator>RAJESH DETROJA</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/news/view/41230/curated-set-of-ribosomal-rna-rrna-reference-sequences-targeted-loci-with-verifiable-organism</guid>
	<pubDate>Sun, 23 Feb 2020 02:17:30 -0600</pubDate>
	<link>https://bioinformaticsonline.com/news/view/41230/curated-set-of-ribosomal-rna-rrna-reference-sequences-targeted-loci-with-verifiable-organism</link>
	<title><![CDATA[Curated set of ribosomal RNA (rRNA) reference sequences (targeted loci) with verifiable organism]]></title>
	<description><![CDATA[<p>NCBI has a curated set of ribosomal RNA (rRNA) reference sequences (targeted loci) with verifiable organism sources and current names. This set is critical for correctly identifying and classifying prokaryotic (bacterial and archaeal) and fungal samples. To provide easy access to these sequences, NCBI recently added a separate rRNA/ITS databases section on the nucleotide BLAST page for these targeted sequences, which makes it convenient to quickly identify source organisms. The new databases are:</p>
<ul>
<li>16S ribosomal RNA (Bacteria and Archaea)</li>
<li>18S ribosomal RNA sequences (SSU) from Fungi type and reference material</li>
<li>28S ribosomal RNA sequences (LSU) from Fungi type and reference material</li>
<li>Internal transcribed spacer region (ITS) from Fungi type and reference material</li>
</ul>
<p>You can also download these from the BLAST db FTP area. See the <a href="https://go.usa.gov/xdEBX" target="_blank">NCBI Insights post</a> for more detail.</p><p>Useful links</p>
<ul>
<li><a href="https://go.usa.gov/xdEj5" target="_blank">BLAST form with rRNA/ITS databases</a></li>
<li><a href="https://ftp.ncbi.nlm.nih.gov/blast/db/" target="_blank">BLAST db download</a></li>
<li><a href="https://www.ncbi.nlm.nih.gov/refseq/targetedloci/" target="_blank">Targeted loci</a></li>
</ul>
<p>If you have any questions or concerns, please contact <a href="mailto:blast-help@ncbi.nlm.nih.gov" target="_blank">blast-help@ncbi.nlm.nih.gov</a>.</p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/36015/repeat-aware-repeat-aware-scaffolding-evaluation-framework-by-igor-mandric</guid>
	<pubDate>Wed, 21 Mar 2018 18:10:00 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/36015/repeat-aware-repeat-aware-scaffolding-evaluation-framework-by-igor-mandric</link>
	<title><![CDATA[repeat-aware: Repeat aware scaffolding evaluation framework by Igor Mandric]]></title>
	<description><![CDATA[<p>Genome scaffolding is a classic, challenging problem in bioinformatics. It refers to joining assembly contigs into chains (called scaffolds). The join between two contigs A and B is considered correct if:</p>
<ul>
<li>Their relative orientation is correct</li>
<li>Their relative order is correct</li>
<li>The gap estimate is similar to the true distance on the reference</li>
</ul>
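<p>As a rough illustration, the three criteria above can be written as a single check. This is a minimal sketch, not the framework's own code: the <code>Placement</code> and <code>Join</code> types, the <code>join_is_correct</code> function, and the <code>gap_tolerance</code> default of 1000 bases are all hypothetical, and each contig is assumed to have exactly one true placement on the reference, so repeats are not handled here.</p>
<pre>
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    """True location of a contig on the reference (hypothetical model)."""
    start: int     # 0-based start on the reference
    end: int       # end on the reference, exclusive
    forward: bool  # True if the contig lies on the forward strand


@dataclass(frozen=True)
class Join:
    """A scaffolder's claim that contig A is immediately followed by contig B."""
    a_forward: bool  # orientation given to A in the scaffold
    b_forward: bool  # orientation given to B in the scaffold
    gap: int         # estimated gap between A and B, in bases


def join_is_correct(join, ref_a, ref_b, gap_tolerance=1000):
    """Check relative orientation, relative order, and the gap estimate."""
    # A scaffold may be reported as the reverse complement of the reference,
    # so the orientations are correct if both agree or both are flipped.
    both_agree = (join.a_forward == ref_a.forward
                  and join.b_forward == ref_b.forward)
    both_flipped = (join.a_forward != ref_a.forward
                    and join.b_forward != ref_b.forward)
    if both_agree:
        # Read along the forward strand: A must precede B.
        order_ok = ref_b.start >= ref_a.start
        true_gap = ref_b.start - ref_a.end
    elif both_flipped:
        # Read along the reverse strand: B must precede A on the reference.
        order_ok = ref_a.start >= ref_b.start
        true_gap = ref_a.start - ref_b.end
    else:
        return False  # relative orientation is wrong
    return order_ok and gap_tolerance >= abs(join.gap - true_gap)


ref_a = Placement(start=0, end=1000, forward=True)
ref_b = Placement(start=1500, end=2500, forward=True)
good = join_is_correct(Join(True, True, 480), ref_a, ref_b)
bad_orientation = join_is_correct(Join(True, False, 480), ref_a, ref_b)
bad_gap = join_is_correct(Join(True, True, 5000), ref_a, ref_b)
```
</pre>
<p>How close a gap estimate must be to count as "similar" is a choice of the evaluator; the fixed tolerance here is only a placeholder.</p>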
<p>Validating a scaffolding is also challenging. One of the main obstacles to adequate scaffolding evaluation is genome repeats. The previous standard for evaluation&nbsp;<a href="https://genomebiology.biomedcentral.com/articles/10.1186/gb-2014-15-3-r42">(Hunt et al.,&nbsp;<em>Genome Biology</em>, 2014)</a>&nbsp;did not take repeats into account; this evaluation framework does.</p>
<p style="text-align: center;"><a href="https://camo.githubusercontent.com/9675b90205e5bc0dc0b6b84b321b00bc87d8d88e/687474703a2f2f616c616e2e63732e6773752e6564752f7265706561742d61776172652f6669677572652e706e67" target="_blank"><img src="https://camo.githubusercontent.com/9675b90205e5bc0dc0b6b84b321b00bc87d8d88e/687474703a2f2f616c616e2e63732e6773752e6564752f7265706561742d61776172652f6669677572652e706e67" width="75%" alt="image" style="border: 0px;"></a></p>
<p>The new evaluation framework considers the optimal assignment of contigs in the output scaffolding to contigs in the reference scaffolding in the sense of the number of correct links.</p>
<p>Address of the bookmark: <a href="https://github.com/mandricigor/repeat-aware" rel="nofollow">https://github.com/mandricigor/repeat-aware</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/34246/unicycler-hybrid-assembly-pipeline-for-bacterial-genomes</guid>
	<pubDate>Fri, 10 Nov 2017 03:58:27 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/34246/unicycler-hybrid-assembly-pipeline-for-bacterial-genomes</link>
	<title><![CDATA[Unicycler: Hybrid assembly pipeline for bacterial genomes]]></title>
	<description><![CDATA[<p><span>Unicycler is an assembly pipeline for bacterial genomes. It can assemble&nbsp;</span><a href="http://www.illumina.com/">Illumina</a><span>-only read sets, where it functions as a&nbsp;</span><a href="http://cab.spbu.ru/software/spades/">SPAdes</a><span>-optimiser. It can also assemble long-read-only sets (</span><a href="http://www.pacb.com/">PacBio</a><span>&nbsp;or&nbsp;</span><a href="https://nanoporetech.com/">Nanopore</a><span>), where it runs a&nbsp;</span><a href="https://github.com/lh3/miniasm">miniasm</a><span>+</span><a href="https://github.com/isovic/racon">Racon</a><span>&nbsp;pipeline. For the best possible assemblies, give it both Illumina reads&nbsp;</span><em>and</em><span>&nbsp;long reads, and it will conduct a hybrid assembly.</span></p><p>Address of the bookmark: <a href="https://github.com/rrwick/Unicycler" rel="nofollow">https://github.com/rrwick/Unicycler</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/36621/hapcut2-robust-and-accurate-haplotype-assembly-for-diverse-sequencing-technologies</guid>
	<pubDate>Tue, 15 May 2018 07:35:26 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/36621/hapcut2-robust-and-accurate-haplotype-assembly-for-diverse-sequencing-technologies</link>
	<title><![CDATA[HapCUT2: robust and accurate haplotype assembly for diverse sequencing technologies]]></title>
	<description><![CDATA[<p>HapCUT2 is a maximum-likelihood-based tool for assembling haplotypes from DNA sequence reads, designed to "just work" with excellent speed and accuracy. We found that previously described haplotype assembly methods are specialized for specific read technologies or protocols, with slow or inaccurate performance on others. With this in mind, HapCUT2 is designed for speed and accuracy across diverse sequencing technologies, including but not limited to:</p>
<ul>
<li>NGS short reads (Illumina HiSeq)</li>
<li>clone-based sequencing (Fosmid or BAC clones)</li>
<li>SMRT reads (PacBio)</li>
<li>Oxford Nanopore reads</li>
<li>10X Genomics Linked-Reads</li>
<li>proximity-ligation (Hi-C) reads</li>
<li>high-coverage sequencing (&gt;40x coverage-per-SNP) using the above technologies</li>
<li>combinations of the above technologies (e.g. scaffold long reads with Hi-C reads)</li>
</ul>
<p>See the repository README for specific examples of command-line options and best practices for some of these technologies.</p>
<p>NOTE: At this time HapCUT2 is for diploid organisms only. VCF input should contain diploid variants.</p>
<p>If you use HapCUT2 in your research, please cite:</p>
<p>Edge, P., Bafna, V. &amp; Bansal, V. HapCUT2: robust and accurate haplotype assembly for diverse sequencing technologies. Genome Res. gr.213462.116 (2016). doi:10.1101/gr.213462.116</p><p>Address of the bookmark: <a href="https://github.com/vibansal/HapCUT2" rel="nofollow">https://github.com/vibansal/HapCUT2</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/37291/transrate-understanding-your-transcriptome-assembly</guid>
	<pubDate>Fri, 13 Jul 2018 07:49:26 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/37291/transrate-understanding-your-transcriptome-assembly</link>
	<title><![CDATA[transrate: Understanding your transcriptome assembly]]></title>
	<description><![CDATA[<p><span>Transrate is software for&nbsp;</span><em>de-novo</em><span>&nbsp;transcriptome assembly quality analysis. It examines your assembly in detail and compares it to experimental evidence such as the sequencing reads, reporting quality scores for contigs and assemblies. This allows you to choose between assemblers and parameters, filter out the bad contigs from an assembly, and decide when to stop trying to improve the assembly.</span></p><p>Address of the bookmark: <a href="http://hibberdlab.com/transrate/index.html" rel="nofollow">http://hibberdlab.com/transrate/index.html</a></p>]]></description>
	<dc:creator>Neel</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/38755/svaba-genome-wide-detection-of-structural-variants-and-indels-by-local-assembly</guid>
	<pubDate>Mon, 21 Jan 2019 17:58:56 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/38755/svaba-genome-wide-detection-of-structural-variants-and-indels-by-local-assembly</link>
	<title><![CDATA[SvABA: Genome-wide detection of structural variants and indels by local assembly]]></title>
	<description><![CDATA[<p><span>SvABA is a method for detecting structural variants in sequencing data using genome-wide local assembly. Under the hood, SvABA uses a custom implementation of&nbsp;</span><a href="https://github.com/jts/sga">SGA</a><span>&nbsp;(String Graph Assembler) by Jared Simpson, and&nbsp;</span><a href="https://github.com/lh3/bwa">BWA-MEM</a><span>&nbsp;by Heng Li. Contigs are assembled for every 25kb window (with some small overlap) for every region in the genome. The default is to use only clipped, discordant, unmapped and indel reads, although this can be customized to any set of reads at the command line using&nbsp;</span><a href="https://github.com/walaj/VariantBam">VariantBam</a><span>&nbsp;rules. These contigs are then immediately aligned to the reference with BWA-MEM and parsed to identify variants. Sequencing reads are then realigned to the contigs with BWA-MEM, and variants are scored by their read support.</span></p><p>Address of the bookmark: <a href="https://github.com/walaj/svaba" rel="nofollow">https://github.com/walaj/svaba</a></p>]]></description>
	<dc:creator>Abhimanyu Singh</dc:creator>
</item>

</channel>
</rss>