<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/41831?offset=350</link>
	<atom:link href="https://bioinformaticsonline.com/related/41831?offset=350" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/42188/tools-and-method-for-haplotype-phasing</guid>
	<pubDate>Fri, 04 Sep 2020 20:41:40 -0500</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/42188/tools-and-method-for-haplotype-phasing</link>
	<title><![CDATA[Tools and Methods for Haplotype Phasing!]]></title>
	<description><![CDATA[<div>Recent technological advances are producing huge amounts of genotype data, both from increasingly comprehensive and inexpensive genome-wide SNP microarrays and from ever more accessible whole-genome and whole-exome sequencing. The information contained in these data, however, is best exploited through phased haplotypes, which group the alleles that are co-located on the same chromosome. Because sequencing and SNP array data normally take the form of unphased genotypes, they do not directly reveal on which of the two parental chromosomes (haplotypes) a particular allele lies. Fortunately, new advances in both computational and laboratory methods promise improved determination of haplotype phase. Useful tools include:</div>
<p><strong>Arlequin:</strong>&nbsp;<a href="http://cmpg.unibe.ch/software/arlequin3/" target="_blank">http://cmpg.unibe.ch/software/arlequin3/</a></p>
<p><strong>BEAGLE:</strong>&nbsp;<a href="http://faculty.washington.edu/browning/beagle/beagle.html" target="_blank">http://faculty.washington.edu/browning/beagle/beagle.html</a></p>
<p><strong>fastPHASE:</strong>&nbsp;<a href="http://stephenslab.uchicago.edu/software.html" target="_blank">http://stephenslab.uchicago.edu/software.html</a></p>
<p><strong>GENEHUNTER:</strong>&nbsp;<a href="http://linkage.rockefeller.edu/soft/gh/" target="_blank">http://linkage.rockefeller.edu/soft/gh/</a></p>
<p><strong>The Genome Analysis Toolkit:</strong>&nbsp;<a href="http://www.broadinstitute.org/gsa/wiki/index.php/The_Genome_Analysis_Toolkit" target="_blank">http://www.broadinstitute.org/gsa/wiki/index.php/The_Genome_Analysis_Toolkit</a></p>
<p><strong>IMPUTE2:</strong>&nbsp;<a href="https://mathgen.stats.ox.ac.uk/impute/impute_v2.html" target="_blank">https://mathgen.stats.ox.ac.uk/impute/impute_v2.html</a></p>
<p><strong>MACH:</strong>&nbsp;<a href="http://www.sph.umich.edu/csg/abecasis/MACH/" target="_blank">http://www.sph.umich.edu/csg/abecasis/MACH/</a></p>
<p><strong>MERLIN:</strong>&nbsp;<a href="http://www.sph.umich.edu/csg/abecasis/Merlin/" target="_blank">http://www.sph.umich.edu/csg/abecasis/Merlin/</a></p>
<p><strong>PHASE:</strong>&nbsp;<a href="http://stephenslab.uchicago.edu/software.html" target="_blank">http://stephenslab.uchicago.edu/software.html</a></p>
<p><strong>PL-EM:</strong>&nbsp;<a href="http://www.people.fas.harvard.edu/~junliu/plem/" target="_blank">http://www.people.fas.harvard.edu/~junliu/plem/</a></p>
<p><strong>&ldquo;Read-backed phasing&rdquo; algorithm</strong>:&nbsp;<a href="http://www.broadinstitute.org/gsa/wiki/index.php/Read-backed_phasing_algorithm" target="_blank">http://www.broadinstitute.org/gsa/wiki/index.php/Read-backed_phasing_algorithm</a></p>
<p><strong>SHAPE-IT:</strong>&nbsp;<a href="http://www.griv.org/shapeit/" target="_blank">http://www.griv.org/shapeit/</a></p>]]></description>
	<dc:creator>Manisha Mishra</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/26972/understanding-fastqc-output</guid>
	<pubDate>Fri, 15 Apr 2016 05:47:40 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/26972/understanding-fastqc-output</link>
	<title><![CDATA[Understanding Fastqc Output]]></title>
	<description><![CDATA[<p>Understanding the following FastQC tables and graphs:</p>
<ol>
<li>Duplication level</li>
<li>K-mer profile</li>
<li>Per base GC content</li>
<li>Per base N content</li>
<li>Per base quality</li>
<li>Per base sequence content</li>
<li>Per sequence GC content</li>
<li>Per sequence quality</li>
<li>Sequence length distribution</li>
</ol>
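<p>To make two of these modules concrete, here is a minimal Python sketch (not FastQC's own code) of the numbers behind the per base quality and per sequence GC content plots. It assumes complete, uncompressed four-line FASTQ records with Phred+33 quality encoding; FastQC itself also draws quartiles and, by default, groups positions into bins for longer reads.</p>
<pre><code>
```python
def parse_fastq(lines):
    """Yield (sequence, quality) pairs from complete 4-line FASTQ records."""
    it = iter(lines)
    for _header in it:           # line 1: @read_id (not validated here)
        seq = next(it).strip()   # line 2: bases
        next(it)                 # line 3: '+' separator
        qual = next(it).strip()  # line 4: quality string
        yield seq, qual


def per_base_mean_quality(records, offset=33):
    """Mean Phred score at each read position (Phred+33 by default)."""
    sums, counts = [], []
    for _, qual in records:
        for i, ch in enumerate(qual):
            if i >= len(sums):   # reads may differ in length
                sums.append(0)
                counts.append(0)
            sums[i] += ord(ch) - offset
            counts[i] += 1
    return [s / c for s, c in zip(sums, counts)]


def gc_percent(seq):
    """GC content of one read, the value histogrammed per sequence."""
    if not seq:
        return 0.0
    gc = sum(1 for base in seq.upper() if base in "GC")
    return 100.0 * gc / len(seq)


records = list(parse_fastq(["@r1", "ACGT", "+", "IIII", "@r2", "GGCC", "+", "!!II"]))
print(per_base_mean_quality(records))           # [20.0, 20.0, 40.0, 40.0]
print([gc_percent(seq) for seq, _ in records])  # [50.0, 100.0]
```
</code></pre>
<p>In Phred+33, 'I' encodes Q40 and '!' encodes Q0, so the first two positions average to Q20: exactly the kind of low-quality start that the per base quality plot makes visible.</p>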
<p>Address of the bookmark: <a href="http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/3%20Analysis%20Modules/" rel="nofollow">http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/3%20Analysis%20Modules/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/44758/the-ifs-and-buts-of-ngs-quality-control-and-trimming</guid>
	<pubDate>Thu, 02 Jan 2025 20:11:07 -0600</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/44758/the-ifs-and-buts-of-ngs-quality-control-and-trimming</link>
	<title><![CDATA[The &quot;Ifs&quot; and &quot;Buts&quot; of NGS Quality Control and Trimming]]></title>
	<description><![CDATA[<p>Next-Generation Sequencing (NGS) has revolutionized biological research, providing vast amounts of data for a wide range of applications. However, the reliability of NGS analyses heavily depends on the quality of raw sequencing data. Quality control (QC) and trimming are critical preprocessing steps that can make or break your downstream analyses. In this blog, we explore the "ifs" (why you should perform QC and trimming) and the "buts" (challenges or considerations) of this vital step in NGS workflows.</p><h3><strong>The "Ifs" of NGS QC and Trimming</strong></h3><ol>
<li>
<p><strong>Ensures Data Integrity</strong><br />If you want to minimize errors in downstream analyses, QC and trimming remove low-quality reads and bases, ensuring high-confidence data. This step is essential for reliable variant calling, assembly, and other applications.</p>
</li>
<li>
<p><strong>Removes Contaminants</strong><br />If adapter sequences or contaminants are present in the raw reads, trimming can eliminate them. This prevents issues like misalignment or incorrect biological interpretations, ensuring cleaner data for analysis.</p>
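<p>As a minimal sketch of what 3' adapter trimming does, the Python function below removes a full adapter match, or a partial adapter prefix left at the read's 3' end, using exact matching only. The function name and the <code>min_overlap</code> parameter are illustrative assumptions; real trimmers such as Cutadapt also tolerate mismatches and indels. The example adapter is the 13-base prefix shared by Illumina TruSeq adapters.</p>
<pre><code>
```python
def trim_adapter(seq, adapter, min_overlap=3):
    """Trim a 3' adapter from `seq` using exact matching (illustrative only)."""
    min_overlap = max(1, min_overlap)  # an empty overlap would trim everything
    # Case 1: the whole adapter occurs in the read; cut at its first occurrence.
    idx = seq.find(adapter)
    if idx != -1:
        return seq[:idx]
    # Case 2: the read ends partway into the adapter, so only an adapter
    # prefix is visible at the 3' end. Try the longest overlap first.
    longest = min(len(adapter) - 1, len(seq))
    for k in range(longest, min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return seq[:-k]
    return seq


ADAPTER = "AGATCGGAAGAGC"
print(trim_adapter("ACGTACGTAGATCGGAAGAGCTTT", ADAPTER))  # ACGTACGT (full adapter)
print(trim_adapter("ACGTACGTAGATC", ADAPTER))             # ACGTACGT (partial adapter)
print(trim_adapter("CCCCAGA", ADAPTER))                   # CCCC (chance 3-base match)
```
</code></pre>
<p>The last call shows the trade-off behind <code>min_overlap</code>: with a 3-base minimum, a read that merely happens to end in "AGA" loses three genuine bases, which is one source of the trimming bias discussed under the "buts" below.</p>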
</li>
<li>
<p><strong>Improves Mapping and Assembly</strong><br />If your goal is better alignment to a reference genome or improved de novo assembly, trimming low-quality bases and adapters is critical. High-quality reads map more efficiently and generate more accurate assemblies.</p>
</li>
<li>
<p><strong>Reduces Computational Load</strong><br />If you want to save computational resources, trimming reduces the dataset size, which speeds up processing and analysis. Clean datasets mean less computational time spent on processing low-quality data.</p>
</li>
<li>
<p><strong>Prepares for Standardized Analyses</strong><br />If your project involves multiple datasets, applying the same QC and trimming steps and parameters to all of them keeps preprocessing uniform. This standardization helps keep comparisons valid and reproducible, particularly in large collaborative studies.</p>
</li>
</ol><h3><strong>The "Buts" of NGS QC and Trimming</strong></h3><ol>
<li>
<p><strong>Risk of Over-Trimming</strong><br />But excessive trimming can lead to the loss of informative sequences, reducing read depth and potentially discarding biologically relevant data. This is especially critical in studies with limited sequencing depth.</p>
</li>
<li>
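<p><strong>Bias Introduction</strong><br />But trimming algorithms might introduce biases, especially if they inadvertently remove sequences with specific biological patterns. For example, an adapter trimmer that accepts very short overlaps can remove genuine 3' bases that match the start of the adapter by chance. Such biases can skew results and compromise biological insights.</p>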
</li>
<li>
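<p><strong>Loss of Context in Paired-End Reads</strong><br />But if trimming discards one read of a pair (for example, because it falls below the minimum length) while keeping its mate, the two paired files fall out of step unless the tool processes mates together or writes orphaned reads to a separate file. Lost pairing information complicates downstream analyses that rely on paired-end data, such as structural variant detection.</p>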
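<p>One way to avoid broken pairs, sketched below in Python under the assumption that both mates have already been trimmed, is to filter the two reads together: a pair is kept only if both mates still meet the minimum length, and a lone survivor goes to a separate orphan set. Paired-end modes of tools such as Trimmomatic follow the same idea by writing separate paired and unpaired outputs; the function here is a hypothetical illustration, not their code.</p>
<pre><code>
```python
def filter_pairs(pairs, min_len=30):
    """Split already-trimmed read pairs into (kept_pairs, orphans).

    A pair is kept only if both mates still reach `min_len`; if just one mate
    survives it goes to `orphans`, so the two paired outputs stay in step.
    """
    kept_pairs, orphans = [], []
    for read1, read2 in pairs:
        ok1 = len(read1) >= min_len
        ok2 = len(read2) >= min_len
        if ok1 and ok2:
            kept_pairs.append((read1, read2))
        elif ok1:
            orphans.append(read1)
        elif ok2:
            orphans.append(read2)
        # If neither mate survives, the whole pair is dropped.
    return kept_pairs, orphans


pairs = [("A" * 40, "C" * 35), ("G" * 40, "T" * 10), ("A" * 5, "C" * 5)]
kept, orphans = filter_pairs(pairs, min_len=30)
print(len(kept), len(orphans))  # 1 1
```
</code></pre>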
</li>
<li>
<p><strong>Time and Resource Intensive</strong><br />But running QC and trimming for large datasets can be computationally expensive and time-consuming. As sequencing depth increases, preprocessing can become a bottleneck in the analysis pipeline.</p>
</li>
<li>
<p><strong>Variable Standards</strong><br />But the criteria for trimming (e.g., quality threshold, minimum read length) can vary between tools and datasets. This variability may affect reproducibility and comparability of results across studies.</p>
</li>
</ol><h3><strong>Balancing the "Ifs" and "Buts"</strong></h3><p>To maximize the benefits of QC and trimming while mitigating the challenges, consider the following best practices:</p><ul>
<li>
<p><strong>Use QC Tools Wisely:</strong> Start with tools like <strong>FastQC</strong> to identify quality issues in your raw data. Visualizing quality metrics helps tailor your trimming parameters.</p>
</li>
<li>
<p><strong>Choose Reliable Trimming Tools:</strong> Tools like <strong>Trimmomatic</strong>, <strong>Cutadapt</strong>, and <strong>BBDuk</strong> offer adaptive and customizable trimming options. Select one that aligns with your dataset and project goals.</p>
</li>
<li>
<p><strong>Set Reasonable Parameters:</strong> Avoid over-trimming by setting quality thresholds and minimum read lengths that balance data retention and quality improvement.</p>
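<p>To see how a quality threshold and a minimum read length interact, here is a minimal Python sketch of sliding-window quality trimming followed by a minimum-length filter. It is loosely modelled on the idea behind Trimmomatic's SLIDINGWINDOW and MINLEN steps but is not their exact implementation; the window of 4, threshold of Q20 and minimum length of 30 are illustrative assumptions, not recommendations.</p>
<pre><code>
```python
def sliding_window_cut(quals, window=4, threshold=20):
    """Return how many 5' bases to keep.

    Scan windows from the 5' end and cut at the start of the first window whose
    mean Phred quality falls below `threshold`; keep everything if none does.
    """
    for start in range(len(quals) - window + 1):
        if threshold > sum(quals[start:start + window]) / window:
            return start
    return len(quals)


def trim_read(seq, quals, window=4, threshold=20, min_len=30):
    """Quality-trim one read, then drop it (return None) if it is now too short."""
    keep = sliding_window_cut(quals, window, threshold)
    if min_len > keep:
        return None
    return seq[:keep], quals[:keep]


quals = [30] * 6 + [10] * 4
print(sliding_window_cut(quals, threshold=20))     # 5
print(sliding_window_cut(quals, threshold=15))     # 6
print(trim_read("ACGTACGTAC", quals, min_len=3))   # ('ACGTA', [30, 30, 30, 30, 30])
print(trim_read("ACGTACGTAC", quals, min_len=30))  # None
```
</code></pre>
<p>Lowering the threshold from Q20 to Q15 keeps one more base of this read, while a minimum length of 30 discards it entirely: a small-scale view of the retention-versus-quality balance described above.</p>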
</li>
<li>
<p><strong>Test Downstream Effects:</strong> Validate the impact of QC and trimming on downstream analyses, such as alignment efficiency, variant calling accuracy, or assembly quality.</p>
</li>
<li>
<p><strong>Document Your Workflow:</strong> Maintain detailed records of the parameters and tools used for QC and trimming. This ensures reproducibility and enables better troubleshooting.</p>
</li>
</ul><h3><strong>Conclusion</strong></h3><p>NGS quality control and trimming are essential steps to ensure reliable and accurate data for analysis. While the "ifs" highlight the clear benefits of these steps, the "buts" remind us of the potential pitfalls. By adopting best practices and carefully balancing these considerations, you can optimize your preprocessing workflow and unlock the full potential of your sequencing data.</p>]]></description>
	<dc:creator>BioStar</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/41046/iseqqc-a-tool-for-expression-based-quality-control-in-rna-sequencing</guid>
	<pubDate>Sun, 16 Feb 2020 08:47:17 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/41046/iseqqc-a-tool-for-expression-based-quality-control-in-rna-sequencing</link>
	<title><![CDATA[iSeqQC: a tool for expression-based quality control in RNA sequencing]]></title>
	<description><![CDATA[<p><span>iSeqQC is an expression-based QC tool that detects outlier samples arising either from variable laboratory conditions or from dissimilarity within a phenotypic group. It implements various statistical approaches, including unsupervised clustering, agglomerative hierarchical clustering and correlation coefficients, to provide insight into outliers.</span></p>
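<p>The correlation-based part of this idea can be sketched in a few lines of Python. This is a hypothetical illustration, not iSeqQC's code: it assumes a dictionary of per-sample expression vectors in a shared gene order (already normalized and non-constant), computes Pearson correlations between every pair of samples, and flags samples whose median correlation with the others falls below an arbitrary cutoff of 0.9. The median is used so that a single outlier does not drag down the scores of well-behaved samples.</p>
<pre><code>
```python
from statistics import mean, median


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant vectors."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def correlation_outliers(samples, cutoff=0.9):
    """Flag samples whose median correlation with all other samples is below `cutoff`."""
    flagged = []
    for name, expr in samples.items():
        rs = [pearson(expr, other) for other_name, other in samples.items() if other_name != name]
        if cutoff > median(rs):
            flagged.append(name)
    return flagged


samples = {
    "s1": [1, 2, 3, 4, 5],
    "s2": [2, 4, 6, 8, 10],
    "s3": [1, 2, 3, 4, 6],
    "s4": [5, 4, 3, 2, 1],  # anti-correlated with the rest
}
print(correlation_outliers(samples))  # ['s4']
```
</code></pre>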
<p><a href="http://cancerwebpa.jefferson.edu/iSeqQC/">http://cancerwebpa.jefferson.edu/iSeqQC/</a></p>
<p><a href="https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-020-3399-8">https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-020-3399-8</a></p><p>Address of the bookmark: <a href="https://github.com/gkumar09/iSeqQC" rel="nofollow">https://github.com/gkumar09/iSeqQC</a></p>]]></description>
	<dc:creator>BioStar</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/27845/cnidaria-fast-reference-free-phylogenomic-clustering</guid>
	<pubDate>Thu, 16 Jun 2016 17:55:17 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/27845/cnidaria-fast-reference-free-phylogenomic-clustering</link>
	<title><![CDATA[CNIDARIA: fast, reference-free phylogenomic clustering]]></title>
	<description><![CDATA[<p>Motivation: Identification of biological specimens is a major requirement for a range of applications. Reference-free methods analyse unprocessed sequencing data without relying on prior knowledge, but these do not scale to arbitrarily large genomes and arbitrarily large phylogenetic distances.</p>
<p>Results: We present Cnidaria, a practical tool for clustering genomic and transcriptomic data with no limitation on genome size or phylogenetic distance. We simultaneously clustered 169 genomic and transcriptomic datasets from 4 kingdoms, achieving 100% accuracy at the supra-species level and 78% accuracy at the species level.</p>
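<p>A common way to compare datasets without a reference is to compare the sets of k-mers they contain. The Python sketch below assumes that approach for illustration and is not Cnidaria's implementation: it builds a k-mer set per dataset (reverse complements are ignored for simplicity) and computes pairwise Jaccard distances, which could then be passed to any clustering or tree-building method. The tiny k in the example only keeps it readable.</p>
<pre><code>
```python
from itertools import combinations


def kmer_set(seqs, k):
    """All distinct k-mers across a dataset's sequences (reads or contigs)."""
    kmers = set()
    for seq in seqs:
        seq = seq.upper()
        kmers.update(seq[i:i + k] for i in range(len(seq) - k + 1))
    return kmers


def jaccard_distance(a, b):
    """1 - |A intersect B| / |A union B|; 0.0 for identical (or two empty) sets."""
    union = a.union(b)
    if not union:
        return 0.0
    return 1.0 - len(a.intersection(b)) / len(union)


def pairwise_distances(datasets, k=21):
    """Jaccard distance between every pair of named datasets."""
    sets = {name: kmer_set(seqs, k) for name, seqs in datasets.items()}
    return {
        (x, y): jaccard_distance(sets[x], sets[y])
        for x, y in combinations(sorted(sets), 2)
    }


datasets = {"x": ["ACGTACGT"], "y": ["ACGTAC", "GTACGT"], "z": ["TTTTTTTT"]}
print(pairwise_distances(datasets, k=3))
# {('x', 'y'): 0.0, ('x', 'z'): 1.0, ('y', 'z'): 1.0}
```
</code></pre>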
<p>Availability and Implementation: Cnidaria is written in C++ and Python and is available at http://www.ab.wur.nl/cnidaria.</p>
<p>Contact: Saulo Aflitos - sauloal@gmail.com</p>
<p>Supplementary information: Supplementary data are available at Bioinformatics online.</p><p>Address of the bookmark: <a href="https://github.com/sauloal/cnidaria/wiki" rel="nofollow">https://github.com/sauloal/cnidaria/wiki</a></p>]]></description>
	<dc:creator>Shruti Paniwala</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/39837/cactus-a-reference-free-whole-genome-multiple-alignment-program</guid>
	<pubDate>Mon, 12 Aug 2019 07:52:33 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/39837/cactus-a-reference-free-whole-genome-multiple-alignment-program</link>
	<title><![CDATA[Cactus: a reference-free whole-genome multiple alignment program]]></title>
	<description><![CDATA[<p>Cactus is a reference-free whole-genome multiple alignment program. The principal algorithms are described here:&nbsp;<a href="https://doi.org/10.1101/gr.123356.111">https://doi.org/10.1101/gr.123356.111</a></p>
<p><span>Cactus uses substantial resources. For primate-sized genomes (3 gigabases each), you should expect Cactus to use approximately 120 CPU-days of compute per genome, with about 120 GB of RAM used at peak. The requirements scale roughly quadratically, so aligning two 1-megabase bacterial genomes takes only 1.5 CPU-hours and 14 GB RAM.</span>&nbsp;</p><p>Address of the bookmark: <a href="https://github.com/ComparativeGenomicsToolkit/cactus" rel="nofollow">https://github.com/ComparativeGenomicsToolkit/cactus</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/33003/surankco-supervised-ranking-of-contigs-in-de-novo-assemblies</guid>
	<pubDate>Wed, 24 May 2017 04:46:52 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/33003/surankco-supervised-ranking-of-contigs-in-de-novo-assemblies</link>
	<title><![CDATA[SuRankCo: supervised ranking of contigs in de novo assemblies]]></title>
	<description><![CDATA[<p><span>SuRankCo is machine-learning-based software that scores and ranks contigs from de novo assemblies of next-generation sequencing data. It is trained on alignments of contigs to known reference genomes and then predicts scores and rankings for contigs that do not yet have a related reference genome.</span></p>
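<p>The train-on-references, predict-on-the-rest idea can be sketched with a k-nearest-neighbour regressor in Python. Everything here is hypothetical: the two features (contig length in kb and mean read coverage), the quality scores assumed to come from reference alignments, and the choice of model; SuRankCo uses its own feature set and learning method. Features are not rescaled here, which a real model would need to handle.</p>
<pre><code>
```python
from math import dist  # Euclidean distance, Python 3.8+


def knn_score(train, features, k=3):
    """Average the reference-derived scores of the k most similar training contigs.

    `train` is a list of (feature_vector, score) pairs for contigs whose score
    was derived from an alignment to a known reference genome.
    """
    nearest = sorted(train, key=lambda item: dist(item[0], features))[:k]
    return sum(score for _, score in nearest) / len(nearest)


def rank_contigs(train, contigs, k=3):
    """Return contig names ordered from highest to lowest predicted score."""
    predicted = {name: knn_score(train, feats, k) for name, feats in contigs.items()}
    return sorted(predicted, key=predicted.get, reverse=True)


# Hypothetical training data: (length in kb, mean coverage) -> alignment-based score
train = [((10.0, 30.0), 0.99), ((9.0, 28.0), 0.98), ((1.0, 5.0), 0.60), ((1.5, 4.0), 0.55)]
new_contigs = {"ctg_a": (1.2, 4.5), "ctg_b": (9.5, 29.0)}
print(rank_contigs(train, new_contigs, k=2))  # ['ctg_b', 'ctg_a']
```
</code></pre>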
<p><a href="https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-015-0644-7">https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-015-0644-7</a></p><p>Address of the bookmark: <a href="https://sourceforge.net/projects/surankco/" rel="nofollow">https://sourceforge.net/projects/surankco/</a></p>]]></description>
	<dc:creator>Neel</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/40359/minipolish-a-tool-for-racon-polishing-of-miniasm-assemblies</guid>
	<pubDate>Tue, 03 Dec 2019 02:40:54 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/40359/minipolish-a-tool-for-racon-polishing-of-miniasm-assemblies</link>
	<title><![CDATA[Minipolish: A tool for Racon polishing of miniasm assemblies]]></title>
	<description><![CDATA[<p><a href="https://github.com/lh3/miniasm">Miniasm</a>&nbsp;is a great long-read assembly tool: straightforward, effective and very fast. However, it does not include a polishing step, so its assemblies have a high error rate &ndash; they are essentially made of stitched-together pieces of long reads.</p>
<p><a href="https://github.com/isovic/racon">Racon</a>&nbsp;is a great polishing tool that can be used to clean up assembly errors. It's also very fast and well suited for long-read data. However, it operates on FASTA files, not the&nbsp;<a href="https://github.com/GFA-spec/GFA-spec/blob/master/GFA1.md">GFA graphs</a>&nbsp;that miniasm makes.</p>
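<p>The FASTA/GFA mismatch is easy to see in code. The hypothetical Python helper below, which is not part of Minipolish, writes the sequences of GFA1 segment (S) lines as FASTA records; everything else in the file, including the link (L) lines that join segments into a graph, is dropped. That lost graph structure is what a polishing wrapper has to keep track of.</p>
<pre><code>
```python
def gfa_segments_to_fasta(gfa_lines):
    """Return FASTA text built from the segment (S) lines of a GFA1 file.

    Only S lines that carry a sequence are kept; header (H), link (L) and any
    other records are ignored, so the graph connectivity is lost.
    """
    records = []
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        # Tab-separated S fields: "S", segment name, sequence ('*' if absent), optional tags.
        if fields[0] == "S" and len(fields) >= 3 and fields[2] != "*":
            records.append(">" + fields[1] + "\n" + fields[2] + "\n")
    return "".join(records)


example = ["H\tVN:Z:1.0", "S\tutg1\tACGT\tLN:i:4", "S\tutg2\tGGCC", "L\tutg1\t+\tutg2\t+\t0M"]
print(gfa_segments_to_fasta(example), end="")
```
</code></pre>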
<p>That's where Minipolish comes in. With a single command, it will use Racon to polish up a miniasm assembly, while keeping the assembly in graph form.</p>
<p>It also takes care of some of the other nuances of polishing a miniasm assembly:</p>
<ul>
<li>Adding read depth information to contigs</li>
<li>Fixing sequence truncation that can occur in Racon</li>
<li>Adding circularising links to circular contigs if not already present (so they display better in&nbsp;<a href="https://github.com/rrwick/Bandage">Bandage</a>)</li>
<li>'Rotating' circular contigs between polishing rounds to ensure clean circularisation</li>
</ul><p>Address of the bookmark: <a href="https://github.com/rrwick/Minipolish" rel="nofollow">https://github.com/rrwick/Minipolish</a></p>]]></description>
	<dc:creator>BioStar</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/43888/syri-compares-alignments-between-two-chromosome-level-assemblies-and-identifies-synteny-and-structural-rearrangements</guid>
	<pubDate>Wed, 01 Jun 2022 02:01:13 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/43888/syri-compares-alignments-between-two-chromosome-level-assemblies-and-identifies-synteny-and-structural-rearrangements</link>
	<title><![CDATA[SyRI compares alignments between two chromosome-level assemblies and identifies synteny and structural rearrangements]]></title>
	<description><![CDATA[<p><span>SyRI compares alignments between two chromosome-level assemblies and identifies synteny and structural rearrangements.</span></p>
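<p>As a toy illustration of the idea (a simplification under stated assumptions, not SyRI's algorithm), the Python sketch below takes alignment blocks between one reference chromosome and its query counterpart as (reference start, query start, strand) tuples, treats the longest chain of forward-strand blocks whose query coordinates also increase as syntenic, and labels the remaining blocks as inverted (reverse strand) or otherwise rearranged. Real tools weight blocks by alignment length and further distinguish translocations and duplications.</p>
<pre><code>
```python
def classify_blocks(blocks):
    """Label alignment blocks as 'syntenic', 'inverted' or 'rearranged'.

    `blocks` holds (ref_start, query_start, strand) tuples for one reference
    chromosome and its query counterpart; strand is '+' or '-'.
    """
    blocks = sorted(blocks)  # order by reference position
    fwd = [i for i, b in enumerate(blocks) if b[2] == "+"]

    # Longest chain of forward blocks whose query starts also increase
    # (a longest-increasing-subsequence DP, O(n^2) for clarity).
    chain_len = {i: 1 for i in fwd}
    prev = {i: None for i in fwd}
    for pos, j in enumerate(fwd):
        for i in fwd[:pos]:
            if blocks[j][1] > blocks[i][1] and chain_len[i] + 1 > chain_len[j]:
                chain_len[j] = chain_len[i] + 1
                prev[j] = i

    syntenic = set()
    if fwd:
        end = max(fwd, key=lambda i: chain_len[i])
        while end is not None:
            syntenic.add(end)
            end = prev[end]

    labels = []
    for i, block in enumerate(blocks):
        if i in syntenic:
            labels.append((block, "syntenic"))
        elif block[2] == "-":
            labels.append((block, "inverted"))
        else:
            labels.append((block, "rearranged"))
    return labels


blocks = [(0, 0, "+"), (100, 100, "+"), (200, 900, "+"), (300, 300, "-"), (400, 400, "+"), (500, 500, "+")]
for block, label in classify_blocks(blocks):
    print(block, label)
```
</code></pre>
<p>In this example the block at query position 900 breaks collinearity and is reported as rearranged, the reverse-strand block is reported as inverted, and the other four blocks form the syntenic chain.</p>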
<p><span><img src="https://github.com/schneebergerlab/syri/raw/master/example/ampril_col0_chr3_6600000_10000000.png" alt="image" style="border: 0px;"></span></p><p>Address of the bookmark: <a href="https://github.com/schneebergerlab/syri" rel="nofollow">https://github.com/schneebergerlab/syri</a></p>]]></description>
	<dc:creator>Shruti Paniwala</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/news/view/4590/tigers-genome-sequenced</guid>
	<pubDate>Tue, 17 Sep 2013 16:48:24 -0500</pubDate>
	<link>https://bioinformaticsonline.com/news/view/4590/tigers-genome-sequenced</link>
	<title><![CDATA[Tiger genome sequenced]]></title>
	<description><![CDATA[<p>Fifteen scientists led by Dr Jong Bhak of the Genome Research Foundation, South Korea, decoded as many as 3 billion nucleotides (the organic molecules that form the basic building blocks of nucleic acids such as DNA) and identified 20,000 genes related to various functions of the tiger.</p><p>The tiger, the biggest and perhaps most fearsome of the world's big cats, shares 95.6 percent of its DNA with humans' cute and furry companions, domestic cats.</p><p>The research showed that big cats carry genetic mutations associated with their carnivorous diet, and the team also identified mutations that allow snow leopards to thrive at high altitudes.</p><p>Reference:</p><p><a href="http://www.nbcnews.com/science/your-cat-ferocious-tigers-share-lot-95-6-percent-their-4B11182690">http://www.nbcnews.com/science/your-cat-ferocious-tigers-share-lot-95-6-percent-their-4B11182690</a></p><p><a href="http://timesofindia.indiatimes.com/home/environment/flora-fauna/Gene-mapping-of-tiger-completed/articleshow/22671681.cms">http://timesofindia.indiatimes.com/home/environment/flora-fauna/Gene-mapping-of-tiger-completed/articleshow/22671681.cms</a></p><p>Paper:</p><p><a href="http://www.nature.com/ncomms/2013/130917/ncomms3433/full/ncomms3433.html">http://www.nature.com/ncomms/2013/130917/ncomms3433/full/ncomms3433.html</a></p>]]></description>
	<dc:creator>Rahul Agarwal</dc:creator>
</item>

</channel>
</rss>