<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/30557?offset=120</link>
	<atom:link href="https://bioinformaticsonline.com/related/30557?offset=120" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/37957/base-a-practical-de-novo-assembler-for-large-genomes-using-long-ngs-reads</guid>
	<pubDate>Fri, 19 Oct 2018 07:25:21 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/37957/base-a-practical-de-novo-assembler-for-large-genomes-using-long-ngs-reads</link>
	<title><![CDATA[BASE: a practical de novo assembler for large genomes using long NGS reads]]></title>
	<description><![CDATA[<p>BASE is a new <em>de novo</em> assembler. It enhances the classic seed-extension approach by indexing the reads efficiently to generate adaptive seeds that have a high probability of appearing uniquely in the genome. BASE uses these seeds to build extension trees and then applies reverse validation to prune branches based on read coverage and paired-end information, yielding high-quality consensus sequences of the reads that share each seed. These consensus sequences are then extended into contigs.</p><p>Address of the bookmark: <a href="https://github.com/dhlbh/BASE" rel="nofollow">https://github.com/dhlbh/BASE</a></p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/11399/next-generation-sequencing-in-r-or-bioconductor-environment</guid>
	<pubDate>Mon, 02 Jun 2014 18:03:09 -0500</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/11399/next-generation-sequencing-in-r-or-bioconductor-environment</link>
	<title><![CDATA[Next generation sequencing in R or bioconductor environment]]></title>
	<description><![CDATA[<p>There are many R and Bioconductor packages for NGS data analysis; some of them are listed below.</p><h3><a name="TOC-Biostrings" id="TOC-Biostrings"></a>Biostrings</h3><p>The Biostrings package from Bioconductor provides an advanced environment for efficient sequence management and analysis in R. It contains many speed- and memory-efficient string containers, string-matching algorithms and other utilities for fast manipulation of large sets of biological sequences. The objects and functions provided by Biostrings form the basis for many other sequence analysis packages. <a href="http://bioconductor.org/packages/release/bioc/html/Biostrings.html">Documentation</a></p><div><div style="text-align: left;"><div style="color: #000000;"><h4><a name="TOC-IRanges-Overview" id="TOC-IRanges-Overview"></a>IRanges Overview</h4><p>IRanges provides the low-level infrastructure and containers for handling sets of integer ranges within Bioconductor's BioC-Seq domain. Its classes and methods support many higher-level packages, such as GenomicRanges, ShortRead and Rsamtools. <a href="http://bioconductor.org/packages/release/bioc/html/IRanges.html">Documentation</a></p><div style="text-align: right;"><div style="text-align: left;"><h4><a name="TOC-GenomicRanges-Overview" id="TOC-GenomicRanges-Overview"></a>GenomicRanges Overview</h4><p>The <em>GenomicRanges</em> package serves as the foundation for representing genomic locations within the Bioconductor project. 
It is built upon the <em>IRanges</em> infrastructure and defines three major data containers (<em>GRanges, GRangesList</em> and <em>GappedAlignments</em>) that support other important BioC-Seq packages, including <em>ShortRead, Rsamtools, rtracklayer, GenomicFeatures</em> and <em>BSgenome</em>. Compared with the IRanges containers, the GRanges/<em>GRangesList</em> classes are more flexible and can store additional information about sequence ranges, such as chromosome identifiers (sequence space), strand information and annotation data. <a href="http://bioconductor.org/packages/release/bioc/html/GenomicRanges.html">Documentation</a></p></div></div></div></div><h3><a name="TOC-Motif-Discovery" id="TOC-Motif-Discovery"></a>Motif Discovery</h3><h4><a name="TOC-cosmo" id="TOC-cosmo"></a>cosmo</h4><p>The cosmo package searches a set of unaligned DNA sequences for a shared motif that may function as a transcription factor binding site. The algorithm extends the popular motif discovery tool MEME (Bailey and Elkan, 1995) by allowing the search to be supervised: the user specifies a set of constraints that the discovered motif must satisfy. <a href="http://bioconductor.org/packages/release/bioc/html/cosmo.html">Documentation</a></p></div><div>
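<p>Motif finders such as MEME describe a discovered motif as a position-specific probability matrix. The Python sketch below does not perform motif discovery, and it is not cosmo's constrained search. Assuming a small, hypothetical set of binding sites that are already aligned, it only shows how such a matrix is converted into log-odds scores (with a pseudocount and a uniform background, both chosen here for illustration) and used to locate the best-matching site in a new sequence.</p>

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=0.5, background=0.25):
    """Log-odds position weight matrix from equal-length, pre-aligned sites.

    The pseudocount and uniform background are illustrative assumptions.
    """
    pwm = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        total = len(column) + 4 * pseudocount
        scores = {}
        for base in BASES:
            freq = (column.count(base) + pseudocount) / total
            scores[base] = math.log2(freq / background)
        pwm.append(scores)
    return pwm

def best_hit(pwm, sequence):
    """Return (offset, score) of the highest-scoring window on the forward strand."""
    width = len(pwm)
    best = None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if best is None or score > best[1]:
            best = (start, score)
    return best

# Hypothetical aligned binding sites (illustration only, not real data).
sites = ["TGACTCA", "TGACTCA", "TGAGTCA", "TGACTAA"]
pwm = build_pwm(sites)
offset, score = best_hit(pwm, "GGCATTGACTCAGGA")
print(offset, round(score, 2))
```

<p>Sequences are assumed to contain only A, C, G and T; scanning the reverse strand would require scoring the reverse complement as well.</p>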
<div style="color: #0000ff;"><h4><a name="TOC-BCRANK" id="TOC-BCRANK"></a>BCRANK</h4><p>BCRANK is a method that takes a ranked list of genomic regions as input and outputs short DNA sequences that are overrepresented in some part of the list. The algorithm was developed for detecting transcription factor (TF) binding sites in a large number of enriched regions from high-throughput ChIP-chip or ChIP-seq experiments, but it can be applied to any ranked list of DNA sequences. <a href="http://bioconductor.org/packages/release/bioc/html/BCRANK.html">Documentation</a></p>
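<p>The core idea behind BCRANK, finding short sequences that are overrepresented near the top of a ranked list, can be illustrated with a much simpler, hypothetical procedure. The Python sketch below is not BCRANK's algorithm (which scores and refines degenerate consensus sequences). Assuming the regions are plain strings already sorted from most to least enriched, it ranks exact k-mers by how much more often they occur in the top part of the list than in the rest.</p>

```python
from collections import Counter

def kmer_presence(seqs, k):
    """Count in how many sequences each k-mer occurs (at most once per sequence)."""
    counts = Counter()
    for seq in seqs:
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return counts

def enriched_kmers(ranked_seqs, k=4, top_fraction=0.5, pseudocount=1.0):
    """Rank k-mers by the ratio of their occurrence rate in the top vs. the rest of the list."""
    cut = max(1, int(len(ranked_seqs) * top_fraction))
    top, rest = ranked_seqs[:cut], ranked_seqs[cut:]
    top_counts = kmer_presence(top, k)
    rest_counts = kmer_presence(rest, k)
    scores = {}
    for kmer, n_top in top_counts.items():
        top_rate = (n_top + pseudocount) / (len(top) + pseudocount)
        rest_rate = (rest_counts.get(kmer, 0) + pseudocount) / (len(rest) + pseudocount)
        scores[kmer] = top_rate / rest_rate
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical regions, most enriched first (illustration only, not real data).
regions = ["AACACGTGTT", "CCACGTGAAT", "GCACGTGCAA",
           "TTTTAAAACC", "GGGCCCATAT", "ATATATGCGC"]
print(enriched_kmers(regions, k=6)[:3])
```

<p>The top fraction, k-mer length and pseudocount are arbitrary choices in this sketch, and it ignores the reverse strand.</p>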
<p>rGADEM: <a href="http://bioconductor.org/packages/devel/bioc/html/rGADEM.html">Documentation</a></p><p>MotIV: <a href="http://bioconductor.org/packages/devel/bioc/html/MotIV.html">Documentation</a></p></div><h3><a name="TOC-ShortRead" id="TOC-ShortRead"></a>ShortRead</h3><p>The ShortRead package provides input, quality control, filtering, parsing and manipulation functionality for short-read sequences produced by high-throughput sequencing technologies. Although many sequencing technologies are supported, the package is primarily focused on Solexa/Illumina reads. <a href="http://bioconductor.org/packages/release/bioc/html/ShortRead.html">Documentation</a></p><h3><a name="TOC-Rsamtools" id="TOC-Rsamtools"></a>Rsamtools</h3><p>Rsamtools provides functions for parsing and inspecting alignment data in the binary BAM format used by samtools. SAM/BAM has become a de facto standard alignment format and is supported by a wide variety of alignment tools. <a href="http://bioconductor.org/help/bioc-views/2.7/bioc/html/Rsamtools.html">Documentation</a></p>
<p><a href="http://samtools.sourceforge.net/">Samtools Website</a><br /> <a href="http://bio-bwa.sourceforge.net/">BWA (Burrows-Wheeler Alignment) Website</a></p>
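<p>ShortRead's quality control and filtering work on the per-base Phred scores stored in FASTQ files. The plain-Python sketch below (not the ShortRead API) shows the underlying idea. It decodes Phred+33 quality strings, the encoding used by current Illumina pipelines (older Solexa/Illumina data used an offset of 64), and keeps reads whose length and mean quality pass thresholds chosen arbitrarily here.</p>

```python
import io

def read_fastq(handle):
    """Yield (name, sequence, quality) tuples from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            return
        seq = handle.readline().rstrip()
        handle.readline()  # '+' separator line
        qual = handle.readline().rstrip()
        yield header[1:], seq, qual

def phred_scores(qual, offset=33):
    """Decode an ASCII quality string into integer Phred scores."""
    return [ord(ch) - offset for ch in qual]

def filter_reads(records, min_mean_q=20, min_len=5):
    """Return names of reads whose length and mean Phred quality meet the thresholds."""
    kept = []
    for name, seq, qual in records:
        scores = phred_scores(qual)
        if scores and len(seq) >= min_len and sum(scores) / len(scores) >= min_mean_q:
            kept.append(name)
    return kept

# Two toy reads: 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
fastq = io.StringIO("@read1\nACGTACGT\n+\nIIIIIIII\n@read2\nACGTACGT\n+\n########\n")
print(filter_reads(read_fastq(fastq)))  # prints ['read1']
```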
</div><div>
<p><span style="color: #000000;">Additional tools for SNP analysis:&nbsp;</span></p>
<p><a href="http://bioconductor.org/help/bioc-views/release/bioc/html/snpMatrix.html">snpMatrix</a></p><h3><a name="TOC-BSgenome" id="TOC-BSgenome"></a>BSgenome</h3><p>BSgenome provides an object-oriented infrastructure for interacting with Biostrings-based genome sequences. BSgenome packages exist for many common genomes, and new ones can be created to represent custom genomes. If no prebuilt package exists for your organism, see the "How to forge a BSgenome data package" vignette for instructions on creating one. <a href="http://bioconductor.org/packages/release/bioc/html/BSgenome.html">Documentation</a></p><h3><a name="TOC-rtracklayer" id="TOC-rtracklayer"></a>rtracklayer</h3><p>rtracklayer provides an interface for exporting annotation feature data to various genome browsers and file formats (such as GFF). See the Small RNA Profiling exercise for an example of using rtracklayer to visualize alignment coverage. <a href="http://bioconductor.org/packages/release/bioc/html/rtracklayer.html">Documentation</a></p><h3><a name="TOC-biomaRt" id="TOC-biomaRt"></a>biomaRt</h3><p>The biomaRt package provides an interface to a growing collection of databases implementing the BioMart software suite (http://www.biomart.org). It enables online retrieval of large amounts of data in a uniform way, without requiring knowledge of the underlying database schemas. Because the data are retrieved over the Internet at run time, cache them locally, or record the database versions you used, if your code could be adversely affected by updates to these data. <a href="http://bioconductor.org/packages/release/bioc/html/biomaRt.html">Documentation</a></p><h3><a name="TOC-ChIP-Seq-Analysis-Packages" id="TOC-ChIP-Seq-Analysis-Packages"></a>ChIP-Seq Analysis Packages</h3><p>Bioconductor provides various packages for analyzing and visualizing ChIP-Seq data. Only a small selection of these packages is introduced here. 
Additional useful introductions to this topic are the <a href="http://www.bioconductor.org/workshops/2009/SeattleJan09/ChIP-seq/">BioC ChIP-seq Case Study</a> and BioC <a href="http://www.bioconductor.org/help/course-materials/2009/SeattleNov09/ChIP-seq/">ChIP-Seq</a>.</p><h4><a name="TOC-chipseq" id="TOC-chipseq"></a>chipseq</h4><p>The chipseq package combines a variety of HT-Seq packages into a pipeline for ChIP-Seq data analysis. <a href="http://bioconductor.org/packages/release/bioc/html/chipseq.html">Documentation</a></p><h4><a name="TOC-BayesPeak" id="TOC-BayesPeak"></a>BayesPeak</h4><p>BayesPeak is a peak-calling package for identifying the DNA binding sites of proteins in ChIP-Seq experiments. Its algorithm uses hidden Markov models (HMMs) and Bayesian statistical methods. <a href="http://bioconductor.org/packages/release/bioc/html/BayesPeak.html">Documentation</a> [ <a href="http://www.biomedcentral.com/1471-2105/10/299">Publication</a> ]</p><h4><a name="TOC-PICS" id="TOC-PICS"></a>PICS</h4><p>The PICS package applies probabilistic inference to aligned-read ChIP-Seq data in order to identify regions bound by transcription factors. PICS identifies enriched regions by modeling local concentrations of directional reads, and uses prior information on DNA fragment length to distinguish closely adjacent binding events via a Bayesian hierarchical t-mixture model. 
<a href="http://www.bioconductor.org/packages/release/bioc/html/PICS.html">Documentation</a> [ <a href="http://www.hubmed.org/display.cgi?uids=20528864">Publication</a> ]</p><h4><a name="TOC-ChIPpeakAnno" id="TOC-ChIPpeakAnno"></a>ChIPpeakAnno</h4><p>The ChIPpeakAnno package provides batch annotation of the peaks identified from either ChIP-seq or ChIP-chip experiments. It includes functions to retrieve the sequences around peaks, obtain enriched Gene Ontology (GO) terms, and find the nearest gene, exon, miRNA or custom features, such as the most conserved elements and other transcription factor binding sites supplied by users. The package leverages the biomaRt, IRanges, Biostrings, BSgenome, GO.db, multtest and stats packages. <a href="http://bioconductor.org/packages/release/bioc/html/ChIPpeakAnno.html">Documentation</a></p><h4><a name="TOC-Additional-ChIP-Seq-Packages" id="TOC-Additional-ChIP-Seq-Packages"></a>Additional ChIP-Seq Packages</h4><p>DiffBind: <a href="http://www.bioconductor.org/packages/release/bioc/html/DiffBind.html">Documentation</a></p><p>MOSAICS: <a href="http://bioconductor.org/packages/devel/bioc/html/mosaics.html">Documentation</a></p><p>iSeq: <a href="http://bioconductor.org/packages/release/bioc/html/iSeq.html">Documentation</a></p><p>ChIPseqR: <a href="http://bioconductor.org/packages/release/bioc/html/ChIPseqR.html">Documentation</a></p><p>ChIPsim: <a href="http://bioconductor.org/packages/release/bioc/html/ChIPsim.html">Documentation</a></p><p>CSAR: <a href="http://www.bioconductor.org/packages/devel/bioc/html/CSAR.html">Documentation</a></p><p>ChIP-Seq Pipeline: <a href="http://www.bioconductor.org/packages/release/bioc/html/PICS.html">PICS</a>, rGADEM and MotIV (<a href="http://www.rglab.org/pics-and-bioconductor/">developer web site</a>)</p><p>SPP: <a href="http://compbio.med.harvard.edu/Supplements/ChIP-seq/">ChIP-seq processing pipeline</a></p><p><a href="http://compbio.med.harvard.edu/Supplements/ChIP-seq/tutorial.html">SPP 
Tutorial</a></p><p><a href="http://liulab.dfci.harvard.edu/MACS/index.html">MACS</a></p><p><a href="http://gmdd.shgmo.org/Computational-Biology/ChIP-Seq/download/SIPeS">SIPeS</a></p><h3><a name="TOC-RNA-Seq-Analysis" id="TOC-RNA-Seq-Analysis"></a>RNA-Seq Analysis</h3><h4><a name="TOC-Counting-Reads-that-Overlap-with-Annotation-Ranges-" id="TOC-Counting-Reads-that-Overlap-with-Annotation-Ranges-"></a>Counting Reads that Overlap with Annotation Ranges</h4><p>The GenomicRanges package provides support for importing short-read alignment data in BAM format into R (via Rsamtools) and associating the reads with genomic feature ranges, such as exons or genes. This makes it possible to count the reads aligning to each annotated genomic region. The package defines general-purpose containers for storing genomic intervals as well as more specialized containers for storing alignments against a reference genome. The two main read-counting functions provided by this infrastructure are <span>countOverlaps</span> and <span>summarizeOverlaps</span>. To use them correctly, read the corresponding <a href="http://www.bioconductor.org/packages/devel/bioc/vignettes/GenomicRanges/inst/doc/summarizeOverlaps.pdf">PDF manual</a>. <a href="http://bioconductor.org/packages/release/bioc/html/GenomicRanges.html">Documentation</a></p><h4><a name="TOC-Differential-Gene-Expression-Analysis-with-DESeq" id="TOC-Differential-Gene-Expression-Analysis-with-DESeq"></a>Differential Gene Expression Analysis with DESeq</h4><p>The DESeq package contains functions to call differentially expressed genes (DEGs) in count tables based on a model using the negative binomial distribution. 
It expects as input a data frame with the raw read counts per region/gene of interest (rows) for each test sample (columns).&nbsp; Such a count table can be imported into R or generated from BAM alignment files using the <span>countOverlaps</span> function as introduced above. <a href="http://www.bioconductor.org/packages/release/bioc/html/DESeq.html">Documentation</a></p><h4><a name="TOC-Differential-Gene-Expression-Analysis-with-edgeR" id="TOC-Differential-Gene-Expression-Analysis-with-edgeR"></a>Differential Gene Expression Analysis with edgeR</h4><p>The edgeR package uses empirical Bayes estimation and exact tests based on the negative binomial distribution to call differentially expressed genes (DEGs) in count data.&nbsp;</p>
<p><a href="http://www.bioconductor.org/packages/release/bioc/html/edgeR.html">Documentation</a></p>
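<p>Before testing, DESeq normalizes for sequencing depth by estimating one size factor per sample with the median-of-ratios method: each count is divided by the geometric mean of that gene's counts across all samples, and the sample's size factor is the median of these ratios over genes with no zero counts. The Python sketch below reproduces only this normalization step on a small hypothetical count table; it does not fit the negative binomial model or call DEGs.</p>

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of per-gene rows, one raw count per sample (same sample order in every row).
    """
    n_samples = len(counts[0])
    # Genes with a zero count in any sample have a geometric mean of 0 and are skipped.
    usable = [row for row in counts if all(c > 0 for c in row)]
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Ratio of each count to its gene's geometric mean, taken on the log scale.
        log_ratios = [math.log(row[j]) - lg for row, lg in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors

# Hypothetical count table: rows are genes, columns are samples.
# Sample 2 behaves as if it were sequenced about twice as deeply as sample 1.
counts = [
    [100, 200],
    [50, 100],
    [30, 60],
    [0, 10],  # excluded from the estimate: zero count in sample 1
]
sf = size_factors(counts)
print([round(f, 3) for f in sf])  # prints [0.707, 1.414]
normalized = [[c / f for c, f in zip(row, sf)] for row in counts]
```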
<p><span style="color: #000000;">A variety of additional R packages are available for normalizing RNA-Seq read count data and identifying differentially expressed genes (DEG): <br /> </span></p><p><a href="http://bioconductor.org/packages/devel/bioc/html/easyRNASeq.html">easyRNASeq</a> (simplifies read counting per genome feature)</p><p><a href="http://www.bioconductor.org/packages/release/bioc/html/DEXSeq.html">DEXSeq</a> (Inference of differential exon usage);&nbsp;<a href="http://www.bioconductor.org/packages/release/data/experiment/html/parathyroidSE.html">parathyroidSE</a> explains how to generate exon read counts in R</p><p><a href="http://bioconductor.org/packages/release/bioc/html/DEGseq.html">DEGseq</a></p><p><a href="http://www.bioconductor.org/packages/release/bioc/html/baySeq.html">baySeq</a> (also see: <a href="http://www.bioconductor.org/packages/release/bioc/html/segmentSeq.html">segmentSeq</a>)</p><p><a href="http://bioconductor.org/packages/release/bioc/html/Genominator.html">Genominator</a> (<a href="http://www.hubmed.org/display.cgi?uids=20167110">Bullard et al. 2010</a>)</p><div style="text-align: right;"><div style="text-align: left;"><h4><a name="TOC-Detection-of-Alternative-Splice-Junctions" id="TOC-Detection-of-Alternative-Splice-Junctions"></a>Detection of Alternative Splice Junctions</h4>
<p><span style="color: #000000;">Another utility of RNA-Seq experiments is the analysis of splice junctions. The following software suggestions provide this utility:</span></p>
<p><a href="http://woldlab.caltech.edu/rnaseq/">ERANGE</a></p><p><a href="http://tophat.cbcb.umd.edu/">TopHat</a></p><p><a href="http://biogibbs.stanford.edu/%7Ekinfai/SpliceMap/">SpliceMap</a></p><p><a href="http://solidsoftwaretools.com/gf/project/splitseek/">SplitSeek</a></p><h3><a name="TOC-DNA-Methylation-Data-Analysis" id="TOC-DNA-Methylation-Data-Analysis"></a>DNA-Methylation Data Analysis</h3><div><ul>
<li><span style="font-size: 10pt;"><a href="http://www.bioconductor.org/help/course-materials/2012/BiocEurope2012/mattia_pelizzola_methylPipe.pdf">methylPipe</a></span></li>
<li><span style="font-size: 10pt;"><a href="http://www.bioconductor.org/packages/devel/bioc/html/bsseq.html">bsseq</a></span></li>
<li><a href="http://www.bioconductor.org/packages/devel/bioc/html/BiSeq.html">BiSeq</a></li>
<li>Much more under <a href="http://www.bioconductor.org/packages/devel/BiocViews.html#___DNAMethylation">BiocViews</a></li>
</ul></div></div></div><h3><a name="TOC-HT-Seq-Data-Visualization" id="TOC-HT-Seq-Data-Visualization"></a>HT-Seq Data Visualization</h3>
<p><a href="http://www.bioconductor.org/packages/release/bioc/html/ggbio.html">ggbio</a>: ggplot2 extension for genomics data (<a href="http://tengfei.github.com/ggbio/">online manual</a>)</p><p><a href="http://www.bioconductor.org/packages/devel/bioc/html/Gviz.html">Gviz</a>: Plotting data and annotation information along genomic coordinates</p><p><a href="http://bioconductor.org/packages/release/bioc/html/HilbertVis.html">HilbertVis</a>: Hilbert genome plots</p>
<p><a href="http://bioconductor.org/packages/release/bioc/html/GenomeGraphs.html">GenomeGraphs</a>: Plotting genomic information from Ensembl</p><p><a href="http://www.hubmed.org/display.cgi?uids=18507856">TileQC</a>: Flow Cell Quality Visualization</p><p><a href="http://bioconductor.org/packages/release/bioc/html/rtracklayer.html">rtracklayer</a>: R interface to genome browsers</p><p><a href="http://genoplotr.r-forge.r-project.org/">genoPlotR</a>: Plotting maps of genes and genomes</p><p><a href="http://bioconductor.org/packages/release/bioc/html/Genominator.html">Genominator</a>: Tools for storing, accessing, analyzing and visualizing genomic data.</p><p>To install the main packages discussed above (current Bioconductor releases use BiocManager in place of the older biocLite script):</p><blockquote><p>if (!requireNamespace("BiocManager", quietly = TRUE))<br />&nbsp;&nbsp;&nbsp;&nbsp;install.packages("BiocManager")<br />BiocManager::install(c("ShortRead", "Biostrings", "IRanges", "BSgenome", "rtracklayer", "biomaRt", "chipseq", "ChIPpeakAnno", "Rsamtools", "BayesPeak", "PICS", "GenomicRanges", "DESeq", "edgeR", "leeBamViews", "GenomicFeatures", "BSgenome.Celegans.UCSC.ce2"))</p></blockquote></div>]]></description>
	<dc:creator>John Parker</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/12206/bioinformatics-algorithms-tutorials</guid>
	<pubDate>Tue, 24 Jun 2014 00:10:45 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/12206/bioinformatics-algorithms-tutorials</link>
	<title><![CDATA[Bioinformatics algorithms tutorials]]></title>
	<description><![CDATA[<p>Useful bioinformatics tutorials, including:</p>
<p>De Bruijn Graphs for NGS Assembly<br>Algorithms for PacBio Reads<br>Software and Hardware Concepts for Bioinformatics<br>Finding us in Homolog.us (Search Algorithms)<br>NGS Genome and RNAseq Assembly - a Hands on Primer<br>Introduction to PERL, Python, R and C/C++ for Bioinformatics</p><p>Address of the bookmark: <a href="http://www.homolog.us/Tutorials/" rel="nofollow">http://www.homolog.us/Tutorials/</a></p>]]></description>
	<dc:creator>John Parker</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/44758/the-ifs-and-buts-of-ngs-quality-control-and-trimming</guid>
	<pubDate>Thu, 02 Jan 2025 20:11:07 -0600</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/44758/the-ifs-and-buts-of-ngs-quality-control-and-trimming</link>
	<title><![CDATA[The &quot;Ifs&quot; and &quot;Buts&quot; of NGS Quality Control and Trimming]]></title>
	<description><![CDATA[<p>Next-Generation Sequencing (NGS) has revolutionized biological research, providing vast amounts of data for a wide range of applications. However, the reliability of NGS analyses depends heavily on the quality of the raw sequencing data. Quality control (QC) and trimming are critical preprocessing steps that can make or break your downstream analyses. In this post, we explore the "ifs" (why you should perform QC and trimming) and the "buts" (challenges or considerations) of this vital step in NGS workflows.</p><h3><strong>The "Ifs" of NGS QC and Trimming</strong></h3><ol>
<li>
<p><strong>Ensures Data Integrity</strong><br />If you want to minimize errors in downstream analyses, QC and trimming remove low-quality reads and bases, ensuring high-confidence data. This step is essential for reliable variant calling, assembly, and other applications.</p>
</li>
<li>
<p><strong>Removes Contaminants</strong><br />If adapter sequences or contaminants are present in the raw reads, trimming can eliminate them. This prevents issues like misalignment or incorrect biological interpretations, ensuring cleaner data for analysis.</p>
</li>
<li>
<p><strong>Improves Mapping and Assembly</strong><br />If your goal is better alignment to a reference genome or improved de novo assembly, trimming low-quality bases and adapters is critical. High-quality reads map more efficiently and generate more accurate assemblies.</p>
</li>
<li>
<p><strong>Reduces Computational Load</strong><br />If you want to save computational resources, trimming reduces the dataset size, which speeds up processing and analysis. Clean datasets mean less computational time spent on processing low-quality data.</p>
</li>
<li>
<p><strong>Prepares for Standardized Analyses</strong><br />If your project involves multiple datasets, QC and trimming ensure uniformity across them. This standardization makes comparisons valid and reproducible, particularly in large collaborative studies.</p>
</li>
</ol><h3><strong>The "Buts" of NGS QC and Trimming</strong></h3><ol>
<li>
<p><strong>Risk of Over-Trimming</strong><br />But excessive trimming can lead to the loss of informative sequences, reducing read depth and potentially discarding biologically relevant data. This is especially critical in studies with limited sequencing depth.</p>
</li>
<li>
<p><strong>Bias Introduction</strong><br />But trimming algorithms might introduce biases, especially if they inadvertently remove sequences with specific biological patterns. This can skew results and compromise biological insights.</p>
</li>
<li>
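<p><strong>Loss of Context in Paired-End Reads</strong><br />But trimming the two reads of a pair independently can break the pair: if one mate is discarded (for example, because it falls below the minimum length) while the other is kept, the survivor becomes an orphan read without pairing information. This complicates downstream analyses that rely on paired-end data, such as structural variant detection.</p>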
</li>
<li>
<p><strong>Time and Resource Intensive</strong><br />But running QC and trimming for large datasets can be computationally expensive and time-consuming. As sequencing depth increases, preprocessing becomes a bottleneck in the analysis pipeline.</p>
</li>
<li>
<p><strong>Variable Standards</strong><br />But the criteria for trimming (e.g., quality threshold, minimum read length) can vary between tools and datasets. This variability may affect reproducibility and comparability of results across studies.</p>
</li>
</ol><h3><strong>Balancing the "Ifs" and "Buts"</strong></h3><p>To maximize the benefits of QC and trimming while mitigating the challenges, consider the following best practices:</p><ul>
<li>
<p><strong>Use QC Tools Wisely:</strong> Start with tools like <strong>FastQC</strong> to identify quality issues in your raw data. Visualizing quality metrics helps tailor your trimming parameters.</p>
</li>
<li>
<p><strong>Choose Reliable Trimming Tools:</strong> Tools like <strong>Trimmomatic</strong>, <strong>Cutadapt</strong>, and <strong>BBDuk</strong> offer adaptive and customizable trimming options. Select one that suits your dataset and project goals.</p>
</li>
<li>
<p><strong>Set Reasonable Parameters:</strong> Avoid over-trimming by setting quality thresholds and minimum read lengths that balance data retention and quality improvement.</p>
</li>
<li>
<p><strong>Test Downstream Effects:</strong> Validate the impact of QC and trimming on downstream analyses, such as alignment efficiency, variant calling accuracy, or assembly quality.</p>
</li>
<li>
<p><strong>Document Your Workflow:</strong> Maintain detailed records of the parameters and tools used for QC and trimming. This ensures reproducibility and enables better troubleshooting.</p>
</li>
</ul><h3><strong>Conclusion</strong></h3><p>NGS quality control and trimming are essential steps to ensure reliable and accurate data for analysis. While the "ifs" highlight the clear benefits of these steps, the "buts" remind us of the potential pitfalls. By adopting best practices and carefully balancing these considerations, you can optimize your preprocessing workflow and unlock the full potential of your sequencing data.</p>]]></description>
	<dc:creator>BioStar</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/12944/orione-%E2%80%93-a-web-based-framework-for-ngs-analysis-in-microbiology</guid>
	<pubDate>Wed, 23 Jul 2014 06:43:03 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/12944/orione-%E2%80%93-a-web-based-framework-for-ngs-analysis-in-microbiology</link>
	<title><![CDATA[Orione – a web-based framework for NGS analysis in microbiology]]></title>
	<description><![CDATA[<p>End-to-end NGS microbiology data analysis requires a diversity of tools covering bacterial resequencing, de novo assembly, scaffolding, bacterial RNA-Seq, gene annotation and metagenomics. However, constructing computational pipelines from different software packages is difficult because of a lack of interoperability, reproducibility, and transparency. To overcome these limitations, researchers at <a href="http://www.crs4.it/" target="_blank">CRS4</a>, Italy, have developed Orione, a Galaxy-based framework consisting of publicly available research software and specifically designed pipelines to build complex, reproducible workflows for NGS microbiology data analysis. Enabling microbiology researchers to conduct their own custom analysis and data manipulation without software installation or programming, Orione provides new opportunities for data-intensive computational analyses in microbiology and metagenomics.</p>
<p>Reference</p>
<p>Cuccuru G, Orsini M, Pinna A, Sbardellati A, Soranzo N, Travaglione A, Uva P, Zanetti G, Fotia G. (2014)<strong> Orione, a web-based framework for NGS analysis in microbiology.</strong> <em>Bioinformatics</em> [Epub ahead of print]. [<a href="http://bioinformatics.oxfordjournals.org/content/early/2014/03/10/bioinformatics.btu135.long" target="_blank">article</a>]</p><p>Address of the bookmark: <a href="http://orione.crs4.it/" rel="nofollow">http://orione.crs4.it/</a></p>]]></description>
	<dc:creator>Martin Jones</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/38199/pacasus-correction-of-palindromes-in-long-reads-from-pacbio-and-nanopore</guid>
	<pubDate>Mon, 12 Nov 2018 05:26:48 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/38199/pacasus-correction-of-palindromes-in-long-reads-from-pacbio-and-nanopore</link>
	<title><![CDATA[Pacasus: Correction of palindromes in long reads from PacBio and Nanopore]]></title>
	<description><![CDATA[<p>Tool for detecting and cleaning palindromic sequences in PacBio/Nanopore long reads after whole-genome amplification. A poster from the Revolutionizing Next-Generation Sequencing (2nd edition) conference is available in the source folder:&nbsp;<a href="https://github.com/swarris/Pacasus/blob/master/vib2017.pdf">https://github.com/swarris/Pacasus/blob/master/vib2017.pdf</a>.</p>
<p>The preprint is available at&nbsp;<a href="http://www.biorxiv.org/content/early/2017/08/09/173872">http://www.biorxiv.org/content/early/2017/08/09/173872</a></p>
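<p>Whole-genome amplification can produce foldback (palindromic) reads in which the second part of the read is the reverse complement of the first. Pacasus detects these with Smith-Waterman alignment through pyPaSWAS; the Python sketch below swaps in a much cruder, hypothetical k-mer heuristic to illustrate the idea. It assumes the fold lies at the middle of the read, measures how many k-mers in the second half have their reverse complement in the first half, and, above an arbitrary threshold, keeps only the first half.</p>

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def foldback_fraction(read, k=5):
    """Fraction of second-half k-mers whose reverse complement occurs in the first half.

    Assumes (hypothetically) that the read folds back exactly at its midpoint.
    """
    mid = len(read) // 2
    first, second = read[:mid], read[mid:]
    first_kmers = {first[i:i + k] for i in range(len(first) - k + 1)}
    second_kmers = [second[i:i + k] for i in range(len(second) - k + 1)]
    if not second_kmers:
        return 0.0
    hits = sum(1 for kmer in second_kmers if revcomp(kmer) in first_kmers)
    return hits / len(second_kmers)

def clean_read(read, k=5, threshold=0.8):
    """Keep only the first half of a read that looks like a foldback; otherwise return it unchanged."""
    if foldback_fraction(read, k) >= threshold:
        return read[:len(read) // 2]
    return read

arm = "ATGCGTACCTTGACAGT"         # hypothetical 17-nt sequence
palindromic = arm + revcomp(arm)  # simulated foldback read
print(foldback_fraction(palindromic), clean_read(palindromic) == arm)  # prints 1.0 True
```

<p>Real foldback reads contain sequencing errors and need not fold exactly at the midpoint; an alignment-based approach can tolerate both, whereas this sketch does not.</p>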
<p>It uses the pyPaSWAS framework for sequence alignment (<a href="https://github.com/swarris/pyPaSWAS">https://github.com/swarris/pyPaSWAS</a>)</p><p>Address of the bookmark: <a href="https://github.com/swarris/Pacasus" rel="nofollow">https://github.com/swarris/Pacasus</a></p>]]></description>
	<dc:creator>BioStar</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/37563/colormap-correcting-long-reads-by-mapping-short-reads</guid>
	<pubDate>Mon, 20 Aug 2018 14:17:05 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/37563/colormap-correcting-long-reads-by-mapping-short-reads</link>
	<title><![CDATA[CoLoRMap: Correcting Long Reads by Mapping short reads]]></title>
	<description><![CDATA[<p><span>Second-generation sequencing technologies paved the way to an exceptional increase in the number of sequenced genomes, both prokaryotic and eukaryotic. However, short reads are difficult to assemble and often lead to highly fragmented assemblies. Recent developments in long-read sequencing offer a promising way to address this issue. However, long reads have so far been characterized by a high error rate, and assembling long reads requires a high depth of coverage. This motivates the development of hybrid approaches that leverage the high quality of short reads to correct errors in long reads. We introduce CoLoRMap, a hybrid method for correcting noisy long reads, such as those produced by PacBio sequencing technology, using high-quality Illumina paired-end reads mapped onto the long reads. Our algorithm is based on two novel ideas: using a classical shortest path algorithm to find a sequence of overlapping short reads that minimizes the edit score to a long read, and extending corrected regions by local assembly of unmapped mates of mapped short reads. Our results on bacterial, fungal and insect data sets show that CoLoRMap compares well with existing hybrid correction methods. The source code of CoLoRMap is freely available for non-commercial use at https://github.com/sfu-compbio/colormap</span></p>
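<p>The first CoLoRMap idea, a shortest path through overlapping short-read alignments, can be illustrated with a simplified graph. In the hypothetical Python sketch below, each short read mapped to a long read is a node with a start, an end and an edit score; an edge runs from alignment a to alignment b when b starts inside a and ends after it; and Dijkstra's algorithm finds the chain from the start of the long read to its end with the smallest total edit score. CoLoRMap's actual graph construction and weighting, and its second idea (local assembly of unmapped mates), are not reproduced here.</p>

```python
import heapq

def best_chain(alignments, long_read_length):
    """Minimum-total-edit-score chain of overlapping alignments spanning the long read.

    alignments: list of (start, end, edit_score), half-open coordinates on the long read.
    Returns (total_score, list of alignment indices), or None if no chain spans the read.
    """
    # Sources are alignments starting at position 0; a node's score is paid on entering it.
    dist = {i: aln[2] for i, aln in enumerate(alignments) if aln[0] == 0}
    prev = {}
    heap = [(d, i) for i, d in dist.items()]
    heapq.heapify(heap)
    done = set()
    while heap:
        d, i = heapq.heappop(heap)
        if i in done:
            continue  # stale heap entry
        done.add(i)
        start_i, end_i, _ = alignments[i]
        if end_i >= long_read_length:
            # First terminal node popped has the smallest total score (weights are non-negative).
            path = [i]
            while path[-1] in prev:
                path.append(prev[path[-1]])
            return d, path[::-1]
        for j, (start_j, end_j, score_j) in enumerate(alignments):
            # Edge i -> j: j starts inside i and extends past its end.
            if start_i < start_j < end_i and end_j > end_i:
                nd = d + score_j
                if nd < dist.get(j, float("inf")):
                    dist[j] = nd
                    prev[j] = i
                    heapq.heappush(heap, (nd, j))
    return None

# Hypothetical alignments on a 30-bp long read: (start, end, edit score).
alignments = [
    (0, 12, 1),   # 0
    (0, 10, 0),   # 1
    (8, 20, 3),   # 2
    (9, 21, 0),   # 3
    (18, 30, 2),  # 4
    (19, 30, 0),  # 5
]
print(best_chain(alignments, 30))  # prints (0, [1, 3, 5])
```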
<p><span>Contact: ehaghshe@sfu.ca or cedric.chauve@sfu.ca</span></p><p>Address of the bookmark: <a href="https://github.com/sfu-compbio/colormap" rel="nofollow">https://github.com/sfu-compbio/colormap</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>

<item>
  <guid isPermaLink='true'>https://bioinformaticsonline.com/opportunity/view/17504/postdoc-scientist-bioinformatics-at-ccmb</guid>
  <pubDate>Fri, 26 Sep 2014 19:58:41 -0500</pubDate>
  <link></link>
  <title><![CDATA[PostDoc Scientist Bioinformatics at CCMB]]></title>
  <description><![CDATA[
<p>1. Project Assistant/Junior Research Fellow/ Project Fellow [PA_JRF_PF]</p>

<p>a) M.Sc. or equivalent in biological sciences/related areas [Position Code: PA_JRF_PF_a]<br />b) B.E./B.Tech./M.Sc. in biotechnology/bioinformatics/computer science/chemistry/physics or MCA [Position Code: PA_JRF_PF_b]<br />c) M.Sc. or equivalent in wildlife sciences/ecology/environmental sciences, or MBBS/BVSc/MVSc [Position Code: PA_JRF_PF_c]</p>

<p>(Candidates awaiting results are NOT eligible to apply.)</p>

<p>Upper age limit: 28 years</p>

<p>Rs. 12,000 / Rs. 16,000 (as sanctioned by the funding agency)</p>

<p>2. Post Doctoral Fellow/Research Associate in multiple research areas [PDF_RA]</p>

<p>Ph.D. (submitted/awarded) in any branch of the biological sciences. Candidates with a Ph.D. in other sciences are also encouraged to apply.</p>

<p>Experience in molecular biology, biochemistry, structural biology, cell biology, infectious disease, conservation genetics, veterinary science, reproductive biology, and molecular diagnostics is desired but not mandatory.</p>

<p>[Position Code: PDF_RA]</p>

<p>Upper age limit: 35 years</p>

<p>Rs. 22,000-26,000 (as sanctioned by the funding agency)</p>

<p>3. Post Doctoral Scientist Fellow [PDSF]</p>

<p>Ph.D. in any of the following areas: bioinformatics, next-generation sequencing, high-throughput data analysis, proteomics, biostatistics, computer science, information technology, computer hardware and networking/clustering, parallel processing.<br />[Position Code: PDSF]</p>

<p>Upper age limit: 40 years</p>

<p>Rs. 40,000 consolidated (as sanctioned by the funding agency)</p>

<p>Download Application. Last date to apply online: 9 October 2014.</p>

<p>Advertisement: www.ccmb.res.in//index.php?view=notifications&amp;mid=0&amp;id=71&amp;nid=38</p>

<p>Apply online: http://www.ccmb.res.in/positions/temp_notif/online_form.html</p>

<p>More at http://www.ccmb.res.in//index.php?view=notifications&amp;mid=0&amp;id=71&amp;nid=38</p>
]]></description>
</item>

<item>
  <guid isPermaLink='true'>https://bioinformaticsonline.com/researchlabs/view/17652/arraygen-bioinformatics-genomics-group</guid>
  <pubDate>Sun, 28 Sep 2014 14:09:55 -0500</pubDate>
  <link></link>
  <title><![CDATA[ArrayGen Bioinformatics Genomics Group]]></title>
  <description><![CDATA[
<p>ArrayGen is a global bioinformatics company offering a one-stop solution for microarray design and genomics data analysis. Our novel Array Design Approach Strategy (ADAS) aims to shorten the time lag between the demands of the scientific community and the manufacturing industry, thereby expediting research.</p>

<p>ArrayGen specializes in genomics data analysis and research, as we value the precision, predictability, benchmark-ability and analytical capability of genomics data over other forms of biological data. ArrayGen constantly strives to develop new solutions and to fill existing gaps in the technological advancement of the field.</p>

<p>More: http://www.arraygen.com/</p>
]]></description>
</item>

<item>
  <guid isPermaLink='true'>https://bioinformaticsonline.com/opportunity/view/17873/postdoc-position-in-protein-annotation-and-machine-learning-paris-france</guid>
  <pubDate>Sat, 04 Oct 2014 08:10:45 -0500</pubDate>
  <link></link>
  <title><![CDATA[Postdoc position in protein annotation and machine learning - Paris, France]]></title>
  <description><![CDATA[
<p>We are interested in finding an excellent postdoc with interests in protein functional annotation, machine learning and computer grids. The position is open for 3.5 years at the Université Pierre et Marie Curie, in the heart of Paris.</p>

<p>Research topic: Protein function annotation, multiple probabilistic models, domain architecture, machine learning, combinatorial optimization, computer grid.</p>

<p>The project is based in the Analytical Genomics team, headed by A. Carbone, at the Laboratoire de Biologie Computationnelle et Quantitative (UMR7238 CNRS-UPMC). It is co-advised by Pierre-Henri Wuillemin (Laboratoire d’Informatique de Paris 6, Equipe DECISION).</p>

<p>The postdoc will be paid under a 3.5-year Ingénieur de Recherche contract; the position is available from 1 September 2014.</p>

<p>Group Web Page: http://www.lcqb.upmc.fr/AnalGenom/home.html</p>

<p>Contact e-mail: Alessandra Carbone, alessandra.carbone@lip6.fr</p>
]]></description>
</item>

</channel>
</rss>