BOL: Related items

The "Ifs" and "Buts" of NGS Quality Control and Trimming

BioStar — Thu, 02 Jan 2025 20:11:07 -0600

Next-Generation Sequencing (NGS) has revolutionized biological research, providing vast amounts of data for a wide range of applications. However, the reliability of NGS analyses heavily depends on the quality of raw sequencing data. Quality control (QC) and trimming are critical preprocessing steps that can make or break your downstream analyses. In this blog, we explore the "ifs" (why you should perform QC and trimming) and the "buts" (challenges or considerations) of this vital step in NGS workflows.

The "Ifs" of NGS QC and Trimming

Ensures Data Integrity
If you want to minimize errors in downstream analyses, QC and trimming remove low-quality reads and bases, leaving higher-confidence data. This step is essential for reliable variant calling, assembly, and other applications.
Removes Contaminants
If adapter sequences or contaminants are present in the raw reads, trimming can eliminate them. This prevents issues like misalignment or incorrect biological interpretations, ensuring cleaner data for analysis.
Improves Mapping and Assembly
If your goal is better alignment to a reference genome or improved de novo assembly, trimming low-quality bases and adapters is critical. High-quality reads map more efficiently and generate more accurate assemblies.
Reduces Computational Load
If you want to save computational resources, trimming reduces the dataset size, which speeds up processing and analysis. Clean datasets mean less computational time spent on processing low-quality data.
Prepares for Standardized Analyses
If your project involves multiple datasets, QC and trimming ensure uniformity across them. This standardization makes comparisons valid and reproducible, particularly in large collaborative studies.
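
As a rough illustration of what "trimming low-quality bases" means in practice, the sketch below decodes a FASTQ quality string and removes low-scoring bases from the 3' end of a read. It assumes Phred+33 encoding and a Q20 threshold, and the function names are hypothetical. Real trimmers such as Trimmomatic or Cutadapt use more elaborate strategies, so treat this as a toy model rather than a replacement for them.

```python
def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into integer Phred scores.

    Assumes Phred+33 encoding (offset=33), which is an assumption of this sketch.
    """
    return [ord(ch) - offset for ch in quality_line]


def trim_3prime(seq, qual, min_quality=20):
    """Drop trailing bases whose Phred score is below min_quality."""
    scores = phred_scores(qual)
    end = len(scores)
    # Walk back from the 3' end until a base meets the threshold.
    while end > 0 and scores[end - 1] < min_quality:
        end -= 1
    return seq[:end], qual[:end]


if __name__ == "__main__":
    # 'I' encodes Q40, '#' encodes Q2, '!' encodes Q0 under Phred+33.
    print(trim_3prime("ACGTACGT", "IIIII##!"))  # ('ACGTA', 'IIIII')
```

Trimming only from the 3' end reflects the common pattern of quality dropping toward the end of a read; a sliding-window approach would also catch low-quality stretches in the middle.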

The "Buts" of NGS QC and Trimming

Risk of Over-Trimming
But excessive trimming can lead to the loss of informative sequences, reducing read depth and potentially discarding biologically relevant data. This is especially critical in studies with limited sequencing depth.
Bias Introduction
But trimming algorithms might introduce biases, especially if they inadvertently remove sequences with specific biological patterns. This can skew results and compromise biological insights.
Loss of Context in Paired-End Reads
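But trimming each read independently can break pairs: if one mate is trimmed so heavily that it falls below the minimum length and is discarded, its partner is orphaned and the pairing information is lost. This complicates downstream analyses that rely on paired-end data, such as structural variant detection.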
Time and Resource Intensive
But running QC and trimming for large datasets can be computationally expensive and time-consuming. As sequencing depth increases, preprocessing becomes a bottleneck in the analysis pipeline.
Variable Standards
But the criteria for trimming (e.g., quality threshold, minimum read length) can vary between tools and datasets. This variability may affect reproducibility and comparability of results across studies.
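
The paired-end concern above can be made concrete with a small sketch. It assumes the reads have already been trimmed and are held in memory as (read 1, read 2) sequence tuples; the function name and the minimum length of 36 are illustrative choices, not the defaults of any particular tool.

```python
def split_pairs(trimmed_pairs, min_length=36):
    """Separate intact pairs from orphaned mates after trimming.

    trimmed_pairs: iterable of (read1_seq, read2_seq) tuples.
    Returns (kept_pairs, orphans). Pairs where neither mate survives
    the length filter are dropped entirely.
    """
    kept_pairs, orphans = [], []
    for read1, read2 in trimmed_pairs:
        ok1 = len(read1) >= min_length
        ok2 = len(read2) >= min_length
        if ok1 and ok2:
            kept_pairs.append((read1, read2))
        elif ok1:
            orphans.append(read1)
        elif ok2:
            orphans.append(read2)
    return kept_pairs, orphans


if __name__ == "__main__":
    pairs = [
        ("A" * 50, "C" * 50),  # both mates survive
        ("A" * 50, "C" * 10),  # read 2 too short: read 1 becomes an orphan
        ("A" * 5, "C" * 5),    # neither mate survives
    ]
    kept, orphans = split_pairs(pairs)
    print(len(kept), len(orphans))  # 1 1
```

Keeping orphans in a separate collection, rather than mixing them back into the paired files, keeps the two mate files in the same order, which paired-end aligners generally expect.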

Balancing the "Ifs" and "Buts"

To maximize the benefits of QC and trimming while mitigating the challenges, consider the following best practices:

Use QC Tools Wisely: Start with tools like FastQC to identify quality issues in your raw data. Visualizing quality metrics helps tailor your trimming parameters.
Choose Reliable Trimming Tools: Tools like Trimmomatic, Cutadapt, and BBDuk offer adaptive and customizable trimming options. Select one that aligns with your dataset and project goals.
Set Reasonable Parameters: Avoid over-trimming by setting quality thresholds and minimum read lengths that balance data retention and quality improvement.
Test Downstream Effects: Validate the impact of QC and trimming on downstream analyses, such as alignment efficiency, variant calling accuracy, or assembly quality.
Document Your Workflow: Maintain detailed records of the parameters and tools used for QC and trimming. This ensures reproducibility and enables better troubleshooting.
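
Two of the practices above, checking for over-trimming and documenting the workflow, lend themselves to a short sketch. It assumes you have the read lengths before and after trimming as plain lists in the same order. The function names, the tool name "example-trimmer", and its version string are hypothetical placeholders.

```python
import json


def retention_summary(raw_lengths, trimmed_lengths, min_length):
    """Report how many reads and bases survive trimming and length filtering."""
    if not raw_lengths:
        raise ValueError("raw_lengths must not be empty")
    # Reads shorter than min_length after trimming count as discarded.
    kept = [n for n in trimmed_lengths if n >= min_length]
    return {
        "reads_in": len(raw_lengths),
        "reads_kept": len(kept),
        "read_retention": len(kept) / len(raw_lengths),
        "base_retention": sum(kept) / sum(raw_lengths),
    }


def workflow_record(tool, version, parameters, summary):
    """Serialize the tool, its parameters, and the outcome as JSON for the lab notebook."""
    record = {
        "tool": tool,
        "version": version,
        "parameters": parameters,
        "summary": summary,
    }
    return json.dumps(record, indent=2, sort_keys=True)


if __name__ == "__main__":
    summary = retention_summary([100, 100, 100, 100], [100, 80, 20, 60], min_length=36)
    # "example-trimmer" and "0.0.0" are placeholders, not a real tool or release.
    print(workflow_record("example-trimmer", "0.0.0",
                          {"min_quality": 20, "min_length": 36}, summary))
```

A sharp drop in base retention relative to read retention is one hint that the quality threshold is too aggressive for the dataset.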

Conclusion

NGS quality control and trimming are essential steps to ensure reliable and accurate data for analysis. While the "ifs" highlight the clear benefits of these steps, the "buts" remind us of the potential pitfalls. By adopting best practices and carefully balancing these considerations, you can optimize your preprocessing workflow and unlock the full potential of your sequencing data.

Internship program with ArrayGen Technologies

Sun, 22 Jun 2014 23:18:31 -0500

Internship Program for Bioinformatics / Biotechnology Professionals. We currently offer positions to outstanding students interested in Next Generation Sequencing (NGS) data analysis. Applications are accepted throughout the year. Accepted students will be listed on the web with their schedules and can attend our future workshops and trainings free of charge at the specified venue.

Interested candidates may email their resume along with a cover letter to careers@arraygen.com

Official website: http://www.arraygen.com/

Genomics public data links!

Jit — Thu, 13 Feb 2020 00:20:00 -0600

List of publicly available databases hosted on Google servers.

More at https://software.broadinstitute.org/gatk/download/bundle

ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606/VCF/GATK/.

ftp://ftp.broadinstitute.org/bundle/hg38/hg38bundle/

Address of the bookmark: https://console.cloud.google.com/storage/browser/genomics-public-data/resources/broad/hg38/v0?pli=1

PhD opportunity at Université de Liège - Belgium

Sat, 02 Aug 2014 01:12:43 -0500

The Bioinformatics and Systems Biology Unit of Université de Liège (Belgium) is looking for a highly motivated master's student with programming skills for a PhD thesis project (4 years, fully funded). The goal is to design computational tools that use literature, genomic and structural data to infer regulatory and metabolic networks.

Applicants are invited to send their resume and a recommendation letter to Prof. Patrick Meyer (more details at www.biosys.ulg.ac.be )

For more information: www.biosys.ulg.ac.be

PLAST: A fast, accurate and NGS scalable bank-to-bank sequence similarity search tool

Jit — Fri, 01 Dec 2017 04:10:54 -0600

PLAST is a fast, accurate and NGS scalable bank-to-bank sequence similarity search tool providing significant accelerations of seeds-based heuristic comparison methods, such as the Blast suite of algorithms.

Relying on unique software architecture, PLAST takes full advantage of recent multi-core personal computers without requiring any additional hardware devices.

PLAST stands for Parallel Local Sequence Alignment Search Tool and was published in BMC Bioinformatics.

PLAST is a general purpose sequence comparison tool providing the following benefits:

Compares two sets of sequences (query vs. reference) at high performance,
Reduces the processing time of sequence comparisons while providing high-quality results,
Contains a fully integrated data filtering engine capable of selecting relevant hits with user-defined criteria (E-value, identity, coverage, alignment length, etc.),
Requires no additional hardware, since it is a software solution: it is easy to install, cost-effective, takes full advantage of multi-core processors, and has a small RAM footprint,
Runs on a desktop computer, cluster, or cloud, as well as within a distributed system running Hadoop.
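
To give a feel for what filtering hits by user-defined criteria involves, here is a minimal sketch. It is not PLAST's API or output format: the `Hit` record, its field names, and the default thresholds are all hypothetical, chosen only to mirror the criteria listed above (E-value, identity, coverage, alignment length).

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical similarity-search hit; not PLAST's actual record layout."""
    query_id: str
    subject_id: str
    evalue: float
    identity: float  # percent identity, 0-100
    coverage: float  # percent of the query covered, 0-100
    length: int      # alignment length in residues


def select_hits(hits, max_evalue=1e-5, min_identity=90.0,
                min_coverage=80.0, min_length=50):
    """Keep only hits that satisfy every threshold."""
    return [
        h for h in hits
        if h.evalue <= max_evalue
        and h.identity >= min_identity
        and h.coverage >= min_coverage
        and h.length >= min_length
    ]


if __name__ == "__main__":
    hits = [
        Hit("q1", "s1", 1e-30, 98.5, 95.0, 200),  # passes all filters
        Hit("q1", "s2", 1e-3, 99.0, 95.0, 200),   # E-value too high
        Hit("q2", "s3", 1e-20, 70.0, 95.0, 200),  # identity too low
    ]
    print([h.subject_id for h in select_hits(hits)])  # ['s1']
```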

Address of the bookmark: https://plast.inria.fr/

Which math/statistics programming language/application do you most frequently use in bioinformatics?

John Parker — Thu, 04 Sep 2014 17:46:41 -0500

I'm doing a bit more statistical analysis on some bioinformatics things lately, and I'm curious if there are any programming languages that are particularly good for this NGS computation. What suggestions do you guys have? Are there any languages that have exceptionally good libraries?

NGS Online Training

Sat, 27 Sep 2014 07:42:29 -0500

ArrayGen Technologies announces online NGS training, available throughout the globe. Now analyze your own NGS datasets from anywhere. For more information, contact us at training@arraygen.com

Please visit our site at www.arraygen.com

Platypus: A Haplotype-Based Variant Caller For Next Generation Sequence Data

Shruti Paniwala — Thu, 25 Oct 2018 06:14:55 -0500

Platypus is a tool designed for efficient and accurate variant detection in high-throughput sequencing data. By using local realignment of reads and local assembly, it achieves both high sensitivity and high specificity. Platypus can detect SNPs, MNPs, short indels, replacements and (using the assembly option) deletions up to several kb. It has been extensively tested on whole-genome, exon-capture, and targeted capture data; it has been run on very large datasets as part of the Thousand Genomes and WGS500 projects; and it is being used in clinical sequencing trials in the Mainstreaming Cancer Genetics programme.

Tutorial https://github.com/andyrimmer/Platypus/blob/master/misc/README.txt

Address of the bookmark: http://www.well.ox.ac.uk/platypus

JRF in Bioinformatics @ INMAS, DRDO, Delhi

Wed, 01 Oct 2014 07:01:07 -0500

The Institute of Nuclear Medicine and Allied Sciences (INMAS), Delhi, under the aegis of the Defence Research and Development Organisation (DRDO), is engaged in research and development work in radiation sciences, neuro-computing, and medical image processing. INMAS is looking for meritorious young researchers to pursue research in frontier areas, and invites applications from young and meritorious Indian nationals who are creative and have the passion and desire to pursue R&D. INMAS combines the ambience of a research-cum-academic institute with an advanced R&D infrastructure in a mission mode. It provides infrastructure, motivation, and personality development prospects for talented students, along with state-of-the-art research facilities for undertaking pioneering research with defence applications.

Position: JRF (maximum tenure five years: 2 years as JRF and 3 years as SRF)
Qualification: A first class Master's Degree in Bioinformatics (likely 2 posts)
Stipend: Around Rs 16,000 plus 30% HRA (as per rules of funding agency)

Applications are invited from candidates possessing the above qualifications. The upper age limit is as on the last date for receipt of application. (5 years relaxation to SC/ST candidates, 3 years to OBC candidates, and other entitled categories as per Govt rules). Actual No. of vacancies may vary.

The application form can be downloaded from the website www.drdo.gov.in and emailed to inmashrd@gmail.com.
Last date to apply by email is 1700 hrs on 15 Oct 2014.
Incomplete applications are liable to be rejected.
Confirmation will be sent to short-listed candidates through email only.
Antecedents of selected candidates will be verified.
Written Test will be conducted from 0930-1030 hrs. Latecomers will not be considered.
Candidates will be required to produce certificates/testimonials in original at the time of interview.
It may please be noted that offer of Fellowship does not confer on fellows any right for absorption in DRDO.
Candidates should carry photocopy of Application form sent by email with them.
No TA/DA will be paid for attending interview & on joining.

More at http://drdo.gov.in/drdo/English/jrf29092014.pdf
http://drdo.gov.in/drdo/English/index.jsp?pg=inmas29092014.jsp

Internship Program for Bioinformatics / Biotechnology Professionals (No. of Vacancies: 2)

Wed, 08 Oct 2014 01:10:08 -0500

ArrayGen is offering an Internship Program for postgraduate Bioinformatics / Biotechnology students and professionals. ArrayGen Technologies provides an excellent opportunity to gain research experience and explore whether a scientific career is right for you. We currently offer positions to outstanding students interested in Next Generation Sequencing (NGS) data analysis. Applications are accepted throughout the year. Accepted students will be listed on the web with their schedules and can attend our future workshops and trainings free of charge at the specified venue.