BOL: Related items

You and your friend have similar DNA!

Jit — Sun, 27 Jul 2014 20:44:05 -0500

New research out of Massachusetts suggests that people often choose friends who are genetically similar to them, and that the similarity is greater than you might suppose. A study published in PNAS (http://www.pnas.org/content/111/Supplement_3/10796.full) found that people are apt to pick friends who are genetically similar to themselves, so much so that friends tend to be as alike at the genetic level as a person's fourth cousin.

Scientists working with the long-running Framingham Heart Study looked at 1,932 people (examining about 1.5 million markers of genetic variation), comparing unrelated friends to unrelated strangers. They found that friends shared about 1% of their genes, a percentage much higher than that shared with strangers. These findings indicate that people have more DNA in common with those they select as friends than with strangers in the same population.

The genes that lined up the most were olfactory genes, which deal with smell. The ones that lined up the least were immune system genes. The researchers weren't sure why that happened. The olfactory result might have a straightforward explanation: people who like the same smells tend to be drawn to similar environments, where they meet others with the same tendencies.

Reference:

http://www.pnas.org/content/111/Supplement_3/10796.full


The "Ifs" and "Buts" of NGS Quality Control and Trimming

BioStar — Thu, 02 Jan 2025 20:11:07 -0600

Next-Generation Sequencing (NGS) has revolutionized biological research, providing vast amounts of data for a wide range of applications. However, the reliability of NGS analyses depends heavily on the quality of the raw sequencing data. Quality control (QC) and trimming are critical preprocessing steps that can make or break your downstream analyses. In this post, we explore the "ifs" (why you should perform QC and trimming) and the "buts" (challenges or considerations) of this vital step in NGS workflows.

The "Ifs" of NGS QC and Trimming

Ensures Data Integrity
If you want to minimize errors in downstream analyses, QC and trimming remove low-quality reads and bases, ensuring high-confidence data. This step is essential for reliable variant calling, assembly, and other applications.
Removes Contaminants
If adapter sequences or contaminants are present in the raw reads, trimming can eliminate them. This prevents issues like misalignment or incorrect biological interpretations, ensuring cleaner data for analysis.
Improves Mapping and Assembly
If your goal is better alignment to a reference genome or improved de novo assembly, trimming low-quality bases and adapters is critical. High-quality reads map more efficiently and generate more accurate assemblies.
Reduces Computational Load
If you want to save computational resources, trimming reduces the dataset size, which speeds up processing and analysis. Clean datasets mean less computational time spent on processing low-quality data.
Prepares for Standardized Analyses
If your project involves multiple datasets, QC and trimming ensure uniformity across them. This standardization makes comparisons valid and reproducible, particularly in large collaborative studies.
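
As a rough illustration of what "trimming low-quality bases" means in practice, the sketch below cuts a read at the first position where a sliding window's mean quality drops below a threshold. It assumes Phred+33 encoded quality strings; the function names and the default values (window of 4, mean quality of 20, minimum length of 30) are illustrative choices, not the defaults or exact algorithm of any particular tool.

```python
def phred33(qual):
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(ch) - 33 for ch in qual]


def trim_read(seq, qual, window=4, min_mean_q=20, min_len=30):
    """Cut the read where a sliding window's mean quality first drops
    below min_mean_q; return None if what remains is shorter than min_len."""
    scores = phred33(qual)
    cut = len(seq)
    for start in range(0, max(len(scores) - window + 1, 0)):
        win = scores[start:start + window]
        if sum(win) / window < min_mean_q:
            cut = start  # discard everything from the failing window onward
            break
    if cut < min_len:
        return None
    return seq[:cut], qual[:cut]


# Toy read: 40 bases at Q40 ('I') followed by 10 bases at Q2 ('#').
trimmed = trim_read("A" * 40 + "C" * 10, "I" * 40 + "#" * 10)
print(len(trimmed[0]) if trimmed else "read discarded")
```

Note that a window-based cut can remove a good base or two next to the low-quality tail, which is one small way the over-trimming risk discussed in the next section shows up.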

The "Buts" of NGS QC and Trimming

Risk of Over-Trimming
But excessive trimming can lead to the loss of informative sequences, reducing read depth and potentially discarding biologically relevant data. This is especially critical in studies with limited sequencing depth.
Bias Introduction
But trimming algorithms might introduce biases, especially if they inadvertently remove sequences with specific biological patterns. This can skew results and compromise biological insights.
Loss of Context in Paired-End Reads
But trimming one read in a pair more than the other can lead to loss of pairing information. This complicates downstream analyses that rely on paired-end data, such as structural variant detection.
Time and Resource Intensive
But running QC and trimming for large datasets can be computationally expensive and time-consuming. As sequencing depth increases, preprocessing becomes a bottleneck in the analysis pipeline.
Variable Standards
But the criteria for trimming (e.g., quality threshold, minimum read length) can vary between tools and datasets. This variability may affect reproducibility and comparability of results across studies.
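
The pairing concern can be handled by deciding the fate of both mates together. The sketch below assumes each mate has already been trimmed independently and that a pair is represented as two (sequence, quality) tuples; `filter_pairs`, `min_len`, and the split into paired, orphan, and dropped reads are hypothetical names and choices used for illustration, not the behaviour of a specific tool.

```python
def filter_pairs(pairs, min_len=30):
    """Split trimmed read pairs into kept pairs and orphaned mates.

    pairs: iterable of ((seq1, qual1), (seq2, qual2)) tuples.
    A pair is kept only if both mates are at least min_len long, so the
    paired output stays synchronized; a lone surviving mate goes to orphans.
    """
    kept, orphans, dropped = [], [], 0
    for r1, r2 in pairs:
        ok1 = len(r1[0]) >= min_len
        ok2 = len(r2[0]) >= min_len
        if ok1 and ok2:
            kept.append((r1, r2))
        elif ok1 or ok2:
            orphans.append(r1 if ok1 else r2)
        else:
            dropped += 1
    return kept, orphans, dropped
```

Keeping orphans in a separate collection, rather than discarding them, preserves data for analyses that can use single-end reads while leaving the paired set intact for those that cannot.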

Balancing the "Ifs" and "Buts"

To maximize the benefits of QC and trimming while mitigating the challenges, consider the following best practices:

Use QC Tools Wisely: Start with tools like FastQC to identify quality issues in your raw data. Visualizing quality metrics helps tailor your trimming parameters.
Choose Reliable Trimming Tools: Tools like Trimmomatic, Cutadapt, and BBDuk offer adaptive and customizable trimming options. Select one that aligns with your dataset and project goals.
Set Reasonable Parameters: Avoid over-trimming by setting quality thresholds and minimum read lengths that balance data retention and quality improvement.
Test Downstream Effects: Validate the impact of QC and trimming on downstream analyses, such as alignment efficiency, variant calling accuracy, or assembly quality.
Document Your Workflow: Maintain detailed records of the parameters and tools used for QC and trimming. This ensures reproducibility and enables better troubleshooting.
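
One lightweight way to check whether a parameter choice is over-trimming is to measure how many bases survive at several thresholds before committing to one. The sketch below uses a simple 3'-end trim on Phred+33 qualities; the function names, the toy quality strings, and the thresholds are hypothetical and only illustrate the bookkeeping.

```python
def trim_tail(qual, min_q):
    """Return the length left after removing 3' bases with quality < min_q."""
    end = len(qual)
    while end > 0 and ord(qual[end - 1]) - 33 < min_q:
        end -= 1
    return end


def retention(quals, min_q, min_len):
    """Fraction of bases kept after tail trimming and length filtering."""
    total = sum(len(q) for q in quals)
    kept = 0
    for q in quals:
        n = trim_tail(q, min_q)
        if n >= min_len:  # reads shorter than min_len are discarded entirely
            kept += n
    return kept / total if total else 0.0


# Toy qualities: 'I' = Q40, '5' = Q20, '#' = Q2.
quals = ["I" * 40 + "5" * 10, "I" * 20 + "#" * 30, "I" * 50]
for min_q in (10, 20, 30):
    print(min_q, round(retention(quals, min_q, min_len=25), 2))
```

Recording the chosen threshold alongside the retention figure is a simple way to document the decision for later troubleshooting.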

Conclusion

NGS quality control and trimming are essential steps to ensure reliable and accurate data for analysis. While the "ifs" highlight the clear benefits of these steps, the "buts" remind us of the potential pitfalls. By adopting best practices and carefully balancing these considerations, you can optimize your preprocessing workflow and unlock the full potential of your sequencing data.

Research Associate at Indian Institute of Chemical Technology (IICT), Hyderabad

Thu, 07 Aug 2014 01:55:21 -0500

Indian Institute of Chemical Technology (IICT), Hyderabad, a constituent of CSIR, is a leading research institute in the area of chemical sciences. The core strength of IICT lies in Organic Chemistry, and it has excelled in this field for over six decades. The research efforts during these years have resulted in the development of several innovative processes for a variety of products necessary for human welfare, such as drugs, agrochemicals, food, organic intermediates, and adhesives. More than 150 technologies developed by IICT are now in commercial production.

CSIR-IICT is conducting a walk-in interview for the following position, on a purely temporary basis, for the sponsored project "GENESIS" (BSC-0121) at 10.00 AM on 19th August 2014 at IICT, Hyderabad.

Position : Research Associate
No of Post : One
Desired Profile : PhD in Computational Biology or M.Tech in Computational Biology with three years' experience in a relevant subject and at least one research paper in an SCI journal

Experience : Knowledge of vectors and vector-borne diseases, disease modeling, GIS mapping and modeling.
Age : 35 years
Stipend : Rs 22000/- + HRA

Eligible candidates may download the application form from our website http://www.iictindia.org and appear for the interview with the duly filled-in application form, supported by bio-data, one set of attested photocopies of certificates of educational qualification, age, experience, and caste, and a recent photograph. Candidates are required to bring all original certificates for verification.

Walk in Interview : 19.08.14

My commonly used commands in Bioinformatics

Rahul Nayak — Thu, 26 Jul 2018 04:58:45 -0500

FYI, I've found it useful to use MUMmer to extract the specific changes that Racon makes, so I can evaluate them individually:

minimap -t 24 assembly.fasta long_reads.fastq.gz | racon -t 24 long_reads.fastq.gz - assembly.fasta racon_assembly.fasta
nucmer -p nucmer assembly.fasta racon_assembly.fasta
show-snps -C -T -r nucmer.delta

This reports Racon's changes in a table. You can exclude indels with the -I option in show-snps.
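
To evaluate the changes individually, the table can be read into a script. The sketch below is a minimal parser that assumes the tab-delimited layout in which the first four columns are reference position, reference base, query base, and query position, with `.` marking a gap; check this against the header of your own output before relying on it. The function name `parse_show_snps` and the example rows are hypothetical, and the rows are truncated to the four columns the parser reads.

```python
from collections import Counter


def parse_show_snps(lines):
    """Yield (ref_pos, ref_base, qry_base, qry_pos, kind) for each data row.

    Header lines are skipped because their first field is not an integer.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        ref_pos, ref_base, qry_base, qry_pos = fields[:4]
        if ref_base == ".":
            kind = "insertion"      # base present only in the polished assembly
        elif qry_base == ".":
            kind = "deletion"       # base present only in the original assembly
        else:
            kind = "substitution"
        yield int(ref_pos), ref_base, qry_base, int(qry_pos), kind


# Hypothetical rows, for illustration only.
example = [
    "[P1]\t[SUB]\t[SUB]\t[P2]\n",
    "1042\tA\tG\t1042\n",
    "2310\t.\tT\t2311\n",
    "5120\tC\t.\t5120\n",
]
print(Counter(change[4] for change in parse_show_snps(example)))
```

In real use, the `lines` argument would be an open file handle on the saved show-snps output.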

This process (Racon -> MUMmer -> SNP table) solves the problem I originally raised in this issue. So as far as I'm concerned, you can close this issue (or keep it open if you still want to implement some kind of variant table).

JRF position in the Faculty of Life Sciences & Biotechnology at South Asian University

Wed, 13 Aug 2014 07:16:30 -0500

Opening for a Project-JRF position in the Faculty of Life Sciences & Biotechnology

Applications are invited for the post of Junior Research Fellow (JRF) in a DBT-funded IYBA project entitled “Generating a protein-ncRNA interactome for Dorsal mediated gene regulation and dorso-ventral patterning genes in Drosophila” in the Lab. of Molecular Biology at the Faculty of Life Sciences and Biotechnology, South Asian University, New Delhi. The project requires extensive use of molecular, genetic and genomic approaches.

POST: Junior Research Fellow (JRF)

NO. OF VACANCIES: 01

FELLOWSHIP: Rs. 16,000/- plus HRA

PROJECT DURATION: 2014-2016 (Two years)

LAST DATE FOR APPLICATION: Aug 18, 2014.

Eligibility criteria:

M.Sc./M.Tech. in Biological Sciences/Biotechnology/Bioinformatics. Candidates with research experience in the field of Drosophila/Yeast genetics will be preferred.

Application Procedure:

A covering letter along with your CV, copies of prior publications (if any) and proof of experience should be e-mailed to lmb_sau@aol.com. A hard copy of the application should be brought on the day of the interview along with other testimonials and marks statements for verification.

IMPORTANT NOTE:

-No TA/DA will be paid for attending the interview.

-SAU may select candidates against the post depending upon the qualifications and experience of candidates, and reserves the right to relax any of the qualifications in case the candidate is found otherwise well qualified by the Selection Committee.

-The abovementioned post is temporary and will be initially offered for a period of one year and can be extended, on satisfactory performance.

More at http://www.sau.ac.in/recruitment/vacancy.html

Referee: Genome assembly quality scores

Jit — Sun, 04 Nov 2018 16:44:30 -0600

Modern genome sequencing technologies provide a succinct measure of quality at each position in every read; however, all of this information is lost in the assembly process. Referee summarizes the quality information from the reads that map to a site in an assembled genome to calculate a quality score for each position in the genome assembly.

We accomplish this by first calculating genotype likelihoods for every site. For a given site in a diploid genome, there are 10 possible genotypes (AA, AC, AG, AT, CC, CG, CT, GG, GT, TT). Referee takes as input the genotype likelihoods calculated for all 10 genotypes given the called reference base at each position.
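
To make the genotype step concrete, the sketch below enumerates the 10 unordered diploid genotypes and computes a log-likelihood for each from the read bases and Phred scores at one site. It uses a common textbook error model (each read base comes from either allele with probability 1/2, and an error yields any other base with probability e/3); this is an assumption made for illustration and is not taken from Referee's source code. The function name and the cap on the error probability are likewise illustrative.

```python
import math
from itertools import combinations_with_replacement

BASES = "ACGT"
# Unordered pairs with repetition: AA, AC, AG, AT, CC, CG, CT, GG, GT, TT.
GENOTYPES = ["".join(g) for g in combinations_with_replacement(BASES, 2)]


def genotype_log_likelihoods(bases, quals):
    """Log10 likelihood of the observed read bases under each genotype.

    bases: string of read bases covering one site.
    quals: list of Phred scores, one per base.
    """
    result = {}
    for gt in GENOTYPES:
        log_l = 0.0
        for b, q in zip(bases, quals):
            # Cap e at 0.75 so a Q0 base is uninformative instead of impossible.
            e = min(10 ** (-q / 10), 0.75)
            p = sum((1 - e) if b == allele else e / 3 for allele in gt) / 2
            log_l += math.log10(p)
        result[gt] = log_l
    return result


ll = genotype_log_likelihoods("AAACCC", [30] * 6)
print(max(ll, key=ll.get))
```

With three A reads and three C reads at equal quality, the heterozygous genotype AC has the highest likelihood under this model.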

Referee is a program to calculate a quality score for every position in a genome assembly. This allows for easy filtering of low quality sites for any downstream analysis.

https://github.com/gwct/referee

Address of the bookmark: https://gwct.github.io/referee/#

Project Fellow at Institute of Himalayan Bioresource Technology

Fri, 15 Aug 2014 06:50:08 -0500

Research Associate/Project Fellow
Date of posting: 14 Aug

Eligibility : MSc, M Phil / PhD, BE/B.Tech
Location : Himachal Pradesh-other
Job Category : Govt Jobs, Research, Walkin
Last Date : 20 Aug 2014

Advertisement No.6/2014

Post : Project Fellow
Research Associate/ Project Fellow Jobs opportunity in CSIR-Institute of Himalayan Bioresource Technology
(i) M.Sc. in Bioinformatics/Computer Science with 55% marks and (ii) M.Sc. Bioinformatics/ Computational biology/ P.G. Diploma in Bioinformatics/B.Tech. or higher Degree in Bioinformatics with 55% marks

Date of Interview: 29.08.2014.

More at http://www.ihbt.res.in/recruit/AdvtNo6_2014.pdf

jackalope: A swift, versatile phylogenomic and high-throughput sequencing simulator

Abhimanyu Singh — Fri, 26 Jul 2019 00:58:12 -0500

jackalope simply and efficiently simulates (i) variants from reference genomes and (ii) reads from both Illumina and Pacific Biosciences (PacBio) platforms. It can either read reference genomes from FASTA files or simulate new ones. Genomic variants can be simulated using summary statistics, phylogenies, Variant Call Format (VCF) files, and coalescent simulations, the latter of which can include selection, recombination, and demographic fluctuations. jackalope can simulate single, paired-end, or mate-pair Illumina reads, as well as PacBio reads. These simulations include sequencing errors, mapping qualities, multiplexing, and optical/PCR duplicates. All outputs can be written to standard file formats.

A swift, versatile phylogenomic and high-throughput sequencing simulator https://jackalope.lucasnell.com

Address of the bookmark: https://github.com/lucasnell/jackalope

Biology, Computers Collide in High-Demand Field of Bioinformatics

Mon, 25 Aug 2014 00:56:10 -0500

Dr. Shivas Amin calls bioinformatics a "collision of biology and computers." Students learn how to use computers and skills in math and biology to analyze genome and proteome projects to prepare for high-demand jobs in the life sciences. Learn more about Amin and hear from student Medina Baitemirova and alumnus Lukas Simon about the fast-growing field of bioinformatics.

kevlar: Reference-free variant discovery in large eukaryotic genomes

Jit — Tue, 28 Jan 2020 03:21:53 -0600

Welcome to kevlar, software for predicting de novo genetic variants without mapping reads to a reference genome! kevlar's k-mer abundance-based method calls single nucleotide variants (SNVs), multinucleotide variants (MNVs), insertion/deletion variants (indels), and structural variants (SVs) simultaneously with a single simple model.
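
The general idea behind a k-mer abundance approach to de novo variants can be sketched in a few lines: k-mers that are abundant in a child's reads but absent from both parents' reads are candidates for spanning a new variant. The code below is a toy illustration of that idea only, not kevlar's implementation; the function names, the tiny `k`, and the `min_child` threshold are hypothetical, and it ignores reverse complements and sequencing errors, which real tools must handle.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer across a collection of read sequences."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def novel_kmers(child_reads, parent_read_sets, k=5, min_child=2):
    """k-mers seen at least min_child times in the child and in no parent."""
    child = count_kmers(child_reads, k)
    parents = [count_kmers(reads, k) for reads in parent_read_sets]
    return {
        kmer
        for kmer, n in child.items()
        if n >= min_child and all(kmer not in p for p in parents)
    }


# Toy data: the child carries a single A-to-T change relative to both parents.
parents = [["ACGTACGTAC"], ["ACGTACGTAC"]]
child = ["ACGTTCGTAC", "ACGTTCGTAC"]
print(sorted(novel_kmers(child, parents)))
```

A single-base change produces up to k novel k-mers, one for each window that overlaps the changed position, which is why a cluster of novel k-mers is a useful signal.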

More at https://kevlar.readthedocs.io/en/latest/

https://www.cell.com/iscience/pdf/S2589-0042(19)30259-7.pdf

Address of the bookmark: https://github.com/kevlar-dev/kevlar