BOL: Related items

Bioinformatics software for biologists in the genomics era

Poonam Mahapatra — Sun, 22 Dec 2013 17:31:05 -0600

The genome sequencing revolution is approaching a landmark figure of 1000 completely sequenced genomes. Coupled with fast-declining, per-base sequencing costs, this influx of DNA sequence data has encouraged laboratory scientists to engage large datasets in comparative sequence analyses for making evolutionary, functional and translational inferences. However, the majority of the scientists at the forefront of experimental research are not bioinformaticians, so a gap exists between the user-friendly software needed and the scripting/programming infrastructure often employed for the analysis of large numbers of genes, long genomic segments and groups of sequences. We see an urgent need for the expansion of the fundamental paradigms under which biologist-friendly software tools are designed and developed to fulfill the needs of biologists to analyze large datasets by using sophisticated computational methods. We argue that the design principles need to be sensitive to the reality that comparatively small teams of biologists have historically developed some of the most popular biological software packages in molecular evolutionary analysis. Furthermore, biological intuitiveness and investigator empowerment need to take precedence over the current supposition that biologists should re-tool and become programmers when analyzing genome scale datasets.

Address of the bookmark: http://bioinformatics.oxfordjournals.org/content/23/14/1713.full

The "Ifs" and "Buts" of NGS Quality Control and Trimming

BioStar — Thu, 02 Jan 2025 20:11:07 -0600

Next-Generation Sequencing (NGS) has revolutionized biological research, providing vast amounts of data for a wide range of applications. However, the reliability of NGS analyses heavily depends on the quality of raw sequencing data. Quality control (QC) and trimming are critical preprocessing steps that can make or break your downstream analyses. In this blog, we explore the "ifs" (why you should perform QC and trimming) and the "buts" (challenges or considerations) of this vital step in NGS workflows.

The "Ifs" of NGS QC and Trimming

Ensures Data Integrity
If you want to minimize errors in downstream analyses, QC and trimming remove low-quality reads and bases, ensuring high-confidence data. This step is essential for reliable variant calling, assembly, and other applications.
Removes Contaminants
If adapter sequences or contaminants are present in the raw reads, trimming can eliminate them. This prevents issues like misalignment or incorrect biological interpretations, ensuring cleaner data for analysis.
Improves Mapping and Assembly
If your goal is better alignment to a reference genome or improved de novo assembly, trimming low-quality bases and adapters is critical. High-quality reads map more efficiently and generate more accurate assemblies.
Reduces Computational Load
If you want to save computational resources, trimming reduces the dataset size, which speeds up processing and analysis. Clean datasets mean less computational time spent on processing low-quality data.
Prepares for Standardized Analyses
If your project involves multiple datasets, QC and trimming ensure uniformity across them. This standardization makes comparisons valid and reproducible, particularly in large collaborative studies.

The "Buts" of NGS QC and Trimming

Risk of Over-Trimming
But excessive trimming can lead to the loss of informative sequences, reducing read depth and potentially discarding biologically relevant data. This is especially critical in studies with limited sequencing depth.
Bias Introduction
But trimming algorithms might introduce biases, especially if they inadvertently remove sequences with specific biological patterns. This can skew results and compromise biological insights.
Loss of Context in Paired-End Reads
But trimming one read in a pair more than the other can lead to loss of pairing information. This complicates downstream analyses that rely on paired-end data, such as structural variant detection.
Time and Resource Intensive
But running QC and trimming for large datasets can be computationally expensive and time-consuming. As sequencing depth increases, preprocessing becomes a bottleneck in the analysis pipeline.
Variable Standards
But the criteria for trimming (e.g., quality threshold, minimum read length) can vary between tools and datasets. This variability may affect reproducibility and comparability of results across studies.
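
The paired-end concern above can be illustrated with a minimal Python sketch. It assumes reads are already trimmed and held in memory as (name, sequence, quality) tuples; the function name `filter_pairs` and the `min_len` default are hypothetical and are not taken from any particular tool. A pair is kept only when both mates survive, and a surviving mate whose partner was lost is set aside as an orphan so the paired output never falls out of sync.

```python
from typing import List, Tuple

Read = Tuple[str, str, str]  # (name, sequence, quality string)


def filter_pairs(pairs: List[Tuple[Read, Read]], min_len: int = 36):
    """Split trimmed read pairs into intact pairs and orphaned single reads."""
    kept: List[Tuple[Read, Read]] = []
    orphans: List[Read] = []
    for r1, r2 in pairs:
        ok1 = len(r1[1]) >= min_len
        ok2 = len(r2[1]) >= min_len
        if ok1 and ok2:
            kept.append((r1, r2))   # both mates survive: pairing preserved
        elif ok1:
            orphans.append(r1)      # mate 2 too short: keep mate 1 as single-end
        elif ok2:
            orphans.append(r2)      # mate 1 too short: keep mate 2 as single-end
        # if neither mate passes, the whole pair is dropped
    return kept, orphans
```

Writing orphans to a separate file, rather than discarding them, is one way to limit data loss while keeping the two paired files aligned read for read.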

Balancing the "Ifs" and "Buts"

To maximize the benefits of QC and trimming while mitigating the challenges, consider the following best practices:

Use QC Tools Wisely: Start with tools like FastQC to identify quality issues in your raw data. Visualizing quality metrics helps tailor your trimming parameters.
Choose Reliable Trimming Tools: Tools like Trimmomatic, Cutadapt, and BBduk offer adaptive and customizable trimming options. Select one that aligns with your dataset and project goals.
Set Reasonable Parameters: Avoid over-trimming by setting quality thresholds and minimum read lengths that balance data retention and quality improvement.
Test Downstream Effects: Validate the impact of QC and trimming on downstream analyses, such as alignment efficiency, variant calling accuracy, or assembly quality.
Document Your Workflow: Maintain detailed records of the parameters and tools used for QC and trimming. This ensures reproducibility and enables better troubleshooting.
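
As a rough illustration of how a quality threshold and a minimum read length interact, here is a minimal Python sketch of a simple trailing-base trim. It assumes Phred+33 encoded quality strings; the function names and the default values (`min_q=20`, `min_len=36`) are illustrative assumptions, not the behaviour of Trimmomatic, Cutadapt or BBduk.

```python
from typing import List, Optional, Tuple


def phred33(qual: str) -> List[int]:
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(c) - 33 for c in qual]


def trim_3prime(seq: str, qual: str, min_q: int = 20,
                min_len: int = 36) -> Optional[Tuple[str, str]]:
    """Remove trailing bases below min_q; return None if the read gets too short."""
    scores = phred33(qual)
    end = len(scores)
    # Walk back from the 3' end while the last retained base is low quality.
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    if end < min_len:
        return None  # read discarded: too short after trimming
    return seq[:end], qual[:end]
```

Raising `min_q` or `min_len` discards more data, so it is worth recording both values alongside the tool version when documenting the workflow.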

Conclusion

NGS quality control and trimming are essential steps to ensure reliable and accurate data for analysis. While the "ifs" highlight the clear benefits of these steps, the "buts" remind us of the potential pitfalls. By adopting best practices and carefully balancing these considerations, you can optimize your preprocessing workflow and unlock the full potential of your sequencing data.

JRF @ PONDICHERRY UNIVERSITY

Fri, 03 Jan 2014 16:48:56 -0600

PONDICHERRY UNIVERSITY

CENTRE FOR BIOINFORMATICS

PUDUCHERRY

Applications are invited for one Project Assistant to work in the UGC sponsored Research Award "Molecular Docking and Dynamics studies to understand the interacting mechanism of oncogenic 101 protein with its cellular proteins".

The duration of the fellowship is 12 months only, with a consolidated pay of Rs. 5,000 per month.

Applications on plain paper with the following details: Name, Address, Date of Birth, Father's Name, Nationality, Educational Qualification (SSLC onwards; enclose attested copies of certificates) and Research Experience may be addressed to Dr. R. Krishna, Principal Investigator (PI), UGC Research Award, Centre for Bioinformatics, Pondicherry University, Pondicherry - 605 014.

Applications should reach by January 26th, 2013.

Essential Qualification: M.Sc. in Bioinformatics/Biophysics with good academic record.

Qualification for Project Fellow:

M.Sc in Bioinformatics/Biophysics.

The person to be considered for appointment as Project Fellow must have second class master degree with a minimum of 55% marks in the subject concerned or a related subject.

The candidate to be appointed as Project Fellow should be below the age of 40 years at the time of appointment.

Desirable Qualification for this Project: Research Experience in Small/Macromolecule Crystallography and Structural Bioinformatics.

For more details, refer to the website: www.pondiuni.edu.in/sites/default/files/BIC-311213.pdf

My commonly used commands in Bioinformatics

Rahul Nayak — Thu, 26 Jul 2018 04:58:45 -0500

FYI, I've found it useful to use MUMmer to extract the specific changes that Racon makes, so I can evaluate them individually:

minimap -t 24 assembly.fasta long_reads.fastq.gz | racon -t 24 long_reads.fastq.gz - assembly.fasta racon_assembly.fasta
nucmer -p nucmer assembly.fasta racon_assembly.fasta
show-snps -C -T -r nucmer.delta

This reports Racon's changes in a table. You can exclude indels with the -I option in show-snps.
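
To evaluate the changes by type, the table can be summarised with a short Python sketch. It assumes the tab-delimited layout that show-snps -T is generally understood to produce (a few header lines, then a column line starting with [P1], with the reference base in the second column, the query base in the third, and "." marking the missing side of an indel). That layout and the function name `classify_changes` are assumptions; check them against your own output.

```python
import csv
import io
from typing import Dict


def classify_changes(table_text: str) -> Dict[str, int]:
    """Count substitutions, insertions and deletions in a show-snps -T table."""
    counts = {"substitution": 0, "insertion": 0, "deletion": 0}
    in_body = False
    for row in csv.reader(io.StringIO(table_text), delimiter="\t"):
        if not in_body:
            # Skip header lines until the column-name line is seen.
            if row and row[0] == "[P1]":
                in_body = True
            continue
        if len(row) < 4:
            continue  # ignore blank or malformed lines
        ref_base, qry_base = row[1], row[2]
        if ref_base == ".":
            counts["insertion"] += 1    # base present only in the polished assembly
        elif qry_base == ".":
            counts["deletion"] += 1     # base present only in the original assembly
        else:
            counts["substitution"] += 1
    return counts
```

For example, saving the output with `show-snps -C -T -r nucmer.delta > changes.tsv` and calling `classify_changes(open("changes.tsv").read())` would give a per-type count (the file name is hypothetical).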

This process (Racon -> MUMmer -> SNP table) solves the problem I originally raised in this issue. So as far as I'm concerned, you can close this issue (or keep it open if you still want to implement some kind of variant table).

FACULTY POSITIONS AT IIIT-ALLAHABAD

Thu, 23 Jan 2014 06:19:34 -0600

OPENINGS OF FACULTY POSITIONS AT IIIT-ALLAHABAD

(Under Tenure-Track Model)

Open Advt. No IIITA/DIC/16012014

IIIT-Allahabad has several openings for faculty positions at the Assistant Professor level.

These are regular tenure-track faculty positions for 3-5 years in teaching and research. Regular faculty are expected to engage heavily in research and teaching. The eligibility criteria for regular faculty positions are similar to those at the IITs. For an Assistant Professor position, a candidate must have a PhD (in IT/Computer Science &/or Engineering/Electrical, Electronics &/or Communication Engineering/etc; for interdisciplinary areas the PhD may be in an appropriate field), plus three years' experience. However, for candidates with a PhD from a well-known University/Institute (e.g. IITs/IISc/TIFR/ISI in India or well-known research universities across the world) and a good research/academic record, the three-year experience requirement may be waived.

The pay scale for faculty is the same as in the IITs. Other benefits include an initiation research grant, travel support, a book grant, professional society membership, etc., and personal benefits such as medical/LTC and subsidized on-campus family housing with excellent modern infrastructural facilities.

Areas of Interest

IIIT-Allahabad aims to build strong research groups in important and emerging areas in CS/IT/ECE as well as in emerging interdisciplinary areas, and applications are invited in all these areas. Some of the areas of special interest, besides strengthening the existing research areas, are : Software Engineering, Theoretical Computer Science, Cyber Physical Systems, Robotics, Network science, Digital Media, Computational neuroscience, Machine learning, Healthcare informatics, Computational Biology, Communications networks (both at hardware and protocol levels), Circuits (including VLSI, analog, low power, etc), Energy systems and technologies, Biomedical electronics and systems, Computer Architecture, signal/image processing, Embedded and control systems.

Application Process

Interested candidates can apply by sending their detailed CV, with a list of publications clearly mentioning journal names and citation index, and three references, by email with the subject “Faculty positions at IIIT Allahabad” to faculty.applications@iiita.ac.in. Do not send applications to any other email address. Applications will be considered on a rolling basis, hence there is no deadline for applying.

Important Clarifications on Eligibility

A PhD in CS/IT (or other disciplines, as announced) is the minimum expected requirement for an Assistant Professor.

Advertisement: http://iiita.ac.in/pub/Faculty-Position-IIITA1.pdf

Referee: Genome assembly quality scores

Jit — Sun, 04 Nov 2018 16:44:30 -0600

Modern genome sequencing technologies provide a succinct measure of quality at each position in every read; however, all of this information is lost in the assembly process. Referee summarizes the quality information from the reads that map to a site in an assembled genome to calculate a quality score for each position in the genome assembly.

We accomplish this by first calculating genotype likelihoods for every site. For a given site in a diploid genome, there are 10 possible genotypes (AA, AC, AG, AT, CC, CG, CT, GG, GT, TT). Referee takes as input the genotype likelihoods calculated for all 10 genotypes given the called reference base at each position.
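
To make the ten-genotype idea concrete, here is a minimal Python sketch that enumerates the diploid genotypes and computes a log-likelihood for each from a pileup of (base, Phred quality) observations. It uses a common textbook error model (each allele sampled with probability 1/2, errors spread evenly over the three other bases), which is an assumption for illustration and not necessarily the exact calculation Referee performs. All function names are hypothetical.

```python
import math
from itertools import combinations_with_replacement
from typing import Dict, List, Tuple

BASES = "ACGT"
# The 10 unordered diploid genotypes: AA, AC, AG, AT, CC, CG, CT, GG, GT, TT
GENOTYPES = ["".join(g) for g in combinations_with_replacement(BASES, 2)]


def base_prob(observed: str, allele: str, err: float) -> float:
    """P(observed base | true allele) under a uniform substitution-error model."""
    return 1.0 - err if observed == allele else err / 3.0


def genotype_log_likelihoods(pileup: List[Tuple[str, int]]) -> Dict[str, float]:
    """Return log10 P(reads | genotype) for each of the 10 genotypes."""
    result = {}
    for g in GENOTYPES:
        ll = 0.0
        for base, q in pileup:
            err = 10 ** (-q / 10)  # Phred score -> error probability
            # A read comes from either allele with equal probability.
            p = 0.5 * base_prob(base, g[0], err) + 0.5 * base_prob(base, g[1], err)
            ll += math.log10(p)
        result[g] = ll
    return result
```

With a pileup of five high-quality A reads, the AA genotype receives the highest likelihood; an even mix of A and C reads favours AC.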

Referee is a program to calculate a quality score for every position in a genome assembly. This allows for easy filtering of low quality sites for any downstream analysis.

https://github.com/gwct/referee

Address of the bookmark: https://gwct.github.io/referee/#

Post-doctoral Research Assistant in Genetics

Thu, 05 Jun 2014 16:01:39 -0500

Camden, North London
£31.1K per annum inclusive of London Weighting

This is a fixed term post for 36 months.

We wish to recruit a highly motivated postdoctoral scientist to carry out a BBSRC-funded project in the laboratory of Dr. Denis Larkin. The project is focused on developing and applying new algorithms to study genome and chromosome evolution in birds, mammals and other vertebrate species using whole-genome sequences and existing algorithms. The post holder will use cutting-edge computational and laboratory approaches to generate chromosomal assemblies for sequenced genomes, and to study chromosomal structures and genome differences between bird and other vertebrate species in an attempt to identify species- and clade-specific genome signatures.

Applicants must have a Ph.D. and a track record of success, as indicated by first-author publications in international journals. They must possess excellent organisation skills and be capable of individual initiative and of interacting as part of a team. Applicants with extensive practical experience in bioinformatics or computer science, programming, visualization, handling of large data sets and high-performance computing are encouraged to apply. The post will involve collaboration with a wide range of academic partners within the UK, the EU and worldwide. In addition to leading their own project, the post holder will have opportunities to contribute to multiple international genome initiatives.

Experience in programming, bioinformatics and comparative genome analysis is essential. Applicants should have a minimum of a degree and preferably a higher degree in a relevant subject.

The Royal Veterinary College has the largest range of veterinary, para-veterinary and animal science undergraduate and postgraduate courses of any veterinary school in the world and is one of the largest veterinary schools in Europe.

Prospective applicants are encouraged to contact Dr. Denis Larkin, Comparative Biomedical Sciences Department on +442071211906 or email: dlarkin@rvc.ac.uk

We offer a generous reward package.

For further information and to apply on-line please visit our website: www.rvc.ac.uk
Job reference CBS-0025-14A

Closing date: 4 July 2014
Interviews are likely to be held in July 2014

We promote equality of opportunity and diversity within the workplace and welcome applications from all sections of the community.

jackalope: A swift, versatile phylogenomic and high-throughput sequencing simulator

Abhimanyu Singh — Fri, 26 Jul 2019 00:58:12 -0500

jackalope simply and efficiently simulates (i) variants from reference genomes and (ii) reads from both Illumina and Pacific Biosciences (PacBio) platforms. It can either read reference genomes from FASTA files or simulate new ones. Genomic variants can be simulated using summary statistics, phylogenies, Variant Call Format (VCF) files, and coalescent simulations, the latter of which can include selection, recombination, and demographic fluctuations. jackalope can simulate single, paired-end, or mate-pair Illumina reads, as well as PacBio reads. These simulations include sequencing errors, mapping qualities, multiplexing, and optical/PCR duplicates. All outputs can be written to standard file formats.

A swift, versatile phylogenomic and high-throughput sequencing simulator https://jackalope.lucasnell.com

Address of the bookmark: https://github.com/lucasnell/jackalope

Scientist Positions at Rajiv Gandhi Centre for Biotechnology

Thu, 06 Feb 2014 23:18:49 -0600

Rajiv Gandhi Centre for Biotechnology

An Autonomous National Institute under Government of India,
Ministry of Science & Technology
Department of Biotechnology

No: RGCB/ Advt./2014/1
January 24, 2014

Scientist Positions

Group Leader in Computational Biology/Bioinformatics
A highly motivated and innovative individual who will pursue basic research, solve biological problems with emphasis on computational and quantitative experimental methods and build active bridges to translational research. The scientist will also provide computational biology support to analyze complex data sets generated by RGCB scientists and collaborators.

Location: Thiruvananthapuram (Trivandrum)

The above positions will be at the E-II, F or equivalent levels. For senior applicants with an outstanding track record, an option of a contract career path for research excellence at Scientist G or H equivalent level can also be discussed. All positions will initially be for 5 years. Essential and desired qualifications as well as other relevant details for all the above positions are posted on the RGCB website (http://www.rgcb.res.in). The last date for receiving applications is March 14, 2014.

Sd/-
Director

Rajiv Gandhi Centre for Biotechnology
Thycaud, P.O., Poojappura,
Thiruvananthapuram, Kerala, India-695 014
Ph.: 91-471-2529400 (30 Lines), 2347975, 2348104, 2348753, 2345899
Fax: 91-471-2348096, 2346333

More at http://rgcb.res.in/jobs.html

Kevlar: Reference-free variant discovery in large eukaryotic genomes

Jit — Tue, 28 Jan 2020 03:21:53 -0600

Welcome to kevlar, software for predicting de novo genetic variants without mapping reads to a reference genome! kevlar's k-mer abundance based method calls single nucleotide variants (SNVs), multinucleotide variants (MNVs), insertion/deletion variants (indels), and structural variants (SVs) simultaneously with a single simple model.
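
The general idea behind a k-mer abundance comparison can be shown with a toy Python sketch: count k-mers in a case sample and in control samples, then keep k-mers that are abundant in the case but absent from every control. This is an illustration of the concept only, not kevlar's actual algorithm or API; the function names, thresholds and in-memory read lists are all assumptions.

```python
from collections import Counter
from typing import Iterable, List, Set


def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Count every k-mer occurring in a collection of reads."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def novel_kmers(case_reads: List[str], control_read_sets: List[List[str]],
                k: int = 4, min_case: int = 2, max_control: int = 0) -> Set[str]:
    """Return k-mers seen >= min_case times in the case and <= max_control in all controls."""
    case = count_kmers(case_reads, k)
    controls = [count_kmers(reads, k) for reads in control_read_sets]
    return {
        kmer for kmer, n in case.items()
        if n >= min_case and all(c[kmer] <= max_control for c in controls)
    }
```

In this toy setting a single-base difference between case and control reads yields up to k novel k-mers, all overlapping the variant position, which is why such k-mers can act as signatures of candidate variants without a reference alignment.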

More at https://kevlar.readthedocs.io/en/latest/

https://www.cell.com/iscience/pdf/S2589-0042(19)30259-7.pdf

Address of the bookmark: https://github.com/kevlar-dev/kevlar