BOL: Related items

gapFinisher: A reliable gap filling pipeline for SSPACE-LongRead scaffolder output

Rahul Nayak — Fri, 24 Jan 2020 06:04:40 -0600

gapFinisher is based on the controlled use of a previously published gap-filling tool, FGAP, and works on all standard Linux/UNIX command lines. The authors compare the performance of gapFinisher against two other published gap-filling tools, PBJelly and GMcloser.

gapFinisher can fill gaps in draft genomes quickly and reliably.

Address of the bookmark: https://github.com/kammoji/gapFinisher

Parliament2: Runs a combination of tools to generate structural variant calls on whole-genome sequencing data

Jit — Thu, 28 May 2020 21:57:03 -0500

Parliament2 identifies structural variants in a given sample relative to a reference genome. These structural variants cover large events that are called as deletions of a region, insertions of a sequence into a region, duplications of a region, inversions of a region, or translocations between two regions in the genome.

Parliament2 runs a combination of tools to generate structural variant calls on whole-genome sequencing data. It can run the following callers: Breakdancer, Breakseq2, CNVnator, Delly2, Manta, and Lumpy. Because of synergies in how the programs use computational resources, these are all run in parallel. Parliament2 will produce the outputs of each of the tools for subsequent investigation.

Address of the bookmark: https://github.com/dnanexus/parliament2

Project Associate-I | Project Associate-II | Senior Project Associate @ IGIB

Thu, 05 Aug 2021 16:11:32 -0500

Experience in Next Generation Sequencing (NGS) applications and interest in Genomics / Clinical / Translational Applications, OR good computational programming skills and a deep interest in working at the interface of genomics and clinical applications.

Project Scientist-I
Experimental / computational analysis experience in high-throughput genomics / clinical applications.

Project Manager
Experience in handling large biological projects involving high-throughput genomics / clinical applications.

Scientific Administrative Assistant
Lab Work.

More at https://vinodscaria.genomes.in/positionsopen

The "Ifs" and "Buts" of NGS Quality Control and Trimming

BioStar — Thu, 02 Jan 2025 20:11:07 -0600

Next-Generation Sequencing (NGS) has revolutionized biological research, providing vast amounts of data for a wide range of applications. However, the reliability of NGS analyses heavily depends on the quality of raw sequencing data. Quality control (QC) and trimming are critical preprocessing steps that can make or break your downstream analyses. In this blog, we explore the "ifs" (why you should perform QC and trimming) and the "buts" (challenges or considerations) of this vital step in NGS workflows.

The "Ifs" of NGS QC and Trimming

Ensures Data Integrity
If you want to minimize errors in downstream analyses, QC and trimming remove low-quality reads and bases, ensuring high-confidence data. This step is essential for reliable variant calling, assembly, and other applications.
Removes Contaminants
If adapter sequences or contaminants are present in the raw reads, trimming can eliminate them. This prevents issues like misalignment or incorrect biological interpretations, ensuring cleaner data for analysis.
Improves Mapping and Assembly
If your goal is better alignment to a reference genome or improved de novo assembly, trimming low-quality bases and adapters is critical. High-quality reads map more efficiently and generate more accurate assemblies.
Reduces Computational Load
If you want to save computational resources, trimming reduces the dataset size, which speeds up processing and analysis. Clean datasets mean less computational time spent on processing low-quality data.
Prepares for Standardized Analyses
If your project involves multiple datasets, QC and trimming ensure uniformity across them. This standardization makes comparisons valid and reproducible, particularly in large collaborative studies.

The "Buts" of NGS QC and Trimming

Risk of Over-Trimming
But excessive trimming can lead to the loss of informative sequences, reducing read depth and potentially discarding biologically relevant data. This is especially critical in studies with limited sequencing depth.
Bias Introduction
But trimming algorithms might introduce biases, especially if they inadvertently remove sequences with specific biological patterns. This can skew results and compromise biological insights.
Loss of Context in Paired-End Reads
But trimming one read in a pair more than the other can lead to loss of pairing information. This complicates downstream analyses that rely on paired-end data, such as structural variant detection.
Time and Resource Intensive
But running QC and trimming for large datasets can be computationally expensive and time-consuming. As sequencing depth increases, preprocessing becomes a bottleneck in the analysis pipeline.
Variable Standards
But the criteria for trimming (e.g., quality threshold, minimum read length) can vary between tools and datasets. This variability may affect reproducibility and comparability of results across studies.
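The pairing concern above can be illustrated with a small sketch. It assumes reads are already trimmed and held in memory as (sequence, quality) tuples; the function name `filter_pairs` and the 36-base default are hypothetical choices for illustration, not the behavior of any particular trimming tool. The idea is simply that a pair is kept only when both mates survive the length filter, and a lone survivor is set aside as an orphan instead of being silently mixed into the paired output.

```python
from typing import List, Tuple

Read = Tuple[str, str]  # (sequence, quality string)


def filter_pairs(
    pairs: List[Tuple[Read, Read]], min_len: int = 36
) -> Tuple[List[Tuple[Read, Read]], List[Read]]:
    """Keep pairs where both mates are long enough; collect lone survivors.

    min_len is an illustrative threshold, not a recommendation.
    """
    kept: List[Tuple[Read, Read]] = []
    orphans: List[Read] = []
    for mate1, mate2 in pairs:
        ok1 = len(mate1[0]) >= min_len
        ok2 = len(mate2[0]) >= min_len
        if ok1 and ok2:
            kept.append((mate1, mate2))  # pairing information preserved
        elif ok1:
            orphans.append(mate1)        # mate 2 was trimmed too short
        elif ok2:
            orphans.append(mate2)        # mate 1 was trimmed too short
        # if neither mate passes, the whole pair is discarded
    return kept, orphans
```

Counting the orphans after a trimming run is a quick way to see how much pairing information a given set of parameters costs.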

Balancing the "Ifs" and "Buts"

To maximize the benefits of QC and trimming while mitigating the challenges, consider the following best practices:

Use QC Tools Wisely: Start with tools like FastQC to identify quality issues in your raw data. Visualizing quality metrics helps tailor your trimming parameters.
Choose Reliable Trimming Tools: Tools like Trimmomatic, Cutadapt, and BBDuk offer adaptive and customizable trimming options. Select one that aligns with your dataset and project goals.
Set Reasonable Parameters: Avoid over-trimming by setting quality thresholds and minimum read lengths that balance data retention and quality improvement.
Test Downstream Effects: Validate the impact of QC and trimming on downstream analyses, such as alignment efficiency, variant calling accuracy, or assembly quality.
Document Your Workflow: Maintain detailed records of the parameters and tools used for QC and trimming. This ensures reproducibility and enables better troubleshooting.
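To make the "quality threshold" and "minimum read length" parameters concrete, here is a minimal sketch of sliding-window 3' trimming followed by a length check. It assumes Phred+33 quality encoding; the function names, the window size of 4, the mean-quality cutoff of 20, and the 36-base minimum are illustrative assumptions. It is a simplified version of the general idea and should not be read as the exact algorithm of Trimmomatic, Cutadapt, or BBDuk.

```python
from typing import List, Tuple


def phred_scores(qual: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string (Phred+33 assumed) into integer scores."""
    return [ord(ch) - offset for ch in qual]


def sliding_window_trim(
    seq: str, qual: str, window: int = 4, min_mean_q: float = 20.0
) -> Tuple[str, str]:
    """Cut the read at the start of the first window whose mean quality
    drops below min_mean_q; return the read unchanged if no window fails."""
    scores = phred_scores(qual)
    for start in range(len(scores) - window + 1):
        window_scores = scores[start:start + window]
        if sum(window_scores) / window < min_mean_q:
            return seq[:start], qual[:start]
    return seq, qual


def passes_min_length(seq: str, min_len: int = 36) -> bool:
    """Length filter applied after trimming."""
    return len(seq) >= min_len


# Example: six high-quality bases ('I' = Q40) followed by four poor ones ('#' = Q2)
trimmed_seq, trimmed_qual = sliding_window_trim("ACGTACGTAC", "IIIIII####")
print(trimmed_seq, trimmed_qual)  # ACGTA IIIII
```

Note that the window-based cut removes one good base along with the bad tail in this example. That is the kind of trade-off between data retention and quality that the parameters control, and a reason to check downstream effects before settling on values.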

Conclusion

NGS quality control and trimming are essential steps to ensure reliable and accurate data for analysis. While the "ifs" highlight the clear benefits of these steps, the "buts" remind us of the potential pitfalls. By adopting best practices and carefully balancing these considerations, you can optimize your preprocessing workflow and unlock the full potential of your sequencing data.

VariantBam: Filtering and profiling of next-generational sequencing data using region-specific rules

Rahul Nayak — Thu, 04 Oct 2018 16:30:44 -0500

VariantBam is a tool to extract/count specific sets of sequencing reads from next-generation sequencing files. To save money, disk space and I/O, one may not want to store an entire BAM on disk. In many cases, it would be more efficient to store only those read-pairs or reads that intersect some region around the variant locations. Alternatively, if your scientific question is focused on only one aspect of the data (e.g. breakpoints), many reads can be removed without losing the information relevant to the problem.
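The region-based idea described here can be sketched in a few lines of Python. This is not VariantBam's interface or rule syntax (VariantBam is a separate command-line tool); the `Read` type, the function names, and the 100-base padding are hypothetical, and reads are represented as simple in-memory intervals, not BAM records. The sketch only shows the underlying logic: pad each variant position into a region, then keep reads that overlap any region.

```python
from typing import List, NamedTuple, Tuple


class Read(NamedTuple):
    name: str
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


Region = Tuple[str, int, int]  # (chrom, start inclusive, end exclusive)


def padded_regions(variants: List[Tuple[str, int]], pad: int) -> List[Region]:
    """Turn (chrom, position) variants into regions of +/- pad bases."""
    return [(chrom, max(0, pos - pad), pos + pad + 1) for chrom, pos in variants]


def overlaps(read: Read, region: Region) -> bool:
    """Half-open interval overlap test on the same chromosome."""
    chrom, start, end = region
    return read.chrom == chrom and read.start < end and read.end > start


def keep_reads_near_variants(
    reads: List[Read], variants: List[Tuple[str, int]], pad: int = 100
) -> List[Read]:
    """Return only the reads that intersect a padded variant region."""
    regions = padded_regions(variants, pad)
    return [r for r in reads if any(overlaps(r, reg) for reg in regions)]
```

For real data, the linear scan over regions would normally be replaced by an interval index, and the reads would come from a BAM parser.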

Address of the bookmark: https://github.com/broadinstitute/VariantBam

jackalope: A swift, versatile phylogenomic and high-throughput sequencing simulator

Abhimanyu Singh — Fri, 26 Jul 2019 00:58:12 -0500

jackalope simply and efficiently simulates (i) variants from reference genomes and (ii) reads from both Illumina and Pacific Biosciences (PacBio) platforms. It can either read reference genomes from FASTA files or simulate new ones. Genomic variants can be simulated using summary statistics, phylogenies, Variant Call Format (VCF) files, and coalescent simulations, the latter of which can include selection, recombination, and demographic fluctuations. jackalope can simulate single, paired-end, or mate-pair Illumina reads, as well as reads from Pacific Biosciences. These simulations include sequencing errors, mapping qualities, multiplexing, and optical/PCR duplicates. All outputs can be written to standard file formats.
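As a rough illustration of what "simulating reads with sequencing errors" means, here is a toy Python sketch. It is not jackalope's API (jackalope is an R package) and it ignores platform-specific error profiles, qualities, paired ends, and duplicates; the function name and parameters are hypothetical. It samples fixed-length single-end reads uniformly from a reference string and introduces substitution errors at a uniform per-base rate.

```python
import random
from typing import List, Tuple


def simulate_reads(
    reference: str,
    n_reads: int,
    read_len: int,
    error_rate: float,
    seed: int = 0,
) -> List[Tuple[int, str]]:
    """Return (start position, read sequence) tuples sampled from reference.

    Each base is replaced by a different base with probability error_rate.
    This uniform substitution model is a deliberate simplification.
    """
    if read_len > len(reference):
        raise ValueError("read_len must not exceed the reference length")
    rng = random.Random(seed)  # seeded for reproducible simulations
    alphabet = "ACGT"
    reads: List[Tuple[int, str]] = []
    for _ in range(n_reads):
        start = rng.randrange(len(reference) - read_len + 1)
        fragment = reference[start:start + read_len]
        read = "".join(
            rng.choice([b for b in alphabet if b != base])
            if rng.random() < error_rate
            else base
            for base in fragment
        )
        reads.append((start, read))
    return reads
```

Keeping the true start position alongside each read makes it easy to check how well a mapper or variant caller recovers the known truth, which is the usual reason to simulate in the first place.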

More at https://jackalope.lucasnell.com

Address of the bookmark: https://github.com/lucasnell/jackalope

Genomics public data links

Jit — Thu, 13 Feb 2020 00:20:00 -0600

List of publicly available databases on a Google server.

More at https://software.broadinstitute.org/gatk/download/bundle

ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606/VCF/GATK/

ftp://ftp.broadinstitute.org/bundle/hg38/hg38bundle/

Address of the bookmark: https://console.cloud.google.com/storage/browser/genomics-public-data/resources/broad/hg38/v0?pli=1

Online resources on must-read papers in evolutionary biology, for a literature club

Shruti Paniwala — Tue, 28 Jun 2022 07:29:08 -0500

1. *Nick Barton:*

- The textbook "Evolution" by Nick Barton, with resources for
  exploring the literature: Barton, N. H., Briggs, D. E. G., Eisen, J.
  A., Goldstein, D. B., & Patel, N. H. (2007). Evolution. Cold Spring
  Harbor Laboratory Press.

- Papers from a course named "Classics in Evolutionary Biology":

Evolutionary Synthesis
1. Haldane, J. B. S. 1932. The causes of evolution. Longmans. New York.
   (esp. Ch. IV).
2. Fisher, R. A. 1930. The genetical theory of natural selection. Oxford
   University Press, Oxford. Selected Sections - Fundamental Theorem.

Genetic Variation
1a. Lewontin, R. C., and J. L. Hubby. 1966. A molecular approach to
the study of genic heterozygosity in natural populations. II. Amount
of variation and degree of heterozygosity in natural populations of
Drosophila pseudoobscura. Genetics. 54:595-609.

1b. Sachidanandam et al. 2001. A map of human genome sequence variation
containing 1.42 million single nucleotide polymorphisms. Nature 409: 928-933.

2. Wright S., Dobzhansky T., Hovanitz W. 1942 Genetics of natural
populations VII The allelism of lethals in the third chromosome of
Drosophila pseudoobscura. Genetics 27: 363-394.

Recombination and evolution
1. Hill, W. G., and A. Robertson. 1966. The effect of linkage on limits
to artificial selection. Genet. Res. 8:269-294.

2. Maynard Smith and Haigh. 1974. The hitch-hiking effect of a favourable
gene. Genet. Res. 23: 23-35.

Understanding sequence variation
1. Begun D. J., Aquadro C. F., 1992 Levels of naturally occurring DNA
polymorphism correlate with recombination rate in Drosophila melanogaster.
Nature 356: 519-520.

2. Green R. E., Reich D., Pääbo S., 2010 A draft sequence of the
Neandertal genome. Science 328: 710-722.

Quantitative Genetics:  variation in complex traits
1. Galton F., 1877 Typical laws of heredity. Nature 15: 492-495,
512-514, 532-533.

2. Turelli M., 1984 Heritable genetic variation via
mutation-selection balance: Lerch's Zeta meets the abdominal
bristle. Theor. Popul. Biol. 25: 138-193.

Quantitative Genetics:  finding the genes
1. Shrimpton A. E., Robertson A., 1988 The Isolation of polygenic factors
controlling bristle score in Drosophila melanogaster II Distribution of
third chromosome bristle effects within chromosome sections. Genetics
118: 445-459.

2. Boyle E. A., Li Y. I., Pritchard J. K., 2017 An expanded view of
complex traits: from polygenic to omnigenic. Cell 169: 1177-1186.

Neutral Evolution
1. Kimura, M. 1968. Evolutionary rate at the molecular level. Nature.
217:624-626.

2a. Kern A. D., Hahn M. W., 2018 The Neutral Theory in Light of Natural
Selection. Molecular Biology and Evolution 35: 1366-1371.

2b. Jensen J. D., Payseur B. A., Stephan W., Aquadro C. F., Lynch M.,
Charlesworth D., Charlesworth B., 2018 The importance of the Neutral Theory
in 1968 and 50 years on: a response to Kern and Hahn 2018. Evolution 112:
2109-4.

2c. Ellegren & Galtier. 2016. Determinants of genetic diversity. Nature
Reviews Genetics.

Mutation and Genetic Variability
1. Luria, S. E., and M. Delbrück. 1943. Mutations of Bacteria from Virus
Sensitivity to Virus Resistance. Genetics. 28(6):491-511.

2. Hill, W G. 1982. "Rates of Change in Quantitative Traits From Fixation
of New Mutations." Proceedings of the National Academy of Sciences (U.S.A.)
79: 142-45.

Testing for selection
1. McDonald & Kreitman. 1991. Adaptive protein evolution at the Adh locus
in Drosophila. Nature.

2. Begun, et al. Mol. Biol. Evol. 16, 1816-1819 (1999).

3. Siddiq et al. 2016. Experimental test and refutation of a classic case
of molecular adaptation in Drosophila melanogaster.  Nature Ecology &
Evolution.

The shifting balance
1. Wright, S. 1932. The roles of mutation, inbreeding, crossbreeding and
selection in evolution. Proceedings of the VI International Congress of
Genetics: 1. pp 356-366.

2. Coyne, J.A., N.H. Barton, and M. Turelli. 1997. A critique of Wright's
shifting balance theory of evolution.  Evolution 51: 643-671.

3. Barton. 2016. Sewall Wright on Evolution in Mendelian Populations and
the "Shifting Balance". Genetics.

Evolution of Sex
1. Muller, H.J. 1964. The relation of recombination to mutational advance.
Mutation Res. 1(1):2-9.

2. McDonald et al. 2016. Sex speeds adaptation by altering the dynamics of
molecular evolution. Nature.

Kin Selection, Cooperation, and Conflict
1. Hamilton, W. D. 1964. The genetical evolution of social behaviour I.
Journal of Theoretical Biology. 7:1-52.

2. Trivers, R. L. 1974 Parent-offspring conflict. American Zoologist.
14(1):249-264.

Sexual Selection
1. Zahavi, A. 1975. Mate selection - a selection of a handicap. J. Theor.
Biol. 53:205-214.

2. Kirkpatrick, M., and Ryan, M.J. 1991. The evolution of mating
preferences and the paradox of the lek. Nature. 350:33-38.

Fitness Landscapes
1. Dean, A. 1995. A Molecular Investigation of Genotype by Environment
Interactions. Genetics. 139:19-33.

2. Costanzo et al. 2010. The Genetic Landscape of a Cell. Science.

Speciation
1. Coyne, J. A., and H. A. Orr. 1989. Patterns of speciation in Drosophila.
Evolution. 43:362-381.

2. Corbett-Detig et al. 2013. Genetic incompatibilities are widespread
within species. Nature.

2. *Marcos Antezana:*

Valen, L. v. 1975. Energy and Evolution. University of Chicago, Department
of Biology.

3. *Remco Folkertsma:*

1. The work by Hopi Hoekstra on local adaptation and oldfield mice

2. Poelstra, J. W., Vijay, N., Bossu, C. M., Lantz, H., Ryll, B., Müller,
I., ... & Wolf, J. B. (2014). The genomic landscape underlying phenotypic
integrity in the face of gene flow in crows. Science, 344(6190), 1410-1414.

4. *Joshka Kaufmann and Leslie Turner:*

They offer us a link to 'papers every evolutionary biologist should read';
the papers were collected by Leslie Turner.
https://static1.squarespace.com/static/53e8cb7ce4b02c4bc3aeeee4/t/5ab8fcb670a6ad55c67fcdf4/1522072758665/EvoBioClassicsRefList.pdf

5. *Sarah Stockwell:*

Matt Ridley collected classic papers in evolutionary biology and reprinted
parts of them in his book Evolution (see Matt Ridley, Evolution,
Oxford University Press, 2nd edition, 2004).

NCBI Magic-BLAST

Jit — Tue, 14 Aug 2018 18:11:11 -0500

Magic-BLAST is a tool for mapping large next-generation RNA or DNA sequencing runs against a whole genome or transcriptome. Each alignment optimizes a composite score, taking into account simultaneously the two reads of a pair, and in case of RNA-seq, locating the candidate introns and adding up the score of all exons. This is very different from other versions of BLAST, where each exon is scored as a separate hit and read-pairing is ignored.

Magic-BLAST incorporates within the NCBI BLAST code framework ideas developed in the NCBI Magic pipeline, in particular hit extensions by local walk and jump (http://www.ncbi.nlm.nih.gov/pubmed/26109056), and recursive clipping of mismatches near the edges of the reads, which avoids accumulating artefactual mismatches near splice sites and is needed to distinguish short indels from substitutions near the edges.

Address of the bookmark: https://ncbi.github.io/magicblast/

LncPipe: A Nextflow-based pipeline for comprehensive analyses of long non-coding RNAs from RNA-seq datasets

LEGE — Fri, 17 Sep 2021 01:57:02 -0500

The pipeline was developed on the popular workflow framework Nextflow and is composed of four core procedures: read alignment, assembly, identification and quantification. It contains various unique features such as a well-designed lncRNA annotation strategy, optimized calculating efficiency, diversified classification and an interactive analysis report. LncPipe gives users additional control by allowing them to interrupt the pipeline, reset parameters from the command line, modify the main script directly and resume analysis from a previous checkpoint.

Ref https://www.lncrnablog.com/lncpipe-a-nextflow-based-pipeline-for-identification-and-analysis-of-long-non-coding-rnas-from-rna-seq-data/

Address of the bookmark: https://github.com/likelet/LncPipe