BOL: Related items

MOSAIK: A Hash-Based Algorithm for Accurate Next-Generation Sequencing Short-Read Mapping

Neel — Fri, 20 May 2016 18:53:49 -0500

MOSAIK is a stable, sensitive and open-source program for mapping second and third-generation sequencing reads to a reference genome. Uniquely among current mapping tools, MOSAIK can align reads generated by all the major sequencing technologies, including Illumina, Applied Biosystems SOLiD, Roche 454, Ion Torrent and Pacific BioSciences SMRT. Indeed, MOSAIK was the only aligner to provide consistent mappings for all the generated data (sequencing technologies, low-coverage and exome) in the 1000 Genomes Project. To provide highly accurate alignments, MOSAIK employs a hash clustering strategy coupled with the Smith-Waterman algorithm. This method is well-suited to capture mismatches as well as short insertions and deletions. To support the growing interest in larger structural variant (SV) discovery, MOSAIK provides explicit support for handling known-sequence SVs, e.g. mobile element insertions (MEIs) as well as generating outputs tailored to aid in SV discovery.
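The Smith-Waterman step mentioned above can be illustrated with a minimal local-alignment scorer. This is only a sketch of the textbook algorithm with a linear gap penalty, not MOSAIK's implementation; the function name and the scoring values (match 2, mismatch -1, gap -2) are assumptions chosen for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between strings a and b."""
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Diagonal move: align a[i-1] with b[j-1].
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Scores are floored at 0 so an alignment can start anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Hypothetical read and reference fragment.
read = "GATTACA"
reference = "CCGATTACACC"
score = smith_waterman_score(read, reference)
print(score)
```

Because scores never drop below zero, a read that matches only part of the reference still receives a high score for the matching region, which is why local alignment tolerates mismatches and short indels.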

Address of the bookmark: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0090581

MetaSim: A Sequencing Simulator for Genomics and Metagenomics

Jit — Mon, 04 Dec 2017 07:18:20 -0600

Our software can be used to generate collections of synthetic reads that reflect the diverse taxonomical composition of typical metagenome data sets. Based on a database of given genomes, the program allows the user to design a metagenome by specifying the number of genomes present at different levels of the NCBI taxonomy, and then to collect reads from the metagenome using a simulation of a number of different sequencing technologies. A population sampler optionally produces evolved sequences based on source genomes and a given evolutionary tree.

Address of the bookmark: http://ab.inf.uni-tuebingen.de/software/metasim/

Bioinformatics jobs at NIBMG

Wed, 16 May 2018 02:57:15 -0500

NIBMG is looking for bright and motivated people to join its large projects in cutting-edge biomedical genomics research.

http://www.nibmg.ac.in/academic/SyMeC-ICGC/SyMeC%20&%20ICGC_May%202018.pdf

http://www.nibmg.ac.in/academic/plp/15_05_2018/AdvertisementMay2018.pdf

The Clark Lab

Fri, 07 Feb 2020 13:57:24 -0600

We study the process of adaptive evolution, during which species adopt novel traits to overcome challenges. We retrace the evolutionary histories of genomic elements to determine the changes underlying adaptation and to discover previously unknown genetic networks. These discoveries have already led to advances in human health, species conservation, and molecular biology.

More at http://clark.genetics.utah.edu/

The new coronavirus variant has 23 mutations in all, which is unusually high

Shruti Paniwala — Wed, 23 Dec 2020 03:50:50 -0600

The new SARS-CoV-2 variant, B.1.1.7, was first seen in the third week of September in Kent and Greater London. According to the COVID-19 Genomics UK Consortium (COG-UK Consortium), which analysed the genome data of the virus and identified the variant, it has been spreading "rapidly" over the last four weeks and has now been detected in other locations in the UK, suggesting further spread of the variant in the region.

According to a preliminary report posted on December 19 by the COG-UK Consortium scientists, as of December 15, 1,623 variant genomes have been sequenced. In a December 21 tweet, COG-UK Consortium said that it added 2,963 more genome sequences of SARS-CoV-2, of which 942 (32%) belong to the new variant. The Consortium intends to sequence 20,000 more SARS-CoV-2 genomes in the next two weeks to further ascertain the spread of the variant.

There is no clear proof, at least not yet, that the variant causes more severe disease. But there is reason to take the possibility seriously. Another coronavirus lineage in South Africa has acquired one specific mutation that is also present in B.1.1.7. This variant is spreading rapidly across South Africa's coastal regions. In preliminary research, doctors have observed that individuals infected with this variant carry a higher viral load, that is, a higher concentration of the virus in their upper respiratory tract. In many viral diseases, this is associated with more severe symptoms.

Nigerian Bioinformatics and Genomics Network (NBGN)

Tue, 31 Aug 2021 08:29:40 -0500

This is to announce the second official conference of the Nigerian Bioinformatics and Genomics Network (NBGN), to be held October 11-13, 2021 at Landmark University, Omu-Aran, Kwara State, and on Zoom (conference link to be announced soon).

#NBGN21

www.nbgn21conference.com

The Sheppard Lab

Fri, 09 Aug 2024 02:48:34 -0500

Ineos Oxford Institute of Antimicrobial Research – Department of Biology – University of Oxford

Our research centres on the use of genetics/genomics and phenotypic studies to address complex questions in the ecology, epidemiology and evolution of microbes. Our most recent interest focuses upon comparative genome analysis to describe the core and flexible genome of pathogenic bacteria (Campylobacter, Acinetobacter, Escherichia coli, Helicobacter, Staphylococcus and Streptococcus suis) and how this is related to population genetic structuring, the maintenance of species, and the evolution of host/niche adaptation and virulence.

More at https://sheppardlab.com/research/

Understanding RNA-Seq Normalization Methods: TPM vs. FPKM vs. CPM

Neel — Wed, 11 Dec 2024 00:59:15 -0600

RNA sequencing (RNA-Seq) is a powerful technology used to study transcriptomes, providing insights into gene expression levels. However, raw RNA-Seq data requires normalization to account for sequencing depth and gene length, enabling accurate comparisons between genes and samples. Among the most widely used normalization methods are TPM (Transcripts Per Million), FPKM (Fragments Per Kilobase Million), and CPM (Counts Per Million). Each method has its own principles and applications, which we’ll explore in this post.

Why Normalize RNA-Seq Data?

Normalization is a crucial step in RNA-Seq analysis for the following reasons:

Sequencing depth: Different RNA-Seq experiments produce varying numbers of reads, making direct comparisons between samples misleading.
Gene length: Longer genes inherently generate more reads, irrespective of their actual expression level.
Bias reduction: Normalization mitigates technical biases, enabling meaningful biological interpretation.

TPM (Transcripts Per Million)

TPM measures the proportion of reads mapped to a transcript, normalized by transcript length and sequencing depth. It is calculated as:
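In the standard definition, for transcript i with read count c_i and length L_i in kilobases:

TPM_i = (c_i / L_i) / (sum over all transcripts j of c_j / L_j) × 1,000,000

A minimal Python sketch of this calculation follows. The function name and the example counts and lengths are hypothetical, chosen only for illustration.

```python
def tpm(counts, lengths_bp):
    """Return TPM values given read counts and transcript lengths in base pairs."""
    # Step 1: reads per kilobase (RPK) corrects for transcript length.
    rpk = [c / (length / 1000) for c, length in zip(counts, lengths_bp)]
    total_rpk = sum(rpk)
    if total_rpk == 0:
        raise ValueError("all counts are zero; TPM is undefined")
    # Step 2: rescale so the values sum to one million within the sample.
    return [r / total_rpk * 1e6 for r in rpk]


# Hypothetical sample: three transcripts with counts proportional to length.
tpm_counts = [100, 200, 300]
tpm_lengths_bp = [1000, 2000, 3000]
tpm_values = tpm(tpm_counts, tpm_lengths_bp)
print(tpm_values)
```

In this example the longer transcripts have more reads only because they are longer, so all three receive the same TPM.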

Key Features:

Proportionality: TPM values sum to 1,000,000 across all transcripts in a sample, making it easier to compare between samples.
Intuitive interpretation: TPM values directly represent the abundance of transcripts in a sample.
Preferred for comparisons: TPM facilitates between-sample comparisons better than FPKM.

FPKM (Fragments Per Kilobase Million)

FPKM normalizes read counts by transcript length and sequencing depth, but without enforcing proportionality like TPM. It is defined as:
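In the standard definition, for transcript i with fragment count c_i, length L_i in base pairs, and N total mapped fragments in the sample:

FPKM_i = c_i × 10^9 / (N × L_i)

A minimal Python sketch follows. The function name and example data are hypothetical, and N is assumed here to be the sum of the listed counts.

```python
def fpkm(counts, lengths_bp):
    """Return FPKM values given fragment counts and transcript lengths in base pairs."""
    total_fragments = sum(counts)  # assumption: N is the sum of the listed counts
    if total_fragments == 0:
        raise ValueError("all counts are zero; FPKM is undefined")
    # 1e9 combines the per-kilobase (1e3) and per-million (1e6) scaling factors.
    return [c * 1e9 / (total_fragments * length)
            for c, length in zip(counts, lengths_bp)]


# Hypothetical sample: three transcripts with counts proportional to length.
fpkm_counts = [100, 200, 300]
fpkm_lengths_bp = [1000, 2000, 3000]
fpkm_values = fpkm(fpkm_counts, fpkm_lengths_bp)
print(fpkm_values, sum(fpkm_values))
```

The printed sum depends on the length distribution of the sample rather than being fixed at one million, which is the lack of proportionality described in this section.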

Key Features:

Historical significance: FPKM was one of the first normalization methods used for RNA-Seq.
Single-end vs. paired-end: For single-end sequencing, FPKM is equivalent to RPKM (Reads Per Kilobase Million); in paired-end sequencing, the two reads of a pair are counted as one fragment.
Limited utility: FPKM values are not as robust as TPM for cross-sample comparisons due to lack of proportionality.

CPM (Counts Per Million)

CPM normalizes raw read counts by sequencing depth, without considering gene length. It is expressed as:
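In the standard definition, for gene i with read count c_i and N total mapped reads in the sample:

CPM_i = c_i / N × 1,000,000

A minimal Python sketch follows. The function name and example counts are hypothetical, and N is assumed here to be the sum of the listed counts.

```python
def cpm(counts):
    """Return counts per million for a list of raw read counts."""
    total_reads = sum(counts)  # assumption: N is the sum of the listed counts
    if total_reads == 0:
        raise ValueError("all counts are zero; CPM is undefined")
    # Only sequencing depth is corrected; gene length plays no role.
    return [c / total_reads * 1e6 for c in counts]


# Hypothetical sample with three genes.
cpm_counts = [100, 200, 300]
cpm_values = cpm(cpm_counts)
print(cpm_values)
```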

Key Features:

Simplicity: CPM is straightforward and computationally less intensive.
Application: Suitable for analyses that do not depend on gene length, such as comparing the same gene across samples in differential expression analysis.
Gene length agnostic: CPM does not correct for gene length, making it unsuitable for comparing expression levels between different genes within a sample.

When to Use Each Method

TPM: Best for comparing expression levels between samples, especially when transcript length and sequencing depth vary.
FPKM: Useful for historical consistency but generally replaced by TPM.
CPM: Ideal for differential expression analysis when gene length normalization is unnecessary.

Conclusion

Choosing the right normalization method depends on the specific objectives of your RNA-Seq analysis. TPM’s proportionality and robustness make it the preferred choice for most applications, while CPM serves well for differential expression studies. Although FPKM paved the way for RNA-Seq normalization, it has largely been supplanted by TPM in modern workflows. Understanding these methods and their nuances ensures accurate and meaningful interpretations of RNA-Seq data.

Compressive Genomics

Rahul Agarwal — Sun, 11 Aug 2013 11:13:58 -0500

The key to finding a solution is to notice that most genomic sequences differ by very little. It may well be that the number of complete genome sequences being stored is increasing rapidly, but the actual amount of new data is very small. In other words, a single DNA sequence isn't particularly compressible, but a set of sequences shares so much in common that the redundancy can be used to store them in a much smaller storage space. (Source: e-article by Alex Armstrong)
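The redundancy argument can be illustrated with a toy experiment. The sketch below uses general-purpose zlib compression on synthetic sequences. It is not the compressive genomics method described in the linked articles, and the sequence length, number of copies, and mutation count are arbitrary assumptions.

```python
import random
import zlib

random.seed(0)

# A hypothetical 2,000-base "genome" of random nucleotides.
base = "".join(random.choice("ACGT") for _ in range(2000))


def mutate(seq, n_changes):
    """Return a copy of seq with n_changes positions redrawn at random."""
    bases = list(seq)
    for pos in random.sample(range(len(bases)), n_changes):
        bases[pos] = random.choice("ACGT")
    return "".join(bases)


# Eight closely related sequences: the original plus seven near-copies.
genomes = [base] + [mutate(base, 5) for _ in range(7)]

# Compress each sequence on its own, then all of them together.
separate_size = sum(len(zlib.compress(g.encode(), 9)) for g in genomes)
joined_size = len(zlib.compress("".join(genomes).encode(), 9))
print(separate_size, joined_size)
```

Compressed together, the near-copies can be encoded largely as references to the first sequence, so the joined size is smaller than the sum of the separately compressed sizes.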

http://www.i-programmer.info/news/181-algorithms/4537-a-new-dna-sequence-search-compressive-genomics.html

http://en.wikipedia.org/wiki/Compression_of_Genomic_Re-Sequencing_Data

http://www.nature.com/nbt/journal/v30/n7/full/nbt.2241.html

http://bioinformatics.oxfordjournals.org/content/29/13/i283.full

http://groups.csail.mit.edu/cb/cast/

Bioinformatics course and lectures

Rahul Agarwal — Tue, 03 Sep 2013 16:41:02 -0500

http://openwetware.org/wiki/User:Jarle_Pahr/Bioinformatics

Address of the bookmark: http://gtpb.igc.gulbenkian.pt/bicourses/index.html