BOL: Related items

RNA-Seq Data Pathway and Gene-set Analysis Workflows

Jit — Fri, 25 Oct 2013 08:00:48 -0500

The gage package (2.12.0) now includes a new tutorial, “RNA-Seq Data Pathway and Gene-set Analysis Workflows”. It describes the GAGE (Luo et al., 2009) / Pathview (Luo and Brouwer, 2013) workflows for pathway analysis and gene-set analysis of RNA-Seq data.

The tutorial first covers a full workflow, from preparation, read counting, and data preprocessing through gene-set testing to pathway visualization, in about 40 lines of code. The same workflow can also be used for GO analysis or other types of gene-set analysis. It also describes joint workflows, in which gene-level analysis is done with one of the major RNA-Seq analysis tools (DESeq/DESeq2, edgeR, limma, or Cufflinks) and the results are fed into GAGE/Pathview for pathway analysis or visualization. All these workflows are implemented in R/Bioconductor.

The workflows cover the most common situations and issues in RNA-Seq data pathway analysis. Issues such as data quality assessment are relevant to data analysis in general but are outside the scope of this tutorial. Although the focus here is on RNA-Seq data, the pathway analysis workflow remains similar for microarray data; in particular, steps 3-4 would be the same. Please check the gage and pathview vignettes for details.

Note: You need to update to the current release versions of R (3.0.2) and Bioconductor (2.13) to use all the features.

Please check it out:
http://bioconductor.org/packages/release/bioc/html/gage.html
http://bioconductor.org/packages/release/bioc/vignettes/gage/inst/doc/RNA-seqWorkflow.pdf

Scallop: reference-based transcriptome assembler for RNA-seq

Rahul Nayak — Tue, 08 May 2018 04:23:27 -0500

Scallop is an accurate reference-based transcript assembler. It is notable for its high accuracy in assembling multi-exon transcripts as well as lowly expressed transcripts. Scallop achieves this improvement through a novel algorithm that provably preserves all phasing paths from reads and paired-end reads while also achieving both transcript parsimony and coverage deviation minimization.

The Scallop paper has been published in Nature Biotechnology. The datasets and scripts used in the paper to compare the performance of Scallop and other assemblers are available at scalloptest.

Please also check out the podcast about Scallop (thanks to Roman Cheplyaka for the interview). It is available at both the bioinformatics chat and iTunes.

https://github.com/Kingsford-Group/scallop

Address of the bookmark: https://github.com/Kingsford-Group/scallop

Modular, efficient and constant-memory single-cell RNA-seq preprocessing

Jit — Mon, 05 Apr 2021 11:19:43 -0500

With kallisto | bustools you can

Generate a cell x gene or cell x transcript equivalence class count matrix
Perform RNA velocity and single-nuclei RNA-seq analysis
Quantify data from numerous technologies such as 10x, inDrops, and Dropseq.
Customize workflows for new technologies and protocols.
Process feature barcoding data such as CITE-seq, REAP-seq, MULTI-seq, Clicktags, and Perturb-seq.
Obtain QC reports from single-cell RNA-seq data

The kallisto | bustools workflow is described in:

Páll Melsted*, A. Sina Booeshaghi*, Lauren Liu, Fan Gao, Lambda Lu, Kyung Hoi (Joseph) Min, Eduardo da Veiga Beltrame, Kristján Eldjárn Hjörleifsson, Jase Gehring & Lior Pachter† Modular and efficient pre-processing of single-cell RNA-seq, Nature Biotechnology (2021).

Documentation and tutorials for the kallisto | bustools workflow are available at http://pachterlab.github.io/kallistobustools.

https://www.nature.com/articles/s41587-021-00870-2

Address of the bookmark: https://pachterlab.github.io/kallistobustools/

GenXPro GmbH

Rahul Agarwal — Thu, 22 May 2014 07:18:35 -0500

GenXPro GmbH is a service provider covering the entire spectrum of nucleotide-based information from any biological sample. By combining intelligent data reduction techniques with the latest next-generation sequencing technologies, our service portfolio provides the most accurate and cost-efficient solutions for transcriptomic, genomic, or epigenomic research.

GENXPRO GMBH, ALTENHÖFERALLEE 3, 60438 FRANKFURT MAIN, GERMANY

Website: http://www.genxpro.info/products_and_services/

PHONE: +49 (0)69- 95 73 97 10, FAX: +49 (0)69- 95 73 97 06

EMAIL: info@genxpro.de

Rosalind Bioinformatics problems !!!

Abhi — Thu, 18 Dec 2014 10:32:48 -0600

Rosalind is a platform for learning bioinformatics and programming through problem solving. Take a tour to get the hang of how Rosalind works.

http://rosalind.info/problems/list-view/

Address of the bookmark: http://rosalind.info/problems/list-view/

Strand Life Sciences announces the release of Strand NGS v3.1 at ASHG 2017

Yeshodari — Mon, 23 Oct 2017 02:39:24 -0500

ORLANDO, USA, Oct 17, 2017/ PRNewswire/

Strand NGS now supports large scale RNA- and small-RNA-Seq and Unique Molecular Identifiers (UMIs) for DNA-, RNA-, and small-RNA-Seq.

Strand Life Sciences announced the latest version release of its bioinformatics flagship product, Strand NGS, at the Annual Meeting of the American Society of Human Genetics today. Two major themes in Strand NGS v3.1 address recent challenges in next generation sequencing (NGS).

The first theme is large-scale RNA-Seq data analysis. Current cross-cohort RNA- and small-RNA-Seq studies span tens of replicates and batches across hundreds of samples, sometimes conducted across several different institutions. For such studies, Strand NGS v3.1 includes confounding variable analysis to eliminate technical effects, including batch effects; the t-SNE plot; profile and heat-map plots of gene-body coverage; and several other notable visual enhancements.

The second new feature is support for Unique Molecular Identifiers, or UMIs, for DNA-, RNA- and small-RNA-Seq. UMI support in Strand NGS is end-to-end, spanning alignment to variant calling in DNA-Seq, and alignment to quantification in RNA- and small-RNA-Seq. The Bioo Scientific, Qiagen, and Rubicon UMI protocols are natively supported, and an intuitive interface allows the specification of custom UMI protocols.

“For liquid biopsies and low-grade FFPE samples, UMI support in DNA-Seq enables the detection of somatic variants at low concentrations. In RNA-Seq, large-scale and UMI support can be used in single-cell-based studies that reveal tumor-cell heterogeneity, even at low concentrations”, says Dr. Vamsi Veeramachaneni, Chief Scientific Officer, Strand Life Sciences.

“At Strand, we are continuously working towards improving the accuracy and efficiency of NGS data analysis. Customers can look forward to Strand NGS becoming available on the cloud in the near future”, says Dr. Ramesh Hariharan, Chief Executive Officer, Strand Life Sciences.

Visit Strand Life Sciences at ASHG booth #1017 to learn more about Strand NGS v3.1 and other products and service offerings from Strand Life Sciences.

About Strand Life Sciences

Strand Life Sciences is a premier life science informatics innovation company. Founded in 2000, Strand is a leader in technology innovations for healthcare using genomics. By enhancing sequence-based diagnostics and clinical genomic data interpretation using a strong foundation of computational, scientific, and medical expertise, Strand is bringing individualized medicine to the world. To know more, visit www.strandls.com

Understanding RNA-Seq Normalization Methods: TPM vs. FPKM vs. CPM

Neel — Wed, 11 Dec 2024 00:59:15 -0600

RNA sequencing (RNA-Seq) is a powerful technology used to study transcriptomes, providing insights into gene expression levels. However, raw RNA-Seq data requires normalization to account for sequencing depth and gene length, enabling accurate comparisons between genes and samples. Among the most widely used normalization methods are TPM (Transcripts Per Million), FPKM (Fragments Per Kilobase Million), and CPM (Counts Per Million). Each method has its unique principles and applications, which we’ll explore in this blog.

Why Normalize RNA-Seq Data?

Normalization is a crucial step in RNA-Seq analysis for the following reasons:

Sequencing depth: Different RNA-Seq experiments produce varying numbers of reads, making direct comparisons between samples misleading.
Gene length: Longer genes inherently generate more reads, irrespective of their actual expression level.
Bias reduction: Normalization mitigates technical biases, enabling meaningful biological interpretation.

TPM (Transcripts Per Million)

TPM measures the proportion of reads mapped to a transcript, normalized by transcript length and sequencing depth. It is calculated as:
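
One common way to write the standard TPM definition is given here. The symbols are chosen for illustration: $c_i$ is the number of reads (or fragments) mapped to transcript $i$, $\ell_i$ is the length of transcript $i$ in kilobases, and the sum runs over all transcripts $j$ in the sample.

```latex
\mathrm{TPM}_i = \frac{c_i / \ell_i}{\sum_j c_j / \ell_j} \times 10^{6}
```

Each count is divided by its transcript length first, and the length-normalized values are then rescaled so that they add up to one million.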

Key Features:

Proportionality: TPM values sum to 1,000,000 across all transcripts in a sample, making it easier to compare between samples.
Intuitive interpretation: TPM values directly represent the abundance of transcripts in a sample.
Preferred for comparisons: TPM facilitates between-sample comparisons better than FPKM.
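
The calculation and the proportionality property can be sketched in a few lines of Python. This is a minimal illustration, not code from any particular quantification tool; the function name `tpm` and the toy counts and lengths are made up for the example.

```python
def tpm(counts, lengths_kb):
    """Return TPM values for one sample.

    counts     -- reads (or fragments) mapped to each transcript
    lengths_kb -- transcript lengths in kilobases, in the same order
    """
    # Step 1: length-normalize each count (reads per kilobase).
    rpk = [c / l for c, l in zip(counts, lengths_kb)]
    # Step 2: rescale so that the values sum to one million.
    scale = sum(rpk) / 1e6
    return [r / scale for r in rpk]


# Hypothetical toy data: three transcripts in one sample.
counts = [100, 200, 300]
lengths_kb = [1.0, 2.0, 3.0]

values = tpm(counts, lengths_kb)
print(values)       # per-transcript TPM
print(sum(values))  # total is one million, up to floating-point rounding
```

In this toy sample the three transcripts have different raw counts but the same count per kilobase, so they receive equal TPM values.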

FPKM (Fragments Per Kilobase Million)

FPKM normalizes read counts by transcript length and sequencing depth, but without enforcing proportionality like TPM. It is defined as:
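
The standard FPKM definition can be written as follows. The symbols are chosen for illustration: $c_i$ is the number of fragments mapped to transcript $i$, $\ell_i$ is the length of transcript $i$ in kilobases, and $N = \sum_j c_j$ is the total number of mapped fragments in the sample.

```latex
\mathrm{FPKM}_i = \frac{c_i}{\ell_i \cdot \left( N / 10^{6} \right)}
```

Here the depth normalization uses the raw total $N$ rather than the sum of length-normalized counts, so the resulting values do not add up to a fixed total.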

Key Features:

Historical significance: FPKM was one of the first normalization methods used for RNA-Seq.
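Single-end vs. paired-end: For single-end sequencing, where each read corresponds to one fragment, FPKM is equivalent to RPKM (Reads Per Kilobase Million). For paired-end data, FPKM counts each fragment (read pair) once rather than counting both reads.
Limited utility: FPKM values are less robust than TPM for cross-sample comparisons because they do not sum to the same total in every sample.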
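
The lack of a fixed total can be seen in a small Python sketch. It is an illustration only: the function names `fpkm` and `fpkm_to_tpm` and the two toy samples are assumptions made for the example, and the conversion relies on the identity that TPM is FPKM rescaled to sum to one million.

```python
def fpkm(counts, lengths_kb):
    """Return FPKM values: fragments per kilobase per million mapped fragments."""
    total_millions = sum(counts) / 1e6
    return [c / (l * total_millions) for c, l in zip(counts, lengths_kb)]


def fpkm_to_tpm(fpkm_values):
    """Rescale FPKM values so that they sum to one million (i.e. TPM)."""
    total = sum(fpkm_values)
    return [f / total * 1e6 for f in fpkm_values]


# Hypothetical toy data: the same three transcripts in two samples
# with identical sequencing depth (7000 fragments each).
lengths_kb = [1.0, 2.0, 4.0]
sample_a = [1000, 2000, 4000]
sample_b = [4000, 2000, 1000]

fpkm_a = fpkm(sample_a, lengths_kb)
fpkm_b = fpkm(sample_b, lengths_kb)

# The FPKM totals differ between the samples even though depth is equal.
print(sum(fpkm_a), sum(fpkm_b))

# After rescaling to TPM, both samples share the same total.
print(sum(fpkm_to_tpm(fpkm_a)), sum(fpkm_to_tpm(fpkm_b)))
```

Because the FPKM total depends on how reads are distributed over transcripts of different lengths, the same FPKM value can represent a different share of the transcriptome in each sample.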

CPM (Counts Per Million)

CPM normalizes raw read counts by sequencing depth, without considering gene length. It is expressed as:
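
The standard CPM definition can be written as follows. The symbols are chosen for illustration: $c_i$ is the raw read count for gene $i$ and $N = \sum_j c_j$ is the total number of mapped reads in the sample.

```latex
\mathrm{CPM}_i = \frac{c_i}{N} \times 10^{6}
```

No length term appears in this expression, which is what distinguishes CPM from TPM and FPKM.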

Key Features:

Simplicity: CPM is straightforward and computationally less intensive.
Application: Suitable for analyses that compare the same gene across samples, such as differential expression analysis, where gene length is constant and cancels out.
Gene length agnostic: CPM does not correct for gene length, making it less suitable for comparing expression levels between different genes within a sample.
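
A minimal Python sketch of the CPM calculation is shown here. The function name `cpm` and the toy counts are assumptions made for the example; they are not taken from any specific package.

```python
def cpm(counts):
    """Return counts per million for one sample (gene length is not used)."""
    total = sum(counts)
    return [c / total * 1e6 for c in counts]


# Hypothetical toy data: the same three genes sequenced at two depths.
# The deep sample has exactly twice the reads of the shallow one.
shallow = [500, 1500, 8000]
deep = [1000, 3000, 16000]

cpm_shallow = cpm(shallow)
cpm_deep = cpm(deep)

# Depth differences are removed, so each gene gets the same CPM in both samples.
for gene_index, (s, d) in enumerate(zip(cpm_shallow, cpm_deep)):
    print(gene_index, s, d)
```

The sketch shows why CPM works for comparing one gene across samples: depth is normalized away, while the length of the gene, which is the same in every sample, never enters the calculation.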

When to Use Each Method

TPM: Best for comparing expression levels between samples, especially when transcript length and sequencing depth vary.
FPKM: Useful for historical consistency but generally replaced by TPM.
CPM: Ideal for differential expression analysis when gene length normalization is unnecessary.

Conclusion

Choosing the right normalization method depends on the specific objectives of your RNA-Seq analysis. TPM’s proportionality and robustness make it the preferred choice for most applications, while CPM serves well for differential expression studies. Although FPKM paved the way for RNA-Seq normalization, it has largely been supplanted by TPM in modern workflows. Understanding these methods and their nuances ensures accurate and meaningful interpretations of RNA-Seq data.

dupRadar package

Jit — Sun, 04 Feb 2018 14:28:57 -0600

The dupRadar package gives insight into the duplication problem by graphically relating each gene's expression level to its duplication rate. Failed experiments can thus be identified at a glance.

Address of the bookmark: https://bioconductor.org/packages/3.7/bioc/vignettes/dupRadar/inst/doc/dupRadar.html