BOL: Related items

Modular, efficient and constant-memory single-cell RNA-seq preprocessing

Jit — Mon, 05 Apr 2021 11:19:43 -0500

With kallisto | bustools you can

Generate a cell x gene or cell x transcript equivalence class count matrix.
Perform RNA velocity and single-nuclei RNA-seq analysis.
Quantify data from numerous technologies such as 10x, inDrops, and Dropseq.
Customize workflows for new technologies and protocols.
Process feature barcoding data such as CITE-seq, REAP-seq, MULTI-seq, Clicktags, and Perturb-seq.
Obtain QC reports from single-cell RNA-seq data.
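
As a rough illustration of what a cell x gene count matrix contains, the sketch below collapses toy (barcode, UMI, gene) records into per-cell gene counts. It is not the kallisto or bustools implementation. The function name `count_matrix` and the records are made up for this example, and it assumes that identical (barcode, UMI, gene) triples are duplicates of one molecule.

```python
from collections import defaultdict


def count_matrix(records):
    """Collapse (barcode, umi, gene) records into a cell x gene count table.

    Each distinct (barcode, umi, gene) triple is counted once, so repeated
    observations of the same molecule do not inflate the counts.
    """
    unique_molecules = set(records)
    counts = defaultdict(lambda: defaultdict(int))
    for barcode, _umi, gene in unique_molecules:
        counts[barcode][gene] += 1
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {barcode: dict(genes) for barcode, genes in counts.items()}


# Toy records (made up): the barcode identifies the cell,
# the UMI identifies the original molecule.
records = [
    ("AAAC", "TTGA", "geneA"),
    ("AAAC", "TTGA", "geneA"),  # duplicate of the record above
    ("AAAC", "CCGT", "geneA"),
    ("AAAC", "GGAT", "geneB"),
    ("TTTG", "ACGT", "geneB"),
]
matrix = count_matrix(records)
```

Here the cell with barcode AAAC ends up with two geneA molecules and one geneB molecule, because the duplicated record is counted only once.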

The kallisto | bustools workflow is described in:

Páll Melsted*, A. Sina Booeshaghi*, Lauren Liu, Fan Gao, Lambda Lu, Kyung Hoi (Joseph) Min, Eduardo da Veiga Beltrame, Kristján Eldjárn Hjörleifsson, Jase Gehring & Lior Pachter† Modular and efficient pre-processing of single-cell RNA-seq, Nature Biotechnology (2021).

Documentation and tutorials for the kallisto | bustools workflow are available at http://pachterlab.github.io/kallistobustools.

https://www.nature.com/articles/s41587-021-00870-2

Address of the bookmark: https://pachterlab.github.io/kallistobustools/

GenXPro GmbH

Rahul Agarwal — Thu, 22 May 2014 07:18:35 -0500

GenXPro GmbH is a service provider covering the entire spectrum of nucleotide-based information from any biological sample. By combining intelligent data-reduction techniques with the latest next-generation sequencing technologies, our service portfolio provides the most accurate and cost-efficient solutions for transcriptomic, genomic, or epigenomic research.

GENXPRO GMBH, ALTENHÖFERALLEE 3, 60438 FRANKFURT MAIN, GERMANY

Website: http://www.genxpro.info/products_and_services/

PHONE: +49 (0)69- 95 73 97 10, FAX: +49 (0)69- 95 73 97 06

EMAIL: info@genxpro.de

Rosalind Bioinformatics problems !!!

Abhi — Thu, 18 Dec 2014 10:32:48 -0600

Rosalind is a platform for learning bioinformatics and programming through problem solving. Take a tour to get the hang of how Rosalind works.

http://rosalind.info/problems/list-view/

Address of the bookmark: http://rosalind.info/problems/list-view/

Strand Life Sciences announces the release of Strand NGS v3.1 at ASHG 2017

Yeshodari — Mon, 23 Oct 2017 02:39:24 -0500

ORLANDO, USA, Oct 17, 2017 /PRNewswire/

Strand NGS now supports large scale RNA- and small-RNA-Seq and Unique Molecular Identifiers (UMIs) for DNA-, RNA-, and small-RNA-Seq.

Strand Life Sciences announced the latest version release of its bioinformatics flagship product, Strand NGS, at the Annual Meeting of the American Society of Human Genetics today. Two major themes in Strand NGS v3.1 address recent challenges in next generation sequencing (NGS).

The first theme is large-scale RNA-Seq data analysis. Current cross-cohort RNA- and small-RNA-Seq studies span tens of replicates and batches across hundreds of samples, sometimes conducted across several different institutions. For such studies, Strand NGS v3.1 includes confounding variable analysis to eliminate technical effects, including batch effects; the t-SNE plot; profile and heat-map plots of gene-body coverage; and several other notable visual enhancements.

The second new feature is support for Unique Molecular Identifiers, or UMIs, for DNA-, RNA- and small-RNA-Seq. UMI support in Strand NGS is end-to-end, spanning alignment to variant calling in DNA-Seq, and alignment to quantification in RNA- and small-RNA-Seq. The Bioo Scientific, Qiagen, and Rubicon UMI protocols are natively supported, and an intuitive interface allows the specification of custom UMI protocols.

“For liquid biopsies and low-grade FFPE samples, UMI support in DNA-Seq enables the detection of somatic variants at low concentrations. In RNA-Seq, large-scale and UMI support can be used in single-cell-based studies that reveal tumor-cell heterogeneity, even at low concentrations”, says Dr. Vamsi Veeramachaneni, Chief Scientific Officer, Strand Life Sciences.

“At Strand, we are continuously working towards improving the accuracy and efficiency of NGS data analysis. Customers can look forward to Strand NGS becoming available on the cloud in the near future”, says Dr. Ramesh Hariharan, Chief Executive Officer, Strand Life Sciences.

Visit Strand Life Sciences at ASHG booth #1017 to learn more about Strand NGS v3.1 and other products and services from Strand Life Sciences. A detailed agenda and the v3.1 release notes are also available.

About Strand Life Sciences

Strand Life Sciences is a premier life science informatics innovation company. Founded in 2000, Strand is a leader in technology innovations for healthcare using genomics. By enhancing sequence-based diagnostics and clinical genomic data interpretation using a strong foundation of computational, scientific, and medical expertise, Strand is bringing individualized medicine to the world. To know more, visit www.strandls.com

Understanding RNA-Seq Normalization Methods: TPM vs. FPKM vs. CPM

Neel — Wed, 11 Dec 2024 00:59:15 -0600

RNA sequencing (RNA-Seq) is a powerful technology used to study transcriptomes, providing insights into gene expression levels. However, raw RNA-Seq data requires normalization to account for sequencing depth and gene length, enabling accurate comparisons between genes and samples. Among the most widely used normalization methods are TPM (Transcripts Per Million), FPKM (Fragments Per Kilobase Million), and CPM (Counts Per Million). Each method has its unique principles and applications, which we’ll explore in this blog.

Why Normalize RNA-Seq Data?

Normalization is a crucial step in RNA-Seq analysis for the following reasons:

Sequencing depth: Different RNA-Seq experiments produce varying numbers of reads, making direct comparisons between samples misleading.
Gene length: Longer genes inherently generate more reads, irrespective of their actual expression level.
Bias reduction: Normalization mitigates technical biases, enabling meaningful biological interpretation.

TPM (Transcripts Per Million)

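TPM measures the proportion of reads mapped to a transcript, normalized by transcript length and sequencing depth. It is calculated as:

$$\mathrm{TPM}_i = \frac{c_i / \ell_i}{\sum_j c_j / \ell_j} \times 10^6$$

where $c_i$ is the number of reads mapped to transcript $i$ and $\ell_i$ is the length of that transcript.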

Key Features:

Proportionality: TPM values sum to 1,000,000 across all transcripts in a sample, making it easier to compare between samples.
Intuitive interpretation: TPM values directly represent the abundance of transcripts in a sample.
Preferred for comparisons: TPM facilitates between-sample comparisons better than FPKM.
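
A minimal sketch of the TPM calculation, using made-up counts and lengths: each count is first divided by its transcript length, and the resulting rates are then rescaled so that they sum to one million. The function name `tpm` is chosen for this example only.

```python
def tpm(counts, lengths):
    """Return TPM values for parallel lists of read counts and transcript lengths."""
    # Step 1: length-normalize each count (reads per base).
    rates = [count / length for count, length in zip(counts, lengths)]
    # Step 2: rescale so that the values sum to one million.
    total_rate = sum(rates)
    return [rate / total_rate * 1e6 for rate in rates]


# Toy data (made up): three transcripts in one sample.
counts = [100, 200, 300]
lengths = [1000, 2000, 1000]
tpm_values = tpm(counts, lengths)
```

The first two transcripts receive the same TPM even though the second has twice as many reads, because it is also twice as long.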

FPKM (Fragments Per Kilobase Million)

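FPKM normalizes read counts by transcript length and sequencing depth but, unlike TPM, its values do not sum to a fixed total per sample. It is defined as:

$$\mathrm{FPKM}_i = \frac{c_i \times 10^9}{\ell_i \times N}$$

where $c_i$ is the number of fragments mapped to transcript $i$, $\ell_i$ is the transcript length in base pairs, and $N$ is the total number of mapped fragments in the sample.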

Key Features:

Historical significance: FPKM was one of the first normalization methods used for RNA-Seq.
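Single-end vs. paired-end: FPKM counts fragments, so a read pair from paired-end sequencing is counted once. In single-end sequencing, where each fragment yields one read, FPKM is equivalent to RPKM (Reads Per Kilobase Million).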
Limited utility: FPKM values are not as robust as TPM for cross-sample comparisons due to lack of proportionality.
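
The sketch below, with made-up counts, illustrates the lack of proportionality: the FPKM values of two samples sum to different totals, whereas rescaling them to TPM gives one million in both. The function names `fpkm` and `fpkm_to_tpm` are chosen for this example only.

```python
def fpkm(counts, lengths_bp):
    """Return FPKM values for parallel lists of fragment counts and lengths in bp."""
    total_fragments = sum(counts)
    return [
        count * 1e9 / (length * total_fragments)
        for count, length in zip(counts, lengths_bp)
    ]


def fpkm_to_tpm(fpkm_values):
    """Rescale FPKM values so that they sum to one million."""
    total = sum(fpkm_values)
    return [value / total * 1e6 for value in fpkm_values]


# Toy data (made up): the same three transcripts in two samples.
lengths_bp = [1000, 2000, 1000]
sample_a = fpkm([100, 200, 300], lengths_bp)
sample_b = fpkm([100, 200, 900], lengths_bp)

# The per-sample FPKM totals differ, the TPM totals do not.
fpkm_totals = (sum(sample_a), sum(sample_b))
tpm_totals = (sum(fpkm_to_tpm(sample_a)), sum(fpkm_to_tpm(sample_b)))
```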

CPM (Counts Per Million)

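CPM normalizes raw read counts by sequencing depth, without considering gene length. It is expressed as:

$$\mathrm{CPM}_i = \frac{c_i}{N} \times 10^6$$

where $c_i$ is the number of reads mapped to gene $i$ and $N$ is the total number of mapped reads in the sample.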

Key Features:

Simplicity: CPM is straightforward and computationally less intensive.
Application: Suitable for analyses that do not depend on gene length, such as differential expression analysis, where the same gene is compared across samples.
Gene length agnostic: CPM does not correct for gene length, making it less suitable for comparing expression levels between different genes within a sample.
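
A minimal sketch of the CPM calculation with made-up counts. Only the library size enters the calculation, so gene length plays no role. The function name `cpm` is chosen for this example only.

```python
def cpm(counts):
    """Return counts per million for a list of raw read counts from one sample."""
    library_size = sum(counts)
    return [count / library_size * 1e6 for count in counts]


# Toy data (made up): three genes in one sample.
counts = [100, 200, 700]
cpm_values = cpm(counts)
```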

When to Use Each Method

TPM: Best for comparing expression levels between samples, especially when transcript length and sequencing depth vary.
FPKM: Useful for historical consistency but generally replaced by TPM.
CPM: Ideal for differential expression analysis when gene length normalization is unnecessary.

Conclusion

Choosing the right normalization method depends on the specific objectives of your RNA-Seq analysis. TPM’s proportionality and robustness make it the preferred choice for most applications, while CPM serves well for differential expression studies. Although FPKM paved the way for RNA-Seq normalization, it has largely been supplanted by TPM in modern workflows. Understanding these methods and their nuances ensures accurate and meaningful interpretations of RNA-Seq data.


dupRadar package

Jit — Sun, 04 Feb 2018 14:28:57 -0600

The dupRadar package gives insight into the duplication problem by graphically relating each gene's expression level to its duplication rate. Failed experiments can thus be identified at a glance.
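
To make the relationship concrete, the sketch below computes an expression level and a duplication rate per gene from toy numbers. It is a Python illustration of the idea only, not the dupRadar API (dupRadar is an R package). It assumes that expression is measured as reads per kilobase, and the function name `duplication_profile` and all values are made up.

```python
def duplication_profile(genes):
    """Map each gene to (reads per kilobase, duplication rate).

    `genes` maps a gene name to (length_bp, total_reads, duplicate_reads).
    """
    profile = {}
    for name, (length_bp, total_reads, duplicate_reads) in genes.items():
        reads_per_kb = total_reads / (length_bp / 1000)
        dup_rate = duplicate_reads / total_reads if total_reads else 0.0
        profile[name] = (reads_per_kb, dup_rate)
    return profile


# Toy data (made up): duplication rises with expression level.
genes = {
    "low_expr": (2000, 20, 1),
    "mid_expr": (1000, 500, 100),
    "high_expr": (1000, 10000, 7000),
}
profile = duplication_profile(genes)
```

Plotting the duplication rate against the expression level for every gene gives the kind of picture the package description refers to.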

Address of the bookmark: https://bioconductor.org/packages/3.7/bioc/vignettes/dupRadar/inst/doc/dupRadar.html

HiCdat

Jit — Fri, 12 Feb 2016 05:23:44 -0600

HiCdat: a fast and easy-to-use Hi-C data analysis tool

HiCdat is easy-to-use and provides solutions starting from aligned reads up to in-depth analyses. Importantly, HiCdat is focussed on the analysis of larger structural features of chromosomes, their correlation to genomic and epigenomic features, and on comparative studies. It uses simple input and output formats and can therefore easily be integrated into existing workflows or combined with alternative tools.

More at http://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-015-0678-x

Address of the bookmark: https://github.com/MWSchmid/HiCdat

KisSplice

Jit — Tue, 16 Aug 2016 08:34:19 -0500

KisSplice is software for analysing RNA-seq data with or without a reference genome. It is an exact local transcriptome assembler that identifies SNPs, indels and alternative splicing events. It can deal with an arbitrary number of biological conditions, and will quantify each variant in each condition. It has been tested on Illumina datasets of up to 1G reads. Its memory consumption is around 5Gb for 100M reads.

KisSplice is not a full-length transcriptome assembler. This means that it will output the variable regions of the transcripts, not reconstruct them entirely.

KisSplice comes as a workflow, with several possible post-treatments meant to facilitate the analysis of the results. The choice of the post-treatment depends on the availability of a reference genome/transcriptome and on the need to perform a differential analysis, as summarised in the following table.

Address of the bookmark: http://kissplice.prabi.fr/

Machine Learning !!!

Gudiya Pal — Fri, 01 Jul 2016 12:57:12 -0500

In machine learning, computers apply statistical learning techniques to automatically identify patterns in data. These techniques can be used to make highly accurate predictions.

Using a data set about homes, the linked visual introduction creates a machine learning model to distinguish homes in New York from homes in San Francisco.
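
As a very small illustration of a model learning a pattern from data, the sketch below picks the cut-off on a single feature that best separates two cities. The elevations and labels are made up, and the function name `best_threshold` and the choice of a one-feature threshold rule are assumptions for this example, not taken from the linked page.

```python
def best_threshold(values, labels):
    """Find the cut-off on one feature that best separates two classes.

    A home is predicted as "SF" when its value is above the threshold,
    otherwise as "NY". Returns (threshold, number of correct predictions).
    """
    best = (None, -1)
    for threshold in sorted(set(values)):
        correct = sum(
            (value > threshold) == (label == "SF")
            for value, label in zip(values, labels)
        )
        if correct > best[1]:
            best = (threshold, correct)
    return best


# Toy data (made up): elevation of each home in metres and its city.
elevation_m = [2, 5, 8, 10, 30, 60, 75, 90]
city = ["NY", "NY", "NY", "NY", "SF", "SF", "SF", "SF"]
threshold, n_correct = best_threshold(elevation_m, city)
```

On this toy data a single cut-off separates the two cities perfectly. Real data usually overlaps, which is why models combine several features.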

Address of the bookmark: http://www.r2d3.us/visual-intro-to-machine-learning-part-1/

WiseScaffolder

Poonam Mahapatra — Wed, 13 Jul 2016 08:08:57 -0500

Function

WiseScaffolder is a stand-alone, semi-automatic application for genome scaffolding of pre-assembled contigs using mate-pair data. It also produces editable scaffold maps, which can be used either to build gapped scaffolds or as a common thread for the manual improvement of scaffolds.

Description

WiseScaffolder includes 4 subcommands:

dumpconfig: generates a configuration file that notably specifies the average insert size of the mate-pair library.
preprocess: detects and corrects chimeras, estimates contig copy number, and produces valuable outputs for the manual improvement of scaffolds.
scaffold: the central scaffold builder, which comprises two modules:

i) the interative_scaffold_extender, which works with big, unambiguous contigs or, when they run out, single-copy contigs, and

ii) the small_contig_inserter, which inserts the small contigs within scaffolds.

buildfasta: converts the scaffold map(s) into FASTA sequences.

Address of the bookmark: http://abims.sb-roscoff.fr/wisescaffolder