Basics of DESeq2: Differential Expression Made Simple

LEGE — Wed, 28 May 2025 06:47:32 -0500

DESeq2 is a powerful and widely used R package that identifies differentially expressed genes (DEGs) from RNA-seq data. Whether you're comparing treated vs untreated samples, disease vs healthy conditions, or wild-type vs mutant strains, DESeq2 helps you determine statistically which genes are significantly up- or down-regulated.

What Does DESeq2 Do?
DESeq2 analyzes count data—the number of sequencing reads that map to each gene. It:

Normalizes the data to account for differences in sequencing depth (library size) and RNA composition between samples.

Estimates variance (dispersion) for each gene.

Fits a model to compare groups (e.g., control vs treated).

Calculates fold-changes and p-values to determine significance.
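
To make the normalization step more concrete, here is a minimal base-R sketch of the median-of-ratios idea that DESeq2's size factors are based on. The toy matrix and variable names are made up for illustration, and the sketch assumes no gene has a zero count (DESeq2 itself handles that case, so prefer its own functions on real data).

```r
# Toy count matrix: 4 genes x 3 samples (made-up numbers, for illustration only).
# sample2 and sample3 are simply 2x and 4x deeper versions of sample1.
toy_counts <- matrix(
  c(10,  20,  40,
    100, 200, 400,
    50,  100, 200,
    5,   10,  20),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("sample", 1:3))
)

# Log of each gene's geometric mean across samples
log_geo_means <- rowMeans(log(toy_counts))

# Size factor per sample: median ratio of its counts to the geometric means
size_factors <- apply(toy_counts, 2, function(sample_counts) {
  exp(median(log(sample_counts) - log_geo_means))
})

# Dividing each column by its size factor puts samples on a common scale
normalized <- sweep(toy_counts, 2, size_factors, "/")

print(size_factors)
print(normalized)
```

Because the three toy samples differ only in depth, the normalized columns come out identical, which is what you would hope for when nothing is differentially expressed.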

Installing DESeq2

You can install DESeq2 via Bioconductor in R:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("DESeq2")

Inputs Needed

A count matrix: genes as rows, samples as columns (raw counts, not normalized).

A sample metadata table (also called colData): defines the condition/group for each sample.

Example:

# Count matrix (rows = genes, columns = samples)
counts <- read.csv("counts.csv", row.names = 1)
# Sample metadata: one row per sample, in the same order as the count columns
# (this example assumes exactly four samples)
colData <- data.frame(
  row.names = colnames(counts),
  condition = c("control", "control", "treated", "treated")
)
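
Before handing these two tables to DESeq2, it can help to check that they line up. The sketch below uses small made-up stand-ins (`counts_demo` and `coldata_demo` are hypothetical names, not part of DESeq2) and plain base R, so treat it as one possible sanity check rather than a required step.

```r
# Hypothetical stand-in for a count matrix read from counts.csv
counts_demo <- data.frame(
  ctrl_1 = c(12, 0, 310),
  ctrl_2 = c(15, 1, 295),
  trt_1  = c(40, 0, 305),
  trt_2  = c(38, 2, 320),
  row.names = c("geneA", "geneB", "geneC")
)

# Matching sample metadata, with the condition stored as a factor
coldata_demo <- data.frame(
  row.names = colnames(counts_demo),
  condition = factor(c("control", "control", "treated", "treated"))
)

# Sample names must match between the two tables, in the same order
stopifnot(identical(rownames(coldata_demo), colnames(counts_demo)))

# Raw counts should be non-negative whole numbers
count_values <- as.matrix(counts_demo)
stopifnot(all(count_values >= 0), all(count_values == round(count_values)))

# Number of samples per group
print(table(coldata_demo$condition))
```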

DESeq2 Workflow

1. Load the package

library(DESeq2)

2. Create a DESeqDataSet object

dds <- DESeqDataSetFromMatrix(countData = counts,
                              colData = colData,
                              design = ~ condition)
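
One detail worth checking at this step is which group acts as the reference, since R orders factor levels alphabetically by default. With "control" and "treated" that happens to work out, but other labels may not. A small base-R sketch, using hypothetical labels, of how the reference level can be set explicitly:

```r
# Hypothetical labels where alphabetical order picks the wrong reference
condition <- factor(c("untreated", "untreated", "treated", "treated"))
print(levels(condition))   # "treated" sorts first, so it would be the reference

# Make "untreated" the reference level explicitly
condition <- relevel(condition, ref = "untreated")
print(levels(condition))   # "untreated" is now listed first
```

The same `relevel()` call can be applied to the condition column of your metadata before building the dataset.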
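
3. Run the differential expression analysis

dds <- DESeq(dds)

4. Get the results

res <- results(dds)
head(res)

This gives a table with columns including:

log2FoldChange: estimated log2 fold change between the compared groups (with this design, treated relative to control)
pvalue: p-value from the test of that change (a Wald test by default)
padj: p-value adjusted for multiple testing (Benjamini-Hochberg FDR by default)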
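
To get a feel for what these columns mean, the arithmetic can be reproduced on made-up numbers in base R. This is only an illustration: DESeq2 estimates fold changes from its fitted model rather than from a simple ratio, and the p-values below are invented.

```r
# A gene averaging 50 normalized counts in control and 200 in treated
lfc <- log2(200 / 50)
print(lfc)   # a log2 fold change of 2 means a four-fold increase

# Benjamini-Hochberg adjustment of five made-up p-values
p_values <- c(0.001, 0.01, 0.02, 0.03, 0.5)
p_adjusted <- p.adjust(p_values, method = "BH")
print(p_adjusted)

# Genes passing a 5% false discovery rate threshold
print(which(p_adjusted < 0.05))
```

Note that the adjusted values are never smaller than the raw p-values, which is why a gene can have a small pvalue and still miss the padj cutoff.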

Visualization (Optional but Powerful)

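MA Plot

plotMA(res, ylim = c(-2, 2))

Volcano Plot (custom)

library(ggplot2)
# ggplot2 needs a plain data frame, not a DESeqResults object
res_df <- as.data.frame(res)
# padj can be NA (e.g. for filtered genes); count those as not significant
res_df$significant <- !is.na(res_df$padj) & res_df$padj < 0.05
ggplot(res_df, aes(x = log2FoldChange, y = -log10(padj), color = significant)) +
  geom_point() +
  theme_minimal()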
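
Heatmap of Top Genes

library(pheatmap)
# Row indices of the 20 genes with the smallest adjusted p-values
topgenes <- head(order(res$padj), 20)
# Variance-stabilized counts are better suited to plotting than raw counts
vsd <- vst(dds, blind = FALSE)
pheatmap(assay(vsd)[topgenes, ])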
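
The top-gene selection relies on how `order()` treats missing values, which matters because padj is often NA for some genes. A tiny base-R sketch with made-up values (`padj_demo` is a hypothetical vector, not DESeq2 output):

```r
# Made-up adjusted p-values, including one missing value
padj_demo <- c(geneA = 0.20, geneB = NA, geneC = 0.001, geneD = 0.03)

# order() sorts ascending and places NA last by default
top_idx <- head(order(padj_demo), 2)
top_names <- names(padj_demo)[top_idx]
print(top_names)
```

So genes without an adjusted p-value only end up in the selection if you ask for more genes than have a non-missing padj.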

Tips for Best Results

Use raw counts (not normalized values such as TPM or RPKM).
Have replicates: DESeq2 relies on variance estimates, so aim for at least 3 biological replicates per group.
Watch out for batch effects, and include them in your design if needed (e.g., ~ batch + condition).
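
To see what a design like ~ batch + condition translates to, base R's `model.matrix()` can expand the formula for a made-up sample sheet. The `sample_info` table below is hypothetical, and this is a sketch of the general idea rather than of DESeq2's internals.

```r
# Hypothetical sample sheet: two batches, each containing both conditions
sample_info <- data.frame(
  batch     = factor(c("b1", "b2", "b1", "b2")),
  condition = factor(c("control", "control", "treated", "treated"))
)

# Expand the design formula into one column per estimated coefficient
design_matrix <- model.matrix(~ batch + condition, data = sample_info)
print(colnames(design_matrix))
print(design_matrix)

# The design must be full rank: batch and condition cannot be confounded
print(qr(design_matrix)$rank == ncol(design_matrix))
```

If every treated sample came from one batch and every control from the other, the rank check would fail, and no design formula could separate the two effects.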

Summary

Step                             Purpose
DESeqDataSetFromMatrix()         Load your data into DESeq2
DESeq()                          Run the differential expression analysis
results()                        Extract the output (log fold change, p-values, etc.)
plotMA() / ggplot2 / pheatmap    Visualize the results

Final Thoughts
DESeq2 is an essential tool for RNA-seq data analysis. It abstracts away much of the complexity of statistical modeling, while still giving you control when needed. Whether you're a bioinformatician or a wet-lab biologist, DESeq2 offers both ease of use and analytical power.
