DESeq2 is a widely used R/Bioconductor package that identifies differentially expressed genes (DEGs) from RNA-seq data. Whether you're comparing treated vs untreated samples, disease vs healthy conditions, or wild-type vs mutant strains, DESeq2 helps you determine statistically which genes are significantly up- or down-regulated.
What Does DESeq2 Do?
DESeq2 analyzes count data, that is, the number of sequencing reads that map to each gene. It:
Normalizes the data to account for differences in sequencing depth (library size) and RNA composition between samples.
Estimates variance (dispersion) for each gene.
Fits a model to compare groups (e.g., control vs treated).
Calculates log2 fold changes and p-values to assess significance.
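The normalization step can be illustrated with a median-of-ratios calculation, which is the approach DESeq2 uses to estimate per-sample size factors. The following is a minimal base-R sketch on made-up numbers (toy_counts and its values are hypothetical); in a real analysis DESeq() handles this for you.

```r
# Toy raw counts: 4 genes (rows) x 3 samples (columns); values are made up.
toy_counts <- matrix(
  c(10, 20, 40,
    100, 200, 400,
    50, 100, 200,
    5, 10, 20),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), c("s1", "s2", "s3"))
)

# Per-gene geometric mean across samples, computed on the log scale.
log_geo_means <- rowMeans(log(toy_counts))

# Size factor per sample: median over genes of count / geometric mean.
# Genes with a zero count in any sample (log = -Inf) are skipped.
size_factors <- apply(toy_counts, 2, function(sample_counts) {
  keep <- is.finite(log_geo_means) & sample_counts > 0
  exp(median(log(sample_counts[keep]) - log_geo_means[keep]))
})

# Dividing each column by its size factor puts samples on a common scale.
normalized <- sweep(toy_counts, 2, size_factors, "/")
print(round(size_factors, 3))
print(normalized)
```

In this toy matrix the samples differ only by sequencing depth, so after normalization every gene has the same value in all three samples.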
Installing DESeq2
You can install DESeq2 via Bioconductor in R:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("DESeq2")
Inputs Needed
A count matrix: genes as rows, samples as columns (raw counts, not normalized).
A sample metadata table (also called colData): defines the condition/group for each sample.
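Two quick sanity checks on these inputs can catch common mistakes early: the counts should be non-negative whole numbers, and the sample names should agree between the two tables, in the same order. A minimal base-R sketch, using hypothetical toy objects (toy_counts, toy_coldata) in place of your own data:

```r
# Hypothetical toy inputs; in a real analysis these come from your own files.
toy_counts <- matrix(
  c(12, 0, 30, 45,
    7, 3, 0, 9),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("geneA", "geneB"), c("s1", "s2", "s3", "s4"))
)
toy_coldata <- data.frame(
  row.names = c("s1", "s2", "s3", "s4"),
  condition = factor(c("control", "control", "treated", "treated"))
)

# Raw counts should be non-negative whole numbers.
counts_look_raw <- all(toy_counts >= 0) && all(toy_counts == round(toy_counts))

# Sample names should match between the two inputs, in the same order.
samples_aligned <- identical(colnames(toy_counts), rownames(toy_coldata))

print(c(counts_look_raw = counts_look_raw, samples_aligned = samples_aligned))
```

If either check is FALSE, it is usually worth fixing the inputs before going further.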
Example:
# Count matrix (rows = genes, columns = samples)
counts <- read.csv("counts.csv", row.names = 1)
# Sample metadata
colData <- data.frame(
  row.names = colnames(counts),
  condition = c("control", "control", "treated", "treated")
)
DESeq2 Workflow
1. Load the package
library(DESeq2)
2. Create a DESeqDataSet object
dds <- DESeqDataSetFromMatrix(countData = counts,
                              colData = colData,
                              design = ~ condition)
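With a design of ~ condition, fold changes are reported relative to the reference (first) level of the condition factor, and R orders factor levels alphabetically unless told otherwise. The labels "control" and "treated" happen to sort the right way, but other labels may not. A small base-R sketch with hypothetical labels shows how to set the reference explicitly; the same relevel() call could be applied to the condition column of colData before the dataset is created.

```r
# Hypothetical labels where alphabetical order picks the wrong baseline.
condition <- factor(c("untreated", "untreated", "treated", "treated"))
print(levels(condition))  # "treated" sorts first, so it would be the reference

# Make "untreated" the reference level explicitly.
condition <- relevel(condition, ref = "untreated")
print(levels(condition))
```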
3. Run the differential expression analysis
dds <- DESeq(dds)
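DESeq() is a wrapper that runs three steps in sequence: size factor estimation, dispersion estimation, and negative binomial model fitting with Wald tests. The sketch below runs those steps one at a time on a simulated dataset from makeExampleDESeqDataSet(), so it does not depend on your own files. The object names dds_demo and ran_demo are made up for illustration, and the code is skipped when DESeq2 is not installed.

```r
# Runs only when DESeq2 is installed; uses simulated data, not real counts.
ran_demo <- FALSE
if (requireNamespace("DESeq2", quietly = TRUE)) {
  suppressPackageStartupMessages(library(DESeq2))
  dds_demo <- makeExampleDESeqDataSet(n = 1000, m = 6)  # simulated dataset

  dds_demo <- estimateSizeFactors(dds_demo)   # normalization
  dds_demo <- estimateDispersions(dds_demo)   # per-gene dispersion estimates
  dds_demo <- nbinomWaldTest(dds_demo)        # model fit and Wald test

  print(sizeFactors(dds_demo))
  print(head(results(dds_demo)))
  ran_demo <- TRUE
}
```

Calling the steps separately is mainly useful for inspecting intermediate values; for routine analyses the single DESeq() call is the simpler choice.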
4. Get the results
res <- results(dds)
head(res)
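This gives a table with columns including:
log2FoldChange: log2 of the expression change between the two groups (here, treated relative to control)
pvalue: unadjusted p-value from the statistical test
padj: p-value adjusted for multiple testing (Benjamini-Hochberg FDR correction by default)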
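To see what the adjustment does, the Benjamini-Hochberg procedure (the default method behind padj) can be applied to a handful of p-values with base R's p.adjust(). This is a sketch with made-up p-values and gene names. DESeq2 additionally applies independent filtering, which can set padj to NA for some genes, so its values will not always match a plain p.adjust() call on the full pvalue column.

```r
# Made-up raw p-values for four genes.
pvalue <- c(geneA = 0.001, geneB = 0.01, geneC = 0.04, geneD = 0.5)

# Benjamini-Hochberg adjustment, as implemented in base R.
padj <- p.adjust(pvalue, method = "BH")
print(padj)

# Genes that pass a 5% FDR threshold.
print(names(padj)[padj < 0.05])
```

Note that geneC has a raw p-value below 0.05 but no longer passes once the adjustment is applied.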
Visualization (Optional but Powerful)
MA Plot
plotMA(res, ylim = c(-2, 2))
Volcano Plot (custom)
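library(ggplot2)
# ggplot2 needs a plain data.frame, not a DESeqResults object
res_df <- as.data.frame(res)
# Treat genes with padj = NA as not significant (ggplot2 drops them from the plot with a warning)
res_df$significant <- !is.na(res_df$padj) & res_df$padj < 0.05
ggplot(res_df, aes(x = log2FoldChange, y = -log10(padj), color = significant)) +
  geom_point() +
  theme_minimal()
Heatmap of Top Genes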
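library(pheatmap)
# Row indices of the 20 genes with the smallest adjusted p-values
topgenes <- head(order(res$padj), 20)
# Variance-stabilizing transformation, for visualization only
vsd <- vst(dds, blind = FALSE)
pheatmap(assay(vsd)[topgenes, ])
Tips for Best Results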
Use raw counts (not normalized or TPM/RPKM values).
Have replicates: DESeq2 relies on variance estimates, so at least 3 per group is ideal.
Watch out for batch effects and include them in your design if needed (e.g., ~ batch + condition).
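To see what a design such as ~ batch + condition encodes, base R's model.matrix() can expand the formula into a design matrix. The sample sheet below is hypothetical (toy_coldata with made-up batch labels b1 and b2). Putting the variable of interest last is a common convention, since results() reports the last variable in the design by default.

```r
# Hypothetical sample sheet: two batches, each with one control and one treated.
toy_coldata <- data.frame(
  row.names = c("s1", "s2", "s3", "s4"),
  batch     = factor(c("b1", "b1", "b2", "b2")),
  condition = factor(c("control", "treated", "control", "treated"))
)

# Expand the formula into the design matrix the model would be fit with.
design_matrix <- model.matrix(~ batch + condition, data = toy_coldata)
print(design_matrix)
```

The matrix has one column for the intercept, one for the batch offset, and one for the treatment effect. If batch and condition were perfectly confounded (for example, all controls in b1 and all treated samples in b2), the two effects could not be separated.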
Summary
Step                             Purpose
DESeqDataSetFromMatrix()         Load your data into DESeq2
DESeq()                          Run the differential expression analysis
results()                        Extract the output (log fold change, p-values, etc.)
plotMA() / ggplot2 / pheatmap    Visualize the results
Final Thoughts
DESeq2 is an essential tool for RNA-seq data analysis. It abstracts away much of the complexity of statistical modeling, while still giving you control when needed. Whether you're a bioinformatician or a wet-lab biologist, DESeq2 offers both ease of use and analytical power.