BOL: Related items

MUM&Co is a simple bash script that uses Whole Genome Alignment information provided by MUMmer (v4) to detect variants.

Rahul Nayak — Wed, 27 Apr 2022 04:34:12 -0500

MUM&Co is able to detect:
Deletions, insertions, tandem duplications and tandem contractions (>=50bp & <=150kb)
Inversions (>=1kb) and translocations (>=10kb)

Address of the bookmark: https://github.com/SAMtoBAM/MUMandCo

Basics of DESeq2: Differential Expression Made Simple

LEGE — Wed, 28 May 2025 06:47:32 -0500

DESeq2 is a powerful and widely-used R package that identifies differentially expressed genes (DEGs) from RNA-seq data. Whether you're comparing treated vs untreated samples, disease vs healthy conditions, or wild-type vs mutant strains, DESeq2 helps you statistically determine which genes are significantly up- or down-regulated.

What Does DESeq2 Do?
DESeq2 analyzes count data—the number of sequencing reads that map to each gene. It:

Normalizes the data to account for sequencing depth and library size.

Estimates variance (dispersion) for each gene.

Fits a model to compare groups (e.g., control vs treated).

Calculates fold-changes and p-values to determine significance.
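
The normalization step above can be illustrated with a simplified base-R sketch of the median-of-ratios idea. This is not DESeq2's own code (the package does this inside estimateSizeFactors()), and the toy matrix is made up for illustration.

```r
# Toy count matrix: 4 genes x 3 samples (made-up numbers for illustration).
counts <- matrix(
  c(10, 20, 40,
    100, 200, 400,
    5, 10, 20,
    50, 100, 200),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("sample", 1:3))
)

# Per-gene geometric mean across samples, computed on the log scale.
log_geo_means <- rowMeans(log(counts))

# Use only genes with a finite log geometric mean (no zero counts).
usable <- is.finite(log_geo_means)

# Size factor per sample: median ratio of its counts to the geometric means.
size_factors <- apply(counts[usable, , drop = FALSE], 2, function(sample_counts) {
  exp(median(log(sample_counts) - log_geo_means[usable]))
})

# Normalized counts: divide each column by its size factor.
normalized <- sweep(counts, 2, size_factors, "/")
print(size_factors)
print(normalized)
```

In this toy example each sample is an exact multiple of the first, so the normalized columns come out identical. Real data will not be this clean, which is why the median is used: it is robust to the minority of genes that are truly differentially expressed.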

Installing DESeq2

You can install DESeq2 via Bioconductor in R:

if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install("DESeq2")

Inputs Needed

A count matrix: genes as rows, samples as columns (raw counts, not normalized).

A sample metadata table (also called colData): defines the condition/group for each sample.

Example:
# Count matrix (rows = genes, columns = samples), converted to a numeric matrix
counts <- as.matrix(read.csv("counts.csv", row.names = 1))
# Sample metadata: one row per sample, in the same order as the count columns
colData <- data.frame(
  row.names = colnames(counts),
  condition = factor(c("control", "control", "treated", "treated"))
)

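
Before building the DESeq2 object, it can help to sanity-check the two inputs. The sketch below uses a made-up toy matrix in place of counts.csv, and the pre-filter threshold of 10 reads is an illustrative assumption, not a DESeq2 requirement.

```r
# Toy stand-in for the count matrix read from counts.csv (made-up values).
counts <- matrix(
  c(12, 0, 30, 25,
    0, 0, 1, 0,
    150, 130, 300, 280),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2", "s3", "s4"))
)
colData <- data.frame(
  row.names = colnames(counts),
  condition = factor(c("control", "control", "treated", "treated"))
)

# Sample order in the metadata must match the count matrix columns.
stopifnot(identical(rownames(colData), colnames(counts)))

# Raw counts should be non-negative whole numbers.
stopifnot(all(counts >= 0), all(counts == round(counts)))

# Optional pre-filter: drop genes with very few reads overall
# (the threshold of 10 is an illustrative choice).
keep <- rowSums(counts) >= 10
filtered <- counts[keep, , drop = FALSE]
print(rownames(filtered))
```

A mismatch between metadata rows and count columns is easy to miss and would silently assign samples to the wrong condition, so checking it explicitly is cheap insurance.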
DESeq2 Workflow
1. Load the package
library(DESeq2)
2. Create a DESeqDataSet object
dds <- DESeqDataSetFromMatrix(countData = counts,
                              colData = colData,
                              design = ~ condition)
3. Run the differential expression analysis
dds <- DESeq(dds)
4. Get the results
res <- results(dds)
head(res)
This gives a table with:
log2FoldChange: how much expression changed
pvalue: statistical significance
padj: adjusted p-value (FDR corrected)
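
To make the padj column concrete, here is a small base-R sketch of the Benjamini-Hochberg adjustment, which is the default method in results(). The gene names, fold changes, p-values, and cutoffs are all made up for illustration.

```r
# Made-up raw p-values and log2 fold changes for five hypothetical genes.
toy <- data.frame(
  gene = paste0("gene", 1:5),
  log2FoldChange = c(2.5, -1.8, 0.3, 1.2, -0.1),
  pvalue = c(0.001, 0.01, 0.04, 0.02, 0.8)
)

# Benjamini-Hochberg adjustment controls the false discovery rate.
toy$padj <- p.adjust(toy$pvalue, method = "BH")

# Keep genes passing illustrative cutoffs: padj < 0.05 and |log2FC| > 1.
hits <- subset(toy, padj < 0.05 & abs(log2FoldChange) > 1)
print(toy)
print(hits)
```

Filtering on padj rather than pvalue matters because thousands of genes are tested at once; an unadjusted 0.05 cutoff would let through many false positives.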

Visualization (Optional but Powerful)

MA Plot
plotMA(res, ylim = c(-2, 2))
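Volcano Plot (custom)
library(ggplot2)
# ggplot2 needs a plain data frame, so convert the DESeqResults object first
res_df <- as.data.frame(res)
# Drop genes whose adjusted p-value is NA so they are not plotted
res_df <- res_df[!is.na(res_df$padj), ]
res_df$significant <- res_df$padj < 0.05
ggplot(res_df, aes(x = log2FoldChange, y = -log10(padj), color = significant)) +
  geom_point() +
  theme_minimal()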
Heatmap of Top Genes
library(pheatmap)
topgenes <- head(order(res$padj), 20)
vsd <- vst(dds, blind=FALSE)
pheatmap(assay(vsd)[topgenes, ])
Tips for Best Results
Use raw counts (not normalized or TPM/RPKM values).
Have replicates: DESeq2 relies on variance estimates, so at least 3 per group is ideal.
Watch out for batch effects—include them in your design if needed (e.g., ~ batch + condition).
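
The batch tip can be explored in base R before running DESeq2. The sketch below uses hypothetical metadata (batch labels b1/b2 are made up) to show which coefficients a design such as ~ batch + condition produces and to check that batch and condition are not confounded.

```r
# Hypothetical metadata: two batches, each containing both conditions.
colData <- data.frame(
  batch = factor(c("b1", "b1", "b2", "b2", "b1", "b2")),
  condition = factor(
    c("control", "treated", "control", "treated", "control", "treated"),
    levels = c("control", "treated")
  )
)

# Base R expands the formula into the design matrix columns.
design <- model.matrix(~ batch + condition, data = colData)
print(colnames(design))

# Confounding check: every batch should contain every condition.
batch_by_condition <- table(colData$batch, colData$condition)
print(batch_by_condition)

# A full-rank design means all coefficients can be estimated.
full_rank <- qr(design)$rank == ncol(design)
print(full_rank)
```

Putting the variable of interest last in the formula is a common convention, since results() by default reports the comparison for the last variable in the design.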

Summary

Step                             Purpose
DESeqDataSetFromMatrix()         Load your data into DESeq2
DESeq()                          Run the differential expression analysis
results()                        Extract the output (log fold change, p-values, etc.)
plotMA() / ggplot2 / pheatmap    Visualize the results

Final Thoughts
DESeq2 is an essential tool for RNA-seq data analysis. It abstracts away much of the complexity of statistical modeling, while still giving you control when needed. Whether you're a bioinformatician or a wet-lab biologist, DESeq2 offers both ease of use and analytical power.

Tools for Differential expression analysis

Abhi — Tue, 08 Nov 2022 03:40:33 -0600

apeglm - https://bioconductor.org/packages/release/bioc/html/apeglm.html

ashr - https://github.com/stephens999/ashr, https://cran.r-project.org/web/packages/ashr/index.html

consensusDE - https://bioconductor.org/packages/release/bioc/html/consensusDE.html

DESeq2 - https://bioconductor.org/packages/release/bioc/html/DESeq2.html

edgeR - https://bioconductor.org/packages/release/bioc/html/edgeR.html

limma - https://kasperdanielhansen.github.io/genbioconductor/html/limma.html, https://bioconductor.org/packages/release/bioc/html/limma.html

MetaCycle - https://cran.r-project.org/web/packages/MetaCycle/index.html, https://github.com/gangwug/MetaCycle

RUVSeq - https://bioconductor.org/packages/release/bioc/html/RUVSeq.html

SARTools - https://github.com/PF2-pasteur-fr/SARTools

tximport - https://github.com/mikelove/tximport

AVID: A Global Alignment Program

Archana Malhotra — Wed, 24 May 2017 05:19:28 -0500

In this paper we describe a new global alignment method called AVID. The method is designed to be fast, memory efficient, and practical for sequence alignments of large genomic regions up to megabases long. We present numerous applications of the method, ranging from the comparison of assemblies to alignment of large syntenic genomic regions and whole genome human/mouse alignments. We have also performed a quantitative comparison of AVID with other popular alignment tools. To this end, we have established a format for the representation of alignments and methods for their comparison. These formats and methods should be useful for future studies. The tools we have developed for the alignment comparisons, as well as the AVID program, are publicly available. See the Web Site References section of the paper for the AVID Web address and Web addresses for other programs discussed in the paper.

Address of the bookmark: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC430967/

GffCompare: Program for processing GTF/GFF files

Jit — Tue, 09 Jul 2019 13:35:13 -0500

The program gffcompare can be used to compare, merge, annotate and estimate accuracy of one or more GFF files (the “query” files), when compared with a reference annotation (also provided as GFF).

Address of the bookmark: https://ccb.jhu.edu/software/stringtie/gffcompare.shtml

Opera: An optimal genome scaffolding program

Jit — Mon, 27 Nov 2017 10:18:20 -0600

Opera (Optimal Paired-End Read Assembler) is a sequence assembly program (http://en.wikipedia.org/wiki/Sequence_assembly). It uses information from paired-end or long reads to optimally order and orient contigs assembled from shotgun-sequencing reads.

An updated version called OPERA-LG has been re-engineered with features for the assembly of large and complex genomes.

Song Gao, Denis Bertrand, Burton K. H. Chia and Niranjan Nagarajan. OPERA-LG: efficient and exact scaffolding of large, repeat-rich eukaryotic genomes with performance guarantees. Genome Biology, May 2016, doi: 10.1186/s13059-016-0951-y.

Song Gao, Wing-Kin Sung, Niranjan Nagarajan. Opera: reconstructing optimal genomic scaffolds with high-throughput paired-end sequences. Journal of Computational Biology, Sept. 2011, doi:10.1089/cmb.2011.0170.

https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0951-y

Address of the bookmark: https://sourceforge.net/projects/operasf/

MCMCTREE: a phylogenetic program for Bayesian estimation of species divergence times

Poonam Mahapatra — Sat, 02 Jun 2018 07:40:06 -0500

MCMCTREE is a phylogenetic program for Bayesian estimation of species divergence times using soft fossil constraints under various molecular clock models. It is part of the PAML package. In this tutorial I will analyze an easy example modified from the dataset of Inoue et al. (2010). Here we apply a commonly used time estimation method, the "Approximate Likelihood Method", for datasets that include more than 10 species.

Address of the bookmark: http://www.fish-evol.com/mcmctreeExampleVert6/text1Eng.html

kallisto: a program for quantifying abundances of transcripts from bulk and single-cell RNA-Seq data

Jit — Mon, 07 Jan 2019 10:35:14 -0600

kallisto is a program for quantifying abundances of transcripts from bulk and single-cell RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads. It is based on the novel idea of pseudoalignment for rapidly determining the compatibility of reads with targets, without the need for alignment. On benchmarks with standard RNA-Seq data, kallisto can quantify 30 million human reads in less than 3 minutes on a Mac desktop computer using only the read sequences and a transcriptome index that itself takes less than 10 minutes to build. Pseudoalignment of reads preserves the key information needed for quantification, and kallisto is therefore not only fast, but also as accurate as existing quantification tools. In fact, because the pseudoalignment procedure is robust to errors in the reads, in many benchmarks kallisto significantly outperforms existing tools. kallisto is described in detail in:

Nicolas L Bray, Harold Pimentel, Páll Melsted and Lior Pachter, Near-optimal probabilistic RNA-seq quantification, Nature Biotechnology 34, 525–527 (2016), doi:10.1038/nbt.3519
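
The idea of pseudoalignment described above can be illustrated with a deliberately naive base-R sketch: index which transcripts contain each k-mer, then intersect those sets over a read's k-mers. This is not kallisto's implementation (which is built on a transcriptome de Bruijn graph); the sequences, the value of k, and the rule of returning nothing when a k-mer is unseen are all assumptions made for illustration.

```r
# Split a sequence into all overlapping k-mers.
kmers <- function(s, k) {
  starts <- seq_len(max(0, nchar(s) - k + 1))
  substring(s, starts, starts + k - 1)
}

# Hypothetical toy transcriptome (made-up sequences).
transcripts <- c(
  t1 = "ACGTACGTGA",
  t2 = "ACGTACCTGA",
  t3 = "TTGACGTGAC"
)
k <- 5

# Index: for every k-mer, the set of transcripts that contain it.
index <- list()
for (name in names(transcripts)) {
  for (km in unique(kmers(transcripts[[name]], k))) {
    index[[km]] <- union(index[[km]], name)
  }
}

# Pseudoalign a read: intersect the transcript sets of its k-mers.
# Simplification: a read with any unseen k-mer is reported as incompatible.
pseudoalign <- function(read, index, k) {
  sets <- lapply(kmers(read, k), function(km) index[[km]])
  if (any(vapply(sets, is.null, logical(1)))) return(character(0))
  Reduce(intersect, sets)
}

print(pseudoalign("ACGTACG", index, k))  # compatible with a single transcript
print(pseudoalign("ACGTAC", index, k))   # ambiguous between two transcripts
```

The result is a set of compatible transcripts rather than a base-by-base alignment, which is the information a quantification step needs and the reason the approach can be fast.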

Address of the bookmark: https://pachterlab.github.io/kallisto/about

MFannot: a program for the annotation of mitochondrial and plastid genomes

Jit — Mon, 26 Aug 2019 11:47:56 -0500

MFannot is a program for the annotation of mitochondrial and plastid genomes. It is a PERL wrapper around a set of diverse, external independent tools.

It makes intensive use of RNA/intron detection tools including HMMER, Exonerate, Erpin and others.

http://megasun.bch.umontreal.ca/cgi-bin/mfannot/mfannotInterface.pl

Address of the bookmark: https://github.com/BFL-lab/Mfannot

IRscope: an online program to visualize the junction sites of chloroplast genomes

Neel — Wed, 25 Nov 2020 19:44:46 -0600

eMPRess is a software program for phylogenetic tree reconciliation under the duplication-transfer-loss model. It systematically addresses the problems of choosing event costs and selecting representative solutions, enabling users to make more robust inferences.

Address of the bookmark: https://sites.google.com/g.hmc.edu/empress/home