BOL: Related items

COCACOLA (binning metagenomic contigs using sequence COmposition, read CoverAge, CO-alignment, and paired-end read LinkAge)

Jit — Tue, 07 Mar 2017 08:50:57 -0600

COCACOLA is a general framework that combines different types of information (sequence COmposition, CoverAge across multiple samples, CO-alignment to reference genomes, and paired-end read LinkAge) to automatically bin contigs into OTUs. COCACOLA can also incorporate customized prior knowledge to improve binning accuracy.
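The "sequence composition" signal in contig binning is usually a k-mer frequency profile. Below is a minimal sketch, assuming canonical tetranucleotide (4-mer) frequencies. This is a common choice in binning tools but not necessarily COCACOLA's exact feature. It computes one composition vector per contig, and reverse complements are merged so the profile does not depend on strand.

```python
from itertools import product

# Base-complement table used to build reverse complements.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


# The 256 possible 4-mers collapse to 136 strand-independent (canonical) 4-mers.
CANONICAL_4MERS = sorted({canonical("".join(p)) for p in product("ACGT", repeat=4)})


def composition_vector(contig: str) -> list[float]:
    """Relative frequency of each canonical 4-mer in a contig.

    Windows containing anything other than A/C/G/T (e.g. N) are skipped.
    """
    counts = dict.fromkeys(CANONICAL_4MERS, 0)
    contig = contig.upper()
    total = 0
    for i in range(len(contig) - 3):
        window = contig[i:i + 4]
        if all(base in "ACGT" for base in window):
            counts[canonical(window)] += 1
            total += 1
    if total == 0:
        return [0.0] * len(CANONICAL_4MERS)
    return [counts[k] / total for k in CANONICAL_4MERS]
```

In a binner, vectors like this would typically be combined with per-sample coverage values before clustering.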

News: Python version of COCACOLA is available now!

Address of the bookmark: https://github.com/younglululu/COCACOLA

Multigenome assembly

Jit — Tue, 14 Mar 2017 04:41:23 -0500

This project contains scripts and tutorials on how to assemble individual microbial genomes from metagenomes, as described in:

Genome sequences of rare, uncultured bacteria obtained by differential coverage binning of multiple metagenomes

Mads Albertsen, Philip Hugenholtz, Adam Skarshewski, Gene W. Tyson, Kåre L. Nielsen and Per H. Nielsen

Nature Biotechnology 2013, doi: 10.1038/nbt.2579

See the associated online guide for detailed information.

Address of the bookmark: https://github.com/MadsAlbertsen/multi-metagenome

Enrichr: a comprehensive gene set enrichment analysis web server 2016 update

Jit — Thu, 27 Apr 2017 05:42:09 -0500

Enrichment analysis is a popular method for analyzing gene sets generated by genome-wide experiments. Here we present a significant update to one of the tools in this domain, called Enrichr. Enrichr currently contains a large collection of diverse gene set libraries available for analysis and download. In total, Enrichr currently contains 180,184 annotated gene sets from 102 gene set libraries. New features have been added to Enrichr, including the ability to submit fuzzy sets, upload BED files, an improved application programming interface (API), and visualization of results as clustergrams. Overall, Enrichr is a comprehensive resource for curated gene sets and a search engine that accumulates biological knowledge for further biological discoveries. Enrichr is freely available at: http://amp.pharm.mssm.edu/Enrichr.

https://academic.oup.com/nar/article-lookup/doi/10.1093/nar/gkw377

Address of the bookmark: http://amp.pharm.mssm.edu/Enrichr/

Download assemblies from NCBI

Bulbul — Mon, 15 May 2017 06:02:32 -0500

A new “Download assemblies” button is now available in the Assembly database. This makes it easy to download data for multiple genomes without having to write scripts.

For example, you can run a search in Assembly and use check boxes (see left side of screenshot below) to refine the set of genome assemblies of interest. Then, just open the “Download assemblies” menu, choose the source database (GenBank or RefSeq), choose the file type, and start the download. An archive file will be saved to your computer that can be expanded into a folder containing your selected genome data files.

More at https://ncbiinsights.ncbi.nlm.nih.gov/2017/05/08/genome-data-download-made-easy/

An Introduction to Applied Bioinformatics

Jit — Fri, 02 Mar 2018 04:26:38 -0600

IAB is primarily being developed by Greg Caporaso (GitHub/Twitter: @gregcaporaso) in the Caporaso Lab at Northern Arizona University. You can find information on the courses I teach on my teaching website and information on my research and lab on my lab website.

Address of the bookmark: http://readiab.org/

Sequence Ontology Bioinformatics Analysis (SOBA) tool to provide a simple statistical and graphical summary of an annotated genome

BioStar — Wed, 22 Jul 2020 10:11:13 -0500

We have developed the Sequence Ontology Bioinformatics Analysis (SOBA) tool to provide a simple statistical and graphical summary of an annotated genome. We envisage its use during annotation jamborees, genome comparison and for use by developers for rapid feedback during annotation software development and testing. SOBA also provides annotation consistency feedback to ensure correct use of terminology within annotations, and guides users to add new terms to the Sequence Ontology when required. SOBA is available at http://www.sequenceontology.org/cgi-bin/soba.cgi.

More at https://pubmed.ncbi.nlm.nih.gov/20494974/

Address of the bookmark: http://www.sequenceontology.org/cgi-bin/soba.cgi

RNA-Seq Analysis: A Guide for Bioinformaticians

LEGE — Sat, 07 Dec 2024 22:22:24 -0600

RNA sequencing (RNA-Seq) has revolutionized transcriptomics, offering unprecedented insights into gene expression, splicing, and transcript diversity. For bioinformaticians, RNA-Seq analysis is a gateway to exploring the complexity of RNA biology and its implications in health and disease. This blog post provides an overview of RNA-Seq analysis, key computational steps, and tools for bioinformaticians eager to delve into this powerful technique.

What is RNA-Seq?

RNA-Seq is a next-generation sequencing (NGS) technology used to study the transcriptome—the complete set of RNA molecules in a cell. It quantifies gene expression, detects novel transcripts, and captures alternative splicing events with high sensitivity and resolution.

Workflow for RNA-Seq Analysis

RNA-Seq analysis involves several stages, each requiring computational tools and expertise.

1. Experimental Design and Data Acquisition

Before diving into analysis, bioinformaticians should consider:

Biological Replicates: Include enough replicates to provide the statistical power needed to detect meaningful differences.
Sequencing Depth: Match sequencing depth to study objectives (e.g., higher depth to detect low-abundance transcripts).
Paired-End vs. Single-End: Paired-end sequencing provides more detailed information on transcript structure.

Once sequencing is complete, raw data is provided in FASTQ format, containing sequence reads and quality scores.
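Each FASTQ record spans four lines: an `@` header, the sequence, a `+` separator, and one quality character per base. Below is a minimal reader sketch. It assumes unwrapped four-line records and Phred+33 quality encoding, which is standard for current Illumina output. It decodes the quality characters into Phred scores, where Q corresponds to an error probability of 10^(-Q/10).

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    seq: str
    qual: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], seq, qual)


def phred_scores(qual: str, offset: int = 33) -> list[int]:
    """Decode an ASCII quality string into Phred scores (Phred+33 by default)."""
    return [ord(c) - offset for c in qual]


def error_probability(q: int) -> float:
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


if __name__ == "__main__":
    demo = io.StringIO("@read1\nACGT\n+\nII#5\n")
    for rec in read_fastq(demo):
        print(rec.name, phred_scores(rec.qual))  # prints: read1 [40, 40, 2, 20]
```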

2. Quality Control and Preprocessing

Quality control (QC) ensures data integrity. Tools such as FastQC evaluate metrics like base quality, GC content, and adapter contamination.

Preprocessing Steps:

Trimming: Tools like Trimmomatic or Cutadapt remove low-quality bases and adapter sequences.
Filtering: Discard reads below a certain quality threshold or length.
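The trimming and filtering steps above can be sketched per read. This is a simplified stand-in, not the algorithm used by Trimmomatic or Cutadapt. Those tools tolerate mismatches in adapter matches and offer sliding-window quality trimming. Here the adapter search is exact-match only, and the adapter string, thresholds, and helper names are illustrative assumptions.

```python
# A common Illumina TruSeq adapter prefix, used here only as an example value.
EXAMPLE_ADAPTER = "AGATCGGAAGAGC"


def trim_adapter(seq: str, adapter: str = EXAMPLE_ADAPTER, min_overlap: int = 3) -> str:
    """Cut the read at an exact adapter match, or at a partial adapter prefix
    (at least min_overlap bases long) that runs off the 3' end of the read."""
    idx = seq.find(adapter)
    if idx != -1:
        return seq[:idx]
    max_overlap = min(len(adapter) - 1, len(seq))
    for overlap in range(max_overlap, min_overlap - 1, -1):  # longest overlap first
        if seq.endswith(adapter[:overlap]):
            return seq[:len(seq) - overlap]
    return seq


def trim_trailing_quality(seq: str, quals: list[int], min_q: int = 20) -> tuple[str, list[int]]:
    """Remove bases from the 3' end while their Phred score is below min_q."""
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end], quals[:end]


def passes_filter(seq: str, quals: list[int], min_len: int = 36, min_mean_q: int = 20) -> bool:
    """Keep reads that are long enough and have a high enough mean quality."""
    if len(seq) < min_len or not quals:
        return False
    return sum(quals) / len(quals) >= min_mean_q


def preprocess(seq: str, quals: list[int], adapter: str = EXAMPLE_ADAPTER,
               min_q: int = 20, min_len: int = 36):
    """Adapter-trim, quality-trim, then filter; return None if the read is discarded."""
    seq = trim_adapter(seq, adapter)
    quals = quals[:len(seq)]
    seq, quals = trim_trailing_quality(seq, quals, min_q)
    return (seq, quals) if passes_filter(seq, quals, min_len, min_q) else None
```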

3. Read Alignment

Reads are mapped to a reference genome or transcriptome to determine their origin. Alignment tools include:

HISAT2: Handles large genomes efficiently and supports spliced alignments.
STAR: High-speed aligner optimized for RNA-Seq.
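Bowtie2: Fast short-read aligner. It is not splice-aware, so for RNA-Seq it is mainly used against transcriptome references (e.g., within RSEM) rather than the genome.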

Output: A SAM/BAM file containing aligned reads.

4. Transcript Assembly and Quantification

This step involves identifying transcripts and quantifying their expression levels. Tools used include:

StringTie: Assembles and quantifies transcripts from aligned reads.
Salmon/Kallisto: Quantify transcripts directly from reads using lightweight mapping (pseudoalignment in kallisto, selective alignment in Salmon), which is both fast and accurate.

Expression levels are typically measured as TPM (transcripts per million) or FPKM (fragments per kilobase of transcript per million mapped reads).
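The two units differ only in the order of normalization. FPKM divides by library size and then by transcript length. TPM divides by length first and then rescales so that the values in each sample sum to one million, which makes TPM easier to compare across samples. Below is a minimal sketch, assuming raw transcript lengths in base pairs. Quantifiers such as Salmon and kallisto use effective lengths instead.

```python
def fpkm(counts: list[float], lengths_bp: list[float]) -> list[float]:
    """Fragments per kilobase of transcript per million mapped fragments."""
    total_fragments = sum(counts)
    # count / (length in kb) / (total in millions) == count * 1e9 / (length * total)
    return [c * 1e9 / (length * total_fragments) for c, length in zip(counts, lengths_bp)]


def tpm(counts: list[float], lengths_bp: list[float]) -> list[float]:
    """Transcripts per million: normalize by length first, then by the sum."""
    per_kb = [c / (length / 1e3) for c, length in zip(counts, lengths_bp)]
    scale = sum(per_kb)
    return [rate / scale * 1e6 for rate in per_kb]


# Two transcripts: 10 fragments on 1 kb and 20 fragments on 4 kb.
print(tpm([10, 20], [1000, 4000]))   # about [666666.67, 333333.33]
print(fpkm([10, 20], [1000, 4000]))  # about [333333.33, 166666.67]
```

The shorter transcript has fewer reads but a higher expression value, because it receives more reads per kilobase.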

5. Differential Expression Analysis

To identify genes with altered expression between conditions, bioinformaticians use tools such as:

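DESeq2: Normalizes counts with median-of-ratios size factors and models them with a negative binomial distribution, sharing information across genes to stabilize dispersion estimates.
edgeR: Models overdispersed count data with a negative binomial distribution and normalizes library sizes with TMM (trimmed mean of M-values).
Limma-voom: Transforms counts to log-CPM values and estimates precision weights from the mean-variance trend, so that limma's linear models can be applied to RNA-Seq data.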
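The median-of-ratios idea behind DESeq2's normalization can be sketched in plain Python. This is an illustration of the idea, not DESeq2's code. Each gene's geometric mean across samples serves as a pseudo-reference. A sample's size factor is the median of its count-to-reference ratios, and genes with a zero count in any sample are excluded.

```python
import math
from statistics import median


def size_factors(counts: list[list[int]]) -> list[float]:
    """Median-of-ratios size factors; counts[g][s] is the raw count of gene g in sample s."""
    n_samples = len(counts[0])
    usable, log_refs = [], []
    for row in counts:
        if all(c > 0 for c in row):  # the geometric mean is undefined/zero otherwise
            usable.append(row)
            log_refs.append(sum(math.log(c) for c in row) / n_samples)
    if not usable:
        raise ValueError("every gene has a zero count in at least one sample")
    factors = []
    for s in range(n_samples):
        log_ratios = [math.log(row[s]) - ref for row, ref in zip(usable, log_refs)]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Sample 2 was sequenced twice as deeply as sample 1; the zero-count gene is ignored.
print(size_factors([[10, 20], [100, 200], [5, 10], [0, 3]]))  # about [0.707, 1.414]
```

Dividing each sample's counts by its size factor puts the samples on a common scale before testing.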

The output includes a list of differentially expressed genes (DEGs) with statistical significance and fold-change values.
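Because thousands of genes are tested at once, "statistical significance" usually means an adjusted p-value. DESeq2, for example, reports Benjamini-Hochberg (BH) adjusted values as `padj` by default and also applies independent filtering. Below is a minimal sketch of BH adjustment followed by a combined significance and fold-change cutoff. The thresholds and the `(gene, log2_fold_change, pvalue)` tuple layout are assumptions for illustration.

```python
def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """BH-adjusted p-values (false discovery rate), returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_degs(results: list[tuple[str, float, float]], alpha: float = 0.05,
              min_abs_lfc: float = 1.0) -> list[str]:
    """Genes with adjusted p < alpha and |log2 fold change| >= min_abs_lfc."""
    padj = benjamini_hochberg([p for _, _, p in results])
    return [gene for (gene, lfc, _), q in zip(results, padj)
            if q < alpha and abs(lfc) >= min_abs_lfc]


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.2]))  # about [0.04, 0.0533, 0.0533, 0.2]
```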

6. Functional Annotation and Pathway Analysis

Understanding the biological significance of DEGs involves:

Gene Ontology (GO) Analysis: Tools like DAVID or clusterProfiler categorize genes based on their biological functions.
Pathway Enrichment Analysis: Identifies pathways enriched among DEGs, using pathway databases such as KEGG or Reactome and methods such as GSEA.
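Over-representation analysis of a GO term or pathway reduces to a hypergeometric test, which is equivalent to a one-sided Fisher's exact test. The question is how likely it is to see at least k annotated genes in a DEG list of size n drawn from N background genes, K of which carry the annotation. This sketch is similar in spirit to what tools such as clusterProfiler compute. DAVID uses a modified Fisher test, and GSEA uses a different, rank-based method. Argument names follow the symbols above.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N background genes, K annotated, n drawn)."""
    # Sum the probability of drawing exactly i annotated genes, for i = k .. min(K, n).
    # math.comb returns 0 when the lower argument exceeds the upper, so impossible terms vanish.
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# 20,000 background genes, 200 annotated to a term, 500 DEGs, 15 of them annotated.
p = hypergeom_enrichment_p(k=15, n=500, K=200, N=20_000)
```

Only 5 annotated DEGs would be expected by chance in this example (500 × 200 / 20,000), so `p` is small. In practice the p-values for all tested terms are then multiple-testing corrected.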

7. Visualization

Visualizing results enhances interpretability. Common visualizations include:

Heatmaps: Show expression patterns across samples (e.g., pheatmap).
Volcano Plots: Highlight significant DEGs (e.g., ggplot2).
PCA/UMAP: Assess sample clustering and variability (e.g., DESeq2's plotPCA for bulk data, Seurat for single-cell data).

Challenges in RNA-Seq Analysis

Batch Effects: Technical variability can confound biological signals. Combat this with normalization techniques or batch-correction tools like ComBat.
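Low-Quality Samples: Degraded or poor-quality RNA (e.g., a low RNA integrity number, RIN) can introduce 3' coverage bias and reduce library complexity, distorting downstream analyses.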
Computational Complexity: RNA-Seq generates massive datasets, requiring robust computing resources and optimized pipelines.

Key Tools and Resources

Bioconductor: A treasure trove of R packages for RNA-Seq analysis.
Galaxy: A web-based platform for running RNA-Seq workflows.
Nextflow/Snakemake: Workflow management tools to streamline analyses.

Applications of RNA-Seq

RNA-Seq is used in diverse research areas, including:

Cancer Transcriptomics: Identifying tumor-specific expression profiles.
Developmental Biology: Studying dynamic transcriptome changes.
Drug Discovery: Screening genes modulated by therapeutic compounds.

Conclusion

RNA-Seq analysis is a cornerstone of modern transcriptomics, offering bioinformaticians a versatile toolkit for unraveling gene expression and regulation. Mastering RNA-Seq workflows and tools empowers researchers to transform raw sequencing data into biological discoveries.

Whether you’re investigating disease mechanisms, exploring cellular pathways, or developing new therapeutics, RNA-Seq is a powerful ally in your bioinformatics arsenal.

Bioinformatics tools developed for Oxford Nanopore data analysis

biogeek — Wed, 27 Dec 2017 20:47:30 -0600

MinION is the only portable real-time device for DNA and RNA sequencing. Each consumable flow cell can now generate 10–20 Gb of DNA sequence data. Ultra-long reads (hundreds of kb) are possible because read length is largely determined by the length of the DNA fragments you load. Read length is one of the main technical advantages of ONT data, and it offers great prospects for genome assembly. Assemblers are generally based on one of several types of algorithm: greedy, overlap-layout-consensus (OLC), de Bruijn graph (DBG), and string graph.
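To make the de Bruijn graph idea concrete, here is a toy sketch. It splits reads into k-mers and adds an edge from each k-mer's (k-1)-prefix to its (k-1)-suffix. It then extends a contig along nodes that have a single successor. This sketch assumes error-free reads and ignores reverse complements and read coverage. That limitation is one reason long, error-prone nanopore reads are usually assembled with OLC or string-graph tools such as Canu or Miniasm, while de Bruijn graphs suit short, accurate reads.

```python
from collections import defaultdict


def de_bruijn_graph(reads: list[str], k: int) -> dict[str, list[str]]:
    """Nodes are (k-1)-mers; each distinct k-mer adds an edge prefix -> suffix."""
    edges: dict[str, set[str]] = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            edges[kmer[:-1]].add(kmer[1:])
    return {node: sorted(successors) for node, successors in edges.items()}


def walk_unambiguous(graph: dict[str, list[str]], start: str) -> str:
    """Extend a contig from start while the current node has exactly one successor."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, [])) == 1:
        nxt = graph[node][0]
        if nxt in seen:  # stop instead of looping forever on a cycle
            break
        contig += nxt[-1]  # consecutive nodes overlap by k-2 bases, so add one base
        seen.add(nxt)
        node = nxt
    return contig


graph = de_bruijn_graph(["ACGTC", "CGTCA"], k=3)
print(walk_unambiguous(graph, "AC"))  # prints: ACGTCA
```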

List of analysis tools developed for Oxford Nanopore data

BWA
Fast alignment tool with a nanopore-tuned mode (bwa mem -x ont2d)
https://github.com/lh3/bwa

GraphMap
Mapper for long and error-prone reads
https://github.com/isovic/graphmap

LAST
Nanopore-tuned alignment tool
http://last.cbrc.jp/

LINKS
Software tool for long read scaffolding
https://github.com/warrenlr/LINKS/

marginAlign
Tools to align nanopore reads to a reference
https://github.com/benedictpaten/marginAlign

minoTour
Real time analysis tools
http://minotour.nottingham.ac.uk/

nanoCORR
Error-correction tool for nanopore sequence data
https://github.com/jgurtowski/nanocorr

NanoOK
Quality control and error-profile analysis software for nanopore data
https://documentation.tgac.ac.uk/display/NANOOK/NanoOK

Nanopolish
Nanopore analysis and genome assembly software
https://github.com/jts/nanopolish

nanopore
Variant-detection tool for nanopore sequence data
https://github.com/mitenjain/nanopore

Nanocorrect
Error-correction tool for nanopore sequence data
https://github.com/jts/nanocorrect/

npReader
Real-time conversion and analysis of nanopore reads
https://github.com/mdcao/npReader

poRe
Tool for analyzing and visualizing nanopore data
https://sourceforge.net/p/rpore/wiki/Home/

PoreSeq
Error-correction and variant-calling software
https://github.com/tszalay/poreseq

Poretools
Nanopore sequence analysis and visualization software
https://github.com/arq5x/poretools

SSPACE-LongRead
Genome scaffolding tool
http://www.baseclear.com/genomics/bioinformatics/basetools/SSPACE-longread

SMIS
Genome scaffolding tool
https://sourceforge.net/projects/phusion2/files/smis/

List of assemblers for Oxford Nanopore MinION long reads

LQS
DALIGNER, Celera OLC
Nanocorrect, Nanopolish corrector
https://github.com/jts/nanopolish

PBcR
HGAP or BLASR, Celera OLC
PBcR corrector
http://wgs-assembler.sourceforge.net/wiki/index.php/PBcR

Canu
MHAP, Celera OLC
Canu corrector
https://github.com/marbl/canu

Falcon
String graph, Celera OLC
Falcon corrector
https://github.com/PacificBiosciences/falcon

Miniasm
OLC
https://github.com/lh3/miniasm

ra-integrate
OLC
https://github.com/mariokostelac/ra-integrate/

ALLPATHS-LG
de Bruijn graph
ALLPATHS-LG corrector
https://www.broadinstitute.org/software/allpaths-lg/blog/?page_id=12

SPAdes
de Bruijn graph
SPAdes corrector
http://bioinf.spbau.ru/spades

Troyanskaya Lab

Tue, 04 Feb 2020 06:40:36 -0600

The goal of our research is to interpret and distill the complexity of biological systems through accurate analysis and modeling of molecular pathways, particularly those whose malfunction leads to disease. We are developing methods for systems-level pathway modeling through integrative analysis of genome-scale datasets. We apply these approaches to challenging biological problems, such as how pathways function in diverse cell types and how they change dynamically.

https://function.princeton.edu/

SHAMAN: a user-friendly website for metataxonomic analysis from raw reads to statistical analysis

BioStar — Mon, 17 Aug 2020 05:21:09 -0500

SHAMAN is a Shiny application for differential analysis of metagenomic data (16S, 18S, 23S, 28S, ITS and WGS), including bioinformatic processing of raw reads for targeted metagenomics, statistical analysis, and visualization of results with a wide variety of plots (barplot, boxplot, heatmap, …).
The bioinformatic processing is based on VSEARCH [Rognes 2016], which has been shown to be both accurate and fast [Westcott 2015]. The statistical analysis is based on the DESeq2 R package [Anders and Huber 2010], which robustly identifies differentially abundant features, as suggested in [McMurdie and Holmes 2014] and [Jonsson 2016]. SHAMAN identifies differentially abundant genera with the generalized linear model implemented in DESeq2 [Love 2014].
SHAMAN is compatible with standard formats for metagenomic analysis (.csv, .tsv, .biom), and figures can be downloaded in several formats.

More at https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-020-03666-4

Address of the bookmark: https://github.com/aghozlane/shaman