BOL: Related items

List of gene ontology software and tools

Jit — Sun, 09 Mar 2014 14:48:19 -0500

The Gene Ontology (GO) is a set of associations from biological phrases to specific genes that are either chosen by trained curators or generated automatically. GO is designed to rigorously encapsulate the known relationships between biological terms and all genes that are instances of these terms. The Gene Ontology has become an extremely useful tool for the analysis of genomic data and the structuring of biological knowledge. Several excellent software tools for navigating the Gene Ontology have been developed.

The GO provides core biological knowledge representation for modern biologists, whether computationally or experimentally based. GO resources include biomedical ontologies that cover molecular domains of all life forms as well as extensive compilations of gene product annotations to these ontologies that provide largely species-neutral, comprehensive statements about what gene products do. Although extensively used in data analysis workflows, and widely incorporated into numerous data analysis platforms and applications, the general user of GO resources often misses fundamental distinctions about GO structures, GO annotations, and what can and can not be extrapolated from GO resources. Here are ten quick tips for using the Gene Ontology.

Read "Ten Quick Tips for Using the Gene Ontology" at http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1003343

The following are the most commonly used GO term enrichment tools, both old and new. They are recommended for people working in a wet lab.

CLASSIFI (Department of Pathology, UT Southwestern Medical Center)

CLASSIFI (Cluster Assignment for Biological Inference) is a data-mining tool that can be used to identify significant co-clustering of genes with similar functional properties (e.g. cellular response to DNA damage). Briefly, CLASSIFI uses the Gene Ontology™ (GO) gene annotation scheme to define the functional properties of all genes/probes in a microarray data set, and then applies a cumulative hypergeometric distribution analysis to determine if any statistically significant gene ontology co-clustering has occurred.

http://pathcuric1.swmed.edu/pathdb/classifi.html
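The cumulative hypergeometric test mentioned for CLASSIFI can be sketched in a few lines of Python. This is only a minimal illustration of the statistic, not CLASSIFI's own code; the function name and the gene counts are made up for the example.

```python
from math import comb

def hypergeom_enrichment_p(population, annotated, cluster, hits):
    """P(X >= hits) when `cluster` genes are drawn without replacement from
    `population` genes, of which `annotated` carry the GO term of interest."""
    max_hits = min(annotated, cluster)
    total = comb(population, cluster)
    # Sum the upper tail of the hypergeometric distribution.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, cluster - k)
        for k in range(hits, max_hits + 1)
    )
    return tail / total

# Hypothetical numbers: 10,000 probes on the array, 100 annotated with a term,
# a cluster of 50 genes, 5 of which carry the term.
p = hypergeom_enrichment_p(population=10000, annotated=100, cluster=50, hits=5)
print(f"p = {p:.3g}")
```

A small p-value suggests the term occurs in the cluster more often than chance alone would explain. Because one such test is run per GO term, the results still need a multiple testing correction.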

EasyGO (China Agricultural University)

EasyGO is designed to automate enrichment analysis for experimental biologists, identifying enriched Gene Ontology (GO) terms in a list of microarray probe sets or gene identifiers (with expression information for PAGE analysis). EasyGO is also a GO annotation database that focuses especially on agronomical species and supports 30 species. It is user friendly, with an advanced result browsing format and timely updates.

http://bioinformatics.cau.edu.cn/neweasygo/

http://bioinformatics.cau.edu.cn/easygo/

g:GOSt (Institute of Computer Science, University of Tartu)

g:GOSt retrieves the most significant Gene Ontology (GO) terms, KEGG and REACTOME pathways, and TRANSFAC motifs for a user-specified group of genes, proteins or microarray probes. g:GOSt also allows analysis of ranked or ordered lists of genes, visual browsing of GO graph structure, interactive visualisation of retrieved results, and many other features. Multiple testing corrections are applied to extract only statistically significant results.

http://biit.cs.ut.ee/gprofiler/
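To show what a multiple testing correction does to a list of enrichment p-values, here is a small sketch of the Benjamini-Hochberg procedure. The description above does not say which correction g:GOSt applies, so Benjamini-Hochberg is used here only as a widely known stand-in; the p-values are invented.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values for four GO terms.
raw = [0.01, 0.04, 0.03, 0.005]
for p, q in zip(raw, benjamini_hochberg(raw)):
    print(f"raw={p:<6} adjusted={q:.3f}")
```

Terms whose adjusted value stays below the chosen threshold (0.05 is a common choice) would be reported as enriched.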

DAVID: Gene Functional Classification (Laboratory of Immunopathogenesis and Bioinformatics, NIAID)

The Functional Classification Tool provides a rapid means to organize large lists of genes into functionally related groups to help unravel the biological content captured by high throughput technologies.

http://david.abcc.ncifcrf.gov/gene2gene.jsp

http://david.abcc.ncifcrf.gov/

API https://github.com/chrisamiller/davidapi

GOEAST (Institute of Genetics and Developmental Biology, Chinese Academy of Sciences)

GOEAST is a web-based software toolkit providing easy-to-use, visualizable, comprehensive and unbiased Gene Ontology (GO) analysis for high-throughput experimental results, especially for results from microarray hybridization experiments. The main function of GOEAST is to identify significantly enriched GO terms among given lists of genes using accurate statistical methods.

http://omicslab.genetics.ac.cn/GOEAST/

GOstat (Walter and Eliza Hall Institute of Medical Research)

Find statistically overrepresented GO terms within a group of genes

http://gostat.wehi.edu.au/

GOrilla (Technion - Laboratory of Computational Biology, Israel Institute of Technology)

GOrilla is a tool for identifying and visualizing enriched GO terms in ranked lists of genes.
It supports two approaches: searching for enriched GO terms that appear densely at the top of a ranked list of genes, or searching for enriched GO terms in a target list of genes compared to a background list of genes.

GOrilla makes nice pictures!

http://cbl-gorilla.cs.technion.ac.il/

Gene Ontology for Functional Analysis (GOFFA)

GOFFA is a tool developed for ArrayTrack™ that takes a list of genes and identifies terms in Gene Ontology (GO) associated with those genes.

It provides several tools to view and access the GO term hierarchy, and a full listing of GO terms with the genes annotated to each term, together with a statistically useful report.

http://www.fda.gov/ScienceResearch/BioinformaticsTools/ucm233315.htm

GOAT (The University of Manchester)

The aim of the GOAT project is to create an application that will guide users, especially biomedical researchers, in the annotation of gene products with terms from the Gene Ontology.

http://goat.man.ac.uk/

Script (GOATOOLS, a separate Python library for GO analysis): https://github.com/tanghaibao/goatools/

REVIGO (Rudjer Boskovic Institute, Croatia)

REVIGO is a web server that can take long lists of Gene Ontology terms and summarize them by removing redundant GO terms. The remaining terms can be visualized in semantic similarity-based scatterplots, interactive graphs, or tag clouds.

http://revigo.irb.hr/

QuickGO (EMBL-EBI)

QuickGO uses extensive computational filters to generate specific subsets of GO annotations, mapped to sequence identifiers of your choice. It also supports GO slims, which are cut-down subsets of the full set of terms available from the Gene Ontology project.

http://www.ebi.ac.uk/QuickGO/

GOLEM

An interactive graph-based gene-ontology navigation and analysis tool. GOLEM is a useful tool that allows the viewer to navigate and explore a local portion of the Gene Ontology (GO) hierarchy.

http://reducio.princeton.edu/GOLEM/

BGI Web Gene Ontology (WEGO) Annotation Plot (Beijing Genomics Institute)

WEGO (Web Gene Ontology Annotation Plot) is a useful tool for plotting GO annotation results. It has been widely used in many important biological research projects, such as the rice genome project [Yu, J. et al. Science 296, 79-92 (2002); Yu, J. et al. PLoS Biol 3, e38 (2005)] and the silkworm genome project [Xia, Q. et al. Science 306, 1937-40 (2004)]. It has become one of the daily tools for downstream gene annotation analysis, especially when performing comparative genomics tasks. WEGO, along with two other tools, namely External to GO Query and GO Archive Query, is freely available to all users. Any suggestions are welcome at wego@genomics.org.cn.

http://wego.genomics.org.cn/cgi-bin/wego/index.pl

GeneGO MetaCore (MIT)

GeneGo is a leading provider of data mining & analysis solutions in systems biology. MetaCore, GeneGo's flagship product, is an integrated software suite for functional analysis of experimental data. MetaCore is based on a curated database of human protein-protein and protein-DNA interactions, transcription factors, signaling and metabolic pathways, disease and toxicity, and the effects of bioactive molecules.

https://portal.genego.com/

GOEx (Stony Brook University)

GOEx facilitates organism-specific studies by leveraging GO and providing a rich graphical user interface. It is a simple-to-use tool, specialized for biologists who wish to analyze spectral counting data from shotgun proteomics.

http://pcarvalho.com/patternlab

GOssTo

GOssTo and GOssToWeb are tools to calculate the semantic similarity between genes or terms in the Gene Ontology.

http://www.paccanarolab.org/gosstoweb/
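To give a feel for what "semantic similarity between terms" means, here is a sketch of Resnik similarity on a toy ontology. Resnik is one well-known measure, and nothing above says which measures GOssTo actually offers, so treat this purely as an illustration. The term IDs, the gene names and all function names are hypothetical.

```python
import math

def ancestors(term, parents):
    """Return `term` plus every term reachable through parent links."""
    found = {term}
    for parent in parents.get(term, ()):
        found |= ancestors(parent, parents)
    return found

def information_content(term, gene_terms, parents):
    """log(total genes / genes annotated to `term` or one of its descendants).
    Assumes at least one gene is annotated to the term or a descendant."""
    hits = sum(
        1 for terms in gene_terms.values()
        if any(term in ancestors(t, parents) for t in terms)
    )
    return math.log(len(gene_terms) / hits)

def resnik(term_a, term_b, gene_terms, parents):
    """Resnik similarity: highest information content among shared ancestors."""
    shared = ancestors(term_a, parents) & ancestors(term_b, parents)
    return max(information_content(t, gene_terms, parents) for t in shared)

# Hypothetical toy ontology (child -> parents) and annotations (gene -> terms).
parents = {"T:b": ["T:root"], "T:c": ["T:root"], "T:d": ["T:b"], "T:e": ["T:b"]}
gene_terms = {"g1": ["T:d"], "g2": ["T:e"], "g3": ["T:c"], "g4": ["T:c"]}

print(resnik("T:d", "T:e", gene_terms, parents))  # siblings under T:b
print(resnik("T:d", "T:c", gene_terms, parents))  # only the root is shared
```

Two terms that share a specific, rarely annotated ancestor score higher than two terms whose only common ancestor is the root.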

GO Workbench

The Gene Ontology Analysis Viewer allows direct browsing of the Gene Ontology, and also the visualization of GO Term analysis results.

http://wiki.c2b2.columbia.edu/workbench/index.php/Gene_Ontology_Viewer

Another useful list of GO software and tools is available at http://www.geneontology.org/GO.tools.shtml#browser

Yet another useful webpage listing GO tools is at http://neurolex.org/wiki/Category:Resource:Gene_Ontology_Tools

HiCdat

Jit — Fri, 12 Feb 2016 05:23:44 -0600

HiCdat: a fast and easy-to-use Hi-C data analysis tool

HiCdat is easy-to-use and provides solutions starting from aligned reads up to in-depth analyses. Importantly, HiCdat is focussed on the analysis of larger structural features of chromosomes, their correlation to genomic and epigenomic features, and on comparative studies. It uses simple input and output formats and can therefore easily be integrated into existing workflows or combined with alternative tools.

More at http://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-015-0678-x

Address of the bookmark: https://github.com/MWSchmid/HiCdat

FUMA GWAS: Functional Mapping and Annotation of Genome-Wide Association Studies

Jit — Sat, 01 Jun 2019 03:11:16 -0500

FUMA is a platform that can be used to annotate, prioritize, visualize and interpret GWAS results.
The SNP2GENE function takes GWAS summary statistics as an input, and provides extensive functional annotation for all SNPs in genomic areas identified by lead SNPs.
The GENE2FUNC function takes a list of gene IDs (as identified by SNP2GENE or as provided manually) and annotates the genes in their biological context.

Address of the bookmark: https://fuma.ctglab.nl/

GrAnnoT

LEGE — Sun, 31 Aug 2025 06:21:50 -0500

GrAnnoT is an annotation transfer tool for pangenome graphs. It can transfer linear genome annotations to a pangenome graph containing the genome, and also transfer the pangenome graph's annotations to the genomes it contains. It also outputs complementary information such as the alignments of the transferred genes, or a presence-absence matrix.

Address of the bookmark: https://forge.ird.fr/diade/dynadiv/grannot

MetaEuk - sensitive, high-throughput gene discovery and annotation for large-scale eukaryotic metagenomics

Jit — Wed, 13 Jan 2021 19:29:32 -0600

MetaEuk is a modular toolkit designed for large-scale gene discovery and annotation in eukaryotic metagenomic contigs. MetaEuk combines the fast and sensitive homology search capabilities of MMseqs2 with a dynamic programming procedure to recover optimal exon sets. It reduces redundancies in multiple discoveries of the same gene and resolves conflicting gene predictions on the same strand. MetaEuk is GPL-licensed open source software that is implemented in C++ and available for Linux and macOS. The software is designed to run on multiple cores.

Address of the bookmark: https://github.com/soedinglab/metaeuk

CrowdGO: Machine learning and semantic similarity guided consensus Gene Ontology annotation

Shruti Paniwala — Thu, 26 May 2022 00:59:49 -0500

CrowdGO is a protein Gene Ontology predictor using a meta approach, analyzing the predictions of other tools in order to get an improved precision and recall.

Please note that the CrowdGO snakemake workflow is currently only tested on Ubuntu. It should work on OSX, but please report any errors to maarten.reijnders@unil.ch or create an issue.

https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1010075

Address of the bookmark: https://gitlab.com/mreijnders/crowdgo

Consed--A Finishing Package (BAM File Viewer, Assembly Editor, Autofinish, Autoreport, Autoedit, and Align Reads To Reference Sequence)

Neel — Fri, 07 Feb 2020 07:16:22 -0600

Supports Illumina, 454, other Next-Gen and Sanger Reads and allows mixtures of these read types
Consed includes BamScape, which can view BAM files with unlimited numbers of reads. BamScape can bring up Consed to edit reads and the reference sequence in targeted regions.
Consed is compatible with Newbler, Cross_match, Phrap, MIRA, Velvet and PCAP output.
Quickly takes the user to each variant site for viewing (also available as an automated report)
Overview of assembly can help detect and fix misassemblies
Editing time reduced by the program's ability to pin-point problem areas
Editing is guided by error probabilities

Address of the bookmark: http://www.phrap.org/consed/consed.html

MGRA: Breakpoint graphs and ancestral genome reconstructions

Jit — Tue, 25 Jul 2017 08:48:25 -0500

MGRA (Multiple Genome Rearrangements and Ancestors) is a tool for reconstruction of ancestor genomes and evolutionary history of extant genomes.

It takes as an input a set of genomes represented as sequences of genes (or synteny blocks) and produces such sequences for ancestral genomes at the internal nodes of the phylogenetic tree.

The phylogenetic tree may be specified completely or partially. In the latter case, MGRA can reconstruct conserved ancestral regions (CARs) of the ancestral genome of interest.

Since version 2, MGRA supports gene insertions and deletions in addition to genome rearrangements and allows the input genomes to have different gene content.

It can also reconstruct the most plausible phylogenetic tree based on the rearrangement characters.

Address of the bookmark: http://mgra.cblab.org/

SPAdes hybrid genome assembly

Jit — Mon, 27 Nov 2017 08:05:40 -0600

When you have both Illumina and Nanopore data, SPAdes remains a good option for hybrid assembly. SPAdes was used to produce the B. fragilis assembly by Mick Watson’s group.

Again, running spades.py will show you the options:

spades.py

This produces:

SPAdes genome assembler v3.10.1

Usage: /usr/local/SPAdes-3.10.1-Linux/bin/spades.py [options] -o <output_dir>

Basic options:
-o <output_dir>         directory to store all the resulting files (required)
--sc                    this flag is required for MDA (single-cell) data
--meta                  this flag is required for metagenomic sample data
--rna                   this flag is required for RNA-Seq data
--plasmid               runs plasmidSPAdes pipeline for plasmid detection
--iontorrent            this flag is required for IonTorrent data
--test                  runs SPAdes on toy dataset
-h/--help               prints this usage message
-v/--version            prints version

Input data:
--12 <filename>                 file with interlaced forward and reverse paired-end reads
-1 <filename>                   file with forward paired-end reads
-2 <filename>                   file with reverse paired-end reads
-s <filename>                   file with unpaired reads
--pe<#>-12 <filename>           file with interlaced reads for paired-end library number <#> (<#> = 1,2,..,9)
--pe<#>-1 <filename>            file with forward reads for paired-end library number <#> (<#> = 1,2,..,9)
--pe<#>-2 <filename>            file with reverse reads for paired-end library number <#> (<#> = 1,2,..,9)
--pe<#>-s <filename>            file with unpaired reads for paired-end library number <#> (<#> = 1,2,..,9)
--pe<#>-<or>                    orientation of reads for paired-end library number <#> (<#> = 1,2,..,9; <or> = fr, rf, ff)
--s<#> <filename>               file with unpaired reads for single reads library number <#> (<#> = 1,2,..,9)
--mp<#>-12 <filename>           file with interlaced reads for mate-pair library number <#> (<#> = 1,2,..,9)
--mp<#>-1 <filename>            file with forward reads for mate-pair library number <#> (<#> = 1,2,..,9)
--mp<#>-2 <filename>            file with reverse reads for mate-pair library number <#> (<#> = 1,2,..,9)
--mp<#>-s <filename>            file with unpaired reads for mate-pair library number <#> (<#> = 1,2,..,9)
--mp<#>-<or>                    orientation of reads for mate-pair library number <#> (<#> = 1,2,..,9; <or> = fr, rf, ff)
--hqmp<#>-12 <filename>         file with interlaced reads for high-quality mate-pair library number <#> (<#> = 1,2,..,9)
--hqmp<#>-1 <filename>          file with forward reads for high-quality mate-pair library number <#> (<#> = 1,2,..,9)
--hqmp<#>-2 <filename>          file with reverse reads for high-quality mate-pair library number <#> (<#> = 1,2,..,9)
--hqmp<#>-s <filename>          file with unpaired reads for high-quality mate-pair library number <#> (<#> = 1,2,..,9)
--hqmp<#>-<or>                  orientation of reads for high-quality mate-pair library number <#> (<#> = 1,2,..,9; <or> = fr, rf, ff)
--nxmate<#>-1 <filename>        file with forward reads for Lucigen NxMate library number <#> (<#> = 1,2,..,9)
--nxmate<#>-2 <filename>        file with reverse reads for Lucigen NxMate library number <#> (<#> = 1,2,..,9)
--sanger <filename>             file with Sanger reads
--pacbio <filename>             file with PacBio reads
--nanopore <filename>           file with Nanopore reads
--tslr <filename>               file with TSLR-contigs
--trusted-contigs <filename>    file with trusted contigs
--untrusted-contigs <filename>  file with untrusted contigs

Pipeline options:
--only-error-correction runs only read error correction (without assembling)
--only-assembler        runs only assembling (without read error correction)
--careful               tries to reduce number of mismatches and short indels
--continue              continue run from the last available check-point
--restart-from <cp>     restart run with updated options and from the specified check-point ('ec', 'as', 'k<int>', 'mc')
--disable-gzip-output   forces error correction not to compress the corrected reads
--disable-rr            disables repeat resolution stage of assembling

Advanced options:
--dataset <filename>            file with dataset description in YAML format
-t/--threads <int>              number of threads
                                [default: 16]
-m/--memory <int>               RAM limit for SPAdes in Gb (terminates if exceeded)
                                [default: 250]
--tmp-dir <dirname>             directory for temporary files
                                [default: <output_dir>/tmp]
-k <int,int,...>                comma-separated list of k-mer sizes (must be odd and
                                less than 128) [default: 'auto']
--cov-cutoff <float>            coverage cutoff value (a positive float number, or 'auto', or 'off') [default: 'off']
--phred-offset <33 or 64>       PHRED quality offset in the input reads (33 or 64)
                                [default: auto-detect]

As you can see this is also a “pipeline” of tools that can be switched on or off. SPAdes takes quite a long time, so for the purposes of this practical, something like this may suffice:

spades.py -t 4 \
          -m 32 \
          -k 31,51,71 \
          --only-assembler \
          -1 miseq.1.fastq -2 miseq.2.fastq \
          --nanopore minion.fastq \
          -o hybrid_assembly

In turn, these parameters mean:

use 4 threads
max memory is 32 Gb
use 3 k-mer values to build the de Bruijn graph(s) - 31, 51 and 71
only run the assembler, not the correction algorithm (for speed)
read 1 and read 2 of the MiSeq data
the nanopore data
put the output in folder “hybrid_assembly”
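If you would rather launch the same run from a script, the command above can be assembled in Python. This is only a sketch: the helper name spades_hybrid_command is hypothetical, it builds the argument list without running SPAdes, and the k-mer check simply mirrors the "must be odd and less than 128" rule from the usage message.

```python
import shlex

def spades_hybrid_command(reads_1, reads_2, nanopore, out_dir,
                          threads=4, memory_gb=32, kmers=(31, 51, 71)):
    """Build the argument list for the hybrid run shown above (does not run it)."""
    for k in kmers:
        # The usage message says k-mer sizes must be odd and less than 128.
        if k % 2 == 0 or not 0 < k < 128:
            raise ValueError(f"invalid k-mer size: {k}")
    return [
        "spades.py",
        "-t", str(threads),
        "-m", str(memory_gb),
        "-k", ",".join(str(k) for k in kmers),
        "--only-assembler",
        "-1", reads_1, "-2", reads_2,
        "--nanopore", nanopore,
        "-o", out_dir,
    ]

cmd = spades_hybrid_command("miseq.1.fastq", "miseq.2.fastq",
                            "minion.fastq", "hybrid_assembly")
print(shlex.join(cmd))
# To actually start the assembly: subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when file names contain spaces.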

COPE: an accurate k-mer-based pair-end reads connection tool to facilitate genome assembly

Jit — Wed, 06 Dec 2017 02:08:14 -0600

COPE (Connecting Overlapped Pair-End reads) is an efficient tool that connects overlapping pair-end reads using k-mer frequencies. The authors evaluated it on 30× simulated pair-end reads from Arabidopsis thaliana with 1% base error. COPE connected over 99% of reads with 98.8% accuracy, which is, respectively, 10 and 2% higher than the recently published tool FLASH. When COPE is applied to real reads for genome assembly, the resulting contigs are found to have fewer errors and give a 14-fold improvement in the N50 measurement when compared with the contigs produced using unconnected reads.

Address of the bookmark: ftp://ftp.genomics.org.cn/pub/cope
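For readers unfamiliar with the N50 measurement mentioned above, here is a short sketch of how it is usually computed. The contig lengths are invented for illustration and are not taken from the COPE evaluation.

```python
def n50(contig_lengths):
    """Length L such that contigs of length >= L together cover at least
    half of the total assembly length."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    # Add contigs from longest to shortest until half the assembly is covered.
    for length in lengths:
        running += length
        if running >= half:
            return length
    raise ValueError("no contigs given")

# Hypothetical contig lengths (in bp) from two assemblies of the same genome.
fragmented = [2, 3, 4, 5, 6, 7, 8, 9, 10]
contiguous = [100, 50, 25, 25]
print(n50(fragmented), n50(contiguous))
```

A higher N50 means that more of the assembly sits in long contigs, which is why it is a common way to compare assemblies of the same data.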