BOL: Related items

SyRI compares alignments between two chromosome-level assemblies and identifies synteny and structural rearrangements.

Shruti Paniwala — Wed, 01 Jun 2022 02:01:13 -0500

Address of the bookmark: https://github.com/schneebergerlab/syri

List of gene ontology software and tools

Jit — Sun, 09 Mar 2014 14:48:19 -0500

The Gene Ontology (GO) is a set of associations from biological phrases to specific genes that are either chosen by trained curators or generated automatically. GO is designed to rigorously encapsulate the known relationships between biological terms and all genes that are instances of these terms. The Gene Ontology has become an extremely useful tool for the analysis of genomic data and the structuring of biological knowledge. Several excellent software tools for navigating the Gene Ontology have been developed.

The GO provides core biological knowledge representation for modern biologists, whether computationally or experimentally based. GO resources include biomedical ontologies that cover molecular domains of all life forms as well as extensive compilations of gene product annotations to these ontologies that provide largely species-neutral, comprehensive statements about what gene products do. Although GO resources are extensively used in data analysis workflows and widely incorporated into numerous data analysis platforms and applications, the general user often misses fundamental distinctions about GO structures, GO annotations, and what can and cannot be extrapolated from GO resources. Here are ten quick tips for using the Gene Ontology.

Read "Ten Quick Tips for Using the Gene Ontology" at http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1003343

The following are the most commonly used GO term enrichment tools, old and new. They are recommended for people working in a wet lab.

CLASSIFI (Department of Pathology, UT Southwestern Medical Center)

CLASSIFI (Cluster Assignment for Biological Inference) is a data-mining tool that can be used to identify significant co-clustering of genes with similar functional properties (e.g. cellular response to DNA damage). Briefly, CLASSIFI uses the Gene Ontology™ (GO) gene annotation scheme to define the functional properties of all genes/probes in a microarray data set, and then applies a cumulative hypergeometric distribution analysis to determine whether any statistically significant gene ontology co-clustering has occurred.
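The cumulative hypergeometric test mentioned above can be sketched in a few lines of Python. This is only an illustration of the general statistic, not CLASSIFI's own code; the function name, parameter names and toy numbers are assumptions made for the example.

```python
from math import comb

def hypergeom_enrichment_pvalue(population, annotated, cluster, hits):
    """P(X >= hits) when `cluster` genes are drawn without replacement
    from `population` genes, `annotated` of which carry the GO term."""
    max_hits = min(annotated, cluster)
    total = comb(population, cluster)
    # Sum the upper tail of the hypergeometric distribution.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, cluster - k)
        for k in range(hits, max_hits + 1)
    )
    return tail / total

# Toy numbers (hypothetical): 20 genes on the array, 5 annotated with the
# term, a cluster of 4 genes, 2 of which carry the term.
p = hypergeom_enrichment_pvalue(population=20, annotated=5, cluster=4, hits=2)
print(f"{p:.4f}")  # 0.2487
```

A small p-value would suggest that the term is over-represented in the cluster; with many terms tested, the p-values still need a multiple testing correction.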

http://pathcuric1.swmed.edu/pathdb/classifi.html

EasyGO (China Agricultural University)

EasyGO is designed to automate enrichment analysis for experimental biologists, identifying enriched Gene Ontology (GO) terms in a list of microarray probe sets or gene identifiers (with expression information for PAGE analysis). EasyGO is also a GO annotation database that focuses on agronomic species and supports 30 species. It is user friendly, with an advanced result-browsing format and timely updates.

http://bioinformatics.cau.edu.cn/neweasygo/

http://bioinformatics.cau.edu.cn/easygo/

g:GOSt (Institute of Computer Science, University of Tartu)

g:GOSt retrieves the most significant Gene Ontology (GO) terms, KEGG and REACTOME pathways, and TRANSFAC motifs for a user-specified group of genes, proteins or microarray probes. g:GOSt also allows analysis of ranked or ordered lists of genes, visual browsing of the GO graph structure, interactive visualisation of retrieved results, and many other features. Multiple testing corrections are applied to extract only statistically significant results.
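To illustrate what a multiple testing correction does to a list of enrichment p-values, here is a minimal sketch of the generic Benjamini-Hochberg procedure. The description above does not say which correction g:GOSt applies, so this is an assumption chosen purely for illustration; the function name and the p-values are made up.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

raw = [0.01, 0.04, 0.03, 0.20]  # hypothetical per-term p-values
adj = benjamini_hochberg(raw)
print([round(x, 4) for x in adj])
```

Terms whose adjusted value falls below the chosen threshold (0.05 is a common choice) would be reported as enriched.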

http://biit.cs.ut.ee/gprofiler/

DAVID: Gene Functional Classification (Laboratory of Immunopathogenesis and Bioinformatics, NIAID)

The Functional Classification Tool provides a rapid means to organize large lists of genes into functionally related groups to help unravel the biological content captured by high throughput technologies.

http://david.abcc.ncifcrf.gov/gene2gene.jsp

http://david.abcc.ncifcrf.gov/

API https://github.com/chrisamiller/davidapi

GOEAST (Institute of Genetics and Developmental Biology, Chinese Academy of Sciences)

GOEAST is a web-based software toolkit providing easy-to-use, visualizable, comprehensive and unbiased Gene Ontology (GO) analysis for high-throughput experimental results, especially results from microarray hybridization experiments. The main function of GOEAST is to identify significantly enriched GO terms among given lists of genes using accurate statistical methods.

http://omicslab.genetics.ac.cn/GOEAST/

GOstat (Walter and Eliza Hall Institute of Medical Research)

Finds statistically overrepresented GO terms within a group of genes.

http://gostat.wehi.edu.au/

GOrilla (Technion - Laboratory of Computational Biology, Israel Institute of Technology)

GOrilla is a tool for identifying and visualizing enriched GO terms in ranked lists of genes.
It can be run in two modes: searching for enriched GO terms that appear densely at the top of a ranked list of genes, or searching for enriched GO terms in a target list of genes compared to a background list of genes.
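The idea of terms appearing "densely at the top of a ranked list" can be illustrated with a minimum hypergeometric (mHG) style score: try every top-n cutoff and keep the smallest hypergeometric tail. The sketch below is a simplified illustration, not GOrilla's implementation; the function names and the toy ranking are assumptions, and the score still needs its own correction before it can be read as a p-value.

```python
from math import comb

def hypergeom_tail(N, B, n, b):
    """P(X >= b) when n genes are drawn from N, B of which carry the term."""
    return sum(comb(B, k) * comb(N - B, n - k)
               for k in range(b, min(B, n) + 1)) / comb(N, n)

def mhg_score(labels):
    """Smallest hypergeometric tail over all top-n cutoffs of a ranked list.

    labels[i] is 1 if the gene at rank i carries the GO term, else 0.
    Returns (score, best_cutoff).
    """
    N, B = len(labels), sum(labels)
    best_score, best_n, hits = 1.0, 0, 0
    for n in range(1, N + 1):
        hits += labels[n - 1]
        score = hypergeom_tail(N, B, n, hits)
        if score < best_score:
            best_score, best_n = score, n
    return best_score, best_n

# Hypothetical ranking of 10 genes; 3 carry the term, all near the top.
ranked = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
score, cutoff = mhg_score(ranked)
print(cutoff, round(score, 4))
```

Because the cutoff is chosen by the data, the user does not have to decide in advance how many top genes count as the target set.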

GOrilla makes nice pictures!

http://cbl-gorilla.cs.technion.ac.il/

Gene Ontology for Functional Analysis (GOFFA)

GOFFA is a tool developed for ArrayTrack™ that takes a list of genes and identifies the Gene Ontology (GO) terms associated with those genes.

It provides several tools to view and access the GO term hierarchy and a full listing of GO terms annotated with the genes associated with a given term, together with a statistically useful report.

http://www.fda.gov/ScienceResearch/BioinformaticsTools/ucm233315.htm

GOAT (The University of Manchester)

The aim of the GOAT project is to create an application that will guide users, especially biomedical researchers, in the annotation of gene products with terms from the Gene Ontology.

http://goat.man.ac.uk/

Script https://github.com/tanghaibao/goatools/

REVIGO (Rudjer Boskovic Institute, Croatia)

REVIGO is a web server that can take long lists of Gene Ontology terms and summarize them by removing redundant GO terms. The remaining terms can be visualized in semantic similarity-based scatterplots, interactive graphs, or tag clouds.

http://revigo.irb.hr/

QuickGO (EMBL-EBI)

It uses extensive computational filters to generate specific subsets of GO annotations, mapped to sequence identifiers of your choice. It also supports GO slims, which are cut-down subsets of the full set of terms available from the Gene Ontology project.

http://www.ebi.ac.uk/QuickGO/

GOLEM

An interactive graph-based gene-ontology navigation and analysis tool. GOLEM is a useful tool which allows the viewer to navigate and explore a local portion of the Gene Ontology (GO) hierarchy.

http://reducio.princeton.edu/GOLEM/

BGI Web Gene Ontology (WEGO) Annotation Plot (Beijing Genomics Institute)

WEGO is a useful tool for plotting GO annotation results. It has been widely used in many important biological research projects, such as the rice genome project [Yu, J. et al. Science 296, 79-92 (2002); Yu, J. et al. PLoS Biol 3, e38 (2005)] and the silkworm genome project [Xia, Q. et al. Science 306, 1937-40 (2004)]. It has become one of the daily tools for downstream gene annotation analysis, especially when performing comparative genomics tasks. WEGO, along with two other tools, namely External to GO Query and GO Archive Query, is freely available to all users. Any suggestions are welcome at wego@genomics.org.cn.

http://wego.genomics.org.cn/cgi-bin/wego/index.pl

GeneGO MetaCore (MIT)

GeneGo is a leading provider of data mining & analysis solutions in systems biology. MetaCore, GeneGo's flagship product, is an integrated software suite for functional analysis of experimental data. MetaCore is based on a curated database of human protein-protein, protein-DNA interactions, transcription factors, signaling and metabolic pathways, disease and toxicity, and the effects of bioactive molecules.

https://portal.genego.com/

GOEx (Stony Brook University)

GOEx facilitates organism-specific studies by leveraging GO and providing a rich graphical user interface. It is a simple to use tool, specialized for biologists who wish to analyze spectral counting data from shotgun proteomics.

http://pcarvalho.com/patternlab

GOssTo

GOssTo and GOssToWeb are tools to calculate the semantic similarity between genes or terms in the Gene Ontology.

http://www.paccanarolab.org/gosstoweb/

GO Workbench

The Gene Ontology Analysis Viewer allows direct browsing of the Gene Ontology, and also the visualization of GO Term analysis results.

http://wiki.c2b2.columbia.edu/workbench/index.php/Gene_Ontology_Viewer

Another useful list of GO software and tools is available at http://www.geneontology.org/GO.tools.shtml#browser

Yet another useful web page with a list of GO tools is at http://neurolex.org/wiki/Category:Resource:Gene_Ontology_Tools

HiCdat

Jit — Fri, 12 Feb 2016 05:23:44 -0600

HiCdat: a fast and easy-to-use Hi-C data analysis tool

HiCdat is easy-to-use and provides solutions starting from aligned reads up to in-depth analyses. Importantly, HiCdat is focussed on the analysis of larger structural features of chromosomes, their correlation to genomic and epigenomic features, and on comparative studies. It uses simple input and output formats and can therefore easily be integrated into existing workflows or combined with alternative tools.

More at http://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-015-0678-x

Address of the bookmark: https://github.com/MWSchmid/HiCdat

FUMA GWAS: Functional Mapping and Annotation of Genome-Wide Association Studies

Jit — Sat, 01 Jun 2019 03:11:16 -0500

FUMA is a platform that can be used to annotate, prioritize, visualize and interpret GWAS results.
The SNP2GENE function takes GWAS summary statistics as an input, and provides extensive functional annotation for all SNPs in genomic areas identified by lead SNPs.
The GENE2FUNC function takes a list of gene IDs (as identified by SNP2GENE or as provided manually) and annotates the genes in a biological context.

Address of the bookmark: https://fuma.ctglab.nl/

GrAnnoT

LEGE — Sun, 31 Aug 2025 06:21:50 -0500

GrAnnoT is an annotation transfer tool for pangenome graphs. It can transfer linear genome annotations to a pangenome graph containing the genome, and also transfer the pangenome graph's annotations to the genomes it contains. It also outputs complementary information such as the alignments of the transferred genes, or a presence-absence matrix.

Address of the bookmark: https://forge.ird.fr/diade/dynadiv/grannot

multiPhATE: bioinformatics pipeline for functional annotation of phage isolates

Abhimanyu Singh — Thu, 16 May 2019 00:17:39 -0500

multiPhATE (multiple-genome Phage Annotation Toolkit and Evaluator) is a throughput pipeline driver that invokes an annotation pipeline (PhATE) across a user-specified set of phage genomes. This tool incorporates a de novo phage gene-calling algorithm and assigns putative functions to gene calls using protein-, virus-, and phage-centric databases.

Address of the bookmark: https://github.com/carolzhou/multiPhATE

MetaEuk - sensitive, high-throughput gene discovery and annotation for large-scale eukaryotic metagenomics

Jit — Wed, 13 Jan 2021 19:29:32 -0600

MetaEuk is a modular toolkit designed for large-scale gene discovery and annotation in eukaryotic metagenomic contigs. MetaEuk combines the fast and sensitive homology search capabilities of MMseqs2 with a dynamic programming procedure to recover optimal exon sets. It reduces redundancies in multiple discoveries of the same gene and resolves conflicting gene predictions on the same strand. MetaEuk is GPL-licensed open source software that is implemented in C++ and available for Linux and macOS. The software is designed to run on multiple cores.

Address of the bookmark: https://github.com/soedinglab/metaeuk

CrowdGO: Machine learning and semantic similarity guided consensus Gene Ontology annotation

Shruti Paniwala — Thu, 26 May 2022 00:59:49 -0500

CrowdGO is a protein Gene Ontology predictor using a meta approach, analyzing the predictions of other tools in order to get an improved precision and recall.

Please note that the CrowdGO snakemake workflow is currently only tested on Ubuntu. It should work on OSX, but please report any errors to maarten.reijnders@unil.ch or create an issue.

https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1010075

Address of the bookmark: https://gitlab.com/mreijnders/crowdgo

Omega2: metagenome assembly pipeline

Jit — Mon, 10 Jul 2017 05:56:07 -0500

Omega found overlaps between reads using a prefix/suffix hash table. The overlap graph of reads was simplified by removing transitive edges and trimming short branches. Unitigs were generated based on minimum cost flow analysis of the overlap graph and then merged to contigs and scaffolds using mate-pair information. In comparison with three de Bruijn graph assemblers (SOAPdenovo, IDBA-UD and MetaVelvet), Omega provided comparable overall performance on a HiSeq 100-bp dataset and superior performance on a MiSeq 300-bp dataset. In comparison with Celera on the MiSeq dataset, Omega provided more continuous assemblies overall using a fraction of the computing time of existing overlap-layout-consensus assemblers. This indicates Omega can more efficiently assemble longer Illumina reads, and at deeper coverage, for metagenomic datasets.
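The transitive-edge removal step described above can be illustrated on a toy overlap graph. This is a deliberately simplified sketch and not Omega's code: it assumes an acyclic directed graph, ignores overlap lengths and read orientation, and only removes edges implied by a path of length two. All names are hypothetical.

```python
def remove_transitive_edges(graph):
    """graph: dict mapping each read to the set of reads it overlaps into.

    An edge a->c is dropped when some b exists with a->b and b->c,
    because the overlap a->c is then implied by the path a->b->c.
    """
    reduced = {node: set(succs) for node, succs in graph.items()}
    for a, succs in graph.items():
        for b in succs:
            for c in graph.get(b, ()):
                if c != b and c in succs:
                    reduced[a].discard(c)
    return reduced

# Three reads tiling a region: r1 overlaps r2 and r3, r2 overlaps r3.
overlaps = {"r1": {"r2", "r3"}, "r2": {"r3"}, "r3": set()}
simplified = remove_transitive_edges(overlaps)
print(simplified)
```

After the reduction the three reads form a single unbranched path, which is the kind of structure that can then be collapsed into a unitig.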

Address of the bookmark: http://omega.omicsbio.org/

miniasm: very fast OLC-based de novo assembler for noisy long reads

Jit — Mon, 27 Nov 2017 07:58:49 -0600

Miniasm is a very fast OLC-based de novo assembler for noisy long reads. It takes all-vs-all read self-mappings (typically by minimap) as input and outputs an assembly graph in the GFA format. Different from mainstream assemblers, miniasm does not have a consensus step. It simply concatenates pieces of read sequences to generate the final unitig sequences. Thus the per-base error rate is similar to the raw input reads.

So far miniasm is in an early development stage. It has only been tested on a dozen PacBio and Oxford Nanopore (ONT) bacterial data sets. Including the mapping step, it takes about 3 minutes to assemble a bacterial genome. Under the default setting, miniasm assembles 9 out of 12 PacBio datasets and 3 out of 4 ONT datasets into a single contig. The 12 PacBio data sets are PacBio E. coli sample, ERS473430, ERS544009, ERS554120, ERS605484, ERS617393, ERS646601, ERS659581, ERS670327, ERS685285, ERS743109 and a deprecated PacBio E. coli data set. ONT data are acquired from the Loman Lab.

For a C. elegans PacBio data set (only 40X are used, not the whole dataset), miniasm finishes the assembly, including read overlapping, in ~10 minutes with 16 CPUs. The total assembly size is 105Mb; the N50 is 1.94Mb. In comparison, HGAP3 produces a 104Mb assembly with an N50 of 1.61Mb. This dotter plot gives a global view of the miniasm assembly (on the X axis) and the HGAP3 assembly (on Y). They are broadly comparable. Of course, the HGAP3 consensus sequences are much more accurate. In addition, on the whole data set (assembled in ~30 min), the miniasm N50 is reduced to 1.79Mb. Miniasm still needs improvements.
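For readers unfamiliar with the N50 figures quoted above, the statistic can be computed as follows. This is a minimal sketch with made-up contig lengths; the function name is an assumption and nothing here comes from the miniasm code base.

```python
def n50(lengths):
    """Length of the contig at which the running total of contigs, sorted
    from longest to shortest, first reaches half of the assembly size."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly

contigs = [80, 70, 50, 40, 30, 20]  # toy contig lengths, total 290
print(n50(contigs))  # 70
```

A higher N50 means that more of the assembly sits in long contigs, which is why it is used above to compare the miniasm and HGAP3 assemblies.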

Miniasm confirms that at least for high-coverage bacterial genomes, it is possible to generate long contigs from raw PacBio or ONT reads without error correction. It also shows that minimap can be used as a read overlapper, even though it is probably not as sensitive as the more sophisticated overlappers such as MHAP and DALIGNER. Coupled with long-read error correctors and consensus tools, miniasm may also be useful to produce high-quality assemblies.

Minimap and miniasm are ultrafast tools for (i) mapping and (ii) assembly. Designed for long, noisy reads, they do not have a correction or consensus step, and therefore the resulting assemblies are contiguous (i.e. long) but very noisy (i.e. full of errors).

We start with an all against all comparison:

minimap -Sw5 -L100 -m0 -t8 reads.fq reads.fq | gzip -1 > reads.paf.gz

Then we can assemble:

miniasm -f reads.fq reads.paf.gz > reads.gfa

Convert GFA to FASTA:

awk '/^S/{print ">"$2"\n"$3}' reads.gfa | fold > reads.fa

And then count the contigs (only lines that start with ">" are FASTA headers):

grep -c '^>' reads.fa
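The GFA-to-FASTA conversion and contig count above can also be done in Python, which is handy when the records need further processing. This is a minimal sketch that assumes tab-separated GFA segment lines of the form S, name, sequence; the function name and the toy records are hypothetical.

```python
def gfa_to_fasta(gfa_lines, width=80):
    """Yield one FASTA record (as text) per segment (S) line of a GFA file."""
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S" and len(fields) >= 3:
            name, seq = fields[1], fields[2]
            # Wrap the sequence at a fixed width, like `fold` does.
            chunks = [seq[i:i + width] for i in range(0, len(seq), width)]
            yield f">{name}\n" + "\n".join(chunks) + "\n"

# Toy GFA content; with a real file, pass the open file object instead.
gfa = [
    "H\tVN:Z:1.0\n",
    "S\tutg000001l\tACGTACGTAC\n",
    "S\tutg000002l\tGGCC\n",
    "L\tutg000001l\t+\tutg000002l\t+\t4M\n",
]
records = list(gfa_to_fasta(gfa, width=6))
print("".join(records), end="")
print(len(records), "contigs")
```

For a real assembly, `with open("reads.gfa") as handle:` followed by `gfa_to_fasta(handle)` would stream the file line by line.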

# Download sample PacBio from the PBcR website
wget -O- http://www.cbcb.umd.edu/software/PBcR/data/selfSampleData.tar.gz | tar zxf -
ln -s selfSampleData/pacbio_filtered.fastq reads.fq
# Install minimap and miniasm (requiring gcc and zlib)
git clone https://github.com/lh3/minimap && (cd minimap && make)
git clone https://github.com/lh3/miniasm && (cd miniasm && make)
# Overlap
minimap/minimap -Sw5 -L100 -m0 -t8 reads.fq reads.fq | gzip -1 > reads.paf.gz
# Layout
miniasm/miniasm -f reads.fq reads.paf.gz > reads.gfa

Address of the bookmark: https://github.com/lh3/miniasm