BOL: All site bookmarks

NearHGT

Jit — Wed, 22 Jun 2016 05:41:57 -0500

Horizontal gene transfer (HGT), the transfer of genetic material between organisms, is crucial for genetic innovation and the evolution of genome architecture. Existing HGT detection algorithms rely on a strong phylogenetic signal distinguishing the transferred sequence from ancestral (vertically derived) genes in its recipient genome. Detecting HGT between closely related species or strains is challenging, as the phylogenetic signal is usually weak and the nucleotide composition is normally nearly identical. Nevertheless, it is very important to detect HGT between congeneric species or strains, especially in clinical microbiology, where understanding the emergence of new virulent and drug-resistant strains is crucial and often time-sensitive.

We developed a novel, self-contained technique named Near HGT, based on the synteny index, to measure the divergence of a gene from its native genomic environment and used it to identify candidate HGT events between closely related strains. The method confirms candidate transferred genes based on the constant relative mutability (CRM). Using CRM, the algorithm assigns a confidence score based on “unusual” sequence divergence. A gene exhibiting exceptional deviations according to both synteny and mutability criteria is considered a validated HGT product. We first applied the technique to a set of three E. coli strains and detected several highly probable horizontally acquired genes. We then compared the method to existing HGT detection tools using a larger strain data set.

When combined with additional approaches, our new algorithm provides a richer picture and brings us closer to the goal of detecting all newly acquired genes in a particular strain.

Availability: The method is publicly available at http://research.haifa.ac.il/~ssagi/software/nearHGT.zip

Address of the bookmark: http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004408

DarkHorse

Jit — Wed, 22 Jun 2016 05:37:38 -0500

DarkHorse is a bioinformatic method for rapid, automated identification and ranking of phylogenetically atypical proteins on a genome-wide basis. It works by selecting potential ortholog matches from a reference database of amino acid sequences, then using these matches to calculate a lineage probability index (LPI) score for each genome protein.

LPI scores are inversely proportional to the phylogenetic distance between database match sequences and the query genome. These scores are useful not only for large-scale de novo predictions of horizontally transferred proteins but also as an independent quality control test for potential horizontal transfer candidates identified by alternative methods, especially those based on nucleic acid signatures. Candidates having high LPI scores are unlikely to have been horizontally transferred, since they are highly conserved among closely related organisms.

One unique and powerful feature of the DarkHorse HGT Candidate database is the opportunity to explore the phylogenetic background of potential HGT donors as well as recipients. The breadth of the database allows not only query sequences, but also their database match partners to be evaluated for sequence similarity or novelty compared to taxonomically related organisms.

DarkHorse is configurable for varying degrees of phylogenetic granularity and protein sequence conservation. Users should consult the references cited below for a complete explanation of parameter selection and result interpretation. A brief tutorial page is also available on-line.

Address of the bookmark: http://darkhorse.ucsd.edu/download.html

clusterProfiler

Jit — Thu, 16 Jun 2016 18:57:03 -0500

Statistical analysis and visualization of functional profiles for genes and gene clusters

Bioconductor version: Release (3.3)

This package implements methods to analyze and visualize functional profiles (GO and KEGG) of genes and gene clusters.

Author: Guangchuang Yu with contributions from Li-Gen Wang and Giovanni Dall'Olio.

Maintainer: Guangchuang Yu

Citation (from within R, enter citation("clusterProfiler")):

Yu G, Wang L, Han Y and He Q (2012). “clusterProfiler: an R package for comparing biological themes among gene clusters.” OMICS: A Journal of Integrative Biology, 16(5), pp. 284-287.

Installation

To install this package, start R and enter:

## try http:// if https:// URLs are not supported
source("https://bioconductor.org/biocLite.R")
biocLite("clusterProfiler")

https://www.bioconductor.org/packages/devel/bioc/vignettes/clusterProfiler/inst/doc/clusterProfiler.html

Address of the bookmark: https://www.bioconductor.org/packages/devel/bioc/vignettes/clusterProfiler/inst/doc/clusterProfiler.html

Anvio

Shruti Paniwala — Thu, 16 Jun 2016 18:15:41 -0500

In a nutshell

Anvi’o is an analysis and visualization platform for ‘omics data.

Please find the methods paper here: https://peerj.com/articles/1319/

Anvi’o would not have been possible without the help of many people who directly or indirectly contributed to its development; they are credited in the acknowledgements section of our methods paper.

An analysis and visualization platform for 'omics data http://merenlab.org/projects/anvio

Paper https://peerj.com/articles/1839/

Address of the bookmark: https://github.com/meren/anvio

CNIDARIA: fast, reference-free phylogenomic clustering

Shruti Paniwala — Thu, 16 Jun 2016 17:55:17 -0500

Motivation: Identification of biological specimens is a major requirement for a range of applications. Reference-free methods analyse unprocessed sequencing data without relying on prior knowledge, but these do not scale to arbitrarily large genomes and arbitrarily large phylogenetic distances.

Results: We present Cnidaria, a practical tool for clustering genomic and transcriptomic data with no limitation on genome size or phylogenetic distances. We successfully clustered 169 genomic and transcriptomic datasets from 4 kingdoms simultaneously, achieving 100% accuracy at supra-species level and 78% accuracy at species level.

Availability and Implementation: Cnidaria is written in C++ and Python and is available at http://www.ab.wur.nl/cnidaria.

Contact: Saulo Aflitos - sauloal@gmail.com

Supplementary information: Supplementary data are available at Bioinformatics online.

Address of the bookmark: https://github.com/sauloal/cnidaria/wiki

CovCalc: Coverage / Read Count Calculator

Jit — Wed, 15 Jun 2016 18:08:13 -0500

Coverage / Read Count Calculator

Calculate how much sequencing you need to hit a target depth of coverage (or vice versa).

Instructions: set the read length/configuration and genome size, then select what you want to calculate.

Written by Stephen Turner, based on the Lander-Waterman formula, inspired by a similar calculator written by James Hadfield. Coverage is calculated as C = LN/G and reads as N = CG/L, where C = Coverage (X), L = Read length (bp), G = Haploid genome size (bp), and N = Number of reads. Source code on GitHub.
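The two formulas above can be sketched in a few lines of Python. This is a minimal illustration and not the calculator's actual source: the function names are hypothetical, and the 4.6 Mbp genome size in the usage lines is only a round example figure.

```python
import math


def coverage(read_length_bp, n_reads, genome_size_bp):
    """Expected depth of coverage, C = L * N / G."""
    return read_length_bp * n_reads / genome_size_bp


def reads_needed(target_coverage, genome_size_bp, read_length_bp):
    """Reads required for a target depth, N = C * G / L, rounded up to a whole read."""
    return math.ceil(target_coverage * genome_size_bp / read_length_bp)


if __name__ == "__main__":
    # Illustrative inputs (assumptions): 100 bp reads, 4.6 Mbp haploid genome, 30X target.
    n = reads_needed(30, 4_600_000, 100)
    print(n)                                # 1380000
    print(coverage(100, n, 4_600_000))      # 30.0
```

Here every read is counted individually, so for paired-end data N is the total number of mates rather than the number of pairs.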

Address of the bookmark: http://apps.bioconnector.virginia.edu/covcalc/

LoRMA: a tool for correcting sequencing errors in long reads such as those produced by Pacific Biosciences sequencing machines

Jit — Wed, 15 Jun 2016 17:18:36 -0500

LoRMA is a tool for correcting sequencing errors in long reads such as those produced by Pacific Biosciences sequencing machines.

Publication:

L. Salmela, R. Walve, E. Rivals, and E. Ukkonen: Accurate self-correction of errors in long reads using de Bruijn graphs. Accepted to RECOMB-Seq 2016.

Address of the bookmark: https://www.cs.helsinki.fi/u/lmsalmel/LoRMA/

Blobsplorer

Jit — Tue, 14 Jun 2016 10:28:58 -0500

Blobsplorer is a tool for interactive visualization of assembled DNA sequence data ("contigs") derived from (often unintentionally) mixed-species pools. It allows the simultaneous display of GC content, coverage, and taxonomic annotation for collections of contigs with a view to separating out those belonging to different taxa.
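As a rough illustration of the three per-contig values mentioned above, the sketch below computes GC content and collates it with coverage and taxon labels. It is not Blobsplorer's input format or code: the function names, the dictionary layout, and the "unassigned" default are assumptions made for the example.

```python
def gc_content(sequence):
    """Fraction of G and C among unambiguous A/C/G/T bases (0.0 if there are none)."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return gc / acgt if acgt else 0.0


def contig_table(contigs, mean_depth, taxon):
    """Collate one row per contig: length, GC fraction, coverage, taxon label.

    contigs:    dict of contig name -> sequence string
    mean_depth: dict of contig name -> mean read depth (hypothetical input)
    taxon:      dict of contig name -> taxonomic label (hypothetical input)
    """
    rows = []
    for name, seq in contigs.items():
        rows.append({
            "contig": name,
            "length": len(seq),
            "gc": gc_content(seq),
            "coverage": mean_depth.get(name, 0.0),
            "taxon": taxon.get(name, "unassigned"),
        })
    return rows


if __name__ == "__main__":
    contigs = {"contig_1": "GGCCAATT", "contig_2": "ATATATGC"}
    rows = contig_table(contigs, {"contig_1": 42.0}, {"contig_1": "Nematoda"})
    for row in rows:
        print(row)
```

Plotting GC fraction against coverage and colouring points by taxon is what lets contigs from different organisms separate visually.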

Blobsplorer is unlikely to be of use on its own as it requires contig data to be supplied in a format that involves considerable preprocessing (see below for a description). The easiest way to use Blobsplorer is as part of a workflow using scripts from here.

Address of the bookmark: http://nematodes.org/martin/blobsplorer/blobsplorer.html

GAEMR

Jit — Tue, 14 Jun 2016 06:18:37 -0500

The Genome Assembly Evaluation Metrics and Reporting (GAEMR) package is an assembly analysis framework composed of a number of integrated modules. These modules can be executed as a single program to generate a complete analysis report, or executed individually to generate specific charts and tables. GAEMR standardizes input by converting a variety of read types to Binary Alignment Map (BAM) format, allowing a single input format to be entered into GAEMR’s analysis pipeline, hence enabling the generation of standard reports.

GAEMR’s analysis philosophy is centered on contiguity, correctness, and completeness -- how many pieces an assembly is composed of, how accurately those pieces represent the genome sequenced, and how much of that genome is represented by those pieces. By performing over twenty different analyses based on these principles, GAEMR gives a clear picture of the condition of a genome assembly.
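Contiguity is commonly summarised by the number of pieces together with the N50 length. The sketch below shows that generic calculation only; whether GAEMR reports exactly these values is an assumption, and the function names are hypothetical.

```python
def n50(contig_lengths):
    """Length of the contig at which the running total of lengths, taken from
    longest to shortest, first reaches half of the assembly size (0 if empty)."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0


def contiguity_summary(contig_lengths):
    """Basic contiguity figures: piece count, total size, longest piece, N50."""
    lengths = list(contig_lengths)
    return {
        "pieces": len(lengths),
        "total_bp": sum(lengths),
        "longest_bp": max(lengths, default=0),
        "n50_bp": n50(lengths),
    }


if __name__ == "__main__":
    print(contiguity_summary([2, 2, 2, 3, 3, 4, 8, 8]))
```

Fewer pieces and a larger N50 indicate a more contiguous assembly, but neither says anything about correctness or completeness, which need separate checks.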

Address of the bookmark: https://www.broadinstitute.org/software/gaemr/

Blobology

Jit — Mon, 13 Jun 2016 10:18:33 -0500

Tools for making blobplots or Taxon-Annotated-GC-Coverage plots (TAGC plots) to visualise the contents of genome assembly data sets as a QC step

Blaxter Lab, Institute of Evolutionary Biology, University of Edinburgh

Goal: To create blobplots or Taxon-Annotated-GC-Coverage plots (TAGC plots) to visualise the contents of genome assembly data sets as a QC step.

This repository accompanies the paper:
Blobology: exploring raw genome data for contaminants, symbionts and parasites using taxon-annotated GC-coverage plots. Sujai Kumar, Martin Jones, Georgios Koutsovoulos, Michael Clarke, Mark Blaxter
(submitted 2013-10-01 to Frontiers in Bioinformatics and Computational Biology special issue: Quality assessment and control of high-throughput sequencing data).

It contains bash/perl/R scripts for running the analysis presented in the paper: creating a preliminary assembly, then creating and collating GC content, read coverage and taxon annotation for that assembly. The results can be visualised as TAGC plots/blobplots, as in Figure 2a of the paper, which shows them for Caenorhabditis sp. 5.

Address of the bookmark: https://github.com/blaxterlab/blobology