Metagenomics sequencing projects collect samples of DNA from uncharacterized environments that may contain hundreds or even thousands of species. One of the main challenges in analyzing a metagenome is the phylogenetic classification of raw sequence reads into groups representing the same or similar species. Such classification is a useful prerequisite for genome assembly and for analyzing the biological diversity present in a sample. The newest sequencing technologies have made metagenomics both easier, by making sequencing faster, and more difficult, by producing shorter reads than previous technologies. Until now, methods for classifying sequences as short as 100 base pairs (bp) have been relatively inaccurate, forcing metagenomics projects to rely on older, long-read technologies.

Phymm, a new classification approach for metagenomics data that uses interpolated Markov models (IMMs) to taxonomically classify DNA sequences, can accurately classify reads as short as 100 bp. Its accuracy on short reads is a significant improvement over previous composition-based classification methods. PhymmBL (rhymes with "thimble"), the hybrid classifier included in this distribution, combines the results of Phymm and BLAST to achieve even higher accuracy.
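The general idea behind IMM-based classification can be sketched as follows: train one model per reference group, score a read against every model, and assign the read to the best-scoring group. This is a minimal, hypothetical Python illustration, not Phymm's implementation. The class `SimpleIMM`, the count-based interpolation weight `total / (total + pseudo)`, the toy training sequences, and the labels `clade_A`/`clade_B` are all assumptions made for this example. The interpolation weights are a simple heuristic and are not claimed to match Phymm's, and the sketch omits the BLAST component that PhymmBL adds.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


class SimpleIMM:
    """Toy interpolated Markov model over DNA (illustrative only, not Phymm's code)."""

    def __init__(self, max_order=5, pseudo=10.0):
        self.max_order = max_order
        self.pseudo = pseudo  # hypothetical smoothing constant for the interpolation weights
        # counts[context][base]: how often `base` followed `context` in the training data
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, sequences):
        for seq in sequences:
            seq = seq.upper()
            for i, base in enumerate(seq):
                if base not in ALPHABET:
                    continue
                # Count this base after every preceding context of length 0..max_order.
                for k in range(self.max_order + 1):
                    if i - k < 0:
                        break
                    ctx = seq[i - k:i]
                    if any(c not in ALPHABET for c in ctx):
                        break
                    self.counts[ctx][base] += 1
        return self

    def prob(self, ctx, base):
        # Order-0 estimate with add-one smoothing, so every base has nonzero probability.
        zero = self.counts[""]
        p = (zero[base] + 1) / (sum(zero.values()) + len(ALPHABET))
        # Blend in longer contexts; a context seen more often gets a larger weight.
        for k in range(1, len(ctx) + 1):
            c = self.counts.get(ctx[-k:])
            if not c:
                break  # longer contexts were never seen either
            total = sum(c.values())
            lam = total / (total + self.pseudo)  # always < 1, which keeps p > 0
            p = lam * c.get(base, 0) / total + (1 - lam) * p
        return p

    def log_likelihood(self, read):
        read = read.upper()
        score = 0.0
        for i, base in enumerate(read):
            if base not in ALPHABET:
                continue  # skip ambiguous bases such as N
            ctx = read[max(0, i - self.max_order):i]
            score += math.log(self.prob(ctx, base))
        return score


def classify(read, models):
    """Assign the read to the label whose model gives it the highest log-likelihood."""
    return max(models, key=lambda label: models[label].log_likelihood(read))


# Toy training data standing in for real reference genomes.
models = {
    "clade_A": SimpleIMM(max_order=3).train(["ACGT" * 25]),
    "clade_B": SimpleIMM(max_order=3).train(["AAAATTTT" * 12]),
}
print(classify("ACGTACGTACGT", models))  # clade_A
print(classify("AAAATTTTAAAA", models))  # clade_B
```

Summing log probabilities rather than multiplying raw probabilities avoids floating-point underflow on longer reads. Interpolation lets each prediction fall back to shorter contexts when a longer context was rarely or never observed in training. This fallback is what makes variable-order models usable when training data are limited.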