The central goal of the Earth BioGenome Project is to understand the evolution and organization of life on our planet by sequencing and functionally annotating the genomes of 1.5 million known species of eukaryotes
My research group consists primarily of computer science graduate students and postdocs with expertise in algorithms, statistical inferences and machine learning, and sharing a passion for understanding fundamental biological problems.
We work in...
github.com - This code is used to scaffold your assemblies using Hi-C data. This version implements some improvements in the original SALSA algorithm. If you want to use the old version, it can be found in the old_salsa branch.
To use the latest version,...
gwct.github.io - Modern genome sequencing technologies provide a succint measure of quality at each position in every read, however all of this information is lost in the assembly process. Referee summarizes the quality information from the reads that map to a site...
http://ggdc.dsmz.de/ - The GGDC web service reports digital DDH for a universal and accurate delineation of prokaryotic (sub-)species without inheriting the pitfalls of classic DDH, and also calculates differences in genomic G+C...
sanger-pathogens.github.io - Roary is a high speed stand alone pan genome pipeline, which takes annotated assemblies in GFF3 format (produced by Prokka (Seemann, 2014)) and calculates the pan genome. Using a standard desktop PC, it can analyse datasets with thousands of...