github.com - PyParanoid is a pipeline for rapid identification of homologous gene families in a set of genomes - a central task of any comparative genomics analysis. The "gold standard" for identifying homologs is to use reciprocal best hits (RBHs) which depends...
github.com - Next-gen sequence data such as Illumina HiSeq reads. Data must be sorted into folders by taxon (e.g. species or genus). Paired reads in fastq format must be specified by _R1 and _R2 in the (otherwise identical) filenames. Paired and unpaired reads...
ftp.genomics.org.cn - RePS (Repeat-masked Phrap with scaffolding), a WGS sequence assembler, that explicitly identifies exact kmer repeats from the shotgun data and removes them prior to the assembly. The established software Phrap is used to compute meaningful error...
github.com - GTDB-Tk is a software toolkit for assigning objective taxonomic classifications to bacterial and archaeal genomes. It is computationally efficient and designed to work with recent advances that allow hundreds or thousands of metagenome-assembled...
www.docker.com - dockr 0.8.6 is now available on CRAN. dockr is a minimal toolkit to build a lightweight Docker container image for your R package, in which the package itself is available. The Docker image seeks to mirror your R session as close as possible with...
omega.omicsbio.org - Omega found overlaps between reads using a prefix/suffix hash table. The overlap graph of reads was simplified by removing transitive edges and trimming short branches. Unitigs were generated based on minimum cost flow analysis of the overlap graph...
github.com - Miniasm is a very fast OLC-based de novo assembler for noisy long reads. It takes all-vs-all read self-mappings (typically by minimap) as input and outputs an assembly graph in the GFA format. Different from mainstream...
github.com - MashMap is a fast and approximate software for mapping long reads (PacBio/ONT) or assembly to reference genome(s). It maps a query sequence against a reference region if and only if its estimated alignment identity is above a specified threshold. It...
www.ncbi.nlm.nih.gov - RGFA, an implementation of the proposed GFA specification in Ruby. It allows the user to conveniently parse, edit and write GFA files. Complex operations such as the separation of the implicit instances of repeats and the merging of linear paths can...
sourceforge.net - Cerulean extends contigs assembled using short read datasets like Illumina paired-end reads using long reads like PacBio RS long reads.
Cerulean v0.1 has been implemented with bacterial genomes in mind.
The method is fully described in...