metagraph.ethz.ch - The MetaGraph framework is designed to work with a wide range of input data sets, indexing from a few samples up to the contents of entire archives with hundreds of thousands of records. The indexing workflow always follows the same principle,...
github.com - Cogent is a tool that identifies gene families and reconstructs the coding genome using high-quality transcriptome data without a reference genome, and can be used to check assemblies for the presence of these known coding...
github.com - Automatic Filtering, Trimming, Error Removing and Quality Control for fastq data. AfterQC can simply go through all fastq files in a folder and then output three folders: good, bad and QC folders, which contains good...
github.com - Filtlong is a tool for filtering long reads by quality. It can take a set of long reads and produce a smaller, better subset. It uses both read length (longer is better) and read identity (higher is better) when choosing which reads pass the...
musket.sourceforge.net - Musket is a well-established leading next-generation sequencing read error correction algorithm targeting Illumina sequencing. This corrector employs the k-mer spectrum approach and introduces three correction techniques in a multistage...
ivory.idyll.org - DNA k-mers underlie much of our assembly work, and we (along with many others!) have spent a lot of time thinking about how to store k-mer graphs efficiently, discard redundant data, and count them efficiently. More recently, we've...
github.com - Miniasm is a great long-read assembly tool: straightforward, effective and very fast. However, it does not include a polishing step, so its assemblies have a high error rate – they are essentially made of stitched-together pieces of long...
github.com - Determine the accuracy of our model by comparing the precision and recall of GATK Unified Genotyper and Haplotype Caller on the high-confidence SNPs of the NIST Ashkenazim trio and the two independent Platinum Genome trios. We show that our method...
csb5.github.io - LoFreq* (i.e. LoFreq version 2) is a fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data. It makes full use of base-call qualities and other sources of errors inherent in sequencing (e.g. mapping or...