BBSplit internally uses BBMap to map reads to multiple genomes at once, and determine which genome they match best. This is different than with ordinary mapping. If a genome (say, human) contains an exact repeat somewhere, reads mapping to it will...
Short Read Simulators
With the popularity of next-generation sequencing (NGS) technologies, many NGS read simulators have been developed. Currently, many of the popular short read simulators are designed to simulate reads mimicking many Illumina,...
github.com - proovread : large-scale high-accuracy PacBio correction through iterative short read consensus
outperforms PacBioToCA/LSC in terms of accuracy and contiguity/sensitivity (http://dx.doi.org/10.1093/bioinformatics/btu392)
is easy to...
github.com - This code is used to scaffold your assemblies using Hi-C data. This version implements some improvements in the original SALSA algorithm. If you want to use the old version, it can be found in the old_salsa branch.
To use the latest version,...
github.com - rHAT is a seed-and-extension-based noisy long read alignment tool. It is suitable for aligning 3rd generation sequencing reads which are in large read length with relatively high error rate, especially Pacbio's Single Molecule Read-time (SMRT)...
github.com - URMAP, a new read mapping algorithm. URMAP is an order of magnitude faster than BWA with comparable accuracy on several validation tests. On a Genome in a Bottle (GIAB) variant calling test with 30× coverage 2×150 reads, URMAP achieves...
github.com - MMseqs2 (Many-against-Many sequence searching) is a software suite to search and cluster huge protein sequence sets. MMseqs2 is open source GPL-licensed software implemented in C++ for Linux, MacOS, and (as beta version, via cygwin) Windows. The...
github.com - MMseqs2 (Many-against-Many sequence searching) is a software suite to search and cluster huge protein and nucleotide sequence sets. MMseqs2 is open source GPL-licensed software implemented in C++ for Linux, MacOS, and (as beta version, via cygwin)...
cov-lineages.org - The Pango nomenclature is being used by researchers and public health agencies worldwide to track the transmission and spread of SARS-CoV-2, including variants of concern. This website documents all current Pango lineages and their spread, as well...
crdd.osdd.net - RNAcon is a web-server for the prediction and classification of non-coding RNAs. It uses SVM-based model for the discrimination between coding and ncRNAs and RandomForest-based prediction model for the classification of ncRNAs into different...