MashMap is a fast, approximate tool for mapping long reads (PacBio/ONT) or assemblies to reference genome(s). It reports a mapping of a query sequence to a reference region if and only if the estimated alignment identity is above a specified threshold. It does not compute alignments explicitly. Instead, it estimates a k-mer-based Jaccard similarity using a combination of winnowing and MinHash, and converts this to an estimate of sequence identity using the Mash distance. An appropriate k-mer sampling rate is determined automatically from the minimum local alignment length and identity thresholds, and the algorithm becomes more efficient as these thresholds are increased.
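The estimation pipeline described above can be illustrated with a minimal Python sketch. It is not MashMap's implementation: the hash function (blake2b), the fixed window size `w = 20`, the bottom-s sketch size `s = 200` and all helper names are assumptions made for illustration (MashMap derives its own sampling rate from the thresholds). The sketch winnows k-mers, estimates the Jaccard similarity with a bottom-s MinHash over the sampled hashes, and converts that estimate to identity through the Mash distance d = -(1/k) ln(2j / (1 + j)), taking identity as 1 - d.

```python
import hashlib
import math
import random


def kmer_hash(kmer: str) -> int:
    """Deterministic 64-bit hash of a k-mer (blake2b is an arbitrary choice here)."""
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")


def winnow(seq: str, k: int, w: int) -> set[int]:
    """Winnowing: keep the minimum k-mer hash in every window of w consecutive k-mers.
    Assumes len(seq) >= k + w - 1."""
    hashes = [kmer_hash(seq[i:i + k]) for i in range(len(seq) - k + 1)]
    return {min(hashes[i:i + w]) for i in range(len(hashes) - w + 1)}


def minhash_jaccard(a: set[int], b: set[int], s: int) -> float:
    """Bottom-s MinHash estimate of J(a, b): the fraction of the s smallest
    hashes of the union that occur in both sets."""
    bottom = sorted(a | b)[:s]
    shared = sum(1 for h in bottom if h in a and h in b)
    return shared / len(bottom)


def mash_identity(j: float, k: int) -> float:
    """Convert a Jaccard estimate to identity via the Mash distance
    d = -(1/k) * ln(2j / (1 + j)); identity is taken as 1 - d."""
    if j <= 0:
        return 0.0
    d = -math.log(2 * j / (1 + j)) / k
    return max(0.0, 1.0 - d)


if __name__ == "__main__":
    k, w, s = 16, 20, 200  # illustrative values, not MashMap's derived parameters
    rng = random.Random(42)
    ref = "".join(rng.choice("ACGT") for _ in range(5000))
    # Hypothetical query: the reference with 5% random substitutions.
    qry = list(ref)
    for i in rng.sample(range(len(ref)), k=len(ref) // 20):
        qry[i] = rng.choice([b for b in "ACGT" if b != ref[i]])
    qry = "".join(qry)

    j = minhash_jaccard(winnow(ref, k, w), winnow(qry, k, w), s)
    print(f"Jaccard ~ {j:.3f}, estimated identity ~ {mash_identity(j, k):.3f}")
```

With 5% substitutions and k = 16, the estimated identity should come out close to 0.95, with some sampling noise from the finite sketch.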
➜ mashmap-Linux64-v2.0 git:(master) ✗ ./mashmap -h
-----------------
Mashmap is an approximate long read or contig mapper based on Jaccard similarity
-----------------
Example usage:
$ mashmap -r ref.fa -q seq.fq [OPTIONS]
$ mashmap --rl reference_files_list.txt -q seq.fq [OPTIONS]
Available options
-----------------
-h, --help
    Print this help page
-r <value>, --ref <value>
    an input reference file (fasta/fastq)[.gz]
--refList <value>, --rl <value>
    a file containing list of reference files, one per line
-q <value>, --query <value>
    an input query file (fasta/fastq)[.gz]
--ql <value>, --queryList <value>
    a file containing list of query files, one per line
-s <value>, --segLength <value>
    mapping segment length [default : 5,000]
    sequences shorter than segment length will be ignored
--noSplit
    disable splitting of input sequences during mapping [enabled by default]
--perc_identity <value>, --pi <value>
    threshold for identity [default : 85]
-t <value>, --threads <value>
    count of threads for parallel execution [default : 1]
-o <value>, --output <value>
    output file name [default : mashmap.out]
-k <value>, --kmer <value>
    kmer size <= 16 [default : 16]
-f <value>, --filter_mode <value>
    filter modes in mashmap: 'map', 'one-to-one' or 'none' [default: map]
    'map' computes best mappings for each query sequence
    'one-to-one' computes best mappings for query as well as reference sequence
    'none' disables filtering
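Taken together, `--pi` and `-k` determine how much k-mer overlap a mapping needs. Setting d = 1 - identity in the Mash distance formula d = -(1/k) ln(2j / (1 + j)) and solving for j gives x = exp(-k * d) and j = x / (2 - x), the smallest Jaccard similarity whose identity estimate still reaches the threshold. The sketch below computes that value; the function name is hypothetical and this is an illustration of the formula, not code from MashMap.

```python
import math


def jaccard_threshold(perc_identity: float, k: int) -> float:
    """Smallest Jaccard similarity whose Mash-distance identity estimate
    reaches perc_identity (given in percent, like --pi).

    Inverts d = -(1/k) * ln(2j / (1 + j)) with d = 1 - identity:
    x = exp(-k * d), j = x / (2 - x).
    """
    d = 1.0 - perc_identity / 100.0
    x = math.exp(-k * d)
    return x / (2.0 - x)


if __name__ == "__main__":
    # Compare a few identity thresholds at the default k-mer size (-k 16).
    for pi in (75, 85, 95):
        print(f"--pi {pi} -k 16 -> Jaccard >= {jaccard_threshold(pi, 16):.4f}")
```

With the defaults (`--pi 85`, `-k 16`) the threshold works out to a Jaccard similarity of only about 0.048, and it rises quickly as the identity threshold is raised.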