MashMap: a fast and approximate software for mapping long reads (PacBio/ONT) or assembly to reference genome(s)

Jit — Tue, 12 Dec 2017 17:23:31 -0600

MashMap is a fast, approximate tool for mapping long reads (PacBio/ONT) or assemblies to one or more reference genomes. It maps a query sequence against a reference region if and only if the estimated alignment identity is above a specified threshold. It does not compute alignments explicitly; instead, it estimates a k-mer based Jaccard similarity using a combination of winnowing and MinHash, and then converts that estimate to a sequence identity using the Mash distance. An appropriate k-mer sampling rate is determined automatically from the minimum local alignment length and identity thresholds. The algorithm becomes more efficient as both of these thresholds are increased.
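To make the Jaccard-to-identity step concrete, here is a minimal Python sketch. It computes an exact k-mer Jaccard similarity J for two short strings and converts it to an identity estimate with the Mash distance, D = -(1/k) ln(2J / (1 + J)), taking identity as 1 - D. This is not MashMap's implementation: MashMap estimates J from winnowed MinHash sketches rather than from full k-mer sets. The function names and the toy sequences are made up for illustration.

```python
import math


def kmer_set(seq: str, k: int) -> set:
    """Return the set of all k-mers in seq (no winnowing, no sampling)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def exact_jaccard(a: str, b: str, k: int) -> float:
    """Exact Jaccard similarity of the two k-mer sets.

    Assumption: MashMap would estimate this value from sketches instead.
    """
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0


def mash_identity(jaccard: float, k: int) -> float:
    """Convert a Jaccard similarity to an identity estimate via the Mash distance."""
    if jaccard <= 0.0:
        return 0.0  # no shared k-mers: treat as no detectable identity
    distance = -math.log(2.0 * jaccard / (1.0 + jaccard)) / k
    return max(0.0, 1.0 - distance)  # clamp, since D can exceed 1 for tiny J


# Toy example: two 12-base strings differing at one position.
ref = "ACGTACGTTGCA"
qry = "ACGTACGTAGCA"
k = 4
j = exact_jaccard(ref, qry, k)
print(f"Jaccard = {j:.3f}, estimated identity = {mash_identity(j, k):.3f}")
```

Because the distance is divided by k, even a moderate Jaccard value corresponds to a high identity at k = 16, which is why a percent-identity threshold can be expressed as a Jaccard cutoff.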

Address of the bookmark: https://github.com/marbl/MashMap

Comment by Jit

Jit — Fri, 13 Jul 2018 17:21:42 -0500

➜ mashmap-Linux64-v2.0 git:(master) ✗ ./mashmap -h
-----------------
Mashmap is an approximate long read or contig mapper based on Jaccard
similarity
-----------------
Example usage:
$ mashmap -r ref.fa -q seq.fq [OPTIONS]
$ mashmap --rl reference_files_list.txt -q seq.fq [OPTIONS]

Available options
-----------------
-h, --help
Print this help page

-r , --ref
an input reference file (fasta/fastq)[.gz]

--refList , --rl
a file containing list of reference files, one per line

-q , --query
an input query file (fasta/fastq)[.gz]

--ql , --queryList
a file containing list of query files, one per line

-s , --segLength
mapping segment length [default : 5,000]
sequences shorter than segment length will be ignored

--noSplit
disable splitting of input sequences during mapping [enabled by default]

--perc_identity , --pi
threshold for identity [default : 85]

-t , --threads
count of threads for parallel execution [default : 1]

-o , --output
output file name [default : mashmap.out]

-k , --kmer
kmer size <= 16 [default : 16]

-f , --filter_mode
filter modes in mashmap: 'map', 'one-to-one' or 'none' [default: map]
'map' computes best mappings for each query sequence
'one-to-one' computes best mappings for query as well as reference sequence
'none' disables filtering
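
For scripted runs, the options above can be assembled into a command line. The following Python sketch uses only flags that appear in the help text (-r, -q, -s, --pi, -t, -o, -k, -f) with their listed defaults. The helper name build_mashmap_command and the assumption that a mashmap binary is on the PATH are mine, not part of MashMap.

```python
import shlex

VALID_FILTER_MODES = {"map", "one-to-one", "none"}


def build_mashmap_command(ref, query, seg_length=5000, perc_identity=85,
                          threads=1, output="mashmap.out", kmer=16,
                          filter_mode="map", binary="mashmap"):
    """Build an argument list for mashmap from the options shown in its help.

    Hypothetical helper; defaults mirror the values printed by `mashmap -h`.
    """
    if filter_mode not in VALID_FILTER_MODES:
        raise ValueError(f"unknown filter mode: {filter_mode!r}")
    if kmer > 16:
        raise ValueError("the help text states kmer size must be <= 16")
    return [
        binary,
        "-r", ref,
        "-q", query,
        "-s", str(seg_length),
        "--pi", str(perc_identity),
        "-t", str(threads),
        "-o", output,
        "-k", str(kmer),
        "-f", filter_mode,
    ]


# Example: map reads against one reference with 4 threads.
cmd = build_mashmap_command("ref.fa", "seq.fq", threads=4)
print(shlex.join(cmd))
# To actually run it (requires mashmap to be installed):
#   import subprocess; subprocess.run(cmd, check=True)
```

Returning a list rather than a single string keeps file names with spaces safe when the command is passed to subprocess.run.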
