
MashMap: a fast and approximate software for mapping long reads (PacBio/ONT) or assembly to reference genome(s)

https://github.com/marbl/MashMap

MashMap is a fast, approximate tool for mapping long reads (PacBio/ONT) or assemblies to one or more reference genomes. It maps a query sequence to a reference region if and only if the estimated alignment identity is above a specified threshold. It does not compute alignments explicitly; instead, it estimates a k-mer based Jaccard similarity using a combination of winnowing and MinHash, and then converts that estimate to a sequence identity using the Mash distance. An appropriate k-mer sampling rate is determined automatically from the minimum local alignment length and identity thresholds. The algorithm becomes more efficient as both of these thresholds are increased.
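The Jaccard-to-identity step described above can be illustrated with a small Python sketch. It is not MashMap's implementation: winnowing is left out, only the forward strand is considered, the hash function is a stand-in, and all function names (`kmers`, `hash64`, `minhash_jaccard`, `mash_identity`) are hypothetical. It assumes the Mash distance formula D = -(1/k) ln(2J / (1 + J)) and treats 1 - D as the identity estimate.

```python
import hashlib
import math


def kmers(seq, k):
    """Return the set of all k-mers in seq (forward strand only, for simplicity)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hash64(kmer):
    """Deterministic 64-bit hash of a k-mer (a stand-in, not MashMap's hash)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_jaccard(a, b, s):
    """Estimate Jaccard similarity from the s smallest hashes of the union."""
    ha = {hash64(x) for x in a}
    hb = {hash64(x) for x in b}
    union_sketch = sorted(ha | hb)[:s]
    if not union_sketch:
        return 0.0
    # Count sketch entries that occur in both k-mer sets.
    shared = sum(1 for h in union_sketch if h in ha and h in hb)
    return shared / len(union_sketch)


def mash_identity(jaccard, k):
    """Convert a Jaccard estimate to an identity estimate via the Mash distance."""
    if jaccard <= 0.0:
        return 0.0
    distance = -math.log(2.0 * jaccard / (1.0 + jaccard)) / k
    return max(0.0, 1.0 - distance)


if __name__ == "__main__":
    k = 4  # tiny k for a toy example; MashMap's default is 16
    query = kmers("ACGTACGTAC", k)
    ref = kmers("ACGTTTTT", k)
    j = minhash_jaccard(query, ref, s=100)
    print(f"Jaccard estimate: {j:.3f}, identity estimate: {mash_identity(j, k):.3f}")
```

When the sketch size `s` is at least the number of distinct k-mers in the union, the estimate equals the exact Jaccard similarity; smaller values of `s` trade accuracy for speed, which is the point of sampling k-mers rather than comparing all of them.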

Comments

  • Jit 2355 days ago

    $ ./mashmap -h
    -----------------
    Mashmap is an approximate long read or contig mapper based on Jaccard
    similarity
    -----------------
    Example usage:
    $ mashmap -r ref.fa -q seq.fq [OPTIONS]
    $ mashmap --rl reference_files_list.txt -q seq.fq [OPTIONS]

    Available options
    -----------------
    -h, --help
    Print this help page

    -r <value>, --ref <value>
    an input reference file (fasta/fastq)[.gz]

    --refList <value>, --rl <value>
    a file containing list of reference files, one per line

    -q <value>, --query <value>
    an input query file (fasta/fastq)[.gz]

    --ql <value>, --queryList <value>
    a file containing list of query files, one per line

    -s <value>, --segLength <value>
    mapping segment length [default : 5,000]
    sequences shorter than segment length will be ignored

    --noSplit
    disable splitting of input sequences during mapping [enabled by default]

    --perc_identity <value>, --pi <value>
    threshold for identity [default : 85]

    -t <value>, --threads <value>
    count of threads for parallel execution [default : 1]

    -o <value>, --output <value>
    output file name [default : mashmap.out]

    -k <value>, --kmer <value>
    kmer size <= 16 [default : 16]

    -f <value>, --filter_mode <value>
    filter modes in mashmap: 'map', 'one-to-one' or 'none' [default: map]
    'map' computes best mappings for each query sequence
    'one-to-one' computes best mappings for query as well as reference sequence
    'none' disables filtering
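
The options listed in the help output can be combined into a single invocation. The following is a minimal Python sketch for assembling such a command, assuming a `mashmap` binary is available on the PATH. `build_mashmap_cmd` is a hypothetical helper, not part of MashMap; it uses only flags shown in the help text above, with the defaults documented there.

```python
import shlex

# Filter modes as listed in the help output above.
VALID_FILTER_MODES = {"map", "one-to-one", "none"}


def build_mashmap_cmd(ref, query, seg_length=5000, perc_identity=85,
                      threads=1, output="mashmap.out", kmer=16,
                      filter_mode="map", no_split=False):
    """Build an argument list for mashmap (hypothetical helper)."""
    if kmer > 16:
        raise ValueError("kmer size must be <= 16 according to the help text")
    if filter_mode not in VALID_FILTER_MODES:
        raise ValueError(f"unknown filter mode: {filter_mode!r}")
    cmd = [
        "mashmap",
        "-r", ref,
        "-q", query,
        "-s", str(seg_length),
        "--pi", str(perc_identity),
        "-t", str(threads),
        "-o", output,
        "-k", str(kmer),
        "-f", filter_mode,
    ]
    if no_split:
        cmd.append("--noSplit")
    return cmd


if __name__ == "__main__":
    cmd = build_mashmap_cmd("ref.fa", "seq.fq", threads=4)
    # Print the shell-quoted command; to execute it, pass the list to
    # subprocess.run(cmd, check=True) on a machine with mashmap installed.
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell-quoting problems with file names that contain spaces.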