LAMSA: fast split read alignment with long approximate matches

https://github.com/hitbc/LAMSA

LAMSA (Long Approximate Matches-based Split Aligner) is a split alignment approach designed for speed and for robust handling of structural variation (SV) events. It is well suited to aligning long reads (thousands of base pairs or longer).

LAMSA takes advantage of the rareness of SVs to implement a specifically designed two-step strategy. First, it splits the read into relatively long fragments and aligns them co-linearly, which resolves small variations and sequencing errors and mitigates the effect of repeats. The fragment alignments are then fed into a sparse dynamic programming (SDP)-based split alignment step that handles large or non-co-linear variants.
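The first step of this strategy can be illustrated with a minimal Python sketch. It is not LAMSA's implementation: the function names, the `max_diag_shift` tolerance, and the reading of the seed interval as the gap between the end of one fragment and the start of the next are all assumptions made for illustration. The defaults of 50 and 100 mirror the `--seed-len` and `--seed-inv` defaults listed in the usage text, and a simple quadratic chaining pass stands in for the real co-linear alignment.

```python
def seed_fragments(read, seed_len=50, seed_inv=100):
    """Return (read_offset, fragment) pairs sampled along the read.

    Assumption: seed_inv is the gap between the end of one fragment
    and the start of the next, so fragments start every
    seed_len + seed_inv bases.
    """
    step = seed_len + seed_inv
    return [(i, read[i:i + seed_len])
            for i in range(0, len(read) - seed_len + 1, step)]


def longest_colinear_chain(hits, max_diag_shift=10):
    """Pick the longest chain of hits that stay near one diagonal.

    hits: list of (read_offset, ref_position), one per fragment.
    max_diag_shift is a hypothetical tolerance for small indels.
    """
    hits = sorted(hits)
    n = len(hits)
    if n == 0:
        return []
    best = [1] * n    # best[j]: length of the best chain ending at hit j
    prev = [-1] * n   # back-pointers used to rebuild the chain
    for j in range(n):
        rj, gj = hits[j]
        for i in range(j):
            ri, gi = hits[i]
            same_order = ri < rj and gi < gj
            diag_shift = abs((gj - rj) - (gi - ri))
            if same_order and diag_shift <= max_diag_shift:
                if best[i] + 1 > best[j]:
                    best[j] = best[i] + 1
                    prev[j] = i
    end = max(range(n), key=best.__getitem__)
    chain = []
    while end != -1:
        chain.append(hits[end])
        end = prev[end]
    return chain[::-1]
```

In this toy model, a fragment whose hit lands far from the shared diagonal (for example in a repeat copy, or across an SV breakpoint) simply drops out of the chain. Stretches of the read left uncovered that way are the kind of region a second, split-alignment step would then need to resolve.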

We benchmarked LAMSA on simulated and real datasets with various read lengths and sequencing error rates. The results demonstrate that it is substantially faster than state-of-the-art long read aligners, while also handling various categories of SVs well.

LAMSA is open source and free for non-commercial use.

LAMSA was mainly designed by Bo Liu and Yan Gao and developed by Yan Gao at the Center for Bioinformatics, Harbin Institute of Technology, China.

Comments

  • Jit 2114 days ago

    ➜ LAMSA git:(master) ./lamsa aln

    Usage: lamsa aln [options] <ref.fa> <read.fa/fq>

    Algorithm options:

    -t --thread [INT] Number of threads. [1]
    -l --seed-len [INT] Length of seeding fragments. [50]
    -i --seed-inv [INT] Distance between neighboring seeding fragments. [100]
    -p --max-loci [INT] Maximum allowed number of seeding fragments' hits. [200]
    -V --SV-len [INT] Expected maximum length of SV. [10000]
    -v --ovlp-rat [FLOAT] Minimum overlapping ratio to cluster two skeletons or alignment records. [0.70]
    -s --max-skel [INT] Maximum number of skeletons that are reserved in a cluster. [10]
    -R --max-reg [INT] Maximum allowed length of unaligned read part to trigger a bwt-based query. [300]
    -k --bwt-kmer [INT] Length of BWT-seed. [19]
    -f --fastest Use GEM-mapper's fastest mode(--fast-mapping=0). [false]

    Scoring options:

    -m --match-sc [INT] Match score for SW-alignment. [1]
    -M --mis-pen [INT] Mismatch penalty for SW-alignment. [3]
    -O --open-pen [INT(,INT,INT,INT)] Gap open penalty for SW-alignment (end2end-global: insertion, deletion, one-end-extend: insertion, deletion). [5(,5,5,5)]
    -E --ext-pen [INT(,INT,INT,INT)] Gap extension penalty for SW-alignment (end2end-global: insertion, deletion, one-end-extend: insertion, deletion). [2(,2,2,2)]
    -w --band-width[INT] Band width for banded-SW. [10]
    -b --end-bonus [INT] Penalty for end-clipping. [5]

    Read options:

    -e --err-rate [FLOAT] Maximum error rate of read. [0.04]
    -d --diff-rate [FLOAT] Maximum length difference ratio between read and reference. [0.04]
    -x --mis-rate [FLOAT] Maximum error rate of mismatch within reads. [0.04]

    -T --read-type [STR] Specifiy the type of reads and set multiple parameters unless overriden.
    [null] (Illumina Moleculo)
    pacbio (PacBio SMRT): -i25 -l50 -m1 -M1 -O1,1,2,2 -E1,1,1,1 -w200 -b0 -e0.30 -d0.30
    ont2d (Oxford Nanopore): -i25 -l50 -m1 -M1 -O1,1,1,1 -E1,1,1,1 -w100 -b0 -e0.25 -d0.10

    Output options:

    -r --max-out [INT] Maximum number of output records for a specific split read region. [10]
    -g --gap-split [INT] Minimum length of gap that causes a split-alignment. [100]
    -S --soft-clip Use soft clipping for supplementary alignment. [false]
    -C --comment Append FASTQ comment to SAM output. [false]
    -o --output [STR] Output file (SAM format). [stdout]

    -h --help Print this short usage.
    -H --HELP Print a detailed usage.
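As a rough illustration of how the options above combine, the following Python sketch assembles a `lamsa aln` command line. It uses only flags that appear in the usage text (`-t`, `-T`, `-o`). The helper name `build_lamsa_aln_cmd`, the `./lamsa` binary path, and the file names are hypothetical. The sketch assumes the reference has already been prepared as LAMSA requires, a step the usage text above does not cover.

```python
import subprocess

# Presets copied from the -T description in the usage text,
# kept here only for reference and validation.
READ_TYPE_PRESETS = {
    "pacbio": "-i25 -l50 -m1 -M1 -O1,1,2,2 -E1,1,1,1 -w200 -b0 -e0.30 -d0.30",
    "ont2d": "-i25 -l50 -m1 -M1 -O1,1,1,1 -E1,1,1,1 -w100 -b0 -e0.25 -d0.10",
}


def build_lamsa_aln_cmd(ref, reads, threads=1, read_type=None,
                        output=None, lamsa="./lamsa"):
    """Build an argument list for `lamsa aln` (hypothetical helper)."""
    cmd = [lamsa, "aln", "-t", str(threads)]
    if read_type is not None:
        if read_type not in READ_TYPE_PRESETS:
            raise ValueError(f"unknown read type: {read_type!r}")
        cmd += ["-T", read_type]
    if output is not None:
        cmd += ["-o", output]   # without -o, SAM goes to stdout
    cmd += [ref, reads]
    return cmd


def run_lamsa_aln(*args, **kwargs):
    """Run the command; raises CalledProcessError on a non-zero exit."""
    return subprocess.run(build_lamsa_aln_cmd(*args, **kwargs), check=True)
```

For example, `build_lamsa_aln_cmd("ref.fa", "reads.fq", threads=4, read_type="pacbio", output="out.sam")` yields the argument list for `./lamsa aln -t 4 -T pacbio -o out.sam ref.fa reads.fq`. According to the usage text, individual options given explicitly override the values set by `-T`.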