FastANI: fast alignment-free computation of whole-genome Average Nucleotide Identity (ANI)

https://github.com/ParBLiSS/FastANI

FastANI is designed for fast, alignment-free computation of whole-genome Average Nucleotide Identity (ANI). ANI is defined as the mean nucleotide identity of orthologous gene pairs shared between two microbial genomes. FastANI supports pairwise comparison of both complete and draft genome assemblies. Its underlying procedure follows a workflow similar to that described by Goris et al. (2007). However, it avoids expensive sequence alignments and instead uses Mashmap, a MinHash-based sequence mapping engine, to compute the orthologous mappings and alignment identity estimates. Based on our experiments with complete and draft genomes, its accuracy is on par with that of a BLAST-based ANI solver, and it achieves a speedup of two to three orders of magnitude. This makes it well suited to computing pairwise ANI for a large number of genome pairs. More details about its speed, accuracy, and potential applications are described in "High-throughput ANI Analysis of 90K Prokaryotic Genomes Reveals Clear Species Boundaries".
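The ANI definition above can be illustrated with a minimal sketch of its final averaging step. It assumes a hypothetical tab-separated file with one orthologous pair per line and that pair's percent identity in the third column; the function name `mean_identity` and the input layout are illustrative only. FastANI itself derives the mappings and identity estimates with Mashmap, so this does not reproduce the tool's procedure, only the "mean identity" part of the definition.

```shell
#!/usr/bin/env bash
# Minimal sketch: ANI as the mean identity of orthologous pairs.
# Input is a hypothetical TSV, one pair per line:
#   query_id <TAB> reference_id <TAB> percent_identity
mean_identity() {
  awk -F '\t' '
    NF >= 3 { sum += $3; n++ }     # add up the identity of every pair
    END {
      if (n == 0) exit 1           # no pairs: the mean is undefined
      printf "%.2f\n", sum / n     # report the mean to two decimals
    }' "$1"
}

# Usage with made-up identities (not real FastANI data):
pairs=$(mktemp)
printf 'g1\th1\t98.0\ng2\th2\t99.0\ng3\th3\t97.0\n' > "$pairs"
mean_identity "$pairs"   # prints 98.00
rm -f "$pairs"
```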

Comments

  • Neel 1332 days ago

    Usage Summary

    • Print the help page to quickly check the software usage and the available command-line options.
    $ ./fastANI -h
    • One to One. Compute ANI between a single query genome and a single reference genome:
    $ ./fastANI -q [QUERY_GENOME] -r [REFERENCE_GENOME] -o [OUTPUT_FILE]

    Here QUERY_GENOME and REFERENCE_GENOME are the query and reference genome assemblies, respectively, in FASTA or multi-FASTA format.

    • One to Many. Compute ANI between a single query genome and multiple reference genomes:
    $ ./fastANI -q [QUERY_GENOME] --rl [REFERENCE_LIST] -o [OUTPUT_FILE]

    For the above use case, REFERENCE_LIST should be a file containing paths to the reference genome files, one per line.

    • Many to Many. Compute ANI between multiple query genomes and multiple reference genomes:
    $ ./fastANI --ql [QUERY_LIST] --rl [REFERENCE_LIST] -o [OUTPUT_FILE]

    Again, QUERY_LIST and REFERENCE_LIST are files containing paths to genomes, one per line.
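The list files used by `--rl` and `--ql` hold one genome path per line. Below is a minimal sketch for generating such a file from a directory of assemblies; the function name `make_genome_list` is hypothetical, and the matched extensions (`.fna`, `.fa`, `.fasta`) are assumptions to adjust to your own file naming. Absolute paths are written so the list works regardless of where FastANI is launched.

```shell
#!/usr/bin/env bash
# Sketch: write one absolute genome path per line, the layout that
# --rl (and --ql) expect. The extensions matched below are assumptions;
# adjust them to however your assemblies are named.
make_genome_list() {
  local dir=$1 out=$2 abs
  abs=$(cd "$dir" && pwd) || return 1   # resolve to an absolute path
  find "$abs" -maxdepth 1 -type f \
    \( -name '*.fna' -o -name '*.fa' -o -name '*.fasta' \) \
    | sort > "$out"                      # sorted for reproducible lists
}

# Hypothetical usage (directory and file names are placeholders):
#   make_genome_list refs/ reference_list.txt
#   ./fastANI -q query.fna --rl reference_list.txt -o output.txt
```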
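In the Many to Many case, every query genome is compared with every reference genome, so the work grows as the product of the two list sizes. A small helper (the name `count_pairs` is hypothetical) can estimate how many pairs a run will cover before launching it.

```shell
#!/usr/bin/env bash
# Sketch: how many query-reference pairs a many-to-many run covers.
# Every genome in QUERY_LIST is compared with every genome in
# REFERENCE_LIST, so the number of pairs is |Q| x |R|.
count_pairs() {
  local q r
  q=$(grep -c . "$1")   # non-empty lines in the query list
  r=$(grep -c . "$2")   # non-empty lines in the reference list
  echo $(( q * r ))
}

# Hypothetical usage: all-vs-all within one set, reusing a single list
#   count_pairs genomes.txt genomes.txt
#   ./fastANI --ql genomes.txt --rl genomes.txt -o all_vs_all.txt
```

The output file may contain fewer lines than this count, since FastANI does not report pairs whose ANI falls well below 80%.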
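All three modes write their results to OUTPUT_FILE. The sketch below summarizes that file, assuming the five tab-separated columns the FastANI README describes (query, reference, ANI, count of bidirectional fragment mappings, total query fragments); verify this layout against the output of your installed version. The function name `summarize_ani` and the derived "mapped" ratio (mapped fragments over total query fragments) are illustrative additions, not FastANI output fields.

```shell
#!/usr/bin/env bash
# Sketch: summarize the file written by -o. Assumes the five tab-separated
# columns described in the FastANI README:
#   query  reference  ANI  matched_fragments  total_query_fragments
# Check this layout against the output of your installed version.
summarize_ani() {
  awk -F '\t' '
    NF >= 5 && $5 > 0 {
      frac = $4 / $5             # fraction of query fragments that mapped
      printf "%s\t%s\tANI=%.2f\tmapped=%.3f\n", $1, $2, $3, frac
    }' "$1"
}

# Hypothetical usage:
#   summarize_ani output.txt
```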