
  • Neelam Jha

ALE: a Generic Assembly Likelihood Evaluation Framework for Assessing the Accuracy of Genome and Metagenome Assemblies


The Assembly Likelihood Evaluation (ALE) framework is designed to overcome the limitations of existing assembly evaluation methods by systematically evaluating the accuracy of an assembly in a reference-independent manner using rigorous statistical methods. The framework is comprehensive, integrating read quality, mate-pair orientation and insert length (for paired-end reads), sequencing coverage, read alignment and k-mer frequency. ALE pinpoints synthetic errors in both single-genome and metagenomic assemblies, including single-base errors, insertions/deletions, genome rearrangements and chimeric assemblies present in metagenomes. At the genome level with real-world data, ALE identifies three large misassemblies in the Spirochaeta smaragdinae finished genome, all of which were independently validated by Pacific Biosciences sequencing. At the single-base level with Illumina data, ALE recovers 215 of 222 (97%) single nucleotide variants in a training set from a GC-rich Rhodobacter sphaeroides genome. Using real Pacific Biosciences data, ALE identifies 12 of 12 synthetic errors in a lambda phage genome, surpassing even Pacific Biosciences' own variant caller, EviCons. In summary, the ALE framework provides a comprehensive, reference-independent and statistically rigorous measure of single-genome and metagenome assembly accuracy, which can be used to identify misassemblies or to optimize the assembly process.

More at http://www.ncbi.nlm.nih.gov/pubmed/23303509
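To make the idea of a likelihood-style assembly score more concrete, here is a minimal toy sketch in Python. It is not ALE's implementation. It assumes, purely for illustration, that insert lengths follow a normal distribution, that read depth follows a Poisson distribution, and that the two kinds of evidence are independent, so their log-probabilities can be summed. All function names, parameters and data values are hypothetical.

```python
import math


def log_normal_pdf(x, mean, sd):
    """Log density of a normal distribution at x (assumed insert-length model)."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def log_poisson_pmf(k, lam):
    """Log probability of count k under a Poisson with mean lam (assumed depth model)."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def toy_assembly_log_score(insert_lengths, depths, insert_mean, insert_sd, expected_depth):
    """Sum the log-likelihoods of observed insert lengths and per-position depths."""
    insert_term = sum(log_normal_pdf(x, insert_mean, insert_sd) for x in insert_lengths)
    depth_term = sum(log_poisson_pmf(d, expected_depth) for d in depths)
    return insert_term + depth_term


# Hypothetical observations: a consistent region versus one with a stretched
# insert (900 bp instead of about 300 bp) and a drop in coverage.
good = toy_assembly_log_score([295, 300, 310], [30, 28, 31], 300, 20, 30)
bad = toy_assembly_log_score([295, 900, 310], [30, 2, 31], 300, 20, 30)

print(good > bad)  # True: the inconsistent region gets a lower log score
```

A region whose reads disagree with the expected insert size and coverage receives a much lower score, which is the general intuition behind flagging candidate misassemblies without a reference.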


  • Jit 426 days ago

    Thanks for reporting the updated tool for assembly validation. You can also try the following methods/pipelines:

    • CEGMA (formally discontinued but still useful)
    • BUSCO (we have issues with fish; it does not seem to be tailored to that group of organisms, and the developers tell us they are fixing it)
    • A linkage map or another map (e.g. RAD-tag based)? (Software?)
    • BioNano Genomics optical mapping can also be used for QC
    • Use a genome browser such as IGV to get a feeling for your results: load the assembly, BAM files, annotation and mapped transcripts, then browse
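
The genome browser suggestion above can be partly scripted. The following is a minimal sketch that builds an IGV batch script from Python, using IGV's batch commands `new`, `genome`, `load`, `snapshotDirectory`, `goto`, `snapshot` and `exit`. The file names, locus and helper function are hypothetical placeholders, not files from the post.

```python
def build_igv_batch(genome_fasta, tracks, locus, snapshot_dir, snapshot_name):
    """Return the text of an IGV batch script that loads tracks and takes one snapshot."""
    lines = ["new", f"genome {genome_fasta}"]
    # One 'load' command per track (BAM, annotation, mapped transcripts, ...)
    lines += [f"load {track}" for track in tracks]
    lines += [
        f"snapshotDirectory {snapshot_dir}",
        f"goto {locus}",
        f"snapshot {snapshot_name}",
        "exit",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical inputs: replace with your own assembly and alignment files.
script = build_igv_batch(
    genome_fasta="assembly.fasta",
    tracks=["reads.bam", "annotation.gff3", "transcripts.bam"],
    locus="contig1:1-20000",
    snapshot_dir="snapshots",
    snapshot_name="contig1_start.png",
)
print(script)
```

The printed text can be saved to a file and run in IGV via Tools > Run Batch Script, which is convenient when the same set of suspicious regions has to be inspected across several assemblies.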