
ALE: a Generic Assembly Likelihood Evaluation Framework for Assessing the Accuracy of Genome and Metagenome Assemblies

http://sc932.github.io/ALE/about.html

The Assembly Likelihood Evaluation (ALE) framework overcomes the limitations of existing evaluation methods by systematically evaluating the accuracy of an assembly in a reference-independent manner using rigorous statistical methods. The framework is comprehensive: it integrates read quality, mate pair orientation and insert length (for paired-end reads), sequencing coverage, read alignment and k-mer frequency. ALE pinpoints synthetic errors in both single-genome and metagenome assemblies, including single-base errors, insertions/deletions, genome rearrangements and chimeric assemblies present in metagenomes. At the genome level with real-world data, ALE identifies three large misassemblies in the Spirochaeta smaragdinae finished genome, all of which were independently validated by Pacific Biosciences sequencing. At the single-base level with Illumina data, ALE recovers 215 of 222 (97%) single nucleotide variants in a training set from a GC-rich Rhodobacter sphaeroides genome. Using real Pacific Biosciences data, ALE identifies 12 of 12 synthetic errors in a Lambda Phage genome, surpassing even Pacific Biosciences' own variant caller, EviCons. In summary, the ALE framework provides a comprehensive, reference-independent and statistically rigorous measure of single-genome and metagenome assembly accuracy, which can be used to identify misassemblies or to optimize the assembly process.
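To make the idea of combining several evidence types into one likelihood score more concrete, here is a toy Python sketch. It is not ALE's actual model or code: the function names, the normal distribution for insert lengths and the Poisson distribution for coverage are all assumptions chosen for illustration. It only shows how per-position log-likelihood terms can be summed so that a position with unexpected insert lengths and low coverage scores worse than a well-supported one.

```python
import math

def normal_logpdf(x, mean, sd):
    """Log density of a normal distribution at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)

def poisson_logpmf(k, lam):
    """Log probability of observing count k under a Poisson with mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def toy_position_score(insert_lengths, depth, mean_insert, sd_insert, mean_depth):
    """Sum illustrative log-likelihood terms for one assembly position.

    Hypothetical stand-in: a normal model for insert lengths and a
    Poisson model for read depth. ALE's real sub-scores differ.
    """
    insert_term = sum(normal_logpdf(x, mean_insert, sd_insert) for x in insert_lengths)
    depth_term = poisson_logpmf(depth, mean_depth)
    return insert_term + depth_term

# A well-supported position: inserts near the library mean, typical depth.
good = toy_position_score([298, 305, 301], depth=30,
                          mean_insert=300, sd_insert=10, mean_depth=30)

# A suspicious position: stretched inserts and a coverage drop.
bad = toy_position_score([298, 520, 610], depth=4,
                         mean_insert=300, sd_insert=10, mean_depth=30)

print(good > bad)
```

Working in log space keeps the terms additive and avoids numerical underflow when many reads contribute to a position.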

More at http://www.ncbi.nlm.nih.gov/pubmed/23303509

Comments

  • Jit 3134 days ago

    Thanks for reporting the updated tool for assembly validation. You can also try the following methods/pipelines:

    • CEGMA (formally discontinued but still useful)
    • BUSCO (we have issues with fish; it seems not to be tailored to that group of organisms, and the developers tell us they are fixing it)
    • linkage map? or other map (RAD-tag based). (software?)
    • BioNanoGenomics can be used for QC also
    • Use a genome browser to get a feeling for your results, e.g. IGV; add assembly, BAM files, annotation, transcripts mapped and browse