






Question: What is the best approach that has given good results in genome projects?

Abhimanyu Singh
2355 days ago


I am working on a genome assembly and am wondering which optimised method gives the best result. Your comments and suggestions are welcome.

Answers

One approach that has given good results in other genome projects is to run several cycles of SSPACE (scaffolding), GapCloser (gap closing) and REAPR (detecting misassemblies and breaking them). Usually, after 4-6 cycles this converges on an optimal assembly and the statistics (e.g. N50) no longer change. With this approach, I think it should be possible to improve the current assembly a bit further and reach an N50 above 100 kb.
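Since N50 is the statistic used here to judge when the cycles have converged, a minimal Python sketch of how it is computed may help when comparing assemblies between cycles. It assumes the scaffold lengths are read from an uncompressed FASTA file; the helper names `fasta_lengths` and `n50` are hypothetical and not part of any of the tools mentioned. Gap characters (N) are counted in the scaffold lengths, as is usual for scaffold N50.

```python
from typing import Iterable, List


def fasta_lengths(lines: Iterable[str]) -> List[int]:
    """Return the length of every sequence in FASTA-formatted lines.

    Gap characters (N) are counted, so scaffold lengths include gaps.
    """
    lengths: List[int] = []
    current = None  # length of the record being read; None before the first header
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: Iterable[int]) -> int:
    """Return the N50: the largest length L such that sequences of
    length >= L together cover at least half of the total assembly size."""
    ordered = sorted((n for n in lengths if n > 0), reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:  # integer-safe form of running >= total / 2
            return length
    return 0  # empty input
```

For example, `n50([100, 200, 300, 400, 500])` returns 400, and `n50(fasta_lengths(open("scaffolds.fasta")))` (placeholder file name) would give the scaffold N50 after a cycle.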

FYI, here is an excerpt from the Materials and Methods (M&M) section of one manuscript:
Raw sequences were trimmed for quality and the Illumina adapters were removed using Trimmomatic v0.32 (ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36). For each lineage, the three short insert libraries were merged and assembled using IDBA-UD v1.1.1 (--pre_correction). Scaffolding was performed with SSPACE_Standard v3.0 using the 500 bp, 800 bp and 5 kb libraries (-x 0 -k 4 -a 0.70). The gaps in the resulting scaffolds were closed with the three short insert libraries using SOAPdenovo2’s GapCloser module (default options). Scaffolding errors were identified and broken with REAPR v1.0.17. The 800 bp library was used to identify regions of high confidence (perfectfrombam 600 900 3 4 76) and the 5 kb library was used to identify assembly errors. SSPACE, GapCloser and REAPR were then repeated iteratively 5 more times. The completeness of the final assemblies was assessed with BUSCO.
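The iterative refinement described above can be sketched as a loop that repeats the scaffold, gap-closing and misassembly-breaking steps and stops once N50 no longer changes. This is a minimal Python sketch under stated assumptions: `scaffold`, `close_gaps`, `break_misassemblies` and `measure_n50` are hypothetical callables you would implement yourself (for instance by running SSPACE_Standard, GapCloser and REAPR with the parameters quoted in the excerpt and returning the path of the output FASTA); none of them is an API of those tools. `max_cycles=6` mirrors the excerpt's first round plus 5 repeats, and the `min_rel_change` stopping rule is an assumed stand-in for "the statistics do not change any more".

```python
from typing import Callable, List, Tuple

# Hypothetical step: takes an assembly (e.g. a FASTA path) and the cycle
# number, and returns the new assembly.
Step = Callable[[str, int], str]


def make_cycle(scaffold: Step, close_gaps: Step, break_misassemblies: Step) -> Step:
    """Compose one refinement round: scaffold -> close gaps -> break misassemblies.

    The three steps are hypothetical wrappers (e.g. around SSPACE, GapCloser, REAPR).
    """
    def cycle(assembly: str, i: int) -> str:
        scaffolds = scaffold(assembly, i)
        closed = close_gaps(scaffolds, i)
        return break_misassemblies(closed, i)
    return cycle


def refine_until_converged(
    assembly: str,
    cycle: Step,
    measure_n50: Callable[[str], int],
    max_cycles: int = 6,
    min_rel_change: float = 0.0,
) -> Tuple[str, List[int]]:
    """Run cycles until N50 stops changing or max_cycles is reached.

    Returns the final assembly and the N50 history, starting with the input's N50.
    """
    history = [measure_n50(assembly)]
    for i in range(1, max_cycles + 1):
        assembly = cycle(assembly, i)
        history.append(measure_n50(assembly))
        previous, current = history[-2], history[-1]
        # Stop once the relative N50 change between consecutive cycles
        # is within the tolerance.
        if previous > 0 and abs(current - previous) / previous <= min_rel_change:
            break
    return assembly, history
```

With `min_rel_change=0.0` the loop stops only when N50 is exactly unchanged; a small tolerance such as 0.01 (an arbitrary choice) would stop on near-convergence instead. Because REAPR breaks scaffolds, N50 may drop in some cycles, so the loop compares consecutive cycles rather than requiring N50 to increase.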