I assembled the genome after splitting the read data into two sets. Now that both sets are assembled, I am wondering what to do next. How do I validate the assemblies and check that everything went all right?
I have only used QUAST so far, and the results look OK to me. Any other suggestions?
Note: I assembled the genome with MIRA.
Answers
Here are the steps I suggest:
1) map the reads onto the contigs to determine the average coverage (and see whether 25% of the reads is too few or actually OK given the requirements of the assembler, MIRA);
2) plot the histogram of the per-base coverage to see whether it has two peaks (indicating that some alleles were not resolved) or a single peak; you can do this using SAMtools and BEDtools (specifically the genomeCoverageBed tool of BEDtools, i.e. bedtools genomecov);
3) try to scaffold the contigs (if it turns out that MIRA separated the haplotypes better than DDN but produced shorter contigs, you could try SSPACE-LongRead or Bambus2 to scaffold the MIRA assembly using the DDN assembly).
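Steps 1 and 2 above can be sketched in Python. This is only a rough illustration, assuming the reads were already mapped and sorted, and that the default histogram output of bedtools genomecov (run with -ibam on the sorted BAM) was saved to a text file. The function names, the smoothing window and the 1% threshold are my own choices for the example, not anything prescribed by MIRA, SAMtools or BEDtools.

```python
def parse_genomecov_hist(lines):
    """Collect {depth: number_of_bases} from the whole-genome rows.

    The default `bedtools genomecov` output has five tab-separated columns:
    name, depth, bases at that depth, total size, fraction. Rows whose first
    column is "genome" summarise all contigs together.
    """
    hist = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5 or fields[0] != "genome":
            continue  # skip per-contig rows and malformed lines
        hist[int(fields[1])] = int(fields[2])
    return hist


def mean_coverage(hist):
    """Average per-base depth over the whole assembly."""
    total_bases = sum(hist.values())
    if total_bases == 0:
        return 0.0
    return sum(depth * n for depth, n in hist.items()) / total_bases


def find_peaks(hist, window=1, min_fraction=0.01):
    """Return depths that are local maxima of the smoothed histogram.

    window and min_fraction are illustrative defaults (assumptions):
    counts are averaged over +/- `window` depths, and a peak must hold at
    least `min_fraction` of all bases. Depth 0 is never reported.
    """
    if not hist:
        return []
    counts = [hist.get(d, 0) for d in range(max(hist) + 1)]
    total_bases = sum(counts)

    # Simple moving average to damp single-depth noise.
    smoothed = []
    for i in range(len(counts)):
        lo, hi = max(0, i - window), min(len(counts), i + window + 1)
        smoothed.append(sum(counts[lo:hi]) / (hi - lo))

    peaks = []
    for d in range(1, len(smoothed) - 1):
        is_local_max = smoothed[d] > smoothed[d - 1] and smoothed[d] >= smoothed[d + 1]
        if is_local_max and smoothed[d] >= min_fraction * total_bases:
            peaks.append(d)
    return peaks
```

To use it, open the saved histogram file, pass the file object to parse_genomecov_hist, and then call mean_coverage and find_peaks on the result. One reported peak suggests uniform coverage; two suggest the situation described in step 2. It is still worth looking at the actual plot before drawing conclusions, since the peak detection here is deliberately crude.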