MIRA follows for reference-based assembly (like mapping the reads on the reference with xxx, etc.), and whether it is possible to have a figure with the workflow.
Answers
These are the steps, as described by @mira_talk:
1. Reference sequences are loaded and transformed into contigs, each contig containing exactly one read.
2. The sequence of each contig is cut into overlapping pieces (called rails). Length of rails: 2 × the length of the longest read to map. Overlap: the length of the longest read to map. Rails are inserted as pseudo-reads into the contigs.
3. SKIM (fast overlap search) of the reads to map against the rails.
4. Smith-Waterman (SW) check of potential mapping sites; only the best match of a read against a rail is kept.
5. Multiple-stage mapping:
5a Map reads whose SW overlap has “clean” ends, i.e., no mismatch in the last 28 bp at the left and right of the overlap.
5b Repeat 5a for 24, 20, 16, 12, 8, 4 bp.
5c Map reads where the SW overlap has one mismatch.
5d Map reads where the SW overlap has one indel.
5e Map reads where the SW overlap has two mismatches.
5f Map reads where the SW overlap has one mismatch and one indel.
5g Map reads where the SW overlap has two indels.
5h Map reads where the SW overlap has 3 differences (regardless of whether they are mismatches or indels).
5i Repeat 5h with 4, 5, 6, … n differences, until the maximum allowed number of differences is reached.
6. (Optional, but do the following exactly once:) If bootstrapping is used, use the now available alignment to calculate a consensus. This consensus is used to recalculate the rail sequences, which then contain the current best guess of the mapped sequence. Remove all mapped reads and go back to 5a.
7. Remove rails from contigs.
8. Search for discrepancies between the original reference sequences and the mapped sequence, and mark them with tags for SNPs.
9. Write results.
10. Done.
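
To make steps 2 and 5 a bit more concrete, here is a rough Python sketch. It is not MIRA's code, just an illustration of the description above. The function names `cut_rails` and `mapping_stages` are made up for this example. How the last rail at the end of a reference is handled, and whether stages 5c to 5g are skipped when they exceed the allowed number of differences, are my assumptions.

```python
from typing import Iterator, List, Tuple


def cut_rails(reference: str, longest_read: int) -> List[Tuple[int, str]]:
    """Cut a reference into overlapping rails (step 2).

    Rail length is 2 * longest_read and consecutive rails overlap by
    longest_read, so each rail starts longest_read bases after the
    previous one. Returns (start_offset, rail_sequence) pairs.
    """
    if longest_read <= 0:
        raise ValueError("longest_read must be positive")
    rail_len = 2 * longest_read
    step = rail_len - longest_read  # overlap == longest_read
    rails = []
    start = 0
    while True:
        # Assumption: the final rail may be shorter than rail_len.
        rails.append((start, reference[start:start + rail_len]))
        if start + rail_len >= len(reference):
            break
        start += step
    return rails


def mapping_stages(max_differences: int) -> Iterator[str]:
    """Yield labels for the stages of step 5 in the order they are tried."""
    # 5a and 5b: clean overlap ends of 28, 24, ..., 4 bp
    for clean_bp in range(28, 0, -4):
        yield f"clean ends: {clean_bp} bp"
    # 5c to 5g: fixed (mismatches, indels) combinations
    for mismatches, indels in [(1, 0), (0, 1), (2, 0), (1, 1), (0, 2)]:
        # Assumption: skip stages beyond the allowed number of differences.
        if mismatches + indels <= max_differences:
            yield f"{mismatches} mismatch(es), {indels} indel(s)"
    # 5h and 5i: any kind of difference, from 3 up to the maximum
    for n in range(3, max_differences + 1):
        yield f"{n} differences of any kind"


if __name__ == "__main__":
    for offset, rail in cut_rails("ACGTACGTAC", 2):
        print(offset, rail)
    for stage in mapping_stages(5):
        print(stage)
```

The point of the 2×/1× rail geometry is that any read up to the longest read length fits entirely inside at least one rail, so a read never has to be matched across a rail boundary.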