Reapr is a tool that tries to find explicit errors in an assembly based on incongruently mapped reads. It relies heavily on finding regions where the span coverage is too low, or where reads map too far from or too close to each other. The program will also break up contigs/scaffolds at these suspect sites to form smaller (but hopefully correct) contigs. Reapr runs pretty slowly, sadly.
Reapr is a bit fussy about contig names, but luckily it gives us a tool to check that things are OK before we proceed. The command reapr facheck <assembly.fasta> will tell you whether everything is OK. In this case, no output is good output, since the command only reports potential problems with the contig names. If you run into any problems, run reapr facheck <assembly.fasta> <renamed_assembly.fasta>, and you will get an assembly file with renamed contigs.
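The name check can be scripted roughly as follows. This is a minimal sketch: assembly.fasta and renamed_assembly are placeholder names, and the logic relies only on the behaviour described above, namely that a clean assembly produces no output. The Reapr manual describes the second argument as an output prefix, so the renamed file may be written as renamed_assembly.fa; treat that as an assumption and check the usage message of your version.

```shell
# Placeholder names (assumptions); substitute your own files.
assembly="assembly.fasta"
renamed="renamed_assembly"
to_use="$assembly"

if command -v reapr >/dev/null 2>&1; then
    # No output from facheck means the contig names are fine.
    problems=$(reapr facheck "$assembly" 2>&1)
    if [ -n "$problems" ]; then
        echo "Contig name problems found:"
        echo "$problems"
        # Write a copy of the assembly with renamed contigs.
        # Depending on the version, the file may be named ${renamed}.fa.
        reapr facheck "$assembly" "$renamed"
        to_use="$renamed"
    fi
else
    echo "reapr is not on PATH; skipping the name check" >&2
fi

echo "Assembly to use for the next steps: $to_use"
```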
Once the names are ok, we continue:
The first thing Reapr needs is a list of all “perfect” reads. These are reads that map perfectly to the reference (here, the assembly being evaluated). Reapr is finicky though, and can’t use libraries with different read lengths, so you’ll have to use assemblies based on the raw data for this. Run the command reapr perfectmap to get information on how to create a perfect mapping file, and create a perfect mapping called <assembler>_perfect. This should take about a minute.
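As a rough guide, a perfectmap call could look like the sketch below. Every value is a placeholder: the assembler name spades, the read file names, and the insert size of 300 are assumptions, not values from this exercise. The argument order (assembly, forward reads, reverse reads, average insert size, output prefix) follows the Reapr usage message as commonly documented, so confirm it against the output of reapr perfectmap on your system.

```shell
# All values are placeholders (assumptions), not taken from the exercise data.
assembler="spades"              # hypothetical assembler name
assembly="${assembler}.fasta"   # assembly built from the raw reads
reads_1="reads_1.fastq"         # forward reads, hypothetical file name
reads_2="reads_2.fastq"         # reverse reads, hypothetical file name
insert_size=300                 # assumed average insert size of the library
perfect_prefix="${assembler}_perfect"

if command -v reapr >/dev/null 2>&1; then
    reapr perfectmap "$assembly" "$reads_1" "$reads_2" "$insert_size" "$perfect_prefix"
else
    # Without reapr installed, just show the command that would be run.
    echo "would run: reapr perfectmap $assembly $reads_1 $reads_2 $insert_size $perfect_prefix"
fi
```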
The next tool we need is reapr smaltmap, which creates a BAM file of read-pair mappings. As you did with perfectmap, run it without arguments to see how it is used, and then create an output file called <assembler>_smalt.bam. This should take about twenty minutes.
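The mapping step might be sketched like this. The assembler name and read file names are placeholders (assumptions), and the argument order (assembly, forward reads, reverse reads, output BAM) should be checked against the usage message printed by reapr smaltmap.

```shell
# Placeholder values (assumptions); substitute your own.
assembler="spades"              # hypothetical assembler name
assembly="${assembler}.fasta"
reads_1="reads_1.fastq"         # forward reads, hypothetical file name
reads_2="reads_2.fastq"         # reverse reads, hypothetical file name
smalt_bam="${assembler}_smalt.bam"

if command -v reapr >/dev/null 2>&1; then
    # Maps the read pairs to the assembly and writes a BAM file.
    reapr smaltmap "$assembly" "$reads_1" "$reads_2" "$smalt_bam"
else
    echo "would run: reapr smaltmap $assembly $reads_1 $reads_2 $smalt_bam"
fi
```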
Finally, we can use the smalt mapping and the perfect mapping to run the Reapr pipeline. Run reapr pipeline to get help on how to run it, and then run the pipeline. Store the results in reapr_<assembler>. This should take about ten minutes.
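Putting the two inputs together, the pipeline call could look like the sketch below. The assembler name spades is a placeholder, and the argument order (assembly, BAM file, output directory, perfect mapping prefix) is an assumption to verify against the help text from reapr pipeline.

```shell
# Placeholder values (assumptions); they must match the names used in the earlier steps.
assembler="spades"                       # hypothetical assembler name
assembly="${assembler}.fasta"
smalt_bam="${assembler}_smalt.bam"       # output of reapr smaltmap
perfect_prefix="${assembler}_perfect"    # output prefix of reapr perfectmap
out_dir="reapr_${assembler}"

if command -v reapr >/dev/null 2>&1; then
    reapr pipeline "$assembly" "$smalt_bam" "$out_dir" "$perfect_prefix"
else
    echo "would run: reapr pipeline $assembly $smalt_bam $out_dir $perfect_prefix"
fi
```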
There are several checks you can do after running Reapr (detailed here), but for now we’ll stick to looking at the split output file, called 04.break.broken_assembly.fa. Use this file together with the original assembly to generate a Quast report. How do the results look after Reapr?
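One way to produce the comparison is sketched below. It assumes Quast is installed as quast.py (some installations call it quast), that the Reapr results are in reapr_spades, and that quast_reapr_spades is an acceptable report directory; all three names are placeholders.

```shell
# Placeholder values (assumptions); substitute your own.
assembler="spades"                                        # hypothetical assembler name
assembly="${assembler}.fasta"                             # original assembly
broken="reapr_${assembler}/04.break.broken_assembly.fa"   # assembly after Reapr breaks
report_dir="quast_reapr_${assembler}"                     # hypothetical report directory

if command -v quast.py >/dev/null 2>&1; then
    # Passing both files gives one report with the two assemblies side by side.
    quast.py -o "$report_dir" "$assembly" "$broken"
else
    echo "quast.py not found; would compare $assembly with $broken"
fi
```

Comparing the two columns of the report shows how much the breaks change contig counts and N50 relative to the original assembly.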
You can also try LASER, which produces the same evaluation as Quast but is 5.6 times faster and uses half the memory.
Chromosomer is a reference-based genome arrangement tool that rapidly builds chromosomes from genome contigs or scaffolds using their alignments to a reference genome of a closely related species. Chromosomer does not require mate-pair libraries, and it offers a number of auxiliary tools that implement common operations accompanying the genome assembly process.
More at http://gigascience.biomedcentral.com/articles/10.1186/s13742-016-0141-6
Script at https://github.com/gtamazian/chromosomer
MIRA5 is ready to use; see the guide at http://mira-assembler.sourceforge.net/docs/DefinitiveGuideToMIRA.html