pyScaf orders contigs from genome assemblies utilising several types of information.
Scaffolding
In reference-based mode, pyScaf uses synteny to the genome of a closely related species to order contigs and estimate distances between adjacent contigs.
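As a rough illustration of the idea, the sketch below sorts contigs by where they align on each reference chromosome and takes the distance between neighbouring alignments as the gap estimate. This is not pyScaf's actual code: the `Hit` record, the `order_contigs` function and the 0-based, half-open coordinates are all assumptions made for this example.

```python
from collections import namedtuple

# Hypothetical record for one contig-to-reference alignment (not a pyScaf type).
# Coordinates are assumed to be 0-based and half-open: [ref_start, ref_end).
Hit = namedtuple("Hit", "contig ref ref_start ref_end")


def order_contigs(hits):
    """Order contigs along each reference chromosome and estimate gaps.

    Returns {ref: [(contig, gap_to_next_contig), ...]}; the last contig on
    each chromosome gets a gap of 0.
    """
    by_ref = {}
    for hit in hits:
        by_ref.setdefault(hit.ref, []).append(hit)

    scaffolds = {}
    for ref, ref_hits in by_ref.items():
        # Contig order follows alignment start position on the reference.
        ref_hits.sort(key=lambda h: h.ref_start)
        layout = []
        for prev, cur in zip(ref_hits, ref_hits[1:]):
            # Distance between adjacent alignments; clamp overlaps to 0.
            gap = max(cur.ref_start - prev.ref_end, 0)
            layout.append((prev.contig, gap))
        layout.append((ref_hits[-1].contig, 0))
        scaffolds[ref] = layout
    return scaffolds


hits = [
    Hit("c2", "chr1", 5000, 9000),
    Hit("c1", "chr1", 0, 4000),
    Hit("c3", "chr2", 100, 2100),
]
scaffolds = order_contigs(hits)
```

Here `c1` precedes `c2` on `chr1` with an estimated gap of 1000 bp. A real scaffolder would also need to handle contig orientation and partial alignments, which this sketch leaves out.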
Contigs are aligned globally (end-to-end) onto reference chromosomes, ignoring matches that do not satisfy the --identity and --overlap cut-offs.
In preliminary tests, pyScaf performed superbly on simulated heterozygous genomes based on C. parapsilosis (13 Mb; CANPA) and A. thaliana (119 Mb; ARATH) chromosomes, reconstructing all chromosomes correctly in all cases for CANPA and in nearly all cases for ARATH (Figures in dropbox, CANPA table, ARATH table).
Runs took ~0.5 min for CANPA on 4 CPUs and ~2 min for ARATH on 16 CPUs.
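The identity and overlap cut-offs mentioned above can be pictured as a simple filter over alignments. The sketch below is only an assumption about how such a filter might look: the `filter_hits` function, the tuple layout and the definitions of identity (matches / aligned length) and overlap (aligned length / contig length) are illustrative, and pyScaf may compute them differently.

```python
def filter_hits(hits, min_identity, min_overlap):
    """Keep contigs whose alignment passes both cut-offs.

    Each hit is a hypothetical tuple:
    (contig, contig_len, aligned_len, matches).
    """
    kept = []
    for contig, contig_len, aligned_len, matches in hits:
        # Fraction of aligned positions that are identical.
        identity = matches / aligned_len if aligned_len else 0.0
        # Fraction of the contig covered by the alignment.
        overlap = aligned_len / contig_len if contig_len else 0.0
        if identity >= min_identity and overlap >= min_overlap:
            kept.append(contig)
    return kept


example_hits = [
    ("c1", 1000, 900, 855),  # high identity, high overlap
    ("c2", 1000, 300, 290),  # high identity, low overlap
    ("c3", 1000, 950, 400),  # low identity, high overlap
]
kept = filter_hits(example_hits, min_identity=0.8, min_overlap=0.5)
```

With these illustrative thresholds only `c1` is kept; `c2` fails the overlap cut-off and `c3` fails the identity cut-off.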
Important remarks: