GMcloser uses likelihood-based classifiers calculated from the alignment statistics between scaffolds, contigs and paired-end reads to correctly assign contigs or long reads to gap regions of scaffolds, thereby achieving accurate and efficient gap...
npScarf (jsa.np.npscarf) is a program that connect contigs from a draft genomes to generate sequences that are closer to finish. These pipelines can run on a single laptop for microbial datasets. In real-time mode, it can be integrated with simple...
The Blast Extension and Assembly Program (BEAP) is a computer program that uses a short starting DNA fragment, often a EST or partial gene segment, as "primer", to recursively blast nucleotide databases in an attempt to obtain all sequences that...
We are pleased to release PRICE (Paired-Read Iterative Contig Extension), a de novo genome assembler implemented in C++. Its name describes the strategy that it implements for genome assembly: PRICE uses paired-read information to iteratively...
HALC, a high throughput algorithm for long read error correction. HALC aligns the long reads to short read contigs from the same species with a relatively low identity requirement so that a long read region can be aligned to at least one contig...
Contigs assembled from the corresponding short reads in FASTA format.
The initial short reads in FASTA format (only for -ordinary mode; obtained with cat left_reads.fa >short_reads.fa and then cat right_reads.fa >>short_reads.fa).
Cerulean extends contigs assembled using short read datasets like Illumina paired-end reads using long reads like PacBio RS long reads.
Cerulean v0.1 has been implemented with bacterial genomes in mind.
The method is fully described in...
PERGA - Paired End Reads Guided Assembler
PERGA is a novel sequence reads guided de novo assembly approach which adopts greedy-like prediction strategy for assembling reads to contigs and scaffolds. Instead of using single-end reads to construct...
EAGLER is a scaffolding tool for long reads. The scaffolder takes as input a draft genome created by any NGS assembler and a set of long reads. The long reads are used to extend the contigs present in the NGS draft and possibly join overlapping...
One code to find them all is a set of perl scripts to extract useful information from RepeatMasker about transposable elements, retrieve their sequences and get some quantitative information.
Assemble RepeatMasker hits into complete TE copies,...
MCMCTREE is a phylogenetic program for Bayesian estimation of species divergence times using soft fossil constraints under various molecular clock models. This is part of the PAML package. In this tutorial I will analyze an easy...
GLEAN is an unsupervised learning system to integrate disparate sources of gene structure evidence (gene model predictions, EST/protein genomic sequence alignments, SAGE/peptide tags, etc) to produce a consensus gene prediction, without prior...
Gblocks eliminates poorly aligned positions and divergent regions of a DNA or protein alignment so that it becomes more suitable for phylogenetic analysis. This server implements the most important features of the Gblocks program to make its...
There are many tools to perform gap filling using Illumina short reads, for example "GapFiller: a de novo assembly approach to fill the gap within paired reads" or "Toward almost closed genomes with GapFiller". There are also some tools like...
Ranbow is a haplotype assembler for polyploid genomes. It has been developed for the haplotype assembly of the hexaploid sweet potato genome, which is highly heterozygous. Ranbow can also be applied to other polyploid genomes. After a first phasing,...
BFC is a standalone high-performance tool for correcting sequencing errors from Illumina sequencing data. It is specifically designed for high-coverage whole-genome human data, though also performs well for small genomes.
The BFC algorithm is a...
CrossMap is a program for convenient conversion of genome coordinates (or annotation files) between different assemblies (such as Human hg18 (NCBI36) <> hg19 (GRCh37), Mouse mm9 (MGSCv37) <> mm10 (GRCm38)).
It supports most commonly...