Genome assemblers generally take a file of short sequence reads and a file of quality values as input. Since the quality-value file for the high throughput short reads is usually highly memory-intensive, only a few assemblers, best suited for your assembly. To save memory and make the data easier to query, high-throughput short-read data is always first converted into a specific data structure. Existing data structures for this purpose fall mainly into two categories: string-based models and graph-based models.
We therefore list many genome assembly tools here. Some were mainly reported for the assembly of genomes, while others are designed to handle complex genomes.
The purpose of this Python module is to help scientists use optical map data.
Once complete, it will encapsulate and abstract optical maps and their most common manipulations as they exist in a variety of formats.
A lightweight, resource-efficient assembly algorithm for high-throughput sequencing reads.
System requirements
A 64-bit machine with the g++ compiler (or gcc in general), and the pthreads and zlib libraries.
QUAST evaluates genome assemblies.
QUAST works both with and without a reference genome.
The tool accepts multiple assemblies and is therefore suitable for comparing them.
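As an illustration of what comparing assemblies can involve, the sketch below computes N50, a contiguity statistic commonly reported for assemblies. It is not QUAST's code; the function name `n50` and the contig lengths are made up for this example.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list).

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0


# Hypothetical contig lengths for two assemblies of the same genome.
assemblies = {
    "assembly_a": [100, 80, 60, 40, 20],
    "assembly_b": [150, 50, 50, 25, 25],
}
n50_by_assembly = {name: n50(lengths) for name, lengths in assemblies.items()}
for name, value in n50_by_assembly.items():
    print(name, value)
```

Both example assemblies have the same total length, so the higher N50 of the second one reflects only that more of its sequence sits in long contigs.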
DNA Assembly Benchmark for Nanopore long reads
A system for benchmarking DNA assembly tools, based on 3rd generation sequencers.
ARC is a pipeline which facilitates iterative, reference-guided de novo assemblies with the intent of:
1. Reducing time in analysis and increasing accuracy of results by only considering those reads which should assemble together.
2. Reducing/removing reference bias as compared to mapping-based approaches.
directory). The assembly not only encodes SNPs and short INDELs, but also retains long deletions, novel sequence insertions, translocations and copy numbers.
REPdenovo is designed for constructing repeats directly from sequence reads. It is based on the idea of frequent k-mer assembly. REPdenovo provides many functionalities and can generate much longer repeats than existing tools. The overall pipeline is shown in the manual file. REPdenovo supports the following main functionalities.
1. Assembly. This step first performs k-mer counting. We then find the frequent k-mers, whose frequencies exceed a certain threshold, and assemble them into consensus repeats (in the form of contigs). Finally, we merge the constructed contigs into more complete ones.
2. Scaffolding. We use paired-end reads to connect repeat contigs into scaffolds, and also provide the average coverage (which indicates the copy number) of each constructed repeat.
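The first part of the assembly step, counting k-mers and keeping the frequent ones, can be sketched as follows. This is a minimal illustration, not REPdenovo's implementation: the function names, the toy reads, and the choice of an inclusive `min_count` cutoff are assumptions, and the later steps (assembling k-mers into contigs, merging, scaffolding) are not shown.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def frequent_kmers(counts, min_count):
    """Keep k-mers seen at least min_count times (cutoff is an assumption)."""
    return {kmer: n for kmer, n in counts.items() if n >= min_count}


# Toy reads, invented for this example.
reads = ["ACGTAC", "CGTACC", "TTTTGG"]
counts = count_kmers(reads, k=3)
repeats = frequent_kmers(counts, min_count=2)
print(sorted(repeats))
```

A real data set would need a memory-efficient k-mer counter and would also have to treat a k-mer and its reverse complement as the same sequence, which this sketch ignores.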
A color space assembly must be translated into bases before bioinformatics analyses can be applied. SATRAP is designed to accomplish this important task using a very efficient strategy. The package integrates the Oases pipeline and several optimizations specifically designed for color space management. Together, the steps of the pipeline produce a SOLiD de novo transcriptome assembly and then translate it from color space. Alternatively, SATRAP can be used as a standalone program to perform color space translation for either RNA-seq or DNA-seq SOLiD assemblies.
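To make color space translation concrete, the sketch below decodes a color-space string with the standard SOLiD two-base encoding, in which each color (0 to 3) describes the transition between two adjacent bases. It shows only this naive decoding, not SATRAP's strategy; the function name `decode_color_space` and the example inputs are assumptions.

```python
# With A=0, C=1, G=2, T=3, the SOLiD color between two adjacent bases
# equals the XOR of their codes (e.g. A->C is color 1, A->T is color 3).
BASE_TO_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_TO_BASE = "ACGT"


def decode_color_space(sequence):
    """Translate a known first base followed by color digits into bases.

    The leading base is kept in the output. Missing color calls such as
    '.' are not handled and raise ValueError.
    """
    first_base, colors = sequence[0], sequence[1:]
    code = BASE_TO_CODE[first_base]
    bases = [first_base]
    for color in colors:
        code ^= int(color)  # apply the transition encoded by this color
        bases.append(CODE_TO_BASE[code])
    return "".join(bases)


print(decode_color_space("T0123"))
```

Because every base depends on all the colors before it, a single wrong color changes every base downstream of it in this naive decoding.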
MAIA (Multiple Assembly IntegrAtion) is an algorithm to integrate multiple genome assemblies. For example, assemblies originating from:
– Different runs of a de novo assembler
– Assemblies of different data types
– Comparative assemblies
SEQLandscape is an application for generating and visualizing a sequence landscape.
HyDA-Vista: Towards Optimal Guided Selection of k-mer Size for Sequence Assembly.