github.com - FastProNGS to integrate the quality control process with automatic adapter removal. Parallel processing was implemented to speed up the process by allocating multiple threads. Compared with similar up-to-date preprocessing tools, FastProNGS is by...
If we only had Illumina reads, we could also assemble these using the tool Spades.
You can try this here, or try it later on your own data.
Get data
We will use the same Illumina data as we used above:
illumina_R1.fastq.gz: the Illumina...
www.cs.helsinki.fi - LoRMA is a tool for correcting sequencing errors in long reads such those produced by Pacific Biosciences sequencing machines.
Publication:
L. Salmela, R. Walve, E. Rivals, and E. Ukkonen: Accurate selfcorrection of errors in long reads using de...
www.ncbi.nlm.nih.gov - We describe a new algorithm, meraculous, for whole genome assembly of deep paired-end short reads, and apply it to the assembly of a dataset of paired 75-bp Illumina reads derived from the 15.4 megabase genome of the haploid yeast Pichia...
github.com - HECIL—Hybrid Error Correction with Iterative Learning—a hybrid error correction framework that determines a correction policy for erroneous long reads, based on optimal combinations of decision weights obtained from short read...
github.com - Created by Stephen Johnson, Brett Trost, Dr. Jeffrey R. Long, Dr. Anthony Kusalik University of Saskatchewan, Department of Computer Science
BEAR is intended to be an easy-to-use collection of scripts for generating simulated WGS metagenomic reads...
www.bx.psu.edu - A computational pipeline for identifying single-base substitutions between two closely related genomes without the help of a reference genome. DIAL works even when the depth of coverage is insufficient for de novo assembly, and it can be extended to...
http://assemblytics.com/ - Download and install MUMmer
Align your assembly to a reference genome using nucmer (from MUMmer package)
$ nucmer -maxmatch -l 100 -c 500 REFERENCE.fa ASSEMBLY.fa -prefix OUT
Consult the MUMmer manual if you encounter problems
Optional: Gzip...
http://last.cbrc.jp/ - LAST can:
Handle big sequence data, e.g:
Compare two vertebrate genomes
Align billions of DNA reads to a genome
Indicate the reliability of each aligned column.
Use sequence quality data properly.
Compare DNA to proteins, with...
github.com - Next-gen sequence data such as Illumina HiSeq reads. Data must be sorted into folders by taxon (e.g. species or genus). Paired reads in fastq format must be specified by _R1 and _R2 in the (otherwise identical) filenames. Paired and unpaired reads...