github.com - Second generation sequencing technologies paved the way to an exceptional increase in the number of sequenced genomes, both prokaryotic and eukaryotic. However, short reads are difficult to assemble and often lead to highly fragmented assemblies....
ratt.sourceforge.net - RATT is software to transfer annotation from a reference (annotated) genome to an unannotated query genome.
It was first developed to transfer annotations between different genome assembly versions. However, it can also transfer annotations between...
github.com - Canu is a fork of the Celera Assembler designed for high-noise single-molecule sequencing (such as the PacBio RSII or Oxford Nanopore MinION). The software is currently alpha level, feel free to use and report issues encountered.
Canu is...
bioinfo.lifl.fr - YASS is a genomic similarity search tool, for nucleic (DNA/RNA) sequences in fasta or plain text format (it produces local pairwise alignments). Like most of the heuristic pairwise local alignment tools for DNA sequences (FASTA, BLAST,...
cutadapt.readthedocs.io - Cutadapt finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from your high-throughput sequencing reads.
Cleaning your data in this way is often required: Reads from small-RNA sequencing contain the...
bitbucket.org - RCircos package provides a simple and flexible way to make Circos 2D track plots with R and could be easily integrated into other R data processing and graphic manipulation pipelines for presenting large-scale multi-sample genomic research data. It...
github.com - methylKit is an R package for DNA methylation analysis and annotation from high-throughput bisulfite sequencing. The package is designed to deal with sequencing data from RRBS and its variants, but also target-capture methods such as Agilent...
github.com - Reads simulator
Wgsim is a small tool for simulating sequence reads from a reference genome. It is able to simulate diploid genomes with SNPs and insertion/deletion (INDEL) polymorphisms, and simulate reads with uniform substitution sequencing...
cran.r-project.org - Most variant calling pipelines result in files containing large quantities of variant information. The variant call format (vcf) is an increasingly popular format for this data. The format of these files and their content is discussed in...