www.homolog.us - If genomes were completely random sequences in a statistical sense, the 'overlap-consensus-layout' method would have been enough to assemble large genomes from Sanger reads. In contrast, real genomes often have long repetitive regions, and they are hard...
www.broadinstitute.org - Spines is a collection of software tools, developed and used by the Vertebrate Genome Biology Group at the Broad Institute. It provides basic data structures for efficient data manipulation (mostly genomic sequences, alignments, variation...
github.com - Simple ideogram plotting and annotation in R.
Basic usage:
Rscript Ideoplot.R --heatmap hm.bed --annotate annotations.bed --out ideogram.pdf
-or-
Rscript Ideoplot.R --annotate annotations.bed
Options
--ideobed, i A bed file of reference...
github.com - MUM&Co is able to detect: deletions, insertions, tandem duplications and tandem contractions (>=50bp & <=150kb); inversions (>=1kb) and translocations (>=10kb)
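The size limits quoted for MUM&Co can be written down as a small lookup table. The Python sketch below is illustrative only and is not part of MUM&Co: the names `DETECTION_RANGES` and `in_reported_range` are hypothetical, 1 kb is taken as 1,000 bp, and the 50 bp to 150 kb range is assumed to apply to all four event types listed before it.

```python
# Size ranges in base pairs, copied from the description above.
# None means the description gives no upper bound.
# Assumption: the (>=50bp & <=150kb) range covers all four types listed before it.
DETECTION_RANGES = {
    "deletion": (50, 150_000),
    "insertion": (50, 150_000),
    "tandem_duplication": (50, 150_000),
    "tandem_contraction": (50, 150_000),
    "inversion": (1_000, None),
    "translocation": (10_000, None),
}


def in_reported_range(sv_type, length_bp):
    """Return True if an event of this type and size falls in the quoted range."""
    low, high = DETECTION_RANGES[sv_type]
    return length_bp >= low and (high is None or length_bp <= high)


print(in_reported_range("deletion", 49))       # below the 50 bp minimum
print(in_reported_range("inversion", 25_000))  # above the 1 kb minimum
```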
kakitone.github.io - FinisherSC, a repeat-aware and scalable tool for upgrading de novo assembly using long reads. Experiments with real data suggest that FinisherSC can provide longer and higher quality contigs than existing tools while maintaining high...
bitbucket.org - MetaBAT, An Efficient Tool for Accurately Reconstructing Single Genomes from Complex Microbial Communities
Grouping large genomic fragments assembled from shotgun metagenomic sequences to deconvolute complex microbial communities, or metagenome...
http://assemblytics.com/ - Download and install MUMmer
Align your assembly to a reference genome using nucmer (from MUMmer package)
$ nucmer -maxmatch -l 100 -c 500 REFERENCE.fa ASSEMBLY.fa -prefix OUT
Consult the MUMmer manual if you encounter problems
Optional: Gzip...
qualimap.bioinfo.cipf.es - Qualimap 2 is a platform-independent application written in Java and R that provides both a Graphical User Interface (GUI) and a command-line interface to facilitate the quality control of alignment sequencing data and its derivatives like...
github.com - DBG2OLC: Efficient Assembly of Large Genomes Using Long Erroneous Reads of the Third Generation Sequencing Technologies
Our work is published in Scientific Reports:
Ye, C. et al. DBG2OLC: Efficient Assembly of Large Genomes Using Long Erroneous...
en.wikipedia.org - FASTQ format is a text-based format for storing both a biological sequence (usually nucleotide sequence) and its corresponding quality scores. Both the sequence letter and quality score are each encoded with a...
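To make the FASTQ layout concrete, here is a minimal Python sketch that reads four-line records and decodes the quality string. It assumes the common Phred+33 offset (older files may use a different offset) and assumes each sequence and quality string sits on a single line. The function name `read_fastq` and the sample record are made up for illustration.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, qualities) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return  # end of input
        seq = handle.readline().rstrip("\r\n")
        handle.readline()  # '+' separator line, ignored here
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        # Assumption: Phred+33 encoding, so '!' (ASCII 33) means quality 0.
        yield header[1:], seq, [ord(c) - 33 for c in qual]


# Made-up record for illustration.
example = io.StringIO("@read1\nACGT\n+\nII5!\n")
records = list(read_fastq(example))
print(records)  # [('read1', 'ACGT', [40, 40, 20, 0])]
```

For real data, an established parser such as Biopython's `SeqIO` is usually a safer choice than a hand-rolled reader, since it also handles multi-line records and the older quality offsets.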