BOL: Related items

A 3D Map of the Human Genome

Fri, 12 Dec 2014 22:27:55 -0600

Suhas Rao and Miriam Huntley (of the Aiden Lab) describe a 3D map of the human genome at kilobase resolution, revealing the principles of chromatin looping. Guest Origami Folding: Sarah Nyquist. Suhas S.P. Rao*, Miriam H. Huntley*, Neva C. Durand, Elena K. Stamenova, Ivan D. Bochkov, James T. Robinson, Adrian L. Sanborn, Ido Machol, Arina D. Omer, Eric S. Lander, Erez Lieberman Aiden. (2014). A 3D Map of the Human Genome at Kilobase Resolution Reveals Principles of Chromatin Looping. Cell.

Genome Assembly Tools and Software - PART1 !!

Jit — Mon, 19 Dec 2016 18:09:22 -0600

Genome assemblers generally take a file of short sequence reads and a file of quality values as input. Because the quality-value file for high-throughput short reads is usually highly memory-intensive, only a few assemblers use it. To save memory and make the data easier to query, high-throughput short-read data is first converted into a specific data structure. Existing data structures for this purpose fall predominantly into two categories: string-based models and graph-based models.
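As a rough illustration of the two inputs described above, the sketch below reads sequences and quality values from FASTQ text. It assumes the reads and qualities are stored together in FASTQ with Phred+33 encoding, which is only one common layout; the function name `parse_fastq` is hypothetical and not taken from any tool listed here.

```python
from io import StringIO


def parse_fastq(handle):
    """Yield (read_id, sequence, phred_scores) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().strip()
        if not header:
            break  # end of input (a blank line also stops this simple parser)
        sequence = handle.readline().strip()
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().strip()
        # Assumption: Phred+33 encoding, so score = ord(character) - 33
        scores = [ord(char) - 33 for char in quality]
        yield header[1:], sequence, scores


# Tiny in-memory example standing in for a real reads file
example = StringIO("@read1\nACGT\n+\nIII#\n")
records = list(parse_fastq(example))
print(records)
```

Keeping one integer per base is what makes quality data memory-hungry, which is consistent with the point above that many assemblers skip it.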

We therefore list many genome assembly tools here. Most were reported for genome assembly in general, while others are designed to handle complex genomes.

TriMetAss 1.2 – The Trinity-based Iterative Metagenomics Assembler
- TriMetAss is an extension to the Trinity software [1], which can assemble select regions surrounding interesting features in metagenomic data. The software is particularly useful for very common and well-conserved genes (and, in theory, non-coding regions) that can occur in multiple contexts in the microbial community under study. It uses Vmatch [2] to extend seed reads (or contigs generated by another assembler) into longer contigs, by iteratively calling Vmatch and Trinity until some stop criteria are met. Currently, TriMetAss lacks thorough documentation, but questions can be directed to its author if the README.txt file and the “-h” option are not sufficient to understand the software.
OMWare 1.0 – Efficient Assembly of Genome-wide Physical Maps
- The purpose of this Python module is to help scientists use optical map data.
  Once complete, it will encapsulate and abstract optical maps and their most common manipulations as they exist in a variety of formats.
LightAssembler – Lightweight Resources Assembly Algorithm
- Lightweight resources assembly algorithm for high-throughput sequencing reads.
  System requirements:
  A 64-bit machine with the g++ compiler (or GCC in general) and the pthreads and zlib libraries.
QUAST 4.1 – Quality Assessment Tool for Genome Assemblies
- QUAST evaluates genome assemblies.
  QUAST works both with and without a reference genome.
  The tool accepts multiple assemblies and is therefore suitable for comparing them.
DNA Baser 4.36 – DNA Sequence Assembly & Analysis
- DNA Sequence Assembler is revolutionary bioinformatics software for automatic DNA sequence assembly, DNA sequence analysis, contig editing, file format conversion and mutation detection.
COCACOLA – Binning Metagenomic Contigs using Sequence COmposition, Read CoverAge, CO-alignment, and Paired-end Read LinkAge
- COCACOLA: a general framework for binning contigs in metagenomic studies incorporating read COverage, CorrelAtion, sequence COmposition and paired-end read LinkAge
MaxBin 2.2 – Binning Assembled Metagenomic Sequences
- MaxBin is software for binning assembled metagenomic sequences based on an Expectation-Maximization algorithm. Users can understand the underlying bins (genomes) of the microbes in their metagenomes by simply providing assembled metagenomic sequences and either the read coverage information or the sequencing reads.
GAML 0.1 – Genome Assembly by Maximum Likelihood
- GAML is a prototype genome assembly tool based on maximizing the likelihood of the assembly in a model encompassing error rate, insert length and other features of individual sequencing technologies. It can combine datasets produced by different technologies (currently Illumina, 454 and Pacific Biosciences).
NanoMark – DNA Assembly Benchmark for Nanopore long reads
- A system for benchmarking DNA assembly tools, based on 3rd generation sequencers.
ARC 1.1.4-beta – Assembly by Reduced Complexity
- ARC is a pipeline which facilitates iterative, reference-guided de novo assemblies with the intent of:
  1. Reducing time in analysis and increasing accuracy of results by only considering those reads which should assemble together.
  2. Reducing/removing reference bias as compared to mapping-based approaches.
TransPS 1.1.0 – Transcriptome Post Scaffolding
- TransPS is a pipeline for post-processing of pre-assembled transcriptomes using a reference-based method. It applies an align-layout-consensus structure, consisting of three major stages. First, query sequences are aligned with a reference genome. Second, query sequences are ordered based on the alignment to the reference. Third, non-redundant sequences matched to the same gene of the reference genome are scaffolded into one contig.
assemblyManager – Computing the Robotic Commands for 2ab Assembly
- Clotho provides persistence to such objects through relational databases that at least partially correspond to the Clotho data model. Beyond database access and data model API support, Clotho Apps provide more specific functionality to Clotho such as viewing and editing data, running simulations, and automating various tasks. When thinking about Clotho Apps, an appropriate analogy would be Apps running on the Android operating system rather than the add-ons that extend the functionality of Firefox.
BinPacker 1.1 – Packing-Based De Novo Transcriptome Assembly from RNA-seq Data
- BinPacker is a novel de novo assembler that models the transcriptome assembly problem as tracking a set of trajectories of items, with their sizes representing the coverage of their corresponding isoforms, by solving a series of bin-packing problems.
FermiKit 0.13 – De novo Assembly based Variant Calling pipeline for Illumina Short Reads
- FermiKit is a de novo assembly based variant calling pipeline for deep Illumina resequencing data. It assembles reads into unitigs, maps them to the reference genome and then calls variants from the alignment to an accuracy comparable to conventional mapping-based pipelines (see evaluation in the tex directory). The assembly not only encodes SNPs and short INDELs, but also retains long deletions, novel sequence insertions, translocations and copy numbers.
REPdenovo – A tool to Construct Repeats directly from Raw Reads
- REPdenovo is designed for constructing repeats directly from sequence reads. It is based on the idea of frequent k-mer assembly. REPdenovo provides many functionalities, and can generate much longer repeats than existing tools. The overall pipeline is shown in the manual file. REPdenovo supports the following main functionalities.
  1. Assembly. This step performs k-mer counting. We then find frequent k-mers whose frequencies are over a certain threshold, and assemble these frequent k-mers into consensus repeats (in the form of contigs). Finally, we merge the constructed contigs into more complete ones.
  2. Scaffolding. We use paired-end reads to connect repeat contigs into scaffolds, and also provide the average coverage (which indicates the copy number) for each constructed repeat.
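The k-mer counting and thresholding step described for REPdenovo can be sketched roughly as follows. This is not REPdenovo's code: the function names, the toy reads, and the choice of an inclusive threshold (`>= min_count`) are assumptions for illustration, and reverse complements are ignored for brevity.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurrence across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def frequent_kmers(reads, k, min_count):
    """Keep only k-mers seen at least min_count times (hypothetical threshold rule)."""
    counts = count_kmers(reads, k)
    return {kmer: n for kmer, n in counts.items() if n >= min_count}


# Toy reads: the first two share a repeated ACGT/CGTA motif
reads = ["ACGTACGT", "CGTACGTA", "TTTTGGGG"]
repeats = frequent_kmers(reads, k=4, min_count=3)
print(repeats)
```

The surviving k-mers would then be the input to the assembly and merging steps, which are not sketched here.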
Xander – Gene-targeted Metagenomic Assembler
- Metagenomics can provide important insight into microbial communities. However, assembling metagenomic datasets has proven to be computationally challenging. We present a novel method for targeting assembly of specific protein-coding genes using a graph structure combining both de Bruijn graphs and protein HMMs. The inclusion of HMM information guides the assembly, with concomitant gene annotation.
SWAP-Assembler 2 – A scalable and fully parallelized Genome Assembler
- There is a growing gap between the output of new-generation massively parallel sequencing machines and the ability to process and analyze the sequencing data. We present SWAP-Assembler, a scalable and fully parallelized genome assembler designed for massive sequencing data. Instead of using a traditional de Bruijn graph, SWAP-Assembler adopts a multi-step bi-directed graph (MSG). With MSG, the standard genome assembly (SGA) is equivalent to the edge-merging operations in a semi-group. A computation model, SWAP, is then designed to parallelize semi-group computation. Experimental results showed that SWAP-Assembler was the fastest and most efficient of the assemblers compared: it generated contigs with the highest accuracy among all five selected assemblers and the longest contig N50 among all selected parallel assemblers. In particular, in the scalability test, SWAP-Assembler scaled up to 1024 cores when processing the Fish and Yanhuang datasets, and finished the assembly work in only 15 and 29 minutes, respectively.
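Since contig N50 is the headline metric in the comparison above, here is a minimal sketch of how it is commonly computed: sort contig lengths from longest to shortest and report the length at which the running total first reaches half of the assembly size. The function name and the example lengths are made up for illustration.

```python
def n50(contig_lengths):
    """Return the contig N50 for a list of contig lengths (0 for an empty list)."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0


# Hypothetical assembly with six contigs, 290 bases in total
example_n50 = n50([100, 80, 50, 30, 20, 10])
print(example_n50)
```

A higher N50 means more of the assembly sits in long contigs, but it says nothing about correctness, which is why accuracy is reported alongside it.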
TGNet – Visualization and Quality Assessment of de novo Genome Assemblies
- TGNet is a Cytoscape-based tool for visualization and quality assessment of de novo genome assemblies. Specifically, it facilitates rapid detection of inconsistencies between a genome assembly and an independently derived transcriptome assembly.
Circlator 1.1.3 – A tool to Circularize Genome Assemblies
- A tool to circularize genome assemblies. The algorithm and benchmarks are described in the Genome Biology manuscript.
misFinder v0.4.05.05 – Identify Mis-assemblies in an unbiased manner using Reference and Paired-end Reads
- misFinder is a tool that aims to identify assembly errors with high accuracy in an unbiased way and to correct these errors at their mis-assembled positions, improving assembly accuracy for downstream analysis. It combines information from a reference (or closely related reference) genome with paired-end reads aligned to the assembled sequence. Structural variation and mis-assembly can be detected by comparing the reference genome and the assembled sequence.
Scaffold_builder v2.2 – Order Contigs generated by draft sequencing along a Reference Sequence
- The abundance of repeat elements in genomes can impede the assembly of a single sequence. The tool Scaffold_builder was designed to generate scaffolds (super contigs of sequences joined by N-bases) using the homology provided by a closely related reference sequence. Scaffold_builder is an advanced wrapper for Nucmer, written in Python that resolves several situations that may arise when mapping contigs to the reference genome.
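To make the idea of "super contigs of sequences joined by N-bases" concrete, the sketch below orders contigs by a reference start position and pads the gaps with N. It is not Scaffold_builder's implementation: in the real tool the positions would come from Nucmer alignments, whereas here they are supplied by hand, and the `build_scaffold` name and `min_gap` parameter are hypothetical.

```python
def build_scaffold(placements, min_gap=1):
    """Join contigs in reference order, filling gaps between them with N bases.

    placements: list of (reference_start, contig_sequence) tuples.
    Overlapping placements are not resolved; they simply get min_gap Ns.
    """
    parts = []
    previous_end = None
    for ref_start, sequence in sorted(placements, key=lambda p: p[0]):
        if previous_end is not None:
            gap = max(ref_start - previous_end, min_gap)
            parts.append("N" * gap)
        parts.append(sequence)
        previous_end = ref_start + len(sequence)
    return "".join(parts)


# Two contigs given out of order, with a 5-base gap on the reference
scaffold = build_scaffold([(10, "GGCC"), (0, "ACGTA")])
print(scaffold)
```

Handling overlaps, inversions, and contigs that map to several places is exactly the kind of situation the description above says the wrapper resolves, and none of that is attempted here.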
Rnnotator 3.5.0 – de novo Transcriptome Assembly pipeline from stranded RNA-Seq reads
- Comprehensive annotation and quantification of transcriptomes are outstanding problems in functional genomics. Rnnotator is an automated software pipeline that generates transcript models by de novo assembly of RNA-Seq data without the need for a reference genome. The contigs produced by Rnnotator are highly accurate and reconstruct full-length genes when transcripts are sequenced sufficiently deep, roughly 30X for a given transcript. Rnnotator was designed to assemble Illumina single or paired-end reads. Rnnotator is also able to incorporate strand-specific RNA-Seq reads into the assembly in order to further improve the assembly.
SATRAP 0.2 – SOLiD Assembler TRAnslation Program
- A color space assembly must be translated into bases before applying bioinformatics analyses. SATRAP is designed to accomplish this important task using a very efficient strategy. The package integrates the Oases pipeline and several optimizations specifically designed for color space management. Together, the steps of the pipeline produce a SOLiD de novo transcriptome assembly and the subsequent color space translation. Alternatively, SATRAP can be used as a stand-alone program to perform color space translation for either RNA-seq or DNA-seq SOLiD assemblies.
Bandage v0.7.1 – Navigating De novo Assembly Graphs Easily
- Bandage is a program for visualising de novo assembly graphs. By displaying connections which are not present in the contigs file, Bandage opens up new possibilities for analysing de novo assemblies.
HapCol 1.1.1 – Haplotype Assembly from Long Gapless Reads
- A fast and memory-efficient method for haplotype assembly from long gapless reads, like those produced by SMRT sequencing technologies (PacBio RS II) and Oxford Nanopore flow cell technologies (MinION).
REAGO 1.1 – REconstruct 16S ribosomal RNA Genes from MetagenOmic data
- REAGO is an assembly tool for 16S ribosomal RNA recovery from metagenomic data.
FGAP 1.8.1 – Automated Gap Closing tool
- FGAP aims to improve genome sequences by merging alternative assemblies or incorporating alternative data, analyzing the gap region and indicating the best sequence to close the gap.
DETONATE 1.10 – DE novo TranscriptOme rNa-seq Assembly with or without the Truth Evaluation
- DETONATE consists of two component packages, RSEM-EVAL and REF-EVAL. Both packages are mainly intended to be used to evaluate de novo transcriptome assemblies, although REF-EVAL can be used to compare sets of any kinds of genomic sequences.
Trinity 2.1.1 – RNA-Seq De novo Assembly
- Trinity represents a novel method for the efficient and robust de novo reconstruction of transcriptomes from RNA-Seq data. Trinity combines three independent software modules: Inchworm, Chrysalis, and Butterfly, applied sequentially to process large volumes of RNA-Seq reads. Trinity partitions the sequence data into many individual de Bruijn graphs, each representing the transcriptional complexity at a given gene or locus, and then processes each graph independently to extract full-length splicing isoforms and to tease apart transcripts derived from paralogous genes.
IsoSCM 2.0.11 – Transcript Assembly tool using Multiple Change-point Inference to improve 3’UTR Annotation
- IsoSCM (Isoform Structural Change Model) is a new method for transcript assembly that incorporates change-point analysis to improve the 3′ UTR annotation process.
IVA 1.0.3 – Iterative Virus Assembler
- IVA is a de novo assembler designed to assemble virus genomes that have no repeat sequences, using Illumina read pairs sequenced from mixed populations at extremely high and variable depth.
SFA-SPA 0.2.1 – A Suffix Array based Short Peptide Assembler for Metagenomic Data
- SFA-SPA is a suffix array based short peptide assembler for metagenomic data
RAMPART 0.12.2 – A Workflow Management System for de novo Genome Assembly
- RAMPART is a de novo assembly pipeline that makes use of third-party tools and High Performance Computing resources. It can be used as a single interface to several popular assemblers, and can perform automated comparison and analysis of any generated assemblies.
Celera Assembler 8.3 – Whole Genome Shotgun Assembler
- Celera Assembler (wgs-assembler) is scientific software for DNA research. It can reconstruct long sequences of genomic DNA given the fragmentary data produced by whole-genome shotgun sequencing. The Celera Assembler has enabled discovery in microbial genomes, large eukaryotic genomes, diploid genomes, and genomes from environmental samples. Celera Assembler contributed the first diploid sequence of an individual human, and metagenomics assemblies of the Global Ocean Sampling
A5-miseq 20150522 – de novo Assembly & Analysis of Illumina Sequence data
- de novo assembly & analysis of Illumina sequence data, including the A5 pipeline, A5-miseq, tools to evaluate assembly quality, and scripts to facilitate data submission to NCBI and the RAST annotation system
Trans-ABySS 1.5.3 – Analyze ABySS multi-k-assembled Shotgun Transcriptome Data.
- Trans-ABySS is a software pipeline for analyzing ABySS-assembled contigs from shotgun transcriptome data. The pipeline accepts assemblies that were generated across a wide range of k values in order to address variable transcript expression levels. It first filters and merges the multi-k assemblies, generating a much smaller set of nonredundant contigs. It contains scripts that map assembled contigs to known transcripts, currently supporting Blat and Exonerate contig-to-genome aligners. It identifies novel splicing events like exon-skipping, novel exons, retained introns, novel introns, and alternative splicing. Its scripts can also estimate gene expression levels, identify candidate polyadenylation sites, and identify candidate gene-fusion events.
SAT-Assembler 20160120 – Scalable and Accurate Targeted Gene Assembly Tool
- SAT-Assembler can perform targeted gene assembly for both RNA-Seq and metagenomic data. It addresses the challenges of de novo assembly of large-scale NGS data by conducting family-specific gene assembly, homology-guided overlap graph construction, and careful graph traversal.
Opera 2.0.2 – Sequence Assembly Program
- Opera (Optimal Paired-End Read Assembler) is a sequence assembly program. It uses information from paired-end reads to optimally order and orient contigs assembled from shotgun-sequencing reads.
Sequencher 5.4.1 – DNA Sequence Assembly and Analysis
- Sequencher is the industry standard software for DNA sequence analysis. It works with all automated sequencers and is widely known for its lightning-fast contig assembly, short learning curve, user-friendly editing tools, and superb technical support. First released almost 15 years ago, Sequencher is currently used for sequence analysis tasks in every major genomic and pharmaceutical company as well as numerous academic and government labs in over 40 countries around the world. Life Science researchers use Sequencher for many diverse DNA sequence analysis applications including de novo gene sequencing, mutation detection, forensic human identification, systematics, and more.
Minia 2.0.3 – Short-read Assembler based on a de Bruijn graph
- Minia is a short-read assembler based on a de Bruijn graph, capable of assembling a human genome on a desktop computer in a day
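For readers unfamiliar with the de Bruijn graph that Minia and several other tools in this list build on, here is a toy sketch: each k-mer in the reads contributes an edge from its (k-1)-base prefix to its (k-1)-base suffix, and a contig is read off by following unambiguous edges. This is a plain in-memory dictionary, not Minia's data structure, and the function names and reads are invented for the example.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some k-mer."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_unambiguous(graph, start):
    """Extend from start while the current node has exactly one unvisited successor."""
    path = start
    node = start
    seen = {start}
    while len(graph.get(node, ())) == 1:
        (next_node,) = graph[node]
        if next_node in seen:
            break  # stop on cycles
        path += next_node[-1]
        seen.add(next_node)
        node = next_node
    return path


# Three overlapping error-free reads from the sequence ACGTTGA
reads = ["ACGTT", "CGTTG", "GTTGA"]
graph = build_de_bruijn(reads, k=4)
contig = walk_unambiguous(graph, "ACG")
print(contig)
```

Real assemblers also have to cope with sequencing errors, reverse complements, and branching at repeats, all of which this sketch leaves out.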
MaSuRCA 3.1.3 – Whole Genome Short Read Assembler
- MaSuRCA is whole genome assembly software. It combines the efficiency of the de Bruijn graph and Overlap-Layout-Consensus (OLC) approaches. MaSuRCA can assemble data sets containing only short reads from Illumina sequencing or a mixture of short reads and long reads (Sanger, 454).
KmerGenie 1.6982 – K-mer size Selection for Genome Assembly
- KmerGenie estimates the best k-mer length for genome de novo assembly. Given a set of reads, KmerGenie first computes the k-mer abundance histogram for many values of k. Then, for each value of k, it predicts the number of distinct genomic k-mers in the dataset, and returns the k-mer length which maximizes this number. Experiments show that KmerGenie’s choices lead to assemblies that are close to the best possible over all k-mer lengths.
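The general shape of this procedure (one abundance histogram per candidate k, then pick the k that maximizes an estimate of distinct genomic k-mers) can be sketched as below. KmerGenie's actual statistical model is not reproduced: as a loudly labeled stand-in, this sketch simply treats any k-mer seen at least `min_abundance` times as genomic. All names and the toy reads are hypothetical.

```python
from collections import Counter


def abundance_histogram(reads, k):
    """Map abundance -> number of distinct k-mers observed that many times."""
    counts = Counter(
        read[i:i + k] for read in reads for i in range(len(read) - k + 1)
    )
    return Counter(counts.values())


def pick_k(reads, candidate_ks, min_abundance=2):
    """Return the candidate k with the most distinct k-mers at or above min_abundance.

    Crude proxy only: a fixed abundance cutoff replaces KmerGenie's fitted model.
    Ties go to the first candidate listed.
    """
    best_k, best_count = None, -1
    for k in candidate_ks:
        histogram = abundance_histogram(reads, k)
        solid = sum(n for abundance, n in histogram.items() if abundance >= min_abundance)
        if solid > best_count:
            best_k, best_count = k, solid
    return best_k


# Two identical toy reads, so every k-mer has abundance of at least 2
reads = ["ACGACGTTGA", "ACGACGTTGA"]
histogram_k3 = abundance_histogram(reads, 3)
chosen_k = pick_k(reads, (2, 3, 5))
print(histogram_k3, chosen_k)
```

On real data the cutoff would misclassify low-coverage genomic k-mers and high-copy errors, which is the gap a fitted model is meant to close.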
pilon v1.16 – Automated Assembly Improvement
- pilon uses read alignment analysis to diagnose, report, and automatically improve de novo genome assemblies.
Phred/Phrap/Consed 29.0 – DNA Sequence Assembler & Finishing Tools
- phrap is a program for assembling shotgun DNA sequence data. Among other features, it allows use of the entire read and not just the trimmed high quality part, it uses a combination of user-supplied and internally computed data quality information to improve assembly accuracy in the presence of repeats, it constructs the contig sequence as a mosaic of the highest quality read segments rather than a consensus, it provides extensive assembly information to assist in trouble-shooting assembly problems, and it handles large datasets.
CLC Genomics Workbench 8.5.1 – Assembly & Analysis of Sequencing Data
- CLC Genomics Workbench, for analyzing and visualizing Next Generation Sequencing data, incorporates cutting-edge technology and algorithms, while also supporting and integrating with the rest of your typical NGS workflow.
Metassembler 1.5 – Combines multiple Whole Genome de novo Assemblies into a combined Consensus Assembly
- Metassembler is a software package for reconciling assemblies produced by de novo short-read assemblers such as SOAPdenovo and ALLPATHS-LG. The goal of assembly reconciliation, or “metassembly,” is to combine multiple assemblies into a single genome that is superior to all of its constituents
Tablet 1.15.09.01 – Next Generation Sequence Assembly Visualization
- Tablet is a lightweight, high-performance graphical viewer for next generation sequence assemblies and alignments. Supporting a range of input assembly formats, Tablet provides high-quality visualizations showing data in packed or stacked views, allowing instant access and navigation to any region of interest, and whole contig overviews and data summaries. Tablet is both multi-core aware and memory efficient, allowing it to handle assemblies containing millions of reads, even on a 32-bit desktop machine.
ABySS 1.9.0 – de novo, parallel, paired-end Sequence Assembler
- ABySS (Assembly By Short Sequences) is a de novo, parallel, paired-end sequence assembler that is designed for short reads. The single-processor version is useful for assembling genomes up to 100 Mbases in size. The parallel version is implemented using MPI and is capable of assembling larger genomes.
CLEAT 2.0 – Identifies 3′ UTR Ends of Transcripts in de novo RNA-Seq Assemblies
- CLEAT is a post-processing tool for CLEavage site Analysis of Transcriptomes. CLEAT is designed to work on trans-ABySS output.
StriDe – novel Assembler
- The StriDe Assembler integrates string and de Bruijn graphs by decomposing reads within error-prone regions, while extending paired-end reads into long reads for assembly through repetitive regions.
REAPR 1.0.18 – Genome Assembly Evaluation
- REAPR (Recognising Errors in Assemblies using Paired Reads) is a tool that evaluates the accuracy of a genome assembly using mapped paired end reads, without the use of a reference genome for comparison. It can be used in any stage of an assembly pipeline to automatically break incorrect scaffolds and flag other errors in an assembly for manual inspection. It reports mis-assemblies and other warnings, and produces a new broken assembly based on the error calls.
GapFiller 1.10 – Close Gaps within Pre-assembled Scaffolds
- GapFiller is a stand-alone program for closing gaps within pre-assembled scaffolds. It is unique in offering the possibility to manually control the gap-closure process. By using the distance information of paired-read data, GapFiller seeks to close the gap from each edge in an iterative manner. In a good number of tests, the program yielded excellent results on both bacterial and eukaryotic datasets. It is distributed as a command-line Perl script with additional files. The input data consists of pre-assembled scaffold sequences (FASTA) and NGS paired-read data (FASTA or FASTQ).
SSAKE 3.8.4 – Assembling Millions of short DNA Sequences
- SSAKE is a genomics application for assembling millions of very short DNA sequences. SSAKE is designed to help leverage the information from short sequence reads by stringently assembling them into contiguous sequences that can be used to characterize novel sequencing targets.
SGA 0.10.14 – String Graph Assembler
- SGA is a de novo assembler designed to assemble large genomes from high coverage short read data. The major goal of SGA is to be very memory efficient, which is achieved by using a compressed representation of DNA sequence reads.
r2cat – Synteny Plots & Comparative Assembly
- r2cat (related reference based contig arrangement tool) can be used to order a set of contigs with respect to a single reference genome. This is done by mapping the contigs onto the reference using a q-gram filter. The mapping is visualized in a synteny plot.
TASR 1.6 – Targeted Assembly of Sequence Reads
- TASR (Targeted Assembly of Sequence Reads) is a genomics application that allows hypothesis-based interrogation of genomic regions (sequence targets) of interest.
Rainbow v2.0.4 – Clustering and Assembling Short Reads, especially for RAD
- The Rainbow package consists of several programs used for RAD-seq related clustering and de novo assembly.
CAFTOOLS 2.0.2 – Tools for the Common Assembly Format (CAF)
- CAFTOOLS comprises a set of libraries and programs for manipulating DNA sequence assemblies using CAF files, a comprehensive representation of a sequence assembly as a text file.
Gap Resolution – Improving Newbler Genome Assemblies
- Gap Resolution was developed by the DOE Joint Genome Institute to improve Newbler genome assemblies by automating the closure of sequence gaps caused by repetitive regions in the DNA.
Meraculous 2.0.5 – De novo Genome Assembler from Short Reads
- Meraculous is a new algorithm for whole-genome assembly of deep paired-end short reads. Its authors applied it to the assembly of a dataset of paired 75-bp Illumina reads derived from the 15.4 megabase genome of the haploid yeast Pichia stipitis.
COPE 1.2.5 – Pair-end Reads Connection tool to facilitate Genome Assembly
- COPE (Connecting Overlapped Pair-End reads) is a method to align and connect Illumina paired-end reads whose insert size is smaller than the sum of the two read lengths. The connected reads can be used in genome assembly, resequencing and transcriptome research.
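When the insert is shorter than the two read lengths combined, the end of the forward read and the reverse complement of its mate cover the same bases, so the pair can be joined into one longer read. The sketch below shows that idea with exact matching only; it is not COPE's algorithm, and the function names, the `min_overlap` default, and the toy fragment are assumptions for illustration.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA string."""
    return sequence.translate(_COMPLEMENT)[::-1]


def merge_pair(forward, reverse, min_overlap=3):
    """Join a read pair on the longest exact suffix/prefix overlap, or return None.

    Mismatches and quality values are ignored in this simplified version.
    """
    mate = reverse_complement(reverse)
    longest = min(len(forward), len(mate))
    for overlap in range(longest, min_overlap - 1, -1):
        if forward[-overlap:] == mate[:overlap]:
            return forward + mate[overlap:]
    return None


# A 13-base fragment sequenced with two 8-base reads (13 < 8 + 8)
fragment = "ACGTTGCAAGGCT"
forward = fragment[:8]
reverse = reverse_complement(fragment[-8:])
merged = merge_pair(forward, reverse)
print(merged)
```

Trying the longest overlap first is a simplification; with real reads a short spurious match is possible, which is why practical tools score candidate overlaps rather than accept the first exact hit.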
PEAR 0.9.6 – Pair-End reads AssembleR
- PEAR is an ultrafast, memory-efficient and highly accurate paired-end read assembler. It is fully parallelized and can run with as little as a few kilobytes of memory.
EBARDenovo 2.0.1 – Highly-accurate de novo Assembler of Paired-end RNA-Seq
- EBARDenovo is a highly accurate search-based de novo assembler of paired-end RNA-Seq data for advanced transcriptomic studies.
EagleView 2.2 – Genome Assembler Viewer
- EagleView is an information-rich genome assembler viewer with data integration capability. EagleView can display a dozen different types of information including base qualities, machine specific trace signals, and genome feature annotations. It provides an easy way for inspecting visually the quality of a genome assembly and validating polymorphism candidate sites (e.g., SNPs) reported by polymorphism discovery tools. It can also facilitate data interpretation and hypothesis generation.
MAIA 0.5 – Integrating Genome Assemblies
- MAIA (Multiple Assembly IntegrAtion) is an algorithm to integrate multiple genome assemblies. For example, assemblies originating from:
  – Different runs of a de novo assembler
  – Assemblies of different data types
  – Comparative assemblies
InteMAP 1.0 – Integrated Metagenomic Assembly pipeline for NGS Short Reads
- InteMAP is a pipeline which integrates individual assemblers for assembling metagenomic short sequencing reads.
MAP 20121108 – A de novo Metagenomic Assembly program for Shotgun DNA reads
- MAP (Metagenomic Assembly program) is a de novo assembly approach and its implementation, based on an improved Overlap/Layout/Consensus (OLC) strategy combined with several special algorithms. MAP uses mate pair information, which makes it more applicable to the shotgun DNA reads (recommended as > 200 bp) currently widely used in metagenome projects. Results of extensive tests on simulated data show that MAP can be superior to both Celera and Phrap for typical longer reads from Sanger sequencing, and has an evident advantage over Celera, Newbler, and the newest Genovo for typical shorter reads from 454 sequencing.
Phusion 2.1c – Assembly Genome Sequences from Whole Genome Shotgun (WGS) Reads
- Phusion is a software package for assembling genome sequences from whole genome shotgun (WGS) reads.
CodonCode Aligner 6.0.2 – DNA Sequence Assembly & Alignment
- CodonCode Aligner is a program for sequence assembly, contig editing, and mutation detection, available for Windows and Mac OS X. Aligner is compatible with Phred-Phrap and fully supports sequence quality scores, while offering a familiar, easy-to-learn user interface.
Cerulean 0.1.1 – Hybrid Genome Assembler
- Cerulean is a hybrid assembler that uses high-throughput short and long reads.
Ragout 1.2 – Tool for Reference-assisted Assembly
- Ragout (Reference-Assisted Genome Ordering UTility) is a tool for assisted assembly using multiple references. It takes a short read assembly (a set of contigs), a set of related references and a corresponding phylogenetic tree and then assembles the contigs into scaffolds.
laSV 1.0.2 – Local Assembly based Structural Variation Discovery tool
- laSV is software that employs a local de novo assembly based approach to detect genomic structural variations from whole-genome high-throughput sequencing datasets.
SPAdes 3.6.2 – Single-cell Genome Assembler
- SPAdes (St. Petersburg genome assembler) is intended for both standard isolates and single-cell MDA bacteria assemblies.
PERGA 0.5.03.02 – Paired End Reads Guided Assembler
- PERGA is a novel sequence reads guided de novo assembly approach which adopts greedy-like prediction strategy for assembling reads to contigs and scaffolds.
Telescoper 0.2 – De novo Assembly Algorithm
- Telescoper is a local assembly algorithm designed for short-reads from NGS platforms such as Illumina. The reads must come from two libraries: one short insert, and one long insert.
MetaCompass 1.0 – Comparative Assembly of Metagenomic Sequences
- MetaCompass is a software package for comparative assembly of metagenomic reads. MetaCompass achieves assembly performance comparable to state-of-the-art de novo assemblers, and the two approaches complement each other well, so combining contigs from MetaCompass and independent de novo assemblers gives the best overall metagenomic assembly.
SCARF – Scaffolded and Corrected Assembly of Roche 454
- SCARF is a next-gen sequence assembly tool for evolutionary genomics. It is designed especially for assembling 454 EST sequences against high quality reference sequences from related species.
MetaCAA – Assembly of Metagenomic Datasets
- MetaCAA is a sequence-assembly tool specifically intended for metagenomes.
Contiguity 1.0.4 – Contig Adjacency Graph Construction and Visualisation
- Contiguity is interactive software for the visualization and manipulation of de novo genome assemblies.
ScaffoldScaffolder 0.1 – Solving Contig Orientation via Bidirected to Directed Graph Reduction
- ScaffoldScaffolder is a stand-alone scaffolding algorithm which was designed specifically for scaffolding diploid genomes.
HaploClique 0.1 – Viral Quasispecies Assembly from Paired-end data
- HaploClique is a computational approach to reconstruct the structure of a viral quasispecies from next-generation sequencing data as obtained from bulk sequencing of mixed virus samples.
TAG 0.91 – Transcript Assembly by Mapping Reads to Graphs
- TAG is a tool for metatranscriptome assembly using the de Bruijn graph of a matched metagenome as the reference.
EPGA2 – De Novo Assembler
- EPGA2 updates some modules in EPGA to improve memory efficiency in genome assembly.
GMcloser 1.5.1 / GMvalue 1.3 – Closing the Gaps in Scaffolds with Preassembled Contigs
- GMcloser fills and closes the gaps present in scaffold assemblies, especially those generated by the de novo assembly of whole genomes with next-generation sequencing (NGS) reads.
SLICEMBLER – Meta-assembler Designed for Ultra-deep Sequencing data
- SLICEMBLER is a meta-assembler designed for ultra-deep sequencing data
SEQLandscape v1 – Generation and Visualization of Sequence Landscape
- SEQLandscape is an application allowing the generation and visualization of a sequence landscape.
HyDA-Vista – Towards Optimal Guided Selection of k-mer Size for Sequence Assembly
misSEQuel v1.0beta – Misassembly Detection in Draft Genomes
- misSEQuel is software that enhances the quality of draft genomes by identifying misassembly errors and their breakpoints using paired-end sequence reads and optical mapping data.
Dawg 1.2 – Simulating Sequence Evolution
- Dawg (DNA Assembly with Gaps) is an application designed to simulate the evolution of recombinant DNA sequences in continuous time based on the robust general time reversible model with gamma and invariant rate heterogeneity and a novel length-dependent model of gap formation.
BUSCO v1.1b1 – Assessing Genome Assembly and Annotation Completeness with Single-copy Orthologs
- BUSCO completeness assessment employs sets of Benchmarking Universal Single-Copy Orthologs from OrthoDB to provide quantitative measures of the completeness of genome assemblies, annotated gene sets, and transcriptomes in terms of expected gene content.
FinisherSC 2.0 – A Repeat-aware tool for upgrading de-novo Assembly using Long Reads
- FinisherSC is a repeat-aware and scalable tool for upgrading de-novo assembly using long reads.
WhatsHap – Haplotype Assembly for Future-Generation Sequencing Reads
- WhatsHap is software for phasing genomic variants using DNA sequencing reads, also called haplotype assembly. It is especially suitable for long reads, but also works well with short reads.
Compartmentalized Assembler – Assembly of Physical Maps
- Compartmentalized assembler is a novel method for the assembly of high quality physical maps from fingerprinted clones.
Elviz – Exploration of Metagenomic Assemblies
- Elviz (Environmental Laboratory Visualization) is an interactive web-based tool for the visual exploration of assembled metagenome data and their complex metadata.
SSP – de novo Transcriptome Assembler
- SSP is a de novo transcriptome assembler that assembles RNA-seq reads into transcripts. SSP aims to reconstruct all the alternatively spliced isoforms and estimate their expression levels.
VirAmp – Galaxy-based Viral Genome Assembly pipeline
- VirAmp is a web-based semi-de novo fast virus genome assembly pipeline designed for extremely high coverage NGS data. VirAmp is a collection of existing tools, combined into a single Galaxy interface. Users without further computational knowledge can easily operate the pipeline.
aTRAM 1.04 – automated Target Restricted Assembly Method
- aTRAM performs targeted de novo assembly of loci from paired-end Illumina runs.
Ray 2.3.1 – Parallel Genome Assemblies for Parallel DNA sequencing
- Ray is parallel software that computes de novo genome assemblies with next-generation sequencing data.
CAR – Contig Assembly of Prokaryotic Draft Genomes Using Rearrangements
- CAR is an efficient and more accurate tool for assembling contigs of a prokaryotic draft genome based on a reference genome.
VTBuilder – Assembly of Multi Isoform Transcriptomes
- VTBuilder is a tool for the inference of non-chimeric contigs from read data that has been sequenced from complex multi-isoformic transcriptomes, such as snake venom glands, or rapidly evolving viral populations, such as HIV-1.
TruHmm – TRanscription Unit Assembly by a Hidden Markov model
- TruHmm is a reference-based transcriptome assembler for prokaryotes, and is suitable for assembling transcripts from directional RNA-seq libraries.
Bridger 20141201 – RNA-Seq Assembly
- Bridger is a new de novo transcriptome assembler which takes advantage of techniques employed in Cufflinks to overcome limitations of the existing de novo assemblers.
GRASP 0.0.4 – Guided Reference-based Assembly of Short Peptides
- GRASP is a gene annotation tool for metagenomic studies. GRASP assembles the fragmented short peptides, which are called from the NGS reads, and aligns the assembled contigs to the query reference protein. GRASP achieves much higher sensitivity than BLASTP for gene annotation purposes.
Cortex 1.05.21 – Genome Assembly and Variation Analysis
- Cortex is an efficient and low-memory software framework for analysis of genomes using sequence data. There are two main executables, being developed in parallel streams: cortex_con (primary contact Mario Caccamo) is for consensus genome assembly, and cortex_var (primary contact Zamin Iqbal) is for variation and population assembly.
MEGAHIT v0.1.4 – Large and Complex Metagenomics Assembly via Succinct de Bruijn graph
- MEGAHIT is a single-node assembler for large and complex metagenomics NGS reads, such as those from soil. It uses a succinct de Bruijn graph to achieve low memory usage, although minimizing memory usage is not its primary goal.
CISA 20140304 – Contig Integrator for Sequence Assembly
- CISA has been developed to integrate the assemblies into a hybrid set of contigs, resulting in assemblies of superior contiguity and accuracy, compared with the assemblies generated by the state-of-the-art assemblers and the hybrid assemblies merged by existing tools
Cufflinks 2.2.1 – Transcript Assembler & Abundance Estimator for RNA-Seq
- Cufflinks assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples. It accepts aligned RNA-Seq reads and assembles the alignments into a parsimonious set of transcripts. Cufflinks then estimates the relative abundances of these transcripts based on how many reads support each one.
mapsembler 2.2.4 – Targeted Assembly of Short Sequence Reads
- Mapsembler is targeted assembly software. It takes as input a set of NGS raw reads and a set of input sequences (starters). It first determines if each starter is read-coherent, i.e. whether reads confirm the presence of each starter in the original sequence. Then for each read-coherent starter, Mapsembler outputs its sequence neighborhood as a linear sequence or as a graph, depending on the user's choice.
Tedna 1.2.2 – Transposable Element De Novo Assembler
- Tedna is a lightweight de novo transposable element assembler. It assembles the transposable elements directly from the raw reads.
HyDA 1.3.1 / Squeezambler 2.0.3 – Hybrid De Novo Assembler
- HyDA is a multipurpose assembler, particularly tested for single cell and normal multicell genome co-assembly
PANDASEQ 2.8 / Pandaseq-sam 1.3 – PAired-eND Assembler for DNA sequences
- PANDASEQ is a program to align Illumina reads, optionally with PCR primers embedded in the sequence, and reconstruct an overlapping sequence.
ZORRO 2.2 – Hybrid Sequencing Technology Assembler
- ZORRO is a hybrid sequencing technology assembler. It merges two sets of pre-assembled contigs into a more contiguous and consistent assembly.
FLASH 1.2.11 – Fast Length Adjustment of SHort reads
- FLASH (Fast Length Adjustment of SHort reads) is a fast, very accurate tool for merging paired-end reads from fragments that are shorter than twice the read length. The longer merged reads have a significant positive impact on genome assemblies.
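The merging idea described for FLASH can be illustrated with a minimal Python sketch. This is not FLASH's algorithm: it assumes error-free reads and accepts only exact overlaps, whereas a real merger has to tolerate sequencing errors. The function names, the `min_overlap` parameter, and the example fragment are all hypothetical.

```python
# Minimal sketch: merge a read pair whose fragment is shorter than twice
# the read length, so the two reads overlap. Assumes error-free reads.

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(read1, read2, min_overlap=5):
    """Merge read1 with the reverse complement of read2.

    Tries the longest overlap first and returns the merged fragment,
    or None when no exact overlap of at least min_overlap bases exists.
    """
    mate = reverse_complement(read2)
    for overlap in range(min(len(read1), len(mate)), min_overlap - 1, -1):
        if read1[-overlap:] == mate[:overlap]:
            return read1 + mate[overlap:]
    return None


fragment = "ACGTTGCATCGGATCCATAGCT"          # 22 bp, hypothetical fragment
read1 = fragment[:15]                        # forward read, 15 bp
read2 = reverse_complement(fragment[-15:])   # reverse read, 15 bp
print(merge_pair(read1, read2))
```

Because the fragment (22 bp) is shorter than twice the read length (30 bp), the two reads share bases in the middle, and the merged sequence is longer than either read alone.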
ALLPATHS-LG 51750 – Whole Genome Shotgun Assembler
- ALLPATHS-LG (Large Genome) is a whole genome shotgun assembler that can generate high quality assemblies from short reads. It works on both small and large (mammalian size) genomes. To use it, you should first generate ~100 base Illumina reads from two libraries: one from ~180 bp fragments, and one from ~3000 bp fragments, both at about 45x coverage. Sequence from longer fragments will enable longer-range continuity.
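As a rough guide to what "about 45x coverage" from ~100 base reads means in practice, the read count can be estimated from coverage = (number of reads × read length) / genome size. The sketch below is only that arithmetic; the genome size is an assumed example value, and `reads_needed` is a hypothetical helper, not part of ALLPATHS-LG.

```python
import math


def reads_needed(genome_size, coverage, read_length):
    """Reads required so that reads * read_length covers the genome `coverage` times."""
    return math.ceil(coverage * genome_size / read_length)


genome_size = 3_100_000_000   # assumed: roughly a mammalian-size genome, in bp
per_library = reads_needed(genome_size, coverage=45, read_length=100)
print(f"{per_library:,} reads per library, {2 * per_library:,} for both libraries")
```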
More Tools at http://bioinformaticsonline.com/pages/view/30440/genome-assembly-tools-and-software-part2

Rosalind Bioinformatics problems !!!

Abhi — Thu, 18 Dec 2014 10:32:48 -0600

Rosalind is a platform for learning bioinformatics and programming through problem solving. Take a tour to get the hang of how Rosalind works.

http://rosalind.info/problems/list-view/

Address of the bookmark: http://rosalind.info/problems/list-view/

Integrative Genomics Viewer (IGV) tutorial

Neel — Sat, 12 Jul 2014 15:16:23 -0500

The Integrative Genomics Viewer (IGV) from the Broad Institute allows you to view several types of data files involved in any NGS analysis that employs a reference genome, including how reads from a dataset are mapped, gene annotations, and predicted genetic variants.

http://www.broadinstitute.org/igv/

Address of the bookmark: https://wikis.utexas.edu/display/bioiteam/Integrative+Genomics+Viewer+%28IGV%29+tutorial

BioinfoLab

Fri, 25 Mar 2016 11:05:35 -0500

Laboratory of Statistics and Computational tools for Bioinformatics

The Laboratory of Statistics and Computational tools for Bioinformatics (BioinfoLab) is hosted at the Istituto per le Applicazioni del Calcolo "Mauro Picone" - CNR. The laboratory was officially opened in 2012 with the support of Programma Operativo Nazionale "Ricerca e Competitività" 2007-2013 (PON "R&C"), and it incorporates expertise and research activities that began in 2007 and were supported by several CNR projects. The main interest of BioinfoLab is to develop novel statistical methods and computational tools for the analysis of high-dimensional data arising from "Multi-omics" applications. In particular, current activities involve the analysis of ChIP-seq and RNA-seq experiments.

More at http://bioinfo.na.iac.cnr.it/BioinfoLab/index.html

List of Bioinformatics Software Tools for Next Generation Sequencing

Jitendra Prajapati — Fri, 11 Mar 2016 20:22:14 -0600

Commercial tools

Strand NGS
- offers many different tools including alignment, RNA-Seq, DNA-Seq, ChIP-Seq, Small RNA-Seq, Genome Browser, visualizations, Biological Interpretation, etc. Supports workflows “one can import the sample data in FASTA, FASTQ or tag-count format. In addition, prealigned data in SAM, BAM or Illumina-specific ELAND format can be directly imported for analysis.”
- Alignment feature: Supports alignment from Illumina, Ion Torrent, 454 (Roche), and PacBio
- DNA-Seq Feature, can annotate with dbSNP
CLC Genomics Workbench
- (QIAGEN). Features include: resequencing, workflow, read mapping, de novo assembly, variant detection, RNA-Seq, ChIP-Seq, Genome Browser, etc (entire list on website); Main Workbench offers database search (Genbank, Blast, Pubmed); 2000 organizations have invested in CLC
- Accepts VCF files from 1000 Genomes Project
- Accepts downloaded tracks from dbSNP
- Also accepts: FASTA, GFF/GTF/GVF, BED, Wiggle, Cosmic, UCSC variant database, Complete Genomics masterVar file
- Read mapping: “In addition to Sanger sequence data, reads from these high-throughput sequencing machines are supported: The 454 FLX System and the 454 GS Junior System from Roche, Illumina Genome Analyzer, Illumina HiSeq, Illumina HiScan, and Illumina MiSeq sequencing systems, SOLiD system from Life Technologies, Ion Torrent system from Life Technologies, Helicos from Helicos BioSciences”
- De novo assembly: “In addition to Sanger sequence data, reads from these high-throughput sequencing machines are supported: The 454 FLX System and the 454 GS Junior System from Roche, Illumina Genome Analyzer, Illumina HiSeq, Illumina HiScan, and Illumina MiSeq sequencing systems, SOLiD system from Life Technologies, Ion Torrent system from Life Technologies”
- Annotation tracks from Ensembl
DNAnexus
- Private cloud repository -- formerly a redistributor of SRA and other NCBI resources; accessible from the command line or via the web; can fetch data from a URL and build custom pipelines/workflows; has the sra.dnanexus.com site, where data downloads come directly from NCBI
Ingenuity Variant Analysis
- (QIAGEN) allows for variant identification and analysis; uses NCI-60 data set for cancer; supported third-party information: Entrez Gene, RefSeq, ClinVar; gives contextual details of results instead of just an A to B relationship
- Has own database-- “knowledge base” based on COSMIC, OMIM, and TCGA databases
Lasergene Genomics Suite
- Comprehensive NGS software pipeline for assembly, alignment, variant calling and analysis of NGS data
- Supported workflows include: reference-guided and de novo genome and transcriptome assembly and analysis, metagenomics sample assembly, targeted resequencing, exome alignment, gene panels with validation control, variant analysis, and RNA-Seq, ChIP-Seq and miRNA alignment and analysis.
- #1 in accuracy: fewer false negatives and better sensitivity compared to results obtained from other aligners
- Aligns exome data and performs variant calling an average of 3 times faster than alternative pipelines
- Annotates genomic data with allele and genotype frequency, functional impact predictions, evolutionary conservation scores and pathogenicity
- Supports all major NGS technologies (Illumina, Ion Torrent, PacBio and Roche 454) and project types
- Available on Windows, Mac OS X, Linux, and the Amazon Cloud
NextGENe
- “perfect analytical partner for the analysis of desktop sequencing data produced by the ION PGM™, Roche Junior, Illumina MiSeq as well as high throughput systems as the Ion Torrent Proton, Roche FLX, Applied BioSystems SOLiD™ and Illumina® platforms.” runs on Windows, free-standing multi-application package-- SNP/Indel analysis, CNV prediction and disease discovery, whole genome alignment, etc.
- Data can be imported from ClinVar, dbSNP, GenBank: http://www.softgenetics.com/PDF/NextGene_UsersManual_web.pdf
Partek Genomics Suite
- Cited in over 3,500 peer-reviewed scientific publications
- Workflows for microarray and PCR data include: Gene expression including alternative splicing, miRNA expression, Genome Wide Association Studies, Mother-Father-Child Trio analysis, DNA Copy number including allele specific copy number and Loss of Heterozygosity (LOH), and ChIP, and methylation. Next Generation Sequencing (NGS) workflows include: RNA-Seq, miRNA-Seq, ChIP-Seq, DNA-Seq, and Methylation
- Powerful statistics and interactive, publication ready visualizations
- Supports all commercial next generation sequencing and microarray file format as well as text files
- Can input GEO SOFT files
Partek Flow
- Installation can be cloud-based or on a local cluster or Linux server
- Easy to use point-and-click interface
- Takes NGS data (.fastq, BAM, SAM), microarrays (Affymetrix, Illumina) and text files
- Supports custom genome builds and annotation databases
- Performs base trimming, alignment, quantification, quality analysis, statistics, and visualization
- Includes ten fully customizable aligners (Bowtie, Bowtie 2, BWA, GSNAP, Isaac 2, SHRiMP 2, STAR, TMAP, TopHat and TopHat 2)
- Applications for RNA-Seq, Small RNA-Seq, WGS/WES, Pathway enrichment, Fusion detection and Variant calling
- Allows users to create, save, share, or download analysis pipelines for automated and repeatable analysis
- Collaborate with others without transferring data
- Integrates microarray and next generation sequencing data
Golden Helix: SNP and Variation Suite
- used for managing, analyzing and visualizing genotypic and phenotypic data; Features: Genome-wide association studies, genomic prediction, copy number analysis, small sample DNA-Seq workflows, large sample DNA-seq analysis, RNA-seq analysis. Supported files: .txt, Excel XLS & XLSX, CEL, CHP, CNT, Illumina, Plink PED, TPED, BED, Agilent files, NimbleGen data summary files, VCF files, Impute2 GWAS files, HapMap format, MACH output, + 50 other formats; consumes NCBI data directly
Genomatix
- Applications: ChIP-Seq, DNA-Seq, RNA-Seq, DNA methylation; enables personalized medicine
- Mining Stations: Supports all established NGS sequencing platforms- SOLiD, 454 Life Sciences, Genome Analyzer, HiSeq, MiSeq, IonTorrent
- Software Suite: can upload sequence of BED files
- Genome browser: BED and BAM files, Public data- 1500 BED files available for every user
Biodatomics
- Open source platform (SaaS), analysis and genome sequencing tools, integrates over 400 genomic analysis open source tools and pipelines, have a private and public cloud version. Features: genomic data visualization, drag and drop interface, accelerated analysis, real-time collaboration
- They have a couple modules to do so, and have enabled parts of the sra toolkit
SolveBio
- Software product, for clinical genomics professionals, manage, curate, report genomic variation
- Has own data library -- data from NCBI
Basepair
- Offers high quality workflows for all common NGS applications (RNA-Seq, ChIP-Seq, DNA-Seq, etc.)
- Very fast - get all results in 1-2 hours. Cloud-based, no storage or computing limits.
- Easy to use - less than a minute to run an analysis
- REST and Python API to manage large projects.

Variant Identification

Germline Callers

IMPUTE2
- Description: phases observed genotypes and imputes missing genotypes; uses reference panels to provide all available haplotypes; does not use population labels or genome-wide measures; designed to represent variation in one population; fairly popular
- Input:
- Reference Haplotypes: Links to 1000 Genomes and HapMap downloads
- Output:
FreeBayes
- Description: finds SNPs, Indels, MNPs; reports variants based on alignment; haplotype based
- Input: BAM- uses BAMtools API to parse
- Reference genome: FASTA
- Output: VCF
SOAPindel
- Description: detects indels from NGS paired-end sequencing
- Input: files with read alignment can be SOAP or SAM formats, users must also give raw reads in Fasta or Fastq
- Reference Sequence used to align reads: FASTA
- Output:
2Kplus2
- Description: algorithm searches graphs produced by de novo assembler Cortex; c++ source code for SNP detection “2kplus2.cpp is a c++ source code for the detection and the classification of single nucleotide polymorphisms in transformed De Bruijn graphs using Cortex assembler.”
- Input:
- Output:
Atlas 2
- Description: specializes in separation of true SNPs and indels from sequencing and mapping errors, last update January 2013
- Input: takes BAM file,
- Reference Genome: FASTA
- Output: produces VCF
CRISP
- Description: identifies SNPs and INDELs from pooled high-throughput NGS, not used for analysis of single samples; implemented in C and uses SAMtools API; latest version should work with diploid genomes
- Input: requires BAM files (aligned with GATK)
- Reference Genome: indexed FASTA file
- Output: VCF files
Dindel
- Description: (Wellcome Trust Sanger) calls small indels from short-read sequences; can only handle Illumina data; cannot test candidate indels; written in C++; used on Linux-based and Mac computers (not tested on Windows)
- Input: BAM files
- Output: VCF
discoSnp++
- Description: detects homozygous and heterozygous SNPs and Indels; software composed of 2 modules (kissnp2 and kissreads)
- Input: raw NGS datasets; fasta, fastq, gzipped or not;
- no reference genome required; read pairs can be given
- Output: FASTA
FamSeq
- Description: family-based sequencing studies- provides probability of an individual carrying variant based on family’s raw measurements; accommodates de novo mutations, can perform variant calling at chrX;
- Input: VCF
- Output: VCF
GeneticThesaurus
- Description: “Annotation of genetic variants in repetitive regions”
- Input: Initial variant calling from bam → vcf output
- Reference Genome: need to provide own fasta file for hg19 genome,
- Output: vcf.gz, vtf.gz, and baf.tsv.gz output
glfMultiples
- Description: command-line, variant caller
- Input: GLF
- Output: VCF
glfSingle
- Description: uses likelihood-based model for variant calling, starts from genotype likelihoods that have been computed from other tools (ex. Samtools BAQ), the likelihoods combine with individual-based prior p(genotype) to generate posterior probabilities
- Input: GLF
- Output: VCF
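The step described for glfSingle, combining genotype likelihoods with a prior p(genotype) to obtain posterior probabilities, is an application of Bayes' rule. The sketch below shows only that generic calculation on plain probabilities; it is not glfSingle's code, and the genotype labels, likelihood values, and prior values are hypothetical.

```python
def genotype_posteriors(likelihoods, priors):
    """Bayes' rule: posterior(g) is proportional to P(data | g) * P(g)."""
    unnormalized = {g: likelihoods[g] * priors[g] for g in likelihoods}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("all genotypes have zero posterior mass")
    return {g: value / total for g, value in unnormalized.items()}


# Hypothetical values for one site: P(reads | genotype) and p(genotype).
likelihoods = {"AA": 0.8, "AB": 0.2, "BB": 0.0}
priors = {"AA": 0.25, "AB": 0.5, "BB": 0.25}

posteriors = genotype_posteriors(likelihoods, priors)
best = max(posteriors, key=posteriors.get)   # most probable genotype
print(best, posteriors)
```

A caller would typically report the genotype with the highest posterior, together with a quality score derived from that posterior.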
Halvade
- Description: command-line; written in Java, “to run halvade a reference is needed for both GATK and BWA and a SNP (dbSNP!) database is required”
- Input: FASTQ
- Output: VCF
indelMINER
- Description: identifies indels from paired-end reads
- Input: BAM (aligned in SAMtools API)
- Output: VCF
Indelocator
- Description: (Broad Institute): does not perform realignment, relies on alignments in BAM files (reads must be aligned before the BAM files are passed to Indelocator); recommended to use GATK prior;
- Input: 2 BAM files(tumor & normal), annotated as germline or somatic; also has single sample mode
- Output: “Output of Indelocator is a high-sensitivity list of putative indel events containing large numbers of false positives. The statistics reported for each event have to be used to custom-filter the list in order to lower false positive rate”
Isaac Variant Caller
- Description: detects SNPs and small indels from diploid samples; designed to run on “*nux-like platforms”
- Input: BAM
- Output: VCF
KvarQ
- Description: in silico genotyping for selected loci in bacterial genome, written in Python and C
- Input: FASTQ
- reference genome or de novo assembly not needed
- Output:
LoFreq
- Description: SNV caller, Python language, standalone program, uncovers cell-population heterogeneity from high-throughput sequencing datasets; calls variants found in <.05% of the population
- Input: BAM file input→ suggest running through GATK
- Output:
Manta
- Description: Calls indels and SVs from paired end reads; standalone, command line program; Written in C++ and Python
- Input: BAM (can tolerate non-paired-end reads); a matched tumor sample may be provided as well
- Output: VCF
MarginAlign
- Description: SNV caller, specifically tailored to Oxford Nanopore Reads, written in Python; Package comes with 3 programs, marginAlign, marginCaller (calls SNVs), marginStats (computes qc stats on sam files)
- Input: SAM
- Output: SAM
MendelScan
- Description: Last release March 2014; for analyzing sequencing data in family studies of inherited diseases; variant calls for a family in VCF file; still in alpha-testing on github, example data uses 1000 genomes dataset
- Input:
- Output:
nanopore
- Description: UCSC Nanopore group (group at UCSC studying using ion channels for analysis of single RNA/DNA structures) software pipeline; tailored to Oxford Nanopore Reads; command line program
- Input: FASTQ
- Reference files: FASTA
- Output: “For each possible pair of read file, reference genome and mapping algorithm an experiment directory will be created in the nanopore/output directory.”
Platypus
- Description: Package program, written in C, Python, Cython; Can identify SNPs, MNPs, short indels, and larger variants; has been tested on very large datasets (1000 genomes)
- Input: BAM
- Reference Genome: FASTA (files must be indexed using Samtools or a similar program)
- Output: VCF
QualitySNPng
- Description: detection of SNPs; “can be used as a standalone application with graphical user interface as part of pipeline system”; does not require fully sequenced reference genome; haplotype strategy
- Input: SAM, ACE
- Output: GUI
ReviSTER
- Description: command line program; automated pipeline; utilizes BWA (mapping program), BLAT, and SAMTools;
- Input: FASTQ,
- Reference sequence file and list file containing STR locations as inputs
- Output: SAM
RVD
- Description: command-line program, detection of rare SNVs, relies upon Samtools, can be run in MATLAB
- Input: BAM
- Reference Genome: FASTA
- Output: “The algorithm output is a call table -- a comma-separated file with one line for each base position and each line in the following format: AlginmentReferencePosition, AlignmentBase, Call, SecondBase, CenteredErrorPrc, ReferenceErrorPrc, SecondBasePrc”
SNVer
- Description: calls common and rare variants in pool or individual NGS data, reports overall p-value, operating system independent statistical tool, identifies SNPs and INDELs, written in Java, no dependencies, straightforward command-line
- (SNVerGUI=GUI version) --SNVerGUI: desktop tool for variant detection
- Input: chrX annotation, sam.zip, bam.zip
- reference file must be aligned to the data file
- Output:
SNVMix
- Description: detects SNVs from NGS, post-alignment tool
- Input: pileupformat (Maq or Samtools)
- Output:
SV-M
- Description: Structural Variant Machine - predicts indels, uses split read alignment profiles, validated by Sanger Sequencing
- Input: paired-end Illumina reads from 1001 genomes project (uses ref plant- 1001genomes.org)
- Output:
SNPest
- Description: Standalone program, language C++, Perl
- Input: mpileup (SAMtools)
- Output: VCF
TrioCaller
- Description: Command line program, relies on BWA and samtools; genotype calling for unrelated individuals and parent-offspring trios
- Input: BAM (that has been aligned in BWA and Samtools)
- Output: BCF that can be formatted to VCF using bcftools
Snippy
- Description: finds indels between haploid reference genome and NGS sequence reads
- Input: read files in FASTQ or FASTA (can be .gz compressed)
- Reference genome in FASTA or GENBANK
- Output: .aln, .tab, .txt
VntrSeek
- Description: pipeline for discovering microsatellite tandem repeats with high-throughput sequencing data
- Input: gzip-compressed FASTA or FASTQ
- Output: VCF files; one for TRs and observed alleles, another file contains link to viewer

Somatic Callers

Cake
- Description: standalone program, “pipeline for the integrated analysis of somatic variants in cancer genomes”; integrates four algorithms; written in Perl; required tools: samtools, tabix, vcftools, VarScan2, bambino, cmake, somaticsniper (User guide; workflow page)
- Input: tumor and normal reads in BAM files, run through variant calling programs to generate intermediate VCF
- Output: VCF
MuTect
- Description: Broad Institute, identification of somatic point mutations in cancer genomes; requires preprocessing of reads (GATK)
- Input: same as GATK (FASTA reference genome, SAM read files)
- Output: call-stats, VCF, wiggle files
Polymutt
- Description: calls SNVs and detects de novo point mutations in families
- Input: GLF or BAM or VCF (must have identical chromosome orders)
- Output: VCF
Bassovac
- Description: Improved Bayesian inversion somatic caller; unlike other software packages, treats effects fully probabilistically instead of using ad-hoc modeling; effects are integrated at the atomic level and standard probability theory integrates read tallies to the sample level and to the tumor-normal pair level; "pending public release"
- Input:
- Output:
CLImAT
- Description: standalone program; “accurate detection of copy number alteration and loss of heterozygosity in impure and aneuploid tumor samples using whole genome sequencing data”
- Input: depth file generated by DFExtract and a config file
- Output: .results file, .Gtype, LOG.txt, also generates visualization
DeNovoGear
- Description: de-novo variant calling and interpretation; standalone program; dependencies C++ compiler, CMake, HTSlib, Eigen, Boost
- Input: PED and BCF
- Output: “The output format is a single row for each putative de novo mutation (DNM), with the following fields”
EBCall
- Description: Empirical Bayesian Mutation Calling; standalone program; uses tumor/normal paired reads and non-paired normal reference samples; dependent on samtools, R and the VGAM package for R
- Input: BAM
- Output: exact file type not specified; “The format of the result is suitable for adding annotation by annovar.”
HapMuc
- Description: standalone program; “utilizes the information of heterozygous germline variants near candidate mutations”; Dependent upon- Boost, SAMtools, BEDtools; 3 step workflow
- Input: BAM
- Output: BED
MultiGeMS
- Description: Multi-sample Genotype Model Selection
- Input: .txt, pileup (SAM/BAM converted to pileup format)
- Output: VCF
MultiSNV
- Description: command-line program; calls SNVs from NGS data from multiple samples from the same patient; dependent on R, Git, cmake, Boost and compile libraries
- Input: BAM or pileup
- Output: VCF
MutationSeq
- Description: standalone program, somatic SNV detection in tumor/normal samples; dependent on python, bamtools, boost, and LAPACK
- Input: BAM
- Output: VCF4.1 consisting of two parts (meta information & data lines)
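The two parts of a VCF file mentioned above can be told apart by their prefixes: meta-information lines start with `##`, the column header line starts with `#CHROM`, and everything after that is tab-separated data lines whose first eight columns are fixed. The sketch below separates them; the example content (including the `hypothetical_caller` source name) is made up and is not MutationSeq output.

```python
# The first eight columns of every VCF data line are fixed by the format.
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def split_vcf(text):
    """Split VCF text into meta-information lines and parsed data lines."""
    meta, records = [], []
    for line in text.splitlines():
        if not line.strip():
            continue                      # skip blank lines
        if line.startswith("##"):
            meta.append(line[2:])         # meta-information, e.g. fileformat=...
        elif line.startswith("#"):
            continue                      # column header line (#CHROM POS ...)
        else:
            fields = line.split("\t")
            record = dict(zip(VCF_COLUMNS, fields[:8]))
            record["POS"] = int(record["POS"])
            records.append(record)
    return meta, records


example = "\n".join([
    "##fileformat=VCFv4.1",
    "##source=hypothetical_caller",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t12345\t.\tA\tG\t50\tPASS\tDP=30",
])
meta, records = split_vcf(example)
print(meta)
print(records)
```

For real files a dedicated VCF library is a better choice, since this sketch ignores per-sample genotype columns and does not parse the INFO field.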
qSNP
- Description: standalone program; SNV caller for somatic variants in “low cellularity cancer samples”
- Input: BAM, dbSNP data, Illumina data, chrConv
- Output: “qSNP output files are named using a 4-element pattern: ...”
RADIA
- Description: RNA and DNA Integrated Analysis for Somatic Mutation Detection; DNA only Method (tumor/normal pair, ignores RNA) or Triple BAM Method (uses all three datasets from same patient); dependent upon python, samtools, pysam API, BLAT, SnpEff
- Input: BAM
- Reference Genome: FASTA indexed with SAMtools faidx
- Output: VCF
RVD2
- Description: sensitive variant detection for low-depth targeted NGS data; python module or command-line program;
- Input: tab-delimited depth chart format (converted from pileup files)
- Output: three hdf5 files and a vcf file
Shimmer
- Description: standalone program; detects somatic SNVs with multiple testing correction, uses Fisher’s exact test; dependent on git, samtools, R, R statmod package; for tumor/normal matched samples
- Input: BAM
- Output: VCF
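To illustrate the kind of test Shimmer is described as using, the sketch below runs a two-sided Fisher's exact test on a 2x2 table of alternate and reference read counts in a matched tumor/normal pair. It is a generic textbook implementation using only the standard library, not Shimmer's code; it leaves out the multiple testing correction, and the read counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability of seeing x in the top-left cell given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    tolerance = 1 + 1e-9   # guards against floating-point ties
    total = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * tolerance
    )
    return min(1.0, total)


# Hypothetical read counts at one site: (alt, ref) in tumor, (alt, ref) in normal.
p_value = fisher_exact_two_sided(10, 30, 0, 40)
print(f"p = {p_value:.3g}")
```

A small p-value indicates that the alternate allele is unevenly distributed between the two samples, which is the signal a somatic caller looks for before correcting for the number of sites tested.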
SNV-PPILP
- Description: Refines GATK’s Unified Genotyper SNV calls for “multiple samples assumed to form a phylogeny”
- Input:
- Output:
SomaticSniper
- Description: command-line application to identify SNPs between tumor/normal pairs- predicts probability of difference between two
- Input: BAM
- Reference Genome in FASTA
- Output: VCF
Strelka
- Description: somatic variant calling workflow for matched tumor-normal samples; detects indels; runs on *nux-like platform
- Input: BAM (must be sorted and indexed)- Strelka does own realignment around indels-- don’t need to do this type of pre-processing
- Output: pair of VCF files
Triodenovo
- Description: Bayesian framework for calling de novo mutations in trios
- Input: VCF file with PL or GL fields (recommend using GATK or samtools to generate)
- Output: out_vcf
UNCeqr
- Description: finds somatic mutations using integration of DNA and RNA seq data-- boosts sensitivity for low purity tumors and rare mutations;
- Input: “can accept a variety of sequencing inputs and configurations”
- Output: “table of somatically mutated sites and associated information. These somatic mutations can be annotated with predicted transcript and protein effects using third party tools, such as Annovar”
Virmid
- Description: Virtual Microdissection for SNP calling; Java based; for disease-control matched samples; uncovers SNPs with low allele frequency by considering alpha contamination
- Input: BAM (must be sorted and indexed- samtools sort)
- Output: VCF and report file

Germline + Somatic Callers

VarScan 2
- Description: identify germline variants, private and shared variants, somatic mutations, and somatic CNVs; detects indels
- Input: SAMtools pileup
- Output: VCF
BAYSIC
- Description: Bayesian method; combines variant calls from different methods (GATK, FreeBayes, Atlas, Samtools, etc)
- Input: VCF format from one or more variant calling programs
- Output: VCF file containing integrated set of variant calls
MSIsensor
- Description: Microsatellite instability detection; C++ program, detects somatic and germline variants in tumor-normal paired data
- Input: BAM index files (normal and tumor)
- Output:
Beagle version 4
- Description: software package for genotype calling, phasing, imputation of ungenotyped markers, and identity-by-descent segment detection (unsure if this one is in the right category)
- Input: VCF
- Output: VCF
QuadGT
- Description: software package, SNV calling from normal-tumor pair and two parent genomes; quantifies descent-by-modification relationships; Written in Java
- Input: BAM files (parsed by Picard/Samtools API)
- Reference Genome: FASTA
- Output: VCF
RAREVATOR
- Description: RAre REference VAriant annotaTOR; command line; “identification and annotation of germline and somatic variants in rare reference allele loci from second generation sequencing data”; Bayesian genotype likelihood model
- Input: BED or VCF files from GATK
- Output: two VCF files (one for SNVs, one for Indels)
Scalpel
- Description: Used for detecting indels in a reference genome; performs localized micro-assembly of specific regions of interest; can do single, de novo, somatic reads; requires that raw reads are aligned with BWA
- Input: BAM
- Output: either VCF or ANNOVAR
SOAPsnp
- Description: based on Bayes’ theorem; calls consensus genotype
- Input: SOAP short read alignment results
- Output: GLF, option of flat tabular format
VariantMaster
- Description: “extract causative variants for monogenic and sporadic genetic diseases”; uses ANNOVAR;
- Input: BAM or VCF files (from SAMtools, GATK)
- Output:

Downstream Analysis of Variants

PrediXcan
- Description: command-line, standalone package program; available in Perl, Python, and R versions; predicts likelihood of a gene being related to a certain phenotype: “that directly tests the molecular mechanisms through which genetic variation affects phenotype.”; no actual expression data used, only in silico expression; “PrediXcan can detect known and novel genes associated with disease traits and provide insights into the mechanism of these associations.”
- Input: genotype and phenotype file (doesn’t specify file type)
- Output:default values: genelist, dosages (file format: snpid rsid) , dosage_prefix, weights, output
ATHENA
- Description: Analysis Tool for Heritable and Environmental Network Associations; software package, combines machine learning model with biology and statistics to predict non-linear interactions
- Input: Configuration file, Data file, Map file (includes rsID)
- Output: Summary file, Best model file, dot file, individual score file, cross-validation file
CCRaVAT and QuTie
- Description: (Wellcome Trust Sanger) Case-Control Rare Variant Analysis Tool and Quantitative Trait; software packages for large-scale analysis of rare variants
- Input: PED file and MAP file
- Output: Five tab-delimited txt files
GCTA
- Description: Genome Wide Complex Trait Analysis; package program, command line interface; estimates variance by all SNPs; 5 main functions: “data management, estimation of the genetic relationships from SNPs, mixed linear model analysis of variance explained by the SNPs, estimation of the linkage disequilibrium structure, and GWAS simulation”
- Input: PLINK binary PED files, MACH output format
- Output:
GenomeComb
- Description: package for analysis of complete genome data; annotation using public data or custom tracks, automated primer design for Sanger or Sequenom validation; “The cg process_illumina command can be used to generate annotated multisample data starting from fastq files, using tools such as bwa for alignment and GATK and samtools for variant calling. Sequencing data can also be imported from Complete Genomics (cg_process_sample command), Real Time Genomics (cg_process_rtgsample command) and VariantCallFormat (VCF) variant files (vcf2sft command).”
- Input: Sequencing data from Complete Genomics, Illumina, SOLiD and VCF;
- Output: standard file format used is a simple tab delimited file (.sft, .tsv)
Genome Track Analyzer
- Description: compares genome tracks; allows user to compare DNA expression/binding;
- Input: multiple: SGR/TXT, BED, BED6, GFF; if using prealigned sequence data- use MACS peak caller: BAM, BED, SAM, ELAND
- Output:
GVCBLUP
- Description: animal gene mapping; “genomic prediction and variance component estimation of additive and dominance effects”; standalone program, command line interface, written in C++ and Java
- Input:
- Output:
HOMOG
- Description: Analyzes heterogeneity with respect to single marker loci or known maps of markers; Carries out homogeneity test for alternative hypothesis “Two family types, one with linkage between a trait and a marker or map of markers, the other without linkage”
- Input: HOMOG.DAT - described on website
- Output: HOMOG.OUT
INTERSNP
- Description: GWIA (genome-wide interaction analysis) for case-control SNP and quantitative traits; SNPs are selected for joint analysis using a priori information; provides linear regression framework, Pathway Association Analysis, Genome-wide Haplotype Analysis
- Input: PLINK input formats (ped/map, tped/tfam, bed/bim/fam); compatible with SetID files
- Gene reference file: Ensembl Release 75
- Output: covariance matrix for regression models
mtSet
- Description: Currently only the standalone version is available, but it is moving to the LIMIX software suite; offers set tests, which allow testing for association between sets of variants and traits; accounts for confounding factors, e.g. relatedness
- Input: sample-to-sample genetic covariance matrix needs to be computed; multiple types of input; simulator requires input genotype and relatedness component;
- Output: resdir (result file of analysis), outfile (test statistics and p-values), manhattan_plot (flag)
MultiBLUP
- Description: Package program, command line interface; constructs linear prediction models; Best Linear Unbiased Prediction; improves upon BLUP involving kinship matrices; options: pre-specified kinships, regional kinships, adaptive multiblups, LD weightings
- Input: PLINK format
- Output: .reml, .indi.blp
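
Several of the tools above (CCRaVAT and QuTie, INTERSNP, MultiBLUP) take PLINK text input, i.e. a MAP file describing the markers and a PED file with one line per individual. The Python sketch below shows one way to read that pair, assuming the common four-column MAP layout (chromosome, marker ID, genetic distance, base-pair position) and the six leading PED columns (family ID, individual ID, father, mother, sex, phenotype) followed by two alleles per marker. The function names and the tiny example data are hypothetical and only serve the illustration.

```python
def parse_map(text):
    """Return a list of (chromosome, marker_id, bp_position) tuples."""
    markers = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Assumes the four-column MAP layout; genetic distance is ignored.
        chrom, marker_id, _genetic_distance, bp = line.split()
        markers.append((chrom, marker_id, int(bp)))
    return markers

def parse_ped(text, n_markers):
    """Return one dict per individual with its genotype pairs."""
    samples = []
    for line in text.splitlines():
        if not line.strip():
            continue
        cols = line.split()
        fid, iid, father, mother, sex, phenotype = cols[:6]
        alleles = cols[6:]
        if len(alleles) != 2 * n_markers:
            raise ValueError(f"{iid}: expected {2 * n_markers} alleles, got {len(alleles)}")
        # Pair up consecutive alleles: one (allele1, allele2) tuple per marker.
        genotypes = [(alleles[i], alleles[i + 1]) for i in range(0, len(alleles), 2)]
        samples.append({"fid": fid, "iid": iid, "sex": sex,
                        "phenotype": phenotype, "genotypes": genotypes})
    return samples

# Hypothetical two-marker, two-individual data set.
EXAMPLE_MAP = "1 rs100 0 1000\n1 rs200 0 2000\n"
EXAMPLE_PED = "FAM1 IND1 0 0 1 2 A A G T\nFAM1 IND2 0 0 2 1 A G T T\n"

if __name__ == "__main__":
    markers = parse_map(EXAMPLE_MAP)
    for sample in parse_ped(EXAMPLE_PED, len(markers)):
        print(sample["iid"], sample["genotypes"])
```

The binary bed/bim/fam variant that GCTA and INTERSNP also accept stores the same information in packed form and is not covered by this sketch.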

Variant Annotation

ANNOVAR
- Description: command-line tool, supports SNPs, INDELs, CNVs and block substitutions, provides wide variety of annotation techniques, depends upon multiple databases (each needing to be downloaded); annotates genetic variants; utilizes RefSeq, UCSC Genes, and the Ensembl gene annotation systems; can compare variants against those reported in dbSNP or the 1000 Genomes Project; Very popular; “The final command run TABLE_ANNOVAR, using dbSNP version 138, 1000 Genomes Project 2014 Oct version, NIH-NHLBI 6500 exome database version 2 (referred to as esp6500siv2), dbNSFP version 2.6 (referred to as ljb26), dbSNP version 138 (referred to as snp138) databases and remove all temporary files, and generates the output file called myanno.hg19_multianno.txt”
- Input: VCF, ANNOVAR input format (simple text-based format); can convert other formats into ANNOVAR input format
- Output: VCF (if input VCF), output file with multiple columns, tab-delimited output file
wANNOVAR
- provides web-based access to ANNOVAR software
PolyPhen-2
- Description: Very popular; Polymorphism Phenotyping; Web application; predicts impact of amino acid substitution on protein; Calculates Bayes posterior probability (Last update July 2015)
- Input: FASTA
- Output:
SIFT
- Description: predicts how an amino acid substitution will affect protein function; Based on degree of conservation of amino acid residues, collected through PSI-BLAST; can be applied to nonsynonymous polymorphisms or laboratory-induced missense mutations; links to dbSNP 132, GRCh37; Standalone or web app program; Very popular
- Input: Uniprot ID or Accession, Go term ID, Function name, Species Name or ID, etc
- Output:
snpEff
- Description: Genetic variant annotation and effect prediction toolbox; integrated with Galaxy, GATK, and GKNO; can annotate SNPs, MNPs (multiple-nucleotide polymorphisms), and INDELs; categorizes effects into classes by functionality; Very popular; Standalone or Web app; Claims to annotate all SNPs in 1000 genomes (EMBI) in less than 15 minutes; Provides assessment of impact of the variant (low, medium or high)
- Input: VCF, BED
- Output: VCF (with new ANN field, also used in ANNOVAR and VEP), HTML summary files
SnpSift
- Description: Filter and manipulate annotated files; Part of SnpEff main distribution; once variants have been annotated, this can be used to filter your data to find relevant variants
- Input:
- Output:
VAAST 2
- Description: Variant Annotation, Analysis, and Search Tool; probabilistic search tool for identifying damaged genes and their disease-causing variants; can score both coding and non-coding variants; Four tools: VAT (Variant Annotation Tool), VST (Variant Selection Tool), VAAST, pVAAST (for pedigree data); updated April 2015
- Input: FASTA, GFF3, GVF
- Output: CDR (condenser file), VAAST file (both unique to VAAST)
VEP
- Description: (Ensembl) Variant Effect Predictor; determines effect of variants on genes, transcripts, and protein sequence; uses SIFT and PolyPhen
- Input: Coordinates of variants and nucleotide changes; whitespace-separated format, VCF, pileup, HGVS
- Output: VCF, JSON, Statistics
ABSOLUTE
- Description: (Broad Institute); can estimate purity and ploidy to compute absolute copy number and mutation multiplicities; reextracts data from the mixed DNA population
- Input: HAPSEG segdat or segmentation file
- Output: per-sample output directory and subdirectory providing per-sample text files containing standard out being emitted from R
Alamut Batch
- Description: high-throughput annotation software for NGS analysis; for “intensive variant analysis workflows”; “enriches raw NGS variants with dozens of attributes”; based on clinically oriented Alamut database; Supports human genes; easy to integrate into pipeline (Latest Release- July 2015)
- Input: VCF, tab-delimited file
- Output: tab-separated file of annotations
AVIA
- Description: Annotation, Visualization, and Impact Analysis; “The tool is based on coupling a comprehensive annotation pipeline with a flexible visualization method. We leveraged the ANNOVAR (Wang et al., 2010) framework for assigning functional impact to genomic variations by extending its list of reference annotation databases (RefSeq, UCSC, SIFT, Polyphen etc.) with additional in-house developed sources (Non-B DB, PolyBrowse).”
- Input: BED
- Output: Table of annotations with gene annotation features
BioR
- Description: (Mayo Clinic) (Page last updated June 2015) Biological Reference Repository; “data integration tool that enables coordinate based searches and joins based on strings”; “BioR consists of two parts 1) the BioR toolkit which depends on Java…. 2) the BioR catalogs which are the data files used by the system”
- Input: VCF
- BioR-Supported Catalogs (tar-gzip files): dbSNP, 1000 genomes, HapMap, OMIM, NCBIGene
- Output: VCF + JSON
CADD
- Description: Combined Annotation Dependent Depletion; tool for scoring SNV deletions/insertions; “integrates multiple annotations into one metric”; Score strongly correlates with allelic diversity and pathogenicity; links to 1000 Genome variants; uses Ensembl Variant Effect Predictor
- Input: VCF
- Output: CADD score
CandiSNPer
- Description: web application, characterizes SNPs located in vicinity of SNP of interest;
- Input: enter SNP ID (rsID), choose population, region, measure for LD, threshold, plot format, color of SNPs, and choose whether to show genes
- Output: Image file
CanvasDB
- Description: “local database infrastructure for analysis of targeted- and whole genome re-sequencing projects”; dependent on MySQL, R, and ANNOVAR
- Input:
- Output:
CAROL
- Description: (Wellcome Trust Sanger); Combined Annotation scoRing toOL; Combined functional annotation score of nonsynonymous coding variants; Combines information from PolyPhen-2 and SIFT
- Input: tab-delimited with columns obtained from PolyPhen-2 and SIFT output
- Output: tab-delimited file
CHASM
- Description: Cancer-specific High-throughput Annotation of Somatic Mutations; Last updated May 2014; uses Random Forest Method to “distinguish between driver and passenger somatic mutations”; Positive driver class curated from COSMIC database; packed together with SNVBox (database)
- Input: Passenger mutation rates, Transcript and amino acid change, Genomic coordinates
- Output: CHASM score, p-value, FDR
CRAVAT
- Description: Cancer-Related Analysis of Variants Toolkit; Web application; Uses CHASM, VEST, SNVGet; “CRAVAT provides predictive scores for germline variants, somatic mutations and relative gene importance, as well as annotations from published literature and databases” Latest Release May 2015;
- Input: VCF, CRAVAT format
- Output: CRAVAT report- MS Excel spreadsheet or tab-separated file (emailed)
CUPSAT
- Description: Cologne University Protein Stability Analysis Tool; “tool to predict changes in protein stability upon point mutations”; web service program; Can predict mutant stability from existing PDB structures or custom protein structures
- Input: for PDB- provide PDB ID and Amino Acid Residue Number; for custom- PDB file format
- Output:
DANN
- Description: Deleterious Annotation of genetic variants; standalone program, uses “the same feature set and training data as CADD to train a deep neural network”; can catch nonlinear relationships; “There are four different datasets: training, validation, testing, and ClinVar_ESP...The ClinVar_ESP dataset is also a testing set containing a set of “gold standard” pathogenic and benign variants”
- Input:
- Output:
ESEfinder
- Description: Exonic Splicing Enhancer; useful for interpretation of point mutations/polymorphisms that are disease-associated; GUI interface; web app program
- Input: FASTA
- Output: html or plain text format, graphical display of results
Exomiser
- Description: Wellcome Trust Sanger; functionally annotates variants from whole-exome sequencing data; Based on Jannovar and uses UCSC KnownGene; Java program; web app program (Page last modified Feb 2015)
- Input: VCF
- Output: TSV, VCF
FamAn
- Description: Automated variant annotation pipeline for family-based sequencing studies; Annotates SNVs and INDELs; 4 models- autosomal dominant, autosomal recessive, de novo mutations and a general model; “A variety of annotations are provided for each segregating variant: number of family (and family ID) each variant hits, variant genomic location and coding effect (based on snpEff), loss-of-function mutation annotation, selected ENCODE annotation, allele frequency in the 1000 Genomes Project, allele frequency in the Exome Variant Server (ESP6500), segmental duplication annotation, SIFT, PolyPhen2, LRT, MutationTaster, GERP++, PhyloP, SiPhy, etc.” (Last updated May 2014)
- Input: VCF
- Output: two excel compatible outputs
GeneTalk
- Description: Combines tool for filtering and data analysis with an online network for genetic professionals; Different degrees- basic license, premium license, in-house solution (the last ones are paid for- Commercial tool?)
- Input: VCF
- Output: GeneTalk Annotation- includes clinical data, medical relevance, scientific relevance (http://www.gene-talk.de/public/GeneTalk_Whitepaper_Annotations.pdf)
GeneVetter
- Description: “GeneVetter is a tool designed for investigation of the background prevalence of exonic variation in the Phase 3 1000 Genomes data under user defined filtering criteria”; web app program; GeneVetter uses GRch37p4 (hs37d5.fa.gz), dbSNP build 138, 1000G Phase 3, clinvar_2014072
- Input: VCF
- Output: TIMS score, summary table, PCA plot
GISTIC
- Description: (Broad Institute) Last update- July 2014; Identifies genomic regions that are significantly “amplified or deleted”; Each is given a G score; gives genomic locations and q-values for aberrant regions
- Input: segmentation file -seg, markers file -mk (required); optional: array list file -alf, CNV file -cnv
- Reference genome: -refgene (created in MATLAB; GISTIC provides four reference genomes: hg16.mat, hg17.mat, hg18.mat, hg19.mat)
- Output: All lesions file (text file), amplifications file (text file), deletion genes file (text file), Gistic Scores file, Segmented copy number (pdf file), amplification score GISTIC plot (pdf file), Deletion score/q-value GISTIC plot (pdf file)
HOPE
- Description: Have yOur Protein Explained; Web app program; Automatic mutant analysis server that provides structural effects of a mutation; Uses BLAST against UniProt and PDB along with homology modeling
- Input: FASTA protein sequence, or accession code of protein of interest
- Output: a report containing information from a “decision tree” and illustrated figures and animations
Human Splicing Finder
- Description: Last update: May 2013; aimed to help study pre-mRNA splicing; combines 12 algorithms to identify mutations’ effect on splicing motifs; uses ensembl database 70
- Input: Gene Name, Ensembl transcript ID, Ensembl Gene ID, Consensus CDS, RefSeq Peptide ID, or own sequence (looks like you can enter FASTA)
- Output: Chart with columns for predicted signal, predicted algorithm, cDNA position and interpretation
LARVA
- Description: Large-scale Analysis of Variants in noncoding Annotations; New version released July 2015; Command-line program; used for studying noncoding variants; integrates comprehensive set of noncoding elements, modeling their mutation count; Dependent on C++ and BEDtools
- Input: multiple
- Output:
LINKAGE
- Description: three main programs: mlink (calculates lod scores at fixed values for the recombination fraction in one interval of a genetic map), linkmap (calculates location scores for positions of a disease locus along a marker), and ilink (estimates parameters including recombination fractions, allele frequencies, penetrances, etc)
- Input: pedfile (processed by MAKEPED) and datafile (reflects loci for each individual; set in PREPLINK)
- Output:
MAC
- Description: MNV Annotation Corrector; Ad hoc software, fixes incorrect amino acid predictions that are caused by multiple nucleotide variations; Uses existing annotators ANNOVAR, SnpEff, VEP (last update April 2015) (only 1 download this week → not popular)
- Input: List of called SNVs and corresponding BAM
- Output: Report identifying block of mutation within codon (BMCs)
mit-o-matic
- Description: focuses on mtDNA, provides clinically relevant information from different resources; two-component pipeline: a command-line tool for alignment of NGS reads and an online version that provides a genetic report on mitochondrial variants
- Input: FASTQ, pileup
- Reference sequence: rCRSm
- Output: Online version gives comprehensive genetic report
Mutadelic
- Description: Web App program; “This application generates reports on inherited mutations in five genes (ANK1, SLC4A1, SPTA1, SPTB and EPB42) associated with the following rare Mendelian blood disorders: Hereditary Spherocytosis (HS), Hereditary Elliptocytosis (HE) and Hereditary Pyropoikilocytosis”; Newer program- recently validated on omictools
- Input: Can upload coordinates of DNA variants or VEP
- Output: Displayed on web or can be downloaded in Excel or RDF format
MutationTaster
- Description: (Last post on site 2014) Web app program; Rapid evaluation of disease causing alterations; uses NCBI 37 and Ensembl 69
- Input: HGNC symbol, NCBI GeneID, or Ensembl ID
- Output: Report containing prediction, summary, name of alteration, etc
MutPred
- Description: web app tool; Classifies amino acid substitutions as disease-associated or neutral in humans; Last modified Feb. 2014; Based on SIFT, trained using Human Gene Mutation Database
- Input:
- Output: “The output of MutPred contains a general score (g), i.e., the probability that the amino acid substitution is deleterious/disease-associated, and top 5 property scores (p), where p is the P-value that certain structural and functional properties are impacted.”
MutSigCV
- Description: (Broad Institute) Mutation Significance (CV= covariates); Analyzes mutations discovered in DNA sequencing to identify genes that were mutated more often than expected
- Input: mutations.maf, coverage.txt, covariates.txt
- Output: output.txt
NGS-SNP
- Description: Collection of command-line scripts for providing rich SNP annotations; “NCBI, Ensembl, and Uniprot IDs are provided for genes, transcripts and proteins when applicable”;
- Input: Samtools consensus pileup, Maq, diBayes, Genetic format, VCF
- Output: File containing annotated SNPs is copied from SNP list and some classes are added
Oncotator
- Description: (Broad Institute) “Tool for annotating human genomic point mutations and data relevant to cancer researchers”; Web app; Supports annotation of data from ClinVar, dbSNP, 1000 genomes (plus many other external sites); Only GRCh37 coordinates supported; Last update: April 2015
- Input: tab-delimited file
- Output: tab-delimited MAF
PANTHER
- Description: Protein ANalysis THrough Evolutionary Relationships; Web app program, also has its own database; Classification system used to classify proteins and their genes; Also, “Estimates the likelihood of a particular nonsynonymous (amino-acid changing) coding SNP to cause a functional impact on the protein”; Updated in 2015
- Input: Data from PANTHER, IDs from Ensembl, EntrezGene, NCBI GI numbers, NCBI UniGene IDs HUGO, UniProt; if ID type is not one of the above, can input txt file or excel format
- Output: Analysis results displayed online
PESX
- Description: Putative Exonic Splicing Enhancers/Silencers; (Can’t tell if this is outdated or not)
- Input: FASTA or plain text
- Output: Excel spread sheet
Phen-Gen
- Description: Combines patients’ disease symptoms with sequencing data; Standalone or Web app version; Only accepts 1 family per run; to evaluate unrelated individuals, each sample needs to be run individually
- Input: Variant- VCF; Phenotype- HPO; Pedigree- PED
- Output: Combined scores file, variants for top genes file
PMUT
- Description: Aimed at annotation and prediction of pathological mutations; based on different kinds of sequence info and neural networks to process information
- Input: FASTA
- Output: Simple yes/no and reliability index
PROVEAN
- Description: Protein Variation Effect Analyzer; predicts whether an amino acid substitution or indel has impact on biological function of the protein; “comparable to SIFT or Polyphen-2”; Standalone, Web app, Command line or GUI; Last update May 2014
- Input: FASTA, list of variants;
- Output: tab-separated columns including Variant, Provean Score and prediction
Rescue-ESE
- Description: “An online tool for identifying candidate ESEs in vertebrate exons”; Web application; For human, mouse, zebrafish, pufferfish
- Input: multi-FASTA or plain text
- Output:
SCAN
- Description: Web application program, includes a database as well; Database contains physical-based SNP annotations and functional annotations; “Information on physical, functional, and LD annotation served on the SCAN database comes directly from public resources, including the HapMap (release 23a), NCBI (dbSNP 129), or is information created by us using data downloaded from these public resources”; “SCAN can be utilized in several ways including: (i) queries of the SNP and gene databases; (ii) analysis using the attached tools and algorithms; (iii) downloading files with SNP annotation for various GWA platforms”
- Input:
- Output: HTML, comma-delimited, tab-delimited
SeattleSeq Annotation
- Description: “SeattleSeqAnnotation137 was most recently updated October 13, 2013. The current version is 8.08. The most recent site, based on dbSNP build 141, and hg38/NCBI 38”; Provides annotations for SNVs and Indels- includes dbSNP rsID, gene names and accession numbers, variation functions, protein positions and amino acid changes, conservation scores, HapMap frequencies, PolyPhen predictions and clinical association.
- Input: Maq, gff, CASAVA, VCF, GATK bed, custom
- Output: “default output file format is a header line (starting with "#") followed by tab-separated annotations”; VCF
seqminer 3.7
- Description: “Efficiently Read Sequence Data (VCF Format, BCF Format and METAL Format) into R”; Command line package program; Published August 2015
- Input: VCF, BCF
- Output: VCF
SG Adviser
- Description: Scripps Genome Annotation and Distributed Variant Interpretation Server, web developed applications for variant annotation, “Downstream applications of variant annotation include: Clinical sequencing applications including: carrier testing, or identification of causal variants in molecular diagnosis, tumor sequencing, or diagnostic odyssey. Prioritization of variants prior to statistical analysis of sequence based disease association studies, especially for automated set-generation and enrichment of likely functional variants within sets. Identification of causal variants in post-GWAS/linkage sequencing studies. Identification of causal variants in forward genetic screens (stay tuned for non-human annotation)”
- Input: SNV- VCF, BED, and a few others; CNV- BED, CNVator, plus others
- Output: tab-delimited file
SNAP-2
- Descriptio

Jvarkit : Java utilities for Bioinformatics

Jit — Fri, 08 Jun 2018 09:31:55 -0500

A collection of Java toolkits for bioinformatics work: Jvarkit: Java utilities for Bioinformatics

Address of the bookmark: http://lindenb.github.io/jvarkit/

Perl one-liner for bioinformatician !!!

Abhimanyu Singh — Fri, 30 May 2014 05:49:07 -0500

With the emergence of NGS technologies and sequencing data, most bioinformaticians munge and wrangle massive amounts of genomics text. Although there are several "standardized" file formats (FASTQ, SAM, VCF, etc.) and tools for manipulating them (fastx toolkit, samtools, vcftools, etc.), there are still times when knowing a few Perl one-liners is extremely helpful.

Perl one-liners are small and awesome Perl programs that fit in a single line of code and do one thing really well. These things include changing line spacing, numbering lines, doing calculations, converting and substituting text, deleting and printing certain lines, parsing logs, editing files in-place, doing statistics, carrying out system administration tasks, updating a bunch of files at once, and many more. Perl one-liners will make you a shell warrior. Anything that took you minutes to solve will now take you seconds!

perl -pe '$\="\n"'
#double space a file

perl -pe '$_ .= "\n" unless /^$/'
#double space a file except blank lines

perl -pe '$_.="\n"x7'
#insert 7 blank lines after each line

perl -ne 'print unless /^$/'
#remove all blank lines

perl -lne 'print if length($_) < 20'
#print all lines with length less than 20.

perl -00 -pe ''
#collapse runs of consecutive blank lines into a single blank line (make the file single-spaced)

perl -00 -pe '$_.="\n"x4'
#Expand single blank lines into 4 consecutive blank lines

perl -pe '$_ = "$. $_"'
#Number all lines in a file

perl -pe '$_ = ++$a." $_" if /./'
#Number only non-empty lines in a file

perl -ne 'print ++$a." $_" if /./'
#Number and print only non-empty lines in a file

perl -pe '$_ = ++$a." $_" if /regex/'
#Number only lines that match a pattern

perl -ne 'print ++$a." $_" if /regex/'
#Number and print only lines that match a pattern

perl -ne 'printf "%-5d %s", $., $_ if /regex/'
#print lines that match a pattern, each prefixed with its line number left-aligned in a 5-character column (perl -ne 'printf "%-5d %s", $., $_' : for all the lines)

perl -le 'print scalar(grep{/./}<>)'
#prints the total number of non-empty lines in a file

perl -lne '$a++ if /regex/; END {print $a+0}'
#print the total number of lines that matches the pattern

perl -alne 'print scalar @F'
#print the total number of fields (words) in each line

perl -alne '$t += @F; END { print $t}'
#Find total number of words in the file

perl -alne 'map { /regex/ && $t++ } @F; END { print $t }'
#find total number of fields that match the pattern

perl -lne '/regex/ && $t++; END { print $t }'
#Find total number of lines that match a pattern

perl -le '$n = 20; $m = 35; ($m,$n) = ($n,$m%$n) while $n; print $m'
#will calculate the GCD of two numbers.

perl -le '$a = $n = 20; $b = $m = 35; ($m,$n) = ($n,$m%$n) while $n; print $a*$b/$m'
#will calculate the LCM (least common multiple) of 20 and 35.

perl -le '$n=10; $min=5; $max=15; $, = " "; print map { int(rand($max-$min))+$min } 1..$n'
#Generates 10 random integers from 5 to 14 (15 itself is never produced; use rand($max-$min+1) to include it).

perl -le 'print map { ("a".."z","0".."9")[rand 36] } 1..8'
#Generates an 8-character password from the letters a to z and the digits 0 to 9.

perl -le 'print map { ("a","t","g","c")[rand 4] } 1..20'
#Generates a random nucleotide sequence 20 bases long.

perl -le 'print "a"x50'
#generate a string of 50 'a' characters

perl -le 'print join ", ", map { ord } split //, "hello world"'
#Will print the ascii value of the string hello world.

perl -le '@ascii = (99, 111, 100, 105, 110, 103); print pack("C*", @ascii)'
#converts ascii values into character strings.

perl -le '@odd = grep {$_ % 2 == 1} 1..100; print "@odd"'
#Generates an array of odd numbers.

perl -le '@even = grep {$_ % 2 == 0} 1..100; print "@even"'
#Generate an array of even numbers

perl -lpe 'y/A-Za-z/N-ZA-Mn-za-m/' file
#ROT13-encode the entire file (shift every letter by 13 places)

perl -nle 'print uc'
#Convert all text to uppercase:

perl -nle 'print lc'
#Convert text to lowercase:

perl -nle 'print ucfirst lc'
#Convert only the first letter of each line to uppercase and the rest to lowercase

perl -ple 'y/A-Za-z/a-zA-Z/'
#Convert upper case to lower case and vice versa

perl -ple 's/(\w+)/\u$1/g'
#Capitalize the first letter of every word (title casing)

perl -pe 's|\n|\r\n|'
#Convert unix new lines into DOS new lines:

perl -pe 's|\r\n|\n|'
#Convert DOS newlines into unix new line

perl -pe 's|\n|\r|'
#Convert unix newlines into MAC newlines:

perl -pe '/regexp/ && s/foo/bar/'
#Substitute a foo with a bar in a line with a regexp.
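
The two arithmetic one-liners above both rely on the loop `($m,$n) = ($n,$m%$n) while $n`, which is Euclid's algorithm. For readers who would rather see it spelled out as a small script, here is a minimal sketch of the same logic. It is written in Python rather than Perl purely for readability, which is a choice made for this illustration and not part of the original one-liners; the function names are likewise made up here.

```python
def gcd(m, n):
    """Greatest common divisor by Euclid's algorithm.

    Mirrors the Perl loop: replace (m, n) with (n, m % n) until n is 0.
    """
    while n:
        m, n = n, m % n
    return m

def lcm(a, b):
    """Least common multiple, computed as a*b divided by the GCD,
    the same value the second one-liner prints as $a*$b/$m."""
    return a * b // gcd(a, b)

if __name__ == "__main__":
    print(gcd(35, 20))  # same inputs as the GCD one-liner
    print(lcm(20, 35))  # same inputs as the second one-liner
```

With the inputs 20 and 35 used in the one-liners, the GCD is 5 and the least common multiple is 140.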

Reference/Sources:

http://genomics-array.blogspot.in/2010/11/some-unixperl-oneliners-for.html

http://genomespot.blogspot.com/2013/08/a-selection-of-useful-bash-one-liners.html

http://biowize.wordpress.com/2012/06/15/command-line-magic-for-your-gene-annotations/

http://genomics-array.blogspot.com/2010/11/some-unixperl-oneliners-for.html

http://bioexpressblog.wordpress.com/2013/04/05/split-multi-fasta-sequence-file/

Commercial and public next-gen-seq (NGS) software

Surabhi Chaudhary — Tue, 03 Jun 2014 20:45:11 -0500

Integrated solutions
CLCbio Genomics Workbench - de novo and reference assembly of Sanger, Roche FLX, Illumina, Helicos, and SOLiD data. Commercial next-gen-seq software that extends the CLCbio Main Workbench software. Includes SNP detection, ChIP-seq, browser and other features. Commercial. Windows, Mac OS X and Linux.
Galaxy - Galaxy = interactive and reproducible genomics. A job webportal.
Genomatix - Integrated Solutions for Next Generation Sequencing data analysis.
JMP Genomics - Next gen visualization and statistics tool from SAS. They are working with NCGR to refine this tool and produce others.
NextGENe - de novo and reference assembly of Illumina, SOLiD and Roche FLX data. Uses a novel Condensation Assembly Tool approach where reads are joined via "anchors" into mini-contigs before assembly. Includes SNP detection, ChIP-seq, browser and other features. Commercial. Win or MacOS.
Partek - Commercial software for NGS, microarray, and qPCR data analysis. Streamlined analysis workflows for: ChIP-Seq, RNA-Seq, DNA-Seq, DNA Methylation, Gene Expression, Exon, miRNA Expression, Copy Number, Allele-Specific Copy Number, LOH, Association, Trio Analysis, and Tiling. Supports all commercial sequencing and microarray technologies.
SeqMan Genome Analyser - Software for Next Generation sequence assembly of Illumina, Roche FLX and Sanger data integrating with Lasergene Sequence Analysis software for additional analysis and visualization capabilities. Can use a hybrid templated/de novo approach. Commercial. Win or Mac OS X.
SHORE - SHORE, for Short Read, is a mapping and analysis pipeline for short DNA sequences produced on an Illumina Genome Analyzer. A suite created by the 1001 Genomes project. Source for POSIX.
SlimSearch - Fledgling commercial product.
Synamatix has SXOligoSearch (http://synasite.mgrc.com.my:8080/sxo...ligoSearch.php)
The SWIFT suite is a software collection for fast index-based sequence comparison. It contains the following programs: SWIFT — fast local alignment search, guaranteeing to find epsilon-matches between two sequences; SWIFT BALSAM — a very fast program to find semiglobal non-gapped alignments based on k-mer seeds. http://bibiserv.techfak.uni-bielefeld.de/swift/
biolib is a library and a set of scripts targeted at NGS. There are modules to: clean sequences (Sanger, 454, Illumina), parse caf, ace and bowtie map files, clean and filter contigs, look for SNPs and indels, filter SNPs, and do statistics for reads, contigs and SNPs.

Align/Assemble to a reference
BFAST - Blat-like Fast Accurate Search Tool. Written by Nils Homer, Stanley F. Nelson and Barry Merriman at UCLA.
Bowtie - Ultrafast, memory-efficient short read aligner. It aligns short DNA sequences (reads) to the human genome at a rate of 25 million reads per hour on a typical workstation with 2 gigabytes of memory. Uses a Burrows-Wheeler-Transformed (BWT) index. Written by Ben Langmead and Cole Trapnell. Linux, Windows, and Mac OS X.
BWA - Heng Li's BWT Alignment program - a progression from Maq. BWA is a fast, lightweight tool that aligns short sequences to a sequence database, such as the human reference genome. By default, BWA finds an alignment within edit distance 2 to the query sequence. C++ source.
ELAND - Efficient Large-Scale Alignment of Nucleotide Databases. Whole genome alignments to a reference genome. Written by Illumina author Anthony J. Cox for the Solexa 1G machine.
Exonerate - Various forms of pairwise alignment (including Smith-Waterman-Gotoh) of DNA/protein against a reference. Authors are Guy St C Slater and Ewan Birney from EMBL. C for POSIX.
GenomeMapper - GenomeMapper is a short read mapping tool designed for accurate read alignments. It quickly aligns millions of reads either with ungapped or gapped alignments. A tool created by the 1001 Genomes project. Source for POSIX.
GMAP - GMAP (Genomic Mapping and Alignment Program) for mRNA and EST Sequences. Developed by Thomas Wu and Colin Watanabe at Genentech. C/Perl for Unix.
gnumap - The Genomic Next-generation Universal MAPper (gnumap) is a program designed to accurately map sequence data obtained from next-generation sequencing machines (specifically that of Solexa/Illumina) back to a genome of any size. It seeks to align reads from nonunique repeats using statistics. From authors at Brigham Young University. C source/Unix.
MAQ - Mapping and Assembly with Qualities (renamed from MAPASS2). Particularly designed for Illumina with preliminary functions to handle ABI SOLiD data. Written by Heng Li from the Sanger Centre. Features extensive supporting tools for DIP/SNP detection, etc. C++ source
MOSAIK - MOSAIK produces gapped alignments using the Smith-Waterman algorithm. Features a number of support tools. Support for Roche FLX, Illumina, SOLiD, and Helicos. Written by Michael Strömberg at Boston College. Win/Linux/MacOSX
MrFAST and MrsFAST - mrFAST and mrsFAST are designed to map short reads generated with the Illumina platform to reference genome assemblies in a fast and memory-efficient manner. Both are robust to INDELs, and MrsFAST has a bisulphite mode. Authors are from the University of Washington. C as source.
MUMmer - MUMmer is a modular system for the rapid whole genome alignment of finished or draft sequence. Released as a package providing an efficient suffix tree library, seed-and-extend alignment, SNP detection, repeat detection, and visualization tools. Version 3.0 was developed by Stefan Kurtz, Adam Phillippy, Arthur L Delcher, Michael Smoot, Martin Shumway, Corina Antonescu and Steven L Salzberg - most of whom are at The Institute for Genomic Research in Maryland, USA. POSIX OS required.
Novocraft - Tools for reference alignment of paired-end and single-end Illumina reads. Uses a Needleman-Wunsch algorithm. Can support Bis-Seq. Commercial. Available free for evaluation, educational use and for use on open not-for-profit projects. Requires Linux or Mac OS X.
PASS - Supports Illumina, SOLiD and Roche-FLX data formats and allows the user to tune the sensitivity of the alignments very finely. Uses a spaced-seed initial filter, then an NW dynamic programming algorithm leading to an SW-like local alignment. Authors are from CRIBI in Italy. Win/Linux.
RMAP - Assembles 20 - 64 bp Illumina reads to a FASTA reference genome. By Andrew D. Smith and Zhenyu Xuan at CSHL. (published in BMC Bioinformatics). POSIX OS required.
SeqMap - Supports up to 5 or more bp mismatches/INDELs. Highly tunable. Written by Hui Jiang from the Wong lab at Stanford. Builds available for most OS's.
SHRiMP - Assembles to a reference sequence. Developed with Applied Biosystem's colourspace genomic representation in mind. Authors are Michael Brudno and Stephen Rumble at the University of Toronto. POSIX.
Slider - An application for the Illumina Sequence Analyzer output that uses the probability files instead of the sequence files as an input for alignment to a reference sequence or a set of reference sequences. Authors are from BCGSC. Paper is here.
SOAP - SOAP (Short Oligonucleotide Alignment Program). A program for efficient gapped and ungapped alignment of short oligonucleotides onto reference sequences. The updated version uses a BWT. Can call SNPs and INDELs. Author is Ruiqiang Li at the Beijing Genomics Institute. C++, POSIX.
SSAHA - SSAHA (Sequence Search and Alignment by Hashing Algorithm) is a tool for rapidly finding near exact matches in DNA or protein databases using a hash table. Developed at the Sanger Centre by Zemin Ning, Anthony Cox and James Mullikin. C++ for Linux/Alpha.
SOCS - Aligns SOLiD data. SOCS is built on an iterative variation of the Rabin-Karp string search algorithm, which uses hashing to reduce the set of possible matches, drastically increasing search speed. Authors are Ondov B, Varadarajan A, Passalacqua KD and Bergman NH.
SWIFT - The SWIFT suite is a software collection for fast index-based sequence comparison. It contains: SWIFT, a fast local alignment search guaranteeing to find epsilon-matches between two sequences; and SWIFT BALSAM, a very fast program to find semiglobal non-gapped alignments based on k-mer seeds. Authors are Kim Rasmussen (SWIFT) and Wolfgang Gerlach (SWIFT BALSAM).
SXOligoSearch - SXOligoSearch is a commercial platform offered by the Malaysian-based Synamatix. Will align Illumina reads against a range of Refseq RNA or NCBI genome builds for a number of organisms. Web Portal. OS independent.
Vmatch - A versatile software tool for efficiently solving large scale sequence matching tasks. Vmatch subsumes the software tool REPuter, but is much more general, with a very flexible user interface, and improved space and time requirements. Essentially a large string matching toolbox. POSIX.
Zoom - ZOOM (Zillions Of Oligos Mapped) is designed to map millions of short reads, generated by next-generation sequencing technology, back to the reference genomes, and to carry out post-analysis. ZOOM is developed to be highly accurate, flexible, and user-friendly, with speed being a critical priority. Commercial. Supports Illumina and SOLiD data.
NCGR uses GMAP (http://www.gene.com/share/gmap/) to align Solexa reads. GMAP is free, though.
Exonerate (http://www.ebi.ac.uk/~guy/exonerate/)
MUMmer (http://mummer.sourceforge.net/)
A short-read mapper called gnumap (http://dna.cs.byu.edu/gnumap/), designed to increase accuracy for reads with duplicate matches. Open source, creates viewable output (with Affy's Integrated Genome Browser), and produces results very similar to Novocraft's.
SOCS (short oligonucleotides in color space)
BFAST https://secure.genome.ucla.edu/index.php/BFAST
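
Several of the aligners above (SSAHA, SOCS, SWIFT BALSAM) are described as finding candidate matches through hashing or k-mer seeds. The following is a minimal Python sketch of that general idea, not the implementation of any listed tool: it indexes every k-mer of a reference in a hash table and lets the k-mers of a read vote for a reference offset. The function names, the toy sequences and the choice of k=4 are assumptions made purely for illustration.

```python
from collections import Counter, defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer in the reference to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def best_offset(read, index, k):
    """Vote for the reference offset supported by the most k-mer seed hits.

    Returns (offset, votes), or None when no k-mer of the read is indexed.
    """
    votes = Counter()
    for i in range(len(read) - k + 1):
        for pos in index.get(read[i:i + k], ()):
            # A seed at read position i and reference position pos
            # implies the read starts at reference offset pos - i.
            votes[pos - i] += 1
    if not votes:
        return None
    return votes.most_common(1)[0]


# Toy data (hypothetical): the read is an exact substring of the reference.
reference = "ACGTTGCATGCCGATTACAGGCTTAACG"
index = build_kmer_index(reference, k=4)
hit = best_offset("GCCGATTACA", index, k=4)
print(hit)
```

A real aligner would follow the seed step with an extension or verification step (for example a Smith-Waterman alignment around the winning offset) and would handle reverse complements and quality values, which this sketch leaves out.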

De novo Align/Assemble
ABySS - Assembly By Short Sequences. ABySS is a de novo sequence assembler that is designed for very short reads. The single-processor version is useful for assembling genomes up to 40-50 Mbases in size. The parallel version is implemented using MPI and is capable of assembling larger genomes. By Simpson JT and others at Canada's Michael Smith Genome Sciences Centre. C++ as source.
ALLPATHS - ALLPATHS: De novo assembly of whole-genome shotgun microreads. ALLPATHS is a whole genome shotgun assembler that can generate high quality assemblies from short reads. Assemblies are presented in a graph form that retains ambiguities, such as those arising from polymorphism, thereby providing information that has been absent from previous genome assemblies. Broad Institute.
Edena - Edena (Exact DE Novo Assembler) is an assembler dedicated to process the millions of very short reads produced by the Illumina Genome Analyzer. Edena is based on the traditional overlap layout paradigm. By D. Hernandez, P. François, L. Farinelli, M. Osteras, and J. Schrenzel. Linux/Win.
EULER-SR - Short read de novo assembly. By Mark J. Chaisson and Pavel A. Pevzner from UCSD (published in Genome Research). Uses a de Bruijn graph approach.
MIRA2 - MIRA (Mimicking Intelligent Read Assembly) is able to perform true hybrid de-novo assemblies using reads gathered through 454 sequencing technology (GS20 or GS FLX). Compatible with 454, Solexa and Sanger data. Linux OS required.
SEQAN - A Consistency-based Consensus Algorithm for De Novo and Reference-guided Sequence Assembly of Short Reads. By Tobias Rausch and others. C++, Linux/Win.
SHARCGS - De novo assembly of short reads. Authors are Dohm JC, Lottaz C, Borodina T and Himmelbauer H. from the Max-Planck-Institute for Molecular Genetics.
SSAKE - The Short Sequence Assembly by K-mer search and 3' read Extension (SSAKE) is a genomics application for aggressively assembling millions of short nucleotide sequences by progressively searching for perfect 3'-most k-mers using a DNA prefix tree. Authors are René Warren, Granger Sutton, Steven Jones and Robert Holt from Canada's Michael Smith Genome Sciences Centre. Perl/Linux.
SOAPdenovo - Part of the SOAP suite. See above.
VCAKE - De novo assembly of short reads with robust error correction. An improvement on early versions of SSAKE.
Velvet - Velvet is a de novo genomic assembler specially designed for short read sequencing technologies, such as Solexa or 454. Need about 20-25X coverage and paired reads. Developed by Daniel Zerbino and Ewan Birney at the European Bioinformatics Institute (EMBL-EBI).
SOAP (http://soap.genomics.org.cn) by Ruiqiang Li, as has been pointed out by ECO.
Euler-SR (Euler-Short Reads Assembly, http://euler-assembler.ucsd.edu/portal/) by Mark J. Chaisson and Pavel A. Pevzner from UCSD. (published in Genome Research)
RMAP (A program for mapping Solexa reads, http://rulai.cshl.edu/rmap/) by Andrew D. Smith and Zhenyu Xuan at CSHL. (published in BMC Bioinformatics)
Short read aligner called Bowtie (http://bowtie-bio.sourceforge.net/) designed for fast mapping of Illumina reads
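
EULER-SR is listed above as using a de Bruijn graph approach. As a rough illustration of what that means, the sketch below builds a de Bruijn graph from a few reads and walks it while the path is unambiguous. It is a teaching example under simplifying assumptions (error-free reads, a single strand, no coverage information), and the function names and toy reads are hypothetical; it does not reproduce the algorithm of any assembler in this list.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, one edge per distinct k-mer."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_unambiguous(graph, start):
    """Extend a contig from `start` while each node has exactly one successor."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:  # stop on a cycle instead of looping forever
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig


# Toy reads (hypothetical) that overlap to spell a single 10 bp sequence.
reads = ["ATGGCG", "GGCGTG", "CGTGCA"]
graph = de_bruijn_graph(reads, k=4)
contig = walk_unambiguous(graph, "ATG")
print(contig)
```

The walk stops at the first branch point, which is where repeats and sequencing errors show up in real data; production assemblers spend most of their effort resolving exactly those ambiguities, typically with coverage and paired-read information.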

SNP/Indel Discovery
ssahaSNP - ssahaSNP is a polymorphism detection tool. It detects homozygous SNPs and indels by aligning shotgun reads to the finished genome sequence. Highly repetitive elements are filtered out by ignoring those kmer words with high occurrence numbers. More tuned for ABI Sanger reads. Developers are Adam Spargo and Zemin Ning from the Sanger Centre. Compaq Alpha, Linux-64, Linux-32, Solaris and Mac
PolyBayesShort - A re-incarnation of the PolyBayes SNP discovery tool developed by Gabor Marth at Washington University. This version is specifically optimized for the analysis of large numbers (millions) of high-throughput next-generation sequencer reads, aligned to whole chromosomes of model organism or mammalian genomes. Developers at Boston College. Linux-64 and Linux-32.
PyroBayes - PyroBayes is a novel base caller for pyrosequences from the 454 Life Sciences sequencing machines. It was designed to assign more accurate base quality estimates to the 454 pyrosequences. Developers at Boston College.
Maq is also able to find SNPs with its own alignment. It has a graphical viewer, but again for its own alignment format.
SSAHA has been optimized for short-reads, too. But yes, SSAHASNP appears in your "SNP/INDEL discovery" category.

Genome Annotation/Genome Browser/Alignment Viewer/Assembly Database
EagleView - An information-rich genome assembler viewer. EagleView can display a dozen different types of information including base quality and flowgram signal. Developers at Boston College.
LookSeq - LookSeq is a web-based application for alignment visualization, browsing and analysis of genome sequence data. LookSeq supports multiple sequencing technologies, alignment sources, and viewing modes; low or high-depth read pileups; and easy visualization of putative single nucleotide and structural variation. From the Sanger Centre.
MapView - MapView: visualization of short reads alignment on desktop computer. From the Evolutionary Genomics Lab at Sun-Yat Sen University, China. Linux.
SAM - Sequence Assembly Manager. Whole Genome Assembly (WGA) Management and Visualization Tool. It provides a generic platform for manipulating, analyzing and viewing WGA data, regardless of input type. Developers are Rene Warren, Yaron Butterfield, Asim Siddiqui and Steven Jones at Canada's Michael Smith Genome Sciences Centre. MySQL backend and Perl-CGI web-based frontend/Linux.
STADEN - Includes GAP4. GAP5 once completed will handle next-gen sequencing data. A partially implemented test version is available here
XMatchView - A visual tool for analyzing cross_match alignments. Developed by Rene Warren and Steven Jones at Canada's Michael Smith Genome Sciences Centre. Python/Win or Linux.

Counting e.g. ChIP-Seq, Bis-Seq, CNV-Seq
BS-Seq - The source code and data for the "Shotgun Bisulphite Sequencing of the Arabidopsis Genome Reveals DNA Methylation Patterning" Nature paper by Cokus et al. (Steve Jacobsen's lab at UCLA). POSIX.
CHiPSeq - Program used by Johnson et al. (2007) in their Science publication
CNV-Seq - CNV-seq, a new method to detect copy number variation using high-throughput sequencing. Chao Xie and Martti T Tammi at the National University of Singapore. Perl/R.
FindPeaks - Performs analysis of ChIP-Seq experiments. It uses a naive algorithm for identifying regions of high coverage, which represent Chromatin Immunoprecipitation enrichment of sequence fragments, indicating the location of a bound protein of interest. Original algorithm by Matthew Bainbridge, in collaboration with Gordon Robertson. Current code and implementation by Anthony Fejes. Authors are from Canada's Michael Smith Genome Sciences Centre. JAVA/OS independent. Latest versions available as part of the Vancouver Short Read Analysis Package.
MACS - Model-based Analysis for ChIP-Seq. MACS empirically models the length of the sequenced ChIP fragments, which tends to be shorter than sonication or library construction size estimates, and uses it to improve the spatial resolution of predicted binding sites. MACS also uses a dynamic Poisson distribution to effectively capture local biases in the genome sequence, allowing for more sensitive and robust prediction. Written by Yong Zhang and Tao Liu from Xiaole Shirley Liu's Lab.
PeakSeq - PeakSeq: Systematic Scoring of ChIP-Seq Experiments Relative to Controls. A two-pass approach for scoring ChIP-Seq data relative to controls. The first pass identifies putative binding sites and compensates for variation in the mappability of sequences across the genome. The second pass filters out sites that are not significantly enriched compared to the normalized input DNA and computes a precise enrichment and significance. By Rozowsky J et al. C/Perl.
QuEST - Quantitative Enrichment of Sequence Tags. Sidow and Myers Labs at Stanford. From the 2008 publication Genome-wide analysis of transcription factor binding sites based on ChIP-Seq data. (C++)
SISSRs - Site Identification from Short Sequence Reads. BED file input. Raja Jothi @ NIH. Perl.
SeqMap (http://biogibbs.stanford.edu/~jiangh/SeqMap/) - works like ELAND; can handle 3 or more bp mismatches and also indels
SISSRs, for ChIP-Seq analysis: http://dir.nhlbi.nih.gov/papers/lmi/epigenomes/sissrs/
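
The FindPeaks entry above describes a naive approach: identify regions of high read coverage. A minimal sketch of that idea follows, assuming fixed-length reads given as 0-based start positions on a single sequence and a plain depth threshold. The function names, the threshold and the toy data are assumptions for illustration only; none of the listed peak callers is claimed to work this way in detail, and MACS and PeakSeq in particular use statistical models and controls that are not shown here.

```python
def coverage(read_starts, read_length, genome_length):
    """Per-base coverage from read start positions, using a difference array."""
    diff = [0] * (genome_length + 1)
    for start in read_starts:
        diff[start] += 1
        diff[min(start + read_length, genome_length)] -= 1
    depth, running = [], 0
    for delta in diff[:genome_length]:
        running += delta
        depth.append(running)
    return depth


def call_peaks(depth, min_depth):
    """Return half-open (start, end) intervals where coverage >= min_depth."""
    peaks, start = [], None
    for pos, d in enumerate(depth):
        if d >= min_depth and start is None:
            start = pos
        elif d < min_depth and start is not None:
            peaks.append((start, pos))
            start = None
    if start is not None:
        peaks.append((start, len(depth)))
    return peaks


# Toy data (hypothetical): three overlapping reads and one isolated read.
depth = coverage([2, 3, 4, 15], read_length=5, genome_length=25)
peaks = call_peaks(depth, min_depth=3)
print(peaks)
```

A fixed threshold ignores local biases such as mappability and copy number, which is the motivation the MACS and PeakSeq entries give for their model-based and control-based scoring.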

See also this thread for ChIP-Seq, until I get time to update this list.

Alternate Base Calling
Rolexa - R-based framework for base calling of Solexa data. Project publication
Alta-cyclic - "a novel Illumina Genome-Analyzer (Solexa) base caller"

Transcriptomics
ERANGE - Mapping and Quantifying Mammalian Transcriptomes by RNA-Seq. Supports Bowtie, BLAT and ELAND. From the Wold lab.
G-Mo.R-Se - G-Mo.R-Se is a method aimed at using RNA-Seq short reads to build de novo gene models. First, candidate exons are built directly from the positions of the reads mapped on the genome (without any ab initio assembly of the reads), and all the possible splice junctions between those exons are tested against unmapped reads. From CNS in France.
MapNext - MapNext: A software tool for spliced and unspliced alignments and SNP detection of short sequence reads. From the Evolutionary Genomics Lab at Sun-Yat Sen University, China.
QPalma - Optimal Spliced Alignments of Short Sequence Reads. Authors are Fabio De Bona, Stephan Ossowski, Korbinian Schneeberger, and Gunnar Rätsch. A paper is available.
RSAT - RSAT: RNA-Seq Analysis Tools. RSAT is developed and maintained by Hui Jiang at Stanford University.
TopHat - TopHat is a fast splice junction mapper for RNA-Seq reads. It aligns RNA-Seq reads to mammalian-sized genomes using the ultra high-throughput short read aligner Bowtie, and then analyzes the mapping results to identify splice junctions between exons. TopHat is a collaborative effort between the University of Maryland and the University of California, Berkeley
NGS-Trex: Next Generation Sequencing Transcriptome profile explorer http://www.biomedcentral.com/1471-2105/14/S7/S10

Reference

Illumina has a software list: http://www.illumina.com/pagesnrn.ilmn?ID=245.

Some software is also listed on Anthony Fejes's blog (http://www.fejes.ca/labels/DNA.html)

http://seqanswers.com/wiki/Software

List of visualization tools for genome alignments

Rahul Nayak — Fri, 02 Feb 2018 13:25:33 -0600

Genome browsers are useful not only for showing final results but also for improving analysis protocols, testing data quality, and generating result drafts. Their integration into analysis pipelines allows parameters to be optimized, which leads to better results. Sometimes, however, we need publication-ready figures of genomes. The following is a list of genome alignment visualization tools that could be useful for the analysis and interpretation of results:

ABySS Explorer

Interactive Java application that uses a novel graph-based representation to display a sequence assembly and associated metadata

http://www.bcgsc.ca/platform/bioinfo/software/abyss-explorer

BamView

Genome browser and annotation tool that allows visualization of sequence features, next-generation sequencing (NGS) data and the results of analyses within the context of the sequence, and also its six-frame translation

http://www.sanger.ac.uk/resources/software/artemis/

DNannotator

Annotation web toolkit for regional genomic sequences

http://bioapp.psych.uic.edu/DNannotator.htm

JVM

Java Visual Mapping tool for NGS reads

http://www.springer.com/cda/content/document/cda_downloaddocument/9789401792448-c2.pdf?SGWID=0-0-45-1487072-p176815501

LookSeq

Web-based visualization of sequences derived from multiple sequencing technologies. Low- or high-depth read pileups and easy visualization of putative single nucleotide and structural variation

http://lookseq.sourceforge.net

MagicViewer

Visualization of short read alignment, identification of genetic variation and association with annotation information of a reference genome

http://bioinformatics.zj.cn/magicviewer/

MapView

Alignments of huge-scale single-end and paired-end short reads

http://omictools.com/mapview-s1367.html

MultiPipMaker

Computes alignments of similar regions in two DNA sequences. The resulting alignments are summarized with a ‘percent identity plot’ (pip)

http://pipmaker.bx.psu.edu/pipmaker/

PileLineGUI

Handling genome position files in NGS studies

http://sing.ei.uvigo.es/pileline/pilelinegui.html

SAMtools tview

Simple and fast text alignment viewer; NGS compatible

http://www.htslib.org/

SEWAL

Uses a locality-sensitive hashing algorithm to enumerate all unique sequences in an entire Illumina sequencing run

http://www.sourceforge.net/projects/sewal

STAR

A web-based integrated solution for the management and visualization of sequencing data

http://wanglab.ucsd.edu/star/browser

SVA

Software for annotating and visualizing sequenced human genomes

http://www.svaproject.org

Integrative Genomics Viewer (IGV)

Visualization of large heterogeneous datasets, providing a smooth and intuitive user experience at all levels of genome resolution

https://www.broadinstitute.org/igv/

ZOOM Lite

NGS data mapping and visualization software

http://bioinfor.com/zoom/lite/