BOL: Related items

Virus Bioinformatics Tools

LEGE — Wed, 24 Apr 2024 06:19:55 -0500

Bioinformatics tools play a crucial role in studying viruses, enabling researchers to analyze their genetic makeup, structure, function, and evolution. Here are some commonly used bioinformatics tools for virus research:

Address of the bookmark: https://evirusbioinfc.notion.site/18e21bc49827484b8a2f84463cb40b8d?v=92e7eb6703be4720abf17a901bc9a947

Exploring Bacterial Comparative Genomics: A Bioinformatics Approach

LEGE — Sat, 14 Dec 2024 12:31:14 -0600

In the world of microbiology, bacteria have long fascinated scientists for their diversity, adaptability, and crucial roles in ecosystems and human health. Comparative genomics—a field that involves analyzing and comparing the genomes of different organisms—has revolutionized our understanding of bacterial evolution, adaptation, and pathogenicity. By leveraging bioinformatics tools and techniques, researchers can uncover genomic insights that were once hidden. This blog delves into the principles, methodologies, and applications of bacterial comparative genomics from a bioinformatics perspective.

What is Bacterial Comparative Genomics?

Comparative genomics involves the systematic comparison of genomes across different bacterial species or strains. This approach allows scientists to:

Identify conserved and unique genes.
Explore genetic determinants of pathogenicity.
Understand bacterial evolution and phylogenetics.
Investigate horizontal gene transfer and its role in antibiotic resistance.

Bioinformatics is central to these analyses, enabling the processing and interpretation of large-scale genomic data.

Key Steps in Bacterial Comparative Genomics

Genome Sequencing and Assembly: The process begins with obtaining high-quality bacterial genome sequences. Advances in next-generation sequencing (NGS) technologies have made it faster and more affordable to sequence bacterial genomes. Tools such as SPAdes and Velvet are commonly used for genome assembly.
Genome Annotation: Annotating a genome involves identifying genes, regulatory elements, and other genomic features. Automated tools like Prokka and RAST provide functional annotations, allowing researchers to predict the roles of genes and proteins.
Genome Alignment: Aligning genomes is crucial for identifying conserved regions, single-nucleotide polymorphisms (SNPs), and structural variations. Tools like Mauve and progressiveMauve are commonly employed for whole-genome alignments.
Comparative Analyses:
- Core and Pan-genome Analysis: The core genome consists of genes shared across all strains of a species, while the pan-genome includes all genes found in any strain. Software like Roary and BPGA can perform core and pan-genome analyses.
- Phylogenetic Analysis: Comparative genomics often involves reconstructing evolutionary relationships. Tools such as MEGA and IQ-TREE facilitate phylogenetic tree construction based on genomic data.
- Functional Enrichment Analysis: To understand the biological significance of unique or shared genes, functional enrichment analysis using databases like GO (Gene Ontology) and KEGG is essential.
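The core and pan-genome definitions above reduce to set intersection and set union once genes have been grouped into families. The following is a minimal sketch under that assumption; the strain names and gene lists are made-up toy data, and dedicated tools such as Roary additionally cluster genes by sequence similarity, which is not shown here.

```python
from typing import Dict, Set, Tuple


def core_and_pan_genome(gene_sets: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Return (core, pan) gene sets from per-strain gene family sets."""
    if not gene_sets:
        return set(), set()
    strains = list(gene_sets.values())
    core = set.intersection(*strains)  # genes present in every strain
    pan = set.union(*strains)          # genes present in at least one strain
    return core, pan


# Hypothetical toy input: gene families already assigned per strain.
strains = {
    "strain_A": {"dnaA", "gyrB", "recA", "stx2"},
    "strain_B": {"dnaA", "gyrB", "recA", "blaTEM"},
    "strain_C": {"dnaA", "gyrB", "recA"},
}

core, pan = core_and_pan_genome(strains)
accessory = pan - core  # genes found in some, but not all, strains
print("core:", sorted(core))
print("accessory:", sorted(accessory))
```

The accessory set (pan minus core) is often the interesting part, since it is where strain-specific virulence or resistance genes tend to show up.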

Recommended Bioinformatics Tools for Comparative Genomics

Here are some additional bioinformatics tools that can aid bacterial comparative genomics:

OrthoFinder: For accurate ortholog identification across multiple genomes.
PanOCT: Specifically designed for pan-genome clustering and annotation.
FastANI: A tool for calculating Average Nucleotide Identity (ANI) for microbial genome comparisons.
Circos: For visually comparing genomic data through circular genome plots.
Galaxy Platform: A user-friendly web-based platform offering numerous genomic analysis tools.
BLAST: Essential for sequence alignment and similarity searches.
PhyloSift: Focused on phylogenetic analysis of microbial genomes using marker genes.

These tools, in combination with the methods discussed, provide a robust framework for conducting comprehensive comparative genomic studies.

Applications of Bacterial Comparative Genomics

Understanding Pathogenicity: Comparative genomics helps identify virulence factors that distinguish pathogenic strains from non-pathogenic relatives. For instance, comparing genomes of Escherichia coli strains has revealed key genetic determinants of pathogenicity in enterohemorrhagic strains.
Antibiotic Resistance Research: The spread of antibiotic resistance genes through horizontal gene transfer is a major global concern. Comparative analyses can trace the origins and dissemination of resistance genes, aiding in the development of countermeasures.
Microbial Ecology and Evolution: By studying genomic variations, researchers can understand how bacteria adapt to different environments. This is particularly relevant for extremophiles and symbiotic bacteria.
Vaccine Development: Identifying conserved antigens across pathogenic strains is critical for vaccine design. Comparative genomics has been instrumental in developing vaccines against pathogens like Neisseria meningitidis.
Biotechnology Applications: Comparative studies can uncover unique metabolic pathways in bacteria, paving the way for applications in bioremediation, synthetic biology, and industrial microbiology.

Challenges in Bacterial Comparative Genomics

While the field has made significant strides, several challenges remain:

Data Overload: The rapid growth of sequencing data requires robust computational infrastructure and efficient algorithms.
Genome Plasticity: High rates of horizontal gene transfer and genome rearrangements in bacteria complicate comparative analyses.
Annotation Accuracy: Automated annotation tools are not infallible, and manual curation is often needed for high-confidence results.
Interpreting Non-Coding Regions: Understanding the functional significance of non-coding genomic regions remains a challenge.

Future Directions

The integration of bacterial comparative genomics with other ‘omics’ approaches—such as transcriptomics, proteomics, and metabolomics—promises a more comprehensive understanding of bacterial biology. Additionally, advancements in machine learning and artificial intelligence are likely to further enhance bioinformatics analyses, enabling the prediction of complex phenotypes from genomic data.

Conclusion

Bacterial comparative genomics, driven by bioinformatics, continues to unravel the complexities of bacterial life. From combating antibiotic resistance to uncovering the secrets of microbial evolution, this interdisciplinary field holds immense potential for addressing pressing challenges in microbiology and beyond. As technology advances, so too will our ability to harness the power of comparative genomics for scientific and societal benefit.

Predicting Pathogen Virulence Using Bioinformatics Tools

BioStar — Tue, 04 Nov 2025 07:55:53 -0600

In the genomic era, the ability to predict the virulence potential of pathogens has become an indispensable part of infectious disease research. With the exponential growth of microbial genome data, bioinformatics tools now enable scientists to identify virulence factors, model pathogen behavior, and even forecast outbreak risks — all from sequence data.

In an age where pathogens continue to evolve and cross boundaries, understanding what makes them virulent—that is, capable of causing disease—has become a critical focus in modern microbiology and genomics. Virulence prediction bridges computational biology, genomics, and machine learning to forecast the pathogenic potential of microbes before they strike.

What Is Virulence?

Virulence refers to the degree of damage a pathogen can inflict on its host. It is determined by a combination of genetic factors—called virulence factors (VFs)—that allow the organism to attach, invade, evade, and harm the host. These include genes coding for toxins, secretion systems, adhesins, and enzymes that disrupt host defenses.

Understanding virulence factors not only helps in deciphering the mechanisms of infection but also provides early warning signs for emerging threats.

Why Predict Virulence?

Traditional virulence studies relied heavily on experimental infection models, which, although accurate, are time-consuming, expensive, and ethically constrained.
Today, the availability of whole-genome sequences and large-scale pathogen databases has paved the way for in silico virulence prediction—a computational approach that can screen thousands of genomes within hours.

This approach enables researchers to:

Rapidly identify potential high-risk strains.
Prioritize pathogens for containment, surveillance, or further study.
Guide vaccine development and drug target discovery.
Support One Health frameworks, linking animal, human, and environmental health data.

How Is Virulence Predicted?

Virulence prediction combines bioinformatics pipelines with machine learning and comparative genomics. The process generally involves:

Genome Annotation: Identifying genes and coding sequences in microbial genomes.
Feature Extraction: Comparing sequences with curated databases like VFDB (Virulence Factor Database), PATRIC, or Victors.
Pattern Recognition: Using algorithms (e.g., Random Forest, SVM, or deep learning models) to classify genes or strains as virulent or non-virulent based on sequence patterns, motifs, and protein domains.
Scoring and Visualization: Assigning a virulence score or confidence level and visualizing it through heatmaps or genome maps.
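The steps above can be sketched in a few lines once annotation has produced a list of gene names per genome. This is only an illustration: `VF_REFERENCE` is a hypothetical lookup table standing in for hits against a curated database such as VFDB, and the score (the fraction of virulence factor categories with at least one hit) is an assumed toy metric, not one used by any particular tool.

```python
from typing import Dict, Iterable, Set

# Hypothetical reference table: gene name -> virulence factor category.
# A real pipeline would derive this from sequence searches against a curated database.
VF_REFERENCE: Dict[str, str] = {
    "stx1": "toxin",
    "stx2": "toxin",
    "eae": "adhesin",
    "fimH": "adhesin",
    "escN": "secretion_system",
}


def virulence_profile(genes: Iterable[str], reference: Dict[str, str]) -> Dict[str, Set[str]]:
    """Group the annotated genes that match the reference by category."""
    profile: Dict[str, Set[str]] = {}
    for gene in genes:
        category = reference.get(gene)
        if category is not None:
            profile.setdefault(category, set()).add(gene)
    return profile


def virulence_score(genes: Iterable[str], reference: Dict[str, str]) -> float:
    """Toy score: fraction of reference categories with at least one matching gene."""
    categories = set(reference.values())
    if not categories:
        return 0.0
    return len(virulence_profile(genes, reference)) / len(categories)


annotated_genes = ["recA", "stx2", "eae", "gyrB"]  # made-up annotation output
profile = virulence_profile(annotated_genes, VF_REFERENCE)
score = virulence_score(annotated_genes, VF_REFERENCE)
print(profile, round(score, 2))
```

A machine learning classifier would replace the simple fraction with a model trained on labelled genomes, using the same kind of per-genome feature table as input.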

Tools and Resources for Virulence Prediction

A number of tools and databases make virulence prediction accessible to the scientific community:

VFanalyzer – For identifying virulence genes based on VFDB.
PathoFact – Predicts virulence, antimicrobial resistance (AMR), and toxin genes from metagenomic data.
Pangenome-based models – Identify virulence-associated gene clusters across strains.
Machine learning models – Use features like GC content, codon usage bias, or protein domains to predict pathogenicity.
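As an illustration of the sequence-derived features mentioned above, the sketch below computes GC content and relative codon frequencies for a coding sequence. It is a simplified assumption of what a feature extractor might look like; published codon usage bias measures (for example RSCU) normalise per amino acid, which is not done here.

```python
from collections import Counter
from typing import Dict


def gc_content(seq: str) -> float:
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def codon_frequencies(cds: str) -> Dict[str, float]:
    """Relative frequency of each codon in an in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    codons = [c for c in codons if set(c) <= set("ACGT")]  # skip ambiguous codons
    counts = Counter(codons)
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()} if total else {}


example_cds = "ATGAAAAAATAA"  # made-up sequence: ATG AAA AAA TAA
print(gc_content(example_cds))
print(codon_frequencies(example_cds))
```

Feature vectors like these can be stacked into a table, one row per gene or genome, and passed to a classifier.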

Emerging tools now integrate multi-omic data—including transcriptomics, proteomics, and metabolomics—to understand virulence in a systems biology framework.

Applications in the Real World

Virulence prediction has major implications across public health and research sectors:

Epidemic preparedness: Early identification of virulent strains in outbreak samples.
AMR surveillance: Linking virulence profiles with antibiotic resistance determinants.
Environmental monitoring: Predicting pathogenic potential of soil or waterborne microbes.
Clinical diagnostics: Supporting personalized treatment through pathogen profiling.

For instance, integrating virulence prediction pipelines into national surveillance networks could enable faster risk assessment and response to infectious outbreaks.

The Road Ahead

As machine learning and genomics advance, virulence prediction will evolve from simple gene-based detection to dynamic, context-aware models that account for host–pathogen interactions, environmental signals, and evolutionary adaptation.

Future tools may predict not just if a strain is virulent, but under what conditions it expresses that virulence—bridging the gap between genotype and phenotype.

In Summary

Virulence prediction is redefining how we understand and anticipate infectious diseases. By coupling genomic insights with computational intelligence, researchers can identify potential threats earlier, design smarter interventions, and ultimately, strengthen our preparedness against emerging pathogens.

Bioinformatics software for biologists in the genomics era

Poonam Mahapatra — Sun, 22 Dec 2013 17:31:05 -0600

The genome sequencing revolution is approaching a landmark figure of 1000 completely sequenced genomes. Coupled with fast-declining, per-base sequencing costs, this influx of DNA sequence data has encouraged laboratory scientists to engage large datasets in comparative sequence analyses for making evolutionary, functional and translational inferences. However, the majority of the scientists at the forefront of experimental research are not bioinformaticians, so a gap exists between the user-friendly software needed and the scripting/programming infrastructure often employed for the analysis of large numbers of genes, long genomic segments and groups of sequences. We see an urgent need for the expansion of the fundamental paradigms under which biologist-friendly software tools are designed and developed to fulfill the needs of biologists to analyze large datasets by using sophisticated computational methods. We argue that the design principles need to be sensitive to the reality that comparatively small teams of biologists have historically developed some of the most popular biological software packages in molecular evolutionary analysis. Furthermore, biological intuitiveness and investigator empowerment need to take precedence over the current supposition that biologists should re-tool and become programmers when analyzing genome scale datasets.

Address of the bookmark: http://bioinformatics.oxfordjournals.org/content/23/14/1713.full

Amity University Bioinformatics Summer Program - Kolkata

eliabrodsky — Tue, 11 Jun 2019 21:27:10 -0500

Registrations are now open for the 2019 Summer Bioinformatics Training program at Amity University, Kolkata. The program will focus on introductory topics for life science students. We will review important history, topics and challenges bioinformatics can help address in the context of basic research, discovery and industry.

Read more: https://edu.t-bio.info/amity-university-summer-bioinformatics-program-registrations-are-open/

NCBI PSI-BLAST Tutorial

Fri, 23 Aug 2013 02:25:02 -0500

http://www.biotechnology.jhu.edu/ Tutorial for PSI-BLAST, an extension of BLAST that uses matrix algebra. BLAST, the Basic Local Alignment Search Tool, is a cornerstone bioinformatics tool at NCBI; it finds protein and DNA sequences that are related to a sequence that the user provides.
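The matrix at the heart of PSI-BLAST is a position-specific scoring matrix (PSSM), rebuilt from the significant hits of each search round. The sketch below shows one simplified way to derive such a matrix from an ungapped alignment. The uniform background frequency, the flat pseudocount, and the toy alignment are all assumptions made for illustration; PSI-BLAST itself uses sequence weighting and more elaborate pseudocounts.

```python
import math
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment: List[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Log-odds score per residue per column of an ungapped alignment."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)  # assumed uniform background
    pssm = []
    for col in range(length):
        column = [seq[col] for seq in alignment if seq[col] in AMINO_ACIDS]
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        scores = {}
        for aa in AMINO_ACIDS:
            freq = (column.count(aa) + pseudocount) / total
            scores[aa] = math.log2(freq / background)
        pssm.append(scores)
    return pssm


def score_sequence(pssm: List[Dict[str, float]], seq: str) -> float:
    """Sum the column scores for a sequence of the same length as the PSSM."""
    return sum(column.get(aa, 0.0) for column, aa in zip(pssm, seq))


toy_alignment = ["MKV", "MKI", "MRV"]  # made-up aligned hits
pssm = build_pssm(toy_alignment)
print(score_sequence(pssm, "MKV"), score_sequence(pssm, "AAA"))
```

Residues seen often in a column score above zero and unseen residues score below zero, so the matrix rewards sequences that match the family profile rather than any single query.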

BEAP: Blast Extension and Assembly Program

Shruti Paniwala — Mon, 11 Jun 2018 04:52:56 -0500

The Blast Extension and Assembly Program (BEAP) is a computer program that uses a short starting DNA fragment, often an EST or partial gene segment, as a "primer" to recursively blast nucleotide databases in an attempt to obtain all sequences that overlap, directly or indirectly, with the "primer". This helps to "extend" the original sequence toward a "full length" sequence for functional analysis, or at least to obtain neighboring regions of the segment for SNP discovery and linkage disequilibrium analysis. Confidence in the assembly of the resulting sequences is achieved by using a known genome, such as the human genome, as a reference.
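The idea of repeatedly extending a seed with overlapping sequences can be illustrated with a small sketch. This is not BEAP's implementation: it assumes an in-memory list of fragments instead of BLAST searches against a database, uses exact suffix-prefix overlaps, and only extends to the right. The function name and parameters are hypothetical.

```python
from typing import List


def extend_right(seed: str, fragments: List[str], min_overlap: int = 4) -> str:
    """Greedily extend `seed` with fragments whose prefix matches the contig's suffix."""
    contig = seed
    unused = list(fragments)
    extended = True
    while extended:
        extended = False
        best = None  # (overlap length, index into unused)
        for i, frag in enumerate(unused):
            # The fragment must add at least one new base, hence len(frag) - 1.
            max_len = min(len(contig), len(frag) - 1)
            for k in range(max_len, min_overlap - 1, -1):
                if contig.endswith(frag[:k]):
                    if best is None or k > best[0]:
                        best = (k, i)
                    break
        if best is not None:
            k, i = best
            contig += unused.pop(i)[k:]  # append only the non-overlapping tail
            extended = True
    return contig


seed = "ATGCGTAC"                              # made-up starting fragment
pool = ["GTACGGA", "CGGATTT", "CCCCCCC"]       # made-up database sequences
print(extend_right(seed, pool))
```

Each round plays the role of one search iteration: the newly extended end becomes the query for the next round, and the loop stops when no fragment overlaps it.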

Address of the bookmark: https://www.animalgenome.org/tools/beap/

Commercial and public next-gen-seq (NGS) software

Surabhi Chaudhary — Tue, 03 Jun 2014 20:45:11 -0500

Integrated solutions
CLCbio Genomics Workbench - de novo and reference assembly of Sanger, Roche FLX, Illumina, Helicos, and SOLiD data. Commercial next-gen-seq software that extends the CLCbio Main Workbench software. Includes SNP detection, ChIP-seq, browser and other features. Commercial. Windows, Mac OS X and Linux.
Galaxy - Interactive and reproducible genomics. A web portal for running analysis jobs.
Genomatix - Integrated Solutions for Next Generation Sequencing data analysis.
JMP Genomics - Next gen visualization and statistics tool from SAS. They are working with NCGR to refine this tool and produce others.
NextGENe - de novo and reference assembly of Illumina, SOLiD and Roche FLX data. Uses a novel Condensation Assembly Tool approach where reads are joined via "anchors" into mini-contigs before assembly. Includes SNP detection, ChIP-seq, browser and other features. Commercial. Win or MacOS.
Partek - Commercial software for NGS, microarray, and qPCR data analysis. Streamlined analysis workflows for: ChIP-Seq, RNA-Seq, DNA-Seq, DNA Methylation, Gene Expression, Exon, miRNA Expression, Copy Number, Allele-Specific Copy Number, LOH, Association, Trio Analysis, and Tiling. Supports all commercial sequencing and microarray technologies.
SeqMan Genome Analyser - Software for Next Generation sequence assembly of Illumina, Roche FLX and Sanger data integrating with Lasergene Sequence Analysis software for additional analysis and visualization capabilities. Can use a hybrid templated/de novo approach. Commercial. Win or Mac OS X.
SHORE - SHORE, for Short Read, is a mapping and analysis pipeline for short DNA sequences produced on an Illumina Genome Analyzer. A suite created by the 1001 Genomes project. Source for POSIX.
SlimSearch - Fledgling commercial product.
Synamatix has SXOligoSearch (http://synasite.mgrc.com.my:8080/sxo...ligoSearch.php)
The SWIFT suite is a software collection for fast index-based sequence comparison. It contains the following programs: SWIFT, a fast local alignment search guaranteeing to find epsilon-matches between two sequences; and SWIFT BALSAM, a very fast program to find semiglobal non-gapped alignments based on k-mer seeds. http://bibiserv.techfak.uni-bielefeld.de/swift/
biolib is a library and a set of scripts targeted at NGS. There are modules to: clean sequences (Sanger, 454, Illumina); parse CAF, ACE and Bowtie map files; clean and filter contigs; look for SNPs and indels; filter SNPs; and compute statistics for reads, contigs and SNPs.

Align/Assemble to a reference
BFAST - Blat-like Fast Accurate Search Tool. Written by Nils Homer, Stanley F. Nelson and Barry Merriman at UCLA.
Bowtie - Ultrafast, memory-efficient short read aligner. It aligns short DNA sequences (reads) to the human genome at a rate of 25 million reads per hour on a typical workstation with 2 gigabytes of memory. Uses a Burrows-Wheeler-Transformed (BWT) index. Written by Ben Langmead and Cole Trapnell. Linux, Windows, and Mac OS X.
BWA - Heng Li's BWT Alignment program - a progression from Maq. BWA is a fast, lightweight tool that aligns short sequences to a sequence database, such as the human reference genome. By default, BWA finds an alignment within edit distance 2 to the query sequence. C++ source.
ELAND - Efficient Large-Scale Alignment of Nucleotide Databases. Whole genome alignments to a reference genome. Written by Illumina author Anthony J. Cox for the Solexa 1G machine.
Exonerate - Various forms of pairwise alignment (including Smith-Waterman-Gotoh) of DNA/protein against a reference. Authors are Guy St C Slater and Ewan Birney from EMBL. C for POSIX.
GenomeMapper - GenomeMapper is a short read mapping tool designed for accurate read alignments. It quickly aligns millions of reads either with ungapped or gapped alignments. A tool created by the 1001 Genomes project. Source for POSIX.
GMAP - GMAP (Genomic Mapping and Alignment Program) for mRNA and EST Sequences. Developed by Thomas Wu and Colin Watanabe at Genentech. C/Perl for Unix.
gnumap - The Genomic Next-generation Universal MAPper (gnumap) is a program designed to accurately map sequence data obtained from next-generation sequencing machines (specifically that of Solexa/Illumina) back to a genome of any size. It seeks to align reads from nonunique repeats using statistics. From authors at Brigham Young University. C source/Unix.
MAQ - Mapping and Assembly with Qualities (renamed from MAPASS2). Particularly designed for Illumina with preliminary functions to handle ABI SOLiD data. Written by Heng Li from the Sanger Centre. Features extensive supporting tools for DIP/SNP detection, etc. C++ source
MOSAIK - MOSAIK produces gapped alignments using the Smith-Waterman algorithm. Features a number of support tools. Support for Roche FLX, Illumina, SOLiD, and Helicos. Written by Michael Strömberg at Boston College. Win/Linux/MacOSX
MrFAST and MrsFAST - mrFAST & mrsFAST are designed to map short reads generated with the Illumina platform to reference genome assemblies; in a fast and memory-efficient manner. Robust to INDELs and MrsFAST has a bisulphite mode. Authors are from the University of Washington. C as source.
MUMmer - MUMmer is a modular system for the rapid whole genome alignment of finished or draft sequence. Released as a package providing an efficient suffix tree library, seed-and-extend alignment, SNP detection, repeat detection, and visualization tools. Version 3.0 was developed by Stefan Kurtz, Adam Phillippy, Arthur L Delcher, Michael Smoot, Martin Shumway, Corina Antonescu and Steven L Salzberg - most of whom are at The Institute for Genomic Research in Maryland, USA. POSIX OS required.
Novocraft - Tools for reference alignment of paired-end and single-end Illumina reads. Uses a Needleman-Wunsch algorithm. Can support Bis-Seq. Commercial. Available free for evaluation, educational use and for use on open not-for-profit projects. Requires Linux or Mac OS X.
PASS - It supports Illumina, SOLiD and Roche-FLX data formats and allows the user to modulate very finely the sensitivity of the alignments. Spaced seed initial filter, then NW dynamic algorithm to a SW-like local alignment. Authors are from CRIBI in Italy. Win/Linux.
RMAP - Assembles 20 - 64 bp Illumina reads to a FASTA reference genome. By Andrew D. Smith and Zhenyu Xuan at CSHL. (published in BMC Bioinformatics). POSIX OS required.
SeqMap - Supports up to 5 or more bp mismatches/INDELs. Highly tunable. Written by Hui Jiang from the Wong lab at Stanford. Builds available for most OS's.
SHRiMP - Assembles to a reference sequence. Developed with Applied Biosystem's colourspace genomic representation in mind. Authors are Michael Brudno and Stephen Rumble at the University of Toronto. POSIX.
Slider - An application for the Illumina Sequence Analyzer output that uses the probability files instead of the sequence files as an input for alignment to a reference sequence or a set of reference sequences. Authors are from BCGSC.
SOAP - SOAP (Short Oligonucleotide Alignment Program). A program for efficient gapped and ungapped alignment of short oligonucleotides onto reference sequences. The updated version uses a BWT. Can call SNPs and INDELs. Author is Ruiqiang Li at the Beijing Genomics Institute. C++, POSIX.
SSAHA - SSAHA (Sequence Search and Alignment by Hashing Algorithm) is a tool for rapidly finding near exact matches in DNA or protein databases using a hash table. Developed at the Sanger Centre by Zemin Ning, Anthony Cox and James Mullikin. C++ for Linux/Alpha.
SOCS - Aligns SOLiD data. SOCS is built on an iterative variation of the Rabin-Karp string search algorithm, which uses hashing to reduce the set of possible matches, drastically increasing search speed. Authors are Ondov B, Varadarajan A, Passalacqua KD and Bergman NH.
SWIFT - The SWIFT suite is a software collection for fast index-based sequence comparison. It contains: SWIFT, a fast local alignment search guaranteeing to find epsilon-matches between two sequences; and SWIFT BALSAM, a very fast program to find semiglobal non-gapped alignments based on k-mer seeds. Authors are Kim Rasmussen (SWIFT) and Wolfgang Gerlach (SWIFT BALSAM).
SXOligoSearch - SXOligoSearch is a commercial platform offered by the Malaysian based Synamatix. Will align Illumina reads against a range of Refseq RNA or NCBI genome builds for a number of organisms. Web Portal. OS independent.
Vmatch - A versatile software tool for efficiently solving large scale sequence matching tasks. Vmatch subsumes the software tool REPuter, but is much more general, with a very flexible user interface, and improved space and time requirements. Essentially a large string matching toolbox. POSIX.
Zoom - ZOOM (Zillions Of Oligos Mapped) is designed to map millions of short reads, produced by next-generation sequencing technology, back to the reference genomes, and carry out post-analysis. ZOOM is developed to be highly accurate, flexible, and user-friendly with speed being a critical priority. Commercial. Supports Illumina and SOLiD data.
NCGR uses GMAP (http://www.gene.com/share/gmap/) to align Solexa reads. GMAP is free, though.
Exonerate (http://www.ebi.ac.uk/~guy/exonerate/)
MUMmer (http://mummer.sourceforge.net/)
A short-read mapper called gnumap (http://dna.cs.byu.edu/gnumap/), designed to increase accuracy for reads with duplicate matches. Open source, creates viewable output (with Affy's Integrated Genome Browser), and produces results very similar to Novocraft's.
SOCS (short oligonucleotides in color space)
BFAST https://secure.genome.ucla.edu/index.php/BFAST
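Several aligners in the list above (Bowtie, BWA, and the updated SOAP) are described as using a Burrows-Wheeler transform (BWT). The sketch below shows the transform and its inverse on a short string. It is a naive, quadratic-time illustration based on sorting all rotations; production aligners are assumed to build the index from suffix arrays and to add auxiliary tables for searching, none of which is shown.

```python
def bwt(text: str, terminator: str = "$") -> str:
    """Burrows-Wheeler transform via sorted rotations (naive, for illustration)."""
    if terminator in text:
        raise ValueError("terminator must not occur in the text")
    text += terminator
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)  # last column


def inverse_bwt(transformed: str, terminator: str = "$") -> str:
    """Rebuild the original text by repeatedly prepending and sorting columns."""
    table = [""] * len(transformed)
    for _ in range(len(transformed)):
        table = sorted(transformed[i] + table[i] for i in range(len(transformed)))
    original = next(row for row in table if row.endswith(terminator))
    return original[:-1]  # drop the terminator


encoded = bwt("GATTACA")
print(encoded)
print(inverse_bwt(encoded))
```

The transform groups identical characters that share a following context, which is what makes the indexed genome both compressible and searchable.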

De novo Align/Assemble
ABySS - Assembly By Short Sequences. ABySS is a de novo sequence assembler that is designed for very short reads. The single-processor version is useful for assembling genomes up to 40-50 Mbases in size. The parallel version is implemented using MPI and is capable of assembling larger genomes. By Simpson JT and others at Canada's Michael Smith Genome Sciences Centre. C++ as source.
ALLPATHS - ALLPATHS: De novo assembly of whole-genome shotgun microreads. ALLPATHS is a whole genome shotgun assembler that can generate high quality assemblies from short reads. Assemblies are presented in a graph form that retains ambiguities, such as those arising from polymorphism, thereby providing information that has been absent from previous genome assemblies. Broad Institute.
Edena - Edena (Exact DE Novo Assembler) is an assembler dedicated to process the millions of very short reads produced by the Illumina Genome Analyzer. Edena is based on the traditional overlap layout paradigm. By D. Hernandez, P. François, L. Farinelli, M. Osteras, and J. Schrenzel. Linux/Win.
EULER-SR - Short read de novo assembly. By Mark J. Chaisson and Pavel A. Pevzner from UCSD (published in Genome Research). Uses a de Bruijn graph approach.
MIRA2 - MIRA (Mimicking Intelligent Read Assembly) is able to perform true hybrid de-novo assemblies using reads gathered through 454 sequencing technology (GS20 or GS FLX). Compatible with 454, Solexa and Sanger data. Linux OS required.
SEQAN - A Consistency-based Consensus Algorithm for De Novo and Reference-guided Sequence Assembly of Short Reads. By Tobias Rausch and others. C++, Linux/Win.
SHARCGS - De novo assembly of short reads. Authors are Dohm JC, Lottaz C, Borodina T and Himmelbauer H. from the Max-Planck-Institute for Molecular Genetics.
SSAKE - The Short Sequence Assembly by K-mer search and 3' read Extension (SSAKE) is a genomics application for aggressively assembling millions of short nucleotide sequences by progressively searching for perfect 3'-most k-mers using a DNA prefix tree. Authors are René Warren, Granger Sutton, Steven Jones and Robert Holt from Canada's Michael Smith Genome Sciences Centre. Perl/Linux.
SOAPdenovo - Part of the SOAP suite. See above.
VCAKE - De novo assembly of short reads with robust error correction. An improvement on early versions of SSAKE.
Velvet - Velvet is a de novo genomic assembler specially designed for short read sequencing technologies, such as Solexa or 454. Need about 20-25X coverage and paired reads. Developed by Daniel Zerbino and Ewan Birney at the European Bioinformatics Institute (EMBL-EBI).
SOAP (http://soap.genomics.org.cn) by Ruiqiang Li.
Euler-SR (Euler-Short Reads Assembly, http://euler-assembler.ucsd.edu/portal/) by Mark J. Chaisson and Pavel A. Pevzner from UCSD. (published in Genome Research)
RMAP (A program for mapping Solexa reads, http://rulai.cshl.edu/rmap/) by Andrew D. Smith and Zhenyu Xuan at CSHL. (published in BMC Bioinformatics)
Short read aligner called Bowtie (http://bowtie-bio.sourceforge.net/) designed for fast mapping of Illumina reads
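EULER-SR in the list above is described as using a de Bruijn graph approach. A minimal sketch of that data structure follows: every k-mer in the reads becomes an edge from its (k-1)-base prefix to its (k-1)-base suffix, and a contig is read off by following nodes that have exactly one successor. The reads, the value of k, and the function names are made up for illustration; real assemblers also handle sequencing errors, reverse complements, and repeats.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def de_bruijn_graph(reads: Iterable[str], k: int) -> Dict[str, Set[str]]:
    """Map each (k-1)-mer prefix to the set of (k-1)-mer suffixes that follow it."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


def walk_unambiguous_path(graph: Dict[str, Set[str]], start: str) -> str:
    """Extend a contig from `start` while each node has exactly one successor."""
    contig = start
    node = start
    visited = {start}
    while len(graph.get(node, ())) == 1:
        (next_node,) = graph[node]
        if next_node in visited:  # stop on cycles
            break
        contig += next_node[-1]
        visited.add(next_node)
        node = next_node
    return contig


reads = ["ATGGC", "TGGCA"]  # made-up overlapping reads
graph = de_bruijn_graph(reads, k=3)
print(graph)
print(walk_unambiguous_path(graph, "AT"))
```

Because overlapping reads contribute the same k-mers, the graph merges them automatically, without any pairwise read comparison.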

SNP/Indel Discovery
ssahaSNP - ssahaSNP is a polymorphism detection tool. It detects homozygous SNPs and indels by aligning shotgun reads to the finished genome sequence. Highly repetitive elements are filtered out by ignoring those kmer words with high occurrence numbers. More tuned for ABI Sanger reads. Developers are Adam Spargo and Zemin Ning from the Sanger Centre. Compaq Alpha, Linux-64, Linux-32, Solaris and Mac
PolyBayesShort - A re-incarnation of the PolyBayes SNP discovery tool developed by Gabor Marth at Washington University. This version is specifically optimized for the analysis of large numbers (millions) of high-throughput next-generation sequencer reads, aligned to whole chromosomes of model organism or mammalian genomes. Developers at Boston College. Linux-64 and Linux-32.
PyroBayes - PyroBayes is a novel base caller for pyrosequences from the 454 Life Sciences sequencing machines. It was designed to assign more accurate base quality estimates to the 454 pyrosequences. Developers at Boston College.
Maq is also able to find SNPs with its own alignment. It has a graphical viewer, but again for its own alignment format.
SSAHA has also been optimized for short reads; its SNP caller, ssahaSNP, is listed above.
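The tools above detect polymorphisms from reads aligned to a reference. The sketch below illustrates the basic idea with a pileup: count the bases observed at each reference position and report positions where a non-reference base dominates. The thresholds, the ungapped `(start, sequence)` read format, and the function name are assumptions for illustration only; real callers also weigh base and mapping qualities and handle indels.

```python
from collections import Counter, defaultdict
from typing import List, Tuple


def call_snps(
    reference: str,
    aligned_reads: List[Tuple[int, str]],
    min_depth: int = 3,
    min_fraction: float = 0.8,
) -> List[Tuple[int, str, str, int]]:
    """Return (position, reference base, called base, depth) for candidate SNPs."""
    pileup = defaultdict(Counter)
    for start, seq in aligned_reads:  # ungapped alignments, 0-based start
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1
    snps = []
    for pos in sorted(pileup):
        counts = pileup[pos]
        depth = sum(counts.values())
        base, n = counts.most_common(1)[0]
        if depth >= min_depth and base != reference[pos] and n / depth >= min_fraction:
            snps.append((pos, reference[pos], base, depth))
    return snps


reference = "ACGTACGTAC"                                  # made-up reference
reads = [(0, "ACGAAC"), (1, "CGAACG"), (2, "GAACGT")]     # made-up aligned reads
print(call_snps(reference, reads))
```

Requiring both a minimum depth and a minimum supporting fraction is a simple way to avoid calling sequencing errors as variants.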

Genome Annotation/Genome Browser/Alignment Viewer/Assembly Database
EagleView - An information-rich genome assembler viewer. EagleView can display a dozen different types of information including base quality and flowgram signal. Developers at Boston College.
LookSeq - LookSeq is a web-based application for alignment visualization, browsing and analysis of genome sequence data. LookSeq supports multiple sequencing technologies, alignment sources, and viewing modes; low or high-depth read pileups; and easy visualization of putative single nucleotide and structural variation. From the Sanger Centre.
MapView - MapView: visualization of short reads alignment on desktop computer. From the Evolutionary Genomics Lab at Sun Yat-sen University, China. Linux.
SAM - Sequence Assembly Manager. Whole Genome Assembly (WGA) Management and Visualization Tool. It provides a generic platform for manipulating, analyzing and viewing WGA data, regardless of input type. Developers are Rene Warren, Yaron Butterfield, Asim Siddiqui and Steven Jones at Canada's Michael Smith Genome Sciences Centre. MySQL backend and Perl-CGI web-based frontend/Linux.
STADEN - Includes GAP4. GAP5 once completed will handle next-gen sequencing data. A partially implemented test version is available.
XMatchView - A visual tool for analyzing cross_match alignments. Developed by Rene Warren and Steven Jones at Canada's Michael Smith Genome Sciences Centre. Python/Win or Linux.

Counting e.g. ChIP-Seq, Bis-Seq, CNV-Seq
BS-Seq - The source code and data for the "Shotgun Bisulphite Sequencing of the Arabidopsis Genome Reveals DNA Methylation Patterning" Nature paper by Cokus et al. (Steve Jacobsen's lab at UCLA). POSIX.
CHiPSeq - Program used by Johnson et al. (2007) in their Science publication
CNV-Seq - CNV-seq, a new method to detect copy number variation using high-throughput sequencing. Chao Xie and Martti T Tammi at the National University of Singapore. Perl/R.
FindPeaks - Performs analysis of ChIP-Seq experiments. It uses a naive algorithm for identifying regions of high coverage, which represent Chromatin Immunoprecipitation enrichment of sequence fragments, indicating the location of a bound protein of interest. Original algorithm by Matthew Bainbridge, in collaboration with Gordon Robertson. Current code and implementation by Anthony Fejes. Authors are from Canada's Michael Smith Genome Sciences Centre. JAVA/OS independent. Latest versions available as part of the Vancouver Short Read Analysis Package.
MACS - Model-based Analysis for ChIP-Seq. MACS empirically models the length of the sequenced ChIP fragments, which tends to be shorter than sonication or library construction size estimates, and uses it to improve the spatial resolution of predicted binding sites. MACS also uses a dynamic Poisson distribution to effectively capture local biases in the genome sequence, allowing for more sensitive and robust prediction. Written by Yong Zhang and Tao Liu from Xiaole Shirley Liu's Lab.
PeakSeq - Systematic Scoring of ChIP-Seq Experiments Relative to Controls. A two-pass approach for scoring ChIP-Seq data relative to controls. The first pass identifies putative binding sites and compensates for variation in the mappability of sequences across the genome. The second pass filters out sites that are not significantly enriched compared to the normalized input DNA and computes a precise enrichment and significance. By Rozowsky J et al. C/Perl.
QuEST - Quantitative Enrichment of Sequence Tags. Sidow and Myers Labs at Stanford. From the 2008 publication Genome-wide analysis of transcription factor binding sites based on ChIP-Seq data. (C++)
SISSRs - Site Identification from Short Sequence Reads. BED file input. Raja Jothi @ NIH. Perl.
SeqMap (http://biogibbs.stanford.edu/~jiangh/SeqMap/) - Works like ELAND; can handle 3 or more bp mismatches and also indels.
SISSRs for ChIP-Seq analysis: http://dir.nhlbi.nih.gov/papers/lmi/epigenomes/sissrs/
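
The "regions of high coverage" idea described for FindPeaks can be illustrated with a small sketch. This is not FindPeaks code: the function name call_peaks, the fixed fragment length used to extend each read, and the min_height threshold are all assumptions made for illustration.

```python
from typing import List, Tuple


def call_peaks(read_starts: List[int], fragment_len: int,
               genome_len: int, min_height: int) -> List[Tuple[int, int, int]]:
    """Return (start, end, max_height) for regions with coverage >= min_height.

    Each read start is extended to a fixed fragment length (an assumption),
    and `end` is exclusive.
    """
    # Difference array: +1 where a fragment starts, -1 just past its end.
    diff = [0] * (genome_len + 1)
    for s in read_starts:
        e = min(s + fragment_len, genome_len)
        diff[s] += 1
        diff[e] -= 1

    peaks = []
    depth = 0
    region_start = None
    region_max = 0
    for pos in range(genome_len):
        depth += diff[pos]  # running sum gives the coverage at pos
        if depth >= min_height:
            if region_start is None:
                region_start, region_max = pos, depth
            else:
                region_max = max(region_max, depth)
        elif region_start is not None:
            peaks.append((region_start, pos, region_max))
            region_start = None
    if region_start is not None:
        peaks.append((region_start, genome_len, region_max))
    return peaks
```

A real peak caller would also compare against a control sample and estimate significance, which this sketch leaves out.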

See also this thread for ChIP-Seq, until I get time to update this list.

Alternate Base Calling
Rolexa - R-based framework for base calling of Solexa data. Project publication
Alta-cyclic - "a novel Illumina Genome-Analyzer (Solexa) base caller"

Transcriptomics
ERANGE - Mapping and Quantifying Mammalian Transcriptomes by RNA-Seq. Supports Bowtie, BLAT and ELAND. From the Wold lab.
G-Mo.R-Se - G-Mo.R-Se is a method aimed at using RNA-Seq short reads to build de novo gene models. First, candidate exons are built directly from the positions of the reads mapped on the genome (without any ab initio assembly of the reads), and all the possible splice junctions between those exons are tested against unmapped reads. From CNS in France.
MapNext - MapNext: A software tool for spliced and unspliced alignments and SNP detection of short sequence reads. From the Evolutionary Genomics Lab at Sun-Yat Sen University, China.
QPalma - Optimal Spliced Alignments of Short Sequence Reads. Authors are Fabio De Bona, Stephan Ossowski, Korbinian Schneeberger, and Gunnar Rätsch. A paper is available.
RSAT - RNA-Seq Analysis Tools. RSAT is developed and maintained by Hui Jiang at Stanford University.
TopHat - TopHat is a fast splice junction mapper for RNA-Seq reads. It aligns RNA-Seq reads to mammalian-sized genomes using the ultra high-throughput short read aligner Bowtie, and then analyzes the mapping results to identify splice junctions between exons. TopHat is a collaborative effort between the University of Maryland and the University of California, Berkeley.
NGS-Trex: Next Generation Sequencing Transcriptome profile explorer http://www.biomedcentral.com/1471-2105/14/S7/S10
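
The first step described for G-Mo.R-Se, building candidate exons directly from the positions of mapped reads, amounts to merging overlapping read intervals into blocks. A minimal sketch, assuming reads are given as half-open (start, end) genome coordinates; the function name is hypothetical, and the depth filtering and splice-junction testing of the real method are not shown.

```python
from typing import List, Tuple


def merge_read_intervals(intervals: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Merge overlapping or touching half-open intervals (start, end)."""
    merged: List[Tuple[int, int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous block: extend it.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged
```

Each merged block is a candidate exon; junctions between blocks would then be tested against the unmapped reads.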

Reference

Illumina has a software list: http://www.illumina.com/pagesnrn.ilmn?ID=245.

Some software is also listed on the fejes.ca blog (http://www.fejes.ca/labels/DNA.html)

http://seqanswers.com/wiki/Software

Alignment-free sequence comparison tools available for next-generation sequencing data analysis

Abhimanyu Singh — Tue, 07 Nov 2017 05:33:33 -0600

kallisto

Transcript abundance quantification from RNA-seq data (uses pseudoalignment for rapid determination of read compatibility with targets)

Software (C++)

https://pachterlab.github.io/kallisto/
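
To make "read compatibility with targets" concrete, here is a toy sketch of the pseudoalignment idea: a read is compatible with the transcripts shared by all of its indexed k-mers. kallisto's actual index and data structures are more involved; the function names and the choice to skip unindexed k-mers are assumptions for illustration only.

```python
from collections import defaultdict
from typing import Dict, Set


def build_kmer_index(transcripts: Dict[str, str], k: int) -> Dict[str, Set[str]]:
    """Map every k-mer to the set of transcript names that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for name, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def compatible_transcripts(read: str, index: Dict[str, Set[str]], k: int) -> Set[str]:
    """Intersect the transcript sets of all indexed k-mers in the read."""
    compat = None
    for i in range(len(read) - k + 1):
        hits = index.get(read[i:i + k])
        if hits is None:
            continue  # toy choice: ignore k-mers absent from the index
        compat = set(hits) if compat is None else compat & hits
    return compat or set()
```

No base-level alignment is computed at any point, which is where the speed comes from.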

Sailfish

Estimation of isoform abundances from reference sequences and RNA-seq data (k-mer based)

Software (C++)

http://www.cs.cmu.edu/~ckingsf/software/sailfish/

Salmon

Quantification of the expression of transcripts using RNA-seq data (uses k-mers)

https://combine-lab.github.io/salmon/

RNA-Skim

RNA-seq quantification at transcript-level (partitions the transcriptome into disjoint transcript clusters; uses sig-mers, a special type of k-mers)

Software (C++)

http://www.csbio.unc.edu/rs/
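
A rough sketch of the cluster-specific k-mer idea, assuming a sig-mer can be read as a k-mer that occurs in exactly one transcript cluster. That reading, the function name, and the input layout are assumptions; this is not RNA-Skim's implementation.

```python
from collections import defaultdict
from typing import Dict, List, Set


def cluster_specific_kmers(clusters: Dict[str, List[str]], k: int) -> Dict[str, Set[str]]:
    """For each cluster, return the k-mers found in that cluster only."""
    owners: Dict[str, Set[str]] = defaultdict(set)
    for cluster_id, seqs in clusters.items():
        for seq in seqs:
            for i in range(len(seq) - k + 1):
                owners[seq[i:i + k]].add(cluster_id)

    specific: Dict[str, Set[str]] = {cluster_id: set() for cluster_id in clusters}
    for kmer, ids in owners.items():
        if len(ids) == 1:  # seen in a single cluster only
            specific[next(iter(ids))].add(kmer)
    return specific
```

Counting only such k-mers in the reads lets each cluster be quantified independently of the others.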

Variant calling

ChimeRScope

Fusion transcript prediction using gene k-mers profiles of the RNA-seq paired-end reads

Software (Java)

https://github.com/ChimeRScope/ChimeRScope/wiki

FastGT

Genotyping of known SNV/SNP variants directly from raw NGS sequence reads by counting unique k-mers

Software (C)

https://github.com/bioinfo-ut/GenomeTester4/
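
The principle of genotyping a known variant by counting allele-specific k-mers in raw reads can be sketched as follows. The simple presence threshold stands in for a proper statistical model and is an assumption, as are the function names; this is not FastGT's code.

```python
from collections import Counter
from typing import Iterable


def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Count every k-mer across all reads."""
    counts: Counter = Counter()
    for read in reads:
        counts.update(read[i:i + k] for i in range(len(read) - k + 1))
    return counts


def call_genotype(ref_kmer: str, alt_kmer: str, counts: Counter,
                  min_count: int = 1) -> str:
    """Call 0/0, 0/1, 1/1 or ./. from the counts of two allele-specific k-mers."""
    has_ref = counts[ref_kmer] >= min_count
    has_alt = counts[alt_kmer] >= min_count
    if has_ref and has_alt:
        return "0/1"
    if has_ref:
        return "0/0"
    if has_alt:
        return "1/1"
    return "./."
```

The k-mers must be unique to the variant site for the counts to be meaningful.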

Phy-Mer

Reference-independent mitochondrial haplogroup classifier from NGS data (k-mer based)

Software (Python)

https://github.com/danielnavarrogomez/phy-mer

LAVA

Genotyping of known SNPs (dbSNP and Affymetrix's Genome-Wide Human SNP Array) from raw NGS reads (k-mer based)

Software (C)

http://lava.csail.mit.edu/

MICADo

Detection of mutations in targeted third-generation NGS data (can distinguish patients’ specific mutations; algorithm uses k-mers and is based on colored de Bruijn graphs)

Software (Python)

http://github.com/cbib/MICADo

General mapper

Minimap

Lightweight and fast read mapper and read overlap detector (uses the concept of “minimizers”, a special type of k-mers)

Software (C)

https://github.com/lh3/minimap
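
For readers new to minimizers: in every window of w consecutive k-mers, only the smallest k-mer is kept, so two sequences that share a long enough stretch also share minimizers. A minimal sketch using plain lexicographic order; minimap itself orders k-mers by a hash and handles both strands, which is omitted here.

```python
from typing import List, Tuple


def minimizers(seq: str, k: int, w: int) -> List[Tuple[int, str]]:
    """Return sorted (position, k-mer) pairs chosen as window minimizers.

    For each window of w consecutive k-mers, the lexicographically smallest
    k-mer is selected (leftmost on ties).
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    selected = set()
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        # min() returns the first minimal element, i.e. the leftmost on ties.
        offset = min(range(w), key=lambda j: window[j])
        selected.add((start + offset, window[offset]))
    return sorted(selected)
```

Usage: minimizers("ACGTTGCA", 3, 3) keeps four of the six 3-mers, at positions 0, 1, 2 and 5.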

Assembly

De novo genome assembly

MHAP

Produces highly continuous assemblies (fully resolved chromosome arms) from long, noisy third-generation reads (10 kbp) using MinHash, a dimensionality-reduction technique

Software (Java)

https://github.com/marbl/MHAP
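
MinHash reduces each sequence's k-mer set to a small sketch of hash values, from which set similarity (Jaccard index) can be estimated without comparing the full sets. Below is a minimal bottom-s sketch using SHA-1 from the standard library; MHAP's own sketching scheme differs in its details, and all names here are hypothetical.

```python
import hashlib
from typing import List, Set


def kmer_set(seq: str, k: int) -> Set[str]:
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def _hash64(kmer: str) -> int:
    # Deterministic 64-bit hash (Python's built-in hash() is salted per run).
    return int.from_bytes(hashlib.sha1(kmer.encode()).digest()[:8], "big")


def bottom_sketch(kmers: Set[str], s: int) -> List[int]:
    """Keep the s smallest hash values of the k-mer set."""
    return sorted(_hash64(x) for x in kmers)[:s]


def jaccard_estimate(sketch_a: List[int], sketch_b: List[int], s: int) -> float:
    """Estimate Jaccard similarity from two bottom-s sketches."""
    union_bottom = sorted(set(sketch_a) | set(sketch_b))[:s]
    if not union_bottom:
        return 0.0
    shared = set(sketch_a) & set(sketch_b)
    return sum(1 for h in union_bottom if h in shared) / len(union_bottom)
```

When s is at least the size of the union, the estimate equals the exact Jaccard index; smaller s trades accuracy for memory and speed.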

Miniasm

Assembler of long noisy reads (SMRT, ONT) using the Overlap-Layout-Consensus (OLC) approach without the necessity of an error correction stage (uses minimap)

Software (C)

https://github.com/lh3/miniasm

LINKS

Scaffolding genome assembly with error-containing long sequence (e.g., ONT or PacBio reads, draft genomes)

Software (Perl)

https://github.com/warrenlr/LINKS/

Read clustering

afcluster

Clustering of reads from different genes and different species based on k-mer counts

Software (C++)

https://github.com/luscinius/afcluster

QCluster

Clustering of reads with alignment-free measures (k-mer based) and quality values

Software (C++)

http://www.dei.unipd.it/~ciompin/main/qcluster.html

Reads error correction

Lighter

Correction of sequencing errors in raw, whole genome sequencing reads (k-mer based)

Software (C++)

https://github.com/mourisl/Lighter

QuorUM

Error corrector for Illumina reads using k-mers

Software (C++)

https://github.com/gmarcais/Quorum

Trowel

Software (C++)

https://sourceforge.net/projects/trowel-ec/
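
The k-mer based correctors in this group build on a shared observation: in deep sequencing data, k-mers that contain an error are rare, while correct k-mers are seen many times. The sketch below only flags rare ("weak") k-mers in a read; it is a generic illustration with hypothetical names and an assumed threshold, not the algorithm of Lighter, QuorUM or Trowel.

```python
from collections import Counter
from typing import Iterable, List


def kmer_counts(reads: Iterable[str], k: int) -> Counter:
    """Count every k-mer across all reads."""
    counts: Counter = Counter()
    for read in reads:
        counts.update(read[i:i + k] for i in range(len(read) - k + 1))
    return counts


def weak_kmer_starts(read: str, counts: Counter, k: int, min_count: int) -> List[int]:
    """Start positions of k-mers in the read seen fewer than min_count times."""
    return [i for i in range(len(read) - k + 1)
            if counts[read[i:i + k]] < min_count]
```

A run of consecutive weak k-mers usually points at a single erroneous base covered by all of them, which a corrector would then try to replace.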

Metagenomics

Assembly-free phylogenomics

AAF

Phylogeny reconstruction directly from unassembled raw sequence data from whole genome sequencing projects; provides bootstrap support to assess uncertainty in the tree topology (k-mer based)

Software (Python)

https://github.com/fanhuan/AAF

kSNP v3

Reference-free SNP identification and estimation of phylogenetic trees using SNPs (based on k-mer analysis)

Software (C)

https://sourceforge.net/projects/ksnp/files/

NGS-MC

Phylogeny of species based on NGS reads using alignment-free sequence dissimilarity measures d2* and d2S under different Markov chain models (using k-words)

R package

http://www-rcf.usc.edu/~fsun/Programs/NGS-MC/NGS-MC.html

Species identification/taxonomic profiling

CLARK

Taxonomic classification of metagenomic reads to known bacterial genomes using k-mer search and LCA assignment

Software (C++)

http://clark.cs.ucr.edu/

FOCUS

Reports organisms present in metagenomic samples and profiles their abundances (uses composition-based approach and non-negative least squares for prediction)

Web service; Software (Python)

http://edwards.sdsu.edu/FOCUS/

GSM

Estimation of abundances of microbial genomes in metagenomic samples (k-mer based)

Software (Go)

https://github.com/pdtrang/GSM

Mash

Species identification using assembled or unassembled Illumina, PacBio, and ONT data (based on MinHash dimensionality-reduction technique)

Software (C++)

https://github.com/marbl/mash
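
Mash converts a Jaccard estimate j between two k-mer sketches into a distance using D = -(1/k) ln(2j / (1 + j)). A small helper showing that formula; the function name is hypothetical, and returning 1.0 when nothing is shared is the convention assumed here.

```python
import math


def mash_distance(jaccard: float, k: int) -> float:
    """Mash distance from a Jaccard estimate and the k-mer size."""
    if jaccard <= 0.0:
        return 1.0  # no shared hashes: treat as maximally distant
    return -math.log(2.0 * jaccard / (1.0 + jaccard)) / k
```

Identical sketches (j = 1) give a distance of 0, and the distance grows as fewer hashes are shared.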

Kraken

Taxonomic assignment in metagenome analysis by exact k-mer search; LCA assignment of short reads based on a comprehensive sequence database

Software (C++)

https://ccb.jhu.edu/software/kraken/
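
The combination of exact k-mer search and LCA (lowest common ancestor) assignment can be shown on a toy taxonomy. In this sketch each k-mer is stored with the LCA of the genomes containing it, and a read is labelled with the LCA of all its k-mer hits. Kraken's real classifier weighs hits along taxonomy paths instead of taking a plain LCA, so treat this as a simplified stand-in with hypothetical names.

```python
from typing import Dict, List, Optional


def lineage(taxon: str, parent: Dict[str, str]) -> List[str]:
    """Path from a taxon up to the root, following the parent map."""
    path = [taxon]
    while taxon in parent:
        taxon = parent[taxon]
        path.append(taxon)
    return path


def lca(a: str, b: str, parent: Dict[str, str]) -> str:
    ancestors = set(lineage(a, parent))
    for t in lineage(b, parent):
        if t in ancestors:
            return t
    raise ValueError("taxa share no ancestor")


def build_db(genomes: Dict[str, str], k: int, parent: Dict[str, str]) -> Dict[str, str]:
    """Map each k-mer to the LCA of all taxa whose genome contains it."""
    db: Dict[str, str] = {}
    for taxon, seq in genomes.items():
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            db[kmer] = lca(db[kmer], taxon, parent) if kmer in db else taxon
    return db


def classify(read: str, db: Dict[str, str], k: int,
             parent: Dict[str, str]) -> Optional[str]:
    """Label a read with the LCA of its k-mer hits; None if nothing matches."""
    label = None
    for i in range(len(read) - k + 1):
        taxon = db.get(read[i:i + k])
        if taxon is None:
            continue
        label = taxon if label is None else lca(label, taxon, parent)
    return label
```

Reads made only of species-specific k-mers get a species label; reads touching shared k-mers are pushed up to a higher rank.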

LMAT

Assignment of taxonomic labels to reads by k-mer searches in a precomputed database

Software (C++/Python)

https://sourceforge.net/projects/lmat/

stringMLST

k-mer-based tool for MLST directly from the genome sequencing reads

Software (Python)

http://jordan.biology.gatech.edu/page/software/stringMLST

Taxonomer

k-mer-based ultrafast metagenomics tool for assigning taxonomy to sequencing reads from clinical and environmental samples

Web service

http://taxonomer.iobio.io/

Other

d2-tools

Word-based (k-tuple) comparison (pairwise dissimilarity matrix using d2S measure) of metatranscriptomic samples from NGS reads

Software (Python/R)

https://code.google.com/p/d2-tools/

VirHostMatcher

Prediction of hosts from metagenomic viral sequences based on oligonucleotide frequency (ONF) using various distance measures (e.g., d2)

Software (C++)

https://github.com/jessieren/VirHostMatcher

MetaFast

Computes statistics for metagenome sequences and the distances between them, based on assembly using de Bruijn graphs and the Bray–Curtis dissimilarity measure

Software (Java)

https://github.com/ctlab/metafast
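
The Bray–Curtis dissimilarity between two count profiles is 1 - 2C / (A + B), where C sums the smaller count of each shared feature and A and B are the profile totals. A minimal sketch over dictionaries of counts; which features MetaFast actually counts is not shown here, and the function name is hypothetical.

```python
from typing import Dict


def bray_curtis(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Bray-Curtis dissimilarity between two feature-count profiles."""
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        return 0.0  # two empty profiles: treat as identical
    shared = sum(min(a.get(x, 0), b.get(x, 0)) for x in set(a) | set(b))
    return 1.0 - 2.0 * shared / total
```

The value is 0 for identical profiles and 1 for profiles with no feature in common.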

seqloc 0.6

Gudiya Pal — Sun, 28 Dec 2014 12:51:29 -0600

The Bio.SeqLoc modules in seqloc are designed to represent positions and locations (ranges of positions) on sequences, particularly nucleotide sequences. My original motivation for writing these packages was handling the locations of genes in eukaryotic genomes.

Handle sequence locations for bioinformatics http://www.ingolia-lab.org/seqloc-tutorial.html
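
To illustrate what a strand-aware sequence location involves, here is a small Python sketch. seqloc itself is a Haskell package and its API is not reproduced here; the class SimpleLoc and its fields are hypothetical.

```python
from dataclasses import dataclass

# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


@dataclass(frozen=True)
class SimpleLoc:
    """A contiguous location: 0-based offset, length, and strand."""
    offset: int
    length: int
    forward: bool = True

    @property
    def end(self) -> int:
        """Exclusive end coordinate on the forward strand."""
        return self.offset + self.length

    def extract(self, seq: str) -> str:
        """Return the located subsequence, reverse-complemented on the minus strand."""
        if self.offset < 0 or self.end > len(seq):
            raise ValueError("location falls outside the sequence")
        sub = seq[self.offset:self.end]
        return sub if self.forward else sub.translate(_COMPLEMENT)[::-1]
```

Genes in eukaryotic genomes usually need a list of such locations, one per exon, joined in transcript order.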

Address of the bookmark: http://www.stackage.org/snapshot/nightly-2014-12-28/package/seqloc-0.6