BOL: Related items

HIV genome database !

Rahul Nayak — Fri, 21 Jan 2022 05:40:15 -0600

HIV resources

https://www.hiv.lanl.gov/components/sequence/HIV/search/search.html

Address of the bookmark: https://www.hiv.lanl.gov/components/sequence/HIV/search/search.html

Human Complete Genome

Shruti Paniwala — Wed, 06 Jul 2022 06:42:55 -0500

Telomere-to-telomere consortium

We have sequenced the CHM13hTERT human cell line with a number of technologies. Human genomic DNA was extracted from the cultured cell line. As the DNA is native, modified bases will be preserved. The data includes 30x PacBio HiFi, 120x coverage of Oxford Nanopore, 70x PacBio CLR, 50x 10X Genomics, as well as BioNano DLS and Arima Genomics HiC. Most raw data is available from this site, with the exception of the PacBio data which was generated by the University of Washington/PacBio and is available from NCBI SRA.

A UCSC browser is available for v2.0 (as well as legacy v1.0 and v1.1 versions). An interactive dotplot visualization of all genomic repeats is also available from resgen.io. Known issues identified in the assembly are tracked at CHM13 issues.

MORE at https://github.com/marbl/CHM13

Address of the bookmark: https://www.science.org/doi/10.1126/science.abj6987

Genome Context Viewer (GCV)

LEGE — Sun, 21 May 2023 19:33:43 -0500

The Genome Context Viewer (GCV) is a web-app that visualizes genomic context data provided by third party services. Specifically, it uses functional annotations as a unit of search and comparison. By adopting a common set of annotations, data-store operators can deploy federated instances of GCV, allowing users to compare genomes from different providers in a single interface.

Address of the bookmark: https://github.com/legumeinfo/gcv

CGView.js is a Circular Genome Viewing tool

LEGE — Wed, 27 Mar 2024 11:16:24 -0500

CGView.js is a Circular Genome Viewing tool for visualizing and interacting with small genomes. This software is an adaptation of the Java program CGView.

CGView.js is the genome viewer of Proksee, an expert system for genome assembly, annotation and visualization.

Features

Circular and linear views of genomes
Capable of drawing genomes up to 10 Mbp with 1000's of features and 100's contigs
Smooth zooming down to the sequence level
Easily generate features and plots directly form the sequence (e.g. ORFs, GC-content and GC-Skew)
Save high resolution PNG maps up to 8000x8000px
Fully documented API for interacting with CGView.js maps

Address of the bookmark: https://js.cgview.ca/

The Role of lncRNA in Bioinformatics: Unlocking the Secrets of the Genome

LEGE — Sat, 07 Dec 2024 02:09:47 -0600

In the intricate dance of molecular biology, long non-coding RNAs (lncRNAs) have emerged as key players, capturing the interest of researchers worldwide. These RNA molecules, once dismissed as "junk," have proven to be vital in the regulation of gene expression, cellular processes, and the progression of diseases. The intersection of lncRNA studies and bioinformatics is transforming our understanding of these enigmatic molecules, offering profound insights into their structure, function, and therapeutic potential.

What Are lncRNAs?

lncRNAs are RNA transcripts longer than 200 nucleotides that do not code for proteins. Despite their non-coding nature, they play diverse roles in gene regulation, including chromatin remodeling, transcriptional control, and post-transcriptional processing. Unlike messenger RNAs (mRNAs), lncRNAs often function as scaffolds, decoys, or guides in cellular machinery, influencing biological processes such as cell differentiation, immune response, and even cancer metastasis.

Challenges in lncRNA Research

Identifying and understanding lncRNAs pose unique challenges:

High Sequence Variability: Unlike protein-coding genes, lncRNAs exhibit low sequence conservation across species, making functional predictions difficult.
Low Expression Levels: lncRNAs are often expressed at low levels, complicating their detection in transcriptomic data.
Diverse Functions: The multifunctional nature of lncRNAs requires advanced computational tools to decipher their roles in complex networks.

Bioinformatics: A Crucial Ally in lncRNA Research

Bioinformatics bridges the gap between raw biological data and meaningful insights, making it indispensable in lncRNA research. Here’s how:

1. Identification and Annotation

High-throughput sequencing technologies like RNA-seq generate vast amounts of data. Bioinformatics tools such as StringTie, Cufflinks, and HISAT2 help assemble and annotate lncRNAs from this data. Additionally, databases like NONCODE, LNCipedia, and Ensembl provide curated repositories of lncRNA sequences and annotations.

2. Functional Prediction

Bioinformatics algorithms predict the potential functions of lncRNAs by analyzing their interactions with DNA, RNA, and proteins. Tools like LncRNA2Function and RIblast utilize sequence motifs and secondary structure predictions to hypothesize about the roles of specific lncRNAs.

3. Network Construction

lncRNAs often act as regulatory hubs. Bioinformatics platforms such as Cytoscape enable the visualization of lncRNA-mediated networks, elucidating their roles in pathways like cell cycle regulation and apoptosis.

4. Epigenetic Studies

lncRNAs are known to interact with chromatin-modifying complexes, influencing gene expression epigenetically. Tools like ChIP-seq and ATAC-seq, combined with computational pipelines, identify these interactions and map them to the genome.

5. Clinical Applications

Bioinformatics aids in the discovery of lncRNA biomarkers for diseases like cancer and neurodegenerative disorders. Machine learning models analyze differential expression profiles, helping prioritize lncRNAs with therapeutic potential.

Case Study: lncRNAs in Cancer Research

lncRNAs such as HOTAIR and MALAT1 have been implicated in cancer progression. Bioinformatics analyses have revealed their roles in promoting metastasis and altering the tumor microenvironment. For example, transcriptome analysis in cancer patients identifies lncRNA expression signatures, enabling precision medicine approaches.

Future Directions

The fusion of bioinformatics with experimental biology is unlocking the secrets of lncRNAs. Advances in artificial intelligence, single-cell sequencing, and structural modeling promise to overcome current limitations. Here are some promising directions:

Integrative Analysis: Combining multi-omics data to understand the interplay of lncRNAs with other biomolecules.
CRISPR Screens: Leveraging bioinformatics to design CRISPR-based functional screens for lncRNAs.
Therapeutic Development: Using bioinformatics to design lncRNA-based therapeutics, including antisense oligonucleotides and RNA interference tools.

Conclusion

lncRNAs are the hidden gems of the genome, and bioinformatics is the key to unearthing their full potential. As research progresses, lncRNAs could pave the way for novel diagnostics, targeted therapies, and personalized medicine, revolutionizing our approach to complex diseases.

The journey into the world of lncRNAs is only beginning, and bioinformatics will continue to play a pivotal role in decoding these molecular mysteries. Whether you’re a researcher, clinician, or bioinformatics enthusiast, the study of lncRNAs offers a fascinating frontier of discovery.

Genome Simulation with SLiM and msprime

BioStar — Fri, 31 Jan 2025 12:47:43 -0600

Genome simulation is an essential tool in population genetics, enabling researchers to model evolutionary processes and study genetic variation. Two widely used simulation tools in this field are SLiM and msprime. While both serve different purposes, they can be used together with the slendr framework to compare simulation outputs effectively.

Overview of SLiM and msprime

SLiM: Forward Genetic Simulator

SLiM is a free, open-source tool designed for forward genetic simulations. It allows researchers to model complex evolutionary scenarios, including selection, recombination, and demographic events, making it particularly useful for studying adaptation and selection in populations.

Key Features of SLiM:

Simulates population evolution forward in time
Supports custom evolutionary models using an embedded scripting language
Allows modeling of spatial and ecological dynamics
Provides high flexibility and extensibility for user-defined scenarios
Available on GitHub as an open-source project

msprime: Ancestry and Mutation Simulator

msprime is an efficient, open-source tool that simulates ancestry and mutations using a coalescent framework. It is known for its high-speed performance and low memory requirements, making it a popular choice for large-scale genomic simulations.

Key Features of msprime:

Implements coalescent simulations for ancestry modeling
Efficiently simulates large population histories
Supports the addition of mutations to genealogies
Developed using an open-source community model
Often faster and more memory-efficient than alternative simulators

Using SLiM and msprime with slendr

Both SLiM and msprime can be integrated with slendr, a framework that facilitates structured population genetic simulations. This integration allows for seamless comparison of simulation outputs.

How They Work Together:

SLiM and msprime simulations can be analyzed within slendr.
The ts_read() function in slendr enables loading and comparing tree sequence outputs from both simulators.
This integration allows researchers to validate simulation results and gain deeper insights into evolutionary processes.

Performance Considerations

While SLiM offers powerful forward simulations with extensive customization, msprime is often preferred for its speed and memory efficiency when simulating ancestry and mutations. The choice between the two depends on the research goals:

For detailed evolutionary modeling with selection and recombination: Use SLiM.
For large-scale coalescent simulations with mutations: Use msprime.
For comparing different simulation models and their outputs: Use slendr to integrate SLiM and msprime results.

Conclusion

SLiM and msprime are valuable tools for genome simulation, each serving distinct but complementary purposes in population genetics research. By leveraging the strengths of both simulators with slendr, researchers can conduct robust and efficient evolutionary simulations, enhancing our understanding of genetic diversity and adaptation.

For more information, check out the official GitHub repositories for SLiM and msprime, and explore the slendr framework for streamlined simulation workflow

Molecular Genetics Lecture

Rahul Agarwal — Fri, 27 Sep 2013 04:24:45 -0500

"Robert Sapolsky makes interdisciplinary connections between behavioral biology and molecular genetic influences. He relates protein synthesis and point mutations to microevolutionary change, and discusses conflicting theories of gradualism and punctuated equilibrium and the influence of epigenetics on development theories."

"Robert Sapolsky is an American neuroendocrinologist, professor of biology, neuroscience, and neurosurgery at Stanford University, researcher and author" ----Wikipedia

Address of the bookmark: http://www.youtube.com/watch?v=_dRXA1_e30o

Postdoctoral Associate - Bioinformatics at Duke University Medical Center

Sat, 10 Aug 2013 18:38:38 -0500

The Department of Biostatistics and Bioinformatics at Duke University Medical Center is seeking a Postdoctoral Associate for a one year appointment to work on several high-dimensional research projects. The specific goals of the project are to identify genes or molecular markers that are predictive of clinical outcomes in renal and prostate cancer.

Candidates must have: a PhD degree in statistics, biostatistics or bioinformatics, extensive experience in analyzing high-dimensional data (microarray, SNP, CNVs) and of validation approaches. In addition, experience in penalized regression methods, data base manipulation; and strong programming skills in order to conduct Monte Carlo studies and applications (R). Candidate must have excellent communication skills (verbal, written and presentation), a strong proficiency in Linux system.

This position is available immediately and will be filled as soon as possible. Appointment could be extended beyond the first year based on additional funding.

For more information about the Department of Biostatistics and Bioinformatics, please visit our website: http://www.biostat.duke.edu.

For more info: http://biostat.duke.edu/sites/biostat.duke.edu/files/Halabi%20-%20Postdoc%20Job%20Posting%202013%20updated.pdf

Duke University is an Equal Opportunity/Affirmative Action Employer.

Commercial and public next-gen-seq (NGS) software

Surabhi Chaudhary — Tue, 03 Jun 2014 20:45:11 -0500

Integrated solutions
CLCbio Genomics Workbench - de novo and reference assembly of Sanger, Roche FLX, Illumina, Helicos, and SOLiD data. Commercial next-gen-seq software that extends the CLCbio Main Workbench software. Includes SNP detection, CHiP-seq, browser and other features. Commercial. Windows, Mac OS X and Linux.
Galaxy - Galaxy = interactive and reproducible genomics. A job webportal.
Genomatix - Integrated Solutions for Next Generation Sequencing data analysis.
JMP Genomics - Next gen visualization and statistics tool from SAS. They are working with NCGR to refine this tool and produce others.
NextGENe - de novo and reference assembly of Illumina, SOLiD and Roche FLX data. Uses a novel Condensation Assembly Tool approach where reads are joined via "anchors" into mini-contigs before assembly. Includes SNP detection, CHiP-seq, browser and other features. Commercial. Win or MacOS.
Partek - Commercial software for NGS, microarray, and qPCR data analysis. Streamlined analysis workflows for: ChIP-Seq, RNA-Seq, DNA-Seq, DNA Methylation, Gene Expression, Exon, miRNA Expression, Copy Number, Allele-Specific Copy Number, LOH, Association, Trio Analysis, and Tiling. Supports all commercial sequencing and microarray technologies.
SeqMan Genome Analyser - Software for Next Generation sequence assembly of Illumina, Roche FLX and Sanger data integrating with Lasergene Sequence Analysis software for additional analysis and visualization capabilities. Can use a hybrid templated/de novo approach. Commercial. Win or Mac OS X.
SHORE - SHORE, for Short Read, is a mapping and analysis pipeline for short DNA sequences produced on a Illumina Genome Analyzer. A suite created by the 1001 Genomes project. Source for POSIX.
SlimSearch - Fledgling commercial product.
Synamatix has SXOligoSearch (http://synasite.mgrc.com.my:8080/sxo...ligoSearch.php)
The SWIFT suit is a software collection for fast index-based sequence comparison. It contains the following programs: SWIFT — fast local alignment search, guaranteeing to find epsilon-matches between two sequences; SWIFT BALSAM — a very fast program to find semiglobal non-gapped alignments based on k-mer seeds. http://bibiserv.techfak.uni-bielefeld.de/swift/
biolib.is library and a set of script targeted to NGS. There are modules to: clean sequences (sanger, 454, ilumina), parse caf, ace and bowtie map files, clean and filter contigs, look for snps and indels., filter snps, do statistics for: reads, contigs and snps.

Align/Assemble to a reference
BFAST - Blat-like Fast Accurate Search Tool. Written by Nils Homer, Stanley F. Nelson and Barry Merriman at UCLA.
Bowtie - Ultrafast, memory-efficient short read aligner. It aligns short DNA sequences (reads) to the human genome at a rate of 25 million reads per hour on a typical workstation with 2 gigabytes of memory. Uses a Burrows-Wheeler-Transformed (BWT) index. Link to discussion thread here. Written by Ben Langmead and Cole Trapnell. Linux, Windows, and Mac OS X.
BWA - Heng Lee's BWT Alignment program - a progression from Maq. BWA is a fast light-weighted tool that aligns short sequences to a sequence database, such as the human reference genome. By default, BWA finds an alignment within edit distance 2 to the query sequence. C++ source.
ELAND - Efficient Large-Scale Alignment of Nucleotide Databases. Whole genome alignments to a reference genome. Written by Illumina author Anthony J. Cox for the Solexa 1G machine.
Exonerate - Various forms of pairwise alignment (including Smith-Waterman-Gotoh) of DNA/protein against a reference. Authors are Guy St C Slater and Ewan Birney from EMBL. C for POSIX.
GenomeMapper - GenomeMapper is a short read mapping tool designed for accurate read alignments. It quickly aligns millions of reads either with ungapped or gapped alignments. A tool created by the 1001 Genomes project. Source for POSIX.
GMAP - GMAP (Genomic Mapping and Alignment Program) for mRNA and EST Sequences. Developed by Thomas Wu and Colin Watanabe at Genentec. C/Perl for Unix.
gnumap - The Genomic Next-generation Universal MAPper (gnumap) is a program designed to accurately map sequence data obtained from next-generation sequencing machines (specifically that of Solexa/Illumina) back to a genome of any size. It seeks to align reads from nonunique repeats using statistics. From authors at Brigham Young University. C source/Unix.
MAQ - Mapping and Assembly with Qualities (renamed from MAPASS2). Particularly designed for Illumina with preliminary functions to handle ABI SOLiD data. Written by Heng Li from the Sanger Centre. Features extensive supporting tools for DIP/SNP detection, etc. C++ source
MOSAIK - MOSAIK produces gapped alignments using the Smith-Waterman algorithm. Features a number of support tools. Support for Roche FLX, Illumina, SOLiD, and Helicos. Written by Michael Strömberg at Boston College. Win/Linux/MacOSX
MrFAST and MrsFAST - mrFAST & mrsFAST are designed to map short reads generated with the Illumina platform to reference genome assemblies; in a fast and memory-efficient manner. Robust to INDELs and MrsFAST has a bisulphite mode. Authors are from the University of Washington. C as source.
MUMmer - MUMmer is a modular system for the rapid whole genome alignment of finished or draft sequence. Released as a package providing an efficient suffix tree library, seed-and-extend alignment, SNP detection, repeat detection, and visualization tools. Version 3.0 was developed by Stefan Kurtz, Adam Phillippy, Arthur L Delcher, Michael Smoot, Martin Shumway, Corina Antonescu and Steven L Salzberg - most of whom are at The Institute for Genomic Research in Maryland, USA. POSIX OS required.
Novocraft - Tools for reference alignment of paired-end and single-end Illumina reads. Uses a Needleman-Wunsch algorithm. Can support Bis-Seq. Commercial. Available free for evaluation, educational use and for use on open not-for-profit projects. Requires Linux or Mac OS X.
PASS - It supports Illumina, SOLiD and Roche-FLX data formats and allows the user to modulate very finely the sensitivity of the alignments. Spaced seed intial filter, then NW dynamic algorithm to a SW(like) local alignment. Authors are from CRIBI in Italy. Win/Linux.
RMAP - Assembles 20 - 64 bp Illumina reads to a FASTA reference genome. By Andrew D. Smith and Zhenyu Xuan at CSHL. (published in BMC Bioinformatics). POSIX OS required.
SeqMap - Supports up to 5 or more bp mismatches/INDELs. Highly tunable. Written by Hui Jiang from the Wong lab at Stanford. Builds available for most OS's.
SHRiMP - Assembles to a reference sequence. Developed with Applied Biosystem's colourspace genomic representation in mind. Authors are Michael Brudno and Stephen Rumble at the University of Toronto. POSIX.
Slider- An application for the Illumina Sequence Analyzer output that uses the probability files instead of the sequence files as an input for alignment to a reference sequence or a set of reference sequences. Authors are from BCGSC. Paper is here.
SOAP - SOAP (Short Oligonucleotide Alignment Program). A program for efficient gapped and ungapped alignment of short oligonucleotides onto reference sequences. The updated version uses a BWT. Can call SNPs and INDELs. Author is Ruiqiang Li at the Beijing Genomics Institute. C++, POSIX.
SSAHA - SSAHA (Sequence Search and Alignment by Hashing Algorithm) is a tool for rapidly finding near exact matches in DNA or protein databases using a hash table. Developed at the Sanger Centre by Zemin Ning, Anthony Cox and James Mullikin. C++ for Linux/Alpha.
SOCS - Aligns SOLiD data. SOCS is built on an iterative variation of the Rabin-Karp string search algorithm, which uses hashing to reduce the set of possible matches, drastically increasing search speed. Authors are Ondov B, Varadarajan A, Passalacqua KD and Bergman NH.
SWIFT - The SWIFT suit is a software collection for fast index-based sequence comparison. It contains: SWIFT — fast local alignment search, guaranteeing to find epsilon-matches between two sequences. SWIFT BALSAM — a very fast program to find semiglobal non-gapped alignments based on k-mer seeds. Authors are Kim Rasmussen (SWIFT) and Wolfgang Gerlach (SWIFT BALSAM)
SXOligoSearch - SXOligoSearch is a commercial platform offered by the Malaysian based Synamatix. Will align Illumina reads against a range of Refseq RNA or NCBI genome builds for a number of organisms. Web Portal. OS independent.
Vmatch - A versatile software tool for efficiently solving large scale sequence matching tasks. Vmatch subsumes the software tool REPuter, but is much more general, with a very flexible user interface, and improved space and time requirements. Essentially a large string matching toolbox. POSIX.
Zoom - ZOOM (Zillions Of Oligos Mapped) is designed to map millions of short reads, emerged by next-generation sequencing technology, back to the reference genomes, and carry out post-analysis. ZOOM is developed to be highly accurate, flexible, and user-friendly with speed being a critical priority. Commercial. Supports Illumina and SOLiD data.
NCGR uses GMAP (http://www.gene.com/share/gmap/) to alignment Solexa reads. GMAP is free, though.
Exonerate (http://www.ebi.ac.uk/~guy/exonerate/)
MUMmer (http://mummer.sourceforge.net/)
The mapping short reads called gnumap (http://dna.cs.byu.edu/gnumap/) made to increase the accuracy with duplicate matches. Open source, creates viewable output (with Affy's Integrated Genome Browser), and produces results very similar to novocraft's.
SOCS (short oligonucleotides in color space)
BFAST https://secure.genome.ucla.edu/index.php/BFAST

De novo Align/Assemble
ABySS - Assembly By Short Sequences. ABySS is a de novo sequence assembler that is designed for very short reads. The single-processor version is useful for assembling genomes up to 40-50 Mbases in size. The parallel version is implemented using MPI and is capable of assembling larger genomes. By Simpson JT and others at the Canada's Michael Smith Genome Sciences Centre. C++ as source.
ALLPATHS - ALLPATHS: De novo assembly of whole-genome shotgun microreads. ALLPATHS is a whole genome shotgun assembler that can generate high quality assemblies from short reads. Assemblies are presented in a graph form that retains ambiguities, such as those arising from polymorphism, thereby providing information that has been absent from previous genome assemblies. Broad Institute.
Edena - Edena (Exact DE Novo Assembler) is an assembler dedicated to process the millions of very short reads produced by the Illumina Genome Analyzer. Edena is based on the traditional overlap layout paradigm. By D. Hernandez, P. François, L. Farinelli, M. Osteras, and J. Schrenzel. Linux/Win.
EULER-SR - Short read de novo assembly. By Mark J. Chaisson and Pavel A. Pevzner from UCSD (published in Genome Research). Uses a de Bruijn graph approach.
MIRA2 - MIRA (Mimicking Intelligent Read Assembly) is able to perform true hybrid de-novo assemblies using reads gathered through 454 sequencing technology (GS20 or GS FLX). Compatible with 454, Solexa and Sanger data. Linux OS required.
SEQAN - A Consistency-based Consensus Algorithm for De Novo and Reference-guided Sequence Assembly of Short Reads. By Tobias Rausch and others. C++, Linux/Win.
SHARCGS - De novo assembly of short reads. Authors are Dohm JC, Lottaz C, Borodina T and Himmelbauer H. from the Max-Planck-Institute for Molecular Genetics.
SSAKE - The Short Sequence Assembly by K-mer search and 3' read Extension (SSAKE) is a genomics application for aggressively assembling millions of short nucleotide sequences by progressively searching for perfect 3'-most k-mers using a DNA prefix tree. Authors are René Warren, Granger Sutton, Steven Jones and Robert Holt from the Canada's Michael Smith Genome Sciences Centre. Perl/Linux.
SOAPdenovo - Part of the SOAP suite. See above.
VCAKE - De novo assembly of short reads with robust error correction. An improvement on early versions of SSAKE.
Velvet - Velvet is a de novo genomic assembler specially designed for short read sequencing technologies, such as Solexa or 454. Need about 20-25X coverage and paired reads. Developed by Daniel Zerbino and Ewan Birney at the European Bioinformatics Institute (EMBL-EBI).
SOAP (http://soap.genomics.org.cn) by Ruiqiang Li, as has been pointed by ECO.
Euler-SR (Euler-Short Reads Assembly, http://euler-assembler.ucsd.edu/portal/) by Mark J. Chaisson and Pavel A. Pevzner from UCSD. (published in Genome Research)
RMAP (A program for mapping Solexa reads, http://rulai.cshl.edu/rmap/) by Andrew D. Smith and Zhenyu Xuan at CSHL. (published in BMC Bioinformatics)
Short read aligner called Bowtie (http://bowtie-bio.sourceforge.net/) designed for fast mapping of Illumina reads

SNP/Indel Discovery
ssahaSNP - ssahaSNP is a polymorphism detection tool. It detects homozygous SNPs and indels by aligning shotgun reads to the finished genome sequence. Highly repetitive elements are filtered out by ignoring those kmer words with high occurrence numbers. More tuned for ABI Sanger reads. Developers are Adam Spargo and Zemin Ning from the Sanger Centre. Compaq Alpha, Linux-64, Linux-32, Solaris and Mac
PolyBayesShort - A re-incarnation of the PolyBayes SNP discovery tool developed by Gabor Marth at Washington University. This version is specifically optimized for the analysis of large numbers (millions) of high-throughput next-generation sequencer reads, aligned to whole chromosomes of model organism or mammalian genomes. Developers at Boston College. Linux-64 and Linux-32.
PyroBayes - PyroBayes is a novel base caller for pyrosequences from the 454 Life Sciences sequencing machines. It was designed to assign more accurate base quality estimates to the 454 pyrosequences. Developers at Boston College.
Maq is also able to find SNPs with its own alignment. It has a graphical viewer, but again for its own alignment format.
SSAHA has been optimized for short-reads, too. But yes, SSAHASNP appears in your "SNP/INDEL discovery" category.

Genome Annotation/Genome Browser/Alignment Viewer/Assembly Database
EagleView - An information-rich genome assembler viewer. EagleView can display a dozen different types of information including base quality and flowgram signal. Developers at Boston College.
LookSeq - LookSeq is a web-based application for alignment visualization, browsing and analysis of genome sequence data. LookSeq supports multiple sequencing technologies, alignment sources, and viewing modes; low or high-depth read pileups; and easy visualization of putative single nucleotide and structural variation. From the Sanger Centre.
MapView - MapView: visualization of short reads alignment on desktop computer. From the Evolutionary Genomics Lab at Sun-Yat Sen University, China. Linux.
SAM - Sequence Assembly Manager. Whole Genome Assembly (WGA) Management and Visualization Tool. It provides a generic platform for manipulating, analyzing and viewing WGA data, regardless of input type. Developers are Rene Warren, Yaron Butterfield, Asim Siddiqui and Steven Jones at Canada's Michael Smith Genome Sciences Centre. MySQL backend and Perl-CGI web-based frontend/Linux.
STADEN - Includes GAP4. GAP5 once completed will handle next-gen sequencing data. A partially implemented test version is available here
XMatchView - A visual tool for analyzing cross_match alignments. Developed by Rene Warren and Steven Jones at Canada's Michael Smith Genome Sciences Centre. Python/Win or Linux.

Counting e.g. CHiP-Seq, Bis-Seq, CNV-Seq
BS-Seq - The source code and data for the "Shotgun Bisulphite Sequencing of the Arabidopsis Genome Reveals DNA Methylation Patterning" Nature paper by Cokus et al. (Steve Jacobsen's lab at UCLA). POSIX.
CHiPSeq - Program used by Johnson et al. (2007) in their Science publication
CNV-Seq - CNV-seq, a new method to detect copy number variation using high-throughput sequencing. Chao Xie and Martti T Tammi at the National University of Singapore. Perl/R.
FindPeaks - perform analysis of ChIP-Seq experiments. It uses a naive algorithm for identifying regions of high coverage, which represent Chromatin Immunoprecipitation enrichment of sequence fragments, indicating the location of a bound protein of interest. Original algorithm by Matthew Bainbridge, in collaboration with Gordon Robertson. Current code and implementation by Anthony Fejes. Authors are from the Canada's Michael Smith Genome Sciences Centre. JAVA/OS independent. Latest versions available as part of the Vancouver Short Read Analysis Package
MACS - Model-based Analysis for ChIP-Seq. MACS empirically models the length of the sequenced ChIP fragments, which tends to be shorter than sonication or library construction size estimates, and uses it to improve the spatial resolution of predicted binding sites. MACS also uses a dynamic Poisson distribution to effectively capture local biases in the genome sequence, allowing for more sensitive and robust prediction. Written by Yong Zhang and Tao Liu from Xiaole Shirley Liu's Lab.
PeakSeq - PeakSeq: Systematic Scoring of ChIP-Seq Experiments Relative to Controls. a two-pass approach for scoring ChIP-Seq data relative to controls. The first pass identifies putative binding sites and compensates for variation in the mappability of sequences across the genome. The second pass filters out sites that are not significantly enriched compared to the normalized input DNA and computes a precise enrichment and significance. By Rozowsky J et al. C/Perl.
QuEST - Quantitative Enrichment of Sequence Tags. Sidow and Myers Labs at Stanford. From the 2008 publication Genome-wide analysis of transcription factor binding sites based on ChIP-Seq data. (C++)
SISSRs - Site Identification from Short Sequence Reads. BED file input. Raja Jothi @ NIH. Perl.
SeqMap (http://biogibbs.stanford.edu/~jiangh/SeqMap/) - work like ELand, can do 3 or more bp mismatches and also insdel
ChIPSeq analysis is: http://dir.nhlbi.nih.gov/papers/lmi/epigenomes/sissrs/

See also this thread for ChIP-Seq, until I get time to update this list.

Alternate Base Calling
Rolexa - R-based framework for base calling of Solexa data. Project publication
Alta-cyclic - "a novel Illumina Genome-Analyzer (Solexa) base caller"

Transcriptomics
ERANGE - Mapping and Quantifying Mammalian Transcriptomes by RNA-Seq. Supports Bowtie, BLAT and ELAND. From the Wold lab.
G-Mo.R-Se - G-Mo.R-Se is a method aimed at using RNA-Seq short reads to build de novo gene models. First, candidate exons are built directly from the positions of the reads mapped on the genome (without any ab initio assembly of the reads), and all the possible splice junctions between those exons are tested against unmapped reads. From CNS in France.
MapNext - MapNext: A software tool for spliced and unspliced alignments and SNP detection of short sequence reads. From the Evolutionary Genomics Lab at Sun-Yat Sen University, China.
QPalma - Optimal Spliced Alignments of Short Sequence Reads. Authors are Fabio De Bona, Stephan Ossowski, Korbinian Schneeberger, and Gunnar Rätsch. A paper is available.
RSAT - RSAT: RNA-Seq Analysis Tools. RNASAT is developed and maintained by Hui Jiang at Stanford University.
TopHat - TopHat is a fast splice junction mapper for RNA-Seq reads. It aligns RNA-Seq reads to mammalian-sized genomes using the ultra high-throughput short read aligner Bowtie, and then analyzes the mapping results to identify splice junctions between exons. TopHat is a collaborative effort between the University of Maryland and the University of California, Berkeley
NGS-Trex: Next Generation Sequencing Transcriptome profile explorer http://www.biomedcentral.com/1471-2105/14/S7/S10

Reference

Illumina has a software list: http://www.illumina.com/pagesnrn.ilmn?ID=245.

Some softwares in his blog (http://www.fejes.ca/labels/DNA.html)

http://seqanswers.com/wiki/Software

Variant Calling Pipeline

LEGE — Sat, 19 Oct 2024 12:23:40 -0500

The variantcalling.nf nextflow script will take any number of samples with paired-end reads in FASTQ format, map reads using Bowtie2, process BAM files, and finally call variants using BCFtools v1.21 and/or Freebayes v1.3.6. If part of the pipeline is unsuccessful for a sample then these errors are ignored.

Pipeline flowchart:

Dependencies (version tested)

Nextflow (24.04.4)
Java (18.0.2.1)
Python (3.10)
Perl (5.32.1)
Bowtie2 (2.5.3)
SAMtools (1.19.2)
GATK4 (4.5)
BCFtools (1.21)
Freebayes (1.3.6)

Address of the bookmark: https://github.com/Tom-Jenkins/nextflow-pipelines/blob/main/docs/variant-calling.md