BOL: Related items

GA4GH Data Working Group

Jitendra Prajapati — Sun, 20 Mar 2016 23:13:07 -0500

Led by David Haussler (UCSC) and Richard Durbin (Sanger Institute), the Data Working Group (DWG) of the Global Alliance for Genomics and Health (GA4GH) brings together leading genome institutes and centers with IT industry leaders to create global standards and tools for the secure, privacy-respecting, and interoperable sharing of genomic data.

More at http://ga4gh.org/#/

Address of the bookmark: http://ga4gh.org/#/

Platypus: A Haplotype-Based Variant Caller For Next Generation Sequence Data

Shruti Paniwala — Thu, 25 Oct 2018 06:14:55 -0500

Platypus is a tool designed for efficient and accurate variant detection in high-throughput sequencing data. By using local realignment of reads and local assembly, it achieves both high sensitivity and high specificity. Platypus can detect SNPs, MNPs, short indels, replacements and (using the assembly option) deletions up to several kb. It has been extensively tested on whole-genome, exon-capture, and targeted-capture data. It has been run on very large datasets as part of the Thousand Genomes and WGS500 projects, and is being used in clinical sequencing trials in the Mainstreaming Cancer Genetics programme.

Tutorial https://github.com/andyrimmer/Platypus/blob/master/misc/README.txt

Address of the bookmark: http://www.well.ox.ac.uk/platypus

Comparative Genomics Data Set Including 240 Mammals Released!

Jit — Thu, 19 Nov 2020 06:45:39 -0600

The genomes of 130 mammals were sequenced by a large international consortium, and the data were analyzed together with 110 existing genomes to allow scientists to identify the important positions in the DNA. This report, published in Nature today, will help advance research on human disease mutations and inform how best to protect endangered species.

Alongside the human genome, these genomes, sampled widely across mammals, can be used to study how particular species have adapted to different conditions. Some otters, for example, have a thick, water-resistant coat, and some rodents, but not all, have adapted to hibernation. Understanding these animal traits can help us understand human traits and conditions, such as metabolic diseases.

With climate change and more animal ecosystems being threatened by human activity, the protection of endangered species is becoming increasingly important. Scientists have historically studied several individuals from different populations of a species to understand the genetic variation within that species. This is important for understanding how particular species can be protected. In this study, animals on the International Union for Conservation of Nature (IUCN) Red List of Threatened Species had less variation in their genomes, which is consistent with their endangered status.

Reference: A comparative genomics multitool for scientific discovery and conservation, https://www.nature.com/articles/s41586-020-2876-6

Data at http://zoonomiaproject.org/

AMR Database!

LEGE — Tue, 04 Jun 2024 13:37:21 -0500

ARG-ANNOT. PMID: 24145532
CARD. PMID: 23650175
MEGARes. PMID: 27899569
NCBI. BioProject: PRJNA313047
PlasmidFinder. PMID: 24777092
ResFinder. PMID: 22782487
VFDB. PMID: 26578559
SRST2's version of ARG-ANNOT. PMID: 25422674
VirulenceFinder. PMID: 24574290
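
These references are the ones that ARIBA's getref task can download. As a rough sketch, the Python below only builds the two command lines (download, then prepare) for a chosen database and prints them. The short database names and the output prefix "out.card" are assumptions recalled from the ARIBA wiki, so check them against `ariba getref --help` before use. The helper name getref_commands is made up for this example.

```python
# Short names that `ariba getref` is assumed to accept (verify with
# `ariba getref --help`; this set is recalled from the wiki, not authoritative).
GETREF_NAMES = {
    "argannot", "card", "megares", "ncbi", "plasmidfinder",
    "resfinder", "srst2_argannot", "vfdb_core", "vfdb_full", "virulencefinder",
}


def getref_commands(db, prefix):
    """Build the download and prepare commands for one reference database.

    Hypothetical helper: it returns argument lists and runs nothing.
    """
    if db not in GETREF_NAMES:
        raise ValueError(f"unknown reference name: {db}")
    return [
        # Step 1: download the reference into <prefix>.fa and <prefix>.tsv
        ["ariba", "getref", db, prefix],
        # Step 2: turn the downloaded files into a prepared reference directory
        ["ariba", "prepareref", "-f", f"{prefix}.fa", "-m", f"{prefix}.tsv",
         f"{prefix}.prepareref"],
    ]


for cmd in getref_commands("card", "out.card"):
    print(" ".join(cmd))
```

To actually run the commands, each list could be passed to subprocess.run(cmd, check=True) on a machine where ARIBA is installed.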

Address of the bookmark: https://github.com/sanger-pathogens/ariba/wiki/Task%3A-getref

What is Data Science? — A Bioinformatics Perspective

Abhi — Mon, 16 Jun 2025 01:44:34 -0500

In today’s era of big biology, we’re generating more data than ever before—genomes, transcriptomes, proteomes, metabolomes, microbiomes… you name it. But raw biological data doesn’t speak for itself. Making sense of it requires more than traditional biology. This is where data science steps in.

So, What Is Data Science?
At its core, data science is the interdisciplinary field that extracts knowledge and insights from data using programming, statistics, and domain expertise. In bioinformatics, data science enables us to turn gigabytes of sequence data into biological meaning.

Imagine trying to understand gene regulation in cancer by analyzing thousands of RNA-seq samples, or predicting antibiotic resistance from bacterial genomes—these challenges are not solvable through wet lab experiments alone. They require data-driven thinking.

Data Science Meets Bioinformatics
Bioinformatics is inherently a data science domain. From genomics to systems biology, every field in modern biology relies on data science techniques to:

- Clean and process massive datasets
- Discover patterns in high-dimensional data
- Build predictive models (e.g., for disease classification)
- Visualize complex biological networks and trends
- Integrate diverse data types (e.g., transcriptomic + epigenomic data)
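
To make the "clean and process" step concrete, here is a minimal sketch in plain Python that scales raw read counts to counts per million (CPM) and log-transforms them, a common first step before looking for patterns. The gene names and counts are made up, and the function names are only illustrative. Real analyses would normally use packages such as those named later in this post.

```python
import math


def counts_per_million(counts):
    """Scale raw read counts for one sample to counts per million (CPM)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {gene: n * 1_000_000 / total for gene, n in counts.items()}


def log2_cpm(counts, pseudocount=1.0):
    """Return log2(CPM + pseudocount) per gene; the pseudocount avoids log2(0)."""
    cpm = counts_per_million(counts)
    return {gene: math.log2(value + pseudocount) for gene, value in cpm.items()}


# Hypothetical raw counts for a single sample (made-up numbers).
sample = {"geneA": 500, "geneB": 1500, "geneC": 0, "geneD": 8000}

for gene, value in log2_cpm(sample).items():
    print(f"{gene}\t{value:.2f}")
```

Scaling by library size makes samples sequenced to different depths comparable, and the log transform keeps a few highly expressed genes from dominating downstream clustering or plots.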

The Bioinformatics Toolkit
Here’s what data science typically looks like in bioinformatics:

Task | Data Science Role
Sequence alignment | Efficient algorithms, indexing, parallel processing
Gene expression analysis | Statistical modeling (e.g., DESeq2, limma)
Variant calling | Data filtering, probabilistic models
Clustering of cells in single-cell data | Unsupervised learning
Protein structure prediction | Deep learning models (e.g., AlphaFold)
Metagenomics | Data integration, classification, dimensionality reduction
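
As one illustration of the "probabilistic models" entry for variant calling, the sketch below scores the three diploid genotypes at a single site with a simple binomial model. It is a teaching example under stated assumptions (a fixed per-read error rate, a flat prior over genotypes, independent reads) and not the model used by any particular variant caller. The function names are made up.

```python
import math


def genotype_likelihoods(ref_reads, alt_reads, error_rate=0.01):
    """Binomial likelihood of the observed read counts under each genotype."""
    n = ref_reads + alt_reads
    # Probability that one read shows the alternate allele, per genotype.
    p_alt = {"0/0": error_rate, "0/1": 0.5, "1/1": 1.0 - error_rate}
    coeff = math.comb(n, alt_reads)
    return {
        gt: coeff * p ** alt_reads * (1.0 - p) ** ref_reads
        for gt, p in p_alt.items()
    }


def call_genotype(ref_reads, alt_reads, error_rate=0.01):
    """Return the most probable genotype and its posterior (flat prior assumed)."""
    likelihoods = genotype_likelihoods(ref_reads, alt_reads, error_rate)
    total = sum(likelihoods.values())
    posteriors = {gt: value / total for gt, value in likelihoods.items()}
    best = max(posteriors, key=posteriors.get)
    return best, posteriors[best]


# Hypothetical read counts at three sites: (reference reads, alternate reads).
for ref, alt in [(20, 0), (10, 9), (0, 20)]:
    genotype, posterior = call_genotype(ref, alt)
    print(ref, alt, genotype, round(posterior, 4))
```

Production callers add much more (base and mapping qualities, population priors, local realignment), but the core idea of comparing how well each genotype explains the reads is the same.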

Common tools include Python, R, Bioconductor, scikit-learn, Pandas, Seurat, and TensorFlow—often working together in reproducible workflows.

It's Not Just About Coding
A common misconception is that bioinformatics is just programming or scripting. But being a data scientist in bioinformatics also means:

- Understanding experimental design
- Asking biologically meaningful questions
- Choosing the right statistical or machine learning models
- Communicating findings effectively (e.g., plots, dashboards, papers)

In other words, data science in bioinformatics is where biology, statistics, and computer science converge.

Why It Matters
The real power of data science in bioinformatics is its ability to scale discovery.

- Instead of studying one gene, we can study thousands.
- Instead of analyzing one species, we can explore entire ecosystems.
- Instead of waiting months for lab results, we can generate hypotheses in days.

From personalized medicine and cancer diagnostics to agricultural genomics and pandemic surveillance, data science is at the heart of the bioinformatics revolution.

Final Thoughts
If you’re a biologist who’s curious about code, or a data enthusiast fascinated by life sciences, bioinformatics is your playground—and data science is your toolkit.

In bioinformatics, data science isn’t just useful. It’s essential.

S-plot2: Rapid Visual and Statistical Analysis of Genomic Sequences

Abhimanyu Singh — Tue, 02 Oct 2018 17:57:27 -0500

S-plot2 creates an interactive, two-dimensional heatmap capturing the similarities and dissimilarities in nucleotide usage between genomic sequences (partial or complete). In S-plot2, whole eukaryotic chromosomes and smaller prokaryotic genomes can be efficiently compared. The tool includes functionality to extract, analyze, and automate BLAST queries of regions of interest within the heatmap. This facilitates the investigation of quickly evolving coding regions, novel coding regions, and laterally transferred elements.
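
The idea of comparing nucleotide usage between sequence regions can be sketched in a few lines of Python. This is not S-plot2's implementation. It assumes fixed-size windows, dinucleotide (k=2) frequency profiles, and a simple similarity score (1 minus half the L1 distance), all chosen here for illustration. The resulting matrix is the kind of grid a heatmap would display.

```python
from collections import Counter
from itertools import product


def kmer_profile(seq, k=2):
    """Relative frequency of each k-mer over ACGT (other k-mers are ignored)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[km] for km in kmers)
    return [counts[km] / total if total else 0.0 for km in kmers]


def windows(seq, size, step):
    """Split a sequence into windows of `size` bases, moving `step` bases each time."""
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, step)]


def similarity(p, q):
    """1.0 for identical profiles, 0.0 for profiles with no k-mers in common."""
    return 1.0 - 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def similarity_matrix(seq_a, seq_b, size=8, step=8, k=2):
    """Rows are windows of seq_a, columns are windows of seq_b."""
    rows = [kmer_profile(w, k) for w in windows(seq_a, size, step)]
    cols = [kmer_profile(w, k) for w in windows(seq_b, size, step)]
    return [[similarity(r, c) for c in cols] for r in rows]


# Toy sequences (made up): the AT-rich and GC-rich halves are swapped.
matrix = similarity_matrix("ATATATATGCGCGCGC", "GCGCGCGCATATATAT")
for row in matrix:
    print(" ".join(f"{value:.2f}" for value in row))
```

Windows with unusual composition relative to the rest of a genome stand out as rows or columns of low similarity, which is the kind of signal one would follow up on when looking for laterally transferred elements.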

Address of the bookmark: https://bitbucket.org/lkalesinskas/splot

ULTRA (ULTRA Locates Tandemly Repetitive Areas): Effective Labeling of Repetitive Genomic Sequence

Abhi — Sat, 08 Jun 2024 16:03:39 -0500

ULTRA is a tool for finding and annotating tandem repeats in genomic sequence. It can find repeats of any length and of any period (up to a maximum period of 4000). It can find highly decayed repeats missed by other software, and it can also find very large repeats in highly repetitive sequence, regardless of the size of the sequence or the length of the repeats. ULTRA offers meaningful annotation scores and can produce annotation P-values at user request.
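
To show what "period" and "tandem repeat" mean in practice, here is a naive Python sketch that reports exact tandem repeats by checking where each base equals the base one period earlier. It is not ULTRA's method: it finds only perfect repeats, so the decayed repeats that ULTRA targets would be missed. The function names, the small default max_period, and the min_copies threshold are all choices made for this example.

```python
def is_primitive(unit):
    """True if the unit is not a shorter unit repeated (e.g. 'ACAC' is not)."""
    n = len(unit)
    return not any(
        n % d == 0 and unit == unit[:d] * (n // d) for d in range(1, n)
    )


def find_tandem_repeats(seq, max_period=6, min_copies=3):
    """Return (start, end, period, unit) for exact tandem repeats.

    `end` is exclusive and may include a trailing partial copy of the unit.
    Assumes min_copies >= 2. Hits are ordered by period, then by start.
    """
    hits = []
    n = len(seq)
    for period in range(1, max_period + 1):
        i = 0
        while i + period < n:
            # Extend while each base matches the base one period later.
            j = i
            while j + period < n and seq[j] == seq[j + period]:
                j += 1
            length = j - i + period  # bases spanned by the repeat
            if length >= min_copies * period:
                unit = seq[i:i + period]
                # Skip units such as 'ACAC', already reported with period 2.
                if is_primitive(unit):
                    hits.append((i, i + length, period, unit))
                i += length
            else:
                i = j + 1
    return hits


# Toy sequence (made up) containing an AC repeat and an ATC repeat.
for hit in find_tandem_repeats("TTGACACACACAGGATCATCATCTT"):
    print(hit)
```

A single substitution inside a repeat splits it into two shorter hits here, which is one reason tools like ULTRA use a probabilistic model instead of exact matching.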

More at https://www.biorxiv.org/content/10.1101/2024.06.03.597269v1

Address of the bookmark: https://github.com/TravisWheelerLab/ULTRA

Web Apollo: a web-based genomic annotation editing platform

Jit — Fri, 28 Jul 2017 04:48:17 -0500

Web Apollo is the first instantaneous, collaborative genomic annotation editor available on the web. A natural consequence of current advances in sequencing technology is that more and more researchers are sequencing new genomes. These researchers require tools to describe the functional features of their newly sequenced genomes. With Web Apollo, researchers can use any of the common browsers (for example, Chrome or Firefox) to jointly analyze and precisely describe the features of a genome in real time, whether they are in the same room or working from opposite sides of the world.

Address of the bookmark: http://genomearchitect.github.io/

MGcV: the microbial genomic context viewer for comparative genome analysis

Jit — Mon, 29 Jan 2018 04:55:46 -0600

MGcV is an interactive web-based visualization tool tailored to facilitate small-scale genome analysis. To start using MGcV:

Supply your genes/genomic segments/phylogenetic tree of interest in the input-box by
- selecting the type of identifier and pasting identifiers (one per line)
- or by using the gene ID search tool
- or with the BLAST search tool
Click "Visualize context".

Consult the documentation to learn more about MGcV.

Address of the bookmark: http://mgcv.cmbi.ru.nl/

SWGIS v2.0: a SeqWord Genomic Island Sniffer

Abhimanyu Singh — Thu, 01 Nov 2018 12:35:52 -0500

SWGIS v2.0 is the modified version of the SeqWord Genomic Island Sniffer. This version is specifically optimized for predicting genomic islands in eukaryotic genomes. SWGIS v2.0 was tested on several eukaryotic species of different lineages. All identified genomic islands were deposited in the EuGI database.

Download SWGIS v2.0

Address of the bookmark: http://eugi.bi.up.ac.za/eugi_download_swgis.php