BOL: Related items

Submit your SARS-CoV-2 sequence data to GenBank

Neel — Thu, 09 Apr 2020 18:28:25 -0500

Submit your SARS-CoV-2 sequence data to GenBank and SRA with our new submission landing page. Submission is simple and streamlined *and* there’s a rapid turnaround. https://submit.ncbi.nlm.nih.gov/sarscov2/

Quickly and easily add your SARS-CoV-2 sequence data to the growing public archive with new, special features and support from NCBI. The new SARS-CoV-2 sequence submission landing page will help you get started. GenBank submissions are accessioned and released in approximately 1-2 working days, and Sequence Read Archive (SRA) submissions are typically processed and released within hours. Submission is simple!

More information is available on NCBI Insights. https://ncbiinsights.ncbi.nlm.nih.gov/2020/04/09/sars-cov2-data-streamlined-submission-rapid-turnaround/

What is Data Science? — A Bioinformatics Perspective

Abhi — Mon, 16 Jun 2025 01:44:34 -0500

In today’s era of big biology, we’re generating more data than ever before—genomes, transcriptomes, proteomes, metabolomes, microbiomes… you name it. But raw biological data doesn’t speak for itself. Making sense of it requires more than traditional biology. This is where data science steps in.

So, What Is Data Science?
At its core, data science is the interdisciplinary field that extracts knowledge and insights from data using programming, statistics, and domain expertise. In bioinformatics, data science enables us to turn gigabytes of sequence data into biological meaning.

Imagine trying to understand gene regulation in cancer by analyzing thousands of RNA-seq samples, or predicting antibiotic resistance from bacterial genomes—these challenges are not solvable through wet lab experiments alone. They require data-driven thinking.

Data Science Meets Bioinformatics
Bioinformatics is inherently a data science domain. From genomics to systems biology, every field in modern biology relies on data science techniques to:

Clean and process massive datasets

Discover patterns in high-dimensional data

Build predictive models (e.g., for disease classification)

Visualize complex biological networks and trends

Integrate diverse data types (e.g., transcriptomic + epigenomic data)

The Bioinformatics Toolkit
Here’s what data science typically looks like in bioinformatics:

Task | Data Science Role
Sequence alignment | Efficient algorithms, indexing, parallel processing
Gene expression analysis | Statistical modeling (e.g., DESeq2, limma)
Variant calling | Data filtering, probabilistic models
Clustering of cells in single-cell data | Unsupervised learning
Protein structure prediction | Deep learning models (e.g., AlphaFold)
Metagenomics | Data integration, classification, dimensionality reduction

Common tools include Python, R, Bioconductor, scikit-learn, Pandas, Seurat, and TensorFlow—often working together in reproducible workflows.
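As a rough illustration of the "gene expression analysis" row above, here is a minimal Python sketch that normalizes raw counts to counts per million (CPM) and computes a naive log2 fold change. It is not how DESeq2 or limma work (those packages model dispersion and run proper statistical tests); the gene names, counts, and the pseudocount of 1 are made-up assumptions for the example.

```python
import math


def counts_per_million(counts):
    """Scale raw read counts for one sample so they sum to one million."""
    total = sum(counts.values())
    return {gene: 1e6 * n / total for gene, n in counts.items()}


def log2_fold_change(control, treated, pseudocount=1.0):
    """Per-gene log2(treated / control) computed on CPM values.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for genes with no reads in the control sample.
    """
    cpm_control = counts_per_million(control)
    cpm_treated = counts_per_million(treated)
    return {
        gene: math.log2(
            (cpm_treated[gene] + pseudocount) / (cpm_control[gene] + pseudocount)
        )
        for gene in control
    }


# Made-up counts for three hypothetical genes in two samples.
control = {"geneA": 100, "geneB": 300, "geneC": 600}
treated = {"geneA": 400, "geneB": 300, "geneC": 300}

lfc = log2_fold_change(control, treated)
for gene, value in sorted(lfc.items()):
    print(f"{gene}: {value:+.2f}")
```

With these toy numbers, geneA comes out roughly four-fold up, geneB unchanged, and geneC roughly two-fold down. A real analysis would use replicates and a dedicated statistical package rather than a single ratio per gene.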

It's Not Just About Coding
A common misconception is that bioinformatics is just programming or scripting. But being a data scientist in bioinformatics also means:

Understanding experimental design

Asking biologically meaningful questions

Choosing the right statistical or machine learning models

Communicating findings effectively (e.g., plots, dashboards, papers)

In other words, data science in bioinformatics is where biology, statistics, and computer science converge.

Why It Matters
The real power of data science in bioinformatics is its ability to scale discovery.

Instead of studying one gene, we can study thousands.

Instead of analyzing one species, we can explore entire ecosystems.

Instead of waiting months for lab results, we can generate hypotheses in days.

From personalized medicine and cancer diagnostics to agricultural genomics and pandemic surveillance, data science is at the heart of the bioinformatics revolution.

Final Thoughts
If you’re a biologist who’s curious about code, or a data enthusiast fascinated by life sciences, bioinformatics is your playground—and data science is your toolkit.

In bioinformatics, data science isn’t just useful. It’s essential.

SAMHAR-COVID19 Hackathon

BioStar — Fri, 17 Apr 2020 06:47:10 -0500

The Centre for Development of Advanced Computing (C-DAC), under the aegis of the National Supercomputing Mission (NSM), a Ministry of Electronics & Information Technology (MeitY) and Department of Science & Technology (DST) initiative, in association with NVIDIA & OpenACC, announces the SAMHAR-COVID19 Hackathon.

A pandemic such as the coronavirus outbreak can create huge challenges for the Government and public health officials, who must gather information quickly and coordinate a response. In such a situation, Artificial Intelligence (AI) can play a major role in predicting, minimizing and stalling the spread of the virus.

C-DAC has embarked on the SAMHAR-COVID19 program (Supercomputing using AI, ML, Healthcare Analytics based Research for combating COVID19). It gives researchers an opportunity to find solutions for identifying, tracking and forecasting outbreaks of COVID19, and for facilitating drug discovery.

Participants can update submissions multiple times until the registration end date.
Each entry can be submitted by a team comprising a minimum of 3 and a maximum of 5 members (including the Team Lead).
Participants will have to share the complete work activities with C-DAC, and C-DAC will have the right to use the submitted application/solution for SAMHAR-COVID19 programs.
The Award will be given to the selected/winning entry irrespective of the number of members in the team (members may choose to distribute the amount among themselves).
The decision of the Eminent Jury on the I3 Award will be final and binding.
The Award can be for the team/company/institution, as submitted in the application, and cannot be changed later.
Submissions will be considered void if they are in whole or in part ineligible, incomplete, damaged, altered, counterfeit, obtained through fraud, or late.

More at https://samhar-covid19hackathon.cdac.in/

Phytozome v12.1: plant science community hub for accessing plant genomic data

Surabhi Chaudhary — Tue, 17 Mar 2020 07:30:17 -0500

Phytozome, the Plant Comparative Genomics portal of the Department of Energy's Joint Genome Institute, provides JGI users and the broader plant science community a hub for accessing, visualizing and analyzing JGI-sequenced plant genomes, as well as selected genomes and datasets that have been sequenced elsewhere. As of release v12.1.6, Phytozome hosts 93 assembled and annotated genomes, from 82 Viridiplantae species. More than half of these genomes have been sequenced, assembled and/or annotated with JGI Plant Science program resources. By integrating this large collection of plant genomes into a single resource and performing comprehensive and uniform annotation and analyses, Phytozome facilitates accurate and insightful comparative genomics studies.

Address of the bookmark: https://phytozome.jgi.doe.gov/pz/portal.html

NASA Open Science Data Repository

Abhi — Wed, 18 Dec 2024 11:54:47 -0600

The NASA Open Science Data Repository (OSDR) enables access to space-related data from experiments and missions that investigate biological and health responses of terrestrial life to spaceflight. The goal of OSDR is to enable multi-modal and multi-hierarchical fundamental space life science data to be reused toward basic science, applied science, and operational outcomes for space exploration and knowledge discovery. These data include ‘omics, phenotypic, physiological, behavioral, hardware, and environmental telemetry data; raw and processed data; and tabular, text, code, bioimaging, and video formats.

https://www.nasa.gov/reference/osdr-data-processing/

Address of the bookmark: https://www.nasa.gov/osdr/

karyoploteR: plot whole genomes with arbitrary data

Abhimanyu Singh — Fri, 02 Feb 2018 03:24:28 -0600

karyoploteR is an R package to create karyoplots, that is, representations of whole genomes with arbitrary data plotted on them. It is inspired by the R base graphics system and does not depend on other graphics packages. The aim of karyoploteR is to offer the user an easy way to plot data along the genome and get a broad genome-wide view that facilitates the identification of genome-wide relations and distributions.

Address of the bookmark: https://bernatgel.github.io/karyoploter_tutorial/

Scripts for the analysis of HGT in genome sequence data.

Jit — Wed, 29 Nov 2017 16:44:10 -0600

Scripts for the analysis of HGT in genome sequence data

Address of the bookmark: https://github.com/reubwn/hgt

Heap: a highly sensitive and accurate SNP detection tool for low-coverage high-throughput sequencing data

Jit — Thu, 19 Apr 2018 08:06:03 -0500

Heap is a tool that enables robustly sensitive and accurate calling of SNPs, particularly with low-coverage NGS data, which must be aligned to the reference genome sequences in advance. To reduce false positive SNPs, Heap determines genotypes and calls SNPs at each site except for sites at either end of reads or sites containing a minor allele supported by only one read. Performance comparison with existing tools showed that Heap achieved the highest F-scores with low-coverage (7X) restriction-site associated DNA sequencing reads of sorghum and rice individuals. This will facilitate cost-effective GWAS and GP studies in this NGS era. Code and documentation of Heap are freely available from https://github.com/meiji-bioinf/heap and our web site (http://bioinf.mind.meiji.ac.jp/lab/en/tools.html).
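The two exclusion rules described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Heap's actual code: the input format (one `(base, offset_in_read, read_length)` tuple per read covering a site), the `filter_site` name, the `end_margin` parameter, and the simplification of treating the most frequent non-reference base as the allele under test are all assumptions made for the example.

```python
from collections import Counter


def filter_site(observations, ref_base, end_margin=1):
    """Return a candidate alternate allele for one site, or None.

    observations: list of (base, offset_in_read, read_length) tuples, one per
    aligned read covering the site (hypothetical input format).
    """
    # Rule 1 (assumed reading): ignore bases that sit within end_margin
    # positions of either end of the read.
    kept = [
        base
        for base, offset, length in observations
        if end_margin <= offset < length - end_margin
    ]

    # Count the remaining non-reference bases.
    alt_counts = Counter(base for base in kept if base != ref_base)
    if not alt_counts:
        return None

    alt_base, alt_support = alt_counts.most_common(1)[0]

    # Rule 2: skip an allele that is supported by only one read.
    if alt_support < 2:
        return None
    return alt_base


# Toy pileups for three sites, all with reference base "A".
site_ok = [("A", 10, 50), ("G", 20, 50), ("G", 5, 50), ("A", 30, 50)]
site_single = [("A", 10, 50), ("G", 20, 50), ("A", 30, 50)]
site_ends = [("G", 0, 50), ("G", 49, 50), ("A", 10, 50)]

print(filter_site(site_ok, "A"))      # two internal reads support G
print(filter_site(site_single, "A"))  # only one read supports G
print(filter_site(site_ends, "A"))    # both G bases sit at read ends
```

Only the first site yields a candidate here. Heap's real genotyping step is not described in this summary and is not modelled by the sketch.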

Address of the bookmark: https://github.com/meiji-bioinf/heap

Epiviz: an interactive visualization tool for functional genomics data.

Jit — Mon, 09 Jul 2018 05:27:39 -0500

Epiviz is an interactive visualization tool for functional genomics data. It supports genome navigation like other genome browsers, but allows multiple visualizations of data within genomic regions using scatterplots, heatmaps and other user-supplied visualizations. It also includes data from the Gene Expression Barcode project for transcriptome visualization. It has a flexible plugin framework so users can add d3 visualizations. You can see a video tour here.

https://bioconductor.org/packages/release/bioc/html/epivizr.html

https://github.com/epiviz

https://github.com/epiviz/epiviz

Address of the bookmark: https://epiviz.github.io/

VariantBam: Filtering and profiling of next-generational sequencing data using region-specific rules

Rahul Nayak — Thu, 04 Oct 2018 16:30:44 -0500

VariantBam is a tool to extract/count specific sets of sequencing reads from next-generational sequencing files. To save money, disk space and I/O, one may not want to store an entire BAM on disk. In many cases, it would be more efficient to store only those read-pairs or reads that intersect some region around the variant locations. Alternatively, if your scientific question is focused on only one aspect of the data (e.g. breakpoints), many reads can be removed without losing the information relevant to the problem.
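The idea of keeping only reads that intersect a region around variant locations can be illustrated with a small, pure-Python sketch. This is not VariantBam's interface or rule syntax (VariantBam is a command-line tool that works on BAM files); the tuple-based read format, the function names, and the 100 bp padding are assumptions chosen for the example.

```python
def build_windows(variant_positions, pad):
    """Turn (chrom, pos) variants into merged half-open [start, end) windows."""
    windows = {}
    for chrom, pos in sorted(variant_positions):
        start, end = max(0, pos - pad), pos + pad + 1
        merged = windows.setdefault(chrom, [])
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous window: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return windows


def overlaps_any(read, windows):
    """True if a (name, chrom, start, end) read intersects any window."""
    _, chrom, start, end = read
    return any(
        start < w_end and w_start < end for w_start, w_end in windows.get(chrom, [])
    )


# Hypothetical variants and reads (0-based, half-open coordinates).
variants = [("chr1", 1000), ("chr1", 1050), ("chr2", 500)]
reads = [
    ("r1", "chr1", 850, 950),
    ("r2", "chr1", 2000, 2100),
    ("r3", "chr2", 590, 690),
    ("r4", "chr3", 10, 110),
]

windows = build_windows(variants, pad=100)
kept = [read[0] for read in reads if overlaps_any(read, windows)]
print(windows)
print(kept)
```

In this toy run, the two nearby chr1 variants collapse into one window, and only reads r1 and r3 would be retained. The linear scan over windows is fine for a sketch; a real tool would use an indexed interval structure.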

Address of the bookmark: https://github.com/broadinstitute/VariantBam