BOL: Related items

Julia Programming Language, a Python and R rival

Radha Agarkar — Sat, 25 Aug 2018 04:46:39 -0500

Big data has grown into one of the most lucrative fields, and data scientists are among the most sought-after professionals. They are usually hired to analyze, manage and parse large volumes of data. Doing this with traditional techniques is not a walk in the park, which is why most data scientists prefer programming languages such as R and Python. However, there is one more programming language that can do the job: Julia.

What Is Julia Language?

Julia is a programming language that came into the limelight in 2012. It is a general-purpose language designed with scientific computing in mind. Julia was meant to be an alternative to Python, R and other languages used mainly for manipulating data, because it has numerous features that reduce the complexity of numerical computation.

Julia builds on the best features of Python and R while avoiding some of their weaknesses, which explains why it is viewed as an alternative to these languages. For instance, it aims for the readability and simplicity of Python while running faster.

Julia is a popular choice among data scientists and mathematicians. Its core features are similar to those found in most data software, and its syntax is close to standard mathematical notation, which suits both fields.

Key Features Of Julia Language
Uses JIT Compilation
Parallelism
Dynamic Typing
Simple Syntax
Allows Metaprogramming
Accessible to Libraries
1-Based Array Indexing

Julia Vs Python And R Programming Languages
1. Speed
Julia is generally faster than both Python and R, a critical factor in big data programming. Its speed comes from just-in-time (JIT) compilation. In Python, you need to install external libraries to achieve similar speed.
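As a rough illustration of that last point, the sketch below contrasts a pure-Python loop with the same calculation delegated to NumPy, one example of the kind of external library the text refers to. The function names are hypothetical, and timings depend on the machine, so none are quoted here.

```python
import timeit

import numpy as np


def sum_of_squares_loop(values):
    """Sum of squares using an interpreted Python loop."""
    total = 0.0
    for v in values:
        total += v * v
    return total


def sum_of_squares_numpy(values):
    """Same calculation, executed inside NumPy's compiled routines."""
    arr = np.asarray(values, dtype=np.float64)
    return float(np.dot(arr, arr))


if __name__ == "__main__":
    data = [float(i) for i in range(1_000_000)]
    arr = np.asarray(data)
    # Print the wall-clock time for five runs of each version.
    print("loop :", timeit.timeit(lambda: sum_of_squares_loop(data), number=5))
    print("numpy:", timeit.timeit(lambda: sum_of_squares_numpy(arr), number=5))
```

Running the script prints the two timings side by side, so the gap can be checked on your own hardware.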

2. Syntax
Julia has a math-friendly syntax. Because the syntax resembles mathematical notation, it lends itself to mathematical and scientific computations. This can also make it easier to learn than Python.

3. Parallelism
Although both Python and R support parallelism, Julia builds it into the language itself. This lets Julia make fuller use of the processor than Python and R typically can.

4. Versatility
Julia is more versatile than Python and R. It allows a programmer to move between different pieces of code and functions with ease.

The only area where Python and R are superior to Julia is community. Because Julia is a newer language, it has a smaller community than languages that have been around for years.

Overall, Julia is a strong alternative for handling big data projects. Despite its small community, it is a language that you can learn easily.

EyeChrom: Visualizing Chromosome Count Data From Plants

Jit — Tue, 08 Jan 2019 10:20:54 -0600

Its goal is to show chromosomal data per genus. Select the genus, and the plot will show the records found for it in the Chromosome Counts Database. Note: Report an issue via GitHub: github.com/roszenil/CCDBcurator and github.com/RodrigoRivero/EyeChrom

https://bsapubs.onlinelibrary.wiley.com/doi/pdf/10.1002/aps3.1207

Address of the bookmark: http://eyechrom.com:3838/EyeChrom/

DeepHiC: A Generative Adversarial Network for Enhancing Hi-C Data Resolution

Rahul Nayak — Tue, 03 Mar 2020 01:12:47 -0600

DeepHiC is a GAN-based model for enhancing Hi-C data resolution. We developed this server to help researchers enhance their own low-resolution data in a few clicks. Ab initio training can be performed using our published code. We provide trained models for various depths of low-coverage Hi-C sequencing data. The depth of the input data is estimated by comparing its distribution with those of the downsampled Hi-C data we used in training.

Address of the bookmark: http://sysomics.com/deephic

AMR Database!

LEGE — Tue, 04 Jun 2024 13:37:21 -0500

ARG-ANNOT. PMID: 24145532
CARD. PMID: 23650175
MEGARes. PMID: 27899569
NCBI BioProject: PRJNA313047
plasmidfinder. PMID: 24777092
resfinder. PMID: 22782487
VFDB. PMID: 26578559
SRST2's version of ARG-ANNOT. PMID: 25422674
VirulenceFinder. PMID: 24574290

Address of the bookmark: https://github.com/sanger-pathogens/ariba/wiki/Task%3A-getref

What is Data Science? — A Bioinformatics Perspective

Abhi — Mon, 16 Jun 2025 01:44:34 -0500

In today’s era of big biology, we’re generating more data than ever before—genomes, transcriptomes, proteomes, metabolomes, microbiomes… you name it. But raw biological data doesn’t speak for itself. Making sense of it requires more than traditional biology. This is where data science steps in.

So, What Is Data Science?
At its core, data science is the interdisciplinary field that extracts knowledge and insights from data using programming, statistics, and domain expertise. In bioinformatics, data science enables us to turn gigabytes of sequence data into biological meaning.

Imagine trying to understand gene regulation in cancer by analyzing thousands of RNA-seq samples, or predicting antibiotic resistance from bacterial genomes—these challenges are not solvable through wet lab experiments alone. They require data-driven thinking.

Data Science Meets Bioinformatics
Bioinformatics is inherently a data science domain. From genomics to systems biology, every field in modern biology relies on data science techniques to:

Clean and process massive datasets

Discover patterns in high-dimensional data

Build predictive models (e.g., for disease classification)

Visualize complex biological networks and trends

Integrate diverse data types (e.g., transcriptomic + epigenomic data)

The Bioinformatics Toolkit
Here’s what data science typically looks like in bioinformatics:

| Task | Data Science Role |
|---|---|
| Sequence alignment | Efficient algorithms, indexing, parallel processing |
| Gene expression analysis | Statistical modeling (e.g., DESeq2, limma) |
| Variant calling | Data filtering, probabilistic models |
| Clustering of cells in single-cell data | Unsupervised learning |
| Protein structure prediction | Deep learning models (e.g., AlphaFold) |
| Metagenomics | Data integration, classification, dimensionality reduction |

Common tools include Python, R, Bioconductor, scikit-learn, Pandas, Seurat, and TensorFlow—often working together in reproducible workflows.
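To make the "clean and process" step a little more concrete, here is a minimal sketch of one common preprocessing calculation for gene expression data: scaling raw read counts to counts per million (CPM). The function name and the sample counts are made up for illustration. This is not how DESeq2 or limma normalize data; it only shows the flavor of the task.

```python
def counts_per_million(counts):
    """Scale raw read counts for one sample to counts per million (CPM)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {gene: n * 1_000_000 / total for gene, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical raw counts for a single sample.
    sample = {"geneA": 500, "geneB": 1500, "geneC": 8000}
    for gene, cpm in counts_per_million(sample).items():
        print(gene, round(cpm, 1))
```

Dividing by the library size makes samples sequenced to different depths roughly comparable before any statistical modeling is applied.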

It's Not Just About Coding
A common misconception is that bioinformatics is just programming or scripting. But being a data scientist in bioinformatics also means:

Understanding experimental design

Asking biologically meaningful questions

Choosing the right statistical or machine learning models

Communicating findings effectively (e.g., plots, dashboards, papers)

In other words, data science in bioinformatics is where biology, statistics, and computer science converge.

Why It Matters
The real power of data science in bioinformatics is its ability to scale discovery.

Instead of studying one gene, we can study thousands.

Instead of analyzing one species, we can explore entire ecosystems.

Instead of waiting months for lab results, we can generate hypotheses in days.

From personalized medicine and cancer diagnostics to agricultural genomics and pandemic surveillance, data science is at the heart of the bioinformatics revolution.

Final Thoughts
If you’re a biologist who’s curious about code, or a data enthusiast fascinated by life sciences, bioinformatics is your playground—and data science is your toolkit.

In bioinformatics, data science isn’t just useful. It’s essential.

Carrot2 clustering engine

LEGE — Fri, 07 Apr 2023 13:11:24 -0500

This is the demo application of the Carrot² clustering engine. It uses Carrot²'s algorithms to organize search results into thematic folders.

User interfaces

Web Search Clustering organizes search results from public search engines into clusters; offers treemap- and pie-chart visualizations of the clusters.
Clustering Workbench clusters content from local files in JSON or Excel format, Solr or Elasticsearch; allows tuning of clustering parameters and exporting results as Excel or JSON.

Search engines

Web: web search results provided by etools.ch. Extensive use may require special arrangements with the owner of the etools.ch service.
PubMed: abstracts of medical papers from the PubMed database provided by NCBI.
Local file: content read from a local file in Carrot2 XML, JSON, CSV or Excel format.
Solr: queries an Apache Solr instance.
Elasticsearch: queries an Elasticsearch instance.

Clustering algorithms

Lingo: creates well-described flat clusters. Does not scale beyond a few thousand search results. Available as part of the open source Carrot² framework.
STC: the classic search results clustering algorithm. Produces flat clusters with adequate descriptions, very fast. Available as part of the open source Carrot² framework.
k-means: baseline clustering algorithm, produces bag-of-words style cluster descriptions. Available as part of the open source Carrot² framework.

Address of the bookmark: https://search.carrot2.org/#/search/web

poRe: an R package for the visualization and analysis of nanopore sequencing data

Jit — Thu, 23 Nov 2017 09:55:57 -0600

Motivation: The Oxford Nanopore MinION device represents a unique sequencing technology. As a mobile sequencing device powered by the USB port of a laptop, the MinION has huge potential applications. To enable these applications, the bioinformatics community will need to design and build a suite of tools specifically for MinION data.

Results: Here we present poRe, a package for R that enables users to manipulate, organize, summarize and visualize MinION nanopore sequencing data. As a package for R, poRe has been tested on Windows, Linux and MacOSX. Crucially, the Windows version allows users to analyse MinION data on the Windows laptop attached to the device.

Availability and implementation: poRe is released as a package for R at http://sourceforge.net/projects/rpore/. A tutorial and further information are available at https://sourceforge.net/p/rpore/wiki/Home/

Contact: mick.watson@roslin.ed.ac.uk

Address of the bookmark: https://academic.oup.com/bioinformatics/article/31/1/114/2365693

GPOPSIM: a simulation tool for whole-genome genetic data

Jit — Wed, 17 Jan 2018 03:47:46 -0600

GPOPSIM is a simulation tool for pedigree, phenotype, and genomic data, supporting a variety of population and genome structures and trait genetic architectures. It provides flexible parameter settings for users from a wide range of disciplines and, in particular, can simulate multiple genetically correlated traits with desired genetic parameters and underlying genetic architectures.
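The idea of simulating genetically correlated traits can be sketched in a few lines. The example below draws additive genetic values for two traits with a target genetic correlation by mixing two independent standard normal draws. It is a textbook construction under assumed names and parameters (`simulate_genetic_values`, `r_g`, `sd1`, `sd2`), not GPOPSIM's own algorithm or interface.

```python
import math
import random


def simulate_genetic_values(n, r_g, sd1=1.0, sd2=1.0, seed=None):
    """Draw n pairs of genetic values whose expected correlation is r_g."""
    if not -1.0 <= r_g <= 1.0:
        raise ValueError("r_g must lie in [-1, 1]")
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        g1 = sd1 * z1
        # Mixing z1 and z2 gives trait 2 variance sd2**2 and correlation r_g.
        g2 = sd2 * (r_g * z1 + math.sqrt(1.0 - r_g ** 2) * z2)
        pairs.append((g1, g2))
    return pairs


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


if __name__ == "__main__":
    values = simulate_genetic_values(10_000, r_g=0.6, seed=42)
    g1, g2 = zip(*values)
    print("realized genetic correlation:", round(pearson(g1, g2), 3))
```

The realized correlation fluctuates around the target and gets closer to it as the number of simulated individuals grows.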

Address of the bookmark: https://github.com/SCAU-AnimalGenetics/GPOPSIM

BFC: a standalone high-performance tool for correcting sequencing errors from Illumina sequencing data

Jit — Thu, 31 May 2018 09:35:23 -0500

BFC is a standalone high-performance tool for correcting sequencing errors from Illumina sequencing data. It is specifically designed for high-coverage whole-genome human data, though it also performs well for small genomes. The BFC algorithm is a variant of the classical spectrum alignment algorithm introduced by Pevzner et al. (2001). It uses an exhaustive search to find a k-mer path through a read that minimizes a heuristic objective function jointly considering penalties on correction, quality and k-mer support. This algorithm was first implemented in my fermi assembler and then refined a few times in fermi, fermi2 and now in BFC. In the k-mer counting phase, BFC uses a blocked Bloom filter to filter out most singleton k-mers and keeps the rest in a hash table (Melsted and Pritchard, 2011). The use of a Bloom filter is how BFC got its name, though other correctors such as Lighter and Bless actually rely more on Bloom filters than BFC does.
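The k-mer counting idea described above can be sketched as follows: a k-mer seen for the first time is recorded only in a Bloom filter, and it is promoted to a hash table once it is seen again. This is a simplified illustration with hypothetical class and function names. It uses a plain Bloom filter instead of a blocked one, and it is not BFC's actual implementation.

```python
import hashlib
from collections import Counter


class BloomFilter:
    """A plain (non-blocked) Bloom filter over strings."""

    def __init__(self, n_bits=1 << 20, n_hashes=4):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray((n_bits + 7) // 8)

    def _positions(self, item):
        # Derive n_hashes bit positions from salted BLAKE2b digests.
        for i in range(self.n_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=bytes([i])
            ).digest()
            yield int.from_bytes(digest, "big") % self.n_bits

    def add(self, item):
        """Insert item; return True if it was (probably) already present."""
        seen = True
        for pos in self._positions(item):
            byte, bit = divmod(pos, 8)
            if not self.bits[byte] & (1 << bit):
                seen = False
                self.bits[byte] |= 1 << bit
        return seen


def count_kmers(reads, k, bloom=None):
    """Count k-mers seen at least twice; singletons stay in the filter."""
    if bloom is None:
        bloom = BloomFilter()
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer in counts:
                counts[kmer] += 1
            elif bloom.add(kmer):
                # Second sighting: promote to the hash table. A Bloom filter
                # false positive can promote a true singleton here.
                counts[kmer] = 2
    return counts


if __name__ == "__main__":
    reads = ["ACGTACGT", "ACGTTT"]
    print(dict(count_kmers(reads, k=4)))
```

Because most sequencing errors produce k-mers that occur only once, keeping singletons out of the hash table saves memory at the cost of an occasional false positive.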

Address of the bookmark: https://github.com/lh3/bfc

NanoPack: visualizing and processing long-read sequencing data

Jit — Fri, 10 Aug 2018 18:41:34 -0500

The NanoPack tools are written in Python3 and released under the GNU GPL3.0 License. The source code can be found at https://github.com/wdecoster/nanopack, together with links to separate scripts and their documentation. The scripts are compatible with Linux, Mac OS and the MS Windows 10 subsystem for Linux and are available as a graphical user interface, a web service at http://nanoplot.bioinf.be and command line tools.

https://academic.oup.com/bioinformatics/article/34/15/2666/4934939

Address of the bookmark: https://github.com/wdecoster/nanoQC