BOL: Related items

What is Data Science? — A Bioinformatics Perspective

Abhi — Mon, 16 Jun 2025 01:44:34 -0500

In today’s era of big biology, we’re generating more data than ever before—genomes, transcriptomes, proteomes, metabolomes, microbiomes… you name it. But raw biological data doesn’t speak for itself. Making sense of it requires more than traditional biology. This is where data science steps in.

So, What Is Data Science?
At its core, data science is the interdisciplinary field that extracts knowledge and insights from data using programming, statistics, and domain expertise. In bioinformatics, data science enables us to turn gigabytes of sequence data into biological meaning.

Imagine trying to understand gene regulation in cancer by analyzing thousands of RNA-seq samples, or predicting antibiotic resistance from bacterial genomes—these challenges are not solvable through wet lab experiments alone. They require data-driven thinking.

Data Science Meets Bioinformatics
Bioinformatics is inherently a data science domain. From genomics to systems biology, every field in modern biology relies on data science techniques to:

Clean and process massive datasets

Discover patterns in high-dimensional data

Build predictive models (e.g., for disease classification)

Visualize complex biological networks and trends

Integrate diverse data types (e.g., transcriptomic + epigenomic data)

The Bioinformatics Toolkit
Here’s what data science typically looks like in bioinformatics:

Task | Data Science Role
Sequence alignment | Efficient algorithms, indexing, parallel processing
Gene expression analysis | Statistical modeling (e.g., DESeq2, limma)
Variant calling | Data filtering, probabilistic models
Clustering of cells in single-cell data | Unsupervised learning
Protein structure prediction | Deep learning models (e.g., AlphaFold)
Metagenomics | Data integration, classification, dimensionality reduction

Common tools include Python, R, Bioconductor, scikit-learn, Pandas, Seurat, and TensorFlow—often working together in reproducible workflows.

It's Not Just About Coding
A common misconception is that bioinformatics is just programming or scripting. But being a data scientist in bioinformatics also means:

Understanding experimental design

Asking biologically meaningful questions

Choosing the right statistical or machine learning models

Communicating findings effectively (e.g., plots, dashboards, papers)

In other words, data science in bioinformatics is where biology, statistics, and computer science converge.

Why It Matters
The real power of data science in bioinformatics is its ability to scale discovery.

Instead of studying one gene, we can study thousands.

Instead of analyzing one species, we can explore entire ecosystems.

Instead of waiting months for lab results, we can generate hypotheses in days.

From personalized medicine and cancer diagnostics to agricultural genomics and pandemic surveillance, data science is at the heart of the bioinformatics revolution.

Final Thoughts
If you’re a biologist who’s curious about code, or a data enthusiast fascinated by life sciences, bioinformatics is your playground—and data science is your toolkit.

In bioinformatics, data science isn’t just useful. It’s essential.

Data Mining in Bioinformatics

Jitendra Narayan — Tue, 16 Jul 2013 03:21:28 -0500

Data mining is the extraction of hidden predictive information from large databases, and it is becoming an increasingly important tool for transforming data into information. It is commonly used in a wide range of profiling practices, such as marketing, surveillance, fraud detection and scientific discovery. Data mining for bioinformatics enables researchers to meet the challenge of mining vast amounts of biomolecular data to discover real knowledge. In other words, you’re a bioinformatician, and data has been dumped in your lap: find the patterns, trends, answers, or whatever meaningful knowledge the data is hiding. Data miners scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations. This page covers theory, algorithms, and methodologies, as well as data mining technologies.

Unfortunately, life is never simple. In molecular biology, it’s becoming more common to generate reams of data and then ask someone in bioinformatics to produce an answer. This is exploratory data analysis, one of the most difficult things to do well, especially if you’re thrown in at the deep end.

Data mining commonly involves four classes of tasks:

Classification - Arranges the data into predefined groups. For example, an email program might attempt to classify an email as legitimate or spam. Common algorithms include decision tree learning, nearest neighbor, naive Bayesian classification and neural networks.
Clustering - Is like classification but the groups are not predefined, so the algorithm will try to group similar items together.
Regression - Attempts to find a function which models the data with the least error.
Association rule learning - Searches for relationships between variables. For example a supermarket might gather data on customer purchasing habits. Using association rule learning, the supermarket can determine which products are frequently bought together and use this information for marketing purposes. This is sometimes referred to as market basket analysis.
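
The market basket idea above can be sketched in a few lines of Python. This is only a minimal illustration, not a production algorithm such as Apriori: the baskets are invented, the function name pair_rules and the min_support threshold are assumptions made for this example, and only rules between pairs of items are considered. Support is the fraction of baskets containing both items; confidence is the fraction of baskets containing the left-hand item that also contain the right-hand one.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, invented for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]


def pair_rules(baskets, min_support=0.4):
    """Return (lhs, rhs, support, confidence) for frequent item pairs."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair yields two directed rules: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            rules.append((lhs, rhs, support, confidence))
    # Most confident rules first; names break ties for a stable order.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


for lhs, rhs, support, confidence in pair_rules(baskets):
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

The same counting could in principle be applied to biological "baskets", for example sets of genes found together in a genome, but that use is an extrapolation and not something described above.
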
From experience, I can say that being thrown in at the deep end is one of the most frustrating positions to be in. Data mining is a huge field and can easily be bewildering for a beginner. However, high-throughput techniques in molecular biology increasingly require bioinformatics to interpret the data. Furthermore, people working in bioinformatics generally come from computer science or biology backgrounds. Data mining, however, involves statistics to one degree or another, which means entering a field that may not be your strong point.
Excel is fine for creating graphs. If you’re serious about data mining, though, you’ll need something more heavyweight. I use R, which is free and has good data mining packages such as vegan and labdsv. For beginners R can be impenetrable; I recommend this book as an introduction to R as well as the underlying statistics.
Any of us can rush head-on into a land of support vector machines, hidden Markov models and neural networks. But coming back to the first point, what are you trying to prove? Always question what you are doing and how it fits into the wider picture. Try to review regularly and keep track of where you are going. This will prevent you from falling into data mining despair.

Data Mining Resources on the net:

A laboratory of data mining and bioinformatics headed by Prof. Ambuj Singh. There are currently seven graduate students in the research group. Its research focuses on image informatics and scalable querying and mining of graphs. For more detail visit: http://www.cs.ucsb.edu/~dbl/

Here are the materials (Lecture notes) from several past courses on data mining and/or Web mining by Stanford: For detail visit: http://infolab.stanford.edu/~ullman/mining/mining.html
Statistical Data Mining Tutorial Slides by Andrew Moore The following links point to a set of tutorials on many aspects of statistical data mining, including the foundations of probability, the foundations of statistical data analysis, and most of the classic machine learning and data mining algorithms. For detail visit: http://www.autonlab.org/tutorials/

A tutorial on Introduction to Data Mining for Discovering hidden value in your data warehouse: http://www.thearling.com/text/dmwhite/dmwhite.htm
Wiki Links: http://en.wikipedia.org/wiki/Data_mining
Bioinformatics with Clementine http://www.spss.ch/upload/1051192224_inseratClemBio.pdf
Causal Data Mining in Bioinformatics by Ioannis Tsamardinos: http://www.forth.gr/ics/bmi/In_the_News/2007/EN69-4.pdf

Report on ACM Text Mining in Bioinformatics (TMBIO 006) http://www.sigir.org/forum/2007J/2007j_sigirforum_song.pdf
BIOKDD 2002: Recent Advances in Data Mining for Bioinformatics: http://www.acm.org/sigs/sigkdd/explorations/issue4-2/zaki.pdf

Bioinformatics and Medical Informatics:

Tools for Mining and Applying Genetic Information in Patient Care: http://www.biomedtechalliance.org/pdfs/03_03_05/03_03_05.pdf

DATA MINING OF MICROARRAY DATABASES FOR HUMAN LUNG CANCER: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.106.385&rep=rep1&type=pdf

Towards knowledge-based gene expression data mining: http://www.ailab.si/blaz/papers/2007-JBI-BellazziZupan.pdf

DRAFT accepted for publication in 'Data Mining in Bioinformatics', Jason Wang, Mohammed Zaki, Hannu Toivonen, and Dennis Shasha (Eds.), Springer: http://www.cs.helsinki.fi/u/htoivone/pubs/gene_mapping_by_pattern_discovery.pdf

Data Mining and Text Mining for Bioinformatics: Proceedings of the European Workshop: http://www.rok.informatik.hu-berlin.de/wbi/research/publications/2003/proceedings_ws_mining.pdf

Biological Network Analysis:

Graph Mining in Bioinformatics: http://agbs.kyb.tuebingen.mpg.de/wikis/bg/BNA-5.pdf.

Text mining in bioinformatics: http://agbs.kyb.tuebingen.mpg.de/wikis/bg/4.pdf

Some data mining books that are available on Google Books:

Data mining and bioinformatics: first international workshop, VDMB 2006 By Mehmet M. Dalkilic

Data mining: concepts and techniques By Jiawei Han, Micheline Kamber

Predicting Pathogen Virulence Using Bioinformatics Tools

BioStar — Tue, 04 Nov 2025 07:55:53 -0600

In the genomic era, the ability to predict the virulence potential of pathogens has become an indispensable part of infectious disease research. With the exponential growth of microbial genome data, bioinformatics tools now enable scientists to identify virulence factors, model pathogen behavior, and even forecast outbreak risks — all from sequence data.

In an age where pathogens continue to evolve and cross boundaries, understanding what makes them virulent—that is, capable of causing disease—has become a critical focus in modern microbiology and genomics. Virulence prediction bridges computational biology, genomics, and machine learning to forecast the pathogenic potential of microbes before they strike.

What Is Virulence?

Virulence refers to the degree of damage a pathogen can inflict on its host. It is determined by a combination of genetic factors—called virulence factors (VFs)—that allow the organism to attach, invade, evade, and harm the host. These include genes coding for toxins, secretion systems, adhesins, and enzymes that disrupt host defenses.

Understanding virulence factors not only helps in deciphering the mechanisms of infection but also provides early warning signs for emerging threats.

Why Predict Virulence?

Traditional virulence studies relied heavily on experimental infection models, which, although accurate, are time-consuming, expensive, and ethically constrained.
Today, the availability of whole-genome sequences and large-scale pathogen databases has paved the way for in silico virulence prediction—a computational approach that can screen thousands of genomes within hours.

This approach enables researchers to:

Rapidly identify potential high-risk strains.
Prioritize pathogens for containment, surveillance, or further study.
Guide vaccine development and drug target discovery.
Support One Health frameworks, linking animal, human, and environmental health data.

How Is Virulence Predicted?

Virulence prediction combines bioinformatics pipelines with machine learning and comparative genomics. The process generally involves:

Genome Annotation: Identifying genes and coding sequences in microbial genomes.
Feature Extraction: Comparing sequences with curated databases like VFDB (Virulence Factor Database), PATRIC, or Victors.
Pattern Recognition: Using algorithms (e.g., Random Forest, SVM, or deep learning models) to classify genes or strains as virulent or non-virulent based on sequence patterns, motifs, and protein domains.
Scoring and Visualization: Assigning a virulence score or confidence level and visualizing it through heatmaps or genome maps.
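
The four steps above can be illustrated with a deliberately simplified Python sketch. Everything in it is an assumption made for illustration: the gene names are placeholders, VF_CATALOG is a hypothetical stand-in for a lookup against a curated database such as VFDB (real pipelines compare sequences, not names), and the hand-set CATEGORY_WEIGHTS take the place of a trained classifier such as a Random Forest.

```python
# Hypothetical stand-in for a curated virulence-factor database lookup.
# Gene names and categories are invented placeholders, not real records.
VF_CATALOG = {
    "vf_toxin_1": "toxin",
    "vf_toxin_2": "toxin",
    "vf_adhesin_1": "adhesin",
    "vf_secretion_1": "secretion_system",
}

# Hand-set illustrative weights (they sum to 1); a real pipeline would
# learn a model from labelled genomes instead of using fixed weights.
CATEGORY_WEIGHTS = {"toxin": 0.5, "adhesin": 0.2, "secretion_system": 0.3}


def extract_features(annotated_genes):
    """Feature extraction: count annotated genes per virulence category."""
    features = {category: 0 for category in CATEGORY_WEIGHTS}
    for gene in annotated_genes:
        category = VF_CATALOG.get(gene)
        if category is not None:
            features[category] += 1
    return features


def virulence_score(features):
    """Scoring: sum the weights of the categories that are present (0 to 1)."""
    return sum(
        weight
        for category, weight in CATEGORY_WEIGHTS.items()
        if features[category] > 0
    )


# Annotation output is assumed to be a list of gene names per strain.
strains = {
    "strain_A": ["vf_toxin_1", "vf_adhesin_1", "housekeeping_1", "housekeeping_2"],
    "strain_B": ["housekeeping_1", "housekeeping_2"],
}

scores = {
    name: virulence_score(extract_features(genes))
    for name, genes in strains.items()
}
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: score={score:.2f}")
```

The point of the sketch is the shape of the pipeline (annotate, look up, featurize, score), not the numbers; the scores it prints have no biological meaning.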

Tools and Resources for Virulence Prediction

A number of tools and databases make virulence prediction accessible to the scientific community:

VFanalyzer – For identifying virulence genes based on VFDB.
PathoFact – Predicts virulence, antimicrobial resistance (AMR), and toxin genes from metagenomic data.
Pangenome-based models – Identify virulence-associated gene clusters across strains.
Machine learning models – Use features like GC content, codon usage bias, or protein domains to predict pathogenicity.
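
As a rough illustration of the sequence-derived features mentioned in the last item, the following Python sketch computes GC content and raw codon frequencies for a coding sequence. The function names and the example sequence are made up for this sketch. Note also that raw codon frequencies are a simpler quantity than the codon usage bias measures used in practice, which normalise against synonymous codons.

```python
from collections import Counter


def gc_content(sequence):
    """Fraction of G and C bases in a nucleotide sequence (0.0 if empty)."""
    seq = sequence.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def codon_frequencies(cds):
    """Relative frequency of each codon, reading in frame from position 0.

    Trailing bases that do not fill a complete codon are ignored.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()} if total else {}


# Made-up coding sequence, for illustration only.
example_cds = "ATGGCCGCCAAATAA"
print(f"GC content: {gc_content(example_cds):.3f}")
for codon, freq in sorted(codon_frequencies(example_cds).items()):
    print(f"{codon}: {freq:.2f}")
```

Values like these would typically be collected into one feature vector per gene or genome before being passed to a classifier.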

Emerging tools now integrate multi-omic data—including transcriptomics, proteomics, and metabolomics—to understand virulence in a systems biology framework.

Applications in the Real World

Virulence prediction has major implications across public health and research sectors:

Epidemic preparedness: Early identification of virulent strains in outbreak samples.
AMR surveillance: Linking virulence profiles with antibiotic resistance determinants.
Environmental monitoring: Predicting pathogenic potential of soil or waterborne microbes.
Clinical diagnostics: Supporting personalized treatment through pathogen profiling.

For instance, integrating virulence prediction pipelines into national surveillance networks could enable faster risk assessment and response to infectious outbreaks.

The Road Ahead

As machine learning and genomics advance, virulence prediction will evolve from simple gene-based detection to dynamic, context-aware models that account for host–pathogen interactions, environmental signals, and evolutionary adaptation.

Future tools may predict not just if a strain is virulent, but under what conditions it expresses that virulence—bridging the gap between genotype and phenotype.

In Summary

Virulence prediction is redefining how we understand and anticipate infectious diseases. By coupling genomic insights with computational intelligence, researchers can identify potential threats earlier, design smarter interventions, and ultimately, strengthen our preparedness against emerging pathogens.

Genomics for Bioinformatician

Jitendra Narayan — Sat, 20 Jul 2013 07:03:00 -0500

Genomics is the study of the genomes of organisms. The field includes intensive efforts to determine the entire DNA sequence of organisms and fine-scale genetic mapping efforts. The field also includes studies of intragenomic phenomena such as heterosis, epistasis, pleiotropy and other interactions between loci and alleles within the genome. In contrast, the investigation of the roles and functions of single genes is a primary focus of molecular biology or genetics and is a common topic of modern medical and biological research. Research of single genes does not fall into the definition of genomics unless the aim of this genetic, pathway, and functional information analysis is to elucidate its effect on, place in, and response to the entire genome's networks.

Genomics was established by Fred Sanger when he first sequenced the complete genomes of a virus and a mitochondrion. His group established techniques of sequencing, genome mapping, data storage, and bioinformatic analyses in the 1970s and 1980s. A major branch of genomics is still concerned with sequencing the genomes of various organisms, but the knowledge of full genomes has created the possibility for the field of functional genomics, mainly concerned with patterns of gene expression during various conditions. The most important tools here are microarrays and bioinformatics. Study of the full set of proteins in a cell type or tissue, and the changes during various conditions, is called proteomics. A related concept is materiomics, which is defined as the study of the material properties of biological materials (e.g. hierarchical protein structures and materials, mineralized biological tissues, etc.) and their effect on the macroscopic function and failure in their biological context, linking processes, structure and properties at multiple scales through a materials science approach. The actual term 'genomics' is thought to have been coined by Dr. Tom Roderick, a geneticist at the Jackson Laboratory (Bar Harbor, ME), over beer at a meeting held in Maryland on the mapping of the human genome in 1986.

The vision for the future of genomics, the outcome of almost two years of intense discussions with literally hundreds of scientists and members of the public, has three major areas of focus: Genomics to Biology, Genomics to Health, and Genomics to Society.

Genomics to Biology:
The human genome sequence provides foundational information that now will allow development of a comprehensive catalog of all of the genome's components, determination of the function of all human genes, and deciphering of how genes and proteins work together in pathways and networks.

Genomics to Health:
Completion of the human genome sequence offers a unique opportunity to understand the role of genetic factors in health and disease, and to apply that understanding rapidly to prevention, diagnosis, and treatment. This opportunity will be realized through such genomics-based approaches as identification of genes and pathways and determining how they interact with environmental factors in health and disease, more precise prediction of disease susceptibility and drug response, early detection of illness, and development of entirely new therapeutic approaches.

Genomics to Society:
Just as the HGP has spawned new areas of research in basic biology and in health, it has created new opportunities in exploring the ethical, legal, and social implications (ELSI) of such work. These include defining policy options regarding the use of genomic information in both medical and non-medical settings and analysis of the impact of genomics on such concepts as race, ethnicity, kinship, individual and group identity, health, disease, and "normality" for traits and behaviors.

This vision for the future of genomics is not just about the NHGRI. It encompasses the whole field of genomics, including the work of all the other Institutes and Centers at the NIH and of a number of other federal agencies. All of the NIH Institutes are already taking full advantage of the sequence and will apply its data to the better understanding of both rare and common diseases, almost all of which have a genetic component. A recent example of the way that the HGP and the knowledge and new technologies it has spawned are already facilitating science is the extremely rapid sequencing by groups in Canada and at the Centers for Disease Control and Prevention (CDC) in Atlanta of the genome of the virus that causes Severe Acute Respiratory Syndrome (SARS). The sequencing of the SARS virus genome provides insight into this new and deadly disease at a speed never before possible in science. In turn, this should lead to the rapid development of diagnostic tests and, in time, vaccines and effective treatments.

Links for additional material available on the net

Genomes and genomics:

Bioinformatics and Genomics:

Structural genomics tutorial:

Comparative Genomics Tutorial:

GENOME TUTORIAL:

Tools and resources for identifying protein families, domains and motifs

Bioinformatics Tools
Tips, Tutorials, and Terminology for Using Selected Resources in Genome Database Guide:

A Web-Based Comparative Genomics Tutorial for Investigating Microbial Genomes:

Free Online Tutorials Teach Anyone How to Use Genome Databases:

Circos to create concise, explanatory, unique and print-ready visualizations of your data:

Genomics and Comparative Genomics Learning Module:

Computational Challenges in Comparative Genomics

A Tutorial:

A Comparative Genomics Resource for Grains:

PLAZA: A Comparative Genomics Resource to Study Gene and Genome Evolution in Plants:

VISTA :

Software for Genomics

Artemis - A free genome viewer and annotation tool that allows visualization of sequence features and the results of analyses within the context of the sequence, and its six-frame translation.
Chromas - Displays and prints chromatogram files from ABI automated DNA sequencers, and Staden SCF files which the analysis programs for ALF, Li-Cor and Visible Genetics OpenGene sequencers can create.
Glimmer - A system for finding genes in microbial DNA, especially the genomes of bacteria and archaea. Glimmer (Gene Locator and Interpolated Markov Modeler) uses interpolated Markov models (IMMs) to identify the coding regions and distinguish them from noncoding DNA.
GlimmerHMM - A fast and accurate gene finder based on a GHMM architecture, developed specifically for eukaryotes. It incorporates splice site models adapted from the GeneSplicer program and uses interpolated Markov models for evaluating the coding regions.
GlimmerM - A gene finder derived from Glimmer, but developed specifically for eukaryotes. It is based on a dynamic programming algorithm that considers all combinations of possible exons for inclusion in a gene model and chooses the best of these combinations. The d...
MUMmer - A system for rapidly aligning entire genomes, whether in complete or draft form.
pDRAW - pDRAW32 is being developed as a free-time hobby project. It is far from finished, but as it has reached a point where it could be helpful for many labs, it is now available to the scientific community.
Sequin - A stand-alone software tool developed by the NCBI for submitting and updating entries to the GenBank, EMBL, or DDBJ sequence databases. It is capable of handling simple submissions that contain a single short mRNA sequence, and complex submissio...
Staden - The Staden Package consists of a series of tools for DNA sequence preparation (pregap4), assembly (gap4), editing (gap4) and DNA/protein sequence analysis (spin).

For more software @ http://bioinformaticsonline.com/bookmarks/view/926/list-of-popular-bioinformatics-softwaretools

Postdoctoral Position in Evolutionary Genomics and Bioinformatics, at the Center for Interdisciplinary Neuroscience at University of Valparaiso, Valparaiso, Chile.

Wed, 22 Apr 2026 02:36:00 -0500

The Center for Interdisciplinary Neuroscience of Valparaiso (CINV)
in Valparaiso, Chile, invites postdoctoral researchers to apply for
a Postdoctoral Fellowship focusing on understanding the evolution of
genes and molecular pathways that play a role in inflammatory processes
driving diseases affecting the central nervous system.

The postdoctoral researcher will contribute to this project using
a combination of evolutionary and comparative genomics, as well as a
diverse set of bioinformatic approaches for data analysis and integration
(e.g., transcriptomics, genomics, phenotypic data). This position offers
a unique opportunity to integrate diverse state-of-the-art genomic and
phenotypic datasets across different model organisms to understand the
role of genes and molecular pathways in the origin of complex diseases.

CINV provides a highly collaborative and multidisciplinary environment
using a variety of computational and experimental approaches,
including genetically tractable animal models as well as expertise in
genetics, behavior, glia-neuron communication, metabolism, biophysics,
genomics, bioinformatics, host-microbe communication, and biomolecular
modelling. The new postdoc will be part of one of our labs which focuses
more generally on the intersection between molecular evolution and
disease biology.

Required qualifications are a PhD in evolutionary biology, computational
biology, bioinformatics, or closely related fields. Candidates must have
excellent verbal and written communication skills (working language
is English), as well as an established record of productivity (e.g.,
at least one previous peer-reviewed publication). Candidates with a
past record of publications in bioinformatics, computational biology,
population genetics or evolutionary genomics are strongly preferred. Ideal
candidates should have experience in analyzing genomic and phenomic
data, performing comparative evolution or population genomic analyses,
as well as in collaborating with experimentalists.

Interested candidates should first contact Evandro Ferrada.
Please include the following: (1) a cover
letter addressing your interest in the position and how your expertise
meets the position requirements, (2) a CV, (3) contact information of
at least 2 references. A short online interview will follow to discuss
specific proposals. Candidate materials will be reviewed as soon as
possible until the position is filled.

For further information, please visit:
https://cinv.uv.cl/cinv-postdoctoral-fellowship-program-2026/

Dr. Evandro Ferrada
Associate Professor

Centro Interdisciplinario de Neurociencia (CINV)

Facultad de Ciencias, Universidad de Valparaíso.

Pasaje Harrington 287, Playa Ancha, Valparaíso, Chile.

Tel. +56 (32) 250 8453

www.cinv.cl

Research with help of bioinformatics helpful

Jit — Fri, 02 Aug 2013 11:20:24 -0500

Endocrinologist G.R. Sridhar says

Research with the help of bioinformatics with a trans-disciplinary approach is yielding good results.
http://www.thehindu.com/features/education/research/research-with-help-of-bioinformatics-helpful/article2295629.ece

What Junk DNA? It’s an Operating System

Rahul Agarwal — Mon, 19 Aug 2013 15:24:26 -0500

The report adds to growing experimental support for the idea that all that extra stuff in the human genes, once referred to as “junk DNA,” is more than functionless, space-filling material that happens to make up nearly 98% of the genome. The paper adds to a growing body of knowledge establishing a considerable role for this material in the regulation of gene expression and its potential role in human disease.

Address of the bookmark: http://www.genengnews.com/keywordsandtools/print/3/32115/

Five points for bioinformatics software/tools

Jitendra Narayan — Mon, 05 Aug 2013 04:12:32 -0500

In the bioinformatics sector we spend most of our time on computational analysis of huge amounts of data, trying to make biological sense of it. But most newbie bioinformaticians face a dilemma when they receive biological sequence data for the first time: they are confused about whether to choose open source tools, user-friendly GUI software, or commercial bioinformatics software. Don’t be surprised; this is true, and it is not an easy decision, because the analytical step is the most crucial part of a project and is believed to be the biggest bottleneck in publishing papers in high-impact journals. In this blog post I would like to address the pros and cons of both kinds of software/tools and try to assist (hmm, not really; more like convince) you in deciding on your software selection.

The most common newbie questions are:

“Should I try to use these free open source programs? Why are we not trying GUI software for computational analysis? Should I use commercial bioinformatics programs/software?”

1. Let’s be open

We generally think that free and cheap things are useless. But this idea does not apply to open source software. Most bioinformatics software is developed by highly competent biological programmers who believe in the open sharing of knowledge. Many of them work under the Open Bioinformatics Foundation (O|B|F), a non-profit, volunteer-run organization focused on supporting open source programming in bioinformatics. The best part about open source tools/software is that you are free to download the source code and read exactly what the program does. If you are so inclined, you can view all of the parts of the program and see the logical flow of the pipeline. In addition, open source makes an excellent learning tool for any beginning bioinformatician. Moreover, you can modify existing open source programs to deal with cutting-edge problems or to customize your pipeline. Beyond your own computational and analysis work, most reviewers also prefer results produced with open source tools, so that they can validate them if required.

2. Code headache

As a bioinformatician you are supposed to know the basics of programming languages, and if you are not good at it, then please learn as soon as possible, because you are not a bio-analyst but a biological programmer. Open source programs usually lack dedicated service and support teams (often because they were the product of an overworked PhD student or postdoc!), so you are responsible for troubleshooting your own errors most of the time. We commonly receive HELP emails asking for support in setting up a pipeline; you can also find this kind of request on any Q&A forum. I personally believe this coding horror is the biggest downside of open source programs: you need some programming skills in order to implement the program in your pipeline. But if you are not able to fix the pipeline and modify the open source code according to your requirements, then you should rethink your bioinformatician name tag!!!

3. Dive into the codes

Some biologists turned bioinformaticians say, “If you can do the same thing with commercial software, then why get a migraine from weird code?” To me, this statement sounds like someone who is keen to learn swimming but doesn’t want to get wet. If you are still using paid software and doing your work through customer support and by clicking a few well-designed GUI buttons, then perhaps you are not interested in learning and trying new and challenging bioinformatics work. You are missing the basic flavour of bioinformatics. Let’s dive into the coding world; I am sure you will enjoy it. I recommend that you swim freely in the sea of code and enjoy the journey; do not merely watch it from the outside.

4. Paid does not mean better

Companies that specialize in bioinformatics solutions develop well-designed, well-packaged, user-friendly software, employing a large number of specialised scientists, programmers and support staff. They also provide good services to help you accomplish your biological analysis work. This means that if you hit a ‘snag’ with your data, help is likely only a phone call away! These companies price their products competitively against the cost of a dedicated bioinformatician. You may be able to afford the program, but not the additional staff! Additionally, most of the functionality that you need in your analysis is already coded into the program. Need to plot a graph? Just click this button right here. It is that easy. But for a bioinformatician this is generally not an encouraged approach to biological analysis work, because the software is not available to everyone and your results can’t be validated. Moreover, there is very little chance that anyone will repeat your work or want to do similar research (because not all the labs in the world are as rich as yours).

5. Take caution

In biological analysis work, where you deal with GB/TB of data, the chances of errors are high, so please be careful and always cross-check your data before coming to any conclusion. Even an error in two lines of code can alter your entire analysis and produce weird results. Some scientists blindly trust commercial software, which is entirely wrong. Using proprietary tools does not absolve you of the need to actually read and research the type of analysis that you are doing. This is particularly true in the case of genome assembly and annotation.

In the end, I would like to say only one thing: open source solutions allow you to do more cutting-edge analysis than commercial tools. So let’s go for it.

Disclaimer:

This is my personal view. I have nothing to do with any company or open source community. The views expressed on these pages are mine alone and not those of my current/past employers. I do reserve the right to remove comments left by spammers or off-topic comments.

Bioinformatics market in India

Rahul Agarwal — Fri, 23 Aug 2013 07:08:49 -0500

Key Topics Covered in the Report:

Market size of the Indian bioinformatics industry, FY’2007-FY’2013
Market segmentation of India bioinformatics industry by application by sectors, FY’2007-FY’2013
Market segmentation of India bioinformatics industry by products and services, FY’2007-FY’2013
Market segmentation of India bioinformatics industry by applications of bioinformatics, FY’2007-FY’2013
India bioinformatics industry trends and developments
Government regulations and initiatives of India bioinformatics industry
Major bioinformatics research institutes in India
Market share of leading players in bioinformatics industry in India, FY’2013
Company profiles of major players in India bioinformatics industry
Future outlook and projections on the basis of revenue in India bioinformatics market, FY’2014-FY’2018

(Source: Ken Research)

Address of the bookmark: http://www.kenresearch.com/healthcare/biotechnology/india-bioinformatics-industry-research-report/392-91.html

Prime Minister’s 100k Genome Project

Jitendra Narayan — Thu, 08 Aug 2013 09:40:39 -0500

Genomics England is set to sequence the genomes of 100,000 patients in England over the next five years, a landmark project by the British government.

Genomics England will play a key role in building on the UK’s long track record as leader in medical science advances to push the boundaries by unlocking the power of DNA data. The UK will become the first ever country to introduce this technology in its mainstream health system – leading the global race for better tests, better drugs and above all better, more personalised care.

http://www.genomicsengland.co.uk/100k-genome-project/