Which are the best statistical programming languages to study for a bioinformatician?

In bioinformatics research jobs involving genome sequencing and metabolic pathway prediction, I have used Matlab, SAS, SPSS, R and several Bioconductor packages. Matlab had a lot of powerful tools and was easy to use, whereas SPSS is aimed at non-programmers and R needs programming skills. I am wondering what other people think is best. Or perhaps there is no single best language, but a few that lend themselves well to bioinformatics work that is math-heavy and deals with large amounts of data?

Replies

  • Jit 4115 days ago

    R is by far the best-known open source statistical programming language for bioinformaticians. However, you cannot ignore the MATLAB Bioinformatics Toolbox.

    I am a big fan of Perl and love to do all sorts of analysis with it, so I prefer PDL ("Perl Data Language"), which gives standard Perl the ability to compactly store and speedily manipulate the large N-dimensional data arrays that are the bread and butter of scientific computing.

  • Rahul Nayak 4115 days ago

    Who cares which language is more popular? Programming languages are tools; if one does what I need it to do, it's fine by me.

  • Abhimanyu Singh 3951 days ago

    Lisp-Stat is an extensible environment for statistical computing and dynamic graphics based on the Lisp language. XLISP-STAT is a version of Lisp-Stat based on a dialect of Lisp called XLISP.

    http://homepage.stat.uiowa.edu/~luke/xls/xlsinfo/xlsinfo.html

    http://lib.stat.cmu.edu/xlispstat/

  • John Parker 3783 days ago

    I like the R language. The following table compares the statistical capabilities of software packages: http://stanfordphd.com/Statistical_Software.html In the statistical language war, according to this metric, R wins.

    TYPE OF STATISTICAL ANALYSIS               | R   | MATLAB | SAS          | STATA   | SPSS
    Nonparametric Tests                        | Yes | Yes    | Yes          | Yes     | Yes
    T-test                                     | Yes | Yes    | Yes          | Yes     | Yes
    ANOVA & MANOVA                             | Yes | Yes    | Yes          | Yes     | Yes
    ANCOVA & MANCOVA                           | Yes | Yes    | Yes          | Yes     | Yes
    Linear Regression                          | Yes | Yes    | Yes          | Yes     | Yes
    Generalized Least Squares                  | Yes | Yes    | Yes          | Yes     | Yes
    Ridge Regression                           | Yes | Yes    | Yes          |         |
    Lasso                                      | Yes | Yes    | Yes          |         |
    Generalized Linear Models                  | Yes | Yes    | Yes          | Yes     | Yes
    Mixed Effects Models                       | Yes | Yes    | Yes          | Yes     | Yes
    Logistic Regression                        | Yes | Yes    | Yes          | Yes     | Yes
    Nonlinear Regression                       | Yes | Yes    | Yes          |         |
    Discriminant Analysis                      | Yes | Yes    | Yes          | Yes     | Yes
    Nearest Neighbor                           | Yes | Yes    | Yes          |         | Yes
    Factor & Principal Components Analysis     | Yes | Yes    | Yes          | Yes     | Yes
    Copula Models                              | Yes | Yes    | Experimental |         |
    Cross-Validation                           | Yes | Yes    | Yes          |         |
    Bayesian Statistics                        | Yes | Yes    | Limited      |         |
    Monte Carlo, Classic Methods               | Yes | Yes    | Yes          | Yes     | Limited
    Markov Chain Monte Carlo                   | Yes | Yes    | Yes          |         |
    Bootstrap & Jackknife                      | Yes | Yes    | Yes          | Yes     |
    EM Algorithm                               | Yes | Yes    | Yes          |         |
    Missing Data Imputation                    | Yes | Yes    | Yes          | Yes     | Yes
    Outlier Diagnostics                        | Yes | Yes    | Yes          | Yes     | Yes
    Robust Estimation                          | Yes | Yes    | Yes          | Yes     |
    Longitudinal (Panel) Data                  | Yes | Yes    | Yes          | Yes     | Limited
    Survival Analysis                          | Yes | Yes    | Yes          | Yes     | Yes
    Path Analysis                              | Yes | Yes    | Yes          |         |
    Propensity Score Matching                  | Yes | Yes    | Limited      | Limited |
    Stratified Samples (Survey Data)           | Yes | Yes    | Yes          | Yes     | Yes
    Experimental Design                        | Yes | Yes    |              |         |
    Quality Control                            | Yes | Yes    |              | Yes     | Yes
    Reliability Theory                         | Yes | Yes    | Yes          | Yes     | Yes
    Univariate Time Series                     | Yes | Yes    | Yes          | Yes     | Limited
    Multivariate Time Series                   | Yes | Yes    | Yes          | Yes     |
    Markov Chains                              | Yes | Yes    |              |         |
    Hidden Markov Models                       | Yes | Yes    |              |         |
    Stochastic Volatility Models               | Yes | Yes    | Limited      | Limited | Limited
    Diffusions                                 | Yes | Yes    |              |         |
    Counting Processes                         | Yes | Yes    | Yes          |         |
    Filtering                                  | Yes | Yes    | Limited      | Limited |
    Instrumental Variables                     | Yes | Yes    | Yes          | Yes     |
    Simultaneous Equations                     | Yes | Yes    | Yes          | Yes     |
    Splines                                    | Yes | Yes    | Yes          | Yes     |
    Nonparametric Smoothing Methods            | Yes | Yes    | Yes          | Yes     |
    Extreme Value Theory                       | Yes | Yes    |              |         |
    Variance Stabilization                     | Yes | Yes    |              |         |
    Cluster Analysis                           | Yes | Yes    | Yes          | Yes     | Yes
    Neural Networks                            | Yes | Yes    | Yes          |         | Limited
    Classification & Regression Trees          | Yes | Yes    | Yes          |         | Limited
    Boosting Classification & Regression Trees | Yes | Yes    |              |         |
    Random Forests                             | Yes | Yes    |              |         |
    Support Vector Machines                    | Yes | Yes    | Yes          |         |
    Signal Processing                          | Yes | Yes    |              |         |
    Wavelet Analysis                           | Yes | Yes    | Yes          |         |
    ROC Curves                                 | Yes | Yes    | Yes          | Yes     | Yes
    Optimization                               | Yes | Yes    | Yes          | Limited |
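
    One rough way to check the "R wins" reading of the table above is to count the "Yes" cells per package. The Python sketch below is only an illustration: it reads the five value columns as R, MATLAB, SAS, STATA and SPSS, in that order (the R label is an assumption about the linked table), and encodes each row as five letters. The names PACKAGES, ROWS and tally_yes are made up for this example.

```python
# Column order of the five value cells; "R" as the first column is an assumption.
PACKAGES = ["R", "MATLAB", "SAS", "STATA", "SPSS"]

# One code per package: Y = Yes, L = Limited, E = Experimental, - = blank cell.
ROWS = {
    "Nonparametric Tests": "YYYYY",
    "T-test": "YYYYY",
    "ANOVA & MANOVA": "YYYYY",
    "ANCOVA & MANCOVA": "YYYYY",
    "Linear Regression": "YYYYY",
    "Generalized Least Squares": "YYYYY",
    "Ridge Regression": "YYY--",
    "Lasso": "YYY--",
    "Generalized Linear Models": "YYYYY",
    "Mixed Effects Models": "YYYYY",
    "Logistic Regression": "YYYYY",
    "Nonlinear Regression": "YYY--",
    "Discriminant Analysis": "YYYYY",
    "Nearest Neighbor": "YYY-Y",
    "Factor & Principal Components Analysis": "YYYYY",
    "Copula Models": "YYE--",
    "Cross-Validation": "YYY--",
    "Bayesian Statistics": "YYL--",
    "Monte Carlo, Classic Methods": "YYYYL",
    "Markov Chain Monte Carlo": "YYY--",
    "Bootstrap & Jackknife": "YYYY-",
    "EM Algorithm": "YYY--",
    "Missing Data Imputation": "YYYYY",
    "Outlier Diagnostics": "YYYYY",
    "Robust Estimation": "YYYY-",
    "Longitudinal (Panel) Data": "YYYYL",
    "Survival Analysis": "YYYYY",
    "Path Analysis": "YYY--",
    "Propensity Score Matching": "YYLL-",
    "Stratified Samples (Survey Data)": "YYYYY",
    "Experimental Design": "YY---",
    "Quality Control": "YY-YY",
    "Reliability Theory": "YYYYY",
    "Univariate Time Series": "YYYYL",
    "Multivariate Time Series": "YYYY-",
    "Markov Chains": "YY---",
    "Hidden Markov Models": "YY---",
    "Stochastic Volatility Models": "YYLLL",
    "Diffusions": "YY---",
    "Counting Processes": "YYY--",
    "Filtering": "YYLL-",
    "Instrumental Variables": "YYYY-",
    "Simultaneous Equations": "YYYY-",
    "Splines": "YYYY-",
    "Nonparametric Smoothing Methods": "YYYY-",
    "Extreme Value Theory": "YY---",
    "Variance Stabilization": "YY---",
    "Cluster Analysis": "YYYYY",
    "Neural Networks": "YYY-L",
    "Classification & Regression Trees": "YYY-L",
    "Boosting Classification & Regression Trees": "YY---",
    "Random Forests": "YY---",
    "Support Vector Machines": "YYY--",
    "Signal Processing": "YY---",
    "Wavelet Analysis": "YYY--",
    "ROC Curves": "YYYYY",
    "Optimization": "YYYL-",
}


def tally_yes(rows):
    """Count, per package, the rows whose cell is a plain "Yes"."""
    counts = {package: 0 for package in PACKAGES}
    for codes in rows.values():
        for package, code in zip(PACKAGES, codes):
            if code == "Y":
                counts[package] += 1
    return counts


yes_counts = tally_yes(ROWS)
for package in PACKAGES:
    print(f"{package:<7}{yes_counts[package]:>3} of {len(ROWS)}")
```

    Counting "Yes" alone is a crude metric: it ignores "Limited" and "Experimental" support and says nothing about cost or ease of use. Under this encoding R and MATLAB both show "Yes" in every row, so the count by itself does not separate the two.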

  • Neel 3776 days ago

    "R Passes SPSS in Scholarly Use, Stata Growing Rapidly": http://www.r-bloggers.com/r-passes-spss-in-scholarly-use-stata-growing-rapidly/

  • Jitendra Narayan 3639 days ago

    A recent article in Nature explains it better: R is becoming the most popular language among biological researchers. http://www.nature.com/news/programming-tools-adventures-with-r-1.16609?