In bioinformatics research jobs involving genome sequencing and metabolic pathway prediction, I have used MATLAB, SAS, SPSS, R and several Bioconductor packages. MATLAB had a lot of powerful tools and was easy to use, whereas SPSS is aimed at non-programmers and R needs programming skills. I am wondering what other people think is best. Or is there perhaps no single language, but a few that lend themselves best to bioinformatics work that is math-heavy and deals with large amounts of data?
Phylogenetics in R
R in Ecology and Evolution – http://r-eco-evo.blogspot.com.au/
R bloggers: Recology http://www.r-bloggers.com/author/recology-r/
Talk introducing phylogenetics in R: http://www.r-bloggers.com/my-talk-on-doing-phylogenetics-in-r-2/
Finding meaningful clusters in trees
phytools blog, “Phylogenetic Tools for Comparative Biology”
R is by far the best-known open-source statistical programming language among bioinformaticians. However, you cannot ignore the MATLAB Bioinformatics Toolbox.
I am a big fan of Perl and love to do all sorts of analysis with it, so I prefer PDL ("Perl Data Language"), which gives standard Perl the ability to compactly store and speedily manipulate the large N-dimensional data arrays that are the bread and butter of scientific computing.
Who cares which language is more popular? Programming languages are tools; if one does what I need it to do, it's fine by me.
Lisp-Stat is an extensible environment for statistical computing and dynamic graphics based on the Lisp language. XLISP-STAT is a version of Lisp-Stat based on a dialect of Lisp called XLISP.
http://homepage.stat.uiowa.edu/~luke/xls/xlsinfo/xlsinfo.html
I like the R language. The following table compares the statistical capabilities of software packages: http://stanfordphd.com/Statistical_Software.html In the statistical language war, according to this metric, R wins.
TYPE OF STATISTICAL ANALYSIS | R | MATLAB | SAS | STATA | SPSS |
Nonparametric Tests | Yes | Yes | Yes | Yes | Yes |
T-test | Yes | Yes | Yes | Yes | Yes |
ANOVA & MANOVA | Yes | Yes | Yes | Yes | Yes |
ANCOVA & MANCOVA | Yes | Yes | Yes | Yes | Yes |
Linear Regression | Yes | Yes | Yes | Yes | Yes |
Generalized Least Squares | Yes | Yes | Yes | Yes | Yes |
Ridge Regression | Yes | Yes | Yes | | |
Lasso | Yes | Yes | Yes | | |
Generalized Linear Models | Yes | Yes | Yes | Yes | Yes |
Mixed Effects Models | Yes | Yes | Yes | Yes | Yes |
Logistic Regression | Yes | Yes | Yes | Yes | Yes |
Nonlinear Regression | Yes | Yes | Yes | | |
Discriminant Analysis | Yes | Yes | Yes | Yes | Yes |
Nearest Neighbor | Yes | Yes | Yes | | Yes |
Factor & Principal Components Analysis | Yes | Yes | Yes | Yes | Yes |
Copula Models | Yes | Yes | Experimental | | |
Cross-Validation | Yes | Yes | Yes | | |
Bayesian Statistics | Yes | Yes | Limited | | |
Monte Carlo, Classic Methods | Yes | Yes | Yes | Yes | Limited |
Markov Chain Monte Carlo | Yes | Yes | Yes | | |
Bootstrap & Jackknife | Yes | Yes | Yes | Yes | |
EM Algorithm | Yes | Yes | Yes | | |
Missing Data Imputation | Yes | Yes | Yes | Yes | Yes |
Outlier Diagnostics | Yes | Yes | Yes | Yes | Yes |
Robust Estimation | Yes | Yes | Yes | Yes | |
Longitudinal (Panel) Data | Yes | Yes | Yes | Yes | Limited |
Survival Analysis | Yes | Yes | Yes | Yes | Yes |
Path Analysis | Yes | Yes | Yes | | |
Propensity Score Matching | Yes | Yes | Limited | Limited | |
Stratified Samples (Survey Data) | Yes | Yes | Yes | Yes | Yes |
Experimental Design | Yes | Yes | | | |
Quality Control | Yes | Yes | | Yes | Yes |
Reliability Theory | Yes | Yes | Yes | Yes | Yes |
Univariate Time Series | Yes | Yes | Yes | Yes | Limited |
Multivariate Time Series | Yes | Yes | Yes | Yes | |
Markov Chains | Yes | Yes | | | |
Hidden Markov Models | Yes | Yes | | | |
Stochastic Volatility Models | Yes | Yes | Limited | Limited | Limited |
Diffusions | Yes | Yes | | | |
Counting Processes | Yes | Yes | Yes | | |
Filtering | Yes | Yes | Limited | Limited | |
Instrumental Variables | Yes | Yes | Yes | Yes | |
Simultaneous Equations | Yes | Yes | Yes | Yes | |
Splines | Yes | Yes | Yes | Yes | |
Nonparametric Smoothing Methods | Yes | Yes | Yes | Yes | |
Extreme Value Theory | Yes | Yes | | | |
Variance Stabilization | Yes | Yes | | | |
Cluster Analysis | Yes | Yes | Yes | Yes | Yes |
Neural Networks | Yes | Yes | Yes | | Limited |
Classification & Regression Trees | Yes | Yes | Yes | | Limited |
Boosting Classification & Regression Trees | Yes | Yes | | | |
Random Forests | Yes | Yes | | | |
Support Vector Machines | Yes | Yes | Yes | | |
Signal Processing | Yes | Yes | | | |
Wavelet Analysis | Yes | Yes | Yes | | |
ROC Curves | Yes | Yes | Yes | Yes | Yes |
Optimization | Yes | Yes | Yes | Limited | |
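For anyone who wants to turn the table into actual numbers rather than eyeball it, here is a minimal Python sketch that tallies support levels per package. It only includes six rows copied from the table above, and the names `PACKAGES`, `ROWS` and `tally` are my own illustrative choices, not anything from the linked page. An empty cell is counted as "Missing", which is an assumption about what a blank means in the original table.

```python
# Tally "Yes" / "Limited" / blank cells per package.
# ROWS is a hand-copied subset of the table above, not the full table.
PACKAGES = ["R", "MATLAB", "SAS", "STATA", "SPSS"]

ROWS = {
    "Ridge Regression":     ["Yes", "Yes", "Yes", "", ""],
    "Bayesian Statistics":  ["Yes", "Yes", "Limited", "", ""],
    "Hidden Markov Models": ["Yes", "Yes", "", "", ""],
    "Random Forests":       ["Yes", "Yes", "", "", ""],
    "Survival Analysis":    ["Yes", "Yes", "Yes", "Yes", "Yes"],
    "Neural Networks":      ["Yes", "Yes", "Yes", "", "Limited"],
}


def tally(rows):
    """Count support levels per package; blank cells are assumed to mean 'Missing'."""
    counts = {p: {"Yes": 0, "Limited": 0, "Missing": 0} for p in PACKAGES}
    for cells in rows.values():
        for package, cell in zip(PACKAGES, cells):
            key = cell if cell in ("Yes", "Limited") else "Missing"
            counts[package][key] += 1
    return counts


counts = tally(ROWS)
for package in PACKAGES:
    print(package, counts[package])
```

On these six rows R and MATLAB come out level, and in the table as posted both columns read "Yes" throughout, so a plain count of this kind does not separate the two.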
R Passes SPSS in Scholarly Use, Stata Growing Rapidly http://www.r-bloggers.com/r-passes-spss-in-scholarly-use-stata-growing-rapidly/
The recent article in Nature explains it better: R is becoming the most popular language among biological researchers http://www.nature.com/news/programming-tools-adventures-with-r-1.16609?
A nice comparison between R, SAS and Python http://www.datasciencecentral.com/forum/topics/which-one-is-best-r-sas-or-python-for-data-science