I will be doing NGS in the course of my research work and I would like to learn a programming language that is compatible with most bioinformatics tools and software. I basically want to do de novo assembly, read mapping, read alignment, and expression analysis. Recommendations are welcome. Which languages would you recommend to a student wishing to enter the world of bioinformatics?
The choice of language is mostly a matter of personal taste, but I'd like to vote for Perl with BioPerl as an excellent choice for bioinformatics. For me personally, the combination of Perl and BioPerl is simply a joy to use. I would recommend that every budding bioinformatician try it out for themselves.
Why Perl? It is flexible and has a global repository (CPAN), so it is trivial to install new modules. It has BioPerl (http://www.bioperl.org/wiki/Main_Page), one of the first repositories of biological modules, which makes many tasks easier, from converting between formats (Bio::SeqIO) to phylogenetic analysis. Some biological software is written in Perl, such as GBrowse (http://gmod.org/wiki/GBrowse), so it may be an interesting language if you need to interact with such tools. It also has good testing modules (Test::More).
Disadvantages: Sometimes it is not a clear language. There are probably as many ways to use Perl as there are programmers, so code can be very simple (just scripting, as in the manual) or very complex (object-oriented programming using Moose). Some BioPerl modules do not always work.
Well, if you are doing NGS analysis, C and C++ are a good option. They are lower-level languages compared to R, Perl, and Python, but because of the huge amount of data in NGS, the efficiency of your code can make a huge difference in computation speed and hardware requirements. For NGS computations I suggest you use R, as it has a lot of helpful packages and procedures. Here is a manual; I hope you find it useful: http://manuals.bioinformatics.ucr.edu/home/ht-seq
I agree with Jitendra: choosing a programming platform is largely personal and also depends on how long you have been working with your favourite language. But for beginners, I would recommend Python because of its easier syntax and simple OOP approach. Python can now be integrated with other languages through projects such as Cython (with C), PyPy, PyR, etc., and it has a very rich set of libraries with which you can do almost anything. For NGS, Python is highly recommended, as many good new tools have recently been written in it. R is good for downstream analysis such as statistical analysis, creating fancy graphs (ggplot2), working with RNA-seq data (DESeq, DEGseq, etc.), pathway analysis, and so on; however, R itself is primarily a statistical package rather than a general-purpose programming language. Ruby is also getting popular these days because of its simple programming style. But someone who wants to master programming should go for C/C++, Java, C#, or Python.
I would recommend Python, because of its easier syntax and simple OOP approach, along with a lower-level language such as C. Moreover, I do a lot of bioinformatics programming, but I don’t use any of the “Bio*” projects, as they never seem to have much that is of use to me, and what little is useful to me is easier for me to code myself than to install any Bio* project and deal with its idiosyncrasies.
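To give a rough idea of what "coding it myself" can look like in Python, here is a minimal sketch of a FASTA reader with two small helpers. The function names (read_fasta, gc_content, reverse_complement) are made up for this illustration and do not come from any Bio* project; the sketch assumes plain, uncompressed FASTA and only handles the unambiguous bases A, C, G, and T when complementing.

```python
import io

def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA text handle.

    Hypothetical helper for illustration: assumes plain FASTA where each
    record starts with a '>' line followed by one or more sequence lines.
    """
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record begins; emit the previous one if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    # Emit the last record in the file.
    if header is not None:
        yield header, "".join(chunks)

def gc_content(seq):
    """Return the fraction of G and C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)

# Translation table for the four unambiguous bases, in both cases.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]

if __name__ == "__main__":
    # A tiny in-memory FASTA file, used only as example input.
    example = io.StringIO(">seq1 example\nACGT\nGG\n>seq2\nTTTT\n")
    for name, seq in read_fasta(example):
        print(name, len(seq), gc_content(seq), reverse_complement(seq))
```

For real NGS data you would still want to think about compressed input, FASTQ quality lines, and ambiguity codes such as N, which is where the existing libraries start to earn their keep.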