Bio++: C++ libraries for your biological needs

C has always been a language that never attempts to tie a programmer down: it allows for easy implementation, it comes with a genuinely useful standard library that can itself be implemented in C, and it is both efficient and portable. C has always appealed to systems programmers who like the terse, concise manner in which powerful expressions can be coded. C was widely distributed with an operating system (Unix) that was itself largely written in C. C also gave programmers direct access to many machine-level features that would otherwise require assembly language, at the cost of portability.

As Dennis Ritchie writes in his paper, "The Development of the C Language",
"C is quirky, flawed, and an enormous success. While accidents of history surely helped, it evidently satisfied a need for a system implementation language efficient enough to displace assembly language, yet sufficiently abstract and fluent to describe algorithms and interactions in a wide variety of environments."

C++ has its basis in C, extending it with features meant to encourage and support the development of large programs. Perhaps most importantly, it supports object-oriented programming in a familiar setting and framework (that of C). When C++ was created, one of the initial aims was to retain compatibility with C to as large an extent as possible, and to retain its spirit and efficiency. It was possible to convert from C to C++ gradually, using C++ (initially, at least) as a "better C" and then moving on to its other features. This allowed many C programmers to learn C++ quickly (though using C++ effectively requires a major mind-shift for many C programmers).

Are you interested in using C/C++ for biological programming? If so, there is good news: Bio++ 1.9.0 is available, with libraries that cover a wide range of problems in biology.

The latest version adds the following new features:

Support for codon models (including non-homogeneous models),
Tools for manipulating Hidden Markov Models,
Improved numerical tools (numerical derivatives, parameter transforms...),
A new library, Bio++ RAA (Remote Acnuc Access), allowing you to fetch data from public databases such as GenBank, EMBL or SwissProt,
Algorithms for plotting trees, with support for vector formats like SVG, Fig or LaTeX-PGF.
So relax, and solve your HMM problems with ease using Bio++.
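To give an idea of what an HMM computation involves, here is a minimal sketch of the forward algorithm, which computes the probability of an observed symbol sequence under a hidden Markov model. It is written in plain standard C++ and does not use the Bio++ HMM classes; the function name forwardProbability and the matrix layout are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Forward algorithm for a discrete HMM: returns P(observations | model).
// init[i]     : probability of starting in state i
// trans[i][j] : probability of moving from state i to state j
// emit[i][o]  : probability that state i emits symbol o
// Hypothetical helper for illustration, not a Bio++ function. It uses plain
// probabilities (no rescaling), so it underflows on long sequences.
double forwardProbability(const std::vector<double>& init,
                          const std::vector<std::vector<double>>& trans,
                          const std::vector<std::vector<double>>& emit,
                          const std::vector<std::size_t>& obs) {
  const std::size_t n = init.size();
  assert(n > 0 && !obs.empty());
  // alpha[i] = P(first t symbols, current state = i)
  std::vector<double> alpha(n);
  for (std::size_t i = 0; i < n; ++i) alpha[i] = init[i] * emit[i][obs[0]];
  for (std::size_t t = 1; t < obs.size(); ++t) {
    std::vector<double> next(n, 0.0);
    for (std::size_t j = 0; j < n; ++j) {
      double sum = 0.0;  // total probability of arriving in state j
      for (std::size_t i = 0; i < n; ++i) sum += alpha[i] * trans[i][j];
      next[j] = sum * emit[j][obs[t]];
    }
    alpha.swap(next);
  }
  double total = 0.0;  // marginalize over the final state
  for (double a : alpha) total += a;
  return total;
}
```

With a single state emitting two symbols with probability 0.5 each, any sequence of three symbols has probability 0.125. Real implementations rescale alpha at each step or work in log space.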
Times have changed, and biological programmers can now use C++ libraries dedicated to biology. These libraries are designed to replace long C++ code with short, handy calls. Bio++ is a set of C++ libraries for bioinformatics, covering sequence analysis, phylogenetics, molecular evolution and population genetics.
Bio++ is designed in an extensible object-oriented way, in the C++ language.

Some of the main features of the libraries are as follows:
Sequence analysis

Sequence and Site objects, with various Alphabet support (DNA, RNA, Proteins, Codons, any 'Word' of a given size).
Several containers for storing sequences in memory, with several implementations. Support for alignments.
Various I/O formats supported: Fasta, Mase, Clustal, Phylip, DCSE, GenBank (sequence only).
Sequence manipulation: truncation, concatenation, sub-sequences, etc.
In silico molecular biology: (reverse) transcription, translation, replication.
Several genetic codes available: standard and mitochondrial (vertebrates, echinoderms and other invertebrates).
Amino acids properties: volume, polarity and charge + physico-chemical distance (Miyata and Grantham) + import from any AAIndex entry.
Consensus sequences.
Pairwise alignment.
Similarity score computation.
Sequence bootstrap.
Homogeneity test (Bowker's test).
etc.
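The "in silico molecular biology" operations above come down to simple string transformations. The following is a minimal sketch in plain standard C++, not the Bio++ API. The function names complementBase, reverseComplement and transcribe are hypothetical, and only uppercase, unambiguous DNA (A, C, G, T) is handled.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Complement of one uppercase DNA base (A<->T, C<->G).
// IUPAC ambiguity codes are deliberately out of scope for this sketch.
char complementBase(char base) {
  switch (base) {
    case 'A': return 'T';
    case 'T': return 'A';
    case 'C': return 'G';
    case 'G': return 'C';
    default:
      assert(false && "unsupported DNA base");
      return 'N';
  }
}

// Reverse complement: the opposite strand, read 5' to 3'.
std::string reverseComplement(const std::string& dna) {
  std::string out(dna.rbegin(), dna.rend());                            // reverse
  std::transform(out.begin(), out.end(), out.begin(), complementBase);  // complement
  return out;
}

// Transcription of the coding (sense) strand into mRNA: T becomes U.
std::string transcribe(const std::string& dna) {
  std::string rna = dna;
  std::replace(rna.begin(), rna.end(), 'T', 'U');
  return rna;
}
```

For example, reverseComplement("AACG") gives "CGTT" and transcribe("ATGC") gives "AUGC". In Bio++, these operations apply to Sequence objects with an associated Alphabet, not to raw strings.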

Phylogenetics and molecular evolution 
Data structure and IO

Phylogenetic trees.
I/O from Newick files, with support for multiple entries.

Phylogenetic reconstruction methods

Parsimony (NNI)
Distance matrices estimation and I/O to files in Phylip format.
Distance methods: (U/W)PGMA, NJ, BioNJ.
Maximum likelihood (NNI, including a PhyML-like algorithm).
Mixed distance/ML tree reconstruction (iterative approaches).
Tree consensus methods, bipartitions, bootstrap value computations.
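To illustrate the simplest distance method listed above, here is a minimal UPGMA sketch in plain standard C++. It is not Bio++'s implementation. The function name upgma and the input layout (a full symmetric distance matrix plus leaf labels) are assumptions. The sketch returns the rooted tree as a Newick string.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// UPGMA clustering: repeatedly joins the two closest clusters and replaces
// their distances to the others by a size-weighted average. Returns a rooted,
// ultrametric tree in Newick format (node height = joining distance / 2).
std::string upgma(std::vector<std::vector<double>> dist,
                  std::vector<std::string> labels) {
  const std::size_t n = labels.size();
  assert(n >= 2 && dist.size() == n);
  std::vector<std::size_t> size(n, 1);  // number of leaves in each cluster
  std::vector<double> height(n, 0.0);   // height of each cluster's root
  std::vector<bool> active(n, true);    // clusters not yet merged away

  for (std::size_t step = 0; step + 1 < n; ++step) {
    // Find the closest pair (bi, bj) of active clusters.
    std::size_t bi = 0, bj = 0;
    bool found = false;
    for (std::size_t i = 0; i < n; ++i) {
      if (!active[i]) continue;
      for (std::size_t j = i + 1; j < n; ++j) {
        if (active[j] && (!found || dist[i][j] < dist[bi][bj])) {
          bi = i; bj = j; found = true;
        }
      }
    }
    // The new internal node sits at half the distance between the clusters.
    const double h = dist[bi][bj] / 2.0;
    std::ostringstream node;
    node << '(' << labels[bi] << ':' << h - height[bi] << ','
         << labels[bj] << ':' << h - height[bj] << ')';

    // Size-weighted average distance from the merged cluster to each other one.
    for (std::size_t k = 0; k < n; ++k) {
      if (!active[k] || k == bi || k == bj) continue;
      const double merged = (dist[bi][k] * size[bi] + dist[bj][k] * size[bj]) /
                            static_cast<double>(size[bi] + size[bj]);
      dist[bi][k] = dist[k][bi] = merged;
    }
    // Slot bi now holds the merged cluster; bj is retired.
    labels[bi] = node.str();
    size[bi] += size[bj];
    height[bi] = h;
    active[bj] = false;
  }
  for (std::size_t i = 0; i < n; ++i)
    if (active[i]) return labels[i] + ";";
  return ";";  // not reached for n >= 2
}
```

With distances A-B = 2 and A-C = B-C = 4, it returns ((A:1,B:1):1,C:2);. UPGMA assumes a molecular clock. NJ and BioNJ drop that assumption at the cost of a more involved update rule.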

Substitution models

JC, K80, T92, F84, HKY85, TN93, GTR and more for nucleotides,
JC, DSO78, JTT92 + any PAML-formatted model description for proteins, with the possibility to estimate equilibrium frequencies.
Various codon models: Muse & Gaut 1994, Yang & Nielsen 1998, Goldman & Yang 1994 + user-defined.
Support for rate-across sites models, with virtually any probability distribution, allowing for invariant classes.
Covarion models.
Models including gaps.
Global clock tree likelihood models.
Virtually any kind of non-homogeneous model is supported!
Mixed models (beta).
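The simplest nucleotide model listed above, JC, has a closed-form distance. If p is the proportion of differing sites between two aligned sequences, the corrected distance is d = -3/4 ln(1 - 4p/3). The sketch below computes it in plain standard C++. It is not the Bio++ API, the function names are hypothetical, and characters are compared literally (gaps and ambiguity codes get no special treatment).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>

// Proportion p of differing sites between two aligned sequences of equal length.
double pDistance(const std::string& a, const std::string& b) {
  assert(a.size() == b.size() && !a.empty());
  std::size_t diff = 0;
  for (std::size_t i = 0; i < a.size(); ++i)
    if (a[i] != b[i]) ++diff;
  return static_cast<double>(diff) / static_cast<double>(a.size());
}

// Jukes-Cantor (JC69) corrected distance: d = -3/4 * ln(1 - 4p/3).
// Undefined when p >= 0.75 (the sequences look saturated).
double jcDistance(const std::string& a, const std::string& b) {
  const double p = pDistance(a, b);
  assert(p < 0.75);
  return -0.75 * std::log(1.0 - 4.0 * p / 3.0);
}
```

For "AAAA" against "AAAC", p = 0.25 and d is about 0.304. The correction accounts for multiple substitutions at the same site, so d always exceeds p once p > 0.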


Molecular evolution tools

Parameter estimation under maximum likelihood.
Ancestral state reconstruction: marginal likelihood.
(Weighted) substitution mapping.
Sequence simulation under any substitution model, homogeneous or not.

Population genetics

A new file format to deal with codominant markers and bio-sequence data for individuals.
Import and export methods with various population genetics software.
Specific containers for polymorphism data.
Diversity and polymorphism statistics for codominant and sequence data.
Estimation of Wright F-statistics and pairwise genetic distance on codominant markers.
Statistics on synonymous and non-synonymous sites for coding sequences.
Various 'Neutrality' statistics on sequence data (Tajima, Fu and Li, Rand and Kann ...).
Various measures of linkage disequilibrium.
etc.
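Nucleotide diversity (π) is the average number of pairwise differences per site and is one of the most basic diversity statistics of the kind listed above. Here is a minimal sketch in plain standard C++, not the Bio++ population genetics API. The function name nucleotideDiversity is hypothetical, and sites are compared literally, with no special handling of gaps or missing data.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Nucleotide diversity (pi): mean number of pairwise differences per site,
// averaged over all pairs of aligned sequences of equal length.
double nucleotideDiversity(const std::vector<std::string>& seqs) {
  assert(seqs.size() >= 2);
  const std::size_t len = seqs[0].size();
  assert(len > 0);
  for (const auto& s : seqs) assert(s.size() == len);  // must be aligned

  double totalDiff = 0.0;
  std::size_t pairs = 0;
  for (std::size_t i = 0; i < seqs.size(); ++i) {
    for (std::size_t j = i + 1; j < seqs.size(); ++j) {
      std::size_t diff = 0;
      for (std::size_t k = 0; k < len; ++k)
        if (seqs[i][k] != seqs[j][k]) ++diff;
      totalDiff += static_cast<double>(diff);
      ++pairs;
    }
  }
  return totalDiff / static_cast<double>(pairs) / static_cast<double>(len);
}
```

For the three sequences AAAA, AAAT and AATT, the pairwise differences are 1, 2 and 1, so π = (4/3)/4, about 0.333.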

Numerical calculus

Numerical tools: extended functions (log, factorial, etc.)
Vector tools: element-wise functions, statistics (mean, var, sd, correlation, information theory)
Classes for matrices implementation.
Linear algebra: eigen decomposition, LU decomposition, inversion, etc.
Random number generation: Quick & Dirty (32 bits only), Wichmann and Hill, Knuth. Samplers from probability distributions (uniform, normal, gamma, etc.).
Function object implementation, with first and second order derivatives.
Numerical derivatives computation.
Optimization algorithms: Golden section search, Brent's algorithm, Powell's and Downhill simplex method, but also methods using derivatives like conjugate gradient and Newton's method. Object implementation of these methods, using the event-driven Optimizer interface (works with Function objects).
Statistics: DataTable object, with I/O from CSV files, probability distributions.
etc.
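Golden section search, the first optimizer listed above, minimizes a one-dimensional unimodal function. At each step it shrinks a bracketing interval by a factor of about 0.618 and reuses one of the two previous function evaluations. The following is a minimal sketch in plain standard C++, not Bio++'s Optimizer interface. The name goldenSectionSearch and the tolerance parameter are assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// Golden section search for the minimum of a unimodal function f on [a, b].
// Stops when the bracketing interval is narrower than tol.
double goldenSectionSearch(const std::function<double(double)>& f,
                           double a, double b, double tol = 1e-8) {
  assert(a < b && tol > 0);
  const double invPhi = (std::sqrt(5.0) - 1.0) / 2.0;  // 1/phi, about 0.618
  double c = b - invPhi * (b - a);  // interior points, a < c < d < b
  double d = a + invPhi * (b - a);
  double fc = f(c), fd = f(d);
  while (b - a > tol) {
    if (fc < fd) {
      // Minimum lies in [a, d]: old c becomes the new d.
      b = d;
      d = c; fd = fc;
      c = b - invPhi * (b - a);
      fc = f(c);
    } else {
      // Minimum lies in [c, b]: old d becomes the new c.
      a = c;
      c = d; fc = fd;
      d = a + invPhi * (b - a);
      fd = f(d);
    }
  }
  return (a + b) / 2.0;
}
```

For example, applying goldenSectionSearch to (x - 2)^2 on [0, 5] returns a value very close to 2. Reusing one interior point works because 1/phi squared equals 1 - 1/phi, so only one new evaluation is needed per step.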

Utils

Files: working on file paths, getting file extensions and names, testing existence, open and store in string arrays, etc.
Text: convert text to any other type and vice versa, remove spaces, tokenize, switch between upper/lower case, etc.
Applications: read options from a file or command line
etc.
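The text utilities listed above (tokenizing, converting text to other types) can be approximated with the standard library's string streams. The following is a minimal sketch and is not the Bio++ Utils API. The function names tokenize and fromString are hypothetical.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Split text on a single-character delimiter, skipping empty tokens.
std::vector<std::string> tokenize(const std::string& text, char delim) {
  std::vector<std::string> tokens;
  std::istringstream in(text);
  std::string tok;
  while (std::getline(in, tok, delim))
    if (!tok.empty()) tokens.push_back(tok);
  return tokens;
}

// Convert text to any type T that can be read with operator>> (int, double, ...).
template <typename T>
T fromString(const std::string& s) {
  std::istringstream in(s);
  T value{};
  in >> value;
  assert(!in.fail() && "text could not be converted");
  return value;
}
```

Here tokenize("a,,b,c", ',') yields {"a", "b", "c"}, and fromString<double>("0.5") yields 0.5.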


Some of the libraries are still under development and will be updated by the Bio++ developers on their website.
C/C++ Tutorial
http://www.cbcb.umd.edu/~jeallen/bioinfo/
Tutorial on Bio++
http://162.38.181.25/BioPP/articles/tutorial/index.html
Download Links
http://162.38.181.25/BioPP/articles/download/index.html


Reference:
http://162.38.181.25/BioPP/index.html
Dutheil J, Boussau B. Non-homogeneous models of sequence evolution in the Bio++ suite of libraries and programs. BMC Evol Biol. 2008 Sep 22;8(1):255
Dutheil J, Gaillard S, Bazin E, Glémin S, Ranwez V, Galtier N, Belkhir K. Bio++: a set of C++ libraries for sequence analysis, phylogenetics, molecular evolution and population genetics. BMC Bioinformatics. 2006 Apr 4;7:188.
Dutheil JY, Ganapathy G, Hobolth A, Mailund T, Uyenoyama MK, Schierup MH. Ancestral population genomics: the coalescent hidden Markov model approach. Genetics. 2009 Sep;183(1):259-74.
Nabholz B, Mauffrey J-F, Bazin E, Galtier N, Glémin S. Determination of Mitochondrial Genetic Diversity in Mammals. Genetics. 2008 January; 178(1): 351-361.
Galtier N. A model of horizontal gene transfer and the bacterial phylogeny problem. Syst Biol. 2007 Aug;56(4):633-42.
Dutheil J, Galtier N. Detecting groups of coevolving positions in a molecule: a clustering approach. BMC Evol Biol. 2007; 7: 242.
Boussau B, Gouy M. Efficient likelihood computations with nonreversible models of evolution. Syst Biol. 2006 Oct;55(5):756-68.
Dutheil J, Pupko T, Jean-Marie A, Galtier N. A model-based approach for detecting coevolving positions in a molecule. Mol Biol Evol. 2005 Sep;22(9):1919-28.
