BOL: Bio++: C++ libraries for your biological needs

Last updated 4751 days ago by Jai Singh

C has always been a language that never attempts to tie a programmer down - it allows for easy implementation, it comes with a genuinely useful standard library that can itself be implemented in C, and it is both efficient and portable. C has always appealed to systems programmers who like the terse, concise manner in which powerful expressions can be coded. C was widely distributed with an Operating System (Unix) that was actually largely written in C itself. Also, C allowed programmers to (while sacrificing portability) have direct access to many machine-level features that would otherwise require the use of Assembly Language.

As Dennis Ritchie writes in his paper, "The Development of the C Language":
"C is quirky, flawed, and an enormous success. While accidents of history surely helped, it evidently satisfied a need for a system implementation language efficient enough to displace assembly language, yet sufficiently abstract and fluent to describe algorithms and interactions in a wide variety of environments."

C++ has its basis in C, extending it with features meant to encourage and support the development of large programs. Perhaps most importantly, it supports object-oriented programming in a familiar setting and framework (that of C). When C++ was created, one of the initial aims was to retain compatibility with C to as large an extent as possible, and to retain its spirit and efficiency. It was possible to convert from C to C++ gradually, using C++ (initially, at least) as a "better C" and then moving on to its other features. This allowed many C programmers to learn C++ quickly (though using C++ effectively requires a major mind-shift for many C programmers).
Are you really interested in using C/C++ for biological programming? If so, there is good news for you: Bio++ 1.9.0 is available, with libraries that can help you solve a wide range of problems in computational biology.

Some new features have been added in the latest version:

Support for codon models (including non-homogeneous models),
Tools for manipulating Hidden Markov Models,
Improved numerical tools (numerical derivatives, parameter transforms...),
A new library, Bio++ RAA (Remote Acnuc Access), allowing you to query public databases like GenBank, EMBL or SwissProt,
Algorithms for plotting trees, with support for vector formats like SVG, Fig or LaTeX-PGF.
So relax and solve your HMM problems with ease using Bio++.
Times have changed: biological programmers are now ready to use C++ libraries built for biology. These libraries are designed to replace long stretches of hand-written C++ code with short, handy calls. Bio++ is a set of C++ libraries for bioinformatics, including sequence analysis, phylogenetics, molecular evolution and population genetics.
It is designed in an extensible, object-oriented way.

Some of the unique features of the libraries are as follows:
Sequence analysis

Sequence and Site objects, with various Alphabet support (DNA, RNA, Proteins, Codons, any 'Word' of a given size).
Several containers available for inner storage, with several implementations. Support for alignments.
Various I/O formats supported: Fasta, Mase, Clustal, Phylip, DCSE, GenBank (sequence only).
Sequence manipulation: truncation, concatenation, sub-sequences, etc.
In silico molecular biology: (reverse) transcription, translation, replication.
Several genetic codes available: standard and mitochondrial (vertebrates, echinoderms and other invertebrates).
Amino acids properties: volume, polarity and charge + physico-chemical distance (Miyata and Grantham) + import from any AAIndex entry.
Consensus sequences.
Pairwise alignment.
Similarity score computation.
Sequence bootstrap.
Homogeneity test (Bowker's test).
etc.
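
To make the "in silico molecular biology" item above concrete, here is a minimal standalone sketch of reverse complementation, transcription and translation. It uses only the C++ standard library and does not call Bio++; the function names are hypothetical, and it assumes upper-case, unambiguous DNA and the standard genetic code.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <string>

// Complement of one upper-case DNA base; any other character is rejected.
char complementBase(char base) {
    switch (base) {
        case 'A': return 'T';
        case 'T': return 'A';
        case 'C': return 'G';
        case 'G': return 'C';
        default: throw std::invalid_argument("not an unambiguous DNA base");
    }
}

// Reverse the sequence, then complement every base.
std::string reverseComplement(const std::string& dna) {
    std::string out(dna.rbegin(), dna.rend());
    std::transform(out.begin(), out.end(), out.begin(), complementBase);
    return out;
}

// Coding-strand DNA to mRNA: every T becomes U.
std::string transcribe(const std::string& dna) {
    std::string rna = dna;
    std::replace(rna.begin(), rna.end(), 'T', 'U');
    return rna;
}

// Translate codon by codon with the standard genetic code.
// Stop codons are written as '*'; a trailing partial codon is ignored.
std::string translate(const std::string& dna) {
    static const std::string bases = "TCAG";
    // Amino acids for all 64 codons, in TCAG order for each position.
    static const std::string aminoAcids =
        "FFLLSSSSYY**CC*W"
        "LLLLPPPPHHQQRRRR"
        "IIIMTTTTNNKKSSRR"
        "VVVVAAAADDEEGGGG";
    std::string protein;
    for (std::size_t i = 0; i + 3 <= dna.size(); i += 3) {
        std::size_t index = 0;
        for (std::size_t j = 0; j < 3; ++j) {
            const std::size_t pos = bases.find(dna[i + j]);
            if (pos == std::string::npos)
                throw std::invalid_argument("not an unambiguous DNA base");
            index = index * 4 + pos;
        }
        protein += aminoAcids[index];
    }
    return protein;
}
```

For example, translate("ATGGCCTAA") yields "MA*". Alphabet checking, ambiguity codes and the alternative genetic codes listed above are where a library such as Bio++ saves the most work.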

Phylogenetics and molecular evolution
Data structure and IO

Phylogenetic trees.
IO from newick files, with support for multiple entries.
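
As a rough illustration of what reading a Newick file involves, the sketch below pulls the leaf names out of a Newick string. It is not the Bio++ reader: the function name is hypothetical, and it assumes a single tree with unquoted labels and no bracketed comments.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Collect leaf names from a Newick string.
// A label is a leaf name when it directly follows '(' or ',';
// labels after ')' are internal node labels and labels after ':' are branch lengths.
std::vector<std::string> newickLeafNames(const std::string& newick) {
    const std::string structural = "(),:;";
    std::vector<std::string> leaves;
    std::string token;
    char lastStructural = '\0';
    for (char c : newick) {
        if (structural.find(c) != std::string::npos) {
            if (!token.empty() && (lastStructural == '(' || lastStructural == ','))
                leaves.push_back(token);
            token.clear();
            lastStructural = c;
        } else if (!std::isspace(static_cast<unsigned char>(c))) {
            token += c;
        }
    }
    return leaves;
}
```

Given "((A:0.1,B:0.2)90:0.3,C:0.4);" the function returns A, B and C, skipping the support value 90 and all branch lengths.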

Phylogenetic reconstruction methods

Parsimony (NNI)
Distance matrices estimation and I/O to files in Phylip format.
Distance methods: (U/W)PGMA, NJ, BioNJ.
Maximum likelihood (NNI, including a PhyML-like algorithm).
Mixed distance/ML tree reconstruction (iterative approaches).
Tree consensus methods, bipartitions, bootstrap value computations.

Substitution models

JC, K80, T92, F84, HKY85, TN93, GTR and more for nucleotides,
JC, DSO78, JTT92 + any PAML-formatted model description for proteins, with possibility to estimate equilibrium frequencies.
Various codon models: Muse & Gaut 1994, Yang & Nielsen 1998, Goldman & Yang 1994 + user-defined.
Support for rate-across sites models, with virtually any probability distribution, allowing for invariant classes.
Covarion models.
Model including gaps.
Global clock tree likelihood models.
Virtually any kind of non-homogeneous model is supported!
Mixed models (beta).
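
The simplest model in the nucleotide list, JC (Jukes-Cantor), also gives the simplest distance correction: if p is the observed proportion of differing sites, the estimated number of substitutions per site is d = -(3/4) ln(1 - 4p/3). The sketch below is a standalone illustration with hypothetical function names, not Bio++ code, and it assumes two aligned sequences of equal length without gaps.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <string>

// Observed proportion of differing sites between two aligned sequences.
double pDistance(const std::string& a, const std::string& b) {
    if (a.empty() || a.size() != b.size())
        throw std::invalid_argument("sequences must be aligned and non-empty");
    std::size_t differences = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        if (a[i] != b[i]) ++differences;
    return static_cast<double>(differences) / static_cast<double>(a.size());
}

// Jukes-Cantor corrected distance; undefined once p reaches 3/4 (saturation).
double jukesCantorDistance(double p) {
    if (p < 0.0 || p >= 0.75)
        throw std::domain_error("p must lie in [0, 0.75)");
    return -0.75 * std::log(1.0 - 4.0 * p / 3.0);
}
```

A pairwise matrix of such distances is the kind of input that the distance methods listed earlier (UPGMA, NJ, BioNJ) work from.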

Molecular evolution tools

Parameter estimation under maximum likelihood.
Ancestral states reconstructions: Marginal likelihood.
(Weighted) substitution mapping.
Sequences simulation under any substitution model, homogeneous or not.

Population genetics

A new file format to deal with codominant markers and bio-sequence data for individuals.
Import and export methods with various population genetics software.
Specific containers for polymorphism data.
Diversity and polymorphism statistics for codominant and sequence data.
Estimation of Wright F-statistics and pairwise genetic distance on codominant markers.
Statistics on synonymous and non-synonymous sites for coding sequences.
Various 'Neutrality' statistics on sequence data (Tajima, Fu and Li, Rand and Kann ...).
Various measures of linkage disequilibrium.
etc.

Numerical calculus

Numerical tools: extended functions (log, factorial, etc.)
Vector tools: element-wise functions, statistics (mean, var, sd, correlation, information theory)
Classes for matrices implementation.
Linear algebra: eigen decomposition, LU decomposition, inversion, etc.
Random number generation: Quick & Dirty (32 bits only), Wichmann and Hill, Knuth. Samplers from probability distributions (uniform, normal, gamma, etc.).
Function object implementation, with first and second order derivatives.
Numerical derivatives computation.
Optimization algorithms: Golden section search, Brent's algorithm, Powell's and Downhill simplex method, but also methods using derivatives like conjugate gradient and Newton's method. Object implementation of these methods, using the event-driven Optimizer interface (works with Function objects).
Statistics: DataTable object, with I/O from CSV files, probability distributions.
etc.
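
For readers who have not met them, two of the numerical tools named above are easy to sketch: a central-difference numerical derivative and golden section search for the minimum of a one-dimensional function. The code below is a generic textbook version with hypothetical names; it does not use the Bio++ Function or Optimizer classes, and it assumes the function has a single minimum inside the starting interval.

```cpp
#include <cmath>
#include <functional>

// Central-difference estimate of f'(x) with step h.
double centralDifference(const std::function<double(double)>& f,
                         double x, double h = 1e-5) {
    return (f(x + h) - f(x - h)) / (2.0 * h);
}

// Golden section search: shrink [lower, upper] around the minimum of f
// until the interval is narrower than the tolerance.
double goldenSectionMinimize(const std::function<double(double)>& f,
                             double lower, double upper,
                             double tolerance = 1e-6) {
    const double invPhi = (std::sqrt(5.0) - 1.0) / 2.0;  // about 0.618
    double x1 = upper - invPhi * (upper - lower);
    double x2 = lower + invPhi * (upper - lower);
    double f1 = f(x1);
    double f2 = f(x2);
    while (upper - lower > tolerance) {
        if (f1 < f2) {
            // The minimum lies in [lower, x2]; reuse x1 as the new x2.
            upper = x2;
            x2 = x1;
            f2 = f1;
            x1 = upper - invPhi * (upper - lower);
            f1 = f(x1);
        } else {
            // The minimum lies in [x1, upper]; reuse x2 as the new x1.
            lower = x1;
            x1 = x2;
            f1 = f2;
            x2 = lower + invPhi * (upper - lower);
            f2 = f(x2);
        }
    }
    return 0.5 * (lower + upper);
}
```

Minimizing (x - 2)^2 on the interval [0, 5] converges to a value close to 2. Each iteration costs only one new function evaluation, which matters when the function is an expensive likelihood.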

Utils

Files: working on file paths, getting file extensions and names, testing existence, open and store in string arrays, etc.
Text: convert text to any other type and vice versa, remove spaces, tokenize, switch between upper/lower case, etc.
Applications: read options from a file or command line
etc.

Some of the libraries are still under development; the Bio++ developers will post updates on their website.
C/C++ Tutorial
http://www.cbcb.umd.edu/~jeallen/bioinfo/
Tutorial on Bio++
http://162.38.181.25/BioPP/articles/tutorial/index.html
Download Links
http://162.38.181.25/BioPP/articles/download/index.html

Reference:
http://162.38.181.25/BioPP/index.html
Dutheil J, Boussau B. Non-homogeneous models of sequence evolution in the Bio++ suite of libraries and programs. BMC Evol Biol. 2008 Sep 22;8(1):255
Dutheil J, Gaillard S, Bazin E, Glémin S, Ranwez V, Galtier N, Belkhir K. Bio++: a set of C++ libraries for sequence analysis, phylogenetics, molecular evolution and population genetics. BMC Bioinformatics. 2006 Apr 4;7:188.
Dutheil JY, Ganapathy G, Hobolth A, Mailund T, Uyenoyama MK, Schierup MH. Ancestral population genomics: the coalescent hidden Markov model approach. Genetics. 2009 Sep;183(1):259-74.
Nabholz B, Mauffrey J-F, Bazin E, Galtier N, Glémin S. Determination of Mitochondrial Genetic Diversity in Mammals. Genetics. 2008 Jan;178(1):351-61.
Galtier N. A model of horizontal gene transfer and the bacterial phylogeny problem. Syst Biol. 2007 Aug;56(4):633-42.
Dutheil J, Galtier N. Detecting groups of coevolving positions in a molecule: a clustering approach. BMC Evol Biol. 2007; 7: 242.
Boussau B, Gouy M. Efficient likelihood computations with nonreversible models of evolution. Syst Biol. 2006 Oct;55(5):756-68.
Dutheil J, Pupko T, Jean-Marie A, Galtier N. A model-based approach for detecting coevolving positions in a molecule. Mol Biol Evol. 2005 Sep;22(9):1919-28.

Comments

- Jitendra Narayan@admin
Jitendra Narayan 4715 days ago
Parse FASTA/FASTQ file with C
http://lh3lh3.users.sourceforge.net/parsefastq.shtml
- Jitendra Narayan@admin
Jitendra Narayan 4711 days ago
Biocoder: A programming language for standardizing and automating biology protocols. BioCoder, a C++ library that enables biologists to express the exact steps needed to execute a protocol. In addition to being suitable for automation, BioCoder converts the code into a readable, English-language description for use by biologists.
http://www.ncbi.nlm.nih.gov/pubmed/21059251
This new language from Microsoft can help automate protocols: http://research.microsoft.com/en-us/um/india/projects/biocoder/
http://www.jbioleng.org/content/4/1/13
- Archana Malhotra@archana
Archana Malhotra 4711 days ago
Bioinformatics tools written with the Bio++ libraries
http://home.gna.org/bppsuite/
- Jitendra Narayan@admin
Jitendra Narayan 4711 days ago
The main website for Bio++ project http://biopp.univ-montp2.fr/
- Surajeet@surajeet
Surajeet 4630 days ago
Bio++ Tutorial & Cookbook by Julien Dutheil & Sylvain Gaillard http://biopp.univ-montp2.fr/Documents/Tutorial/Tutorial.pdf
http://www.biotnet.org/sites/biotnet.org/files/documents/25/biopython_intro.pdf
- Rahul Nayak@rahul
Rahul Nayak 4629 days ago
The Bio++ Program Suite is a package of programs using the Bio++ libraries and dedicated to Phylogenetics and Molecular Evolution.

Bio++ Suite contains the following components:

bppPars (Optimize a phylogenetic tree according to maximum parsimony).
bppDist (Estimate a distance matrix and build a phylogenetic tree according to several models of evolution and reconstruction methods).
bppML (Optimize a phylogenetic tree and other model parameters according to maximum likelihood. Several models are supported).
bppSeqGen (Simulate data sets according to a phylogenetic tree and an evolutionary model).
bppAncestor (Reconstruct ancestral sequences).
bppConsense (Build consensus trees and compute bootstrap values).
bppSeqMan (Sequence manipulation and file format conversion).
bppPhySamp (Sample sequences from a file according to a phylogenetic tree).
bppReRoot (Reroot all trees in a file according to a user-specified list of outgroups).

All programs share a common option file format.

More at http://gna.org/projects/bppsuite

http://home.gna.org/bppsuite/

http://biopp.univ-montp2.fr/manual/pdf/bppsuite.pdf
- Jitendra Narayan@admin
Jitendra Narayan 4043 days ago
MafFilter is a program to process genome alignment in the Multiple Alignment Format. Current version is 1.1.2 http://biopp.univ-montp2.fr/forge/maffilter
