Hidden Markov Models, the Viterbi Algorithm, and CpG Islands (in VB6)
Problem :
The CG island is a stretch of DNA (usually longer than 200 bases) in which the frequency of the CG sequence is higher than other regions. It is also called the CpG island, where "p" simply indicates that "C" and "G" are connected by a phosphodiester bond.
CpG islands are often located around the promoters of housekeeping genes (which are essential for general cell functions) or other genes frequently expressed in a cell. At these locations, the CG sequence is not methylated. By contrast, the CG sequences in inactive genes are usually methylated to suppress their expression. The methylated cytosine may be converted to thymine by accidental deamination. Unlike the cytosine to uracil mutation which is efficiently repaired, the cytosine to thymine mutation can be corrected only by the mismatch repair which is very inefficient. Hence, over evolutionary time scales, the methylated CG sequence will be converted to the TG sequence.
Find step wise explanationand implementation steps at http://dna.cs.byu.edu/bio465/Labs/hmm.shtml
Source code with explanation http://www.tannerhelland.com/1187/hidden-markov-models-viterbi-algorithm-cpg-islands-in-vb6/
Fore detail understanding of HMM read this excellent tutorial http://www.cs.ubc.ca/~murphyk/Software/HMM/labman2.pdf
Viterbi Algo at http://en.wikipedia.org/wiki/Viterbi_path
For firther reading Wiki page http://en.wikipedia.org/wiki/Hidden_Markov_model
On CpG island paper and for indepth understanding http://www.biomedcentral.com/1471-2164/12/S2/S10
If you are more interested in exploring Markov Chain Exploration and understand it with graphical version please visit http://www.planet-source-code.com/vb/scripts/ShowCode.asp?txtCodeId=75049&lngWId=1
Reference:
Comments
Thanks for this links, really informative.
Playing with the codes is my favourite hobby, will try this. Thanks for sharing useful algorithms links.