Computational Methods of Hidden Markov Models With Respect To CpG Island Prediction in DNA Sequences

Roberto Angel Ortega, University of Texas at El Paso

Abstract

Hidden Markov models (HMMs) are a special case of Markov models in which, unlike in Markov chains, the observer does not know which state the model was in when a symbol was observed. Like Markov chains, HMMs assume that the future state of a sequence depends only on its current state. The parameters of an HMM are transition and emission probabilities: a transition probability is the probability of moving from one state to another, and an emission probability is the probability of observing a symbol given that it was emitted from a specific state. DNA sequences are made up of the nucleotides adenine (A), cytosine (C), guanine (G), and thymine (T). CpG islands are regions within a DNA sequence where the CG dinucleotide occurs more frequently than in the rest of the sequence. The HMM algorithms used to analyze the DNA sequences were the Viterbi, Baum-Welch, and Viterbi training algorithms. The Viterbi algorithm determines the state sequence most likely to have produced the observed sequence, given the model. The Baum-Welch and Viterbi training algorithms estimate the parameters of an HMM. Specifically, we assessed the accuracy of the Viterbi algorithm at predicting the location of CpG islands within DNA sequences and determined how well the parameter-estimating algorithms recover the model parameters.
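
To make the decoding step concrete, the following is a minimal sketch of Viterbi decoding on a toy two-state HMM. It is not the model used in the thesis: the state labels ("I" for island, "B" for background), the function name `viterbi`, and every probability below are assumptions chosen for illustration. With single-nucleotide emissions, this toy model responds to C/G richness rather than to the CG dinucleotide itself, so it only approximates the idea of a CpG island.

```python
import math

# Hypothetical two-state model: "I" = CpG island, "B" = background.
STATES = ("I", "B")

# Illustrative parameters only; none of these values come from the thesis.
START = {"I": 0.5, "B": 0.5}
TRANS = {
    "I": {"I": 0.9, "B": 0.1},
    "B": {"I": 0.1, "B": 0.9},
}
EMIT = {
    "I": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "B": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
}


def viterbi(seq, states=STATES, start=START, trans=TRANS, emit=EMIT):
    """Return the most probable state path for seq, one label per symbol."""
    if not seq:
        return ""
    # score[s] is the log-probability of the best path that ends in state s.
    # Working in log space avoids underflow on long sequences.
    score = {s: math.log(start[s]) + math.log(emit[s][seq[0]]) for s in states}
    backpointers = []
    for symbol in seq[1:]:
        new_score, pointer = {}, {}
        for s in states:
            # Best predecessor state for landing in s at this position.
            prev = max(states, key=lambda p: score[p] + math.log(trans[p][s]))
            pointer[s] = prev
            new_score[s] = (score[prev] + math.log(trans[prev][s])
                            + math.log(emit[s][symbol]))
        score = new_score
        backpointers.append(pointer)
    # Trace back from the best final state to recover the full path.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for pointer in reversed(backpointers):
        state = pointer[state]
        path.append(state)
    return "".join(reversed(path))


if __name__ == "__main__":
    dna = "ATATATATCGCGCGCGCGATATATAT"
    print(dna)
    print(viterbi(dna))
```

Under these assumed parameters, the decoded path labels the central C/G-rich run as "I" and the flanking A/T-rich runs as "B". Comparing such a decoded path against known island positions is one way the prediction accuracy described above could be measured.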

Subject Area

Statistics|Bioinformatics

Recommended Citation

Ortega, Roberto Angel, "Computational Methods of Hidden Markov Models With Respect To CpG Island Prediction in DNA Sequences" (2011). ETD Collection for University of Texas, El Paso. AAI1498307.
https://scholarworks.utep.edu/dissertations/AAI1498307
