6.4 The Three Fundamental Problems of Hidden Markov Models
6.4 The Three Fundamental Problems of Hidden Markov Models
Section titled “6.4 The Three Fundamental Problems of Hidden Markov Models”In the previous sections, we introduced Hidden Markov Models as generative models for biological sequences and highlighted the crucial distinction between observed data and hidden state sequences. This distinction naturally leads to a set of fundamental computational questions.
Given an HMM and an observed sequence, we are faced with three different but closely related problems:
- Decoding: What is the most likely hidden state sequence?
- Evaluation: How likely is the observed sequence under the model?
- Learning: How can we estimate the model parameters from data?
Each of these problems captures a different aspect of inference in Hidden Markov Models, and each requires a dedicated algorithmic solution.
6.4.1 Decoding: Inferring the Most Likely State Sequence
Section titled “6.4.1 Decoding: Inferring the Most Likely State Sequence”We begin with the decoding problem.
Given:
- an observed sequence ( X = (x_1, x_2, \dots, x_n) )
- a model defined by transition, emission, and initial probabilities
we seek the hidden state sequence ( S = (s_1, s_2, \dots, s_n) ) that best explains the data.
Formally, we want to compute:
That is, we search for the state sequence that maximizes the joint probability of the observed sequence and the hidden states.
Interpretation
Section titled “Interpretation”In biological terms, this corresponds to assigning a label to each position in the sequence.
For example, in the promoter-background model:
- each position is classified as either promoter () or background ()
- the result is a segmentation of the sequence into functional regions
This turns HMM decoding into a sequence classification problem.
Why this is non-trivial
Section titled “Why this is non-trivial”The difficulty arises from the fact that the number of possible state sequences grows exponentially with sequence length. For a model with states and a sequence of length , there are possible paths.
Exhaustively evaluating all paths is therefore infeasible.
Solution: The Viterbi Algorithm
Section titled “Solution: The Viterbi Algorithm”The Viterbi algorithm solves this problem efficiently using dynamic programming.
The key idea is to compute, for each position and each state, the probability of the most likely path that ends in that state. Instead of enumerating all possible paths, we reuse intermediate results.
At each step, we:
- consider all possible previous states
- select the one that maximizes the probability
- propagate this information forward
In this way, the algorithm constructs the optimal path incrementally.
Conceptual summary
Section titled “Conceptual summary”- Input: sequence , model
- Output: most likely hidden state sequence ( S^* )
- Method: dynamic programming
6.4.2 Evaluation: Computing Sequence Likelihood
Section titled “6.4.2 Evaluation: Computing Sequence Likelihood”The second problem addresses a different question.
Instead of asking for the best explanation, we ask:
How likely is the observed sequence under the model?
Formally, we compute:
Summing over all possible paths
Section titled “Summing over all possible paths”Unlike the decoding problem, we do not select a single state sequence. Instead, we must consider all possible hidden paths:
Each path contributes to the total probability, weighted by how likely it is.
Why this is difficult
Section titled “Why this is difficult”As before, the number of possible paths grows exponentially with sequence length. Direct computation is therefore infeasible.
Solution: The Forward Algorithm
Section titled “Solution: The Forward Algorithm”The forward algorithm addresses this problem using dynamic programming.
Instead of enumerating paths explicitly, it computes intermediate quantities:
which represent the probability of observing the first symbols and ending in state .
These values can be computed recursively:
- initialize using initial and emission probabilities
- propagate forward using transition and emission probabilities
- sum over all final states
This allows us to compute efficiently.
Interpretation
Section titled “Interpretation”The forward algorithm answers a fundamentally different question than Viterbi:
- Viterbi finds the single best path
- Forward sums over all possible paths
A useful intuition is:
Viterbi gives the most plausible explanation, while the forward algorithm measures the overall support of the model for the data.
Backward Algorithm
Section titled “Backward Algorithm”A closely related method is the backward algorithm, which computes probabilities starting from the end of the sequence. It follows the same principles and yields the same final result.
Both forward and backward algorithms are often used together in parameter estimation.
6.4.3 Learning: Estimating Model Parameters
Section titled “6.4.3 Learning: Estimating Model Parameters”The third problem addresses the situation where the model parameters are unknown.
Given:
- a set of observed sequences
- a model structure (states and connectivity)
we want to estimate:
- transition probabilities
- emission probabilities
- initial probabilities
The challenge
Section titled “The challenge”Unlike standard parameter estimation problems, we do not observe the hidden states. This makes direct estimation difficult, since we cannot simply count transitions and emissions.
Solution: The Baum–Welch Algorithm
Section titled “Solution: The Baum–Welch Algorithm”The Baum–Welch algorithm provides a solution by iteratively estimating the hidden structure and updating the model parameters.
It follows the general principle of the Expectation–Maximization (EM) algorithm:
-
Expectation step Estimate how likely different hidden paths are, given the current parameters
-
Maximization step Update the parameters based on these estimates
This process is repeated until convergence.
Intuition
Section titled “Intuition”A helpful way to think about Baum–Welch is as an alternating procedure:
- assume a tentative explanation of the data
- compute statistics based on this explanation
- refine the model
- repeat
Over time, the model becomes better at explaining the observed sequences.
Practical considerations
Section titled “Practical considerations”An important aspect of Baum–Welch is that it does not guarantee a global optimum. The result may depend on the initial parameter values.
In practice, this is addressed by:
- running the algorithm multiple times with different initializations
- comparing the resulting models
6.4.4 Summary of the Three Problems
Section titled “6.4.4 Summary of the Three Problems”The three fundamental problems of Hidden Markov Models can be summarized as follows:
| Problem | Question | Algorithm |
|---|---|---|
| Decoding | What is the best hidden path? | Viterbi |
| Evaluation | How likely is the sequence? | Forward/Backward |
| Learning | What are the model parameters? | Baum–Welch |
Together, these problems define the computational framework for working with HMMs.
Conceptual Integration
Section titled “Conceptual Integration”These three problems correspond directly to the challenges introduced earlier:
- Inferring hidden structure → Decoding
- Evaluating model fit → Evaluation
- Building models from data → Learning
They transform the conceptual model of Hidden Markov Models into a practical computational tool.
Self-Check Questions
Section titled “Self-Check Questions”- What is the difference between maximizing ( P(X, S) ) and summing over all ( P(X, S) )?
- Why is the Viterbi algorithm more efficient than brute-force enumeration?
- What does the forward algorithm compute at each step?
- Why is parameter estimation difficult when states are hidden?
- How does the Baum–Welch algorithm relate to the EM principle?