6.5 Decoding with the Viterbi Algorithm
6.5 Decoding with the Viterbi Algorithm
Section titled “6.5 Decoding with the Viterbi Algorithm”Once a Hidden Markov Model has been specified, one of the first questions we want to answer is the following:
Given an observed sequence, what is the most likely sequence of hidden states that generated it?
This is the decoding problem, and the classical solution is the Viterbi algorithm.
At first glance, this problem seems straightforward. If the observed sequence is short and the number of hidden states is small, one might imagine listing all possible hidden paths, computing their probabilities, and choosing the best one. However, this quickly becomes infeasible. For a sequence of length and an HMM with hidden states, there are possible hidden state sequences. Even for moderate values of , exhaustive search becomes impossible.
The Viterbi algorithm solves this problem by exploiting a key structural property of Hidden Markov Models: although the total number of complete paths is enormous, many of these paths share common prefixes. This creates the kind of overlapping subproblem structure that can be handled by dynamic programming.
6.5.1 The Decoding Problem Revisited
Section titled “6.5.1 The Decoding Problem Revisited”Let the observed sequence be
and let the hidden state sequence be
For a fixed model , the decoding problem is to find
That is, among all possible hidden paths, we seek the one that maximizes the joint probability of both the observations and the path itself.
This formulation is important. We are not simply asking which path is plausible in isolation. We are asking which path provides the most probable explanation for the observed data under the model.
In the promoter-background example, this means that we want to assign to each observed nucleotide the state that best explains it, while also respecting the transition structure of the model. A path with excellent emissions but impossible transitions will not be favored. Likewise, a path with likely transitions but poor emissions will also not be optimal. The Viterbi algorithm balances both contributions.
6.5.2 Why Brute Force Fails
Section titled “6.5.2 Why Brute Force Fails”To appreciate the value of the algorithm, it is worth examining what a naive solution would require.
Suppose our HMM has only two hidden states:
and the observed sequence has length five. Then there are already
possible hidden paths.
For a length of twenty, there would be
possible paths.
For biologically realistic sequence lengths, exhaustive evaluation is no longer practical. Moreover, most of the work would be redundant. Many different paths share the same partial prefix, and repeatedly recomputing the best continuation of such prefixes would be wasteful.
The Viterbi algorithm avoids this redundancy by storing, at each sequence position and for each state, the probability of the best path ending there.
6.5.3 Core Idea of the Algorithm
Section titled “6.5.3 Core Idea of the Algorithm”The central observation is simple but powerful.
Suppose that at position we want to know the best path ending in state . Then this best path must have come from one of the states at position . Among all possible predecessor states, only one leads to the optimal path ending in .
This means that, instead of considering all complete paths at once, we can compute the best partial path to each state step by step.
Define the Viterbi score
as the probability of the most likely path that generates the first observations and ends in state .
This quantity summarizes exactly the information we need to continue the dynamic programming recursion.
6.5.4 Recurrence Relation
Section titled “6.5.4 Recurrence Relation”The algorithm consists of three parts: initialization, recursion, and termination.
Initialization
Section titled “Initialization”For the first observation , the best path ending in state is simply the probability of starting in that state and emitting the symbol:
where
- is the initial probability of state
- is the emission probability of symbol from state
Recursion
Section titled “Recursion”For each subsequent position , we compute
This equation has a very natural interpretation.
To arrive at state at position , we must
- have followed the best path to some predecessor state at position
- transition from to
- emit the observed symbol from state
Among all predecessor states , we choose the one that maximizes this product.
Backpointers
Section titled “Backpointers”To recover the actual path rather than only its probability, we must also store which predecessor state achieved the maximum. Thus, alongside the Viterbi scores, we maintain a backpointer table:
This table records, for each position and state, where the optimal path came from.
Termination
Section titled “Termination”Once the last observation has been processed, the score of the best complete path is
and the final state of the best path is
Starting from this final state, we reconstruct the full path by following the backpointers backward through the table.
6.5.5 A Worked Example
Section titled “6.5.5 A Worked Example”We now walk through the kind of example used in the lecture, using a simple two-state HMM with a promoter state and a background state .
Assume the following initial probabilities:
Transition probabilities:
Emission probabilities:
For the promoter state ,
For the background state ,
Now consider the observed sequence
We compute the best path step by step.
6.5.6 Step 1: Initialization
Section titled “6.5.6 Step 1: Initialization”For the first symbol ,
So after observing the first symbol, the best path ending in has much higher probability than the best path ending in . Intuitively, this means that the model initially explains the first more naturally as background than as promoter.
6.5.7 Step 2: Second Observation
Section titled “6.5.7 Step 2: Second Observation”The second symbol is .
To compute the best path ending in ,
Substituting the numbers gives
Thus the best predecessor is .
Now for the background state,
Thus the best predecessor is again .
At this point the best path to is slightly better than the best path to , meaning that after seeing ( AC ), the model now favors being in the promoter state.
6.5.8 Step 3: Third Observation
Section titled “6.5.8 Step 3: Third Observation”The third symbol is again .
For the promoter state,
Thus the best predecessor is .
For the background state,
Thus the best predecessor is .
Now the promoter path is clearly favored.
6.5.9 Step 4: Fourth Observation
Section titled “6.5.9 Step 4: Fourth Observation”The fourth symbol is .
For the promoter state,
Thus the best predecessor is .
For the background state,
Thus the best predecessor is .
Now the best path to is better than the best path to , indicating a transition back into background.
6.5.10 Step 5: Fifth Observation
Section titled “6.5.10 Step 5: Fifth Observation”The fifth symbol is .
For the promoter state,
Thus the best predecessor is .
For the background state,
Thus the best predecessor is .
The overall best final state is therefore , because
6.5.11 Traceback
Section titled “6.5.11 Traceback”We now reconstruct the best path by following the backpointers backward.
From the calculations above, the best final state is . The predecessor choices were:
- at position 5,
- at position 4,
- at position 3,
- at position 2,
- at position 1, start in
Thus the most likely hidden path is
This is exactly the kind of output we want from a decoding algorithm. The observed sequence is explained as starting in background, passing through a short promoter-like region, and returning to background.
6.5.12 Interpretation of the Result
Section titled “6.5.12 Interpretation of the Result”This example illustrates several important points.
First, the most likely state at a given position does not depend only on the symbol observed at that position. It also depends on the probability of reaching that state from the preceding path. This is why the Viterbi algorithm cannot be replaced by a simple position-wise classification rule.
Second, the algorithm balances emission evidence against transition structure. For example, a symbol that is moderately compatible with the promoter state may still be assigned to background if the transition into promoter is unlikely, or vice versa.
Third, the output is not just a score but an interpretable segmentation of the sequence. In biological terms, this segmentation can correspond to regions of different functional character.
6.5.13 Dynamic Programming Structure
Section titled “6.5.13 Dynamic Programming Structure”It is useful to pause and make the dynamic programming principle explicit.
At each cell of the Viterbi table, we store only the best path ending in that state. Why is it safe to discard all other partial paths ending there?
Because any future continuation depends only on the current state and current score, not on the detailed history of how that state was reached. If one partial path to state is worse than another, it can never overtake it later, since both will face the same future transition and emission options.
This is the optimal substructure property that makes Viterbi possible.
6.5.14 Computational Complexity
Section titled “6.5.14 Computational Complexity”Suppose an HMM has states and the observed sequence has length .
At each position, for each state, we consider all predecessor states. This leads to a time complexity of
This is dramatically better than the exponential complexity of exhaustive enumeration.
The memory cost is typically
if the full table and backpointers are stored, which is usually acceptable for moderate state spaces.
6.5.15 Conceptual Summary
Section titled “6.5.15 Conceptual Summary”The Viterbi algorithm solves the decoding problem by combining three ideas:
- state-specific emissions
- state-to-state transitions
- dynamic programming
Its output is the single most likely hidden path explaining the observed sequence.
For biological sequence analysis, this makes it a powerful tool for:
- promoter detection
- gene structure prediction
- protein secondary structure labeling
- profile-based sequence classification
In all of these applications, the goal is not merely to score a sequence, but to reconstruct the hidden biological organization underlying it.
Self-Check Questions
Section titled “Self-Check Questions”- Why is exhaustive search over all hidden paths infeasible?
- What does the quantity ( v_i(j) ) represent in the Viterbi algorithm?
- Why are backpointers needed?
- In what sense does the Viterbi algorithm use dynamic programming?
- Why can the most likely state at a position depend on earlier positions?