6.5.16 Log-Space Implementation and Numerical Stability

In the previous sections, we formulated the Viterbi algorithm in terms of probabilities that are multiplied along a path. While this formulation is conceptually straightforward, it leads to a serious practical problem.

The problem of numerical underflow

Probabilities are values between 0 and 1. When we multiply many such values, the result decreases exponentially with sequence length.

For example, even moderate probabilities such as

0.3 \cdot 0.4 \cdot 0.2 \cdot 0.5 \cdot 0.1

already produce a small number. For realistic biological sequences with hundreds or thousands of positions, the resulting values quickly become so small that they cannot be represented accurately in floating-point arithmetic. This phenomenon is known as numerical underflow.

As a consequence, a direct implementation of the Viterbi algorithm in probability space is unstable and unreliable for real-world applications.

Logarithmic Transformation

To address this issue, we transform the computation into logarithmic space.

The key observation is that multiplication of probabilities becomes addition in log-space:

\log(a \cdot b) = \log a + \log b

Applying this transformation to the Viterbi recursion yields a numerically stable formulation.

Log-Space Viterbi Scores

We define the log-Viterbi score:

\ell_i(j) = \log v_i(j)

Using this definition, the initialization step becomes:

\ell_1(j) = \log \pi_j + \log e_j(x_1)

Recursion in log-space

The recursive step becomes:

\ell_{i+1}(j) = \max_k \bigl( \ell_i(k) + \log t_{kj} + \log e_j(x_{i+1}) \bigr)

This expression replaces products with sums, which are numerically stable even for long sequences.

Backpointers remain unchanged

Importantly, the backpointer computation is unaffected:

b_{i+1}(j) = \arg\max_k \bigl( \ell_i(k) + \log t_{kj} + \log e_j(x_{i+1}) \bigr)

Thus, the structure of the algorithm remains identical. Only the arithmetic changes.

Interpretation

Working in log-space also provides a useful conceptual reinterpretation.

Instead of maximizing a product of probabilities, we maximize a sum of log-probabilities
This can be viewed as accumulating a score along a path

In this perspective, each transition and emission contributes an additive term to the total score. The Viterbi algorithm then finds the path with the highest total score.

This formulation is often more intuitive when thinking about the relative contributions of different parts of the model.

Handling Zero Probabilities

A practical detail arises when probabilities are zero. Since

\log 0 = -\infty

such transitions or emissions are effectively forbidden. In implementation, this is typically handled by representing impossible events with a sufficiently large negative number or by explicitly encoding (-\infty).

This interpretation is consistent with the probabilistic model: a zero probability means that a particular transition or emission cannot occur.

Comparison to the Forward Algorithm

It is worth noting that while the Viterbi algorithm uses the maximum operator in its recursion, the forward algorithm uses a sum over all predecessor states.

In log-space, this leads to an important difference:

Viterbi: [ \max_k (\cdot) ]
Forward: [ \log \sum_k \exp(\cdot) ]

The latter expression is known as the log-sum-exp operation and requires special care for numerical stability. We will return to this point when discussing the forward algorithm.

Practical Recommendation

In practice, implementations of the Viterbi algorithm should always be carried out in log-space. This ensures:

numerical stability for long sequences
robustness against underflow
improved interpretability as additive scoring

Conceptual Summary

The log-space formulation preserves the structure of the Viterbi algorithm while making it computationally feasible for realistic biological data.

Multiplication → addition
Probabilities → log-scores
Same dynamic programming structure

This transformation is a standard technique in probabilistic modeling and will reappear in other algorithms throughout this chapter.

Self-Check Questions

Why does multiplying probabilities lead to numerical underflow?
How does the logarithmic transformation address this issue?
What changes in the Viterbi recursion when moving to log-space?
Why does the backpointer computation remain unchanged?
What is the conceptual difference between maximizing probabilities and maximizing log-scores?