Skip to content

6.5.16 Log-Space Implementation and Numerical Stability

6.5.16 Log-Space Implementation and Numerical Stability

Section titled “6.5.16 Log-Space Implementation and Numerical Stability”

In the previous sections, we formulated the Viterbi algorithm in terms of probabilities that are multiplied along a path. While this formulation is conceptually straightforward, it leads to a serious practical problem.

Probabilities are values between 0 and 1. When we multiply many such values, the result decreases exponentially with sequence length.

For example, even moderate probabilities such as

0.30.40.20.50.10.3 \cdot 0.4 \cdot 0.2 \cdot 0.5 \cdot 0.1

already produce a small number. For realistic biological sequences with hundreds or thousands of positions, the resulting values quickly become so small that they cannot be represented accurately in floating-point arithmetic. This phenomenon is known as numerical underflow.

As a consequence, a direct implementation of the Viterbi algorithm in probability space is unstable and unreliable for real-world applications.


To address this issue, we transform the computation into logarithmic space.

The key observation is that multiplication of probabilities becomes addition in log-space:

log(ab)=loga+logb\log(a \cdot b) = \log a + \log b

Applying this transformation to the Viterbi recursion yields a numerically stable formulation.


We define the log-Viterbi score:

i(j)=logvi(j)\ell_i(j) = \log v_i(j)

Using this definition, the initialization step becomes:

1(j)=logπj+logej(x1)\ell_1(j) = \log \pi_j + \log e_j(x_1)

The recursive step becomes:

i+1(j)=maxk(i(k)+logtkj+logej(xi+1))\ell_{i+1}(j) = \max_k \bigl( \ell_i(k) + \log t_{kj} + \log e_j(x_{i+1}) \bigr)

This expression replaces products with sums, which are numerically stable even for long sequences.


Importantly, the backpointer computation is unaffected:

bi+1(j)=argmaxk(i(k)+logtkj+logej(xi+1))b_{i+1}(j) = \arg\max_k \bigl( \ell_i(k) + \log t_{kj} + \log e_j(x_{i+1}) \bigr)

Thus, the structure of the algorithm remains identical. Only the arithmetic changes.


Working in log-space also provides a useful conceptual reinterpretation.

  • Instead of maximizing a product of probabilities, we maximize a sum of log-probabilities
  • This can be viewed as accumulating a score along a path

In this perspective, each transition and emission contributes an additive term to the total score. The Viterbi algorithm then finds the path with the highest total score.

This formulation is often more intuitive when thinking about the relative contributions of different parts of the model.


A practical detail arises when probabilities are zero. Since

log0=\log 0 = -\infty

such transitions or emissions are effectively forbidden. In implementation, this is typically handled by representing impossible events with a sufficiently large negative number or by explicitly encoding (-\infty).

This interpretation is consistent with the probabilistic model: a zero probability means that a particular transition or emission cannot occur.


It is worth noting that while the Viterbi algorithm uses the maximum operator in its recursion, the forward algorithm uses a sum over all predecessor states.

In log-space, this leads to an important difference:

  • Viterbi: [ \max_k (\cdot) ]

  • Forward: [ \log \sum_k \exp(\cdot) ]

The latter expression is known as the log-sum-exp operation and requires special care for numerical stability. We will return to this point when discussing the forward algorithm.


In practice, implementations of the Viterbi algorithm should always be carried out in log-space. This ensures:

  • numerical stability for long sequences
  • robustness against underflow
  • improved interpretability as additive scoring

The log-space formulation preserves the structure of the Viterbi algorithm while making it computationally feasible for realistic biological data.

  • Multiplication → addition
  • Probabilities → log-scores
  • Same dynamic programming structure

This transformation is a standard technique in probabilistic modeling and will reappear in other algorithms throughout this chapter.


  1. Why does multiplying probabilities lead to numerical underflow?
  2. How does the logarithmic transformation address this issue?
  3. What changes in the Viterbi recursion when moving to log-space?
  4. Why does the backpointer computation remain unchanged?
  5. What is the conceptual difference between maximizing probabilities and maximizing log-scores?