6.7 Parameter Estimation with the Baum–Welch Algorithm

So far, we have assumed that the parameters of our Hidden Markov Model are known. In particular, we assumed that we are given:

the transition probabilities
the emission probabilities
the initial state distribution

In practice, however, this is rarely the case. Typically, we are given only a set of observed sequences, and the goal is to learn the model parameters from data.

This leads to the third fundamental problem of Hidden Markov Models:

How can we estimate the parameters of an HMM when the hidden states are not observed?

6.7.1 Why Learning Is Difficult

If the hidden state sequence were known, parameter estimation would be straightforward.

Transition probabilities could be estimated by counting transitions between states
Emission probabilities could be estimated by counting how often each symbol is emitted from each state
Initial probabilities could be estimated from the starting states

However, in an HMM, the state sequence is hidden. We only observe the emitted symbols. This means that we do not know:

which state emitted which symbol
how often transitions between states occurred

As a result, we cannot directly count frequencies.

6.7.2 A First Idea: Guess the Hidden States

A natural starting point is to approximate the hidden state sequence.

Suppose we somehow assign a state label to each position in the observed sequence. For example:

\text{Observed: } A \quad C \quad C \quad T \quad A

\text{Hidden (guess): } B \quad P \quad P \quad B \quad B

Given such a labeling, we can immediately estimate the model parameters:

count how often each symbol appears in each state
count how often each transition occurs
normalize the counts to obtain probabilities

This idea is simple and appealing, but it raises an obvious question:

Where does the labeling come from?

The Baum–Welch algorithm answers this question by turning the problem into an iterative procedure.

The idea is to alternate between:

Inferring a hidden structure given the current model
Updating the model parameters based on this inferred structure

More concretely:

Start with an initial guess for the model parameters
Use the model to infer hidden state sequences
Use these inferred sequences to update the parameters
Repeat until convergence

This procedure gradually improves the model.

Box 6.2 — Learning as “Guess → Count → Update”

A useful way to understand Baum–Welch is to think of it as a repeated cycle of guessing and refinement.

Step 1: Guess a hidden state sequence

Start with an initial model (often random). Use it to assign states to the observed sequence, for example using the Viterbi algorithm.

\text{Observed: } A \quad C \quad C \quad T \quad A

\text{Hidden (guess): } B \quad P \quad P \quad B \quad B

Step 2: Count frequencies

From this guessed labeling, compute:

how often each symbol appears in each state
how often transitions between states occur

For example:

count how many times $A$ is emitted from $B$
count how many times we transition from $P$ to $B$

Step 3: Update parameters

Convert counts into probabilities:

normalize emission counts to obtain emission probabilities
normalize transition counts to obtain transition probabilities

Step 4: Repeat

Use the updated model to compute a new hidden state sequence, and repeat the process.

Key insight

Each iteration improves the model’s ability to explain the observed data.

6.7.4 From Hard Assignments to Soft Assignments

The procedure described above is intuitive but slightly simplified. It corresponds to making a hard assignment of each position to a single state.

However, as we saw earlier, multiple hidden paths may explain the same observed sequence. Instead of committing to a single path, it is often better to consider all possible paths simultaneously, weighted by their probabilities.

This leads to the idea of soft assignments.

Instead of assigning a position entirely to one state, we compute:

the probability that position $i$ was generated by state $s_j$
the probability that a transition ( s_i \rightarrow s_j ) occurred

These quantities can be computed efficiently using the forward and backward algorithms.

6.7.5 The Baum–Welch Algorithm

The Baum–Welch algorithm implements this idea in a principled way. It is a special case of the Expectation–Maximization (EM) algorithm.

Expectation step (E-step)

Using the current model, compute:

expected counts of emissions
expected counts of transitions

This is done by combining forward and backward probabilities to estimate how likely each state and transition is at each position.

Maximization step (M-step)

Update the model parameters:

normalize expected emission counts → emission probabilities
normalize expected transition counts → transition probabilities
update initial probabilities

Iteration

Repeat the E-step and M-step until the parameters converge.

6.7.6 Interpretation

Baum–Welch can be interpreted as a refinement of the intuitive “guess → count → update” procedure:

Instead of guessing a single hidden path, we consider all possible paths
Instead of counting occurrences directly, we compute expected counts
Instead of updating once, we iterate until convergence

This makes the algorithm both more robust and more principled.

6.7.7 Convergence and Limitations

An important property of the Baum–Welch algorithm is that it guarantees improvement of the likelihood at each iteration. However, it does not guarantee convergence to the global optimum.

In practice, this means:

the result depends on the initial parameter values
the algorithm may converge to a local optimum

To address this, it is common to:

run the algorithm multiple times with different initializations
compare the resulting models

6.7.8 Biological Interpretation

From a biological perspective, Baum–Welch allows us to:

learn motif models from sequence data
estimate transition patterns between functional regions
refine models iteratively as more data becomes available

In particular, it provides a principled way to move from:

observed sequences only to
fully parameterized probabilistic models

6.7.9 Conceptual Summary

The Baum–Welch algorithm solves the learning problem for Hidden Markov Models by combining:

probabilistic inference (forward and backward algorithms)
parameter estimation (normalization of expected counts)
iterative refinement (EM framework)

It transforms HMMs from fixed models into learnable models, making them applicable to real biological data.

Self-Check Questions

Why is parameter estimation difficult when hidden states are not observed?
What is the intuition behind the “guess → count → update” procedure?
What is the difference between hard and soft assignments of states?
How do the forward and backward algorithms contribute to parameter estimation?
Why can Baum–Welch converge to a local optimum?

6.7 Parameter Estimation with the Baum–Welch Algorithm

6.7 Parameter Estimation with the Baum–Welch Algorithm

6.7.1 Why Learning Is Difficult

6.7.2 A First Idea: Guess the Hidden States

6.7.3 Iterative Refinement: The Core Idea

Box 6.2 — Learning as “Guess → Count → Update”

Step 1: Guess a hidden state sequence

Step 2: Count frequencies

Step 3: Update parameters

Step 4: Repeat

Key insight

6.7.4 From Hard Assignments to Soft Assignments

6.7.5 The Baum–Welch Algorithm

Expectation step (E-step)

Maximization step (M-step)

Iteration

6.7.6 Interpretation

6.7.7 Convergence and Limitations

6.7.8 Biological Interpretation

6.7.9 Conceptual Summary

Self-Check Questions