5.7 A Simple Worked Example: The Coin Toss Problem
5.7 A Simple Worked Example: The Coin Toss Problem
Section titled “5.7 A Simple Worked Example: The Coin Toss Problem”Learning Objectives
Section titled “Learning Objectives”After reading this section, you should be able to:
- apply maximum likelihood estimation in a fully observed setting
- understand why missing data prevents direct estimation
- explain the coin toss problem as an example of hidden variables
- perform one iteration of the EM algorithm conceptually
- relate the coin toss example to motif discovery
A Simple Model with Unknown Origins
Section titled “A Simple Model with Unknown Origins”To understand how the EM idea operates in practice, it is helpful to examine a simple example in which all calculations can be followed explicitly. Although this example is not biological, it captures the essential structure of the motif discovery problem.
Imagine that we are given two coins, which we will call coin A and coin B. Each coin has its own probability of producing heads, denoted by and . These probabilities are unknown, and our goal is to estimate them.
We perform a series of experiments. In each experiment, one of the two coins is selected, and then tossed multiple times. The outcome of each experiment is a sequence of heads and tails. After repeating this process several times, we obtain a collection of such sequences.
At first glance, this may seem like a simple estimation problem. However, the key question is: do we know which coin was used in each experiment?
The Fully Observed Case: Maximum Likelihood Estimation
Section titled “The Fully Observed Case: Maximum Likelihood Estimation”Let us first consider the simpler case in which this information is available. Suppose that for each sequence of coin tosses, we know whether it was generated by coin A or coin B.
In this setting, parameter estimation is straightforward. We simply count how many heads and tails were produced by each coin and compute the fraction of heads. This corresponds to maximum likelihood estimation, where we choose parameter values that maximize the probability of the observed data .
For example, if coin A produces heads and tails, the maximum likelihood estimate is
An analogous expression holds for coin B.
The key point is that this procedure works because the data are complete. Each observation can be assigned unambiguously to one of the two coins.
The Hidden Case: Missing Assignments
Section titled “The Hidden Case: Missing Assignments”Now consider the more interesting case. Suppose that we observe exactly the same sequences of heads and tails, but we are no longer told which coin was used in each experiment.
The data now consist only of the outcomes, without any information about their origin. As a result, we can no longer separate the observations into two groups. The simple counting procedure used in maximum likelihood estimation is no longer applicable.
This situation closely mirrors the motif discovery problem. There, we observe sequences but do not know where the motif occurs. Here, we observe outcomes but do not know which coin generated them. In both cases, the missing information corresponds to hidden variables.
Evaluating Possible Explanations
Section titled “Evaluating Possible Explanations”To proceed, we introduce a probabilistic model. Suppose a sequence consists of tosses with heads. The probability of observing this outcome given a coin with parameter is described by the binomial distribution:
Using this expression, we can evaluate how likely a given sequence is under coin A and coin B, given current estimates of and .
However, rather than making a hard decision about which coin generated the sequence, we compare these likelihoods and convert them into probabilities. This gives us a soft assignment that reflects how strongly each coin is supported by the data.
The Expectation Step
Section titled “The Expectation Step”Using the current parameter estimates, we compute for each sequence the probability that it was generated by coin A or coin B. These probabilities form a distribution over the hidden variable (the coin identity).
Instead of assigning a sequence entirely to one coin, we allow it to contribute partially to both. For example, a sequence might contribute 70% of its counts to coin A and 30% to coin B.
From these probabilities, we compute the expected number of heads and tails for each coin. This transforms the observed data into a weighted representation that reflects our uncertainty.
This step corresponds to the Expectation step, or E-step.
The Maximization Step
Section titled “The Maximization Step”Once we have these expected counts, we update the parameter estimates. For each coin, we compute the fraction of expected heads among the total expected tosses:
This step is identical in form to maximum likelihood estimation, except that we now use expected counts instead of observed counts. It produces new parameter values that better explain the data under the current probabilistic assignments.
This is the Maximization step, or M-step.
Iterative Refinement
Section titled “Iterative Refinement”The E-step and M-step are applied repeatedly. Starting from an initial guess of the parameters, we alternate between estimating the hidden variables and updating the model.
At first, the assignments may be uncertain and the parameter estimates inaccurate. However, as the iterations proceed, the model becomes more consistent with the data. Sequences that are strongly associated with one coin contribute more to its parameter estimate, which in turn reinforces the assignment in subsequent iterations.
This iterative refinement continues until the parameter values stabilize and no longer change significantly.
Returning to Motif Discovery
Section titled “Returning to Motif Discovery”Although the coin toss example may appear artificial, it captures exactly the structure of the motif discovery problem.
- The unknown coin assignments correspond to unknown motif positions
- The coin parameters correspond to the motif model
- The sequences of tosses correspond to candidate subsequences
- The E-step corresponds to evaluating how likely each position is a motif
- The M-step corresponds to updating the motif model
In both cases, we are faced with the same challenge: the data are incomplete, and part of the generative process is hidden. The EM algorithm provides a systematic way to resolve this problem by iteratively refining both the hidden structure and the model parameters.
In the next section, we will formalize this procedure mathematically and generalize it beyond this specific example.
Self-Check Questions
Section titled “Self-Check Questions”- How does maximum likelihood estimation work when the coin assignments are known?
- Why does this approach fail when the assignments are unknown?
- What role does the binomial distribution play in this example?
- What is meant by a soft assignment in the E-step?
- How are parameter estimates updated in the M-step?
- How does the coin toss example relate to motif discovery?