Skip to content

5.7 A Simple Worked Example: The Coin Toss Problem

5.7 A Simple Worked Example: The Coin Toss Problem

Section titled “5.7 A Simple Worked Example: The Coin Toss Problem”

After reading this section, you should be able to:

  • apply maximum likelihood estimation in a fully observed setting
  • understand why missing data prevents direct estimation
  • explain the coin toss problem as an example of hidden variables
  • perform one iteration of the EM algorithm conceptually
  • relate the coin toss example to motif discovery

To understand how the EM idea operates in practice, it is helpful to examine a simple example in which all calculations can be followed explicitly. Although this example is not biological, it captures the essential structure of the motif discovery problem.

Imagine that we are given two coins, which we will call coin A and coin B. Each coin has its own probability of producing heads, denoted by θA\theta_A and θB\theta_B. These probabilities are unknown, and our goal is to estimate them.

We perform a series of experiments. In each experiment, one of the two coins is selected, and then tossed multiple times. The outcome of each experiment is a sequence of heads and tails. After repeating this process several times, we obtain a collection of such sequences.

At first glance, this may seem like a simple estimation problem. However, the key question is: do we know which coin was used in each experiment?


The Fully Observed Case: Maximum Likelihood Estimation

Section titled “The Fully Observed Case: Maximum Likelihood Estimation”

Let us first consider the simpler case in which this information is available. Suppose that for each sequence of coin tosses, we know whether it was generated by coin A or coin B.

In this setting, parameter estimation is straightforward. We simply count how many heads and tails were produced by each coin and compute the fraction of heads. This corresponds to maximum likelihood estimation, where we choose parameter values that maximize the probability of the observed data .

For example, if coin A produces hAh_A heads and tAt_A tails, the maximum likelihood estimate is

θ^A=hAhA+tA.\hat{\theta}_A = \frac{h_A}{h_A + t_A}.

An analogous expression holds for coin B.

The key point is that this procedure works because the data are complete. Each observation can be assigned unambiguously to one of the two coins.


Now consider the more interesting case. Suppose that we observe exactly the same sequences of heads and tails, but we are no longer told which coin was used in each experiment.

The data now consist only of the outcomes, without any information about their origin. As a result, we can no longer separate the observations into two groups. The simple counting procedure used in maximum likelihood estimation is no longer applicable.

This situation closely mirrors the motif discovery problem. There, we observe sequences but do not know where the motif occurs. Here, we observe outcomes but do not know which coin generated them. In both cases, the missing information corresponds to hidden variables.


To proceed, we introduce a probabilistic model. Suppose a sequence consists of nn tosses with kk heads. The probability of observing this outcome given a coin with parameter θ\theta is described by the binomial distribution:

P(kθ)=(nk)θk(1θ)nk.P(k \mid \theta) = \binom{n}{k} \theta^k (1 - \theta)^{n-k}.

Using this expression, we can evaluate how likely a given sequence is under coin A and coin B, given current estimates of θA\theta_A and θB\theta_B .

However, rather than making a hard decision about which coin generated the sequence, we compare these likelihoods and convert them into probabilities. This gives us a soft assignment that reflects how strongly each coin is supported by the data.


Using the current parameter estimates, we compute for each sequence the probability that it was generated by coin A or coin B. These probabilities form a distribution over the hidden variable (the coin identity).

Instead of assigning a sequence entirely to one coin, we allow it to contribute partially to both. For example, a sequence might contribute 70% of its counts to coin A and 30% to coin B.

From these probabilities, we compute the expected number of heads and tails for each coin. This transforms the observed data into a weighted representation that reflects our uncertainty.

This step corresponds to the Expectation step, or E-step.


Once we have these expected counts, we update the parameter estimates. For each coin, we compute the fraction of expected heads among the total expected tosses:

θ^Anew=expected heads for Aexpected total tosses for A.\hat{\theta}_A^{\text{new}} = \frac{\text{expected heads for A}}{\text{expected total tosses for A}}.

This step is identical in form to maximum likelihood estimation, except that we now use expected counts instead of observed counts. It produces new parameter values that better explain the data under the current probabilistic assignments.

This is the Maximization step, or M-step.


The E-step and M-step are applied repeatedly. Starting from an initial guess of the parameters, we alternate between estimating the hidden variables and updating the model.

At first, the assignments may be uncertain and the parameter estimates inaccurate. However, as the iterations proceed, the model becomes more consistent with the data. Sequences that are strongly associated with one coin contribute more to its parameter estimate, which in turn reinforces the assignment in subsequent iterations.

This iterative refinement continues until the parameter values stabilize and no longer change significantly.


Although the coin toss example may appear artificial, it captures exactly the structure of the motif discovery problem.

  • The unknown coin assignments correspond to unknown motif positions
  • The coin parameters correspond to the motif model
  • The sequences of tosses correspond to candidate subsequences
  • The E-step corresponds to evaluating how likely each position is a motif
  • The M-step corresponds to updating the motif model

In both cases, we are faced with the same challenge: the data are incomplete, and part of the generative process is hidden. The EM algorithm provides a systematic way to resolve this problem by iteratively refining both the hidden structure and the model parameters.

In the next section, we will formalize this procedure mathematically and generalize it beyond this specific example.


  1. How does maximum likelihood estimation work when the coin assignments are known?
  2. Why does this approach fail when the assignments are unknown?
  3. What role does the binomial distribution play in this example?
  4. What is meant by a soft assignment in the E-step?
  5. How are parameter estimates updated in the M-step?
  6. How does the coin toss example relate to motif discovery?