7.10 Assessing Tree Reliability

Learning Objectives

After reading this section, you should be able to:

explain why phylogenetic trees are inherently uncertain
understand the principle of bootstrap resampling
interpret bootstrap support values correctly
distinguish between model fit and structural confidence
critically assess the robustness of inferred trees

Why a Single Tree Is Not the Whole Story

In the previous sections, we developed several methods for constructing phylogenetic trees. Each method produces a tree that best fits the data according to a particular criterion, whether distance, parsimony, or likelihood.

At this point, it is tempting to treat the resulting tree as the answer.

However, this would be misleading.

Phylogenetic reconstruction is based on incomplete and noisy data. The observed sequences provide only a partial record of the underlying evolutionary process. As a result, different trees may explain the data almost equally well.

This raises a crucial question:

How confident can we be in the structure of a reconstructed tree?

Sources of Uncertainty

Uncertainty in phylogenetic inference arises at multiple levels.

The data themselves may be limited. Alignments may contain relatively few informative positions, especially when sequences are short or highly conserved.

In addition, evolutionary processes introduce ambiguity. Multiple substitutions, reversals, and convergent evolution can obscure the signal of shared ancestry.

Finally, the reconstruction methods introduce their own assumptions. Different models and algorithms may produce different trees from the same data.

Taken together, these factors mean that any single reconstructed tree should be interpreted with caution.

A Strategy for Assessing Robustness

To assess reliability, we need a way to determine how sensitive the inferred tree is to variations in the data.

One effective way to do this is to generate slightly different versions of the dataset, reconstruct a tree from each, and observe how the resulting trees change.

If a particular grouping of taxa appears consistently across many such variations, we gain confidence that it reflects a genuine signal.

If it changes frequently, this suggests that the grouping is unstable.

Bootstrap Resampling

The most widely used method for implementing this idea is the bootstrap.

The procedure begins with a multiple sequence alignment of length \(n\). A new alignment is then generated by sampling \(n\) columns from the original alignment with replacement. This means that some columns may appear multiple times, while others may be omitted.
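The resampling step can be sketched in a few lines of Python. This is a minimal illustration, assuming the alignment is stored as a dictionary from taxon names to equal-length aligned strings; the function name `bootstrap_alignment` and the toy sequences are hypothetical. The key detail is that the same list of column indices is applied to every taxon, so each sampled column is copied as a whole.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return a pseudo-replicate alignment: n columns drawn with replacement.

    `alignment` maps taxon names to aligned sequences of equal length n.
    """
    seqs = list(alignment.values())
    n = len(seqs[0])
    if any(len(s) != n for s in seqs):
        raise ValueError("all sequences must have the same aligned length")
    # Draw n column indices uniformly with replacement; some columns will
    # be repeated and others omitted.
    cols = [rng.randrange(n) for _ in range(n)]
    # Apply the SAME indices to every taxon so aligned columns stay intact.
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}


# Hypothetical toy alignment of four taxa, ten columns each.
alignment = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTTC",
    "C": "ACGAACGTTC",
    "D": "TCGAACCTTC",
}
replicate = bootstrap_alignment(alignment, random.Random(1))
for name, seq in replicate.items():
    print(name, seq)
```

Passing an explicit `random.Random` instance makes a given set of replicates reproducible, which is convenient when re-running an analysis.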

From this resampled alignment, a phylogenetic tree is reconstructed using the same method as before.

This process is repeated many times, typically hundreds or thousands of times, producing a collection of trees.
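The full replicate loop is sketched below. As a stand-in for "the same method as before", it assumes UPGMA clustering on p-distances (the proportion of differing sites), chosen only because it is short; any reconstruction method could take its place. Each tree is represented by its set of clades, each clade a frozenset of taxon names. All function names are hypothetical.

```python
import random
from itertools import combinations


def p_distance(s1, s2):
    """Proportion of aligned positions at which two sequences differ."""
    return sum(a != b for a, b in zip(s1, s2)) / len(s1)


def upgma_clades(alignment):
    """Build a rooted UPGMA tree from p-distances and return its non-trivial
    clades: frozensets of taxon names, excluding single taxa and the root."""
    clusters = {frozenset([t]): 1 for t in alignment}  # cluster -> number of taxa
    dist = {
        frozenset([frozenset([a]), frozenset([b])]): p_distance(alignment[a], alignment[b])
        for a, b in combinations(alignment, 2)
    }
    clades = set()
    while len(clusters) > 1:
        # Join the closest pair of clusters (ties resolved by dict order).
        c1, c2 = tuple(min(dist, key=dist.get))
        n1, n2 = clusters.pop(c1), clusters.pop(c2)
        merged = c1 | c2
        # UPGMA update: size-weighted average of the two old distances.
        for other in clusters:
            dist[frozenset([merged, other])] = (
                n1 * dist[frozenset([c1, other])] + n2 * dist[frozenset([c2, other])]
            ) / (n1 + n2)
        # Forget every distance involving a cluster that no longer exists.
        dist = {k: v for k, v in dist.items() if c1 not in k and c2 not in k}
        clusters[merged] = n1 + n2
        if len(merged) < len(alignment):
            clades.add(merged)
    return clades


def bootstrap_alignment(alignment, rng):
    """Resample columns with replacement, using the same columns for every taxon."""
    n = len(next(iter(alignment.values())))
    cols = [rng.randrange(n) for _ in range(n)]
    return {t: "".join(s[i] for i in cols) for t, s in alignment.items()}


def bootstrap_trees(alignment, n_replicates=100, seed=0):
    """Rebuild a tree from each pseudo-replicate with the same method (here UPGMA)."""
    rng = random.Random(seed)
    return [upgma_clades(bootstrap_alignment(alignment, rng)) for _ in range(n_replicates)]


# Hypothetical toy alignment.
alignment = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTTC",
    "C": "ACGAACGTTC",
    "D": "TCGAACCTTC",
}
reference = upgma_clades(alignment)
replicates = bootstrap_trees(alignment, n_replicates=200)
distinct = {frozenset(clades) for clades in replicates}
print("reference clades:", [sorted(c) for c in reference])
print(f"{len(replicates)} replicate trees, {len(distinct)} distinct topologies")
```

Storing each tree as a set of clades, rather than as a nested structure, makes the next step (asking whether a replicate contains a given grouping) a simple membership test.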

From Replicates to Support Values

Once a set of bootstrap trees has been generated, we examine how often particular groupings appear.

For each internal branch in the original tree, we determine the proportion of bootstrap trees in which the same grouping is present, that is, trees containing a branch that separates exactly the same set of taxa from the rest.

This proportion is called the bootstrap support value.

For example, if a clade appears in 950 out of 1000 bootstrap trees, its support value is 95%.
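Counting support values is a simple tally. The sketch below assumes each tree is represented as a set of clades (frozensets of taxon names); the function name `bootstrap_support` is hypothetical, and the replicate list is constructed artificially to reproduce the 950-out-of-1000 example.

```python
from collections import Counter


def bootstrap_support(reference_clades, replicate_clade_sets):
    """Return {clade: fraction of replicate trees that contain the clade}."""
    counts = Counter()
    for clades in replicate_clade_sets:
        for clade in reference_clades:
            if clade in clades:
                counts[clade] += 1
    n = len(replicate_clade_sets)
    return {clade: counts[clade] / n for clade in reference_clades}


ab = frozenset({"A", "B"})
bc = frozenset({"B", "C"})
abc = frozenset({"A", "B", "C"})
# Artificial replicates: 950 trees group A with B, 50 group B with C.
replicates = [{ab, abc}] * 950 + [{bc, abc}] * 50
support = bootstrap_support({ab, abc}, replicates)
print(support[ab])   # → 0.95
print(support[abc])  # → 1.0
```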

These values are typically displayed on the branches of the tree, providing a visual indication of confidence.
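In the widely used Newick text format, support values are conventionally written as internal node labels, for example `((A,B)95,(C,D)80);`. The sketch below writes such a string from a tree given as nested tuples of taxon names; the tuple representation and the function names are assumptions for illustration, and support values are looked up by the set of taxa below each node.

```python
def leaves(tree):
    """Return the set of taxon names below a node (leaves are strings)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def to_newick(tree, support, is_root=True):
    """Write a nested-tuple tree as Newick, labelling each internal node
    (except the root) with its bootstrap support as a whole percentage."""
    if isinstance(tree, str):
        return tree
    inner = ",".join(to_newick(child, support, False) for child in tree)
    clade = leaves(tree)
    label = ""
    if not is_root and clade in support:
        label = str(round(100 * support[clade]))
    text = f"({inner}){label}"
    return text + ";" if is_root else text


tree = (("A", "B"), ("C", "D"))
support = {frozenset("AB"): 0.95, frozenset("CD"): 0.80}
print(to_newick(tree, support))  # → ((A,B)95,(C,D)80);
```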

Box 7.8 — What Bootstrap Values Mean

A bootstrap value reflects the stability of a grouping under resampling of the data.

A high value indicates that the grouping is consistently supported by the data. A low value indicates that small changes in the data often lead to different groupings.

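It is important to note what bootstrap values do not represent. They are not the probability that a clade is correct. Instead, they measure how consistently the available data support that clade under the chosen method. If the method itself is systematically misled, for example by an inadequate substitution model, resampling will tend to reproduce the same incorrect grouping, which can then receive high support.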

Interpreting Support Values

Bootstrap values provide a practical way to assess which parts of a tree are reliable.

Branches with high support are considered stable and are often interpreted with greater confidence. Branches with low support should be treated cautiously, as they may not reflect robust relationships.

In practice, thresholds are sometimes used as guidelines. For example, support values above 70% are often considered reasonably reliable, while lower values indicate uncertainty.

However, such thresholds should not be interpreted rigidly. The significance of a support value depends on the context, including the data and the method used.
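A threshold is best used to flag clades for closer inspection rather than to accept or reject them mechanically. The sketch below, with a hypothetical function name and made-up support values for illustration, exposes the 70% guideline as an adjustable parameter instead of hard-coding it.

```python
def flag_weak_clades(support, threshold=0.70):
    """Split clades into those at or above a guideline threshold and those below.

    The 0.70 default mirrors the commonly quoted 70% guideline; it is a
    convention, not a significance test, so it is left adjustable.
    """
    strong = {c: s for c, s in support.items() if s >= threshold}
    weak = {c: s for c, s in support.items() if s < threshold}
    return strong, weak


# Made-up support values for illustration only.
support = {frozenset("AB"): 0.95, frozenset("CD"): 0.62, frozenset("ABE"): 0.71}
strong, weak = flag_weak_clades(support)
for clade, s in sorted(weak.items(), key=lambda kv: kv[1]):
    print(sorted(clade), f"{s:.0%}")  # → ['C', 'D'] 62%
```

Note that a clade at 71% passes this cutoff while one at 69% would not, even though the data distinguish them barely at all. This is exactly why such thresholds should be read as rough guidelines.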

Model Fit vs. Structural Confidence

It is useful to distinguish between two different aspects of phylogenetic inference.

The first is how well a tree fits the data according to a chosen criterion, such as likelihood or least squares. This measures the overall agreement between the model and the data.

The second is how stable the structure of the tree is under variations in the data. This is what bootstrap analysis addresses.

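A tree may have a high likelihood but still contain branches with low bootstrap support, for example when only a few alignment columns bear on those branches and alternative arrangements fit the data almost as well. Conversely, some parts of the tree may be highly stable even if the overall model is imperfect.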

Understanding this distinction helps avoid overinterpreting phylogenetic results.

Conceptual Interpretation

Bootstrap analysis reinforces a central theme of this chapter.

Phylogenetic trees are models inferred from data, and like all models, they are subject to uncertainty. Rather than seeking a single definitive answer, we aim to understand which aspects of the model are strongly supported and which are not.

This perspective encourages a more nuanced interpretation of phylogenetic trees.

Conceptual Summary

Phylogenetic reconstruction produces models that explain observed data, but these models are not unique. Bootstrap resampling provides a way to assess how robust different parts of a tree are to variations in the data.

By distinguishing between stable and unstable groupings, we can interpret phylogenetic trees more critically and more effectively.

Self-Check Questions

Why is it not sufficient to report a single phylogenetic tree?
What are the main sources of uncertainty in phylogenetic reconstruction?
How does bootstrap resampling generate new datasets?
What does a bootstrap support value represent?
Why does a high bootstrap value not guarantee correctness?
How does bootstrap analysis complement measures of model fit?