In this lecture we return to the measure-theoretic point of view. Given a probability space $(X, \mathscr{A}, \mu)$ we let $\mathscr{P}$ denote the space of all (equivalence classes of) finite measurable partitions of $(X, \mathscr{A},\mu)$. Thus elements of $\mathscr{P}$ are (equivalence classes of) finite collections
$$\xi = \{ C_1, \dots, C_p \}$$
where each $C_k \in \mathscr{A}$ is measurable, and
$$\mu(C_i \cap C_j) = 0 \qquad \text{if } i \ne j, \qquad \text{and} \qquad \mu \left(X \setminus \bigcup_{k=1}^p C_k \right) = 0.$$
One can think of a partition $\xi = \{ C_1, \dots, C_p \}$ as representing an “experiment” on our probability space $(X, \mathscr{A}, \mu)$. The possible outcomes of this experiment are given by the sets $C_i$, and the probability of $C_i$ happening is given by $\mu(C_i)$.

We define the entropy $\mathsf{H}( \xi)$ of a partition $\xi= \{C_1, \dots, C_p \}$ via the formula
$$\mathsf{H}(\xi) := - \sum_{i=1}^p \mu(C_i) \log \mu(C_i).$$
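As a quick numerical sanity check of this formula (a minimal sketch, with the usual convention $0 \log 0 := 0$), one can compute the entropy of a partition directly from the measures of its cells:

```python
import math

def entropy(probs):
    """H(xi) = -sum mu(C_i) log mu(C_i), with the convention 0 log 0 = 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# A "fair coin" partition into two cells of measure 1/2 has entropy log 2,
# the maximal uncertainty among two-cell partitions:
print(entropy([0.5, 0.5]))   # log 2 ≈ 0.6931

# A partition with one cell of full measure is a "certain" experiment:
print(entropy([1.0, 0.0]))   # 0.0
```

More generally, the uniform partition into $p$ cells has entropy $\log p$, and $\mathsf{H}(\xi) = 0$ exactly when some cell has full measure, matching the intuition that entropy measures uncertainty.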
The entropy can be thought of as measuring the “uncertainty” of the experiment $\xi$.

There is also an analogous quantity $\mathsf{H}(\xi|\eta)$ associated to two partitions $\xi$ and $\eta$ called the conditional entropy. This can be thought of as measuring the uncertainty about the outcome of $\xi$ under the assumption that we already know what happened when we did $\eta$.
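For concreteness, the standard definition (writing $\eta = \{D_1, \dots, D_q\}$, and with the convention that terms with $\mu(C_i \cap D_j) = 0$ or $\mu(D_j) = 0$ vanish) reads:

$$\mathsf{H}(\xi|\eta) := - \sum_{j=1}^{q} \mu(D_j) \sum_{i=1}^{p} \frac{\mu(C_i \cap D_j)}{\mu(D_j)} \log \frac{\mu(C_i \cap D_j)}{\mu(D_j)}.$$

That is, $\mathsf{H}(\xi|\eta)$ averages, over the outcomes $D_j$ of $\eta$, the entropy of $\xi$ with respect to the conditional measure on $D_j$.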

The main result of today's lecture is the rather pretty fact that the formula

$$d_{\operatorname{R}}(\xi, \eta) := \mathsf{H}(\xi|\eta) + \mathsf{H}(\eta|\xi)$$

defines a metric on the space $\mathscr{P}$. This is called the Rokhlin metric, after the Soviet mathematician V. A. Rokhlin.
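On a finite probability space one can compute $d_{\operatorname{R}}$ explicitly and check the metric axioms numerically. The sketch below (an illustrative implementation, not from the lecture) encodes a partition as a list of cell labels over the atoms of the space, with one weight per atom:

```python
import math
from collections import defaultdict

def cond_entropy(xi, eta, w):
    """H(xi | eta) for partitions given as label lists over weighted atoms."""
    joint = defaultdict(float)   # mu(C_i ∩ D_j)
    marg = defaultdict(float)    # mu(D_j)
    for ci, dj, p in zip(xi, eta, w):
        joint[(ci, dj)] += p
        marg[dj] += p
    return -sum(p * math.log(p / marg[dj])
                for (ci, dj), p in joint.items() if p > 0)

def rokhlin(xi, eta, w):
    """d_R(xi, eta) = H(xi | eta) + H(eta | xi)."""
    return cond_entropy(xi, eta, w) + cond_entropy(eta, xi, w)

# Four equally likely atoms; xi splits {1,2}|{3,4}, eta splits {1,3}|{2,4}.
w   = [0.25] * 4
xi  = [0, 0, 1, 1]
eta = [0, 1, 0, 1]
print(rokhlin(xi, xi, w))    # 0.0: a partition is at distance zero from itself
print(rokhlin(xi, eta, w))   # 2 log 2: the two partitions are independent
```

Note that $d_{\operatorname{R}}(\xi, \eta) = 0$ forces $\xi$ and $\eta$ to agree up to null sets, which is why $d_{\operatorname{R}}$ is a genuine metric on equivalence classes of partitions rather than on partitions themselves.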