In this lecture we return to the measure-theoretic point of view. Given a probability space $(X, \mathscr{A}, \mu)$ we let $\mathscr{P}$ denote the space of all (equivalence classes of) finite measurable partitions of $(X, \mathscr{A},\mu)$. Thus elements of $\mathscr{P}$ are (equivalence classes of) finite tuples
$$
\xi = \{ C_1, \dots, C_p \}
$$
where each $C_k \in \mathscr{A}$ is measurable, and
$$
\mu(C_i \cap C_j) = 0 \qquad \text{if } i \ne j, \qquad \text{and} \qquad \mu \left(X \setminus \bigcup_{k=1}^p C_k \right) = 0.
$$
One can think of a partition $\xi = \{ C_1, \dots, C_p \}$ as representing an “experiment” on our probability space $(X, \mathscr{A}, \mu)$. The possible outcomes of this experiment are given by the sets $C_i$, and the probability of outcome $C_i$ is $\mu(C_i)$.

We define the entropy $ \mathsf{H}( \xi)$ of a partition $ \xi= \{C_1, \dots, C_p \}$ via the formula
$$
\mathsf{H}(\xi) :=  - \sum_{i=1}^p \mu(C_i) \log \mu(C_i).
$$
Here we use the convention $0 \log 0 := 0$, so that blocks of measure zero contribute nothing. The entropy can be thought of as measuring the “uncertainty” of the experiment $\xi$.
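As a quick numerical illustration (a minimal sketch, not from the lecture; the function name `entropy` is my own), one can compute $\mathsf{H}(\xi)$ directly from the measures of the blocks:

```python
import math

def entropy(probs):
    """H(xi) = -sum mu(C_i) log mu(C_i), with the convention 0 log 0 = 0."""
    return sum(-p * math.log(p) for p in probs if p > 0)

# The coin-flip partition xi = {[0, 1/2), [1/2, 1]} of ([0, 1], Lebesgue):
print(entropy([0.5, 0.5]))  # log 2 ≈ 0.6931: maximal uncertainty for two outcomes
# An experiment whose outcome is certain has zero uncertainty:
print(entropy([1.0, 0.0]))  # 0.0
```

Note that among partitions into $p$ blocks, the entropy is maximized exactly when all blocks have equal measure.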

There is also an analogous quantity $ \mathsf{H}(\xi|\eta)$ associated to two partitions $ \xi$ and $ \eta$ called the conditional entropy. This can be thought of as measuring the uncertainty about the outcome of $ \xi$ under the assumption that we already know what happened when we did $ \eta$.
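Concretely (this is the standard definition): writing $\eta = \{ D_1, \dots, D_q \}$, the conditional entropy is given by
$$
\mathsf{H}(\xi | \eta) := - \sum_{i=1}^p \sum_{j=1}^q \mu(C_i \cap D_j) \log \frac{\mu(C_i \cap D_j)}{\mu(D_j)},
$$
where, as before, terms with $\mu(C_i \cap D_j) = 0$ are omitted.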

The main result of today's lecture is the rather pretty fact that the formula

$$
d_{\operatorname{R}}(\xi, \eta) := \mathsf{H}(\xi | \eta) + \mathsf{H}(\eta | \xi)
$$

defines a metric on the space $\mathscr{P}$. This is called the Rokhlin metric, after the Soviet mathematician V. A. Rokhlin.
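Before proving this, it can be instructive to check the metric axioms numerically on a small example (a sketch of my own, not from the lecture: the names `cond_entropy` and `rokhlin` and the sample partitions are made up). We take a six-point space with uniform measure and compute $\mathsf{H}(\xi|\eta)$ via the formula above:

```python
import math
from itertools import product

# Finite probability space X = {0, ..., 5} with uniform measure.
X = set(range(6))

def mu(A):
    return len(A) / len(X)

def cond_entropy(xi, eta):
    """H(xi | eta) = -sum_{i,j} mu(C_i ∩ D_j) log( mu(C_i ∩ D_j) / mu(D_j) )."""
    h = 0.0
    for C, D in product(xi, eta):
        p = mu(C & D)
        if p > 0:
            h -= p * math.log(p / mu(D))
    return h

def rokhlin(xi, eta):
    """Rokhlin distance d_R(xi, eta) = H(xi | eta) + H(eta | xi)."""
    return cond_entropy(xi, eta) + cond_entropy(eta, xi)

# Three sample partitions of X (lists of disjoint blocks covering X):
xi   = [{0, 1, 2}, {3, 4, 5}]
eta  = [{0, 1}, {2, 3}, {4, 5}]
zeta = [{0}, {1, 2, 3, 4, 5}]

assert rokhlin(xi, xi) == 0                       # d_R(xi, xi) = 0
assert rokhlin(xi, eta) == rokhlin(eta, xi)       # symmetry
assert rokhlin(xi, zeta) <= rokhlin(xi, eta) + rokhlin(eta, zeta) + 1e-12
```

Of course, a finite spot-check is no proof: symmetry is immediate from the definition, positivity for $\xi \ne \eta$ (mod null sets) needs an argument, and the triangle inequality is the real content of the lecture.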



Comments and questions?