Chapter Summary

Key Points

1. The conditional expectation $\mathbb{E}[X|Y]$ is a random variable, a function of $Y$, not a number. Its key properties are linearity, the tower property ($\mathbb{E}[\mathbb{E}[X|Y]] = \mathbb{E}[X]$), pulling out what is known, and the fact that if $X$ and $Y$ are independent, then $\mathbb{E}[X|Y] = \mathbb{E}[X]$.

2. The MMSE estimator of $X$ given $Y$ is the conditional expectation: $\hat{X}_{\text{MMSE}} = \mathbb{E}[X|Y]$. It minimizes the mean square error over all measurable functions of $Y$.

3. The orthogonality principle states that the MMSE estimation error $X - \mathbb{E}[X|Y]$ is orthogonal to every function of $Y$. Geometrically, $\mathbb{E}[X|Y]$ is the projection of $X$ onto the subspace of functions of $Y$.

4. The LMMSE estimator restricts to affine functions and requires only first and second moments: $\hat{X}_{\text{LMMSE}} = \mu_X + \mathbf{C}_{XY}\mathbf{C}_{YY}^{-1}(\mathbf{Y} - \boldsymbol{\mu}_Y)$. For jointly Gaussian data, the LMMSE estimator equals the MMSE estimator.

5. The law of total variance, $\text{Var}(X) = \mathbb{E}[\text{Var}(X|Y)] + \text{Var}(\mathbb{E}[X|Y])$, decomposes the variance of $X$ into unexplained and explained components. The MMSE equals the average conditional variance $\mathbb{E}[\text{Var}(X|Y)]$.
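
The key points above can be checked numerically. Below is a minimal NumPy sketch, not from the chapter: the variable names, means, and variances are illustrative choices. It builds a jointly Gaussian pair, forms the LMMSE estimator from sample moments (which equals the MMSE estimator here, per point 4), and verifies the tower property, orthogonality, optimality, and the law of total variance:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Jointly Gaussian pair: X = 2Y + noise, illustrative parameters
y = rng.normal(1.0, 2.0, n)
x = 2.0 * y + rng.normal(0.0, 1.5, n)

# First and second sample moments (all the LMMSE estimator needs)
mu_x, mu_y = x.mean(), y.mean()
c_xy = np.cov(x, y)[0, 1]
c_yy = y.var(ddof=1)

# LMMSE estimate and estimation error
x_hat = mu_x + (c_xy / c_yy) * (y - mu_y)
err = x - x_hat

# Point 1 (tower property): E[E[X|Y]] = E[X]
assert abs(x_hat.mean() - mu_x) < 1e-6

# Point 3 (orthogonality): error is uncorrelated with Y
assert abs(np.mean(err * y)) < 1e-6

# Point 2 (optimality): any other affine estimator has larger MSE
mse_opt = np.mean(err**2)
mse_alt = np.mean((x - (mu_x + 1.5 * (y - mu_y)))**2)
assert mse_opt < mse_alt

# Point 5 (total variance): Var(X) = E[Var(X|Y)] + Var(E[X|Y])
assert abs(x.var() - (err.var() + x_hat.var())) < 1e-6
```

Note that the orthogonality and total-variance identities hold exactly (up to floating point) once the coefficients are computed from the same sample moments; with population moments they hold in expectation.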

Looking Ahead

Chapter 13 introduces stochastic processes — random functions of time. The conditional expectation and LMMSE tools from this chapter become the foundation for Wiener filtering (optimal linear prediction of stationary processes) and Kalman filtering (recursive state estimation for dynamical systems), treated in the FSI book.
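
As a small preview of why the LMMSE machinery scales to the filtering problems mentioned above, here is a hedged sketch, with illustrative parameters of my own choosing, of the vector form $\hat{X} = \mu_X + \mathbf{C}_{XY}\mathbf{C}_{YY}^{-1}(\mathbf{Y} - \boldsymbol{\mu}_Y)$: a scalar $X$ observed through two sensors of different quality, fused by the LMMSE weights.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Scalar X observed through two noisy sensors: Y = [X + n1, X + n2]
x = rng.normal(0.0, 1.0, n)
y = np.stack([x + rng.normal(0.0, 0.5, n),    # good sensor
              x + rng.normal(0.0, 1.0, n)])   # worse sensor

mu_x = x.mean()
mu_y = y.mean(axis=1, keepdims=True)
c_yy = np.cov(y)                   # 2x2 covariance of Y
c_xy = np.cov(x, y)[0, 1:]         # Cov(X, Y_1), Cov(X, Y_2)

# Vector LMMSE: weights C_XY C_YY^{-1}, applied to centered observations
w = c_xy @ np.linalg.inv(c_yy)
x_hat = mu_x + w @ (y - mu_y)

# The less noisy sensor receives the larger weight
assert w[0] > w[1]

# Fusing both sensors beats trusting the good sensor alone
mse = np.mean((x - x_hat)**2)
assert mse < np.mean((x - y[0])**2)
```

The Wiener and Kalman filters apply exactly this covariance-weighted projection, with $\mathbf{Y}$ being past observations of a process rather than two static sensors.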