Chapter Summary
Key Points
1. Cramér's theorem: for i.i.d. $X_1, \dots, X_n$ with sample mean $\bar{X}_n$, the tail probability decays exponentially as $\mathbb{P}(\bar{X}_n \ge a) \approx e^{-n I(a)}$, where the rate function $I(a) = \sup_{\lambda} \bigl[\lambda a - \Lambda(\lambda)\bigr]$ is the Legendre-Fenchel transform of the cumulant generating function $\Lambda(\lambda) = \log \mathbb{E}[e^{\lambda X_1}]$ (a worked Bernoulli example and a Monte Carlo check follow this list).
2. Sanov's theorem: the probability that the empirical distribution $\hat{P}_n$ falls in a set $E$ decays as $e^{-n D(P^* \| P)}$, where $P^*$ is the I-projection of the true distribution $P$ onto $E$ (the distribution in $E$ closest to $P$ in KL divergence). KL divergence is the universal rate function for empirical distributions.
3. Sub-Gaussian random variables have MGF-bounded tails: $\mathbb{E}[e^{\lambda (X - \mu)}] \le e^{\sigma^2 \lambda^2 / 2}$ for all $\lambda$, which gives $\mathbb{P}(|X - \mu| \ge t) \le 2 e^{-t^2 / (2\sigma^2)}$. Bounded random variables are sub-Gaussian by Hoeffding's lemma. Hoeffding's and Bernstein's inequalities are the main concentration tools for sums of independent bounded variables.
4. The matrix Bernstein inequality extends scalar concentration to sums of independent random matrices, with only a factor-$d$ (dimension) price in the tail bound; it is an essential tool for covariance estimation in massive MIMO and compressed sensing (see the covariance sketch after this list).
5. Stein's lemma: in binary hypothesis testing between $P_0$ and $P_1$ with constrained Type I error, the optimal Type II error exponent is $D(P_0 \| P_1)$. The Chernoff information $C(P_0, P_1)$ governs the symmetric Bayesian exponent (see the exponent computation after this list). These results connect large deviations to detection theory and channel coding error exponents.
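As a concrete instance of points 1 and 2, here is the standard Bernoulli computation (a worked sketch; the chapter's own notation may differ):

```latex
% Cramér rate function for i.i.d. X_i ~ Bernoulli(p), with 0 < p < a < 1.
% Cumulant generating function:
\Lambda(\lambda) = \log \mathbb{E}\!\left[e^{\lambda X_1}\right]
                 = \log\!\left(1 - p + p\,e^{\lambda}\right)
% Stationarity of \lambda a - \Lambda(\lambda) gives
% e^{\lambda^*} = \frac{a(1-p)}{p(1-a)}, and substituting back:
I(a) = \sup_{\lambda}\,\bigl[\lambda a - \Lambda(\lambda)\bigr]
     = a \log\frac{a}{p} + (1 - a)\log\frac{1 - a}{1 - p}
     = D\bigl(\mathrm{Ber}(a)\,\big\|\,\mathrm{Ber}(p)\bigr)
% The Cramér rate function is exactly a KL divergence, as Sanov's theorem
% predicts: the event {sample mean >= a} corresponds to the empirical
% distribution falling in E = {Q : E_Q[X] >= a}, whose I-projection of
% Ber(p) is Ber(a).
```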
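A quick simulation can check the exponential decay rate numerically. The following is a minimal sketch with illustrative parameters $p = 0.3$ and $a = 0.5$ (none of these names or values come from the chapter); note that the empirical exponent approaches $I(a)$ only slowly, because of polynomial prefactors in the tail probability.

```python
# Monte Carlo check of Cramér's theorem for Bernoulli(p) sample means.
# A minimal sketch with illustrative parameters; convergence of the
# empirical exponent to I(a) is slow due to sub-exponential prefactors.
import numpy as np

rng = np.random.default_rng(0)
p, a = 0.3, 0.5            # true mean p, tail threshold a > p
trials = 1_000_000

def rate_function(a, p):
    """Cramér rate I(a) for Bernoulli(p): equals D(Ber(a) || Ber(p))."""
    return a * np.log(a / p) + (1 - a) * np.log((1 - a) / (1 - p))

for n in (20, 50, 100):
    # Sample means of n i.i.d. Bernoulli(p) variables, vectorized over trials.
    means = rng.binomial(n, p, size=trials) / n
    prob = np.mean(means >= a)           # empirical P(sample mean >= a)
    print(f"n={n:3d}  -log(P)/n = {-np.log(prob)/n:.4f}   "
          f"I(a) = {rate_function(a, p):.4f}")
```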
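For point 4, a small numerical experiment illustrates the operator-norm concentration that matrix-Bernstein-type bounds predict for sample covariance matrices. This is a sketch under assumed Gaussian samples (which, being unbounded, strictly require a truncation or sub-exponential variant of the inequality); the dimension and sample sizes are illustrative.

```python
# Operator-norm error of the sample covariance versus the
# sqrt(d log d / n) scaling suggested by matrix-Bernstein-type bounds.
# A minimal sketch with illustrative parameters (not from the chapter).
import numpy as np

rng = np.random.default_rng(1)
d = 32
Sigma = np.eye(d)                      # true covariance: identity, for simplicity

for n in (128, 512, 2048):
    errs = []
    for _ in range(50):                # average over independent repetitions
        X = rng.standard_normal((n, d))            # rows: i.i.d. N(0, Sigma) samples
        S = X.T @ X / n                            # sample covariance: a sum of n rank-1 terms
        errs.append(np.linalg.norm(S - Sigma, 2))  # spectral (operator) norm error
    print(f"n={n:5d}  mean ||S - Sigma||_op = {np.mean(errs):.3f}   "
          f"sqrt(d log d / n) = {np.sqrt(d * np.log(d) / n):.3f}")
```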
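For point 5, both exponents are easy to compute for a pair of Bernoulli distributions. A minimal sketch, with illustrative parameters $p_0 = 0.2$ and $p_1 = 0.5$ (not from the chapter):

```python
# Stein exponent D(P0||P1) versus Chernoff information C(P0, P1) for
# testing Ber(p0) against Ber(p1). Illustrative parameters only.
import numpy as np

p0, p1 = 0.2, 0.5
P0 = np.array([1 - p0, p0])
P1 = np.array([1 - p1, p1])

def kl(P, Q):
    """KL divergence D(P || Q) in nats."""
    return float(np.sum(P * np.log(P / Q)))

def chernoff_information(P, Q, grid=10_001):
    """C(P, Q) = -min_{0 <= s <= 1} log sum_x P(x)^s Q(x)^(1-s), via grid search."""
    s = np.linspace(0.0, 1.0, grid)[:, None]
    vals = np.log(np.sum(P**s * Q**(1.0 - s), axis=1))
    return float(-vals.min())

print("Stein exponent D(P0||P1):", kl(P0, P1))                    # best Type II exponent
print("Chernoff information    :", chernoff_information(P0, P1))  # symmetric Bayesian exponent
```

The Chernoff information never exceeds either KL divergence, reflecting that the symmetric Bayesian criterion is more demanding than a one-sided error constraint.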
Looking Ahead
Chapter 21 applies the probabilistic tools developed throughout this book to random matrices: the Marchenko-Pastur law describes the limiting eigenvalue distribution of large sample covariance matrices, and deterministic equivalents enable computation of ergodic MIMO capacity without Monte Carlo simulation. The concentration inequalities from this chapter will be essential for controlling the fluctuations of empirical spectral distributions around their deterministic limits.