Chapter Summary

Chapter Summary

Key Points

  • 1.

    Cramér's theorem: for i.i.d. X1,,XnX_1, \ldots, X_n, the tail probability P(Xˉna)\mathbb{P}(\bar{X}_n \geq a) decays exponentially as enI(a)e^{-nI(a)}, where the rate function I(a)=supθ{θaΛ(θ)}I(a) = \sup_\theta\{\theta a - \Lambda(\theta)\} is the Legendre-Fenchel transform of the cumulant generating function Λ(θ)=logE[eθX]\Lambda(\theta) = \log \mathbb{E}[e^{\theta X}].

  • 2.

    Sanov's theorem: the probability that the empirical distribution falls in a set E\mathcal{E} decays as enD(QP)e^{-nD(Q^* \| P)}, where QQ^* is the I-projection (the distribution in E\mathcal{E} closest to PP in KL divergence). KL divergence is the universal rate function for empirical distributions.

  • 3.

    Sub-Gaussian random variables have MGF-bounded tails: E[etX]eσ2t2/2\mathbb{E}[e^{tX}] \leq e^{\sigma^2 t^2/2}. Bounded random variables are sub-Gaussian by Hoeffding's lemma. Hoeffding's and Bernstein's inequalities are the main concentration tools for sums of independent bounded variables.

  • 4.

    The matrix Bernstein inequality extends scalar concentration to random matrices, with only a factor-dd (dimension) price — a tool essential for covariance estimation in massive MIMO and compressed sensing.

  • 5.

    Stein's lemma: in hypothesis testing with constrained Type I error, the optimal Type II error exponent is D(P0P1)D(P_0 \| P_1). The Chernoff information governs the symmetric Bayesian exponent. These results connect large deviations to detection theory and channel coding error exponents.

Looking Ahead

Chapter 21 applies the probabilistic tools developed throughout this book to random matrices: the Marchenko-Pastur law describes the limiting eigenvalue distribution of large random matrices, and deterministic equivalents enable computation of ergodic MIMO capacity without Monte Carlo simulation. The concentration inequalities from this chapter will be essential for controlling the fluctuations of empirical spectral distributions around their deterministic limits.