Chapter Summary
Key Points
1. Cramér's theorem: for i.i.d. $X_1, \dots, X_n$ with sample mean $\bar{X}_n$, the tail probability decays exponentially as $\mathbb{P}(\bar{X}_n \ge a) \approx e^{-n I(a)}$, where the rate function $I(a) = \sup_{\lambda} \bigl[\lambda a - \Lambda(\lambda)\bigr]$ is the Legendre-Fenchel transform of the cumulant generating function $\Lambda(\lambda) = \log \mathbb{E}[e^{\lambda X_1}]$ (a worked Bernoulli example and a Monte Carlo check follow this list).
2. Sanov's theorem: the probability that the empirical distribution $\hat{P}_n$ falls in a set $E$ decays as $e^{-n D(P^* \| P)}$, where $P^*$ is the I-projection of the true distribution $P$ onto $E$ (the distribution in $E$ closest to $P$ in KL divergence). KL divergence is the universal rate function for empirical distributions.
3. Sub-Gaussian random variables have MGF-bounded tails: $\mathbb{E}[e^{\lambda (X - \mu)}] \le e^{\sigma^2 \lambda^2 / 2}$ for all $\lambda$, which gives $\mathbb{P}(|X - \mu| \ge t) \le 2 e^{-t^2 / (2\sigma^2)}$. Bounded random variables are sub-Gaussian by Hoeffding's lemma. Hoeffding's and Bernstein's inequalities are the main concentration tools for sums of independent bounded variables.
4. The matrix Bernstein inequality extends scalar concentration to sums of independent random matrices, with only a factor-$d$ (dimension) price in the tail bound; it is an essential tool for covariance estimation in massive MIMO and compressed sensing (see the covariance sketch after this list).
5. Stein's lemma: in binary hypothesis testing between $P_0$ and $P_1$ with constrained Type I error, the optimal Type II error exponent is $D(P_0 \| P_1)$. The Chernoff information $C(P_0, P_1)$ governs the symmetric Bayesian exponent (see the exponent computation after this list). These results connect large deviations to detection theory and channel coding error exponents.
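As a concrete instance of points 1 and 2, here is the standard Bernoulli computation (a worked sketch; the chapter's own notation may differ):

```latex
% Cramér rate function for i.i.d. X_i ~ Bernoulli(p), with 0 < p < a < 1.
% Cumulant generating function:
\Lambda(\lambda) = \log \mathbb{E}\!\left[e^{\lambda X_1}\right]
                 = \log\!\left(1 - p + p\,e^{\lambda}\right)
% Stationarity of \lambda a - \Lambda(\lambda) gives
% e^{\lambda^*} = \frac{a(1-p)}{p(1-a)}, and substituting back:
I(a) = \sup_{\lambda}\,\bigl[\lambda a - \Lambda(\lambda)\bigr]
     = a \log\frac{a}{p} + (1 - a)\log\frac{1 - a}{1 - p}
     = D\bigl(\mathrm{Ber}(a)\,\big\|\,\mathrm{Ber}(p)\bigr)
% The Cramér rate function is exactly a KL divergence, as Sanov's theorem
% predicts: the event {sample mean >= a} corresponds to the empirical
% distribution falling in E = {Q : E_Q[X] >= a}, whose I-projection of
% Ber(p) is Ber(a).
```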
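A quick simulation can check the exponential decay rate numerically. The following is a minimal sketch with illustrative parameters $p = 0.3$ and $a = 0.5$ (none of these names or values come from the chapter); note that the empirical exponent approaches $I(a)$ only slowly, because of polynomial prefactors in the tail probability.

```python
# Monte Carlo check of Cramér's theorem for Bernoulli(p) sample means.
# A minimal sketch with illustrative parameters; convergence of the
# empirical exponent to I(a) is slow due to sub-exponential prefactors.
import numpy as np

rng = np.random.default_rng(0)
p, a = 0.3, 0.5            # true mean p, tail threshold a > p
trials = 1_000_000

def rate_function(a, p):
    """Cramér rate I(a) for Bernoulli(p): equals D(Ber(a) || Ber(p))."""
    return a * np.log(a / p) + (1 - a) * np.log((1 - a) / (1 - p))

for n in (20, 50, 100):
    # Sample means of n i.i.d. Bernoulli(p) variables, vectorized over trials.
    means = rng.binomial(n, p, size=trials) / n
    prob = np.mean(means >= a)           # empirical P(sample mean >= a)
    print(f"n={n:3d}  -log(P)/n = {-np.log(prob)/n:.4f}   "
          f"I(a) = {rate_function(a, p):.4f}")
```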
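For point 4, a small numerical experiment illustrates the operator-norm concentration that matrix-Bernstein-type bounds predict for sample covariance matrices. This is a sketch under assumed Gaussian samples (which, being unbounded, strictly require a truncation or sub-exponential variant of the inequality); the dimension and sample sizes are illustrative.

```python
# Operator-norm error of the sample covariance versus the
# sqrt(d log d / n) scaling suggested by matrix-Bernstein-type bounds.
# A minimal sketch with illustrative parameters (not from the chapter).
import numpy as np

rng = np.random.default_rng(1)
d = 32
Sigma = np.eye(d)                      # true covariance: identity, for simplicity

for n in (128, 512, 2048):
    errs = []
    for _ in range(50):                # average over independent repetitions
        X = rng.standard_normal((n, d))            # rows: i.i.d. N(0, Sigma) samples
        S = X.T @ X / n                            # sample covariance: a sum of n rank-1 terms
        errs.append(np.linalg.norm(S - Sigma, 2))  # spectral (operator) norm error
    print(f"n={n:5d}  mean ||S - Sigma||_op = {np.mean(errs):.3f}   "
          f"sqrt(d log d / n) = {np.sqrt(d * np.log(d) / n):.3f}")
```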
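For point 5, both exponents are easy to compute for a pair of Bernoulli distributions. A minimal sketch, with illustrative parameters $p_0 = 0.2$ and $p_1 = 0.5$ (not from the chapter):

```python
# Stein exponent D(P0||P1) versus Chernoff information C(P0, P1) for
# testing Ber(p0) against Ber(p1). Illustrative parameters only.
import numpy as np

p0, p1 = 0.2, 0.5
P0 = np.array([1 - p0, p0])
P1 = np.array([1 - p1, p1])

def kl(P, Q):
    """KL divergence D(P || Q) in nats."""
    return float(np.sum(P * np.log(P / Q)))

def chernoff_information(P, Q, grid=10_001):
    """C(P, Q) = -min_{0 <= s <= 1} log sum_x P(x)^s Q(x)^(1-s), via grid search."""
    s = np.linspace(0.0, 1.0, grid)[:, None]
    vals = np.log(np.sum(P**s * Q**(1.0 - s), axis=1))
    return float(-vals.min())

print("Stein exponent D(P0||P1):", kl(P0, P1))                    # best Type II exponent
print("Chernoff information    :", chernoff_information(P0, P1))  # symmetric Bayesian exponent
```

The Chernoff information never exceeds either KL divergence, reflecting that the symmetric Bayesian criterion is more demanding than a one-sided error constraint.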
Looking Ahead
Chapter 21 applies the probabilistic tools developed throughout this book to random matrices: the Marchenko-Pastur law describes the limiting eigenvalue distribution of large sample covariance matrices, and deterministic equivalents enable computation of ergodic MIMO capacity without Monte Carlo simulation. The concentration inequalities from this chapter will be essential for controlling the fluctuations of empirical spectral distributions around their deterministic limits.