Chapter 10 Summary: Probability Inequalities
Key Points
1. Markov's inequality ($P(X \ge a) \le \mathbb{E}[X]/a$ for non-negative $X$ and $a > 0$) is the simplest tail bound, using only the first moment of a non-negative RV. Every other inequality in this chapter is a corollary.
2. Chebyshev's inequality ($P(|X - \mu| \ge k\sigma) \le 1/k^2$) follows from applying Markov to $(X - \mu)^2$. It uses two moments but is still loose for light-tailed distributions.
3. The Chernoff bound ($P(X \ge a) \le \min_{t > 0} e^{-ta} M_X(t)$) leverages the full MGF via the exponential tilt trick. It is exponentially tight and captures the correct decay rate for most practical distributions.
4. Hoeffding's inequality ($P(|\bar{X}_n - \mu| \ge \epsilon) \le 2e^{-2n\epsilon^2}$ for independent RVs bounded in $[0,1]$) provides exponential concentration for sums of bounded independent RVs. It is the workhorse for non-asymptotic sample complexity bounds in statistics and learning theory.
5. Jensen's inequality ($\mathbb{E}[f(X)] \ge f(\mathbb{E}[X])$ for convex $f$) connects convexity with expectation. It implies the non-negativity of relative entropy, $D(P \| Q) \ge 0$, in information theory and the capacity bound $\mathbb{E}[\log(1+\gamma)] \le \log(1+\mathbb{E}[\gamma])$ for fading channels.
6. The hierarchy of tightness: Markov (loosest, decay $O(1/a)$) $\to$ Chebyshev ($O(1/k^2)$) $\to$ Chernoff/Hoeffding (exponential in $a$ or $n$). Choose the bound that matches the available information.
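The hierarchy above can be checked numerically. The sketch below, with an arbitrary choice of $n$ and $\epsilon$ (not values from the text), evaluates the three bounds on the upper-tail event $\{S_n/n - 1/2 \ge \epsilon\}$ for a sum $S_n$ of $n$ i.i.d. Bernoulli($1/2$) variables:

```python
import math

# Bounds on P(S_n/n - 1/2 >= eps) for S_n = sum of n iid Bernoulli(1/2).
# n and eps are illustrative choices, not values from the chapter.
n, eps = 100, 0.1

# Markov: uses only E[S_n] = n/2, applied to the non-negative RV S_n.
markov = (n / 2) / (n * (0.5 + eps))

# Chebyshev: uses Var(S_n) = n/4; bounds the (larger) two-sided event.
chebyshev = (n / 4) / (n * eps) ** 2

# Hoeffding: exponential in n * eps^2 for RVs bounded in [0, 1].
hoeffding = math.exp(-2 * n * eps ** 2)

print(f"Markov:    {markov:.4f}")     # 0.8333 (nearly vacuous)
print(f"Chebyshev: {chebyshev:.4f}")  # 0.2500
print(f"Hoeffding: {hoeffding:.4f}")  # 0.1353
```

Even at a modest $n = 100$, the ordering Markov > Chebyshev > Hoeffding is clear, and the gap widens exponentially as $n$ grows.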
Looking Ahead
Chapter 11 builds on these inequalities to establish the different modes of convergence of random variables: almost sure, in probability, in $L^p$, and in distribution. Chebyshev gives the simplest proof of the Weak Law of Large Numbers, while the Chernoff/Hoeffding bounds connect to the Strong Law and large deviations theory.
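As a one-line preview of how Chebyshev yields the Weak Law: for i.i.d. $X_i$ with mean $\mu$ and variance $\sigma^2 < \infty$, the sample mean $\bar{X}_n$ has variance $\sigma^2/n$, so for any $\epsilon > 0$,

```latex
P\bigl(|\bar{X}_n - \mu| \ge \epsilon\bigr)
  \;\le\; \frac{\operatorname{Var}(\bar{X}_n)}{\epsilon^2}
  \;=\; \frac{\sigma^2}{n\epsilon^2}
  \;\xrightarrow{\;n \to \infty\;}\; 0,
```

which is exactly convergence of $\bar{X}_n$ to $\mu$ in probability.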