Hoeffding's Inequality
From Single Variables to Sums
The Chernoff bound applies to individual random variables. In practice we often want to bound the tail of a sum of independent random variables --- for instance, the deviation of a sample mean from the population mean. If the summands are independent, the MGF of the sum factorizes; if each summand is also bounded, each factor can be bounded by a sub-Gaussian expression. This leads to Hoeffding's inequality, a workhorse of modern statistics and learning theory.
Definition: Hoeffding's Lemma
Hoeffding's Lemma
If $X$ is a random variable with $\mathbb{E}[X] = 0$ and $a \le X \le b$ almost surely, then for all $s \in \mathbb{R}$:
$$\mathbb{E}\left[e^{sX}\right] \le \exp\left(\frac{s^2 (b-a)^2}{8}\right)$$
In other words, a bounded zero-mean random variable is sub-Gaussian with parameter $\sigma = \frac{b-a}{2}$.
The proof uses convexity of the exponential function to bound $e^{sX}$ by a linear interpolation between $e^{sa}$ and $e^{sb}$, then optimizes the resulting expression.
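As a sanity check on the lemma, the exact MGF of a centered Bernoulli variable (bounded, zero-mean, range 1) can be compared against the sub-Gaussian bound $e^{s^2/8}$ on a grid of $s$ values; the choice $p = 0.3$ below is illustrative:

```python
import numpy as np

# X = B - p with B ~ Bernoulli(p): E[X] = 0 and X in [-p, 1-p], so b - a = 1.
# Hoeffding's Lemma asserts E[exp(sX)] <= exp(s^2 * (b-a)^2 / 8) = exp(s^2 / 8).
p = 0.3
s = np.linspace(-5.0, 5.0, 201)
mgf = (1 - p) * np.exp(-s * p) + p * np.exp(s * (1 - p))  # exact MGF of X
bound = np.exp(s**2 / 8)
print(np.all(mgf <= bound))  # True: the lemma's bound holds on the whole grid
```

The two curves touch only at $s = 0$, where both equal 1; everywhere else the bound has slack.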
Theorem: Hoeffding's Inequality
Let $X_1, \ldots, X_n$ be independent random variables with $a_i \le X_i \le b_i$ almost surely. Let $S_n = \sum_{i=1}^n X_i$. Then for any $t > 0$:
$$\mathbb{P}\left(S_n - \mathbb{E}[S_n] \ge t\right) \le \exp\left(-\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2}\right)$$
Each bounded summand contributes at most $(b_i - a_i)^2/4$ to the variance of the sum. The exponential tail bound decays at a rate proportional to $t^2/n$ when the ranges are equal --- concentration sharpens as $n$ grows.
Center the variables
Let $Y_i = X_i - \mathbb{E}[X_i]$, so $\mathbb{E}[Y_i] = 0$ and $Y_i \in [a_i - \mathbb{E}[X_i],\, b_i - \mathbb{E}[X_i]]$ with range $b_i - a_i$. Then $S_n - \mathbb{E}[S_n] = \sum_{i=1}^n Y_i$.
Apply Chernoff to the sum
For any $s > 0$:
$$\mathbb{P}\left(\sum_{i=1}^n Y_i \ge t\right) \le e^{-st}\, \mathbb{E}\left[e^{s \sum_{i=1}^n Y_i}\right] = e^{-st} \prod_{i=1}^n \mathbb{E}\left[e^{s Y_i}\right],$$
where independence gives the product.
Apply Hoeffding's Lemma
Each factor is bounded by Hoeffding's Lemma: $\mathbb{E}[e^{sY_i}] \le \exp\left(\frac{s^2 (b_i - a_i)^2}{8}\right)$. Therefore:
$$\mathbb{P}\left(\sum_{i=1}^n Y_i \ge t\right) \le \exp\left(-st + \frac{s^2}{8} \sum_{i=1}^n (b_i - a_i)^2\right)$$
Optimize over $s$
The exponent is a quadratic in $s$, minimized at $s^* = \frac{4t}{\sum_{i=1}^n (b_i - a_i)^2}$. Substituting:
$$\mathbb{P}\left(\sum_{i=1}^n Y_i \ge t\right) \le \exp\left(-\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2}\right)$$
The two-sided bound follows by applying the same argument to $-Y_i$.
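The optimization step is easy to verify numerically: for arbitrary (illustrative) ranges, the quadratic exponent $\varphi(s) = -st + \tfrac{s^2}{8}\sum_i (b_i - a_i)^2$ evaluated at $s^* = 4t/\sum_i (b_i - a_i)^2$ equals $-2t^2/\sum_i (b_i - a_i)^2$ and is no larger than the exponent at any grid point:

```python
import numpy as np

# phi(s) = -s*t + (s^2/8) * R, where R = sum of squared ranges (b_i - a_i)^2.
ranges_sq = np.array([1.0, 4.0, 0.25])   # illustrative squared ranges
R = ranges_sq.sum()
t = 2.0
phi = lambda s: -s * t + (s**2 / 8) * R
s_star = 4 * t / R                        # the minimizer from the proof
grid = np.linspace(0.01, 10.0, 1000)
assert abs(phi(s_star) - (-2 * t**2 / R)) < 1e-9   # closed-form minimum value
assert phi(s_star) <= phi(grid).min() + 1e-9       # no grid point does better
print(phi(s_star))
```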
Example: Sample Size for an Opinion Poll
A polling company surveys $n$ voters, each responding yes (1) or no (0) independently with unknown probability $p$. How large must $n$ be so that $\mathbb{P}\left(|\hat{p}_n - p| \ge 0.03\right) \le 0.05$?
Apply Hoeffding
Each $X_i \in \{0, 1\}$, so $(b_i - a_i)^2 = 1$. With $\hat{p}_n = \frac{1}{n}\sum_{i=1}^n X_i$, Hoeffding gives:
$$\mathbb{P}\left(|\hat{p}_n - p| \ge \epsilon\right) \le 2 e^{-2n\epsilon^2}$$
Solve for $n$
We need $2 e^{-2n(0.03)^2} \le 0.05$. Taking logs: $2n(0.03)^2 \ge \ln 40 \approx 3.689$, so $n \ge \frac{\ln 40}{2(0.03)^2} \approx 2049.4$.
Interpretation
About 2050 voters suffice to estimate the proportion within $\pm 3$ percentage points with 95% confidence --- and this holds regardless of the true $p$, because Hoeffding uses only the bounded range, not the variance.
Channel Measurement Sample Size via Hoeffding
In wireless communications, we estimate the mean received SNR $\mu$ by averaging $n$ independent measurements $X_i \in [0, B]$. Hoeffding gives
$$n \ge \frac{B^2 \ln(2/\delta)}{2\epsilon^2}$$
to guarantee $\mathbb{P}\left(|\bar{X}_n - \mu| \ge \epsilon\right) \le \delta$. For instance, with illustrative values $B = 10$, $\epsilon = 0.5$, and $\delta = 0.05$, this gives $n = 738$.
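A hedged sketch of the same calculation for a general range $[0, B]$ (the function name and the parameter values are illustrative, matching no particular hardware):

```python
import math

# For X_i in [0, B], rescaling by B gives n >= B^2 * ln(2/delta) / (2*eps^2).
def hoeffding_sample_size_bounded(B, eps, delta):
    return math.ceil(B**2 * math.log(2 / delta) / (2 * eps**2))

print(hoeffding_sample_size_bounded(10.0, 0.5, 0.05))  # 738
```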
Hoeffding Concentration of Sample Mean
Observe how the Hoeffding bound on $\mathbb{P}\left(|\bar{X}_n - \mu| \ge \epsilon\right)$ tightens as the number of samples $n$ increases. Compare with the empirical tail probability from Monte Carlo simulation.
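The comparison described above can be reproduced in a few lines: draw Bernoulli sample means, estimate the tail probability empirically, and set it against $2e^{-2n\epsilon^2}$ (a sketch; the sample sizes, $\epsilon$, and $p = 0.5$ are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
eps, trials, p = 0.1, 20_000, 0.5
for n in (50, 200, 800):
    means = rng.binomial(n, p, size=trials) / n       # Bernoulli sample means
    empirical = np.mean(np.abs(means - p) >= eps)     # Monte Carlo tail estimate
    bound = min(2 * np.exp(-2 * n * eps**2), 1.0)     # Hoeffding upper bound
    print(f"n={n:4d}  empirical={empirical:.4f}  bound={bound:.4f}")
```

The bound always sits above the empirical frequency, and both collapse toward zero exponentially in $n$.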
Hoeffding vs. the CLT
The Central Limit Theorem gives the asymptotic distribution of $\sqrt{n}(\bar{X}_n - \mu)$, but says nothing about finite-$n$ accuracy. Hoeffding's inequality gives a non-asymptotic, distribution-free bound that is valid for every $n$. The price: Hoeffding ignores the variance and uses only the range, so it can be loose for low-variance distributions. The Berry-Esseen theorem bridges the gap by quantifying the CLT approximation error.
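The looseness for low-variance distributions is easy to exhibit. For Bernoulli($p$) with small $p$, the variance $p(1-p)$ is far below the worst case $1/4$ that the range alone allows, so the Hoeffding bound dwarfs the CLT's variance-aware tail estimate (parameters below are illustrative):

```python
import math

def hoeffding_bound(n, eps):
    # Range-only bound: 2*exp(-2*n*eps^2), valid for any distribution on [0, 1].
    return 2 * math.exp(-2 * n * eps**2)

def clt_tail(n, eps, var):
    # CLT approximation: P(|Zbar - mu| >= eps) ~ P(|Z| >= eps*sqrt(n/var)).
    z = eps * math.sqrt(n / var)
    return math.erfc(z / math.sqrt(2))   # = P(|Z| >= z) for standard normal Z

n, eps, p = 1000, 0.02, 0.05             # illustrative parameters
print(hoeffding_bound(n, eps))           # ~0.90: nearly vacuous
print(clt_tail(n, eps, p * (1 - p)))     # ~0.004: variance-aware estimate
```

Keep in mind the asymmetry: the Hoeffding number is a guaranteed bound, while the CLT number is only an approximation whose finite-$n$ error is controlled by Berry-Esseen.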
Quick Check
For i.i.d. $X_i \in [0, 1]$, how does the Hoeffding bound on $\mathbb{P}\left(|\bar{X}_n - \mu| \ge \epsilon\right)$ scale with $n$?
It decays as $e^{-cn}$ for some constant $c > 0$
Hoeffding gives $2e^{-2n\epsilon^2}$, exponential in $n$.
Sub-Gaussian random variable
A zero-mean RV $X$ with $\mathbb{E}[e^{sX}] \le e^{\sigma^2 s^2 / 2}$ for all $s \in \mathbb{R}$. Implies Gaussian-type tail decay $\mathbb{P}(|X| \ge t) \le 2e^{-t^2/(2\sigma^2)}$.
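For a concrete instance, a standard Gaussian ($\sigma = 1$) satisfies the MGF condition with equality, and its exact two-sided tail sits below the stated bound:

```python
import math

# X ~ N(0, 1): exact tail P(|X| >= t) = erfc(t / sqrt(2)) versus 2*exp(-t^2/2).
t = 2.0
exact = math.erfc(t / math.sqrt(2))
bound = 2 * math.exp(-t**2 / 2)
print(exact <= bound)  # True
```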
Related: {{Ref:Gloss Concentration}}