Hoeffding's Inequality

From Single Variables to Sums

The Chernoff bound applies to individual random variables. In practice we often want to bound the tail of a sum of independent random variables --- for instance, the deviation of a sample mean from the population mean. If each summand is bounded, the MGF of the sum factorizes and each factor can be bounded by a sub-Gaussian expression. This leads to Hoeffding's inequality, a workhorse of modern statistics and learning theory.

Lemma: Hoeffding's Lemma

If $X$ is a random variable with $\mathbb{E}[X] = 0$ and $a \leq X \leq b$ almost surely, then for all $t \in \mathbb{R}$:

$$\mathbb{E}[e^{tX}] \leq \exp\!\left(\frac{t^2(b-a)^2}{8}\right).$$

In other words, a bounded zero-mean random variable is sub-Gaussian with parameter $(b-a)/2$.

The proof uses convexity of the exponential function to bound $e^{tx}$ by a linear interpolation between $e^{ta}$ and $e^{tb}$, then optimizes the resulting expression.
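The lemma is easy to sanity-check numerically. The sketch below (all parameters illustrative) estimates the MGF of a centered uniform variable on $[a, b]$ by Monte Carlo and verifies it stays below the sub-Gaussian bound $\exp(t^2(b-a)^2/8)$:

```python
import math
import random

# Numerical sanity check of Hoeffding's lemma (illustrative parameters).
# X ~ Uniform(a, b), centered so E[X] = 0; bound: E[e^{tX}] <= exp(t^2 (b-a)^2 / 8).
random.seed(0)
a, b = -1.0, 3.0
mid = (a + b) / 2                                   # mean of Uniform(a, b)
samples = [random.uniform(a, b) - mid for _ in range(200_000)]

for t in (0.5, 1.0, 2.0):
    mgf = sum(math.exp(t * x) for x in samples) / len(samples)
    bound = math.exp(t**2 * (b - a)**2 / 8)
    assert mgf <= bound
    print(f"t={t}: empirical MGF {mgf:.4f} <= bound {bound:.4f}")
```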


Theorem: Hoeffding's Inequality

Let $X_1, \ldots, X_n$ be independent random variables with $a_i \leq X_i \leq b_i$ almost surely. Let $S_n = \sum_{i=1}^n X_i$. Then for any $t > 0$:

$$\mathbb{P}(S_n - \mathbb{E}[S_n] \geq t) \leq \exp\!\left(-\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2}\right),$$

$$\mathbb{P}(|S_n - \mathbb{E}[S_n]| \geq t) \leq 2\exp\!\left(-\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2}\right).$$

Each bounded summand contributes at most $(b_i - a_i)^2/4$ to the variance of the sum. The exponential tail bound decays at a rate proportional to $t^2/n$ when the ranges are equal: concentration sharpens as $n$ grows.
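The two-sided bound is simple to compute directly. A minimal sketch (the function name and example parameters are illustrative, not from the text):

```python
import math

def hoeffding_two_sided(t, ranges):
    """Hoeffding bound on P(|S_n - E[S_n]| >= t) for summands with ranges (a_i, b_i)."""
    denom = sum((b - a) ** 2 for a, b in ranges)
    return min(1.0, 2 * math.exp(-2 * t**2 / denom))

# Example: n = 100 summands in [0, 1], deviation of the sum by t = 10 from its mean.
print(hoeffding_two_sided(10, [(0.0, 1.0)] * 100))  # 2 * exp(-2) ~= 0.2707
```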


Example: Sample Size for an Opinion Poll

A polling company surveys $n$ voters, each responding yes (1) or no (0) independently with unknown probability $p$. How large must $n$ be so that $\mathbb{P}(|\bar{X}_n - p| \geq 0.03) \leq 0.05$?
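Since each $X_i \in [0, 1]$, the two-sided bound for the sample mean is $2\exp(-2n\epsilon^2)$; setting this $\leq \delta$ and solving gives $n \geq \ln(2/\delta)/(2\epsilon^2)$. A quick computation:

```python
import math

# Smallest n with 2*exp(-2*n*eps^2) <= delta, i.e. n >= ln(2/delta) / (2*eps^2).
eps, delta = 0.03, 0.05
n = math.ceil(math.log(2 / delta) / (2 * eps**2))
print(n)  # 2050
```

So roughly 2,050 respondents suffice, regardless of the unknown $p$.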

⚠️Engineering Note

Channel Measurement Sample Size via Hoeffding

In wireless communications, we estimate the mean received SNR by averaging $n$ independent measurements $X_i \in [0, \text{SNR}_{\max}]$. Hoeffding gives

$$n \geq \frac{\text{SNR}_{\max}^2}{2\epsilon^2} \ln\!\left(\frac{2}{\delta}\right)$$

to guarantee $\mathbb{P}(|\bar{X}_n - \mathbb{E}[X]| \geq \epsilon) \leq \delta$. For $\text{SNR}_{\max} = 30\text{ dB}$, $\epsilon = 1\text{ dB}$, and $\delta = 0.01$, this gives $n \geq 2385$.
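The sample-size formula from the note can be checked in a couple of lines (parameter values taken from the note above):

```python
import math

# n >= SNR_max^2 / (2 eps^2) * ln(2/delta), from the engineering note.
snr_max, eps, delta = 30.0, 1.0, 0.01
n = math.ceil(snr_max**2 / (2 * eps**2) * math.log(2 / delta))
print(n)  # 2385
```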

Hoeffding Concentration of Sample Mean

Observe how the Hoeffding bound on $\mathbb{P}(|\bar{X}_n - \mu| \geq \epsilon)$ tightens as the number of samples $n$ increases. Compare with the empirical tail probability from Monte Carlo simulation.

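The comparison described above can also be run offline. A minimal Monte Carlo sketch with illustrative parameters ($n = 100$, $\epsilon = 0.1$, Bernoulli(0.5) samples):

```python
import math
import random

# Empirical tail probability of the sample mean vs. the Hoeffding bound
# (illustrative parameters; any bounded i.i.d. distribution would do).
random.seed(1)
n, eps, trials = 100, 0.1, 20_000
mu = 0.5
hits = 0
for _ in range(trials):
    mean = sum(random.random() < mu for _ in range(n)) / n
    if abs(mean - mu) >= eps:
        hits += 1
empirical = hits / trials
bound = 2 * math.exp(-2 * n * eps**2)   # ranges are [0, 1], so (b - a)^2 = 1
assert empirical <= bound
print(f"empirical {empirical:.4f} <= Hoeffding bound {bound:.4f}")
```

For these parameters the bound is $2e^{-2} \approx 0.27$, while the empirical tail is much smaller, illustrating that Hoeffding is valid but not tight.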

Hoeffding vs. the CLT

The Central Limit Theorem gives the asymptotic distribution $\sqrt{n}(\bar{X}_n - \mu)/\sigma \to \mathcal{N}(0,1)$, but says nothing about finite-$n$ accuracy. Hoeffding's inequality gives a non-asymptotic, distribution-free bound that is valid for every $n$. The price: Hoeffding ignores the variance and uses only the range, so it can be loose for low-variance distributions. The Berry-Esseen theorem bridges the gap by quantifying the CLT approximation error.
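To see the looseness concretely, compare the range-based Hoeffding bound with a CLT tail approximation for a low-variance example (a sketch; all parameters are illustrative):

```python
import math

# Low-variance example: X_i ~ Bernoulli(0.01) in [0, 1], n = 1000, eps = 0.01.
# Hoeffding sees only the range [0, 1]; the CLT approximation uses the true sigma.
n, p, eps = 1000, 0.01, 0.01
hoeffding = 2 * math.exp(-2 * n * eps**2)     # exceeds 1 here: a vacuous bound
sigma = math.sqrt(p * (1 - p))                # true std dev ~= 0.0995
z = eps * math.sqrt(n) / sigma                # standardized deviation
clt_approx = math.erfc(z / math.sqrt(2))      # ~ 2 * P(Z >= z), Z standard normal
print(f"Hoeffding: {hoeffding:.4f}, CLT approx: {clt_approx:.6f}")
```

Here the Hoeffding bound is vacuous (greater than 1) while the CLT suggests the true tail is tiny: exactly the regime where variance-aware bounds such as Bernstein's inequality pay off.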

Quick Check

For i.i.d. $X_i \in [0, 1]$, how does the Hoeffding bound on $\mathbb{P}(|\bar{X}_n - \mu| \geq \epsilon)$ scale with $n$?

$O(1/n)$

$O(e^{-cn})$ for some $c > 0$

$O(1/\sqrt{n})$

$O(1/n^2)$

Sub-Gaussian random variable

A zero-mean RV $X$ with $\mathbb{E}[e^{tX}] \leq e^{\sigma^2 t^2/2}$ for all $t$. Implies Gaussian-type tail decay $\mathbb{P}(|X| \geq a) \leq 2e^{-a^2/(2\sigma^2)}$.

Related: {{Ref:Gloss Concentration}}