Hoeffding's Inequality

From Single Variables to Sums

The Chernoff bound applies to individual random variables. In practice we often want to bound the tail of a sum of independent random variables --- for instance, the deviation of a sample mean from the population mean. If each summand is bounded, the MGF of the sum factorizes and each factor can be bounded by a sub-Gaussian expression. This leads to Hoeffding's inequality, a workhorse of modern statistics and learning theory.

Lemma: Hoeffding's Lemma

If $X$ is a random variable with $\mathbb{E}[X] = 0$ and $a \leq X \leq b$ almost surely, then for all $t \in \mathbb{R}$:

$$\mathbb{E}[e^{tX}] \leq \exp\!\left(\frac{t^2(b-a)^2}{8}\right).$$

In other words, a bounded zero-mean random variable is sub-Gaussian with parameter $(b-a)/2$.

The proof uses convexity of the exponential function to bound $e^{tx}$ by a linear interpolation between $e^{ta}$ and $e^{tb}$, then optimizes the resulting expression.
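The lemma is easy to sanity-check numerically. The sketch below (all parameters illustrative) estimates the MGF of a centered uniform variable on $[a, b]$ by Monte Carlo and verifies it stays below the sub-Gaussian bound $\exp(t^2(b-a)^2/8)$:

```python
import math
import random

# Numerical sanity check of Hoeffding's lemma (illustrative parameters).
# X ~ Uniform(a, b), centered so E[X] = 0; bound: E[e^{tX}] <= exp(t^2 (b-a)^2 / 8).
random.seed(0)
a, b = -1.0, 3.0
mid = (a + b) / 2                                   # mean of Uniform(a, b)
samples = [random.uniform(a, b) - mid for _ in range(200_000)]

for t in (0.5, 1.0, 2.0):
    mgf = sum(math.exp(t * x) for x in samples) / len(samples)
    bound = math.exp(t**2 * (b - a)**2 / 8)
    assert mgf <= bound
    print(f"t={t}: empirical MGF {mgf:.4f} <= bound {bound:.4f}")
```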


Theorem: Hoeffding's Inequality

Let $X_1, \ldots, X_n$ be independent random variables with $a_i \leq X_i \leq b_i$ almost surely. Let $S_n = \sum_{i=1}^n X_i$. Then for any $t > 0$:

$$\mathbb{P}(S_n - \mathbb{E}[S_n] \geq t) \leq \exp\!\left(-\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2}\right),$$

$$\mathbb{P}(|S_n - \mathbb{E}[S_n]| \geq t) \leq 2\exp\!\left(-\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2}\right).$$

Each bounded summand contributes at most $(b_i - a_i)^2/4$ to the variance of the sum. The exponential tail bound decays at a rate proportional to $t^2/n$ when the ranges are equal: concentration sharpens as $n$ grows.
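The two-sided bound is simple to compute directly. A minimal sketch (the function name and example parameters are illustrative, not from the text):

```python
import math

def hoeffding_two_sided(t, ranges):
    """Hoeffding bound on P(|S_n - E[S_n]| >= t) for summands with ranges (a_i, b_i)."""
    denom = sum((b - a) ** 2 for a, b in ranges)
    return min(1.0, 2 * math.exp(-2 * t**2 / denom))

# Example: n = 100 summands in [0, 1], deviation of the sum by t = 10 from its mean.
print(hoeffding_two_sided(10, [(0.0, 1.0)] * 100))  # 2 * exp(-2) ~= 0.2707
```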


Example: Sample Size for an Opinion Poll

A polling company surveys $n$ voters, each responding yes (1) or no (0) independently with unknown probability $p$. How large must $n$ be so that $\mathbb{P}(|\bar{X}_n - p| \geq 0.03) \leq 0.05$?
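Since each $X_i \in [0, 1]$, the two-sided bound for the sample mean is $2\exp(-2n\epsilon^2)$; setting this $\leq \delta$ and solving gives $n \geq \ln(2/\delta)/(2\epsilon^2)$. A quick computation:

```python
import math

# Smallest n with 2*exp(-2*n*eps^2) <= delta, i.e. n >= ln(2/delta) / (2*eps^2).
eps, delta = 0.03, 0.05
n = math.ceil(math.log(2 / delta) / (2 * eps**2))
print(n)  # 2050
```

So roughly 2,050 respondents suffice, regardless of the unknown $p$.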

⚠️Engineering Note

Channel Measurement Sample Size via Hoeffding

In wireless communications, we estimate the mean received SNR by averaging $n$ independent measurements $X_i \in [0, \text{SNR}_{\max}]$. Hoeffding gives

$$n \geq \frac{\text{SNR}_{\max}^2}{2\epsilon^2} \ln\!\left(\frac{2}{\delta}\right)$$

to guarantee $\mathbb{P}(|\bar{X}_n - \mathbb{E}[X]| \geq \epsilon) \leq \delta$. For $\text{SNR}_{\max} = 30\text{ dB}$, $\epsilon = 1\text{ dB}$, and $\delta = 0.01$, this gives $n \geq 2385$.
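The sample-size formula from the note can be checked in a couple of lines (parameter values taken from the note above):

```python
import math

# n >= SNR_max^2 / (2 eps^2) * ln(2/delta), from the engineering note.
snr_max, eps, delta = 30.0, 1.0, 0.01
n = math.ceil(snr_max**2 / (2 * eps**2) * math.log(2 / delta))
print(n)  # 2385
```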

Hoeffding Concentration of Sample Mean

Observe how the Hoeffding bound on $\mathbb{P}(|\bar{X}_n - \mu| \geq \epsilon)$ tightens as the number of samples $n$ increases. Compare with the empirical tail probability from Monte Carlo simulation.

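The comparison described above can also be run offline. A minimal Monte Carlo sketch with illustrative parameters ($n = 100$, $\epsilon = 0.1$, Bernoulli(0.5) samples):

```python
import math
import random

# Empirical tail probability of the sample mean vs. the Hoeffding bound
# (illustrative parameters; any bounded i.i.d. distribution would do).
random.seed(1)
n, eps, trials = 100, 0.1, 20_000
mu = 0.5
hits = 0
for _ in range(trials):
    mean = sum(random.random() < mu for _ in range(n)) / n
    if abs(mean - mu) >= eps:
        hits += 1
empirical = hits / trials
bound = 2 * math.exp(-2 * n * eps**2)   # ranges are [0, 1], so (b - a)^2 = 1
assert empirical <= bound
print(f"empirical {empirical:.4f} <= Hoeffding bound {bound:.4f}")
```

For these parameters the bound is $2e^{-2} \approx 0.27$, while the empirical tail is much smaller, illustrating that Hoeffding is valid but not tight.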

Hoeffding vs. the CLT

The Central Limit Theorem gives the asymptotic distribution $\sqrt{n}(\bar{X}_n - \mu)/\sigma \to \mathcal{N}(0,1)$, but says nothing about finite-$n$ accuracy. Hoeffding's inequality gives a non-asymptotic, distribution-free bound that is valid for every $n$. The price: Hoeffding ignores the variance and uses only the range, so it can be loose for low-variance distributions. The Berry-Esseen theorem bridges the gap by quantifying the CLT approximation error.
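To see the looseness concretely, compare the range-based Hoeffding bound with a CLT tail approximation for a low-variance example (a sketch; all parameters are illustrative):

```python
import math

# Low-variance example: X_i ~ Bernoulli(0.01) in [0, 1], n = 1000, eps = 0.01.
# Hoeffding sees only the range [0, 1]; the CLT approximation uses the true sigma.
n, p, eps = 1000, 0.01, 0.01
hoeffding = 2 * math.exp(-2 * n * eps**2)     # exceeds 1 here: a vacuous bound
sigma = math.sqrt(p * (1 - p))                # true std dev ~= 0.0995
z = eps * math.sqrt(n) / sigma                # standardized deviation
clt_approx = math.erfc(z / math.sqrt(2))      # ~ 2 * P(Z >= z), Z standard normal
print(f"Hoeffding: {hoeffding:.4f}, CLT approx: {clt_approx:.6f}")
```

Here the Hoeffding bound is vacuous (greater than 1) while the CLT suggests the true tail is tiny: exactly the regime where variance-aware bounds such as Bernstein's inequality pay off.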

Quick Check

For i.i.d. $X_i \in [0, 1]$, how does the Hoeffding bound on $\mathbb{P}(|\bar{X}_n - \mu| \geq \epsilon)$ scale with $n$?

$O(1/n)$

$O(e^{-cn})$ for some $c > 0$

$O(1/\sqrt{n})$

$O(1/n^2)$

Sub-Gaussian random variable

A zero-mean RV $X$ with $\mathbb{E}[e^{tX}] \leq e^{\sigma^2 t^2/2}$ for all $t$. Implies Gaussian-type tail decay $\mathbb{P}(|X| \geq a) \leq 2e^{-a^2/(2\sigma^2)}$.

Related: {{Ref:Gloss Concentration}}