Sub-Gaussian Random Variables

A Hierarchy of Tail Behavior

The Chernoff bound from Chapter 10 gives exponential tail bounds whenever the MGF exists. But for many applications — finite blocklength information theory, compressed sensing, machine learning generalization bounds — we need a systematic way to classify random variables by their tail behavior. Sub-Gaussian random variables form the "well-behaved" class: their tails decay at least as fast as a Gaussian's, even though they may not be Gaussian at all. Bounded random variables, Rademacher variables, and any random variable with a light enough tail fall into this class — and all of them inherit the same clean concentration inequalities.

Definition:

Sub-Gaussian Random Variable

A zero-mean random variable $X$ is sub-Gaussian with parameter $\sigma_{\mathrm{sg}}$ (or $\sigma_{\mathrm{sg}}$-sub-Gaussian) if its moment generating function satisfies
$$\mathbb{E}[e^{tX}] \leq \exp\!\left(\frac{\sigma_{\mathrm{sg}}^2 t^2}{2}\right) \quad \text{for all } t \in \mathbb{R}.$$
The parameter $\sigma_{\mathrm{sg}}^2$ is called the sub-Gaussian variance proxy (or sub-Gaussian parameter). A general (non-zero-mean) random variable $X$ is $\sigma_{\mathrm{sg}}$-sub-Gaussian if $X - \mathbb{E}[X]$ is.

The condition says the MGF of $X$ is dominated by the MGF of a $\mathcal{N}(0, \sigma_{\mathrm{sg}}^2)$ random variable. This does not mean $X$ is Gaussian; it means its tails are at most as heavy as a Gaussian's. The sub-Gaussian parameter $\sigma_{\mathrm{sg}}^2$ may be larger than $\mathrm{Var}(X)$.
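The MGF-domination condition can be checked directly for a concrete case. A minimal sketch: a Rademacher variable ($\pm 1$ with probability $1/2$ each) has exact MGF $\cosh(t)$, which is dominated by the $\mathcal{N}(0,1)$ MGF $e^{t^2/2}$, so it is $1$-sub-Gaussian even though it is as far from Gaussian as a two-point distribution can be.

```python
import math

# A Rademacher variable (+1 or -1 with prob 1/2) has MGF cosh(t).
# Term-by-term, t^{2k}/(2k)! <= t^{2k}/(2^k k!), so cosh(t) <= exp(t^2/2):
# the Rademacher variable is 1-sub-Gaussian.
def rademacher_mgf(t: float) -> float:
    return math.cosh(t)

def gaussian_mgf(t: float, sigma: float = 1.0) -> float:
    return math.exp(sigma**2 * t**2 / 2)

for t in [-3.0, -1.0, -0.1, 0.0, 0.1, 1.0, 3.0]:
    assert rademacher_mgf(t) <= gaussian_mgf(t)
```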

Definition:

Sub-Gaussian Norm

The sub-Gaussian norm (or $\psi_2$-norm) of a random variable $X$ is
$$\|X\|_{\psi_2} = \inf\!\left\{t > 0 : \mathbb{E}\!\left[\exp\!\left(\frac{X^2}{t^2}\right)\right] \leq 2\right\}.$$
A random variable is sub-Gaussian if and only if $\|X\|_{\psi_2} < \infty$. The sub-Gaussian parameter satisfies $\sigma_{\mathrm{sg}} \leq C\|X\|_{\psi_2}$ for a universal constant $C$.

Theorem: Hoeffding's Lemma

Let $X$ be a random variable with $\mathbb{E}[X] = 0$ and $a \leq X \leq b$ almost surely. Then $X$ is sub-Gaussian with parameter $\sigma_{\mathrm{sg}} = (b-a)/2$:
$$\mathbb{E}[e^{tX}] \leq \exp\!\left(\frac{t^2(b-a)^2}{8}\right) \quad \text{for all } t \in \mathbb{R}.$$

A bounded random variable cannot have heavier tails than a Gaussian: boundedness constrains the MGF to grow at most quadratically in the exponent. The factor of 8 (not 2) reflects that $(b-a)/2$ is a conservative proxy for the standard deviation.
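The lemma can be sanity-checked against a distribution whose MGF is known in closed form. For a uniform$[-1,1]$ variable ($a = -1$, $b = 1$), the exact MGF is $\sinh(t)/t$, and Hoeffding's lemma asserts it is at most $\exp(t^2(b-a)^2/8) = \exp(t^2/2)$:

```python
import math

# Hoeffding's lemma check for uniform[-1, 1]: a = -1, b = 1, so the lemma
# gives E[e^{tX}] <= exp(t^2 (b-a)^2 / 8) = exp(t^2 / 2).
# The exact MGF of uniform[-1, 1] is sinh(t)/t (with value 1 at t = 0).
def uniform_mgf(t: float) -> float:
    return math.sinh(t) / t if t != 0 else 1.0

for t in [-4.0, -1.0, -0.01, 0.0, 0.01, 1.0, 4.0]:
    assert uniform_mgf(t) <= math.exp(t * t / 2)
```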

Theorem: Hoeffding's Inequality

Let $X_1, \ldots, X_n$ be independent (not necessarily identically distributed) random variables with $a_i \leq X_i \leq b_i$. Let $\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i$. Then for any $t > 0$:
$$\mathbb{P}\!\left(\bar{X}_n - \mathbb{E}[\bar{X}_n] \geq t\right) \leq \exp\!\left(-\frac{2n^2 t^2}{\sum_{i=1}^n (b_i - a_i)^2}\right).$$
For identically bounded variables ($a_i = a$, $b_i = b$ for all $i$):
$$\mathbb{P}\!\left(|\bar{X}_n - \mu| \geq t\right) \leq 2\exp\!\left(-\frac{2nt^2}{(b-a)^2}\right).$$

The sum of independent sub-Gaussian variables is sub-Gaussian, and the variance proxies add: if each $X_i$ is $\sigma$-sub-Gaussian, then $\sum_i X_i$ is $\sqrt{n}\,\sigma$-sub-Gaussian. Hoeffding's inequality is simply the Chernoff bound applied to the MGF bound from Hoeffding's lemma.

Example: Polling Accuracy via Hoeffding's Inequality

A poll samples $n$ voters, each independently supporting candidate A with unknown probability $p$; the empirical proportion is $\hat{p} = \bar{X}_n$ with $X_i \in \{0, 1\}$. How large must $n$ be to guarantee $\mathbb{P}(|\hat{p} - p| \geq 0.03) \leq 0.05$? Applying the two-sided Hoeffding bound with $b - a = 1$, we need $2\exp(-2n(0.03)^2) \leq 0.05$, i.e. $n \geq \ln(40)/(2 \cdot 0.0009) \approx 2050$: notably more than the $n = 1000$ a typical poll uses.
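The sample-size calculation above generalizes to any accuracy $t$ and failure probability $\delta$: solving $2e^{-2nt^2} \leq \delta$ for $n$ gives $n \geq \ln(2/\delta)/(2t^2)$. A short sketch:

```python
import math

# Required sample size from the two-sided Hoeffding bound for variables in
# [0, 1]:  2 exp(-2 n t^2) <= delta  =>  n >= ln(2/delta) / (2 t^2).
def hoeffding_sample_size(t: float, delta: float) -> int:
    return math.ceil(math.log(2 / delta) / (2 * t**2))

n_req = hoeffding_sample_size(t=0.03, delta=0.05)
print(n_req)  # 2050
```

Note that Hoeffding is distribution-free; a variance-aware bound (e.g. using $p(1-p) \leq 1/4$, or Bernstein when $p$ is far from $1/2$) can reduce the required $n$.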

Comparison of Tail Bounds

| Bound | Requirement | Tail Decay | Tightness |
|---|---|---|---|
| Markov | Non-negative RV | $O(1/t)$ | Very loose |
| Chebyshev | Finite variance | $O(1/t^2)$ | Loose |
| Chernoff (general) | Finite MGF in an interval | $e^{-ct}$ for some $c$ | Tight rate, loose constant |
| Hoeffding | Bounded RVs | $e^{-2nt^2/(b-a)^2}$ | Good for bounded, loose for small variance |
| Sub-Gaussian | Sub-Gaussian | $e^{-t^2/(2\sigma_{\mathrm{sg}}^2)}$ | Tight for Gaussian-like tails |
| Bernstein | Bounded + variance info | $e^{-cnt^2/(\sigma^2 + bt)}$ | Best of Hoeffding and Chebyshev |

Comparing Tail Bounds: Markov, Chebyshev, Hoeffding, Chernoff

Compare the quality of different tail bounds for $\mathbb{P}(\bar{X}_n \geq a)$ where $X_i$ are i.i.d. bounded random variables. See how the Hoeffding and Chernoff bounds dramatically improve on Markov and Chebyshev.

[Interactive demo; default parameters: 50, 0.7]
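The same comparison can be reproduced without the interactive demo. A minimal sketch, assuming the demo's two parameters are the sample size $n = 50$ and the threshold $a = 0.7$, with $X_i$ i.i.d. Bernoulli($1/2$) on $[0,1]$:

```python
import math

# Tail bounds on P(Xbar_n >= a) for X_i i.i.d. Bernoulli(1/2) on [0, 1].
# Assumed parameters: n = 50, a = 0.7, so t = a - mu = 0.2.
n, a_thr, mu, var = 50, 0.7, 0.5, 0.25
t = a_thr - mu

markov = mu / a_thr                   # needs only non-negativity
chebyshev = var / (n * t**2)          # needs finite variance
hoeffding = math.exp(-2 * n * t**2)   # needs boundedness

print(markov, chebyshev, hoeffding)
# Hoeffding is orders of magnitude tighter than Markov/Chebyshev here.
assert hoeffding < chebyshev < markov
```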

Definition:

Sub-Exponential Random Variable

A zero-mean random variable $X$ is sub-exponential with parameters $(\nu^2, \alpha)$ if
$$\mathbb{E}[e^{tX}] \leq \exp\!\left(\frac{\nu^2 t^2}{2}\right) \quad \text{for all } |t| < \frac{1}{\alpha}.$$
Equivalently, $X$ is sub-exponential if $\|X\|_{\psi_1} < \infty$, where
$$\|X\|_{\psi_1} = \inf\!\left\{t > 0 : \mathbb{E}\!\left[\exp\!\left(\frac{|X|}{t}\right)\right] \leq 2\right\}.$$

Sub-exponential is a weaker condition than sub-Gaussian. The key distinction: a sub-Gaussian variable has its MGF bounded by a Gaussian MGF for all $t$, while a sub-exponential variable only needs this for $|t|$ small enough. Products and squares of sub-Gaussian variables are sub-exponential but typically not sub-Gaussian.
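The canonical example is the square of a standard Gaussian, which is $\chi^2_1$-distributed: its MGF $(1-2t)^{-1/2}$ is finite only for $t < 1/2$, so no Gaussian MGF bound can hold for all $t$. A short sketch:

```python
import math

# The square of a standard Gaussian is chi-squared with 1 degree of
# freedom. Its MGF is (1 - 2t)^(-1/2), finite only for t < 1/2:
# sub-exponential behavior, not sub-Gaussian.
def chi2_mgf(t: float) -> float:
    if t >= 0.5:
        return math.inf  # MGF diverges at and beyond t = 1/2
    return (1 - 2 * t) ** -0.5

assert chi2_mgf(0.4) < math.inf   # fine below the threshold
assert chi2_mgf(0.6) == math.inf  # no Gaussian domination for all t
```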

Theorem: Bernstein's Inequality

Let $X_1, \ldots, X_n$ be independent zero-mean random variables with $|X_i| \leq K$ almost surely. Then for any $t > 0$:
$$\mathbb{P}\!\left(\frac{1}{n}\sum_{i=1}^n X_i \geq t\right) \leq \exp\!\left(-\frac{nt^2/2}{\sigma^2 + Kt/3}\right),$$
where $\sigma^2 = \frac{1}{n}\sum_{i=1}^n \mathbb{E}[X_i^2]$.

Bernstein's inequality interpolates between Hoeffding (for large $t$, where the linear term $Kt/3$ dominates) and a variance-sensitive Gaussian bound (for small $t$, where $\sigma^2$ dominates). When $\sigma^2 \ll K^2$, Bernstein is much tighter than Hoeffding.
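The small-variance advantage is easy to quantify. A sketch with illustrative values ($K = 1$, $\sigma^2 = 0.01$, $n = 1000$, $t = 0.05$): Hoeffding only sees the range $[-1, 1]$, while Bernstein exploits the small variance.

```python
import math

# Hoeffding vs Bernstein for zero-mean X_i with |X_i| <= K = 1 but small
# variance sigma^2 = 0.01 (illustrative values).
n, t, K, sigma2 = 1000, 0.05, 1.0, 0.01

# Hoeffding with b - a = 2K (range [-K, K]).
hoeffding = math.exp(-2 * n * t**2 / (2 * K) ** 2)
# Bernstein uses the actual variance sigma^2.
bernstein = math.exp(-(n * t**2 / 2) / (sigma2 + K * t / 3))

print(hoeffding, bernstein)
# Bernstein is dramatically smaller because sigma^2 << K^2.
assert bernstein < hoeffding
```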

Definition:

Matrix-Valued Random Variables

A random Hermitian matrix $\mathbf{X} \in \mathbb{C}^{d \times d}$ with $\mathbb{E}[\mathbf{X}] = \mathbf{0}$ is called matrix sub-Gaussian if its matrix MGF satisfies
$$\mathbb{E}[e^{t\mathbf{X}}] \preceq \exp\!\left(\frac{t^2 \sigma^2}{2}\mathbf{I}\right)$$
in the positive semidefinite (Löwner) order, for all $t \in \mathbb{R}$.

Theorem: Matrix Bernstein Inequality

Let $\mathbf{X}_1, \ldots, \mathbf{X}_n$ be independent random Hermitian matrices of dimension $d$ with $\mathbb{E}[\mathbf{X}_k] = \mathbf{0}$ and $\|\mathbf{X}_k\| \leq K$ a.s. (operator norm). Define the matrix variance parameter
$$\sigma^2 = \left\|\sum_{k=1}^n \mathbb{E}[\mathbf{X}_k^2]\right\|.$$
Then for any $t > 0$:
$$\mathbb{P}\!\left(\left\|\sum_{k=1}^n \mathbf{X}_k\right\| \geq t\right) \leq 2d \cdot \exp\!\left(-\frac{t^2/2}{\sigma^2 + Kt/3}\right).$$

This is the matrix analogue of Bernstein's inequality. The extra factor of $2d$ (the dimension) comes from a union bound over eigenvalues, a remarkably small price for extending scalar concentration to matrices. The bound is tight enough to be useful for $d$ up to thousands.
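The theorem can be stress-tested with a Monte Carlo simulation. A minimal sketch under an assumed toy model: $\mathbf{X}_k = \varepsilon_k \mathbf{A}$ with $\varepsilon_k$ Rademacher and $\mathbf{A}$ a fixed symmetric matrix with $\|\mathbf{A}\| \leq K$, so $\mathbb{E}[\mathbf{X}_k] = \mathbf{0}$ and $\sum_k \mathbb{E}[\mathbf{X}_k^2] = n\mathbf{A}^2$.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, K = 5, 2000, 1.0

# Toy model: X_k = eps_k * A with eps_k Rademacher, A fixed symmetric,
# ||A|| <= K. Then sum_k E[X_k^2] = n A^2.
A = np.diag(np.linspace(0.2, 1.0, d))
sigma2 = np.linalg.norm(n * (A @ A), 2)   # matrix variance parameter

def matrix_bernstein_bound(t: float) -> float:
    return 2 * d * np.exp(-(t**2 / 2) / (sigma2 + K * t / 3))

# Monte Carlo estimate of P(||sum_k X_k|| >= t) at one threshold.
t, trials, hits = 150.0, 500, 0
for _ in range(trials):
    eps = rng.choice([-1.0, 1.0], size=n)
    S = eps.sum() * A                     # sum_k eps_k A
    if np.linalg.norm(S, 2) >= t:
        hits += 1

# The empirical exceedance frequency respects the theoretical bound.
assert hits / trials <= matrix_bernstein_bound(t)
```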

⚠️Engineering Note

Matrix Bernstein in Massive MIMO Channel Estimation

In massive MIMO, the sample covariance matrix $\hat{\mathbf{R}} = \frac{1}{n}\sum_{k=1}^n \mathbf{h}_k\mathbf{h}_k^H$ estimates the true spatial covariance $\mathbf{R}$. The matrix Bernstein inequality bounds the operator norm error $\|\hat{\mathbf{R}} - \mathbf{R}\|$: with $n = O(d\log d / \epsilon^2)$ samples, the error is at most $\epsilon\|\mathbf{R}\|$ with high probability. This tells the system designer how many pilot transmissions are needed to learn the channel covariance to a given accuracy, a question that directly impacts the overhead of covariance-aided beamforming in 5G NR massive MIMO.
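The qualitative prediction, that the relative operator-norm error shrinks as $n$ grows, is easy to observe in simulation. A sketch under assumed toy choices: an exponential-correlation model for $\mathbf{R}$ and real Gaussian channel vectors as a proxy for the complex case.

```python
import numpy as np

# Assumed toy model: exponential correlation R[i,j] = 0.9^|i-j|, with
# real Gaussian channel vectors as a stand-in for complex ones.
rng = np.random.default_rng(1)
d = 16
R = np.array([[0.9 ** abs(i - j) for j in range(d)] for i in range(d)])
L = np.linalg.cholesky(R)

def cov_error(n: int) -> float:
    # Relative operator-norm error of the sample covariance from n samples.
    H = L @ rng.standard_normal((d, n))   # columns are h_k ~ N(0, R)
    R_hat = H @ H.T / n
    return np.linalg.norm(R_hat - R, 2) / np.linalg.norm(R, 2)

errs = [cov_error(n) for n in (100, 1000, 10000)]
assert errs[0] > errs[2]   # more samples, smaller relative error
```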


Sub-Gaussian

A random variable whose moment generating function is dominated by that of a Gaussian. Equivalently, its tails decay at least as fast as $e^{-ct^2}$ for some constant $c$. The class includes all bounded random variables, Gaussian variables, and their sums.

Related: Sub-Exponential Random Variable, Hoeffding's Inequality

Sub-Exponential

A random variable whose tails decay at least as fast as $e^{-ct}$ for some constant $c$. Weaker than sub-Gaussian: squares of sub-Gaussian variables are sub-exponential. Examples include $\chi^2$ random variables and products of Gaussians.

Related: Sub-Gaussian, Bernstein's Inequality

Common Mistake: Sub-Gaussian Parameter $\neq$ Variance

Mistake:

Treating the sub-Gaussian parameter $\sigma_{\mathrm{sg}}^2$ as equal to $\mathrm{Var}(X)$.

Correction:

The sub-Gaussian parameter is an upper bound on the variance: $\mathrm{Var}(X) \leq \sigma_{\mathrm{sg}}^2$. For a Bernoulli($1/2$) variable (centered), $\mathrm{Var}(X) = 1/4$ and Hoeffding's lemma gives $\sigma_{\mathrm{sg}}^2 = 1/4$ (tight in this case). For a uniform $[-1,1]$ variable, $\mathrm{Var}(X) = 1/3$ but $\sigma_{\mathrm{sg}}^2 = 1$ (the Hoeffding proxy is loose by a factor of 3). Using variance-sensitive bounds (Bernstein) can recover this gap.
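Both examples above reduce to short arithmetic checks:

```python
# Variance vs Hoeffding variance proxy ((b - a)/2)^2.

# Centered Bernoulli(1/2): range [-1/2, 1/2], Var = 1/4, proxy = 1/4.
var_bern, proxy_bern = 0.25, ((0.5 - (-0.5)) / 2) ** 2
assert var_bern == proxy_bern            # tight case

# Uniform[-1, 1]: Var = (b - a)^2 / 12 = 1/3, proxy = 1.
a, b = -1.0, 1.0
var_unif = (b - a) ** 2 / 12
proxy_unif = ((b - a) / 2) ** 2
assert abs(var_unif - 1 / 3) < 1e-12
assert proxy_unif == 1.0                 # loose by a factor of 3
assert var_unif <= proxy_unif            # proxy upper-bounds the variance
```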

Historical Note: Wassily Hoeffding and the Art of Concentration

1963

Wassily Hoeffding (1914--1991), born in Finland to a Russian family, published his celebrated inequality in 1963 while at the University of North Carolina. His paper "Probability Inequalities for Sums of Bounded Random Variables" is one of the most cited papers in probability and statistics. The beauty of Hoeffding's approach is its simplicity: the bounded-variable MGF lemma plus the Chernoff technique yields a distribution-free concentration inequality that remains the default tool whenever variables are known to be bounded. Hoeffding also made fundamental contributions to U-statistics and nonparametric statistics.

Quick Check

If $X_1, \ldots, X_n$ are independent $\sigma$-sub-Gaussian random variables, what is the sub-Gaussian variance proxy of $S_n = \sum_{i=1}^n X_i$?

$n\sigma^2$ (variance proxies add)

$n^2\sigma^2$

$\sigma^2$ (unchanged)

$\sqrt{n}\,\sigma^2$

Key Takeaway

Sub-Gaussian random variables form a natural class for concentration inequalities: their MGFs are bounded by Gaussian MGFs, their tails decay at least as fast as Gaussian tails, and sums of independent sub-Gaussians are sub-Gaussian with additive parameters. Hoeffding's inequality, Bernstein's inequality, and the matrix Bernstein inequality are the workhorses of modern high-dimensional probability and are essential tools for analyzing wireless communication systems with finite block lengths.