Sub-Gaussian Random Variables
A Hierarchy of Tail Behavior
The Chernoff bound from Chapter 10 gives exponential tail bounds whenever the MGF exists. But for many applications — finite blocklength information theory, compressed sensing, machine learning generalization bounds — we need a systematic way to classify random variables by their tail behavior. Sub-Gaussian random variables form the "well-behaved" class: their tails decay at least as fast as a Gaussian's, even though they may not be Gaussian at all. Bounded random variables, Rademacher variables, and any random variable with a light enough tail fall into this class — and all of them inherit the same clean concentration inequalities.
Definition: Sub-Gaussian Random Variable
Sub-Gaussian Random Variable
A zero-mean random variable $X$ is sub-Gaussian with parameter $\sigma^2$ (or $\sigma$-sub-Gaussian) if its moment generating function satisfies $$\mathbb{E}[e^{sX}] \le e^{\sigma^2 s^2/2} \quad \text{for all } s \in \mathbb{R}.$$ The parameter $\sigma^2$ is called the sub-Gaussian variance proxy (or sub-Gaussian parameter). A general (non-zero-mean) random variable $X$ is $\sigma$-sub-Gaussian if $X - \mathbb{E}[X]$ is.
The condition says the MGF of $X$ is dominated by the MGF of a $\mathcal{N}(0, \sigma^2)$ random variable. This does not mean $X$ is Gaussian; it means its tails are at most as heavy as a Gaussian's. The sub-Gaussian parameter $\sigma^2$ may be larger than $\operatorname{Var}(X)$.
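As a quick sanity check (an illustration not in the text; the function names are ours), a Rademacher variable $\varepsilon \in \{-1, +1\}$ has MGF $\cosh(s)$, and the inequality $\cosh(s) \le e^{s^2/2}$ makes it $1$-sub-Gaussian even though its two-point distribution is as far from Gaussian as possible:

```python
import math

def rademacher_mgf(s: float) -> float:
    # E[exp(s * eps)] for eps uniform on {-1, +1} is cosh(s)
    return math.cosh(s)

def gaussian_mgf(s: float, sigma2: float = 1.0) -> float:
    # MGF of N(0, sigma2): exp(sigma2 * s^2 / 2)
    return math.exp(sigma2 * s * s / 2)

# Check the domination E[e^{s eps}] <= e^{s^2/2} on a grid of s values
assert all(rademacher_mgf(s) <= gaussian_mgf(s)
           for s in [x / 10 for x in range(-100, 101)])
```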
Definition: Sub-Gaussian Norm
Sub-Gaussian Norm
The sub-Gaussian norm (or $\psi_2$-norm) of a random variable $X$ is $$\|X\|_{\psi_2} = \inf\{t > 0 : \mathbb{E}[\exp(X^2/t^2)] \le 2\}.$$ A random variable is sub-Gaussian if and only if $\|X\|_{\psi_2} < \infty$. The sub-Gaussian parameter satisfies $\sigma \le C\|X\|_{\psi_2}$ for a universal constant $C$.
Theorem: Hoeffding's Lemma
Let $X$ be a random variable with $\mathbb{E}[X] = 0$ and $a \le X \le b$ almost surely. Then $X$ is sub-Gaussian with parameter $\sigma^2 = (b-a)^2/4$: $$\mathbb{E}[e^{sX}] \le \exp\left(\frac{s^2(b-a)^2}{8}\right) \quad \text{for all } s \in \mathbb{R}.$$
A bounded random variable cannot have heavier tails than a Gaussian: the boundedness constrains the MGF to grow at most quadratically in the exponent. The factor of 8 (not 2) reflects that $(b-a)/2$ is a conservative proxy for the standard deviation.
Use the convexity of the exponential: $e^{sx} \le \frac{b-x}{b-a}e^{sa} + \frac{x-a}{b-a}e^{sb}$ for $x \in [a, b]$.
Take expectations and use $\mathbb{E}[X] = 0$ to simplify.
Apply the elementary inequality $-pu + \log(1 - p + pe^u) \le u^2/8$ for $p \in [0,1]$ and $u \in \mathbb{R}$ (with $p = -a/(b-a)$ and $u = s(b-a)$).
Convexity bound
Since $e^{sx}$ is convex in $x$ and $X \in [a, b]$: $$e^{sX} \le \frac{b-X}{b-a}e^{sa} + \frac{X-a}{b-a}e^{sb}.$$
Take expectation
Taking expectations and using $\mathbb{E}[X] = 0$ (so $\mathbb{E}[X - a] = -a$ and $\mathbb{E}[b - X] = b$): $$\mathbb{E}[e^{sX}] \le \frac{b}{b-a}e^{sa} - \frac{a}{b-a}e^{sb}.$$
Reparametrize
Let $p = -a/(b-a)$ and $u = s(b-a)$. Then $\mathbb{E}[e^{sX}] \le e^{\phi(u)}$ where $\phi(u) = -pu + \log(1 - p + pe^u)$. One shows $\phi(u) \le u^2/8$ by verifying $\phi(0) = \phi'(0) = 0$ and $\phi''(u) \le 1/4$. Therefore $\mathbb{E}[e^{sX}] \le e^{u^2/8} = e^{s^2(b-a)^2/8}$.
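The lemma can be checked numerically (an illustrative sketch; the helper names are ours): for a centered Bernoulli($p$) variable supported on $\{-p, 1-p\}$, the MGF is computable in closed form and must sit below $e^{s^2(b-a)^2/8}$ with $b - a = 1$.

```python
import math

def centered_bernoulli_mgf(s: float, p: float) -> float:
    # X = B - p with B ~ Bernoulli(p): X = -p w.p. 1-p, X = 1-p w.p. p
    return (1 - p) * math.exp(-s * p) + p * math.exp(s * (1 - p))

def hoeffding_bound(s: float, width: float = 1.0) -> float:
    # Hoeffding's lemma: E[e^{sX}] <= exp(s^2 (b-a)^2 / 8)
    return math.exp(s * s * width * width / 8)

# Verify the lemma on a grid of (p, s) pairs
for p in (0.1, 0.3, 0.5, 0.9):
    for s in [x / 4 for x in range(-40, 41)]:
        assert centered_bernoulli_mgf(s, p) <= hoeffding_bound(s) + 1e-12
```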
Theorem: Hoeffding's Inequality
Let $X_1, \dots, X_n$ be independent (not necessarily identically distributed) random variables with $a_i \le X_i \le b_i$. Let $S_n = \sum_{i=1}^n X_i$. Then for any $t > 0$: $$\mathbb{P}\left(\frac{1}{n}(S_n - \mathbb{E}[S_n]) \ge t\right) \le \exp\left(-\frac{2n^2 t^2}{\sum_{i=1}^n (b_i - a_i)^2}\right).$$ For identically bounded variables ($a_i = a$, $b_i = b$ for all $i$): $$\mathbb{P}\left(\frac{1}{n}(S_n - \mathbb{E}[S_n]) \ge t\right) \le \exp\left(-\frac{2nt^2}{(b-a)^2}\right).$$
The sum of independent sub-Gaussian variables is sub-Gaussian, and the sub-Gaussian parameters add. Hoeffding's inequality is the Chernoff bound applied to the sub-Gaussian MGF bound.
Center and apply Chernoff
Let $Y_i = X_i - \mathbb{E}[X_i]$. For $s > 0$: $$\mathbb{P}\left(\sum_{i=1}^n Y_i \ge nt\right) \le e^{-snt} \prod_{i=1}^n \mathbb{E}[e^{sY_i}].$$
Apply Hoeffding's lemma
Each $Y_i \in [a_i - \mathbb{E}[X_i],\, b_i - \mathbb{E}[X_i]]$, an interval of width $b_i - a_i$, so by Hoeffding's lemma: $$\mathbb{E}[e^{sY_i}] \le e^{s^2 (b_i - a_i)^2 / 8}.$$
Optimize over $s$
Minimizing the exponent $-snt + \frac{s^2}{8}\sum_i (b_i - a_i)^2$ over $s > 0$ gives $s^* = 4nt / \sum_i (b_i - a_i)^2$ and the stated bound. $\blacksquare$
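A quick Monte Carlo check (illustrative; the parameters are our choices for the demo): the empirical tail of the sample mean of Uniform$[0,1]$ variables should sit well below the Hoeffding bound $e^{-2nt^2}$.

```python
import math
import random

random.seed(0)
n, t, trials = 100, 0.1, 20000

# X_i ~ Uniform[0, 1], so mu = 0.5 and (b - a)^2 = 1
hits = sum(
    (sum(random.random() for _ in range(n)) / n - 0.5) >= t
    for _ in range(trials)
)
empirical = hits / trials
bound = math.exp(-2 * n * t * t)  # Hoeffding: exp(-2 n t^2 / (b-a)^2)
assert empirical <= bound
```

The empirical frequency is far below the bound here, which is expected: Hoeffding ignores the small variance ($1/12$) of the uniform distribution, a gap Bernstein's inequality (below) exploits.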
Example: Polling Accuracy via Hoeffding's Inequality
A poll samples $n$ voters, each independently supporting candidate A with unknown probability $p$. The empirical proportion is $\hat{p} = \frac{1}{n}\sum_{i=1}^n X_i$ with $X_i \in \{0, 1\}$. How large must $n$ be to guarantee $\mathbb{P}(|\hat{p} - p| \ge 0.03) \le 0.05$?
Apply Hoeffding
Each $X_i \in [0, 1]$, so $(b - a)^2 = 1$. By Hoeffding's inequality (two-sided): $$\mathbb{P}(|\hat{p} - p| \ge t) \le 2e^{-2nt^2}.$$
Solve for $n$
Set $2e^{-2nt^2} \le 0.05$ with $t = 0.03$: $$n \ge \frac{\ln(2/0.05)}{2(0.03)^2} = \frac{\ln 40}{0.0018} \approx 2049.4.$$ About 2050 samples suffice, a distribution-free guarantee.
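The sample-size arithmetic can be reproduced directly (a minimal sketch; the function name is ours):

```python
import math

def hoeffding_sample_size(t: float, delta: float) -> int:
    # Smallest n with 2 * exp(-2 n t^2) <= delta, i.e. n >= ln(2/delta) / (2 t^2)
    return math.ceil(math.log(2 / delta) / (2 * t * t))

n = hoeffding_sample_size(t=0.03, delta=0.05)
assert 2 * math.exp(-2 * n * 0.03 ** 2) <= 0.05
print(n)  # -> 2050
```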
Comparison of Tail Bounds
| Bound | Requirement | Tail Decay | Tightness |
|---|---|---|---|
| Markov | Non-negative RV | $\mathbb{E}[X]/t$ | Very loose |
| Chebyshev | Finite variance | $\sigma^2/t^2$ | Loose |
| Chernoff (general) | Finite MGF in an interval | $e^{-ct}$ for some $c > 0$ | Tight rate, loose constant |
| Hoeffding | Bounded RVs | $e^{-2nt^2/(b-a)^2}$ | Good for bounded, loose for small-variance |
| Sub-Gaussian | $\sigma$-sub-Gaussian | $e^{-t^2/(2\sigma^2)}$ | Tight for Gaussian-like tails |
| Bernstein | Bounded + variance info | $\exp\left(-\frac{t^2/2}{\sigma^2 + bt/3}\right)$ | Best of Hoeffding and Chebyshev |
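To see the hierarchy concretely, the following snippet (with illustrative parameters of our choosing) evaluates the Markov, Chebyshev, and Hoeffding bounds on the same deviation event for the mean of $n$ i.i.d. Bernoulli($1/2$) variables:

```python
import math

n, t, p = 100, 0.1, 0.5           # event: sample mean exceeds p + t
mean = p                          # E[sample mean]
var = p * (1 - p) / n             # Var(sample mean)

markov = mean / (mean + t)        # Markov on the non-negative sample mean
chebyshev = var / (t * t)         # Chebyshev on the deviation
hoeffding = math.exp(-2 * n * t * t)  # Hoeffding, (b - a)^2 = 1

# For these parameters the bounds tighten in this order
assert hoeffding < chebyshev < markov
```

Here Markov gives about $0.83$, Chebyshev about $0.25$, and Hoeffding about $0.14$; the true probability is smaller still.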
Comparing Tail Bounds: Markov, Chebyshev, Hoeffding, Chernoff
Compare the quality of different tail bounds for where are i.i.d. bounded random variables. See how the Hoeffding and Chernoff bounds dramatically improve on Markov and Chebyshev.
Definition: Sub-Exponential Random Variable
Sub-Exponential Random Variable
A zero-mean random variable $X$ is sub-exponential with parameters $(\nu^2, b)$ if $$\mathbb{E}[e^{sX}] \le e^{\nu^2 s^2/2} \quad \text{for all } |s| < \frac{1}{b}.$$ Equivalently, $X$ is sub-exponential if $\|X\|_{\psi_1} < \infty$, where $$\|X\|_{\psi_1} = \inf\{t > 0 : \mathbb{E}[\exp(|X|/t)] \le 2\}.$$
Sub-exponential is a weaker condition than sub-Gaussian. The key distinction: a sub-Gaussian variable has MGF bounded by a Gaussian MGF for all $s \in \mathbb{R}$, while a sub-exponential variable only needs this for $|s|$ small enough. Products and squares of sub-Gaussian variables are sub-exponential but typically not sub-Gaussian.
Theorem: Bernstein's Inequality
Let $X_1, \dots, X_n$ be independent zero-mean random variables with $|X_i| \le b$ almost surely. Then for any $t > 0$: $$\mathbb{P}\left(\left|\frac{1}{n}\sum_{i=1}^n X_i\right| \ge t\right) \le 2\exp\left(-\frac{nt^2/2}{\sigma^2 + bt/3}\right),$$ where $\sigma^2 = \frac{1}{n}\sum_{i=1}^n \mathbb{E}[X_i^2]$.
Bernstein's inequality interpolates between Hoeffding (for large $t$, where the linear term $bt/3$ dominates) and a variance-sensitive Gaussian bound (for small $t$, where $\sigma^2$ dominates). When $\sigma^2 \ll b^2$, Bernstein is much tighter than Hoeffding.
MGF bound for bounded variables
For $|X_i| \le b$ and $0 < s < 3/b$: $$\mathbb{E}[e^{sX_i}] \le \exp\left(\frac{s^2\,\mathbb{E}[X_i^2]/2}{1 - sb/3}\right).$$
Apply Chernoff bound
Applying the Chernoff bound with this MGF estimate and optimizing over $s \in (0, 3/b)$ (the choice $s = t/(\sigma^2 + bt/3)$ works) yields the stated bound; the same argument applied to $-X_i$ gives the two-sided version. $\blacksquare$
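The gap between Hoeffding and Bernstein when $\sigma^2 \ll b^2$ can be seen numerically (illustrative parameters of our choosing):

```python
import math

n, b, t = 1000, 1.0, 0.05
sigma2 = 0.01                     # variance much smaller than b^2 = 1

# Hoeffding for variables in [-b, b] (interval width 2b)
hoeffding = math.exp(-2 * n * t * t / (2 * b) ** 2)
# Bernstein: exp(-(n t^2 / 2) / (sigma^2 + b t / 3))
bernstein = math.exp(-(n * t * t / 2) / (sigma2 + b * t / 3))

# With this small variance, Bernstein is dramatically tighter
assert bernstein < hoeffding
```

Here Hoeffding gives roughly $e^{-1.25} \approx 0.29$ while Bernstein gives roughly $e^{-47}$, because Bernstein sees the small variance and Hoeffding only sees the range.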
Definition: Matrix-Valued Random Variables
Matrix-Valued Random Variables
A random Hermitian matrix $X \in \mathbb{C}^{d \times d}$ with $\mathbb{E}[X] = 0$ is called matrix sub-Gaussian if its matrix MGF satisfies $$\mathbb{E}[e^{sX}] \preceq e^{s^2 V / 2} \quad \text{for all } s \in \mathbb{R},$$ in the positive semidefinite (Löwner) order, for some fixed positive semidefinite matrix $V$.
Theorem: Matrix Bernstein Inequality
Let $X_1, \dots, X_n$ be independent random Hermitian matrices of dimension $d \times d$ with $\mathbb{E}[X_i] = 0$ and $\|X_i\| \le b$ a.s. (operator norm). Define the matrix variance parameter $$\sigma^2 = \left\|\sum_{i=1}^n \mathbb{E}[X_i^2]\right\|.$$ Then for any $t \ge 0$: $$\mathbb{P}\left(\left\|\sum_{i=1}^n X_i\right\| \ge t\right) \le 2d\,\exp\left(-\frac{t^2/2}{\sigma^2 + bt/3}\right).$$
This is the matrix analogue of Bernstein's inequality. The extra factor of $d$ (the dimension) comes from a union bound over eigenvalues, a remarkably small price for extending scalar concentration to matrices. The bound is tight enough to be useful for $d$ up to thousands.
Trace exponential method
For $s > 0$: $$\mathbb{P}\left(\lambda_{\max}\left(\sum_i X_i\right) \ge t\right) \le e^{-st}\,\mathbb{E}\,\mathrm{tr}\,\exp\left(s\sum_i X_i\right).$$ This uses $e^{\lambda_{\max}(Y)} = \lambda_{\max}(e^Y) \le \mathrm{tr}\, e^Y$.
Lieb's concavity and decoupling
By Lieb's concavity theorem and the Golden-Thompson inequality, the trace expectation can be bounded by a product of matrix MGFs, leading to: $$\mathbb{E}\,\mathrm{tr}\,\exp\left(s\sum_i X_i\right) \le d\,\exp\left(\frac{s^2\sigma^2/2}{1 - sb/3}\right) \quad \text{for } 0 < s < 3/b.$$
Optimize and symmetrize
Optimizing over $s$ and applying the same argument to $-X_i$ gives the factor of $2$ and the stated bound. $\blacksquare$
Matrix Bernstein in Massive MIMO Channel Estimation
In massive MIMO, the sample covariance matrix $\hat{R} = \frac{1}{n}\sum_{i=1}^n y_i y_i^H$ estimates the true spatial covariance $R = \mathbb{E}[y y^H]$. The matrix Bernstein inequality bounds the operator norm error $\|\hat{R} - R\|$: with $n$ samples, the error is at most of order $\sqrt{\sigma^2 \log(d)/n}$ with high probability. This tells the system designer how many pilot transmissions are needed to learn the channel covariance to a given accuracy, a question that directly impacts the overhead of covariance-aided beamforming in 5G NR massive MIMO.
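A small simulation (illustrative; the dimensions and the signed rank-one ensemble are our choices, not the MIMO setup itself) checks matrix concentration in operator norm against the matrix Bernstein tail level:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, b = 20, 500, 1.0

# X_i = eps_i * u_i u_i^T: zero-mean rank-one Hermitian with ||X_i|| <= 1
total = np.zeros((d, d))
for _ in range(n):
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)                  # unit vector => ||u u^T|| = 1
    total += rng.choice([-1.0, 1.0]) * np.outer(u, u)

# E[X_i^2] = E[u u^T] = I/d, so sigma^2 = ||(n/d) I|| = n/d
sigma2 = n / d
t = 25.0
tail = 2 * d * np.exp(-(t * t / 2) / (sigma2 + b * t / 3))

op_norm = np.linalg.norm(total, 2)          # operator (spectral) norm
assert tail < 0.01                          # the bound is informative at this t
assert op_norm < t                          # this draw should fall below t
```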
Sub-Gaussian
A random variable whose moment generating function is dominated by that of a Gaussian. Equivalently, its tails decay at least as fast as $e^{-ct^2}$ for some constant $c > 0$. The class includes all bounded random variables, Gaussian variables, and their sums.
Related: Sub-Exponential Random Variable, Hoeffding's Inequality
Sub-Exponential
A random variable whose tails decay at least as fast as $e^{-ct}$ for some constant $c > 0$. Weaker than sub-Gaussian: squares of sub-Gaussian variables are sub-exponential. Examples include $\chi^2$ random variables and products of Gaussians.
Related: Sub-Gaussian, Bernstein's Inequality
Common Mistake: Sub-Gaussian Parameter Variance
Mistake:
Treating the sub-Gaussian parameter $\sigma^2$ as equal to $\operatorname{Var}(X)$.
Correction:
The sub-Gaussian parameter is an upper bound on the variance: $\operatorname{Var}(X) \le \sigma^2$. For a Bernoulli($1/2$) variable, $\operatorname{Var}(X) = 1/4$, and Hoeffding's lemma gives $\sigma^2 = (b-a)^2/4 = 1/4$ (tight in this case). For a uniform $[0,1]$ variable, $\operatorname{Var}(X) = 1/12$ but $\sigma^2 = 1/4$ (the Hoeffding bound is loose by 3x). Using variance-sensitive bounds (Bernstein) can recover this gap.
Historical Note: Wassily Hoeffding and the Art of Concentration
1963: Wassily Hoeffding (1914–1991), born in Finland to a Russian family, published his celebrated inequality in 1963 while at the University of North Carolina. His paper "Probability Inequalities for Sums of Bounded Random Variables" is one of the most cited papers in probability and statistics. The beauty of Hoeffding's approach is its simplicity: the bounded-variable MGF lemma plus the Chernoff technique yields a distribution-free concentration inequality that remains the default tool whenever variables are known to be bounded. Hoeffding also made fundamental contributions to U-statistics and nonparametric statistics.
Quick Check
If $X_1, \dots, X_n$ are independent $\sigma^2$-sub-Gaussian random variables, what is the sub-Gaussian parameter of $S_n = \sum_{i=1}^n X_i$?
$n\sigma^2$ (parameters add)
$\sigma^2$ (unchanged)
For independent sub-Gaussian variables, $\mathbb{E}[e^{sS_n}] = \prod_{i=1}^n \mathbb{E}[e^{sX_i}] \le e^{n\sigma^2 s^2/2}$, so the parameters add: $S_n$ is $n\sigma^2$-sub-Gaussian.
Key Takeaway
Sub-Gaussian random variables form a natural class for concentration inequalities: their MGFs are bounded by Gaussian MGFs, their tails decay at least as fast as Gaussian tails, and sums of independent sub-Gaussians are sub-Gaussian with additive parameters. Hoeffding's inequality, Bernstein's inequality, and the matrix Bernstein inequality are the workhorses of modern high-dimensional probability and are essential tools for analyzing wireless communication systems with finite block lengths.