Large Deviations and Cramer's Theorem
Beyond the CLT: Rare Events
The CLT tells us that is approximately , so the probability that deviates from by more than is roughly , which decays like for an appropriate constant .
But the CLT only captures the behavior in the bulk of the distribution. For large deviations β the probability that exceeds for β the CLT's Gaussian approximation gives the wrong exponential rate. Cramer's theorem provides the exact rate, expressed through the Fenchel-Legendre transform of the cumulant generating function.
Definition: Fenchel-Legendre Transform (Rate Function)
Fenchel-Legendre Transform (Rate Function)
Let be the cumulant generating function. The Fenchel-Legendre transform (or rate function) is
For , the supremum is achieved at the unique satisfying .
Geometrically, is the gap between the line and the convex curve , maximized over . Since is convex with , the rate function for and .
Rate Function
The Fenchel-Legendre transform . It gives the exponential decay rate of for i.i.d. sums: .
Related: Cumulant Generating Function, Chernoff Bound
Theorem: The Chernoff Bound
For any and :
Apply Markov's inequality to : . Optimizing over gives the tightest exponential bound.
Markov's inequality
For : .
Evaluate the MGF
By independence: .
So .
Optimize
Take the infimum over : .
Chernoff Bound
The exponential upper bound . For sums of i.i.d. RVs, it gives .
Related: Rate Function
Theorem: Cramer's Theorem
Let be i.i.d. with mean , variance , and MGF finite in a neighborhood of . For with :
Equivalently, (exponential equivalence).
The Chernoff bound gives the upper bound. The matching lower bound comes from an exponential change of measure (tilting): reweight the distribution so that becomes the new mean, apply the CLT under the tilted measure, and translate back. The two bounds meet, proving that the Chernoff bound is exponentially tight.
Upper bound
This is exactly the Chernoff bound (Theorem TThe Chernoff Bound): .
Tilted distribution
Let solve . Define the tilted CDF: .
Under , the mean shifts to and the variance is .
Change of measure
.
Lower bound via CLT on tilted measure
Under , the tilted RVs have mean , so by LLN. For :
.
By the CLT under the tilted measure, .
Let $b \downarrow a$
.
The Rate Function and Large Deviation Probabilities
Explore the rate function for different distributions. The left panel shows the CGF with the tangent line of slope . The right panel shows as a function of . Adjust the threshold to see how the exponential decay rate changes.
Parameters
Large Deviations and Error Exponents in Coding Theory
The large deviations framework is not just an abstract curiosity β it is the mathematical engine behind error exponents in information theory. When a decoder makes an error, it is because the empirical statistics of the noise sequence deviate from their typical behavior. The rate at which the error probability decays with blocklength is precisely a large deviation rate function (the random coding error exponent).
The tilted distribution technique used in the lower bound of Cramer's theorem reappears as the "change of measure" in the sphere-packing bound for channel coding (Gallager, 1965).
- β’
Requires finite MGF (light-tailed distributions)
- β’
Heavy-tailed noise requires different techniques
Common Mistake: Cramer's Theorem Requires Light Tails
Mistake:
Applying Cramer's theorem to distributions without a finite MGF in a neighborhood of the origin. For example, Pareto or log-normal distributions have infinite MGF for all .
Correction:
For heavy-tailed distributions, the Chernoff bound and Cramer's theorem do not apply. Large deviation probabilities decay subexponentially β often polynomially. Use the theory of subexponential distributions or specialized results (e.g., Nagaev's theorem) instead.