The Chernoff Bound
The Exponential Tilt
Chebyshev uses the second moment; what if we could use all moments at once? The moment generating function $M_X(t) = \mathbb{E}[e^{tX}]$ encodes the entire moment sequence. The Chernoff bound exploits this by applying Markov's inequality to $e^{tX}$ --- an exponential "tilt" of the distribution --- and then optimizing over the tilt parameter $t > 0$. The result is exponentially tight: it captures the correct exponential decay rate of the tail probability. This is the engine behind error exponents in coding theory and large deviations theory.
Theorem: The Chernoff Bound
Let $X$ be a random variable whose moment generating function $M_X(t) = \mathbb{E}[e^{tX}]$ exists in a neighborhood of the origin. Then for any $a \in \mathbb{R}$:
$$\Pr(X \ge a) \le \inf_{t > 0} e^{-ta} M_X(t).$$
For any $t > 0$, the exponential $e^{t(x-a)}$ is an upper bound on the indicator $\mathbf{1}\{x \ge a\}$ (since $e^{t(x-a)} \ge 1$ when $x \ge a$, and $e^{t(x-a)} > 0$ always). Taking expectations and optimizing over $t$ gives the tightest possible exponential bound.
Monotone transformation
For any $t > 0$, the function $x \mapsto e^{tx}$ is strictly increasing, so:
$$\Pr(X \ge a) = \Pr(e^{tX} \ge e^{ta}).$$
Apply Markov
Since $e^{tX} \ge 0$, Markov's inequality gives:
$$\Pr(e^{tX} \ge e^{ta}) \le \frac{\mathbb{E}[e^{tX}]}{e^{ta}} = e^{-ta} M_X(t).$$
Optimize over $t$
The bound holds for every $t > 0$, so we take the infimum:
$$\Pr(X \ge a) \le \inf_{t > 0} e^{-ta} M_X(t).$$
The minimizing $t^*$ satisfies $\frac{d}{dt}\log M_X(t)\big|_{t=t^*} = a$, i.e., the tilted mean equals the threshold $a$.
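The full pipeline --- tilt, apply Markov's inequality, optimize the tilt --- can be sketched numerically by minimizing $e^{-ta} M_X(t)$ over a grid of tilt parameters. A minimal Python sketch (the `chernoff_bound` helper and the grid are illustrative, not part of the text):

```python
import math

def chernoff_bound(log_mgf, a, t_grid):
    """Minimize exp(-t*a) * M_X(t) over a grid of tilts t > 0 (illustrative helper).

    log_mgf: function t -> log M_X(t)
    a:       tail threshold
    """
    # Work in log space for numerical stability: log bound = log M_X(t) - t*a.
    return min(math.exp(log_mgf(t) - t * a) for t in t_grid)

# Standard Gaussian: log M_X(t) = t^2/2; the optimal tilt t* = a gives exp(-a^2/2).
a = 2.0
grid = [i / 1000 for i in range(1, 5000)]
bound = chernoff_bound(lambda t: t * t / 2, a, grid)
print(bound)  # exp(-2) = 0.1353...
```

Because the log-objective $\log M_X(t) - ta$ is convex in $t$, a simple grid minimum already lands on the analytic optimum when the grid contains $t^*$.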
Example: Chernoff Bound for the Gaussian
Let $X \sim \mathcal{N}(0, \sigma^2)$. Compute the Chernoff bound on $\Pr(X \ge a)$ for $a > 0$.
MGF of the Gaussian
$$M_X(t) = \mathbb{E}[e^{tX}] = e^{\sigma^2 t^2 / 2}.$$
Chernoff function
$$e^{-ta} M_X(t) = \exp\!\left(\frac{\sigma^2 t^2}{2} - ta\right).$$
Optimize
Differentiating the exponent with respect to $t$: $\sigma^2 t - a = 0$, so $t^* = a/\sigma^2$. Substituting:
$$\Pr(X \ge a) \le \exp\!\left(\frac{\sigma^2 (a/\sigma^2)^2}{2} - \frac{a^2}{\sigma^2}\right) = e^{-a^2/(2\sigma^2)}.$$
Compare with exact
The exact tail is $\Pr(X \ge a) = Q(a/\sigma)$, and using the standard bound $Q(x) \le \frac{1}{2} e^{-x^2/2}$ for $x > 0$, we see the Chernoff bound $e^{-a^2/(2\sigma^2)}$ matches the correct exponential rate.
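As a sanity check, the Gaussian Chernoff bound $e^{-a^2/(2\sigma^2)}$ can be compared against the exact tail $Q(a/\sigma)$, computable from the complementary error function (the values of $\sigma$ and $a$ below are illustrative):

```python
import math

sigma, a = 1.0, 3.0

# Chernoff bound from the worked example above: exp(-a^2 / (2 sigma^2)).
chernoff = math.exp(-a * a / (2 * sigma * sigma))

# Exact Gaussian tail: P(X >= a) = Q(a/sigma) = 0.5 * erfc(a / (sigma*sqrt(2))).
exact = 0.5 * math.erfc(a / (sigma * math.sqrt(2.0)))

print(chernoff)  # 0.01110...
print(exact)     # 0.00134...
assert exact <= chernoff  # same exponential rate, different constant
```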
Example: Chernoff Bound for the Poisson
Let $X \sim \mathrm{Poisson}(\lambda)$. Derive the Chernoff bound on $\Pr(X \ge a)$ for $a > \lambda$.
MGF of the Poisson
$$M_X(t) = \mathbb{E}[e^{tX}] = e^{\lambda(e^t - 1)}.$$
Chernoff function
$$e^{-ta} M_X(t) = \exp\!\left(\lambda(e^t - 1) - ta\right).$$
Optimize
Setting the derivative of the exponent to zero: $\lambda e^t - a = 0$, so $t^* = \log(a/\lambda)$. Substituting:
$$\Pr(X \ge a) \le \exp\!\left(-\left(a \log\frac{a}{\lambda} - a + \lambda\right)\right) = e^{-\lambda}\left(\frac{e\lambda}{a}\right)^{a}.$$
Interpretation
The exponent $a \log(a/\lambda) - a + \lambda$ is the Kullback-Leibler divergence $D(\mathrm{Poisson}(a) \,\|\, \mathrm{Poisson}(\lambda))$ between $\mathrm{Poisson}(a)$ and $\mathrm{Poisson}(\lambda)$. The Chernoff bound for the Poisson tail decays at the rate of the KL divergence --- a preview of the large deviations connection.
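The KL-divergence exponent can be checked against the exact Poisson tail, computed by summing the complementary pmf (the values of $\lambda$ and $a$ below are illustrative):

```python
import math

lam, a = 2.0, 8  # rate lambda and integer threshold a > lambda (illustrative)

# KL-divergence exponent from the derivation: a*log(a/lam) - a + lam.
kl = a * math.log(a / lam) - a + lam
chernoff = math.exp(-kl)

# Exact upper tail P(X >= a) = 1 - P(X <= a-1), summing the Poisson pmf.
exact = 1.0 - sum(math.exp(-lam) * lam**k / math.factorial(k) for k in range(a))

print(chernoff)
print(exact)
assert exact <= chernoff  # Chernoff upper-bounds the exact tail
```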
Definition: Chernoff Exponent (Rate Function)
The Chernoff exponent for the upper tail at threshold $a$ is:
$$I(a) = \sup_{t > 0}\left[ta - \log M_X(t)\right].$$
This is the Legendre-Fenchel transform of the cumulant generating function $\psi(t) = \log M_X(t)$. The Chernoff bound becomes $\Pr(X \ge a) \le e^{-I(a)}$.
In large deviations theory, $I$ is called the rate function. Cramér's theorem shows that $I$ gives the exact asymptotic rate for sample means of i.i.d. copies of $X$: for $a > \mathbb{E}[X]$,
$$\lim_{n \to \infty} -\frac{1}{n} \log \Pr\!\left(\frac{1}{n}\sum_{i=1}^n X_i \ge a\right) = I(a).$$
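The Legendre-Fenchel transform is easy to approximate on a grid. A sketch for the standard Gaussian, where $\log M_X(t) = t^2/2$ and the exact rate function is $I(a) = a^2/2$ (the `rate_function` helper is illustrative):

```python
import math

def rate_function(log_mgf, a, t_grid):
    """Grid approximation of the Legendre-Fenchel transform
    I(a) = sup_{t>0} [t*a - log M_X(t)] (illustrative helper)."""
    return max(t * a - log_mgf(t) for t in t_grid)

# Standard Gaussian: log M_X(t) = t^2/2, so I(a) = a^2/2 (optimum at t* = a).
grid = [i / 100 for i in range(1, 500)]
I_a = rate_function(lambda t: t * t / 2, 3.0, grid)
print(I_a)  # a^2/2 = 4.5
```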
Definition: Sub-Gaussian Random Variable
A zero-mean random variable $X$ is sub-Gaussian with parameter $\sigma^2$ if its MGF satisfies:
$$\mathbb{E}[e^{tX}] \le e^{\sigma^2 t^2 / 2} \quad \text{for all } t \in \mathbb{R}.$$
Equivalently, the Chernoff bound gives Gaussian-type tails: $\Pr(X \ge a) \le e^{-a^2/(2\sigma^2)}$ for all $a \ge 0$.
Bounded zero-mean random variables, Gaussian random variables, and sums of independent sub-Gaussian variables are all sub-Gaussian. This property is the key ingredient in Hoeffding's inequality.
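A quick check for a concrete bounded variable: a Rademacher variable ($\pm 1$ with probability $1/2$) has MGF $\cosh(t)$, and the classical inequality $\cosh(t) \le e^{t^2/2}$ makes it sub-Gaussian with $\sigma^2 = 1$. The grid below verifies the inequality numerically:

```python
import math

# Rademacher variable: +1 or -1 with probability 1/2, so E[exp(tX)] = cosh(t).
# Verify cosh(t) <= exp(t^2 / 2) on a grid, i.e. sub-Gaussian with sigma^2 = 1.
for i in range(-50, 51):
    t = i / 10.0
    assert math.cosh(t) <= math.exp(t * t / 2) + 1e-12
print("cosh(t) <= exp(t^2/2) holds on the tested grid")
```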
Chernoff Bound Optimization
Visualize the Chernoff bounding function $e^{-ta} M_X(t)$ as a function of $t$ for different distributions and thresholds. The minimum over $t > 0$ gives the Chernoff bound.
The Exponential Tilt Animation
Common Mistake: The MGF Must Exist
Mistake:
Applying the Chernoff bound to a heavy-tailed distribution (e.g., Cauchy or Pareto) whose MGF is infinite for every $t > 0$.
Correction:
The Chernoff bound requires $M_X(t) = \mathbb{E}[e^{tX}] < \infty$ for some $t > 0$. Heavy-tailed distributions fail this condition. For such distributions, use Markov or Chebyshev (if the relevant moments exist), or specialized heavy-tail bounds.
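To see the failure concretely, one can numerically integrate a truncated version of $\mathbb{E}[e^{tX}]$ for a Pareto-type density and watch it grow without bound as the truncation point increases (the density and parameters below are illustrative):

```python
import math

def truncated_mgf(t, upper, n=100000):
    """Left-Riemann approximation of the truncated MGF integral
    over [1, upper] for the Pareto-type density f(x) = 2/x^3 on [1, inf)."""
    h = (upper - 1.0) / n
    return sum(math.exp(t * (1 + k * h)) * 2.0 / (1 + k * h) ** 3 * h
               for k in range(n))

# For any t > 0 the full integral diverges; truncations grow without bound.
vals = [truncated_mgf(0.5, u) for u in (10.0, 50.0, 100.0)]
print(vals)
assert vals[0] < vals[1] < vals[2]
```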
Why This Matters: Chernoff Bound and Random Coding Error Exponents
The Chernoff bound is the engine behind error exponents in coding theory. When analyzing the probability of decoding error for a random code over a discrete memoryless channel, the pairwise error probability between codewords is bounded by a Chernoff-type expression. Optimizing the Chernoff parameter yields the Gallager exponent $E_r(R)$, which determines how fast the error probability decays with blocklength. We develop this connection in detail in Book ITA, Chapter 4.
Computing the Chernoff Bound in Practice
For distributions with closed-form MGFs (Gaussian, Poisson, binomial, exponential), the Chernoff optimization is a one-dimensional convex problem that can be solved analytically. For empirical distributions or complex models, compute $e^{-ta} M_X(t)$ numerically on a grid of $t$ values and minimize --- the objective is log-convex in $t$, so any standard optimizer converges quickly.
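A sketch of the empirical-grid approach using simulated $\mathrm{Exp}(1)$ data (the sample size, threshold, and grid are illustrative; the $\mathrm{Exp}(1)$ MGF $1/(1-t)$ exists only for $t < 1$, so the tilt grid stays below 1):

```python
import math
import random

# Simulated Exp(1) data standing in for an empirical distribution.
random.seed(0)
samples = [random.expovariate(1.0) for _ in range(20000)]
a = 4.0  # tail threshold (illustrative)

best = float("inf")
for i in range(1, 50):
    t = i / 50.0  # grid over (0, 1)
    # Empirical MGF estimate: M_hat(t) = (1/n) * sum_i exp(t * x_i).
    m_hat = sum(math.exp(t * x) for x in samples) / len(samples)
    best = min(best, math.exp(-t * a) * m_hat)

# Analytic reference for Exp(1), a = 4: min_t e^{-ta}/(1-t) = 4*e^{-3} ~ 0.199,
# attained at t* = 1 - 1/a; the grid estimate carries Monte Carlo noise.
print(best)
```

In practice a convex scalar minimizer (e.g., golden-section search) replaces the raw grid, but the grid already shows the shape of the computation.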
Moment generating function (MGF)
$M_X(t) = \mathbb{E}[e^{tX}]$. Encodes all moments of $X$: $\mathbb{E}[X^k] = M_X^{(k)}(0)$. Exists in a neighborhood of the origin when the tails decay at least exponentially, and for all $t$ when they decay faster than any exponential.
Related: {{Ref:Gloss Chernoff Exponent}}
Chernoff exponent
The rate function $I(a) = \sup_{t > 0}[ta - \log M_X(t)]$ governing the exponential decay of $\Pr(X \ge a)$ via the Chernoff bound $\Pr(X \ge a) \le e^{-I(a)}$.
Related: {{Ref:Gloss Mgf}}
Quick Check
For $X \sim \mathcal{N}(0, \sigma^2)$, the Chernoff bound on $\Pr(X \ge a)$ decays as:
$e^{-a^2/(2\sigma^2)}$. The optimal tilt $t^* = a/\sigma^2$ gives $e^{-t^* a} M_X(t^*) = e^{-a^2/(2\sigma^2)}$.