The Chernoff Bound

The Exponential Tilt

Chebyshev uses the second moment; what if we could use all moments at once? The moment generating function $M_X(t) = \mathbb{E}[e^{tX}]$ encodes the entire moment sequence. The Chernoff bound exploits this by applying Markov's inequality to $e^{tX}$ --- an exponential "tilt" of the distribution --- and then optimizing over the tilt parameter $t$. The result is exponentially tight: it captures the correct exponential decay rate of the tail probability. This is the engine behind error exponents in coding theory and large deviations theory.

Theorem: The Chernoff Bound

Let $X$ be a random variable whose moment generating function $M_X(t) = \mathbb{E}[e^{tX}]$ exists in a neighborhood of the origin. Then for any $a \in \mathbb{R}$:
$$\mathbb{P}(X \geq a) \leq \inf_{t > 0}\, e^{-ta}\, M_X(t), \qquad \mathbb{P}(X \leq a) \leq \inf_{t < 0}\, e^{-ta}\, M_X(t).$$

For any $t > 0$, the exponential $e^{t(x-a)}$ is an upper bound on the indicator $\mathbb{I}_{\{x \geq a\}}$ (since $e^{t(x-a)} \geq 1$ when $x \geq a$, and $e^{t(x-a)} > 0$ always). Taking expectations and optimizing over $t$ gives the tightest possible exponential bound.
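Spelled out, taking expectations of the pointwise bound gives the full chain:
$$\mathbb{P}(X \geq a) = \mathbb{E}\bigl[\mathbb{I}_{\{X \geq a\}}\bigr] \leq \mathbb{E}\bigl[e^{t(X-a)}\bigr] = e^{-ta}\, M_X(t),$$
and minimizing the right-hand side over $t > 0$ yields the theorem.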


Example: Chernoff Bound for the Gaussian

Let $X \sim \mathcal{N}(0, \sigma^2)$. Compute the Chernoff bound on $\mathbb{P}(X \geq a)$ for $a > 0$.
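A sketch of the computation, using the standard Gaussian MGF $M_X(t) = e^{\sigma^2 t^2/2}$:
$$e^{-ta}\, M_X(t) = e^{-ta + \sigma^2 t^2/2}, \qquad \frac{d}{dt}\Bigl(-ta + \frac{\sigma^2 t^2}{2}\Bigr) = 0 \;\Rightarrow\; t^* = \frac{a}{\sigma^2},$$
so that
$$\mathbb{P}(X \geq a) \leq e^{-t^* a + \sigma^2 (t^*)^2/2} = e^{-a^2/(2\sigma^2)}.$$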

Example: Chernoff Bound for the Poisson

Let $X \sim \text{Poisson}(\lambda)$. Derive the Chernoff bound on $\mathbb{P}(X \geq a)$ for $a > \lambda$.
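A sketch, using the Poisson MGF $M_X(t) = e^{\lambda(e^t - 1)}$:
$$\frac{d}{dt}\bigl(-ta + \lambda(e^t - 1)\bigr) = 0 \;\Rightarrow\; t^* = \log\frac{a}{\lambda},$$
which is positive precisely when $a > \lambda$, giving
$$\mathbb{P}(X \geq a) \leq e^{-a\log(a/\lambda) + a - \lambda} = e^{-\lambda}\Bigl(\frac{e\lambda}{a}\Bigr)^{a}.$$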

Definition: Chernoff Exponent (Rate Function)

The Chernoff exponent for the upper tail at threshold $a$ is:
$$I(a) = \sup_{t > 0}\, \bigl(ta - \log M_X(t)\bigr).$$
This is the Legendre-Fenchel transform of the cumulant generating function $\Lambda(t) = \log M_X(t)$. The Chernoff bound becomes $\mathbb{P}(X \geq a) \leq e^{-I(a)}$.
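For example, for $X \sim \mathcal{N}(0, \sigma^2)$ the supremum is attained at $t^* = a/\sigma^2$, giving $I(a) = a^2/(2\sigma^2)$ and recovering the Gaussian bound derived above.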

In large deviations theory, $I(a)$ is called the rate function. Cramér's theorem shows that $I(a)$ is the exact asymptotic rate for the sample mean $\bar{X}_n$ of $n$ i.i.d. copies of $X$: $\lim_{n \to \infty} \frac{1}{n} \log \mathbb{P}(\bar{X}_n \geq a) = -I(a)$.

Definition: Sub-Gaussian Random Variable

A zero-mean random variable $X$ is sub-Gaussian with parameter $\sigma$ if its MGF satisfies:
$$\mathbb{E}[e^{tX}] \leq e^{\sigma^2 t^2/2} \quad \text{for all } t \in \mathbb{R}.$$
Equivalently, the Chernoff bound gives Gaussian-type tails: $\mathbb{P}(X \geq a) \leq e^{-a^2/(2\sigma^2)}$ for all $a > 0$.
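The tail bound follows by the same optimization as in the Gaussian example: plugging the MGF bound into the Chernoff bound gives $\mathbb{P}(X \geq a) \leq \inf_{t > 0} e^{-ta + \sigma^2 t^2/2}$, minimized at $t^* = a/\sigma^2$.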

Bounded random variables, Gaussian random variables, and sums of independent sub-Gaussian variables are all sub-Gaussian. This property is the key ingredient in Hoeffding's inequality.

Chernoff Bound Optimization

Visualize the Chernoff bounding function eβˆ’taMX(t)e^{-ta} M_X(t) as a function of tt for different distributions and thresholds. The minimum over t>0t > 0 gives the Chernoff bound.


The Exponential Tilt Animation

Animated visualization of how the exponential tilting $e^{tX}$ reshapes the distribution and tightens the tail bound as $t$ is optimized.
As $t$ increases, the tilted distribution $f_t(x) \propto e^{tx} f(x)$ shifts its mass toward the threshold $a$. The optimal $t^*$ balances the tilt against the exponential penalty $e^{-ta}$.
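A minimal sketch of the tilt for a concrete case, assuming a Poisson base distribution (the choices $\lambda = 3$ and $a = 8$ are illustrative):

```python
import numpy as np
from scipy.stats import poisson

lam, a = 3.0, 8.0          # illustrative: base mean 3, threshold 8
t_star = np.log(a / lam)   # optimal tilt from the Poisson Chernoff bound

x = np.arange(0, 40)
f = poisson.pmf(x, lam)    # base pmf

for t in (0.0, 0.5 * t_star, t_star):
    ft = np.exp(t * x) * f
    ft /= ft.sum()         # normalize: f_t(x) proportional to e^{tx} f(x)
    print(f"t = {t:.3f}: tilted mean = {np.sum(x * ft):.3f}")

# At t = t*, the tilted mean sits at the threshold a = 8:
# the tilt has shifted the mass exactly onto the rare event.
```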

Common Mistake: The MGF Must Exist

Mistake:

Applying the Chernoff bound to a heavy-tailed distribution whose MGF is infinite for all $t > 0$ (e.g., Cauchy, or any Pareto distribution --- polynomial tails always defeat the exponential moment, whatever the tail index $\alpha$).

Correction:

The Chernoff bound requires $M_X(t) < \infty$ for some $t > 0$. Heavy-tailed distributions fail this condition. For such distributions, use Markov or Chebyshev (if moments exist), or specialized heavy-tail bounds.

Why This Matters: Chernoff Bound and Random Coding Error Exponents

The Chernoff bound is the engine behind error exponents in coding theory. When analyzing the probability of decoding error for a random code over a discrete memoryless channel, the pairwise error probability between codewords is bounded by a Chernoff-type expression. Optimizing the Chernoff parameter $s \in [0, 1]$ yields the Gallager exponent $E_0(\rho)$, which determines how fast $P_e$ decays with blocklength. We develop this connection in detail in Book ITA, Chapter 4.

🔧 Engineering Note

Computing the Chernoff Bound in Practice

For distributions with closed-form MGFs (Gaussian, Poisson, binomial, exponential), the Chernoff optimization is a one-dimensional convex problem that can be solved analytically. For empirical distributions or complex models, compute $M_X(t)$ numerically on a grid and minimize $e^{-ta} M_X(t)$ --- the objective is log-convex in $t$, so any standard optimizer converges quickly.
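A minimal sketch of the numerical route, assuming SciPy is available (the Exp(1) sample and threshold $a = 4$ are illustrative; real use would substitute your own samples):

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
samples = rng.exponential(scale=1.0, size=100_000)  # illustrative data
a = 4.0  # threshold; must exceed the mean for a nontrivial bound

def log_objective(t):
    # log of e^{-ta} * M_X(t), with M_X estimated from samples.
    # Working in log space avoids overflow for large t*x.
    tx = t * samples
    m = tx.max()
    log_mgf = m + np.log(np.mean(np.exp(tx - m)))  # log-sum-exp trick
    return -t * a + log_mgf

# The empirical MGF of an Exp(1) sample blows up as t -> 1,
# so restrict the search to a bracket safely below 1.
res = minimize_scalar(log_objective, bounds=(1e-6, 0.99), method="bounded")
print(f"t* = {res.x:.3f}, Chernoff bound = {np.exp(res.fun):.3e}")
print(f"empirical tail P(X >= a) = {np.mean(samples >= a):.3e}")
```

The objective is log-convex in $t$, so the bounded scalar minimizer finds the global optimum; for Exp(1) the analytic answer is $t^* = 1 - 1/a$.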

Moment generating function (MGF)

$M_X(t) = \mathbb{E}[e^{tX}]$. Encodes all moments of $X$: $\mathbb{E}[X^k] = M_X^{(k)}(0)$. Exists in a neighborhood of the origin when the tails decay at least exponentially fast.

Related: {{Ref:Gloss Chernoff Exponent}}

Chernoff exponent

The rate function $I(a) = \sup_{t > 0}\,(ta - \log M_X(t))$ governing the exponential decay of $\mathbb{P}(X \geq a)$ via the Chernoff bound.

Related: {{Ref:Gloss Mgf}}

Quick Check

For $X \sim \mathcal{N}(0, 1)$, the Chernoff bound on $\mathbb{P}(X \geq a)$ decays as:

$e^{-a}$

$e^{-a^2/2}$

$1/a^2$

$e^{-a^2}$