The Chernoff Bound

The Exponential Tilt

Chebyshev uses the second moment; what if we could use all moments at once? The moment generating function $M_X(t) = \mathbb{E}[e^{tX}]$ encodes the entire moment sequence. The Chernoff bound exploits this by applying Markov's inequality to $e^{tX}$ --- an exponential "tilt" of the distribution --- and then optimizing over the tilt parameter $t$. The result is exponentially tight: it captures the correct exponential decay rate of the tail probability. This is the engine behind error exponents in coding theory and large deviations theory.

Theorem: The Chernoff Bound

Let $X$ be a random variable whose moment generating function $M_X(t) = \mathbb{E}[e^{tX}]$ exists in a neighborhood of the origin. Then for any $a \in \mathbb{R}$:
$$\mathbb{P}(X \geq a) \leq \inf_{t > 0}\, e^{-ta}\, M_X(t), \qquad \mathbb{P}(X \leq a) \leq \inf_{t < 0}\, e^{-ta}\, M_X(t).$$

For any $t > 0$, the exponential $e^{t(x-a)}$ is an upper bound on the indicator $\mathbb{I}_{\{x \geq a\}}$ (since $e^{t(x-a)} \geq 1$ when $x \geq a$, and $e^{t(x-a)} > 0$ always). Taking expectations and optimizing over $t$ gives the tightest possible exponential bound.
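Spelled out, taking expectations of the pointwise bound gives the full chain:
$$\mathbb{P}(X \geq a) = \mathbb{E}\bigl[\mathbb{I}_{\{X \geq a\}}\bigr] \leq \mathbb{E}\bigl[e^{t(X-a)}\bigr] = e^{-ta}\, M_X(t),$$
and minimizing the right-hand side over $t > 0$ yields the theorem.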


Example: Chernoff Bound for the Gaussian

Let $X \sim \mathcal{N}(0, \sigma^2)$. Compute the Chernoff bound on $\mathbb{P}(X \geq a)$ for $a > 0$.
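A sketch of the computation, using the standard Gaussian MGF $M_X(t) = e^{\sigma^2 t^2/2}$:
$$e^{-ta}\, M_X(t) = e^{-ta + \sigma^2 t^2/2}, \qquad \frac{d}{dt}\Bigl(-ta + \frac{\sigma^2 t^2}{2}\Bigr) = 0 \;\Rightarrow\; t^* = \frac{a}{\sigma^2},$$
so that
$$\mathbb{P}(X \geq a) \leq e^{-t^* a + \sigma^2 (t^*)^2/2} = e^{-a^2/(2\sigma^2)}.$$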

Example: Chernoff Bound for the Poisson

Let $X \sim \text{Poisson}(\lambda)$. Derive the Chernoff bound on $\mathbb{P}(X \geq a)$ for $a > \lambda$.
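A sketch, using the Poisson MGF $M_X(t) = e^{\lambda(e^t - 1)}$:
$$\frac{d}{dt}\bigl(-ta + \lambda(e^t - 1)\bigr) = 0 \;\Rightarrow\; t^* = \log\frac{a}{\lambda},$$
which is positive precisely when $a > \lambda$, giving
$$\mathbb{P}(X \geq a) \leq e^{-a\log(a/\lambda) + a - \lambda} = e^{-\lambda}\Bigl(\frac{e\lambda}{a}\Bigr)^{a}.$$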

Definition: Chernoff Exponent (Rate Function)

The Chernoff exponent for the upper tail at threshold $a$ is:
$$I(a) = \sup_{t > 0}\, \bigl(ta - \log M_X(t)\bigr).$$
This is the Legendre-Fenchel transform of the cumulant generating function $\Lambda(t) = \log M_X(t)$. The Chernoff bound becomes $\mathbb{P}(X \geq a) \leq e^{-I(a)}$.
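For example, for $X \sim \mathcal{N}(0, \sigma^2)$ the supremum is attained at $t^* = a/\sigma^2$, giving $I(a) = a^2/(2\sigma^2)$ and recovering the Gaussian bound derived above.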

In large deviations theory, $I(a)$ is called the rate function. Cramér's theorem shows that $I(a)$ is the exact asymptotic rate for the sample mean $\bar{X}_n$ of $n$ i.i.d. copies of $X$: $\lim_{n \to \infty} \frac{1}{n} \log \mathbb{P}(\bar{X}_n \geq a) = -I(a)$.

Definition: Sub-Gaussian Random Variable

A zero-mean random variable $X$ is sub-Gaussian with parameter $\sigma$ if its MGF satisfies:
$$\mathbb{E}[e^{tX}] \leq e^{\sigma^2 t^2/2} \quad \text{for all } t \in \mathbb{R}.$$
Equivalently, the Chernoff bound gives Gaussian-type tails: $\mathbb{P}(X \geq a) \leq e^{-a^2/(2\sigma^2)}$ for all $a > 0$.
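The tail bound follows by the same optimization as in the Gaussian example: plugging the MGF bound into the Chernoff bound gives $\mathbb{P}(X \geq a) \leq \inf_{t > 0} e^{-ta + \sigma^2 t^2/2}$, minimized at $t^* = a/\sigma^2$.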

Bounded random variables, Gaussian random variables, and sums of independent sub-Gaussian variables are all sub-Gaussian. This property is the key ingredient in Hoeffding's inequality.

Chernoff Bound Optimization

Visualize the Chernoff bounding function eβˆ’taMX(t)e^{-ta} M_X(t) as a function of tt for different distributions and thresholds. The minimum over t>0t > 0 gives the Chernoff bound.


The Exponential Tilt Animation

Animated visualization of how the exponential tilting $e^{tX}$ reshapes the distribution and tightens the tail bound as $t$ is optimized.
As $t$ increases, the tilted distribution $f_t(x) \propto e^{tx} f(x)$ shifts its mass toward the threshold $a$. The optimal $t^*$ balances the tilt against the exponential penalty $e^{-ta}$.
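A minimal sketch of the tilt for a concrete case, assuming a Poisson base distribution (the choices $\lambda = 3$ and $a = 8$ are illustrative):

```python
import numpy as np
from scipy.stats import poisson

lam, a = 3.0, 8.0          # illustrative: base mean 3, threshold 8
t_star = np.log(a / lam)   # optimal tilt from the Poisson Chernoff bound

x = np.arange(0, 40)
f = poisson.pmf(x, lam)    # base pmf

for t in (0.0, 0.5 * t_star, t_star):
    ft = np.exp(t * x) * f
    ft /= ft.sum()         # normalize: f_t(x) proportional to e^{tx} f(x)
    print(f"t = {t:.3f}: tilted mean = {np.sum(x * ft):.3f}")

# At t = t*, the tilted mean sits at the threshold a = 8:
# the tilt has shifted the mass exactly onto the rare event.
```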

Common Mistake: The MGF Must Exist

Mistake:

Applying the Chernoff bound to a heavy-tailed distribution whose MGF is infinite for all $t > 0$ (e.g., Cauchy, or any Pareto distribution --- polynomial tails always defeat the exponential moment, whatever the tail index $\alpha$).

Correction:

The Chernoff bound requires $M_X(t) < \infty$ for some $t > 0$. Heavy-tailed distributions fail this condition. For such distributions, use Markov or Chebyshev (if moments exist), or specialized heavy-tail bounds.

Why This Matters: Chernoff Bound and Random Coding Error Exponents

The Chernoff bound is the engine behind error exponents in coding theory. When analyzing the probability of decoding error for a random code over a discrete memoryless channel, the pairwise error probability between codewords is bounded by a Chernoff-type expression. Optimizing the Chernoff parameter $s \in [0, 1]$ yields the Gallager exponent $E_0(\rho)$, which determines how fast $P_e$ decays with blocklength. We develop this connection in detail in Book ITA, Chapter 4.

🔧 Engineering Note

Computing the Chernoff Bound in Practice

For distributions with closed-form MGFs (Gaussian, Poisson, binomial, exponential), the Chernoff optimization is a one-dimensional convex problem that can be solved analytically. For empirical distributions or complex models, compute $M_X(t)$ numerically on a grid and minimize $e^{-ta} M_X(t)$ --- the objective is log-convex in $t$, so any standard optimizer converges quickly.
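A minimal sketch of the numerical route, assuming SciPy is available (the Exp(1) sample and threshold $a = 4$ are illustrative; real use would substitute your own samples):

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
samples = rng.exponential(scale=1.0, size=100_000)  # illustrative data
a = 4.0  # threshold; must exceed the mean for a nontrivial bound

def log_objective(t):
    # log of e^{-ta} * M_X(t), with M_X estimated from samples.
    # Working in log space avoids overflow for large t*x.
    tx = t * samples
    m = tx.max()
    log_mgf = m + np.log(np.mean(np.exp(tx - m)))  # log-sum-exp trick
    return -t * a + log_mgf

# The empirical MGF of an Exp(1) sample blows up as t -> 1,
# so restrict the search to a bracket safely below 1.
res = minimize_scalar(log_objective, bounds=(1e-6, 0.99), method="bounded")
print(f"t* = {res.x:.3f}, Chernoff bound = {np.exp(res.fun):.3e}")
print(f"empirical tail P(X >= a) = {np.mean(samples >= a):.3e}")
```

The objective is log-convex in $t$, so the bounded scalar minimizer finds the global optimum; for Exp(1) the analytic answer is $t^* = 1 - 1/a$.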

Moment generating function (MGF)

$M_X(t) = \mathbb{E}[e^{tX}]$. Encodes all moments of $X$: $\mathbb{E}[X^k] = M_X^{(k)}(0)$. Exists in a neighborhood of the origin when the tails decay at least exponentially fast.

Related: {{Ref:Gloss Chernoff Exponent}}

Chernoff exponent

The rate function $I(a) = \sup_{t > 0}\,(ta - \log M_X(t))$ governing the exponential decay of $\mathbb{P}(X \geq a)$ via the Chernoff bound.

Related: {{Ref:Gloss Mgf}}

Quick Check

For $X \sim \mathcal{N}(0, 1)$, the Chernoff bound on $\mathbb{P}(X \geq a)$ decays as:

$e^{-a}$

$e^{-a^2/2}$

$1/a^2$

$e^{-a^2}$