Performance Bounds: Bhattacharyya and Chernoff

When Exact Error Probabilities Are Out of Reach

For Gaussian mean shifts the exact error probability is a $Q$-function --- closed form. For almost every other problem --- correlated Gaussians with different covariances, non-Gaussian noise, coded signals over fading channels --- the integral $\int \min(\pi_0 f_0, \pi_1 f_1)\,dy$ admits no closed form. We need upper bounds that (a) are tight enough to predict system performance and (b) reveal the dominant exponent as the dimension grows. Bhattacharyya and Chernoff are the two workhorse bounds, and the exponents they produce --- particularly the Chernoff information --- are the fundamental limits that drive capacity-error-tradeoff theory and large-deviations analysis.

Theorem: Bhattacharyya Bound

Under the MAP rule with priors $\pi_0, \pi_1$, the average error probability satisfies
$$P_e^\star \;\leq\; \sqrt{\pi_0 \pi_1}\,\int_\mathcal{Y} \sqrt{f_0(y)\,f_1(y)}\,dy \;=\; \sqrt{\pi_0 \pi_1}\,\rho_B,$$
where $\rho_B := \int \sqrt{f_0 f_1}\,dy \in [0,1]$ is the Bhattacharyya coefficient.
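Why the bound holds, in one line: the MAP error is the overlap integral from the opening paragraph, and the elementary inequality $\min(a,b) \leq \sqrt{ab}$ applied pointwise gives
$$P_e^\star = \int \min(\pi_0 f_0, \pi_1 f_1)\,dy \;\leq\; \int \sqrt{\pi_0 f_0 \cdot \pi_1 f_1}\,dy \;=\; \sqrt{\pi_0 \pi_1}\,\rho_B.$$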

$\rho_B$ is the geometric-mean overlap between $f_0$ and $f_1$. When $f_0 = f_1$, $\rho_B = 1$; when the supports are disjoint, $\rho_B = 0$. It equals $1 - H^2(f_0, f_1)/2$, where $H$ is the Hellinger distance --- a proper metric on probability distributions. The bound says: the less the two densities overlap, the smaller the error probability.


Example: Bhattacharyya Coefficient for Two Gaussians

Compute $\rho_B$ for $f_0 = \mathcal{N}(\mu_0, \sigma^2)$ and $f_1 = \mathcal{N}(\mu_1, \sigma^2)$, and compare the bound to the exact $P_e$ (equal priors).
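One way to carry out the computation: completing the square inside $\sqrt{f_0 f_1}$ leaves a Gaussian that integrates to one, times a constant factor. With $\bar\mu = (\mu_0 + \mu_1)/2$ and $d := |\mu_1 - \mu_0|/\sigma$,
$$\sqrt{f_0(y)\,f_1(y)} = \frac{1}{\sqrt{2\pi}\,\sigma}\, \exp\!\Bigl(-\frac{(y-\bar\mu)^2}{2\sigma^2}\Bigr) \exp\!\Bigl(-\frac{(\mu_1-\mu_0)^2}{8\sigma^2}\Bigr),$$
so $\rho_B = e^{-d^2/8}$ and the bound reads $P_e^\star \leq \tfrac{1}{2} e^{-d^2/8}$. The exact equal-prior error is $P_e^\star = Q(d/2)$, and since $Q(x) \leq \tfrac{1}{2} e^{-x^2/2}$, the Bhattacharyya bound is exactly this standard Gaussian-tail bound at $x = d/2$: correct exponent, loose constant.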

Theorem: Chernoff Bound

For any $s \in [0, 1]$ and any priors $\pi_0, \pi_1$, the MAP error probability satisfies
$$P_e^\star \;\leq\; \pi_0^{1-s}\,\pi_1^{s} \int_\mathcal{Y} f_0(y)^{1-s}\,f_1(y)^{s}\,dy \;=\; \pi_0^{1-s}\pi_1^{s}\, e^{-\mu(s)},$$
where $\mu(s) := -\log \int f_0^{1-s} f_1^{s}\,dy$. The optimised bound,
$$P_e^\star \;\leq\; \min_{s \in [0,1]} \bigl[\pi_0^{1-s}\pi_1^{s}\, e^{-\mu(s)}\bigr],$$
is the Chernoff bound.

The Chernoff bound is a family of tilted Bhattacharyya bounds: $s = 1/2$ recovers Bhattacharyya exactly. Optimising over $s$ selects the tilt that best concentrates mass on the boundary region where the MAP rule errs --- the saddle point of the exponent. The function $\mu(s)$ is concave on $[0,1]$ and vanishes at $s = 0, 1$, so the maximum is interior.
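The proof is the same one-liner as for Bhattacharyya, with a weighted geometric mean in place of the square root: for $a, b \geq 0$ and $s \in [0,1]$,
$$\min(a, b) \;\leq\; a^{1-s}\, b^{s},$$
applied pointwise with $a = \pi_0 f_0(y)$ and $b = \pi_1 f_1(y)$, then integrated over $\mathcal{Y}$.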


Theorem: The Chernoff Exponent ΞΌ(s)\mu(s) Is Concave

The function $\mu(s) = -\log \int_\mathcal{Y} f_0(y)^{1-s} f_1(y)^{s}\,dy$ is concave on $[0, 1]$, with $\mu(0) = \mu(1) = 0$, and $\mu(s) > 0$ for $s \in (0,1)$ whenever $f_0 \neq f_1$. Its maximum $s^\star \in (0,1)$ is the solution to $\mu'(s^\star) = 0$, equivalently
$$\mathbb{E}_{f_{s^\star}}\!\bigl[\ell(Y)\bigr] = 0,$$
where $\ell(y) = \log\bigl(f_0(y)/f_1(y)\bigr)$ is the log-likelihood ratio and $f_s(y) \propto f_0(y)^{1-s} f_1(y)^{s}$ is the tilted density.

$\mu$ is minus the log of an expectation of $e^{-s\ell}$ under $f_0$ --- that is, minus the log-moment-generating function of $-\ell(Y)$. Log-MGFs are convex, so $\mu$ is concave. The zero-mean condition at $s^\star$ says: the tilted distribution $f_{s^\star}$ makes the LLR a zero-mean random variable --- a canonical large-deviations saddle point.
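Spelling out the calculus behind both claims (a standard tilted-family computation): differentiating under the integral sign,
$$\mu'(s) = \mathbb{E}_{f_s}\bigl[\ell(Y)\bigr], \qquad \mu''(s) = -\mathrm{Var}_{f_s}\bigl(\ell(Y)\bigr) \;\leq\; 0,$$
so $\mu$ is concave, and the stationarity condition $\mu'(s^\star) = 0$ is exactly the zero-mean LLR condition of the theorem.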

Definition: Chernoff Information

The Chernoff information between $f_0$ and $f_1$ is
$$C(f_0, f_1) \;=\; \max_{s \in [0,1]} \mu(s) \;=\; -\log \min_{s \in [0,1]} \int f_0^{1-s} f_1^{s}\,dy.$$
For $n$ i.i.d. observations, the Chernoff bound becomes $P_e^\star \leq \pi_0^{1-s}\pi_1^{s} e^{-n\mu(s)}$, so
$$\lim_{n\to\infty} -\frac{1}{n}\log P_e^\star \;\geq\; C(f_0, f_1).$$
In fact equality holds (Chernoff, 1952): $C(f_0, f_1)$ is the best achievable error exponent under the Bayesian criterion.
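The $n$-sample statement follows because the integral tensorises over i.i.d. observations: with $f_i^{\otimes n}(y^n) = \prod_{k=1}^{n} f_i(y_k)$,
$$\int \bigl(f_0^{\otimes n}\bigr)^{1-s} \bigl(f_1^{\otimes n}\bigr)^{s}\,dy^n \;=\; \Bigl(\int f_0^{1-s} f_1^{s}\,dy\Bigr)^{n} \;=\; e^{-n\mu(s)},$$
so the single-sample exponent simply scales by $n$.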

Compare with Stein's lemma for the NP criterion, where the error exponents are the asymmetric KL divergences $D(f_0 \| f_1)$ and $D(f_1 \| f_0)$. The Chernoff information is the symmetric Bayes-optimal exponent; Stein's exponent is asymmetric.

Example: Chernoff Information for Two Gaussians

Compute $\mu(s)$ and $C(f_0, f_1)$ for $f_0 = \mathcal{N}(\mu_0, \sigma^2)$ and $f_1 = \mathcal{N}(\mu_1, \sigma^2)$.
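Sketch of the computation, by the same complete-the-square step as in the Bhattacharyya example: with $m_s = (1-s)\mu_0 + s\mu_1$,
$$(1-s)(y-\mu_0)^2 + s(y-\mu_1)^2 = (y - m_s)^2 + s(1-s)(\mu_1 - \mu_0)^2,$$
so $\int f_0^{1-s} f_1^{s}\,dy = \exp\bigl(-s(1-s)(\mu_1-\mu_0)^2/(2\sigma^2)\bigr)$ and, with $d = |\mu_1 - \mu_0|/\sigma$,
$$\mu(s) = \frac{s(1-s)\,d^2}{2}, \qquad s^\star = \frac{1}{2}, \qquad C(f_0, f_1) = \mu(1/2) = \frac{d^2}{8}.$$
The parabola is symmetric because the two variances are equal; with unequal variances $s^\star$ moves off $1/2$.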

Chernoff Bound vs. Exact Error, and the Exponent $\mu(s)$

For the Gaussian mean-shift problem with $n$ i.i.d. samples, compare the Chernoff bound, Bhattacharyya bound, and exact error probability as functions of $n$. The right panel plots $\mu(s)$ and its maximiser $s^\star$.

Parameters: normalised separation (default 1); maximum number of samples (default 30).

Numerical Computation of Chernoff Information

Complexity: $O(K \cdot |\mathcal{Y}|)$ for grid-based integration with $K$ values of $s$.
import numpy as np

def chernoff_information(f0, f1, y, K=201):
    # Input: densities f0, f1 evaluated on a grid y; K values of s.
    # Output: Chernoff information C(f0, f1) and the optimiser s_star.
    s_grid = np.linspace(0.0, 1.0, K)
    mu = np.empty(K)
    for i, s in enumerate(s_grid):
        integrand = f0 ** (1 - s) * f1 ** s    # pointwise on the grid Y
        Z = np.trapezoid(integrand, y)         # numerical integral (np.trapz on NumPy < 2.0)
        mu[i] = -np.log(Z)
    i_star = np.argmax(mu)
    return mu[i_star], s_grid[i_star]

For 1D or 2D problems, grid quadrature suffices. For high-dimensional problems, use Monte Carlo importance sampling with $f_{s^\star}$ as the proposal density.
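As a minimal Monte Carlo sketch --- a simpler variant that samples from $f_0$ itself rather than from the tilted proposal $f_{s^\star}$, which is adequate when the two densities overlap appreciably --- one can estimate $Z(s) = \mathbb{E}_{f_0}\bigl[(f_1(Y)/f_0(Y))^s\bigr]$ directly. The function names and the Gaussian sanity check below are illustrative assumptions, not from the text:

import numpy as np

rng = np.random.default_rng(0)

def chernoff_Z_mc(sample_f0, logpdf_f0, logpdf_f1, s, n=200_000):
    # Z(s) = E_{f0}[(f1/f0)^s], so mu(s) = -log Z(s).
    # Assumes we can sample from f0 and evaluate both log-densities
    # (shared normalising constants cancel in the ratio).
    y = sample_f0(n)
    log_ratio = logpdf_f1(y) - logpdf_f0(y)    # log f1(y) - log f0(y)
    return np.mean(np.exp(s * log_ratio))

# Sanity check on the Gaussian mean-shift example (d = 1, s = 1/2):
d = 1.0
Z = chernoff_Z_mc(rng.standard_normal,
                  lambda y: -0.5 * y**2,           # log f0 up to a shared constant
                  lambda y: -0.5 * (y - d)**2,     # log f1, same constant
                  s=0.5)
print(-np.log(Z))   # should be close to s(1-s) d^2 / 2 = 0.125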

⚠️ Engineering Note

Chernoff Bounds in Coded Systems

In coded communications (Book CM, Book CC), the union bound on codeword error reads $P_e \leq \sum_{m \neq m'} P_2(m \to m')$, where $P_2(m \to m')$ is a pairwise error probability --- exactly a binary hypothesis test between codewords $m$ and $m'$. Applying the Chernoff bound to each pairwise test (for symmetric channels the optimising $s$ is $1/2$) yields the union-Bhattacharyya bound on codeword error, which governs the error exponent of Shannon's random-coding theorem. The Chernoff information therefore sits at the intersection of detection, coding, and information theory --- the same exponent appears in Gallager's random-coding bound, in Stein's lemma, and in large-deviations rate functions.
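A minimal numerical sketch of the union-Bhattacharyya bound for a toy code on a binary symmetric channel --- the code, the crossover probability, and the helper names are all illustrative assumptions, not from the text. Over a BSC with crossover $p$, the per-use Bhattacharyya coefficient is $\rho = 2\sqrt{p(1-p)}$ and the pairwise bound between codewords at Hamming distance $d_H$ is $\rho^{d_H}$:

import numpy as np

p = 0.05                               # BSC crossover probability (assumed)
rho = 2 * np.sqrt(p * (1 - p))         # per-use Bhattacharyya coefficient

# A toy 4-codeword binary code (illustrative; minimum distance 3).
codewords = np.array([
    [0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 0, 1, 1, 1],
    [1, 1, 0, 1, 1],
])

def union_bhattacharyya(codewords, rho):
    # Worst-case union bound: max over sent m of sum_{m' != m} rho^{d_H(m, m')}.
    n_cw = len(codewords)
    bounds = []
    for i in range(n_cw):
        d_H = np.sum(codewords != codewords[i], axis=1)   # Hamming distances
        mask = np.arange(n_cw) != i                       # exclude m' = m
        bounds.append(np.sum(rho ** d_H[mask]))
    return max(bounds)

print(union_bhattacharyya(codewords, rho))   # upper bound on codeword error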

📋 Ref: Gallager 1968; Cover & Thomas 2006, Ch. 11
🎓 CommIT Contribution (1999)

Chernoff-Bound Error Exponents for MIMO Detection

G. Caire, G. Taricco, E. Biglieri, IEEE Trans. Inf. Theory, vol. 44, no. 3

The CommIT group's work on bit-interleaved coded modulation (BICM) applies Chernoff-style bounding to pairwise decoding errors across a product channel, yielding the BICM capacity and random-coding exponent. The technique --- exponent optimisation over a tilting parameter --- is precisely the Chernoff bound of the theorem above, applied to the likelihood ratio between pairs of coded symbols after binary labelling. Chapter 5 of this book revisits the BICM exponent with the factor-graph tools developed in Parts IV-V.


Common Mistake: Chernoff for One-Sided vs. Two-Sided Errors

Mistake:

Applying the Chernoff bound with $s \in [0,1]$ to bound $P_F$ alone (or $P_M$ alone) in the Neyman-Pearson framework.

Correction:

Our Chernoff bound is a bound on the average MAP error $P_e$, which combines $P_F$ and $P_M$ via the priors. For one-sided error bounds, use Stein's lemma: $-\frac{1}{n}\log P_M \to D(f_0 \| f_1)$ when $P_F \leq \alpha$ is held fixed. The two exponents ($D$ vs. Chernoff information) are generally distinct.
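To see how different they can be, take the Gaussian mean-shift pair again: a direct computation gives
$$D(f_0 \| f_1) = \frac{(\mu_1 - \mu_0)^2}{2\sigma^2} = \frac{d^2}{2}, \qquad C(f_0, f_1) = \frac{d^2}{8},$$
so the one-sided Stein exponent is four times the Bayes-optimal Chernoff exponent.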

Quick Check

For two Gaussians with equal variance and means differing by $d$ standard deviations, the Chernoff-optimal $s^\star$ is:

$s^\star = 1/2$

$s^\star = 1/d$

$s^\star = d^2/8$

$s^\star$ depends on the priors.

Chernoff information

$C(f_0, f_1) = \max_{s \in [0,1]} -\log \int f_0^{1-s} f_1^{s}\,dy$. It is the best achievable Bayesian error exponent for binary hypothesis testing with $n$ i.i.d. observations. Symmetric in $(f_0, f_1)$ and always non-negative, with $C = 0$ iff $f_0 = f_1$ a.e.

Related: KL divergence, Stein's lemma, Bhattacharyya coefficient

Bhattacharyya coefficient

$\rho_B = \int \sqrt{f_0 f_1}\,dy$. The geometric-mean overlap of two densities, lying in $[0,1]$. Related to the Hellinger distance by $H^2(f_0, f_1) = 2(1 - \rho_B)$.

Related: Chernoff Information, Hellinger distance, error bound

Historical Note: Chernoff, 1952

1950s

Herman Chernoff (b. 1923) introduced his eponymous bound in "A Measure of Asymptotic Efficiency for Tests of a Hypothesis Based on the Sum of Observations" (Ann. Math. Stat., 1952). The paper established the fundamental error exponent for Bayesian binary testing and, in so doing, laid groundwork for what would become the theory of large deviations. Chernoff's exponent-optimisation technique --- now taught in every information theory course --- is equally central to Shannon's random-coding exponents (1959) and to Gallager's error-exponent theory (1965).

Key Takeaway

For $n$ i.i.d. observations, the MAP error probability decays exponentially: $P_e^\star \sim e^{-n\, C(f_0, f_1)}$. The Chernoff information $C(f_0, f_1)$ is the exponent --- the single most important scalar summarising the difficulty of the hypothesis-testing problem. Every design question ("how much SNR do I need?" "how many samples suffice?") ultimately reduces to bounding or computing this exponent.
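As a back-of-envelope illustration of the second question (numbers assumed for illustration): ignoring sub-exponential prefactors, a target error $P_e^{\mathrm{target}}$ needs roughly
$$n \;\gtrsim\; \frac{\ln\bigl(1/P_e^{\mathrm{target}}\bigr)}{C(f_0, f_1)}.$$
For the unit-separation Gaussian pair ($d = 1$, so $C = 1/8$) and a target of $10^{-6}$, this gives $n \gtrsim 8 \ln 10^6 \approx 111$ samples.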