Performance Bounds: Bhattacharyya and Chernoff

When Exact Error Probabilities Are Out of Reach

For Gaussian mean shifts the exact error probability is a $Q$-function --- closed form. For almost every other problem --- correlated Gaussians with different covariances, non-Gaussian noise, coded signals over fading channels --- the integral $\int \min(\pi_0 f_0, \pi_1 f_1)\,dy$ admits no closed form. We need upper bounds that (a) are tight enough to predict system performance and (b) reveal the dominant exponent as the dimension grows. Bhattacharyya and Chernoff are the two workhorse bounds, and the exponents they produce --- particularly the Chernoff information --- are the fundamental limits that drive capacity-error-tradeoff theory and large-deviations analysis.

Theorem: Bhattacharyya Bound

Under the MAP rule with priors $\pi_0, \pi_1$, the average error probability satisfies
$$P_e^\star \;\leq\; \sqrt{\pi_0 \pi_1}\,\int_\mathcal{Y} \sqrt{f_0(y)\,f_1(y)}\,dy \;=\; \sqrt{\pi_0 \pi_1}\,\rho_B,$$
where $\rho_B := \int \sqrt{f_0 f_1}\,dy \in [0,1]$ is the Bhattacharyya coefficient.
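Why the bound holds, in one line: the MAP error is the overlap integral from the opening paragraph, and the elementary inequality $\min(a,b) \leq \sqrt{ab}$ applied pointwise gives
$$P_e^\star = \int \min(\pi_0 f_0, \pi_1 f_1)\,dy \;\leq\; \int \sqrt{\pi_0 f_0 \cdot \pi_1 f_1}\,dy \;=\; \sqrt{\pi_0 \pi_1}\,\rho_B.$$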

$\rho_B$ is the geometric-mean overlap between $f_0$ and $f_1$. When $f_0 = f_1$, $\rho_B = 1$; when the supports are disjoint, $\rho_B = 0$. It equals $1 - H^2(f_0, f_1)/2$, where $H$ is the Hellinger distance --- a proper metric on probability distributions. The bound says: the less the two densities overlap, the smaller the error probability.


Example: Bhattacharyya Coefficient for Two Gaussians

Compute $\rho_B$ for $f_0 = \mathcal{N}(\mu_0, \sigma^2)$ and $f_1 = \mathcal{N}(\mu_1, \sigma^2)$, and compare the bound to the exact $P_e$ (equal priors).
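One way to carry out the computation: completing the square inside $\sqrt{f_0 f_1}$ leaves a Gaussian that integrates to one, times a constant factor. With $\bar\mu = (\mu_0 + \mu_1)/2$ and $d := |\mu_1 - \mu_0|/\sigma$,
$$\sqrt{f_0(y)\,f_1(y)} = \frac{1}{\sqrt{2\pi}\,\sigma}\, \exp\!\Bigl(-\frac{(y-\bar\mu)^2}{2\sigma^2}\Bigr) \exp\!\Bigl(-\frac{(\mu_1-\mu_0)^2}{8\sigma^2}\Bigr),$$
so $\rho_B = e^{-d^2/8}$ and the bound reads $P_e^\star \leq \tfrac{1}{2} e^{-d^2/8}$. The exact equal-prior error is $P_e^\star = Q(d/2)$, and since $Q(x) \leq \tfrac{1}{2} e^{-x^2/2}$, the Bhattacharyya bound is exactly this standard Gaussian-tail bound at $x = d/2$: correct exponent, loose constant.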

Theorem: Chernoff Bound

For any $s \in [0, 1]$ and any priors $\pi_0, \pi_1$, the MAP error probability satisfies
$$P_e^\star \;\leq\; \pi_0^{1-s}\,\pi_1^{s} \int_\mathcal{Y} f_0(y)^{1-s}\,f_1(y)^{s}\,dy \;=\; \pi_0^{1-s}\pi_1^{s}\, e^{-\mu(s)},$$
where $\mu(s) := -\log \int f_0^{1-s} f_1^{s}\,dy$. The optimised bound,
$$P_e^\star \;\leq\; \min_{s \in [0,1]} \bigl[\pi_0^{1-s}\pi_1^{s}\, e^{-\mu(s)}\bigr],$$
is the Chernoff bound.

The Chernoff bound is a family of tilted Bhattacharyya bounds: $s = 1/2$ recovers Bhattacharyya exactly. Optimising over $s$ selects the tilt that best concentrates mass on the boundary region where the MAP rule errs --- the saddle point of the exponent. The function $\mu(s)$ is concave on $[0,1]$ and vanishes at $s = 0, 1$, so the maximum is interior.
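The proof is the same one-liner as for Bhattacharyya, with a weighted geometric mean in place of the square root: for $a, b \geq 0$ and $s \in [0,1]$,
$$\min(a, b) \;\leq\; a^{1-s}\, b^{s},$$
applied pointwise with $a = \pi_0 f_0(y)$ and $b = \pi_1 f_1(y)$, then integrated over $\mathcal{Y}$.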


Theorem: The Chernoff Exponent ΞΌ(s)\mu(s) Is Concave

The function $\mu(s) = -\log \int_\mathcal{Y} f_0(y)^{1-s} f_1(y)^{s}\,dy$ is concave on $[0, 1]$, with $\mu(0) = \mu(1) = 0$, and $\mu(s) > 0$ for $s \in (0,1)$ whenever $f_0 \neq f_1$. Its maximum $s^\star \in (0,1)$ is the solution to $\mu'(s^\star) = 0$, equivalently
$$\mathbb{E}_{f_{s^\star}}\!\bigl[\ell(Y)\bigr] = 0,$$
where $\ell(y) = \log\bigl(f_0(y)/f_1(y)\bigr)$ is the log-likelihood ratio and $f_s(y) \propto f_0(y)^{1-s} f_1(y)^{s}$ is the tilted density.

$\mu$ is minus the log of an expectation of $e^{-s\ell}$ under $f_0$ --- that is, minus the log-moment-generating function of $-\ell(Y)$. Log-MGFs are convex, so $\mu$ is concave. The zero-mean condition at $s^\star$ says: the tilted distribution $f_{s^\star}$ makes the LLR a zero-mean random variable --- a canonical large-deviations saddle point.
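Spelling out the calculus behind both claims (a standard tilted-family computation): differentiating under the integral sign,
$$\mu'(s) = \mathbb{E}_{f_s}\bigl[\ell(Y)\bigr], \qquad \mu''(s) = -\mathrm{Var}_{f_s}\bigl(\ell(Y)\bigr) \;\leq\; 0,$$
so $\mu$ is concave, and the stationarity condition $\mu'(s^\star) = 0$ is exactly the zero-mean LLR condition of the theorem.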

Definition: Chernoff Information

The Chernoff information between $f_0$ and $f_1$ is
$$C(f_0, f_1) \;=\; \max_{s \in [0,1]} \mu(s) \;=\; -\log \min_{s \in [0,1]} \int f_0^{1-s} f_1^{s}\,dy.$$
For $n$ i.i.d. observations, the Chernoff bound becomes $P_e^\star \leq \pi_0^{1-s}\pi_1^{s} e^{-n\mu(s)}$, so
$$\lim_{n\to\infty} -\frac{1}{n}\log P_e^\star \;\geq\; C(f_0, f_1).$$
In fact equality holds (Chernoff, 1952): $C(f_0, f_1)$ is the best achievable error exponent under the Bayesian criterion.
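The $n$-sample statement follows because the integral tensorises over i.i.d. observations: with $f_i^{\otimes n}(y^n) = \prod_{k=1}^{n} f_i(y_k)$,
$$\int \bigl(f_0^{\otimes n}\bigr)^{1-s} \bigl(f_1^{\otimes n}\bigr)^{s}\,dy^n \;=\; \Bigl(\int f_0^{1-s} f_1^{s}\,dy\Bigr)^{n} \;=\; e^{-n\mu(s)},$$
so the single-sample exponent simply scales by $n$.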

Compare with Stein's lemma for the NP criterion, where the error exponents are the asymmetric KL divergences $D(f_0 \| f_1)$ and $D(f_1 \| f_0)$. The Chernoff information is the symmetric Bayes-optimal exponent; Stein's exponent is asymmetric.

Example: Chernoff Information for Two Gaussians

Compute $\mu(s)$ and $C(f_0, f_1)$ for $f_0 = \mathcal{N}(\mu_0, \sigma^2)$ and $f_1 = \mathcal{N}(\mu_1, \sigma^2)$.
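Sketch of the computation, by the same complete-the-square step as in the Bhattacharyya example: with $m_s = (1-s)\mu_0 + s\mu_1$,
$$(1-s)(y-\mu_0)^2 + s(y-\mu_1)^2 = (y - m_s)^2 + s(1-s)(\mu_1 - \mu_0)^2,$$
so $\int f_0^{1-s} f_1^{s}\,dy = \exp\bigl(-s(1-s)(\mu_1-\mu_0)^2/(2\sigma^2)\bigr)$ and, with $d = |\mu_1 - \mu_0|/\sigma$,
$$\mu(s) = \frac{s(1-s)\,d^2}{2}, \qquad s^\star = \frac{1}{2}, \qquad C(f_0, f_1) = \mu(1/2) = \frac{d^2}{8}.$$
The parabola is symmetric because the two variances are equal; with unequal variances $s^\star$ moves off $1/2$.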

Chernoff Bound vs. Exact Error, and the Exponent $\mu(s)$

For the Gaussian mean-shift problem with $n$ i.i.d. samples, compare the Chernoff bound, Bhattacharyya bound, and exact error probability as functions of $n$. The right panel plots $\mu(s)$ and its maximiser $s^\star$.

Parameters: normalised separation (default 1); maximum number of samples (default 30).

Numerical Computation of Chernoff Information

Complexity: $O(K \cdot |\mathcal{Y}|)$ for grid-based integration with $K$ values of $s$.
import numpy as np

def chernoff_information(f0, f1, y, K=201):
    # Input: densities f0, f1 evaluated on a grid y; K values of s.
    # Output: Chernoff information C(f0, f1) and the optimiser s_star.
    s_grid = np.linspace(0.0, 1.0, K)
    mu = np.empty(K)
    for i, s in enumerate(s_grid):
        integrand = f0 ** (1 - s) * f1 ** s    # pointwise on the grid Y
        Z = np.trapezoid(integrand, y)         # numerical integral (np.trapz on NumPy < 2.0)
        mu[i] = -np.log(Z)
    i_star = np.argmax(mu)
    return mu[i_star], s_grid[i_star]

For 1D or 2D problems, grid quadrature suffices. For high-dimensional problems, use Monte Carlo importance sampling with $f_{s^\star}$ as the proposal density.
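As a minimal Monte Carlo sketch --- a simpler variant that samples from $f_0$ itself rather than from the tilted proposal $f_{s^\star}$, which is adequate when the two densities overlap appreciably --- one can estimate $Z(s) = \mathbb{E}_{f_0}\bigl[(f_1(Y)/f_0(Y))^s\bigr]$ directly. The function names and the Gaussian sanity check below are illustrative assumptions, not from the text:

import numpy as np

rng = np.random.default_rng(0)

def chernoff_Z_mc(sample_f0, logpdf_f0, logpdf_f1, s, n=200_000):
    # Z(s) = E_{f0}[(f1/f0)^s], so mu(s) = -log Z(s).
    # Assumes we can sample from f0 and evaluate both log-densities
    # (shared normalising constants cancel in the ratio).
    y = sample_f0(n)
    log_ratio = logpdf_f1(y) - logpdf_f0(y)    # log f1(y) - log f0(y)
    return np.mean(np.exp(s * log_ratio))

# Sanity check on the Gaussian mean-shift example (d = 1, s = 1/2):
d = 1.0
Z = chernoff_Z_mc(rng.standard_normal,
                  lambda y: -0.5 * y**2,           # log f0 up to a shared constant
                  lambda y: -0.5 * (y - d)**2,     # log f1, same constant
                  s=0.5)
print(-np.log(Z))   # should be close to s(1-s) d^2 / 2 = 0.125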

⚠️ Engineering Note

Chernoff Bounds in Coded Systems

In coded communications (Book CM, Book CC), the union bound on codeword error reads $P_e \leq \sum_{m \neq m'} P_2(m \to m')$, where $P_2(m \to m')$ is a pairwise error probability --- exactly a binary hypothesis test between codewords $m$ and $m'$. Applying the Chernoff bound to each pairwise test (for symmetric channels the optimising $s$ is $1/2$) yields the union-Bhattacharyya bound on codeword error, which governs the error exponent of Shannon's random-coding theorem. The Chernoff information therefore sits at the intersection of detection, coding, and information theory --- the same exponent appears in Gallager's random-coding bound, in Stein's lemma, and in large-deviations rate functions.
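A minimal numerical sketch of the union-Bhattacharyya bound for a toy code on a binary symmetric channel --- the code, the crossover probability, and the helper names are all illustrative assumptions, not from the text. Over a BSC with crossover $p$, the per-use Bhattacharyya coefficient is $\rho = 2\sqrt{p(1-p)}$ and the pairwise bound between codewords at Hamming distance $d_H$ is $\rho^{d_H}$:

import numpy as np

p = 0.05                               # BSC crossover probability (assumed)
rho = 2 * np.sqrt(p * (1 - p))         # per-use Bhattacharyya coefficient

# A toy 4-codeword binary code (illustrative; minimum distance 3).
codewords = np.array([
    [0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 0, 1, 1, 1],
    [1, 1, 0, 1, 1],
])

def union_bhattacharyya(codewords, rho):
    # Worst-case union bound: max over sent m of sum_{m' != m} rho^{d_H(m, m')}.
    n_cw = len(codewords)
    bounds = []
    for i in range(n_cw):
        d_H = np.sum(codewords != codewords[i], axis=1)   # Hamming distances
        mask = np.arange(n_cw) != i                       # exclude m' = m
        bounds.append(np.sum(rho ** d_H[mask]))
    return max(bounds)

print(union_bhattacharyya(codewords, rho))   # upper bound on codeword error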

📋 Ref: Gallager 1968; Cover & Thomas 2006, Ch. 11
🎓 CommIT Contribution (1999)

Chernoff-Bound Error Exponents for MIMO Detection

G. Caire, G. Taricco, E. Biglieri, IEEE Trans. Inf. Theory, vol. 44, no. 3

The CommIT group's work on bit-interleaved coded modulation (BICM) applies Chernoff-style bounding to pairwise decoding errors across a product channel, yielding the BICM capacity and random-coding exponent. The technique --- exponent optimisation over a tilting parameter --- is precisely the Chernoff bound of the theorem above, applied to the likelihood ratio between pairs of coded symbols after binary labelling. Chapter 5 of this book revisits the BICM exponent with the factor-graph tools developed in Parts IV-V.


Common Mistake: Chernoff for One-Sided vs. Two-Sided Errors

Mistake:

Applying the Chernoff bound with $s \in [0,1]$ to bound $P_F$ alone (or $P_M$ alone) in the Neyman-Pearson framework.

Correction:

Our Chernoff bound is a bound on the average MAP error $P_e$, which combines $P_F$ and $P_M$ via the priors. For one-sided error bounds, use Stein's lemma: $-\frac{1}{n}\log P_M \to D(f_0 \| f_1)$ when $P_F \leq \alpha$ is held fixed. The two exponents ($D$ vs. Chernoff information) are generally distinct.
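To see how different they can be, take the Gaussian mean-shift pair again: a direct computation gives
$$D(f_0 \| f_1) = \frac{(\mu_1 - \mu_0)^2}{2\sigma^2} = \frac{d^2}{2}, \qquad C(f_0, f_1) = \frac{d^2}{8},$$
so the one-sided Stein exponent is four times the Bayes-optimal Chernoff exponent.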

Quick Check

For two Gaussians with equal variance and means differing by $d$ standard deviations, the Chernoff-optimal $s^\star$ is:

$s^\star = 1/2$

$s^\star = 1/d$

$s^\star = d^2/8$

$s^\star$ depends on the priors.

Chernoff information

$C(f_0, f_1) = \max_{s \in [0,1]} -\log \int f_0^{1-s} f_1^{s}\,dy$. It is the best achievable Bayesian error exponent for binary hypothesis testing with $n$ i.i.d. observations. Symmetric in $(f_0, f_1)$ and always non-negative, with $C = 0$ iff $f_0 = f_1$ a.e.

Related: KL divergence, Stein's lemma, Bhattacharyya coefficient

Bhattacharyya coefficient

$\rho_B = \int \sqrt{f_0 f_1}\,dy$. The geometric-mean overlap of two densities, lying in $[0,1]$. Related to the Hellinger distance by $H^2(f_0, f_1) = 2(1 - \rho_B)$.

Related: Chernoff Information, Hellinger distance, error bound

Historical Note: Chernoff, 1952

1950s

Herman Chernoff (b. 1923) introduced his eponymous bound in "A Measure of Asymptotic Efficiency for Tests of a Hypothesis Based on the Sum of Observations" (Ann. Math. Stat., 1952). The paper established the fundamental error exponent for Bayesian binary testing and, in so doing, laid groundwork for what would become the theory of large deviations. Chernoff's exponent-optimisation technique --- now taught in every information theory course --- is equally central to Shannon's random-coding exponents (1959) and to Gallager's error-exponent theory (1965).

Key Takeaway

For $n$ i.i.d. observations, the MAP error probability decays exponentially: $P_e^\star \sim e^{-n\, C(f_0, f_1)}$. The Chernoff information $C(f_0, f_1)$ is the exponent --- the single most important scalar summarising the difficulty of the hypothesis-testing problem. Every design question ("how much SNR do I need?" "how many samples suffice?") ultimately reduces to bounding or computing this exponent.
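As a back-of-envelope illustration of the second question (numbers assumed for illustration): ignoring sub-exponential prefactors, a target error $P_e^{\mathrm{target}}$ needs roughly
$$n \;\gtrsim\; \frac{\ln\bigl(1/P_e^{\mathrm{target}}\bigr)}{C(f_0, f_1)}.$$
For the unit-separation Gaussian pair ($d = 1$, so $C = 1/8$) and a target of $10^{-6}$, this gives $n \gtrsim 8 \ln 10^6 \approx 111$ samples.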