Performance Bounds: Bhattacharyya and Chernoff
When Exact Error Probabilities Are Impossible
For Gaussian mean shifts the exact error probability has a closed form: a Q-function. For almost every other problem (correlated Gaussians with different covariances, non-Gaussian noise, coded signals over fading channels) the integral admits no closed form. We need upper bounds that (a) are tight enough to predict system performance, and (b) reveal the dominant exponent as the dimension grows. Bhattacharyya and Chernoff are the two workhorse bounds, and the exponents they produce, particularly the Chernoff information, are the fundamental limits that drive capacity-error-tradeoff theory and large-deviations analysis.
Theorem: Bhattacharyya Bound
Under the MAP rule with priors $\pi_0, \pi_1$, the average error probability satisfies
$$P_e \;\le\; \sqrt{\pi_0 \pi_1}\,\rho, \qquad \rho = \int \sqrt{p_0(y)\,p_1(y)}\,dy,$$
where $\rho$ is the Bhattacharyya coefficient.
$\rho$ is the geometric-mean overlap between $p_0$ and $p_1$. When $p_0 = p_1$, $\rho = 1$; when the supports are disjoint, $\rho = 0$. It is $\rho = 1 - \tfrac{1}{2}H^2(p_0,p_1)$, where $H^2(p_0,p_1) = \int\big(\sqrt{p_0} - \sqrt{p_1}\big)^2\,dy$ is the squared Hellinger distance, a proper metric on probability distributions. The bound says: the less the two densities overlap, the smaller the error probability.
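The identity $\rho = 1 - \tfrac12 H^2$ is easy to check numerically; here is a minimal sketch on a hypothetical three-point discrete pair (the distributions are illustrative, not from the text).

```python
import math

# Hypothetical discrete pair (illustration only)
p0 = [0.5, 0.3, 0.2]
p1 = [0.2, 0.3, 0.5]

# Bhattacharyya coefficient: geometric-mean overlap, always in [0, 1]
rho = sum(math.sqrt(a * b) for a, b in zip(p0, p1))

# Squared Hellinger distance: H^2 = sum (sqrt(p0) - sqrt(p1))^2
H2 = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p0, p1))

print(rho, 1 - H2 / 2)  # the two numbers agree: rho = 1 - H^2/2
```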
Start from $P_e = \int \min\!\big(\pi_0 p_0(y), \pi_1 p_1(y)\big)\,dy$.
Use $\min(a,b) \le \sqrt{ab}$ for $a, b \ge 0$.
Exact error under MAP
The MAP rule decides the hypothesis whose weighted density $\pi_i p_i(y)$ is larger, and errs on the complement. Therefore
$$P_e = \int \min\!\big(\pi_0 p_0(y),\, \pi_1 p_1(y)\big)\,dy.$$
Geometric-arithmetic bound on $\min$
For any $a, b \ge 0$, $\min(a,b) \le \sqrt{ab}$ (since $\min(a,b)^2 \le ab$). Hence
$$\min\!\big(\pi_0 p_0(y),\, \pi_1 p_1(y)\big) \le \sqrt{\pi_0 \pi_1}\,\sqrt{p_0(y)\,p_1(y)}.$$
Integrate:
$$P_e \le \sqrt{\pi_0 \pi_1}\int \sqrt{p_0(y)\,p_1(y)}\,dy = \sqrt{\pi_0 \pi_1}\,\rho.$$
Example: Bhattacharyya Coefficient for Two Gaussians
Compute $\rho$ for $\mathcal{N}(\mu_0, \sigma^2)$ and $\mathcal{N}(\mu_1, \sigma^2)$, and compare the bound to the exact $P_e$ (equal priors).
Product of densities
$$\sqrt{p_0(y)\,p_1(y)} = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(y-\mu_0)^2 + (y-\mu_1)^2}{4\sigma^2}\right),$$
and completing the square,
$$(y-\mu_0)^2 + (y-\mu_1)^2 = 2(y - \bar\mu)^2 + \tfrac{1}{2}(\mu_0-\mu_1)^2, \qquad \bar\mu = (\mu_0+\mu_1)/2.$$
Integrate: the remaining Gaussian kernel in $y$ integrates to $\sqrt{2\pi}\,\sigma$, leaving
$$\rho = \exp\!\left(-\frac{(\mu_0-\mu_1)^2}{8\sigma^2}\right).$$
Error bound vs. exact
With $d = |\mu_1 - \mu_0|/\sigma$ (the normalised separation),
$$P_e \le \tfrac{1}{2}\,e^{-d^2/8}, \qquad P_e^{\text{exact}} = Q(d/2).$$
For $d = 2$: bound $0.303$, exact $0.159$. For $d = 4$: bound $0.068$, exact $0.023$. The bound tracks the exponent but is off by a polynomial factor, reflecting the well-known fact that $Q(x) \le \tfrac12 e^{-x^2/2}$ is loose by a factor of order $x$.
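These numbers are straightforward to reproduce, writing $Q$ via the complementary error function:

```python
import math

def Q(x):
    """Gaussian tail probability Q(x) = P(N(0,1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2))

for d in (2.0, 4.0):
    bound = 0.5 * math.exp(-d ** 2 / 8)  # equal-prior Bhattacharyya bound
    exact = Q(d / 2)                     # exact MAP error for the mean shift
    print(f"d={d}: bound={bound:.3f}, exact={exact:.3f}")
# d=2.0: bound=0.303, exact=0.159
# d=4.0: bound=0.068, exact=0.023
```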
Theorem: Chernoff Bound
For any $s \in [0,1]$ and any priors $\pi_0, \pi_1$, the MAP error probability satisfies
$$P_e \le \pi_0^{1-s}\,\pi_1^{s}\, e^{-\mu(s)}, \qquad \mu(s) = -\ln \int p_0^{1-s}(y)\, p_1^{s}(y)\,dy.$$
The optimised bound,
$$P_e \le \min_{s \in [0,1]} \pi_0^{1-s}\,\pi_1^{s}\, e^{-\mu(s)},$$
is the Chernoff bound.
The Chernoff bound is a family of tilted Bhattacharyya bounds: $s = 1/2$ recovers Bhattacharyya exactly. Optimising over $s$ selects the tilt that best concentrates mass on the boundary region where the MAP rule errs, the saddle point of the exponent. The function $\mu(s)$ is concave on $[0,1]$ and vanishes at $s = 0$ and $s = 1$, so the maximum is interior.
Use the elementary bound $\min(a,b) \le a^{1-s} b^{s}$ for any $a, b \ge 0$ and $s \in [0,1]$.
Substitute into the exact error expression $P_e = \int \min(\pi_0 p_0, \pi_1 p_1)\,dy$.
Elementary inequality
For $a, b > 0$ and $s \in [0,1]$, we prove $\min(a,b) \le a^{1-s} b^{s}$. Assume $a \le b$; then $a^{1-s} b^{s} \ge a^{1-s} a^{s} = a = \min(a,b)$. The case $b < a$ is symmetric.
Apply pointwise
Set $a = \pi_0 p_0(y)$ and $b = \pi_1 p_1(y)$:
$$\min\!\big(\pi_0 p_0(y),\, \pi_1 p_1(y)\big) \le \pi_0^{1-s}\pi_1^{s}\, p_0^{1-s}(y)\, p_1^{s}(y).$$
Integrate and optimise
Integrate over $y$:
$$P_e \le \pi_0^{1-s}\pi_1^{s} \int p_0^{1-s}(y)\, p_1^{s}(y)\,dy = \pi_0^{1-s}\pi_1^{s}\, e^{-\mu(s)}.$$
The bound holds for each $s \in [0,1]$; take the minimum.
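The whole chain (exact error, one bound per tilt $s$, minimisation) can be verified numerically. A sketch on a hypothetical discrete pair with equal priors:

```python
import math

# Hypothetical discrete pair and priors (illustration only)
p0 = [0.6, 0.3, 0.1]
p1 = [0.1, 0.3, 0.6]
pi0, pi1 = 0.5, 0.5

# Exact MAP error: sum of min(pi0*p0, pi1*p1) over the alphabet
Pe = sum(min(pi0 * a, pi1 * b) for a, b in zip(p0, p1))

def chernoff_bound(s):
    """One member of the Chernoff family; s = 1/2 is the Bhattacharyya bound."""
    return pi0 ** (1 - s) * pi1 ** s * sum(a ** (1 - s) * b ** s for a, b in zip(p0, p1))

bounds = [chernoff_bound(k / 100) for k in range(101)]
print(Pe, min(bounds))  # every member of the family upper-bounds Pe
```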
Theorem: The Chernoff Exponent Is Concave
The function $\mu(s) = -\ln\int p_0^{1-s}(y)\, p_1^{s}(y)\,dy$ is concave on $[0,1]$, with $\mu(0) = \mu(1) = 0$, and $\mu(s) > 0$ for $s \in (0,1)$ whenever $p_0 \neq p_1$. Its maximum $s^\star$ is the solution to $\mu'(s) = 0$, equivalently $\mathbb{E}_{q_s}\!\big[\ln(p_1/p_0)\big] = 0$, where $q_s(y) \propto p_0^{1-s}(y)\, p_1^{s}(y)$ is the tilted density.
$e^{-\mu(s)} = \int p_0^{1-s} p_1^{s}\,dy$ is an expectation of $(p_1/p_0)^s$ under $p_0$, so $-\mu(s)$ is the log-moment-generating function of the log-likelihood ratio $L = \ln(p_1/p_0)$. Log-MGFs are convex, so $\mu$ is concave. The zero-mean condition at $s^\star$ says: the tilted distribution makes the LLR a zero-mean random variable, a canonical large-deviations saddle point.
Introduce the log-MGF
Let $L(y) = \ln\frac{p_1(y)}{p_0(y)}$ and consider $\Lambda(s) = \ln \mathbb{E}_{p_0}\!\big[e^{sL}\big]$. Then
$$\mathbb{E}_{p_0}\!\big[e^{sL}\big] = \int p_0(y)\left(\frac{p_1(y)}{p_0(y)}\right)^{\!s} dy = \int p_0^{1-s}(y)\, p_1^{s}(y)\,dy.$$
Hence $\mu(s) = -\Lambda(s)$, which is the negative of the log-MGF of $L$ under $p_0$.
Concavity from Hölder
The log-MGF $\Lambda(s)$ is convex in $s$ (a standard consequence of Hölder's inequality applied to $\mathbb{E}_{p_0}\!\big[e^{sL}\big]$ at intermediate values of $s$). Therefore $\mu(s) = -\Lambda(s)$ is concave.
Boundary values
At $s = 0$: $\int p_0\,dy = 1$, so $\mu(0) = 0$. At $s = 1$: $\int p_1\,dy = 1$, so $\mu(1) = 0$. For $s \in (0,1)$, $p_0^{1-s} p_1^{s}$ integrates to at most $1$ by Hölder, with strict inequality unless $p_0 = p_1$ a.e. Hence $\mu(s) > 0$ strictly.
Optimality condition
By concavity, the maximum of $\mu$ solves $\mu'(s) = 0$. Differentiating,
$$\mu'(s) = -\mathbb{E}_{q_s}\!\big[\ln(p_1/p_0)\big], \qquad q_s(y) = \frac{p_0^{1-s}(y)\, p_1^{s}(y)}{\int p_0^{1-s}\, p_1^{s}\,dy},$$
where $q_s$ is the tilted density. Setting this to zero: $\mathbb{E}_{q_{s^\star}}\!\big[\ln(p_1/p_0)\big] = 0$.
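The zero-mean condition gives a practical root-finding recipe for $s^\star$: the tilted mean $\mathbb{E}_{q_s}[L]$ is increasing in $s$ (by convexity of the log-MGF), so bisection works. A sketch on a hypothetical asymmetric discrete pair, where $s^\star$ is generally not $1/2$:

```python
import math

# Hypothetical asymmetric discrete pair (illustration only)
p0 = [0.7, 0.2, 0.1]
p1 = [0.2, 0.5, 0.3]

def tilted_mean(s):
    """E_{q_s}[ln(p1/p0)] under the tilted density q_s proportional to p0^(1-s) p1^s."""
    w = [a ** (1 - s) * b ** s for a, b in zip(p0, p1)]
    llr = [math.log(b / a) for a, b in zip(p0, p1)]
    return sum(wi * li for wi, li in zip(w, llr)) / sum(w)

# Bisection: tilted_mean(0) = -D(p0||p1) < 0 and tilted_mean(1) = +D(p1||p0) > 0
lo, hi = 0.0, 1.0
for _ in range(60):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if tilted_mean(mid) < 0 else (lo, mid)
s_star = (lo + hi) / 2

# Chernoff information = mu(s*)
C = -math.log(sum(a ** (1 - s_star) * b ** s_star for a, b in zip(p0, p1)))
print(s_star, C)
```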
Definition: Chernoff Information
Chernoff Information
The Chernoff information between $p_0$ and $p_1$ is
$$C(p_0, p_1) = \max_{s \in [0,1]} \mu(s) = -\min_{s \in [0,1]} \ln \int p_0^{1-s}(y)\, p_1^{s}(y)\,dy.$$
For $n$ i.i.d. observations, the Chernoff bound becomes $P_e \le \pi_0^{1-s}\pi_1^{s}\, e^{-n\mu(s)}$, so
$$\liminf_{n\to\infty}\; -\tfrac{1}{n}\ln P_e \;\ge\; C(p_0, p_1).$$
In fact equality holds (Chernoff, 1952): $C(p_0, p_1)$ is the best achievable error exponent under the Bayesian criterion.
Compare with Stein's lemma for the NP criterion, where the error exponents are the asymmetric KL divergences $D(p_0\|p_1)$ and $D(p_1\|p_0)$. The Chernoff information is the symmetric Bayes-optimal exponent; Stein's exponent is asymmetric.
Example: Chernoff Information for Two Gaussians
Compute $\mu(s)$ and $C(p_0, p_1)$ for $\mathcal{N}(\mu_0, \sigma^2)$ and $\mathcal{N}(\mu_1, \sigma^2)$.
Compute the integral
By the completing-the-square argument of the Bhattacharyya example above, generalised to unequal exponents,
$$\int p_0^{1-s}(y)\, p_1^{s}(y)\,dy = \exp\!\left(-\frac{s(1-s)(\mu_1-\mu_0)^2}{2\sigma^2}\right).$$
Hence
$$\mu(s) = \frac{s(1-s)\,d^2}{2}, \qquad d = |\mu_1 - \mu_0|/\sigma.$$
Optimise over $s$
$\mu(s) = s(1-s)d^2/2$ is a parabola in $s$ with maximum at $s^\star = 1/2$, yielding
$$C(p_0, p_1) = \mu(1/2) = \frac{d^2}{8}.$$
Bhattacharyya's bound is already Chernoff-optimal for equal-variance Gaussians, a special property of the symmetric family.
Interpretation
For $n$ i.i.d. samples and equal priors,
$$P_e \le \tfrac{1}{2}\, e^{-n d^2/8}, \qquad P_e^{\text{exact}} = Q\!\big(\sqrt{n}\, d/2\big).$$
The exponent matches exactly; Chernoff misses only the polynomial correction.
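The convergence of the empirical exponent to $C = d^2/8$ is visible numerically; a sketch with an assumed separation $d = 1$:

```python
import math

def Q(x):
    """Gaussian tail probability Q(x) = P(N(0,1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2))

d = 1.0            # normalised separation (assumed for illustration)
C = d ** 2 / 8     # Chernoff information for the equal-variance pair

for n in (10, 100, 1000):
    exact = Q(math.sqrt(n) * d / 2)
    rate = -math.log(exact) / n   # empirical exponent of the exact error
    print(n, round(rate, 4), C)   # rate approaches C = 0.125 as n grows
```

The gap between `rate` and `C` at finite $n$ is the $O(\ln n / n)$ polynomial correction mentioned above.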
Chernoff Bound vs. Exact Error, and the Exponent
For the Gaussian mean-shift problem with $n$ i.i.d. samples, compare the Chernoff bound, Bhattacharyya bound, and exact error probability as functions of $n$. The right panel plots $\mu(s)$ and its maximiser $s^\star$.
Numerical Computation of Chernoff Information
Complexity: $O(KM)$ for grid-based integration with $K$ values of $s$ and $M$ quadrature points. For 1D or 2D problems grid quadrature suffices. For high-dimensional problems use Monte Carlo importance sampling with the tilted density $q_s$ as the proposal density.
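A minimal grid-quadrature sketch for 1D densities, using the equal-variance Gaussian pair as an assumed test case because there $C = d^2/8$ is known in closed form (here $d = 2$, so $C = 0.5$):

```python
import math

# Assumed example: N(0,1) vs N(2,1), where C = d^2/8 = 0.5 exactly
mu0, mu1, sigma = 0.0, 2.0, 1.0

def pdf(y, m):
    return math.exp(-(y - m) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# Integration grid over y (midpoint rule): O(K*M) work for K tilts, M points
M, y_lo, y_hi = 4000, -8.0, 10.0
dy = (y_hi - y_lo) / M
ys = [y_lo + (i + 0.5) * dy for i in range(M)]

def mu_of_s(s):
    """mu(s) = -ln integral of p0^(1-s) p1^s, by quadrature."""
    return -math.log(sum(pdf(y, mu0) ** (1 - s) * pdf(y, mu1) ** s for y in ys) * dy)

C = max(mu_of_s(k / 50) for k in range(1, 50))  # grid over the tilt s
print(C)  # close to 0.5
```

For smooth 1D problems the quadrature error is negligible next to the coarseness of the $s$-grid; a root-finder on the tilted-mean condition refines $s^\star$ if needed.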
Chernoff Bounds in Coded Systems
In coded communications (Book CM, Book CC), the union bound on codeword error reads $P_e \le \sum_{j \ne i} P_{i \to j}$, where $P_{i \to j}$ is a pairwise error probability, exactly a binary hypothesis test between codewords $\mathbf{c}_i$ and $\mathbf{c}_j$. Applying the Chernoff bound to each pairwise test and then optimising yields the Bhattacharyya union bound on codeword error, which dictates the error exponent of Shannon's random-coding theorem. The Chernoff information therefore sits at the intersection of detection, coding, and information theory: the same exponent appears in Gallager's random-coding bound, in Stein's lemma, and in large-deviation rate functions.
Chernoff-Bound Error Exponents for MIMO Detection
The CommIT group's work on bit-interleaved coded modulation (BICM) applies Chernoff-style bounding to pairwise decoding errors across a product channel, yielding the BICM capacity and random-coding exponent. The technique, exponent optimisation over a tilting parameter, is precisely the Chernoff bound of the theorem above applied to the likelihood ratio between pairs of coded symbols after binary labelling. Chapter 5 of this book revisits the BICM exponent with the factor-graph tools developed in Parts IV-V.
Common Mistake: Chernoff for One-Sided vs. Two-Sided Errors
Mistake:
Applying the Chernoff bound with the optimised tilt $s^\star$ to bound the false-alarm probability $P_{\mathrm{FA}}$ alone (or the miss probability $P_{\mathrm{M}}$ alone) in the Neyman-Pearson framework.
Correction:
Our Chernoff bound is a bound on the average MAP error $P_e = \pi_0 P_{\mathrm{FA}} + \pi_1 P_{\mathrm{M}}$, which combines $P_{\mathrm{FA}}$ and $P_{\mathrm{M}}$ via the priors. For one-sided error bounds, use Stein's lemma: $-\tfrac{1}{n}\ln P_{\mathrm{M}} \to D(p_0\|p_1)$ when $P_{\mathrm{FA}}$ is held fixed. The two exponents ($D(p_0\|p_1)$ vs. Chernoff information) are generally distinct.
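The gap between the exponents is easy to see numerically: concavity of $\mu$ together with its endpoint slopes implies $C(p_0,p_1) \le \min\{D(p_0\|p_1), D(p_1\|p_0)\}$. A sketch on a hypothetical discrete pair:

```python
import math

# Hypothetical discrete pair (illustration only)
p0 = [0.7, 0.2, 0.1]
p1 = [0.2, 0.5, 0.3]

D01 = sum(a * math.log(a / b) for a, b in zip(p0, p1))  # D(p0||p1): Stein exponent
D10 = sum(b * math.log(b / a) for a, b in zip(p0, p1))  # D(p1||p0)

# Chernoff information by a grid over the tilt s
C = max(-math.log(sum(a ** (1 - k / 100) * b ** (k / 100) for a, b in zip(p0, p1)))
        for k in range(1, 100))

print(D01, D10, C)  # C never exceeds either KL divergence
```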
Quick Check
For two Gaussians with equal variance and means differing by $d$ standard deviations, the Chernoff-optimal tilt is $s^\star = 1/2$; it does not depend on $d$ or on the priors.
The Gaussian $\mu(s) = s(1-s)d^2/2$ is a symmetric parabola, so its maximum is at the midpoint $s^\star = 1/2$. This is why Bhattacharyya is tight for equal-covariance Gaussians.
Chernoff information
$C(p_0, p_1) = \max_{s \in [0,1]} \big(-\ln \int p_0^{1-s}(y)\, p_1^{s}(y)\,dy\big)$. It is the best achievable Bayesian error exponent for binary hypothesis testing with i.i.d. observations. Symmetric in $(p_0, p_1)$ and always non-negative, with $C(p_0, p_1) = 0$ iff $p_0 = p_1$ a.e.
Related: KL divergence, Stein's lemma, Bhattacharyya coefficient
Bhattacharyya coefficient
$\rho = \int \sqrt{p_0(y)\, p_1(y)}\,dy$. The geometric-mean overlap of two densities, lying in $[0,1]$. Related to the Hellinger distance by $H^2(p_0, p_1) = 2(1-\rho)$.
Related: Chernoff Information, Hellinger distance, error bound
Historical Note: Chernoff, 1952
Herman Chernoff (b. 1923) introduced his eponymous bound in "A Measure of Asymptotic Efficiency for Tests of a Hypothesis Based on the Sum of Observations" (Ann. Math. Stat., 1952). The paper established the fundamental error exponent for Bayesian binary testing and, in so doing, laid groundwork for what would become the theory of large deviations. Chernoff's exponent-optimisation technique, now taught in every information theory course, is equally central to Shannon's random-coding exponents (1959) and to Gallager's error-exponent theory (1965).
Key Takeaway
For $n$ i.i.d. observations, the MAP error probability decays exponentially: $P_e \doteq e^{-n\,C(p_0, p_1)}$. The Chernoff information is the exponent, the single most important scalar summarising the difficulty of the hypothesis-testing problem. Every design question ("how much SNR do I need?" "how many samples suffice?") ultimately reduces to bounding or computing this exponent.