Error Exponents for Channel Coding

How Fast Does Error Probability Vanish?

Shannon's channel coding theorem tells us that reliable communication is possible at any rate $R < C$. But a system designer needs more: how many channel uses do I need to achieve a target error probability? The answer depends on the error exponent, the exponential rate at which $P_e^{(n)}$ decays with blocklength $n$. There are three main exponents, each tight in a different rate regime: the random coding exponent $E_r(R)$, the sphere-packing exponent $E_{sp}(R)$ (a converse), and the expurgated exponent $E_{ex}(R)$ (which improves on $E_r$ at low rates).

Definition: Gallager's $E_0$ Function

For a DMC with transition probabilities $W(y|x)$ and input distribution $Q$, Gallager's function is defined for $\rho \in [0, 1]$ as

$$E_0(\rho, Q) = -\log \sum_{y \in \mathcal{Y}} \left[\sum_{x \in \mathcal{X}} Q(x)\, W(y|x)^{1/(1+\rho)}\right]^{1+\rho}.$$

This function satisfies $E_0(0, Q) = 0$ and relates to the mutual information via $\frac{\partial E_0}{\partial \rho}\big|_{\rho=0} = I(X;Y)$ under the joint distribution $Q \times W$.

Gallager's function is the workhorse of error exponent analysis. Its key properties (it is nonnegative, nondecreasing, and concave in $\rho$ for fixed $Q$) drive the entire theory.
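A quick numerical sketch of these properties (the channel and input distribution below are illustrative, not from the text): $E_0(0, Q) = 0$ by construction, and the slope at $\rho = 0$ matches $I(X;Y)$.

```python
import numpy as np

def E0(rho, Q, W):
    """Gallager's E_0(rho, Q) in bits for a DMC with rows W[x, y]."""
    inner = (Q[:, None] * W ** (1.0 / (1.0 + rho))).sum(axis=0)
    return -np.log2((inner ** (1.0 + rho)).sum())

def mutual_information(Q, W):
    """I(X;Y) in bits under the joint distribution Q x W."""
    Py = Q @ W
    ratio = np.where(W > 0, W / Py[None, :], 1.0)
    return (Q[:, None] * W * np.log2(ratio)).sum()

# Illustrative example: BSC with crossover 0.1, uniform input
W = np.array([[0.9, 0.1], [0.1, 0.9]])
Q = np.array([0.5, 0.5])

print(E0(0.0, Q, W))                    # 0 by construction
slope = (E0(1e-6, Q, W) - E0(0.0, Q, W)) / 1e-6
print(slope, mutual_information(Q, W))  # slope at rho = 0 approximates I(X;Y)
```

Here the derivative at $\rho = 0$ is approximated by a forward difference; any sufficiently small step gives the same agreement.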

Theorem: Random Coding Error Exponent

For a DMC $W$ and code rate $R$, the random coding error exponent is

$$E_r(R) = \max_{Q} \max_{0 \leq \rho \leq 1} \left[E_0(\rho, Q) - \rho R\right].$$

There exists a sequence of codes with $M = 2^{nR}$ codewords and maximum probability of error satisfying $P_e^{(n)} \leq 2^{-n E_r(R)}$ for all $n$ sufficiently large.

The random coding exponent is obtained by analyzing the average error probability over a random codebook ensemble. The parameter $\rho$ optimizes the tradeoff between the exponential number of competing codewords ($2^{n\rho R}$) and the probability that any one of them causes confusion ($2^{-nE_0(\rho, Q)}$). The point is that random codes are exponentially good: not just asymptotically reliable, but exponentially reliable.


Definition: Critical Rate

The critical rate $R_{cr}$ is the rate at which the optimal $\rho$ in the random coding exponent transitions from $\rho = 1$ (for $R \leq R_{cr}$) to $\rho < 1$ (for $R > R_{cr}$). Formally,

$$R_{cr} = \frac{\partial E_0(\rho, Q^*)}{\partial \rho}\bigg|_{\rho=1},$$

where $Q^*$ is the optimal input distribution. For $R > R_{cr}$, the random coding exponent is tight (it equals the sphere-packing exponent). For $R < R_{cr}$, it is not; at low rates the expurgated exponent is tighter.

Theorem: Sphere-Packing Exponent (Converse)

For any sequence of codes with $M = 2^{nR}$ codewords and maximum probability of error $P_e^{(n)}$,

$$\limsup_{n \to \infty} -\frac{1}{n} \log P_e^{(n)} \leq E_{sp}(R),$$

where the sphere-packing exponent is

$$E_{sp}(R) = \max_Q \min_{V : I(Q, V) \leq R} D(V \| W | Q).$$

Here $D(V \| W | Q) = \sum_{x} Q(x) \sum_{y} V(y|x) \log \frac{V(y|x)}{W(y|x)}$ is the conditional KL divergence.

The sphere-packing bound says: no code can do better than $E_{sp}(R)$. The name comes from the geometric intuition that each codeword needs a "decoding sphere" around it in output space, and these spheres must be packed without overlap. The minimum KL divergence captures the cost of confusing two codewords: the "worst-case" channel perturbation that limits reliability. For $R \geq R_{cr}$, we have $E_{sp}(R) = E_r(R)$, so the random coding exponent is exact in this regime.
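This agreement can be checked numerically for the BSC. The sketch below assumes Gallager's dual form of the sphere-packing exponent, $E_{sp}(R) = \sup_{\rho \geq 0}\left[E_0(\rho, Q) - \rho R\right]$, which for the BSC with uniform input (optimal by symmetry) matches the min-divergence definition; the only difference from $E_r$ is that $\rho$ is no longer capped at 1.

```python
import numpy as np

def e0_bsc(rho, p):
    """Gallager E_0 for a BSC(p) with uniform input, in bits."""
    s = 1.0 / (1.0 + rho)
    return rho - (1.0 + rho) * np.log2(p**s + (1.0 - p)**s)

def exponent(R, p, rho_grid):
    """Maximize E_0(rho) - rho*R over the given grid of rho values."""
    return max(e0_bsc(r, p) - r * R for r in rho_grid)

p = 0.1
rho_r = np.linspace(0.0, 1.0, 2001)    # random coding: rho in [0, 1]
rho_sp = np.linspace(0.0, 8.0, 16001)  # sphere packing: rho >= 0

for R in (0.10, 0.40):
    print(R, exponent(R, p, rho_r), exponent(R, p, rho_sp))
```

At $R = 0.40$ (above the critical rate, about $0.19$ for this channel) the maximizing $\rho$ is interior to $[0,1]$, so the two exponents coincide; at $R = 0.10$ the unconstrained optimum has $\rho > 1$ and the sphere-packing exponent is strictly larger.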

Theorem: Expurgated Exponent

For rates below the critical rate ($R < R_{cr}$), the random coding exponent can be improved by expurgation: removing the worst codewords from the random codebook. The expurgated exponent is

$$E_{ex}(R) = \max_Q \max_{\rho \geq 1} \left[E_x(\rho, Q) - \rho R\right],$$

where

$$E_x(\rho, Q) = -\rho \log \sum_{x, x'} Q(x)\, Q(x') \left[\sum_{y} \sqrt{W(y|x)\, W(y|x')}\right]^{1/\rho}.$$

Note that $E_x(1, Q) = E_0(1, Q)$, so the expurgated bound connects continuously to the random coding bound.

For $R < R_{cr}$: $E_{ex}(R) \geq E_r(R) > 0$, with strict improvement at sufficiently low rates.

The random coding ensemble occasionally produces "bad" codewords: pairs that are too close together and cause most of the errors. By expurgating (removing) the worst half of the codewords, we lose only one bit of $\log M$ (a rate loss of $1/n$) but potentially gain a much larger error exponent. At low rates, where the codebook is small relative to the output space, this cleaning step is highly effective.

Channel Coding Error Exponents

Compare the random coding exponent $E_r(R)$, sphere-packing exponent $E_{sp}(R)$, and expurgated exponent $E_{ex}(R)$ for the BSC. Observe the three regimes: below the critical rate (where expurgation helps), between the critical rate and capacity (where $E_r = E_{sp}$), and above capacity (where the exponents are zero).


Example: Error Exponents for the BSC

For a BSC with crossover probability $p = 0.1$, compute the channel capacity, critical rate, and evaluate the random coding exponent at rate $R = 0.3$ bits/use.
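A sketch of this computation (uniform input is optimal for the BSC by symmetry; all logs are base 2, so everything is in bits):

```python
import numpy as np

def e0_bsc(rho, p):
    """Gallager E_0 for a BSC(p) with uniform input, in bits."""
    s = 1.0 / (1.0 + rho)
    return rho - (1.0 + rho) * np.log2(p**s + (1.0 - p)**s)

p = 0.1

# Capacity: C = 1 - H2(p)
C = 1.0 + p * np.log2(p) + (1.0 - p) * np.log2(1.0 - p)

# Critical rate: slope of E_0 at rho = 1 (numerical derivative)
h = 1e-6
R_cr = (e0_bsc(1.0, p) - e0_bsc(1.0 - h, p)) / h

# Random coding exponent at R = 0.3 (> R_cr, so the optimum has rho < 1)
R = 0.3
Er = max(e0_bsc(r, p) - r * R for r in np.linspace(0.0, 1.0, 2001))

print(f"C = {C:.4f} bits/use")        # about 0.531
print(f"R_cr = {R_cr:.4f} bits/use")  # about 0.19
print(f"E_r(0.3) = {Er:.4f}")         # about 0.05
```

Since $0.3 > R_{cr}$, this value of $E_r(0.3)$ also equals the sphere-packing exponent, so it is the exact reliability exponent at this rate.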

Comparison of Error Exponents

| Property | Random Coding $E_r$ | Sphere-Packing $E_{sp}$ | Expurgated $E_{ex}$ |
|---|---|---|---|
| Type | Achievability (lower bound on exponent) | Converse (upper bound on exponent) | Achievability (lower bound on exponent) |
| Rate range | $0 < R < C$ | $0 < R < C$ | $0 < R < R_{cr}$ |
| Tight? | Yes for $R \geq R_{cr}$ | Yes (always an upper bound) | Tighter than $E_r$ for $R < R_{cr}$ |
| Key parameter | $\rho \in [0,1]$ | Auxiliary channel $V$ | $\rho \geq 1$ |
| Proof technique | Random codebook + Gallager bound | Combinatorial sphere-packing | Expurgation of random codebook |

Error Exponents vs. Practical Codes

The error exponent story has a subtle twist for code designers. Codes that are "good" in the error exponent sense (random codes) are computationally intractable to decode. Modern capacity-approaching codes (LDPC, polar, turbo) have zero error exponent at any fixed rate below capacity; their error probability decays as $e^{-\sqrt{n}}$ rather than $e^{-n}$. These codes achieve capacity by operating at rates that approach $C$ as $n$ grows, trading error exponent for decoding complexity.

The point is that error exponents answer a different question than code design: they tell us the ultimate reliability at a fixed rate, while practical codes tell us how to achieve a target reliability at rates approaching capacity. Both perspectives are valuable.

⚠️ Engineering Note

Beyond Error Exponents: Finite-Blocklength Bounds

For modern short-packet communications (5G URLLC, IoT), the blocklength $n$ is small (50–200 symbols) and the error exponent approximation is too loose. The normal approximation of Polyanskiy, Poor, and Verdú (2010) provides a much tighter characterization:

$$\log M^*(n, \epsilon) \approx nC - \sqrt{nV}\, Q^{-1}(\epsilon),$$

where $V$ is the channel dispersion and $\epsilon$ is the target error probability. This finite-blocklength framework is covered in Chapter 26 of this book.
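A sketch of this approximation for the BSC, assuming the known BSC dispersion $V = p(1-p)\log_2^2\frac{1-p}{p}$; the inverse $Q$-function is computed by bisection to keep the example dependency-free:

```python
import math

def qfunc(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def qinv(eps, lo=-10.0, hi=10.0):
    """Invert Q by bisection: find x with Q(x) = eps."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if qfunc(mid) > eps:  # Q is decreasing: need larger x
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

p, n, eps = 0.1, 128, 1e-3
C = 1.0 + p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p)
V = p * (1.0 - p) * math.log2((1.0 - p) / p) ** 2  # BSC dispersion, bits^2

log_M = n * C - math.sqrt(n * V) * qinv(eps)
print(f"max rate at n={n}, eps={eps}: {log_M / n:.3f} bits/use (C = {C:.3f})")
```

The dispersion penalty $\sqrt{nV}\,Q^{-1}(\epsilon)$ shrinks only as $1/\sqrt{n}$ relative to $nC$, which is why short-packet systems operate well below capacity.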

Historical Note: Robert Gallager and the Art of Bounding

1960s

The random coding error exponent was derived by Robert Gallager in his 1965 paper and refined in his 1968 textbook Information Theory and Reliable Communication. Gallager's key innovation was the "$\rho$-trick": raising the union bound to a fractional power $\rho \in (0,1]$ to tighten it. This seemingly simple idea yielded the tightest known achievability bounds for decades. The sphere-packing converse dates to Shannon, Gallager, and Berlekamp (1967), while the expurgated exponent combines ideas from Gallager with earlier work by Elias. Gallager had earlier invented LDPC codes in 1962, codes that were forgotten for 30 years and are now the standard in 5G NR. The irony is that LDPC codes achieve capacity through a fundamentally different mechanism (iterative decoding at rates approaching capacity) than the random coding framework that Gallager used to analyze them.

🎓 CommIT Contribution (1998)

BICM Error Exponents

G. Caire, G. Taricco, E. Biglieri, IEEE Trans. Inform. Theory, vol. 44, no. 3, pp. 927–946

Caire, Taricco, and Biglieri extended the error exponent framework to bit-interleaved coded modulation (BICM), where the interleaver between encoder and modulator creates a mismatched decoding setting. The BICM mutual information is lower than the true channel mutual information (the price of interleaving), but the resulting error exponent analysis reveals that BICM provides excellent performance over fading channels: the diversity advantage compensates for the rate loss. This analysis uses the same Gallager-style bounding techniques developed in this chapter, applied to the effective "BICM channel." See Chapter 10.3 of this book for the full treatment.


Quick Check

For a DMC with capacity $C$, what is the random coding error exponent at rate $R = C$?

$E_r(C) > 0$

$E_r(C) = 0$

$E_r(C) = C$

$E_r(C)$ is undefined

Why This Matters: Error Exponents over Fading Channels

Over fading channels, the error exponent depends on the fading statistics and CSI availability. With perfect CSIR, the outage exponent governs reliability in the slow-fading regime, while the Gallager exponent generalizes to ergodic fading. The diversity order of a code (the slope of the error probability vs. SNR curve on a log-log scale) is intimately related to the error exponent. See Book telecom, Ch. 14 for fading channel capacity and Book mimo, Ch. 3 for MIMO diversity-multiplexing tradeoff.

Channel Coding Error Exponents: Three Regimes

Animated construction of the three error exponent curves ($E_r$, $E_{sp}$, $E_{ex}$) for the BSC, showing the critical rate $R_{cr}$ where random coding becomes tight and the zero-exponent behavior at capacity $C$.

Key Takeaway

Channel coding error exponents quantify how fast error probability vanishes with blocklength. Three exponents partition the rate axis: the expurgated exponent $E_{ex}$ for low rates, the random coding exponent $E_r$ for moderate rates (where it equals the sphere-packing converse $E_{sp}$), and zero exponent at capacity. The critical rate $R_{cr}$ marks the transition. These results, while fundamental, describe the performance of unstructured (random) codes; practical structured codes trade error exponent for decoding complexity.