Error Exponents for Channel Coding
How Fast Does Error Probability Vanish?
Shannon's channel coding theorem tells us that reliable communication is possible at any rate $R < C$. But a system designer needs more: how many channel uses do I need to achieve a target error probability? The answer depends on the error exponent $E(R)$: the exponential rate at which the error probability $P_e$ decays with blocklength $n$. There are three main exponents, each tight in a different rate regime: the random coding exponent $E_r(R)$, the sphere-packing exponent $E_{sp}(R)$ (a converse), and the expurgated exponent $E_{ex}(R)$ (which improves on $E_r(R)$ at low rates).
Definition: Gallager's Function
Gallager's Function
For a DMC with transition probabilities $W(y \mid x)$ and input distribution $Q(x)$, Gallager's function is defined for $\rho \in [0, 1]$ as:

$$E_0(\rho, Q) = -\log_2 \sum_{y} \left[ \sum_{x} Q(x)\, W(y \mid x)^{1/(1+\rho)} \right]^{1+\rho}.$$

This function interpolates between $E_0(0, Q) = 0$ and $E_0(1, Q)$, and relates to the mutual information via $\left.\frac{\partial E_0(\rho, Q)}{\partial \rho}\right|_{\rho=0} = I(X;Y)$ under the joint distribution $Q(x)\,W(y \mid x)$.
Gallager's function is the workhorse of error exponent analysis. Its properties (concavity in $\rho$, the value $E_0(0, Q) = 0$, and the convexity/concavity relationships that follow) drive the entire theory.
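As a minimal sketch, assuming the DMC is given as a row-stochastic transition matrix `W[x, y]` and base-2 logarithms (so everything is in bits), Gallager's function can be evaluated numerically as follows; the BSC with $p = 0.1$ at the end is an illustrative choice, not a value taken from the text:

```python
import numpy as np

def gallager_E0(rho: float, Q: np.ndarray, W: np.ndarray) -> float:
    """E_0(rho, Q) = -log2 sum_y [ sum_x Q(x) W(y|x)^{1/(1+rho)} ]^{1+rho}, in bits."""
    inner = Q @ W ** (1.0 / (1.0 + rho))   # inner[y] = sum_x Q(x) W(y|x)^{1/(1+rho)}
    return float(-np.log2(np.sum(inner ** (1.0 + rho))))

# Illustrative BSC with assumed crossover probability p and the uniform input distribution.
p = 0.1
W_bsc = np.array([[1 - p, p], [p, 1 - p]])   # rows: inputs x, columns: outputs y
Q_unif = np.array([0.5, 0.5])

print(gallager_E0(0.0, Q_unif, W_bsc))   # E_0(0, Q) = 0
print(gallager_E0(1.0, Q_unif, W_bsc))   # E_0(1, Q), the value that matters below the critical rate
```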
Theorem: Random Coding Error Exponent
For a DMC and code rate $R < C$, the random coding error exponent is

$$E_r(R) = \max_{Q} \max_{0 \le \rho \le 1} \left[ E_0(\rho, Q) - \rho R \right].$$

There exists a sequence of codes with $M = \lceil 2^{nR} \rceil$ codewords and maximum probability of error satisfying $P_e^{(n)} \le 2^{-n E_r(R)}$ for all $n$ sufficiently large.
The random coding exponent is obtained by analyzing the average error probability over a random codebook ensemble. The parameter $\rho$ optimizes the tradeoff between the exponential number of competing codewords (the $\rho R$ term, from $M \approx 2^{nR}$) and the probability that any one of them causes confusion (the $E_0(\rho, Q)$ term). The point is that random codes are exponentially good: not just asymptotically reliable, but exponentially reliable.
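A minimal numerical sketch of $E_r(R)$, assuming a fixed input distribution $Q$ (skipping the outer maximization over $Q$, which is unnecessary for symmetric channels) and base-2 units; the BSC crossover probability is an assumed illustrative value:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def gallager_E0(rho, Q, W):
    inner = Q @ W ** (1.0 / (1.0 + rho))
    return -np.log2(np.sum(inner ** (1.0 + rho)))

def random_coding_exponent(R, Q, W):
    """E_r(R) = max over 0 <= rho <= 1 of [E_0(rho, Q) - rho * R], for a fixed Q (bits/use)."""
    res = minimize_scalar(lambda rho: -(gallager_E0(rho, Q, W) - rho * R),
                          bounds=(0.0, 1.0), method="bounded")
    return -res.fun

p = 0.1                                     # assumed illustrative crossover probability
W = np.array([[1 - p, p], [p, 1 - p]])
Q = np.array([0.5, 0.5])                    # optimal for the BSC by symmetry
for R in (0.1, 0.3, 0.5):
    print(R, random_coding_exponent(R, Q, W))   # strictly positive for R < C, tends to 0 as R -> C
```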
Random coding ensemble
Generate $M = 2^{nR}$ codewords independently, each with symbols drawn i.i.d. from $Q$. The decoder uses maximum likelihood: given output $y^n$, decode to the codeword $x^n(m)$ maximizing $W^n(y^n \mid x^n(m))$.
Bounding the pairwise error
For message $m$, the error event is $\{\exists\, m' \ne m : W^n(y^n \mid x^n(m')) \ge W^n(y^n \mid x^n(m))\}$. Using the union bound raised to the power $\rho \in [0, 1]$ (Gallager's trick):

$$P_e(m) \le \sum_{y^n} W^n(y^n \mid x^n(m)) \left[ \sum_{m' \ne m} \left( \frac{W^n(y^n \mid x^n(m'))}{W^n(y^n \mid x^n(m))} \right)^{1/(1+\rho)} \right]^{\rho}.$$

The fractional power tightens the union bound, since $\min(1, t) \le t^{\rho}$ for $t \ge 0$ and $\rho \in [0, 1]$.
Averaging over the ensemble
Taking the expectation over the random codebook and using the independence of the codewords:

$$\bar{P}_e \le (M-1)^{\rho} \sum_{y^n} \left[ \sum_{x^n} Q^n(x^n)\, W^n(y^n \mid x^n)^{1/(1+\rho)} \right]^{1+\rho}.$$

For i.i.d. $Q$ and a memoryless channel $W$, this factorizes over coordinates, giving $\bar{P}_e \le 2^{-n[E_0(\rho, Q) - \rho R]}$. Optimizing over $\rho \in [0, 1]$ and $Q$ yields $E_r(R)$.
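Written out, the factorization step uses the product structure $Q^n(x^n) = \prod_i Q(x_i)$ and $W^n(y^n \mid x^n) = \prod_i W(y_i \mid x_i)$, together with the base-2 definition of $E_0$ above:

$$
\sum_{y^n} \left[ \sum_{x^n} Q^n(x^n)\, W^n(y^n \mid x^n)^{\frac{1}{1+\rho}} \right]^{1+\rho}
= \prod_{i=1}^{n} \sum_{y} \left[ \sum_{x} Q(x)\, W(y \mid x)^{\frac{1}{1+\rho}} \right]^{1+\rho}
= 2^{-n E_0(\rho, Q)},
$$

and since $(M-1)^{\rho} \le 2^{n\rho R}$, the ensemble-average bound $\bar{P}_e \le 2^{-n[E_0(\rho, Q) - \rho R]}$ follows.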
Existence of a good code
Since the average error probability over the ensemble is at most $2^{-n[E_0(\rho, Q) - \rho R]}$, there must exist at least one code in the ensemble achieving this bound. This is the probabilistic method: we do not construct the code explicitly. (A standard refinement, discarding the worse half of the codewords, converts this average-error guarantee into the maximum-error guarantee of the theorem at negligible rate cost.)
Definition: Critical Rate
Critical Rate
The critical rate $R_{\mathrm{crit}}$ is the rate at which the optimal $\rho$ in the random coding exponent transitions from $\rho = 1$ (for $R \le R_{\mathrm{crit}}$) to $\rho < 1$ (for $R > R_{\mathrm{crit}}$). Formally:

$$R_{\mathrm{crit}} = \left.\frac{\partial E_0(\rho, Q^*)}{\partial \rho}\right|_{\rho = 1},$$

where $Q^*$ is the optimal input distribution. For $R_{\mathrm{crit}} \le R \le C$, the random coding exponent is tight (equals the sphere-packing exponent). For $R < R_{\mathrm{crit}}$, it is not: the expurgated exponent is tighter.
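A minimal sketch of the critical rate computation, assuming a fixed input distribution and approximating the derivative of $E_0$ at $\rho = 1$ by a central finite difference; the crossover probability is again an illustrative value:

```python
import numpy as np

def gallager_E0(rho, Q, W):
    inner = Q @ W ** (1.0 / (1.0 + rho))
    return -np.log2(np.sum(inner ** (1.0 + rho)))

def critical_rate(Q, W, eps=1e-6):
    """R_crit = dE_0(rho, Q)/drho at rho = 1, via a central finite difference (bits/use)."""
    return (gallager_E0(1.0 + eps, Q, W) - gallager_E0(1.0 - eps, Q, W)) / (2.0 * eps)

p = 0.1                                     # assumed illustrative crossover probability
W = np.array([[1 - p, p], [p, 1 - p]])
Q = np.array([0.5, 0.5])
# For the BSC this should match the closed form 1 - H_b(sqrt(p) / (sqrt(p) + sqrt(1 - p))).
print(critical_rate(Q, W))
```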
Theorem: Sphere-Packing Exponent (Converse)
For any sequence of codes with $2^{nR}$ codewords and maximum probability of error $P_e^{(n)}$:

$$P_e^{(n)} \ge 2^{-n[E_{sp}(R) + o(1)]},$$

where the sphere-packing exponent is

$$E_{sp}(R) = \max_{Q} \min_{V :\, I(Q, V) \le R} D(V \,\|\, W \mid Q).$$

Here $D(V \,\|\, W \mid Q) = \sum_x Q(x) \sum_y V(y \mid x) \log_2 \frac{V(y \mid x)}{W(y \mid x)}$ is the conditional KL divergence.
The sphere-packing bound says: no code can achieve an error exponent better than $E_{sp}(R)$. The name comes from the geometric intuition that each codeword needs a "decoding sphere" around it in output space, and these spheres must be packed without overlap. The minimum KL divergence captures the cost of confusing two codewords: the "worst-case" channel perturbation that limits reliability. For $R_{\mathrm{crit}} \le R \le C$, we have $E_r(R) = E_{sp}(R)$, so the random coding exponent is exact in this regime.
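For the BSC specifically, the sphere-packing exponent reduces to a one-dimensional problem: with uniform inputs the minimizing auxiliary channel can be taken to be another BSC, so $E_{sp}(R) = D_b(\delta_R \,\|\, p)$ where $\delta_R$ solves $1 - H_b(\delta_R) = R$. The sketch below assumes this reduction and an illustrative $p = 0.1$:

```python
import numpy as np
from scipy.optimize import brentq

def H_b(x):
    """Binary entropy in bits."""
    return -x * np.log2(x) - (1 - x) * np.log2(1 - x)

def D_b(a, b):
    """Binary KL divergence D(a || b) in bits."""
    return a * np.log2(a / b) + (1 - a) * np.log2((1 - a) / (1 - b))

def sphere_packing_bsc(R, p):
    # delta_R in (0, 1/2]: the auxiliary BSC(delta_R) whose mutual information equals R
    delta_R = brentq(lambda d: 1.0 - H_b(d) - R, 1e-12, 0.5)
    return D_b(delta_R, p)

p = 0.1                                     # assumed illustrative crossover probability
for R in (0.1, 0.3, 0.5):
    print(R, sphere_packing_bsc(R, p))      # decreases toward 0 as R approaches capacity
```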
High-level idea
The proof considers the joint type of the transmitted codeword and the received output. For any code, the decoder must resolve between codewords whose outputs have similar types. The probability of confusion is bounded below by the probability that the channel output has a joint type consistent with a different codeword, which is governed by the conditional KL divergence between the "confusing" auxiliary channel $V$ and the true channel $W$.
Optimization structure
The inner minimization finds the "worst" auxiliary channel $V$: one that is both plausible (has small KL divergence $D(V \,\|\, W \mid Q)$ from the true channel) and confusing (its mutual information $I(Q, V) \le R$ is too small to support the code rate). The outer maximization over $Q$ chooses the input distribution that makes confusion hardest. This is a saddle-point problem.
Theorem: Expurgated Exponent
For rates below the critical rate ($R < R_{\mathrm{crit}}$), the random coding exponent can be improved by expurgation: removing the worst codewords from the random codebook. The expurgated exponent is:

$$E_{ex}(R) = \max_{Q} \sup_{\rho \ge 1} \left[ E_x(\rho, Q) - \rho R \right],$$

where

$$E_x(\rho, Q) = -\rho \log_2 \sum_{x} \sum_{x'} Q(x)\, Q(x') \left[ \sum_{y} \sqrt{W(y \mid x)\, W(y \mid x')} \right]^{1/\rho}.$$
For $R \le R_{\mathrm{crit}}$: $E_{ex}(R) \ge E_r(R)$, since $E_x(1, Q) = E_0(1, Q)$ and the supremum includes $\rho = 1$.
The random coding ensemble occasionally produces "bad" codewords: pairs that are too close together and cause most of the errors. By expurgating (removing) the worst half of the codewords, we lose only one bit of rate but potentially gain a much larger error exponent. At low rates, where the codebook is small relative to the output space, this cleaning step is highly effective.
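A minimal sketch of evaluating $E_{ex}(R)$ numerically, assuming a fixed input distribution, base-2 units, and a finite cap on the $\rho$ search; the BSC parameters are illustrative:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def E_x(rho, Q, W):
    """Gallager's expurgated function E_x(rho, Q) in bits."""
    bhatt = np.sqrt(W) @ np.sqrt(W).T       # bhatt[x, x'] = sum_y sqrt(W(y|x) W(y|x'))
    return -rho * np.log2(np.sum(np.outer(Q, Q) * bhatt ** (1.0 / rho)))

def expurgated_exponent(R, Q, W, rho_max=50.0):
    """E_ex(R) = sup over rho >= 1 of [E_x(rho, Q) - rho * R], search capped at rho_max."""
    res = minimize_scalar(lambda rho: -(E_x(rho, Q, W) - rho * R),
                          bounds=(1.0, rho_max), method="bounded")
    return -res.fun

p = 0.1                                     # assumed illustrative crossover probability
W = np.array([[1 - p, p], [p, 1 - p]])
Q = np.array([0.5, 0.5])
for R in (0.02, 0.05, 0.10):                # low rates, where expurgation improves on E_r
    print(R, expurgated_exponent(R, Q, W))
```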
Pairwise error analysis
Instead of bounding the union of pairwise errors, we analyze the average pairwise error probability over the random ensemble. The key quantity is the Bhattacharyya coefficient

$$\sum_{y^n} \sqrt{W^n(y^n \mid x^n)\, W^n(y^n \mid \tilde{x}^n)},$$

which measures how "confusable" two codewords $x^n$ and $\tilde{x}^n$ are.
Expurgation argument
By Markov's inequality, at most half the codewords can have a total pairwise error exceeding twice the ensemble average. Removing them leaves a codebook of size $M/2$ in which every remaining codeword's pairwise error sum is below twice the average. The loss of one bit in rate is negligible as $n \to \infty$, but the improvement in exponent can be substantial.
Channel Coding Error Exponents
Compare the random coding exponent $E_r(R)$, sphere-packing exponent $E_{sp}(R)$, and expurgated exponent $E_{ex}(R)$ for the BSC. Observe the three regimes: below the critical rate (where expurgation helps), between the critical rate and capacity (where $E_r(R) = E_{sp}(R)$), and above capacity (where the exponents are zero).
Example: Error Exponents for the BSC
For a BSC with crossover probability $p$, compute the channel capacity, the critical rate, and the random coding exponent at rate $R = 0.3$ bits/use.
Channel capacity
$C = 1 - H_b(p)$ bits/use, where $H_b(\cdot)$ is the binary entropy function.
Critical rate
For the BSC with uniform input ($Q = (1/2, 1/2)$, which is optimal by symmetry):

$$E_0(\rho) = \rho - (1+\rho)\log_2\!\left[ p^{1/(1+\rho)} + (1-p)^{1/(1+\rho)} \right].$$

The critical rate is

$$R_{\mathrm{crit}} = \left.\frac{\partial E_0(\rho)}{\partial \rho}\right|_{\rho=1} = 1 - H_b\!\left(\frac{\sqrt{p}}{\sqrt{p} + \sqrt{1-p}}\right) \text{ bits/use},$$

which can be evaluated numerically for the given $p$ (see the sketch after this example).
Random coding exponent at $R = 0.3$
Since $R_{\mathrm{crit}} < 0.3 < C$ (which holds for moderate crossover probabilities), we are in the regime where $E_r(R) = E_{sp}(R)$, so the random coding exponent is tight. Numerically maximizing $E_0(\rho) - 0.3\rho$ over $\rho \in [0, 1]$ gives $E_r(0.3)$ in bits/use.
This means $P_e \approx 2^{-n E_r(0.3)}$. To achieve a target error probability $P_e^{\star}$, we need roughly $n \approx \log_2(1/P_e^{\star}) / E_r(0.3)$ channel uses.
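Putting the example together numerically, assuming an illustrative crossover probability $p = 0.1$ and an illustrative target error probability of $10^{-6}$ (neither value is taken from the example above):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def H_b(x):
    return -x * np.log2(x) - (1 - x) * np.log2(1 - x)

def E0_bsc(rho, p):
    # E_0(rho) = rho - (1 + rho) * log2( p^{1/(1+rho)} + (1-p)^{1/(1+rho)} ), uniform input
    s = 1.0 / (1.0 + rho)
    return rho - (1.0 + rho) * np.log2(p ** s + (1.0 - p) ** s)

p, R, Pe_target = 0.1, 0.3, 1e-6            # p and Pe_target are assumed illustrative values

C = 1.0 - H_b(p)
R_crit = 1.0 - H_b(np.sqrt(p) / (np.sqrt(p) + np.sqrt(1.0 - p)))
res = minimize_scalar(lambda rho: -(E0_bsc(rho, p) - rho * R),
                      bounds=(0.0, 1.0), method="bounded")
E_r = -res.fun

print(f"C      = {C:.3f} bits/use")
print(f"R_crit = {R_crit:.3f} bits/use")
print(f"E_r    = {E_r:.4f} bits/use at R = {R}")
print(f"n for P_e <= {Pe_target:g}: about {np.log2(1.0 / Pe_target) / E_r:.0f} channel uses")
```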
Comparison of Error Exponents
| Property | Random Coding | Sphere-Packing | Expurgated |
|---|---|---|---|
| Type | Achievability (lower bound on exponent) | Converse (upper bound on exponent) | Achievability (lower bound on exponent) |
| Rate range | $0 < R < C$ | $0 < R < C$ | Most useful for $0 < R < R_{\mathrm{crit}}$ |
| Tight? | Yes for $R_{\mathrm{crit}} \le R \le C$ | Yes (always a valid upper bound on the exponent) | Tighter than $E_r(R)$ for $R < R_{\mathrm{crit}}$ |
| Key parameter | $\rho \in [0, 1]$ | Auxiliary channel $V$ | $\rho \ge 1$ (Bhattacharyya distance) |
| Proof technique | Random codebook + Gallager bound | Combinatorial sphere-packing | Expurgation of random codebook |
Error Exponents vs. Practical Codes
The error exponent story has a subtle twist for code designers. Codes that are "good" in the error exponent sense (random codes) are computationally intractable to decode. Modern capacity-approaching codes (LDPC, polar, turbo) have zero error exponent at any fixed rate below capacity; their error probability decays sub-exponentially in $n$ rather than as $2^{-nE(R)}$. These codes achieve capacity by operating at rates that approach $C$ as $n$ grows, trading error exponent for decoding complexity.
The point is that error exponents answer a different question than code design: they tell us the ultimate reliability at a fixed rate, while practical codes tell us how to achieve a target reliability at rates approaching capacity. Both perspectives are valuable.
Beyond Error Exponents: Finite-Blocklength Bounds
For modern short-packet communications (5G URLLC, IoT), the blocklength is small (50–200 symbols) and the error exponent approximation is too loose. The normal approximation of Polyanskiy, Poor, and Verdú (2010) provides a much tighter characterization:

$$\log_2 M^*(n, \epsilon) \approx nC - \sqrt{nV}\, Q^{-1}(\epsilon),$$

where $V$ is the channel dispersion, $Q^{-1}(\cdot)$ is the inverse Gaussian tail function, and $\epsilon$ is the target error probability. This finite-blocklength framework is covered in Chapter 26 of this book.
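As a rough illustration, assuming the standard BSC dispersion $V = p(1-p)\log_2^2\frac{1-p}{p}$ and ignoring the $O(\log n / n)$ correction term, the normal approximation can be evaluated as follows ($p$ and $\epsilon$ are assumed illustrative values):

```python
import numpy as np
from scipy.stats import norm

def H_b(x):
    return -x * np.log2(x) - (1 - x) * np.log2(1 - x)

def normal_approx_rate(n, eps, p):
    """Approximate maximal rate (bits/use) of a BSC(p) at blocklength n and error probability eps."""
    C = 1.0 - H_b(p)
    V = p * (1.0 - p) * np.log2((1.0 - p) / p) ** 2    # BSC dispersion (bits^2 per use)
    return C - np.sqrt(V / n) * norm.isf(eps)           # norm.isf(eps) = Q^{-1}(eps)

p, eps = 0.1, 1e-3                                      # assumed illustrative values
for n in (50, 100, 200, 1000):
    print(n, f"{normal_approx_rate(n, eps, p):.3f} bits/use")
```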
Historical Note: Robert Gallager and the Art of Bounding
1960s: The random coding error exponent was derived by Robert Gallager in his 1965 paper and refined in his 1968 textbook Information Theory and Reliable Communication. Gallager's key innovation was the "$\rho$-trick": raising the union bound to a fractional power to tighten it. This seemingly simple idea yielded the tightest known achievability bounds for decades. The sphere-packing converse dates to Shannon, Gallager, and Berlekamp (1967), while the expurgated exponent combines ideas from Gallager with earlier work by Elias. Gallager had also invented LDPC codes in 1962, codes that were forgotten for 30 years and are now the standard in 5G NR. The irony is that LDPC codes achieve capacity through a fundamentally different mechanism (iterative decoding at rates approaching capacity) than the random coding framework that Gallager used to analyze them.
BICM Error Exponents
Caire, Taricco, and Biglieri extended the error exponent framework to bit-interleaved coded modulation (BICM), where the interleaver between encoder and modulator creates a mismatched decoding setting. The BICM mutual information is lower than the true channel mutual information (the price of interleaving), but the resulting error exponent analysis reveals that BICM provides excellent performance over fading channels: the diversity advantage compensates for the rate loss. This analysis uses the same Gallager-style bounding techniques developed in this chapter, applied to the effective "BICM channel." See Chapter 10.3 of this book for the full treatment.
Quick Check
For a DMC with capacity $C$, what is the random coding error exponent $E_r(R)$ at rate $R = C$?
$E_r(C) = 0$.
At the Shannon limit $R = C$, the exponent is zero. The error probability still vanishes (Shannon's theorem), but only sub-exponentially. Any rate $R < C$ gives a strictly positive exponent.
Why This Matters: Error Exponents over Fading Channels
Over fading channels, the error exponent depends on the fading statistics and CSI availability. With perfect CSIR, the outage exponent governs reliability in the slow-fading regime, while the Gallager exponent generalizes to ergodic fading. The diversity order of a code (the slope of the error probability vs. SNR curve on a log-log scale) is intimately related to the error exponent. See the telecom book, Ch. 14, for fading channel capacity and the MIMO book, Ch. 3, for the MIMO diversity-multiplexing tradeoff.
Channel Coding Error Exponents: Three Regimes
Key Takeaway
Channel coding error exponents quantify how fast the error probability vanishes with blocklength. Three exponents partition the rate axis: the expurgated exponent $E_{ex}(R)$ for low rates, the random coding exponent $E_r(R)$ for moderate rates (where it equals the sphere-packing converse $E_{sp}(R)$), and zero exponent at capacity. The critical rate $R_{\mathrm{crit}}$ marks the transition. These results, while fundamental, describe the performance of unstructured (random) codes; practical structured codes trade error exponent for decoding complexity.