Error Exponents for Channel Coding
How Fast Does Error Probability Vanish?
Shannon's channel coding theorem tells us that reliable communication is possible at any rate $R < C$. But a system designer needs more: how many channel uses do I need to achieve a target error probability? The answer depends on the error exponent $E(R)$: the exponential rate at which the error probability $P_e$ decays with blocklength $n$. There are three main exponents, each tight in a different rate regime: the random coding exponent $E_r(R)$, the sphere-packing exponent $E_{sp}(R)$ (a converse), and the expurgated exponent $E_{ex}(R)$ (which improves on $E_r(R)$ at low rates).
Definition: Gallager's Function
Gallager's Function
For a DMC with transition probabilities $W(y \mid x)$ and input distribution $Q(x)$, Gallager's function is defined for $\rho \in [0, 1]$ as:

$$E_0(\rho, Q) = -\log_2 \sum_{y} \left[ \sum_{x} Q(x)\, W(y \mid x)^{1/(1+\rho)} \right]^{1+\rho}.$$

This function interpolates between $E_0(0, Q) = 0$ and $E_0(1, Q)$, and relates to the mutual information via $\left.\frac{\partial E_0(\rho, Q)}{\partial \rho}\right|_{\rho=0} = I(X;Y)$ under the joint distribution $Q(x)\,W(y \mid x)$.
Gallager's function is the workhorse of error exponent analysis. Its properties (concavity in $\rho$, the value $E_0(0, Q) = 0$, and the convexity/concavity relationships that follow) drive the entire theory.
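As a minimal sketch, assuming the DMC is given as a row-stochastic transition matrix `W[x, y]` and base-2 logarithms (so everything is in bits), Gallager's function can be evaluated numerically as follows; the BSC with $p = 0.1$ at the end is an illustrative choice, not a value taken from the text:

```python
import numpy as np

def gallager_E0(rho: float, Q: np.ndarray, W: np.ndarray) -> float:
    """E_0(rho, Q) = -log2 sum_y [ sum_x Q(x) W(y|x)^{1/(1+rho)} ]^{1+rho}, in bits."""
    inner = Q @ W ** (1.0 / (1.0 + rho))   # inner[y] = sum_x Q(x) W(y|x)^{1/(1+rho)}
    return float(-np.log2(np.sum(inner ** (1.0 + rho))))

# Illustrative BSC with assumed crossover probability p and the uniform input distribution.
p = 0.1
W_bsc = np.array([[1 - p, p], [p, 1 - p]])   # rows: inputs x, columns: outputs y
Q_unif = np.array([0.5, 0.5])

print(gallager_E0(0.0, Q_unif, W_bsc))   # E_0(0, Q) = 0
print(gallager_E0(1.0, Q_unif, W_bsc))   # E_0(1, Q), the value that matters below the critical rate
```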
Theorem: Random Coding Error Exponent
For a DMC and code rate $R < C$, the random coding error exponent is

$$E_r(R) = \max_{Q} \max_{0 \le \rho \le 1} \left[ E_0(\rho, Q) - \rho R \right].$$

There exists a sequence of codes with $M = \lceil 2^{nR} \rceil$ codewords and maximum probability of error satisfying $P_e^{(n)} \le 2^{-n E_r(R)}$ for all $n$ sufficiently large.
The random coding exponent is obtained by analyzing the average error probability over a random codebook ensemble. The parameter $\rho$ optimizes the tradeoff between the exponential number of competing codewords (the $\rho R$ term, from $M \approx 2^{nR}$) and the probability that any one of them causes confusion (the $E_0(\rho, Q)$ term). The point is that random codes are exponentially good: not just asymptotically reliable, but exponentially reliable.
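A minimal numerical sketch of $E_r(R)$, assuming a fixed input distribution $Q$ (skipping the outer maximization over $Q$, which is unnecessary for symmetric channels) and base-2 units; the BSC crossover probability is an assumed illustrative value:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def gallager_E0(rho, Q, W):
    inner = Q @ W ** (1.0 / (1.0 + rho))
    return -np.log2(np.sum(inner ** (1.0 + rho)))

def random_coding_exponent(R, Q, W):
    """E_r(R) = max over 0 <= rho <= 1 of [E_0(rho, Q) - rho * R], for a fixed Q (bits/use)."""
    res = minimize_scalar(lambda rho: -(gallager_E0(rho, Q, W) - rho * R),
                          bounds=(0.0, 1.0), method="bounded")
    return -res.fun

p = 0.1                                     # assumed illustrative crossover probability
W = np.array([[1 - p, p], [p, 1 - p]])
Q = np.array([0.5, 0.5])                    # optimal for the BSC by symmetry
for R in (0.1, 0.3, 0.5):
    print(R, random_coding_exponent(R, Q, W))   # strictly positive for R < C, tends to 0 as R -> C
```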
Random coding ensemble
Generate $M = 2^{nR}$ codewords independently, each with symbols drawn i.i.d. from $Q$. The decoder uses maximum likelihood: given output $y^n$, decode to the codeword $x^n(m)$ maximizing $W^n(y^n \mid x^n(m))$.
Bounding the pairwise error
For message $m$, the error event is $\{\exists\, m' \ne m : W^n(y^n \mid x^n(m')) \ge W^n(y^n \mid x^n(m))\}$. Using the union bound raised to the power $\rho \in [0, 1]$ (Gallager's trick):

$$P_e(m) \le \sum_{y^n} W^n(y^n \mid x^n(m)) \left[ \sum_{m' \ne m} \left( \frac{W^n(y^n \mid x^n(m'))}{W^n(y^n \mid x^n(m))} \right)^{1/(1+\rho)} \right]^{\rho}.$$

The fractional power tightens the union bound, since $\min(1, t) \le t^{\rho}$ for $t \ge 0$ and $\rho \in [0, 1]$.
Averaging over the ensemble
Taking the expectation over the random codebook and using the independence of the codewords:

$$\bar{P}_e \le (M-1)^{\rho} \sum_{y^n} \left[ \sum_{x^n} Q^n(x^n)\, W^n(y^n \mid x^n)^{1/(1+\rho)} \right]^{1+\rho}.$$

For i.i.d. $Q$ and a memoryless channel $W$, this factorizes over coordinates, giving $\bar{P}_e \le 2^{-n[E_0(\rho, Q) - \rho R]}$. Optimizing over $\rho \in [0, 1]$ and $Q$ yields $E_r(R)$.
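Written out, the factorization step uses the product structure $Q^n(x^n) = \prod_i Q(x_i)$ and $W^n(y^n \mid x^n) = \prod_i W(y_i \mid x_i)$, together with the base-2 definition of $E_0$ above:

$$
\sum_{y^n} \left[ \sum_{x^n} Q^n(x^n)\, W^n(y^n \mid x^n)^{\frac{1}{1+\rho}} \right]^{1+\rho}
= \prod_{i=1}^{n} \sum_{y} \left[ \sum_{x} Q(x)\, W(y \mid x)^{\frac{1}{1+\rho}} \right]^{1+\rho}
= 2^{-n E_0(\rho, Q)},
$$

and since $(M-1)^{\rho} \le 2^{n\rho R}$, the ensemble-average bound $\bar{P}_e \le 2^{-n[E_0(\rho, Q) - \rho R]}$ follows.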
Existence of a good code
Since the average error probability over the ensemble is at most $2^{-n[E_0(\rho, Q) - \rho R]}$, there must exist at least one code in the ensemble achieving this bound. This is the probabilistic method: we do not construct the code explicitly. (A standard refinement, discarding the worse half of the codewords, converts this average-error guarantee into the maximum-error guarantee of the theorem at negligible rate cost.)
Definition: Critical Rate
Critical Rate
The critical rate $R_{\mathrm{crit}}$ is the rate at which the optimal $\rho$ in the random coding exponent transitions from $\rho = 1$ (for $R \le R_{\mathrm{crit}}$) to $\rho < 1$ (for $R > R_{\mathrm{crit}}$). Formally:

$$R_{\mathrm{crit}} = \left.\frac{\partial E_0(\rho, Q^*)}{\partial \rho}\right|_{\rho = 1},$$

where $Q^*$ is the optimal input distribution. For $R_{\mathrm{crit}} \le R \le C$, the random coding exponent is tight (equals the sphere-packing exponent). For $R < R_{\mathrm{crit}}$, it is not: the expurgated exponent is tighter.
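A minimal sketch of the critical rate computation, assuming a fixed input distribution and approximating the derivative of $E_0$ at $\rho = 1$ by a central finite difference; the crossover probability is again an illustrative value:

```python
import numpy as np

def gallager_E0(rho, Q, W):
    inner = Q @ W ** (1.0 / (1.0 + rho))
    return -np.log2(np.sum(inner ** (1.0 + rho)))

def critical_rate(Q, W, eps=1e-6):
    """R_crit = dE_0(rho, Q)/drho at rho = 1, via a central finite difference (bits/use)."""
    return (gallager_E0(1.0 + eps, Q, W) - gallager_E0(1.0 - eps, Q, W)) / (2.0 * eps)

p = 0.1                                     # assumed illustrative crossover probability
W = np.array([[1 - p, p], [p, 1 - p]])
Q = np.array([0.5, 0.5])
# For the BSC this should match the closed form 1 - H_b(sqrt(p) / (sqrt(p) + sqrt(1 - p))).
print(critical_rate(Q, W))
```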
Theorem: Sphere-Packing Exponent (Converse)
For any sequence of codes with $2^{nR}$ codewords and maximum probability of error $P_e^{(n)}$:

$$P_e^{(n)} \ge 2^{-n[E_{sp}(R) + o(1)]},$$

where the sphere-packing exponent is

$$E_{sp}(R) = \max_{Q} \min_{V :\, I(Q, V) \le R} D(V \,\|\, W \mid Q).$$

Here $D(V \,\|\, W \mid Q) = \sum_x Q(x) \sum_y V(y \mid x) \log_2 \frac{V(y \mid x)}{W(y \mid x)}$ is the conditional KL divergence.
The sphere-packing bound says: no code can achieve an error exponent better than $E_{sp}(R)$. The name comes from the geometric intuition that each codeword needs a "decoding sphere" around it in output space, and these spheres must be packed without overlap. The minimum KL divergence captures the cost of confusing two codewords: the "worst-case" channel perturbation that limits reliability. For $R_{\mathrm{crit}} \le R \le C$, we have $E_r(R) = E_{sp}(R)$, so the random coding exponent is exact in this regime.
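For the BSC specifically, the sphere-packing exponent reduces to a one-dimensional problem: with uniform inputs the minimizing auxiliary channel can be taken to be another BSC, so $E_{sp}(R) = D_b(\delta_R \,\|\, p)$ where $\delta_R$ solves $1 - H_b(\delta_R) = R$. The sketch below assumes this reduction and an illustrative $p = 0.1$:

```python
import numpy as np
from scipy.optimize import brentq

def H_b(x):
    """Binary entropy in bits."""
    return -x * np.log2(x) - (1 - x) * np.log2(1 - x)

def D_b(a, b):
    """Binary KL divergence D(a || b) in bits."""
    return a * np.log2(a / b) + (1 - a) * np.log2((1 - a) / (1 - b))

def sphere_packing_bsc(R, p):
    # delta_R in (0, 1/2]: the auxiliary BSC(delta_R) whose mutual information equals R
    delta_R = brentq(lambda d: 1.0 - H_b(d) - R, 1e-12, 0.5)
    return D_b(delta_R, p)

p = 0.1                                     # assumed illustrative crossover probability
for R in (0.1, 0.3, 0.5):
    print(R, sphere_packing_bsc(R, p))      # decreases toward 0 as R approaches capacity
```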
High-level idea
The proof considers the joint type of the transmitted codeword and the received output. For any code, the decoder must resolve between codewords whose outputs have similar types. The probability of confusion is bounded below by the probability that the channel output has a joint type consistent with a different codeword, which is governed by the conditional KL divergence between the "confusing" auxiliary channel $V$ and the true channel $W$.
Optimization structure
The inner minimization finds the "worst" auxiliary channel $V$: one that is both plausible (has small KL divergence $D(V \,\|\, W \mid Q)$ from the true channel) and confusing (its mutual information $I(Q, V) \le R$ is too small to support the code rate). The outer maximization over $Q$ chooses the input distribution that makes confusion hardest. This is a saddle-point problem.
Theorem: Expurgated Exponent
For rates below the critical rate ($R < R_{\mathrm{crit}}$), the random coding exponent can be improved by expurgation: removing the worst codewords from the random codebook. The expurgated exponent is:

$$E_{ex}(R) = \max_{Q} \sup_{\rho \ge 1} \left[ E_x(\rho, Q) - \rho R \right],$$

where

$$E_x(\rho, Q) = -\rho \log_2 \sum_{x} \sum_{x'} Q(x)\, Q(x') \left[ \sum_{y} \sqrt{W(y \mid x)\, W(y \mid x')} \right]^{1/\rho}.$$
For $R \le R_{\mathrm{crit}}$: $E_{ex}(R) \ge E_r(R)$, since $E_x(1, Q) = E_0(1, Q)$ and the supremum includes $\rho = 1$.
The random coding ensemble occasionally produces "bad" codewords: pairs that are too close together and cause most of the errors. By expurgating (removing) the worst half of the codewords, we lose only one bit of rate but potentially gain a much larger error exponent. At low rates, where the codebook is small relative to the output space, this cleaning step is highly effective.
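A minimal sketch of evaluating $E_{ex}(R)$ numerically, assuming a fixed input distribution, base-2 units, and a finite cap on the $\rho$ search; the BSC parameters are illustrative:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def E_x(rho, Q, W):
    """Gallager's expurgated function E_x(rho, Q) in bits."""
    bhatt = np.sqrt(W) @ np.sqrt(W).T       # bhatt[x, x'] = sum_y sqrt(W(y|x) W(y|x'))
    return -rho * np.log2(np.sum(np.outer(Q, Q) * bhatt ** (1.0 / rho)))

def expurgated_exponent(R, Q, W, rho_max=50.0):
    """E_ex(R) = sup over rho >= 1 of [E_x(rho, Q) - rho * R], search capped at rho_max."""
    res = minimize_scalar(lambda rho: -(E_x(rho, Q, W) - rho * R),
                          bounds=(1.0, rho_max), method="bounded")
    return -res.fun

p = 0.1                                     # assumed illustrative crossover probability
W = np.array([[1 - p, p], [p, 1 - p]])
Q = np.array([0.5, 0.5])
for R in (0.02, 0.05, 0.10):                # low rates, where expurgation improves on E_r
    print(R, expurgated_exponent(R, Q, W))
```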
Pairwise error analysis
Instead of bounding the union of pairwise errors, we analyze the average pairwise error probability over the random ensemble. The key quantity is the Bhattacharyya coefficient

$$\sum_{y^n} \sqrt{W^n(y^n \mid x^n)\, W^n(y^n \mid \tilde{x}^n)},$$

which measures how "confusable" two codewords $x^n$ and $\tilde{x}^n$ are.
Expurgation argument
By Markov's inequality, at most half the codewords can have a total pairwise error exceeding twice the ensemble average. Removing them leaves a codebook of size $M/2$ in which every remaining codeword's pairwise error sum is below twice the average. The loss of one bit in rate is negligible as $n \to \infty$, but the improvement in exponent can be substantial.
Channel Coding Error Exponents
Compare the random coding exponent $E_r(R)$, sphere-packing exponent $E_{sp}(R)$, and expurgated exponent $E_{ex}(R)$ for the BSC. Observe the three regimes: below the critical rate (where expurgation helps), between the critical rate and capacity (where $E_r(R) = E_{sp}(R)$), and above capacity (where the exponents are zero).
Example: Error Exponents for the BSC
For a BSC with crossover probability $p$, compute the channel capacity, the critical rate, and the random coding exponent at rate $R = 0.3$ bits/use.
Channel capacity
$C = 1 - H_b(p)$ bits/use, where $H_b(\cdot)$ is the binary entropy function.
Critical rate
For the BSC with uniform input ($Q = (1/2, 1/2)$, which is optimal by symmetry):

$$E_0(\rho) = \rho - (1+\rho)\log_2\!\left[ p^{1/(1+\rho)} + (1-p)^{1/(1+\rho)} \right].$$

The critical rate is

$$R_{\mathrm{crit}} = \left.\frac{\partial E_0(\rho)}{\partial \rho}\right|_{\rho=1} = 1 - H_b\!\left(\frac{\sqrt{p}}{\sqrt{p} + \sqrt{1-p}}\right) \text{ bits/use},$$

which can be evaluated numerically for the given $p$ (see the sketch after this example).
Random coding exponent at $R = 0.3$
Since $R_{\mathrm{crit}} < 0.3 < C$ (which holds for moderate crossover probabilities), we are in the regime where $E_r(R) = E_{sp}(R)$, so the random coding exponent is tight. Numerically maximizing $E_0(\rho) - 0.3\rho$ over $\rho \in [0, 1]$ gives $E_r(0.3)$ in bits/use.
This means $P_e \approx 2^{-n E_r(0.3)}$. To achieve a target error probability $P_e^{\star}$, we need roughly $n \approx \log_2(1/P_e^{\star}) / E_r(0.3)$ channel uses.
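Putting the example together numerically, assuming an illustrative crossover probability $p = 0.1$ and an illustrative target error probability of $10^{-6}$ (neither value is taken from the example above):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def H_b(x):
    return -x * np.log2(x) - (1 - x) * np.log2(1 - x)

def E0_bsc(rho, p):
    # E_0(rho) = rho - (1 + rho) * log2( p^{1/(1+rho)} + (1-p)^{1/(1+rho)} ), uniform input
    s = 1.0 / (1.0 + rho)
    return rho - (1.0 + rho) * np.log2(p ** s + (1.0 - p) ** s)

p, R, Pe_target = 0.1, 0.3, 1e-6            # p and Pe_target are assumed illustrative values

C = 1.0 - H_b(p)
R_crit = 1.0 - H_b(np.sqrt(p) / (np.sqrt(p) + np.sqrt(1.0 - p)))
res = minimize_scalar(lambda rho: -(E0_bsc(rho, p) - rho * R),
                      bounds=(0.0, 1.0), method="bounded")
E_r = -res.fun

print(f"C      = {C:.3f} bits/use")
print(f"R_crit = {R_crit:.3f} bits/use")
print(f"E_r    = {E_r:.4f} bits/use at R = {R}")
print(f"n for P_e <= {Pe_target:g}: about {np.log2(1.0 / Pe_target) / E_r:.0f} channel uses")
```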
Comparison of Error Exponents
| Property | Random Coding | Sphere-Packing | Expurgated |
|---|---|---|---|
| Type | Achievability (lower bound on exponent) | Converse (upper bound on exponent) | Achievability (lower bound on exponent) |
| Rate range | $0 < R < C$ | $0 < R < C$ | Most useful for $0 < R < R_{\mathrm{crit}}$ |
| Tight? | Yes for $R_{\mathrm{crit}} \le R \le C$ | Yes (always a valid upper bound on the exponent) | Tighter than $E_r(R)$ for $R < R_{\mathrm{crit}}$ |
| Key parameter | $\rho \in [0, 1]$ | Auxiliary channel $V$ | $\rho \ge 1$ (Bhattacharyya distance) |
| Proof technique | Random codebook + Gallager bound | Combinatorial sphere-packing | Expurgation of random codebook |
Error Exponents vs. Practical Codes
The error exponent story has a subtle twist for code designers. Codes that are "good" in the error exponent sense (random codes) are computationally intractable to decode. Modern capacity-approaching codes (LDPC, polar, turbo) have zero error exponent at any fixed rate below capacity; their error probability decays sub-exponentially in $n$ rather than as $2^{-nE(R)}$. These codes achieve capacity by operating at rates that approach $C$ as $n$ grows, trading error exponent for decoding complexity.
The point is that error exponents answer a different question than code design: they tell us the ultimate reliability at a fixed rate, while practical codes tell us how to achieve a target reliability at rates approaching capacity. Both perspectives are valuable.
Beyond Error Exponents: Finite-Blocklength Bounds
For modern short-packet communications (5G URLLC, IoT), the blocklength is small (50–200 symbols) and the error exponent approximation is too loose. The normal approximation of Polyanskiy, Poor, and Verdú (2010) provides a much tighter characterization:

$$\log_2 M^*(n, \epsilon) \approx nC - \sqrt{nV}\, Q^{-1}(\epsilon),$$

where $V$ is the channel dispersion, $Q^{-1}(\cdot)$ is the inverse Gaussian tail function, and $\epsilon$ is the target error probability. This finite-blocklength framework is covered in Chapter 26 of this book.
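As a rough illustration, assuming the standard BSC dispersion $V = p(1-p)\log_2^2\frac{1-p}{p}$ and ignoring the $O(\log n / n)$ correction term, the normal approximation can be evaluated as follows ($p$ and $\epsilon$ are assumed illustrative values):

```python
import numpy as np
from scipy.stats import norm

def H_b(x):
    return -x * np.log2(x) - (1 - x) * np.log2(1 - x)

def normal_approx_rate(n, eps, p):
    """Approximate maximal rate (bits/use) of a BSC(p) at blocklength n and error probability eps."""
    C = 1.0 - H_b(p)
    V = p * (1.0 - p) * np.log2((1.0 - p) / p) ** 2    # BSC dispersion (bits^2 per use)
    return C - np.sqrt(V / n) * norm.isf(eps)           # norm.isf(eps) = Q^{-1}(eps)

p, eps = 0.1, 1e-3                                      # assumed illustrative values
for n in (50, 100, 200, 1000):
    print(n, f"{normal_approx_rate(n, eps, p):.3f} bits/use")
```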
Historical Note: Robert Gallager and the Art of Bounding
1960s: The random coding error exponent was derived by Robert Gallager in his 1965 paper and refined in his 1968 textbook Information Theory and Reliable Communication. Gallager's key innovation was the "$\rho$-trick": raising the union bound to a fractional power to tighten it. This seemingly simple idea yielded the tightest known achievability bounds for decades. The sphere-packing converse dates to Shannon, Gallager, and Berlekamp (1967), while the expurgated exponent combines ideas from Gallager with earlier work by Elias. Gallager had also invented LDPC codes in 1962, codes that were forgotten for 30 years and are now the standard in 5G NR. The irony is that LDPC codes achieve capacity through a fundamentally different mechanism (iterative decoding at rates approaching capacity) than the random coding framework that Gallager used to analyze them.
BICM Error Exponents
Caire, Taricco, and Biglieri extended the error exponent framework to bit-interleaved coded modulation (BICM), where the interleaver between encoder and modulator creates a mismatched decoding setting. The BICM mutual information is lower than the true channel mutual information (the price of interleaving), but the resulting error exponent analysis reveals that BICM provides excellent performance over fading channels: the diversity advantage compensates for the rate loss. This analysis uses the same Gallager-style bounding techniques developed in this chapter, applied to the effective "BICM channel." See Chapter 10.3 of this book for the full treatment.
Quick Check
For a DMC with capacity $C$, what is the random coding error exponent $E_r(R)$ at rate $R = C$?
$E_r(C) = 0$.
At the Shannon limit $R = C$, the exponent is zero. The error probability still vanishes (Shannon's theorem), but only sub-exponentially. Any rate $R < C$ gives a strictly positive exponent.
Why This Matters: Error Exponents over Fading Channels
Over fading channels, the error exponent depends on the fading statistics and CSI availability. With perfect CSIR, the outage exponent governs reliability in the slow-fading regime, while the Gallager exponent generalizes to ergodic fading. The diversity order of a code (the slope of the error probability vs. SNR curve on a log-log scale) is intimately related to the error exponent. See the telecom book, Ch. 14, for fading channel capacity and the MIMO book, Ch. 3, for the MIMO diversity-multiplexing tradeoff.
Channel Coding Error Exponents: Three Regimes
Key Takeaway
Channel coding error exponents quantify how fast the error probability vanishes with blocklength. Three exponents partition the rate axis: the expurgated exponent $E_{ex}(R)$ for low rates, the random coding exponent $E_r(R)$ for moderate rates (where it equals the sphere-packing converse $E_{sp}(R)$), and zero exponent at capacity. The critical rate $R_{\mathrm{crit}}$ marks the transition. These results, while fundamental, describe the performance of unstructured (random) codes; practical structured codes trade error exponent for decoding complexity.