BICM Random-Coding Error Exponents

From Rate to Exponent: How Fast Does the Error Probability Decay?

Sections 1–3 told us which rates are achievable. Section 4 asks how quickly the error probability decays as the blocklength $N$ grows, at each of those rates. The quantitative answer is the random-coding error exponent $E_r(R)$: the ensemble-average error probability of a random code of blocklength $N$ and rate $R$ is bounded by $\bar P_e \le e^{-N E_r(R)}$. The exponent $E_r(R)$ is positive for $R < C$ and decreases to zero at $R = C$ — i.e., the capacity is the rate below which the exponent is strictly positive.

Gallager (1968) wrote the exponent in the canonical form $E_r(R) = \max_{0 \le \rho \le 1} [E_0(\rho, q) - \rho R]$, where $E_0(\rho, q)$ is the Gallager function associated with the channel and input distribution $q$. This is a Lagrangian-dual relation: $E_0(\rho)$ is concave in $\rho$ with $E_0(0) = 0$ and $E_0'(0) = C$, so the maximising $\rho$ depends on $R$: interior for rates between the critical rate and capacity, and at the endpoint $\rho = 1$ below the critical rate.
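As a concrete instance, the Lagrangian maximisation can be carried out numerically. The sketch below (an illustration, not from the text) evaluates $E_r(R)$ for a binary symmetric channel with crossover $\varepsilon$, for which the Gallager function with uniform inputs has the closed form $E_0(\rho) = \rho\ln 2 - (1+\rho)\ln\big((1-\varepsilon)^{1/(1+\rho)} + \varepsilon^{1/(1+\rho)}\big)$ in nats.

```python
import numpy as np

# Illustrative sketch (not from the text): random-coding exponent of a BSC
# with crossover eps and uniform inputs, via E_r(R) = max_rho [E0(rho) - rho*R].
eps = 0.05

def E0(rho, eps=eps):
    # Closed form of Gallager's E0 for the BSC with uniform inputs (nats).
    u = 1.0 / (1.0 + rho)
    return rho * np.log(2) - (1 + rho) * np.log((1 - eps) ** u + eps ** u)

def Er(R, grid=np.linspace(0.0, 1.0, 1001)):
    # Maximise the Lagrangian over rho on a fine grid.
    return max(E0(r) - r * R for r in grid)

# BSC capacity in nats: C = ln 2 - H(eps)
C = np.log(2) + (1 - eps) * np.log(1 - eps) + eps * np.log(eps)

for frac in (0.5, 0.9, 1.0):
    print(f"R = {frac:.1f}*C : Er = {Er(frac * C):.4f} nats")
```

At $R = 0$ the grid maximum sits at $\rho = 1$, recovering $E_r(0) = E_0(1) = R_0$; at $R = C$ the exponent vanishes, as the text states.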

For BICM with the mismatched product metric, the Gallager function specialises cleanly to $E_0^{\mathrm{BICM}}(\rho) = -\log \frac{1}{L}\sum_\ell \int_y \big(\tfrac{1}{2}\sum_b p_{W_\ell}(y\mid b)^{1/(1+\rho)}\big)^{1+\rho}\, dy$ (up to scaling conventions), and the exponent $E_r^{\mathrm{BICM}}(R)$ is obtained by the same Lagrangian. The main qualitative result — due to Guillén-Martínez-Caire 2008 — is that $E_r^{\mathrm{BICM}}(R) \le E_r^{\mathrm{CM}}(R)$ for all $R \le I^{\mathrm{BICM}}$, with equality at $R = 0$ and strict inequality for $R > 0$. The exponent gap is the error-exponent price of the mismatched product metric — a strictly positive version of the capacity gap from §2.

Two practical takeaways. First, the exponent gap is small under Gray labelling on AWGN: the BICM and CM exponents differ by at most a factor of roughly 1.1–1.2 at rates of practical interest. This means that, at a given rate, the blocklength required to hit a given error probability is only slightly larger for BICM than for CM — often a negligible penalty. Second, the exponent gap is large under SP labelling: the SP product metric discards so much joint information that the BICM exponent can be a factor of 2 below the CM exponent, making SP-BICM materially worse than Gray-BICM at practical blocklengths.

This section formalises the Gallager-function specialisation to BICM, proves the exponent-ordering theorem, and sets up the cutoff rate $R_0 = E_0(1)$ — the subject of §5.


Definition: Gallager's $E_0(\rho)$ for BICM (Product Metric)

For the BICM parallel-channel model of Def. "BICM as $L$ Parallel Binary Channels" with the product bit metric, the Gallager function at parameter $\rho \in [0, 1]$ and decoder scaling $s > 0$ is
$$E_0^{\mathrm{BICM}}(\rho, s) \;=\; -\log \frac{1}{L}\sum_{\ell = 0}^{L-1} \int_y \Big( \tfrac{1}{2}\sum_{b \in \{0,1\}} p_{W_\ell}(y\mid b)^{s/(1+\rho)}\Big)^{1+\rho}\, dy.$$
The corresponding BICM random-coding exponent is
$$E_r^{\mathrm{BICM}}(R) \;=\; \max_{0 \le \rho \le 1}\;\max_{s > 0}\; \big[ E_0^{\mathrm{BICM}}(\rho, s) - \rho R \big].$$
By construction $E_0^{\mathrm{BICM}}(0, s) = 0$ and $\partial E_0^{\mathrm{BICM}}/\partial \rho \big|_{\rho = 0} = I^{\mathrm{GMI}}(s)$; so the BICM exponent is strictly positive for any $R < \sup_s I^{\mathrm{GMI}}(s) = R^\star_{\mathrm{BICM}}$ and vanishes at the GMI capacity.

The CM exponent $E_r^{\mathrm{CM}}(R)$ is defined by the standard Gallager formula with the symbol-level channel law $p(y\mid x)$ and uniform input on $\mathcal{X}$:
$$E_0^{\mathrm{CM}}(\rho) \;=\; -\log \int_y \Big( \tfrac{1}{M} \sum_{x \in \mathcal{X}} p(y\mid x)^{1/(1+\rho)}\Big)^{1+\rho}\, dy, \qquad E_r^{\mathrm{CM}}(R) = \max_{0\le\rho\le 1}\big[E_0^{\mathrm{CM}}(\rho)-\rho R\big].$$

Three points:

  1. The BICM Gallager function averages over $\ell$ (the outer $\tfrac{1}{L}\sum_\ell$) because the interleaver uniformly randomises which bit position each coded bit lands in. The average sits outside the integral over $y$ and outside the $(1+\rho)$-power — a consequence of Jensen-type bounds in the derivation.
  2. The CM Gallager function is a single integral over $y$ with the $(1+\rho)$-power of the uniformly-averaged input — the natural Gallager formula for a symmetric $M$-ary-input discrete memoryless channel.
  3. The two functions are related by Jensen's inequality applied to the concave map $t \mapsto t^{1/(1+\rho)}$ inside the bit-channel average — which is exactly what yields the exponent-ordering theorem below.

Theorem: BICM Random-Coding Exponent $\le$ CM Exponent

For any memoryless channel, any constellation $\mathcal{X}$, and any labelling $\mu$,
$$E_r^{\mathrm{BICM}}(R) \;\le\; E_r^{\mathrm{CM}}(R) \qquad \text{for all } R \in [0, I^{\mathrm{GMI}}(s^\star)],$$
with equality at $R = 0$ and strict inequality for $R > 0$ on generic channels (i.e., whenever the labelling induces a chain-rule gap). Equivalently, the product-metric Gallager function is pointwise below the joint-metric one:
$$E_0^{\mathrm{BICM}}(\rho, s^\star) \;\le\; E_0^{\mathrm{CM}}(\rho) \qquad \forall\; \rho \in [0, 1],$$
with equality at $\rho = 0$.

The BICM decoder throws away the joint symbol likelihood, keeping only the product of marginals. The random-coding exponent measures the discrimination power of the decoder against incorrect codewords — how quickly the probability of confusing any two codewords decays with blocklength. A more informative metric discriminates better; the joint likelihood is more informative than the product metric; hence $E_r^{\mathrm{CM}} \ge E_r^{\mathrm{BICM}}$, with the gap proportional to the chain-rule residual of Ch. 5 Thm. [?ch05:thm-cm-bicm-ordering], evaluated at the Gallager tilt $\rho$ rather than at $\rho = 0$.
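The pointwise ordering $E_0^{\mathrm{BICM}}(\rho) \le E_0^{\mathrm{CM}}(\rho)$ can be checked numerically. The sketch below (an illustration with assumed parameters, not from the text) evaluates both Gallager functions at $s = 1$ for Gray-labelled 4-PAM on real AWGN, discretising the output integral on a grid.

```python
import numpy as np

# Numerical check (illustrative, s = 1): E0_BICM(rho) <= E0_CM(rho) for
# Gray-labelled 4-PAM on real AWGN.  Assumed Gray map (b1 b0):
# -3 -> 00, -1 -> 01, +1 -> 11, +3 -> 10.
X = np.array([-3.0, -1.0, 1.0, 3.0]) / np.sqrt(5.0)   # unit-energy 4-PAM
labels = [(0, 0), (0, 1), (1, 1), (1, 0)]             # Gray labelling (assumed)
sigma = 0.5                                            # noise std (assumed SNR)

y = np.linspace(-8, 8, 40001)
dy = y[1] - y[0]
p = np.exp(-(y[None, :] - X[:, None]) ** 2 / (2 * sigma**2)) \
    / np.sqrt(2 * np.pi * sigma**2)                    # p(y|x), shape (4, ngrid)

def E0_cm(rho):
    # Joint-metric Gallager function with uniform input on the 4 symbols.
    inner = (0.25 * np.sum(p ** (1 / (1 + rho)), axis=0)) ** (1 + rho)
    return -np.log(np.sum(inner) * dy)

def E0_bicm(rho):
    # Product-metric Gallager function: average over the L = 2 bit channels.
    total = 0.0
    for ell in range(2):
        acc = np.zeros_like(y)
        for b in (0, 1):
            # Bit channel W_ell(y|b): average of p(y|x) over symbols with bit b.
            idx = [i for i, lab in enumerate(labels) if lab[ell] == b]
            W = 0.5 * p[idx].sum(axis=0)
            acc += 0.5 * W ** (1 / (1 + rho))
        total += 0.5 * np.sum(acc ** (1 + rho)) * dy
    return -np.log(total)

for rho in (0.25, 0.5, 1.0):
    print(f"rho={rho}: E0_CM={E0_cm(rho):.4f}, E0_BICM={E0_bicm(rho):.4f}")
```

Both functions evaluate to zero at $\rho = 0$, and the BICM value stays below the CM value at every $\rho > 0$, matching the theorem.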


Figure: $E_r(R)$ for CM vs BICM. Plot of the random-coding error exponent $E_r(R)$ vs rate $R$ for $M$-QAM at fixed SNR, comparing the CM exponent (joint-metric, matched decoder) with the BICM exponent (product-metric, mismatched decoder under Gray labelling). Both curves are positive for $R$ below capacity and decay to zero at capacity. The BICM curve lies below the CM curve at every $R > 0$, illustrating Thm. "BICM Random-Coding Exponent $\le$ CM Exponent". The gap between the two curves is the exponent-level cost of the mismatched metric; for Gray at practical SNRs it is small (a factor $\lesssim 1.2$), making BICM operationally competitive with CM despite the information-theoretic mismatch. The cutoff rate $R_0 = E_0(1)$ is marked on both curves.

Theorem: Exponent Gap Is Small Under Gray Labelling

For square $M$-QAM with Gray labelling on AWGN at signal-to-noise ratio $\mathrm{SNR}$, the exponent ratio at rate $R$ satisfies
$$\frac{E_r^{\mathrm{BICM}}(R)}{E_r^{\mathrm{CM}}(R)} \;\ge\; 1 - O(1/\mathrm{SNR}),$$
i.e., the exponent gap vanishes at high SNR. At moderate SNR (0–20 dB) the ratio is $\gtrsim 0.85$ for $M = 16$ and $\gtrsim 0.75$ for $M = 64$ across the range $R \in [0, I^{\mathrm{BICM}}]$. Numerically, this translates to a required blocklength ratio $N_{\mathrm{BICM}} / N_{\mathrm{CM}}$ below 1.3 to achieve the same $P_e$ at the same rate — a modest practical cost.

Under Gray the per-position bit channels are nearly symmetric Gaussian at high SNR; the product bit metric then coincides with the symbol log-likelihood up to a constant (the same argument as for $s^\star \to 1$ in §3). The Gallager-function ratio approaches 1, and the exponents coincide. At lower SNR the gap is $O(1/\mathrm{SNR})$ — small but real — and its operational consequence is the $\lesssim 1.3\times$ blocklength penalty.


Example: Cutoff Rate of BICM-QPSK at 3 dB

Compute the cutoff rate $R_0^{\mathrm{BICM}} = E_0^{\mathrm{BICM}}(\rho = 1, s = 1)$ for QPSK with Gray labelling at $\mathrm{SNR} = 3$ dB on AWGN. Compare with the BICM capacity $C_{\mathrm{BICM}}$ at the same SNR.
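One way to carry out this computation (a sketch under the assumption that Gray-QPSK on AWGN decomposes into two identical BPSK bit channels, one per quadrature, each seeing per-bit SNR equal to $E_s/N_0$):

```python
import numpy as np

# Sketch (assumptions as stated in the lead-in): per-bit-channel cutoff rate
# R0 = E0(rho=1, s=1) and BPSK bit-channel capacity at Es/N0 = 3 dB.
# Multiply per-bit-channel values by L = 2 for per-symbol units.
snr = 10 ** (3 / 10)                      # Es/N0 = 3 dB, linear
a, sigma = 1.0, np.sqrt(1.0 / snr)        # BPSK amplitude and noise std

y = np.linspace(-10, 10, 100001)
dy = y[1] - y[0]
g = lambda m: np.exp(-(y - m) ** 2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)
p0, p1 = g(+a), g(-a)                     # bit-channel laws W(y|0), W(y|1)

# Gallager E0(1) = -log ∫ (0.5*(sqrt(p0) + sqrt(p1)))^2 dy   (nats)
E0_1 = -np.log(np.sum((0.5 * (np.sqrt(p0) + np.sqrt(p1))) ** 2) * dy)
R0_bits = E0_1 / np.log(2)

# Cross-check: closed form E0(1) = -log((1 + B)/2), with Bhattacharyya
# parameter B = exp(-a^2 / (2 sigma^2)) for BPSK on AWGN.
B = np.exp(-a**2 / (2 * sigma**2))

# BPSK bit-channel capacity I(B;Y) with uniform bits (bits per channel use)
C_bits = np.sum(p0 * np.log2(2 * p0 / (p0 + p1))) * dy

print(f"R0 ≈ {R0_bits:.3f} bits/bit-channel, C ≈ {C_bits:.3f} bits/bit-channel")
```

The closed form $-\log\big((1+B)/2\big)$ matches the numerical integral, and the cutoff rate lands below the bit-channel capacity, as it must.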


Numerical Exponent Comparison for 16-QAM at 8 dB

| Rate $R$ (bits/symbol) | $E_r^{\mathrm{CM}}(R)$ | $E_r^{\mathrm{BICM}}(R)$ | Ratio |
| --- | --- | --- | --- |
| $0.5 \cdot I^{\mathrm{BICM}}$ | $0.22$ nats | $0.20$ nats | $0.91$ |
| $0.7 \cdot I^{\mathrm{BICM}}$ | $0.10$ nats | $0.088$ nats | $0.88$ |
| $0.9 \cdot I^{\mathrm{BICM}}$ | $0.020$ nats | $0.017$ nats | $0.85$ |
| $R_0^{\mathrm{BICM}}$ ($\rho = 1$) | $0.48$ nats | $0.41$ nats | $0.85$ |
| $R = I^{\mathrm{BICM}}$ | $0.0015$ nats (small $+$) | $0$ (vanishes) | — |

What the Exponent Ratio Means Operationally

A ratio $E_r^{\mathrm{BICM}}/E_r^{\mathrm{CM}} = 0.85$ at a given rate has a direct operational reading: to achieve the same target error probability $P_e$ on both schemes, the BICM blocklength must satisfy $N_{\mathrm{BICM}} = N_{\mathrm{CM}} \cdot (E_r^{\mathrm{CM}}/E_r^{\mathrm{BICM}}) \approx 1.18 \cdot N_{\mathrm{CM}}$. That is, BICM needs 18% more coded bits to reach the same $P_e$ at the same rate — a moderate but non-zero practical cost.
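The conversion from exponent ratio to blocklength ratio follows directly from the bound $\bar P_e \le e^{-N E_r(R)}$: fixing the target $P_e$ gives $N \approx \ln(1/P_e)/E_r$. A minimal sketch with illustrative exponent values (assumed for the example, not taken from the table above):

```python
import math

def required_blocklength(pe_target, exponent_nats):
    # Solve pe_target = exp(-N * Er) for N (random-coding bound, Er in nats).
    return math.log(1.0 / pe_target) / exponent_nats

# Illustrative exponents with ratio 0.85 (assumed values)
Er_cm, Er_bicm = 0.20, 0.17
pe = 1e-5
n_cm = required_blocklength(pe, Er_cm)
n_bicm = required_blocklength(pe, Er_bicm)
print(f"N_CM ≈ {n_cm:.0f}, N_BICM ≈ {n_bicm:.0f}, ratio ≈ {n_bicm / n_cm:.2f}")
```

The target $P_e$ cancels in the ratio: $N_{\mathrm{BICM}}/N_{\mathrm{CM}} = E_r^{\mathrm{CM}}/E_r^{\mathrm{BICM}} = 1/0.85 \approx 1.18$, regardless of the error-probability target.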

For rate-matched LDPC in 5G NR (blocklengths of a few thousand bits), 18% is around 400–600 extra bits — easily absorbed by the standard's rate-matching machinery. This is why the modularity of BICM is worth the exponent gap: a small blocklength inflation in exchange for a single LDPC code that drives every QAM order. Under SP labelling the ratio drops to $\sim 0.5$–$0.6$, which would require doubling the blocklength — enough to make SP-BICM a non-starter for standards.

Common Mistake: Exponent Gap Is Not the Same as Capacity Gap

Mistake:

Assuming that because the BICM capacity gap is only $\sim 0.05$ bits at 8 dB for Gray 16-QAM, the exponent gap must be similarly small (say, 0.05 nats or less).

Correction:

The capacity gap and the exponent gap are different physical quantities. The capacity gap measures the highest rate a given decoder can reach; the exponent gap measures how fast the error probability decays at a given sub-capacity rate. These two gaps are loosely correlated but not equal.

Numerically, for Gray 16-QAM at 8 dB:

  • Capacity gap: $C_{\mathrm{CM}} - C_{\mathrm{BICM}} \approx 0.05$ bits ($\approx 3\%$ of the CM capacity).
  • Exponent gap at $R = 0.5 \cdot C_{\mathrm{BICM}}$: $\approx 0.02$ nats ($\approx 9\%$ of the CM exponent).

The percentage exponent gap is larger than the percentage capacity gap — the exponent is a finer measure of decoder quality. Both gaps tell the same qualitative story (small under Gray, large under SP), but the numerical conversion is not direct.

Historical Note: Gallager 1968: The Random-Coding Exponent

1965–1968

Robert Gallager's 1968 treatise Information Theory and Reliable Communication (Wiley, 1968) is the canonical reference for the random-coding error exponent. Building on Shannon's 1948 achievability argument and Fano's 1961 "moments of error" refinement, Gallager introduced the parameterised $E_0(\rho, q)$ function and the Lagrangian $E_r(R) = \max_\rho[E_0(\rho, q) - \rho R]$ that every subsequent analysis uses. The cutoff rate $R_0 = E_0(1)$ appeared in that text too, as the rate threshold below which the error exponent reduces to the Bhattacharyya exponent — an operationally tighter rate limit for sequential decoders.

Applying the Gallager exponent to BICM under the mismatched product metric is the content of §4 of the 2008 Guillén-Martínez-Caire monograph. It is a striking instance of how a 40-year-old tool from matched-decoding theory lifts cleanly to the mismatched case with only a notational change in the $E_0$ integrand — a testament to the generality of Gallager's framework. Beyond BICM, the mismatched-Gallager machinery has been applied to quantised decoders, list decoders, and probabilistic amplitude shaping, all with the same generic structure: identify the Gallager function, optimise, and read off the exponent.

Quick Check

Which of the following statements about the BICM random-coding exponent $E_r^{\mathrm{BICM}}(R)$ is true?

$E_r^{\mathrm{BICM}}(R) \le E_r^{\mathrm{CM}}(R)$ for all $R \le I^{\mathrm{BICM}}$, with equality at $R = 0$

$E_r^{\mathrm{BICM}}(R) = E_r^{\mathrm{CM}}(R)$ everywhere for Gray labelling

$E_r^{\mathrm{BICM}}(R) > E_r^{\mathrm{CM}}(R)$ because BICM is simpler to decode

$E_r^{\mathrm{BICM}}(R)$ is not defined because BICM has no random-coding argument

Gallager Function $E_0(\rho)$

The function $E_0(\rho, q) = -\log \int_y \big(\sum_x q(x)\, p(y\mid x)^{1/(1+\rho)}\big)^{1+\rho}\, dy$ associated with a channel $p(y\mid x)$ and input distribution $q$. Concave in $\rho \in [0,1]$ with $E_0(0) = 0$ and $E_0'(0) = I(X; Y)$. The random-coding exponent is $E_r(R) = \max_{0\le\rho\le 1}[E_0(\rho) - \rho R]$. For BICM, $E_0$ specialises to a product-metric form over the $L$ bit channels.

Related: Random-Coding Error Exponent $E_r(R)$, Gallager Cutoff Rate $R_0$, Mismatched Maximum-Metric Decoding

Random-Coding Error Exponent $E_r(R)$

The rate of exponential decay of the ensemble-average error probability of a random code at rate $R$: $\bar P_e \le e^{-N E_r(R)}$. Positive for $R < C$; zero at $R = C$. Obtained from the Gallager function via the Lagrangian $E_r(R) = \max_\rho[E_0(\rho) - \rho R]$.

Related: Gallager Function $E_0(\rho)$, Gallager Cutoff Rate $R_0$