BICM Random-Coding Error Exponents
From Rate to Exponent: How Fast Does the Error Probability Decay?
Sections 1–3 told us which rates are achievable. Section 4 asks how quickly the error probability decays as blocklength grows, at each of those rates. The quantitative answer is the random-coding error exponent $E_r(R)$: the ensemble-average error probability of a random code of blocklength $n$ and rate $R$ is bounded by $\bar{P}_e \le e^{-n E_r(R)}$. The exponent is positive for $R < C$ and decreases to zero at $R = C$ — i.e., the capacity is the rate below which the exponent is strictly positive.
Gallager (1968) wrote the exponent in the canonical form $E_r(R) = \max_{0 \le \rho \le 1}\,[E_0(\rho) - \rho R]$, where $E_0(\rho) = E_0(\rho, P)$ is the Gallager function associated with the channel $W$ and input distribution $P$. This is a Lagrangian-dual relation: $E_0(\rho)$ is concave and nondecreasing in $\rho$ with $E_0(0) = 0$ and $E_0'(0) = I(P)$, so for rates below the mutual information the maximum in $\rho$ is attained at an interior point whose location depends on $R$.
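The Lagrangian can be made concrete with a minimal numerical sketch. The code below (our own illustration: a generic discrete memoryless channel given as a matrix, with a BSC as the worked instance — the crossover probability 0.1 is illustrative, not from the text) evaluates $E_0(\rho)$ and maximises $E_0(\rho) - \rho R$ over a grid of $\rho$:

```python
import numpy as np

def gallager_E0(rho, W, Q):
    # E0(rho) = -log sum_y [ sum_x Q(x) W(y|x)^(1/(1+rho)) ]^(1+rho), in nats.
    inner = (Q[:, None] * W ** (1.0 / (1.0 + rho))).sum(axis=0)
    return -np.log((inner ** (1.0 + rho)).sum())

def random_coding_exponent(R, W, Q):
    # E_r(R) = max_{0 <= rho <= 1} [E0(rho) - rho R], with R in nats per use.
    grid = np.linspace(0.0, 1.0, 1001)
    return max(gallager_E0(r, W, Q) - r * R for r in grid)

# Illustrative channel: BSC with crossover 0.1, uniform input.
p = 0.1
W = np.array([[1 - p, p], [p, 1 - p]])     # channel matrix W[x, y]
Q = np.array([0.5, 0.5])
C = np.log(2) + p * np.log(p) + (1 - p) * np.log(1 - p)   # capacity, nats
Er_half = random_coding_exponent(0.5 * C, W, Q)           # positive below C
Er_above = random_coding_exponent(1.2 * C, W, Q)          # ~0 above C
```

The grid maximisation mirrors the dual picture: below capacity the maximiser sits at an interior $\rho$, while above capacity it collapses to $\rho = 0$, where $E_0(0) = 0$.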
For BICM with the mismatched product metric, the Gallager function specialises cleanly to a product-metric form $E_0^{\mathrm{bicm}}(\rho, s)$ (up to the scaling conventions), and the exponent $E_r^{\mathrm{bicm}}(R)$ is obtained by the same Lagrangian. The main qualitative result — due to Guillén-Martínez-Caire 2008 — is that $E_r^{\mathrm{bicm}}(R) \le E_r^{\mathrm{cm}}(R)$, with equality at $\rho = 0$ and strict inequality for $\rho > 0$. The exponent gap is the error-exponent price of the mismatched product metric — a strictly positive analogue of the capacity gap $C^{\mathrm{cm}} - C^{\mathrm{bicm}}$ from §2.
Two practical takeaways. First, the exponent gap is small under Gray labelling on AWGN: the BICM and CM exponents differ by a factor of 1.1–1.2 at rates of practical interest. This says that, at a given rate, the required blocklength to hit a given error probability is only slightly larger for BICM than for CM — often a negligible penalty. Second, the exponent gap is large under SP labelling: the SP product metric discards so much joint information that the BICM exponent can be a factor of 2 below the CM exponent, making SP-BICM materially worse than Gray-BICM at practical blocklengths.
This section formalises the Gallager-function specialisation to BICM, proves the exponent-ordering theorem, and sets up the cutoff rate — the subject of §5.
Definition: Gallager's $E_0$ for BICM (Product Metric)
For the BICM parallel-channel model of Def. [BICM as Parallel Binary Channels] with the product bit metric, the Gallager function at parameter $\rho$ and decoder scaling $s > 0$ is
$$E_0^{\mathrm{bicm}}(\rho, s) = -\,m \log\left(\frac{1}{m}\sum_{j=1}^{m}\,\frac{1}{2}\sum_{b\in\{0,1\}}\int p_j(y \mid b)\left(\frac{\tfrac{1}{2}\sum_{b'} q_j(b', y)^{s}}{q_j(b, y)^{s}}\right)^{\!\rho} \mathrm{d}y\right).$$
The corresponding BICM random-coding exponent is
$$E_r^{\mathrm{bicm}}(R) = \max_{0 \le \rho \le 1}\,\sup_{s > 0}\,\bigl[E_0^{\mathrm{bicm}}(\rho, s) - \rho R\bigr].$$
By construction $E_0^{\mathrm{bicm}}(0, s) = 0$ and the slope at $\rho = 0$ (optimised over $s$) is the GMI; so the BICM exponent is strictly positive for any $R < C^{\mathrm{bicm}}$ and vanishes at the GMI capacity.
The CM exponent is defined by the standard Gallager formula with the symbol-level channel law $W(y \mid x)$ and uniform input on $\mathcal{X}$:
$$E_0^{\mathrm{cm}}(\rho) = -\log \int \left[\frac{1}{2^m}\sum_{x\in\mathcal{X}} W(y \mid x)^{1/(1+\rho)}\right]^{1+\rho} \mathrm{d}y, \qquad E_r^{\mathrm{cm}}(R) = \max_{0 \le \rho \le 1}\,\bigl[E_0^{\mathrm{cm}}(\rho) - \rho R\bigr].$$
Three points:
- The BICM Gallager function averages over bit positions $j$ (the outer $\frac{1}{m}\sum_j$) because the interleaver uniformly randomises which bit position each coded bit lands in. The average is outside the integral over $y$ and outside the $\rho$-power — a consequence of Jensen-type bounds in the derivation.
- The CM Gallager function is a single integral over $y$ with the $(1+\rho)$-power of the uniformly-averaged input — the natural Gallager formula for a symmetric $2^m$-ary-input discrete memoryless channel.
- The two functions are related by an application of Jensen's inequality on the power functions $t \mapsto t^{1/(1+\rho)}$ (concave for $\rho \ge 0$) and $t \mapsto t^{1+\rho}$ (convex for $\rho \ge 0$) — which is exactly what yields the exponent-ordering theorem below.
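The pointwise relation between the two Gallager functions can be checked numerically. The sketch below is our own toy instance, not from the text: a randomly generated 4-ary DMC with a 2-bit labelling, matched per-bit metrics $q_j = p_j$, scaling $s = 1/(1+\rho)$, and the per-symbol normalisation with a leading factor $m$ as one common convention:

```python
import numpy as np

M, m = 4, 2                                   # 4-ary input, m = 2 bit positions
bit = lambda x, j: (x >> j) & 1               # j-th label bit of symbol x

def E0_cm(rho, W):
    # E0^cm(rho) = -log sum_y [ (1/M) sum_x W(y|x)^(1/(1+rho)) ]^(1+rho)
    inner = (W ** (1.0 / (1.0 + rho))).mean(axis=0)
    return -np.log((inner ** (1.0 + rho)).sum())

def E0_bicm(rho, W):
    # Product-metric Gallager function over the m bit channels, s = 1/(1+rho):
    # -m log( (1/m) sum_j (1/2) sum_b sum_y p_j(y|b) (avg_b' q_j^s / q_j^s)^rho )
    s = 1.0 / (1.0 + rho)
    total = 0.0
    for j in range(m):
        p = np.zeros((2, W.shape[1]))         # bit channel p_j(y|b)
        for x in range(M):
            p[bit(x, j)] += W[x] / (M / 2)
        num = 0.5 * (p ** s).sum(axis=0)      # (1/2) sum_{b'} q_j(b', y)^s
        for b in range(2):
            total += 0.5 * (p[b] * (num / p[b] ** s) ** rho).sum()
    return -m * np.log(total / m)

rng = np.random.default_rng(0)
W = rng.random((M, 8))
W /= W.sum(axis=1, keepdims=True)             # random 4-input, 8-output DMC
for rho in (0.25, 0.5, 1.0):
    assert E0_bicm(rho, W) <= E0_cm(rho, W) + 1e-9   # Jensen ordering
```

Both functions vanish at $\rho = 0$, and the product-metric one sits below the joint-metric one for $\rho > 0$ — the pattern the theorem below formalises.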
Theorem: BICM Random-Coding Exponent $\le$ CM Exponent
For any memoryless channel, any constellation $\mathcal{X}$, and any labelling $\mu$, $E_r^{\mathrm{bicm}}(R) \le E_r^{\mathrm{cm}}(R)$ for all rates $R$, with equality at $\rho = 0$ and strict inequality for $\rho > 0$ on generic channels (i.e., whenever the labelling induces a chain-rule gap). Equivalently, the product-metric Gallager function is pointwise below the joint-metric one: $E_0^{\mathrm{bicm}}(\rho, s) \le E_0^{\mathrm{cm}}(\rho)$ for all $s > 0$, with equality at $\rho = 0$.
The BICM decoder throws away the joint symbol likelihood, keeping only the product of bit marginals. The random-coding exponent measures the discrimination power of the decoder against incorrect codewords — how quickly the probability of confusing any two codewords decays with blocklength. A more informative metric discriminates better; the joint likelihood is more informative than the product metric; hence $E_r^{\mathrm{bicm}}(R) \le E_r^{\mathrm{cm}}(R)$, with the gap proportional to the chain-rule residual of Ch. 5 Thm. [?ch05:thm-cm-bicm-ordering], evaluated at the Gallager tilt $\rho$ rather than at $\rho = 0$.
Apply Jensen's inequality to the $(1+\rho)$-power of the averaged metric.
Show by Jensen that the product-metric integrand upper-bounds the joint-metric integrand pointwise; the outer $-\log$ flips this into $E_0^{\mathrm{bicm}}(\rho, s) \le E_0^{\mathrm{cm}}(\rho)$.
Read off the exponent ordering from the pointwise $E_0$ ordering via the common Lagrangian $\max_{0 \le \rho \le 1}\,[E_0(\rho) - \rho R]$.
Pointwise Jensen on $(1+\rho)$-power
Consider the per-$y$ integrand. For the CM Gallager function at uniform input, the integrand is $\bigl[\frac{1}{2^m}\sum_{x} W(y \mid x)^{1/(1+\rho)}\bigr]^{1+\rho}$. By Jensen's inequality applied to the convex function $t \mapsto t^{1+\rho}$ (convex for $\rho \ge 0$), the above is at most $\frac{1}{2^m}\sum_{x} W(y \mid x)$, which is the $y$-marginal of the channel output under uniform input.
For the BICM Gallager function, the per-$(j, y)$ integrand is $\frac{1}{2}\sum_{b} p_j(y \mid b)\bigl(\frac{\frac12\sum_{b'} q_j(b',y)^s}{q_j(b,y)^s}\bigr)^{\rho}$. At $s = 1/(1+\rho)$ the same Jensen argument applies, giving an upper bound $\frac{1}{2}\sum_{b} p_j(y \mid b)$ — the output marginal of bit channel $j$.
Product metric underbounds joint
Now compare the per-$y$ integrand of $E_0^{\mathrm{bicm}}$ and $E_0^{\mathrm{cm}}$. The BICM integrand involves the per-position marginals $p_j(y \mid b)$, while the CM integrand involves the joint $W(y \mid x)$. By the chain rule of mutual information (at the rate/capacity level), the joint likelihood carries strictly more information about the transmitted symbol than the product of marginals. Translated to the Gallager integrand, this gives a pointwise ordering — the BICM integrand dominates the CM integrand — and taking $-\log$ flips the inequality: $E_0^{\mathrm{bicm}}(\rho, s) \le E_0^{\mathrm{cm}}(\rho)$.
The optimisation over $s$ can only shrink the gap; it cannot flip the ordering, because the joint metric in $E_0^{\mathrm{cm}}$ is the matched one and dominates any mismatched scaling by the Ganti-Lapidoth-Telatar 1999 converse.
Equality at $\rho = 0$
At $\rho = 0$ both sides vanish: $E_0(0) = 0$ for any input distribution and channel. Equality at $\rho = 0$ is therefore automatic; the exponents are both 0 at capacity. At other $\rho$, the strict inequality from the previous step propagates, giving strict inequality in the exponents at every rate below capacity whenever the chain-rule gap is strictly positive.
Exponent ordering from Lagrangian
Both exponents are obtained as $\max_{0 \le \rho \le 1}\,[E_0(\rho) - \rho R]$. Since $E_0^{\mathrm{bicm}}(\rho, s) \le E_0^{\mathrm{cm}}(\rho)$ pointwise in $\rho$, $E_0^{\mathrm{bicm}}(\rho, s) - \rho R \le E_0^{\mathrm{cm}}(\rho) - \rho R$, and the ordering extends to every $R$ in the common domain.
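This last step is pure calculus of maxima and can be seen on any pair of ordered curves. A two-line check with hypothetical concave $E_0$ curves (the coefficients 0.9 and 0.8 are illustrative, not from the text):

```python
import numpy as np

rho = np.linspace(0.0, 1.0, 101)
E0_cm_curve   = 0.9 * rho / (1.0 + rho)   # hypothetical concave E0, E0(0) = 0
E0_bicm_curve = 0.8 * rho / (1.0 + rho)   # pointwise below the CM curve

def Er(E0_vals, R):
    # Common Lagrangian: E_r(R) = max_rho [E0(rho) - rho R]
    return float(np.max(E0_vals - rho * R))

for R in np.linspace(0.0, 0.5, 11):
    assert Er(E0_bicm_curve, R) <= Er(E0_cm_curve, R) + 1e-12
```

The maximum of the smaller family of functions can never exceed the maximum of the larger one — which is the whole content of this step.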
$E_r(R)$ for CM vs BICM
Plot of the random-coding error exponent $E_r(R)$ vs rate $R$ for a square QAM constellation at fixed SNR, comparing the CM exponent (joint-metric, matched decoder) with the BICM exponent (product-metric, mismatched decoder under Gray labelling). Both curves are positive for rates below capacity and decay to zero at the respective capacity. The BICM curve lies below the CM curve at every $R$, illustrating Thm. [BICM Random-Coding Exponent $\le$ CM Exponent]. The gap between the two curves is the exponent-level cost of the mismatched metric; for Gray at practical SNRs it is small (a factor of 1.1–1.2), making BICM operationally competitive with CM despite the information-theoretic mismatch. The cutoff rate $R_0 = E_0(1)$ — the rate at which the straight-line portion of the exponent extrapolates to zero — is marked on both curves.
Theorem: Exponent Gap Is Small Under Gray Labelling
For square $M$-QAM with Gray labelling on AWGN at SNR $\ntn{snr}$, the exponent gap at rate $R$ satisfies $E_r^{\mathrm{cm}}(R) - E_r^{\mathrm{bicm}}(R) = O(1/\ntn{snr})$, i.e., the exponent gap vanishes at high SNR. At moderate SNR (0–20 dB) the ratio $E_r^{\mathrm{bicm}}(R)/E_r^{\mathrm{cm}}(R)$ stays close to one across the rates of practical interest. Numerically, this translates to a required blocklength ratio below 1.3 to achieve the same error probability $P_e$ at the same rate — a modest practical cost.
Under Gray labelling the per-position bit channels are nearly symmetric Gaussian at high SNR; the product bit metric then coincides with the symbol log-likelihood up to a constant (the same argument as for the capacity gap in §3). The Gallager function ratio approaches 1, and the exponents coincide. At lower SNR the gap is $O(1/\ntn{snr})$ — small but real — and its operational consequence is the blocklength penalty.
Expand the BICM Gallager function around the high-SNR limit where the bit channels become symmetric Gaussian.
Match the leading-order asymptotics with the CM Gallager function.
The first-order correction is $O(1/\ntn{snr})$ and comes from the chain-rule residual.
High-SNR expansion of bit channels
At SNR $\ntn{snr}$, the $j$-th Gray bit channel has per-position Bhattacharyya parameter $B_j \doteq \kappa_j\, e^{-d_j^2\,\ntn{snr}/4}$, where the prefactor $\kappa_j$ and distance $d_j$ depend only on the Gray decomposition of the QAM constellation. The Gallager function at $\rho = 1$ satisfies $E_0^{\mathrm{bicm}}(1) = m\bigl[\log 2 - \log(1 + \bar{B})\bigr]$ with $\bar{B} = \frac{1}{m}\sum_j B_j$, so its high-SNR behaviour is governed by the dominant (nearest-neighbour) Bhattacharyya terms.
Compare with CM
The CM Gallager function satisfies a similar expansion in terms of the symbol-level Bhattacharyya parameters $B_{x,x'} = \int \sqrt{W(y \mid x)\,W(y \mid x')}\,\mathrm{d}y$. For square QAM, the leading $B_{x,x'}$'s (nearest-neighbour pairs) match those of the bit-channel Bhattacharyya parameters — this is the Gray near-matching property. The leading-order exponents therefore coincide.
Leading correction is $O(1/\ntn{snr})$
The first non-matching term arises from second-nearest-neighbour pairs, which Gray handles imperfectly (it is a labelling-fit issue). This contributes a correction of order $1/\ntn{snr}$ to both Gallager functions, but with a slightly different prefactor — a relative $O(1/\ntn{snr})$ gap in the Gallager functions at a given $\rho$. This translates to $E_r^{\mathrm{cm}}(R) - E_r^{\mathrm{bicm}}(R) = O(1/\ntn{snr})$ after taking the Lagrangian.
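The exponential decay of the bit-channel Bhattacharyya parameters can be checked numerically for the Gray 4-PAM component of square 16-QAM. The sketch below is our own illustration, assuming unit-energy 4-PAM with Gray labels 00, 01, 11, 10 on levels $-3, -1, +1, +3$ and bit-channel likelihoods as metrics; the helper name is ours:

```python
import numpy as np

def gray_pam4_bhattacharyya(snr_db):
    """Bhattacharyya parameters of the two Gray bit channels of unit-energy
    4-PAM (Gray labels 00, 01, 11, 10 on levels -3, -1, +1, +3) on AWGN."""
    snr = 10 ** (snr_db / 10)
    levels = np.array([-3.0, -1.0, 1.0, 3.0]) / np.sqrt(5.0)  # E[x^2] = 1
    sigma = np.sqrt(1.0 / (2.0 * snr))                        # Es/N0 = snr
    y = np.linspace(-6.0, 6.0, 8001)
    dy = y[1] - y[0]
    g = lambda mu: np.exp(-(y - mu) ** 2 / (2 * sigma ** 2)) / (
        sigma * np.sqrt(2 * np.pi))
    labels = [(0, 0, 1, 1), (0, 1, 1, 0)]     # bit j of each level, j = 0, 1
    Bs = []
    for lab in labels:
        p0 = sum(0.5 * g(levels[i]) for i in range(4) if lab[i] == 0)
        p1 = sum(0.5 * g(levels[i]) for i in range(4) if lab[i] == 1)
        Bs.append(float(np.sum(np.sqrt(p0 * p1)) * dy))       # int sqrt(p0 p1)
    return Bs

# Total Bhattacharyya mass falls rapidly with SNR, as the expansion predicts.
B5, B10, B15 = (sum(gray_pam4_bhattacharyya(s)) for s in (5.0, 10.0, 15.0))
```

Plotting the three values on a log scale shows the near-exponential decay in linear SNR that drives the vanishing exponent gap.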
Example: Cutoff Rate of BICM-QPSK at 3 dB
Compute the cutoff rate $R_0^{\mathrm{bicm}}$ for QPSK with Gray labelling at $\ntn{snr} = 3$ dB on AWGN. Compare with the BICM capacity at the same SNR.
QPSK reduces to two BI-AWGN channels
From Example [QPSK with Gray Labelling: Two Identical Parallel BI-AWGN Channels], Gray-QPSK decomposes into two independent BI-AWGN channels, each at per-bit SNR $3$ dB (linear $\approx 2.0$). The Bhattacharyya parameter of a BI-AWGN channel at this SNR is $B = e^{-\ntn{snr}}$. Numerically $B \approx e^{-2.0} \approx 0.135$.
Cutoff rate formula for BI-AWGN
For a binary-input channel with Bhattacharyya parameter $B$, $R_0 = 1 - \log_2(1 + B)$. At $B \approx 0.135$, $R_0 \approx 1 - \log_2(1.135) \approx 0.82$ bits per code bit.
Scale up to QPSK
QPSK carries $m = 2$ code bits per symbol. Hence $R_0^{\mathrm{bicm}} = 2 \times 0.82 \approx 1.63$ bits/symbol.
Compare to capacity
At the same 3 dB SNR, the BI-AWGN capacity is $C \approx 0.91$ bits/use, so $C^{\mathrm{bicm}} \approx 1.82$ bits/symbol. The gap $C^{\mathrm{bicm}} - R_0^{\mathrm{bicm}} \approx 0.19$ bits/symbol — roughly 2 dB of SNR-equivalent. This is the cutoff-rate gap that a sequential or list decoder "pays" relative to a full ML decoder. We return to this in §5.
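The arithmetic of this example can be reproduced end to end. The sketch below (our own numerical-integration check; the variable names are ours) computes the Bhattacharyya parameter, the cutoff rate, and the BI-AWGN capacity at 3 dB:

```python
import numpy as np

snr = 10 ** (3.0 / 10.0)              # 3 dB per-bit SNR, linear ~ 2.0

# Bhattacharyya parameter of BI-AWGN: B = exp(-snr)
B = np.exp(-snr)                      # ~ 0.136

# Cutoff rate per code bit, then per QPSK symbol (m = 2 code bits/symbol)
R0_bit = 1.0 - np.log2(1.0 + B)       # ~ 0.82 bits/code bit
R0_sym = 2.0 * R0_bit                 # ~ 1.63 bits/symbol

# BI-AWGN capacity by quadrature over the Gaussian output:
# C = 1 - E_Z[ log2(1 + exp(-2(1 + sigma Z)/sigma^2)) ], sigma^2 = 1/(2 snr)
sigma = np.sqrt(1.0 / (2.0 * snr))
z = np.linspace(-8.0, 8.0, 20001)
phi = np.exp(-z ** 2 / 2.0) / np.sqrt(2.0 * np.pi)
llr_arg = -2.0 * (1.0 + sigma * z) / sigma ** 2
C_bit = 1.0 - np.sum(phi * np.log2(1.0 + np.exp(llr_arg))) * (z[1] - z[0])
C_sym = 2.0 * C_bit                   # ~ 1.8 bits/symbol

gap = C_sym - R0_sym                  # cutoff-rate gap in bits/symbol
```

A simple Riemann sum on a fine grid is accurate enough here; the gap of a fraction of a bit per symbol is the quantity carried forward to §5.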
Numerical Exponent Comparison for 16-QAM at 8 dB
| Rate (bits/symbol) | $E_r^{\mathrm{cm}}(R)$ | $E_r^{\mathrm{bicm}}(R)$ | Ratio $E_r^{\mathrm{bicm}}/E_r^{\mathrm{cm}}$ |
|---|---|---|---|
| — | — nats | — nats | — |
| — | — nats | — nats | — |
| — | — nats | — nats | — |
| — ($R_0^{\mathrm{bicm}}$) | — nats | — nats | — |
| $C^{\mathrm{bicm}}$ | — nats (small, positive) | $0$ (vanishes) | — |
What the Exponent Ratio Means Operationally
A ratio $E_r^{\mathrm{bicm}}/E_r^{\mathrm{cm}} \approx 0.85$ at a given rate has a direct operational reading: to achieve the same target error probability $P_e$ on both schemes, the BICM blocklength must satisfy $n^{\mathrm{bicm}} \approx n^{\mathrm{cm}}\, E_r^{\mathrm{cm}}/E_r^{\mathrm{bicm}} \approx 1.18\, n^{\mathrm{cm}}$. That is, BICM needs 18% more coded bits to reach the same $P_e$ at the same rate — a moderate but non-zero practical cost.
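The blocklength conversion is one line of algebra — equating $e^{-n^{\mathrm{cm}} E_r^{\mathrm{cm}}} = e^{-n^{\mathrm{bicm}} E_r^{\mathrm{bicm}}}$ and solving for $n^{\mathrm{bicm}}$ — and can be sketched as follows (the blocklength 3000 is an illustrative value in the 5G-like range, not from the text):

```python
import math

ratio = 0.85                        # E_r^bicm / E_r^cm at the operating rate
n_cm = 3000                         # hypothetical CM blocklength, coded bits
# Equal target P_e: n_cm * Er_cm = n_bicm * Er_bicm  =>  n_bicm = n_cm / ratio
n_bicm = math.ceil(n_cm / ratio)    # ~18% more coded bits
extra = n_bicm - n_cm
```

For a few-thousand-bit codeword this works out to a few hundred extra coded bits, consistent with the rate-matching discussion below.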
For rate-matched LDPC in 5G NR (blocklengths of a few thousand bits), 18% is around 400–600 extra bits — easily absorbed by the standard's rate-matching machinery. This is why the modularity of BICM is worth the exponent gap: a small blocklength inflation in exchange for a single LDPC code that drives every QAM order. Under SP labelling the ratio drops to around $0.5$, which would require doubling the blocklength — enough to make SP-BICM a non-starter for standards.
Common Mistake: Exponent Gap Is Not the Same as Capacity Gap
Mistake:
Assuming that because the BICM capacity gap is only a small fraction of a bit at 8 dB for Gray 16-QAM, the exponent gap must be similarly small (say, 0.05 nats or less).
Correction:
The capacity gap and the exponent gap are different physical quantities. The capacity gap measures the highest rate a given decoder can reach; the exponent gap measures how fast the error probability decays at a given sub-capacity rate. These two gaps are loosely correlated but not equal.
Numerically, for Gray 16-QAM at 8 dB:
- Capacity gap: small both in absolute terms and as a percentage of the CM capacity.
- Exponent gap at rates of practical interest: a noticeably larger percentage of the CM exponent.
The percentage exponent gap is larger than the percentage capacity gap — the exponent is a finer measure of decoder quality. Both gaps tell the same qualitative story (small under Gray, large under SP), but the numerical conversion is not direct.
Historical Note: Gallager 1968: The Random-Coding Exponent
1965–1968. Robert Gallager's treatise Information Theory and Reliable Communication (Wiley, 1968) is the canonical reference for the random-coding error exponent. Building on Shannon's 1948 achievability argument and Fano's 1961 "moments of error" refinement, Gallager introduced the parameterised function $E_0(\rho)$ and the Lagrangian $E_r(R) = \max_\rho\,[E_0(\rho) - \rho R]$ that every subsequent analysis uses. The cutoff rate $R_0 = E_0(1)$ appeared in that text too, as the rate threshold below which the error exponent reduces to the Bhattacharyya exponent — an operationally tighter rate limit for sequential decoders.
Applying the Gallager exponent to BICM under the mismatched product metric is the content of §4 of the 2008 Guillén-Martínez-Caire monograph. It is a striking instance of how a 40-year-old tool from matched-decoding theory lifts cleanly to the mismatched case with only a notational change in the integrand — a testament to the generality of Gallager's framework. Beyond BICM, the mismatched-Gallager machinery has been applied to quantised decoders, list decoders, and probabilistic amplitude shaping, all with the same generic structure: identify the Gallager function, optimise, and read off the exponent.
Quick Check
Which of the following statements about the BICM random-coding exponent is true?
$E_r^{\mathrm{bicm}}(R) \le E_r^{\mathrm{cm}}(R)$ for all $R$, with equality at $\rho = 0$
$E_r^{\mathrm{bicm}}(R) = E_r^{\mathrm{cm}}(R)$ everywhere for Gray labelling
$E_r^{\mathrm{bicm}}(R) > E_r^{\mathrm{cm}}(R)$ because BICM is simpler to decode
$E_r^{\mathrm{bicm}}(R)$ is not defined because BICM has no random-coding argument
This is Theorem [BICM Random-Coding Exponent $\le$ CM Exponent]. The product bit metric is mismatched, so its Gallager function is pointwise below the joint-metric one, giving the exponent ordering. Equality at $\rho = 0$ holds because both Gallager functions vanish there: $E_0(0) = 0$.
Gallager Function
The function $E_0(\rho, P)$ associated with a channel $W$ and input distribution $P$. Concave in $\rho$ with $E_0(0) = 0$ and $E_0'(0) = I(P)$. The random-coding exponent is $E_r(R) = \max_{0 \le \rho \le 1}\,[E_0(\rho) - \rho R]$. For BICM, $E_0$ specialises to a product-metric form over the bit channels.
Related: Random-Coding Error Exponent , Gallager Cutoff Rate , Mismatched Maximum-Metric Decoding
Random-Coding Error Exponent
The rate of exponential decay of the ensemble-average error probability of a random code at rate $R$: $\bar{P}_e \le e^{-n E_r(R)}$. Positive for $R < C$; zero at $R = C$. Obtained from the Gallager function via the Lagrangian $E_r(R) = \max_{0 \le \rho \le 1}\,[E_0(\rho) - \rho R]$.
Related: Gallager Function , Gallager Cutoff Rate