BICM Random-Coding Error Exponents

From Rate to Exponent: How Fast Does the Error Probability Decay?

Sections 1–3 told us which rates are achievable. Section 4 asks how quickly the error probability decays as the blocklength $N$ grows, at each of those rates. The quantitative answer is the random-coding error exponent $E_r(R)$: the ensemble-average error probability of a random code of blocklength $N$ and rate $R$ is bounded by $\bar P_e \le e^{-N E_r(R)}$. The exponent $E_r(R)$ is positive for $R < C$ and decreases to zero at $R = C$ — i.e., the capacity is the rate below which the exponent is strictly positive.

Gallager (1968) wrote the exponent in the canonical form $E_r(R) = \max_{0 \le \rho \le 1} [E_0(\rho, q) - \rho R]$, where $E_0(\rho, q)$ is the Gallager function associated with the channel and input distribution $q$. This is a Lagrangian-dual relation: $E_0(\rho)$ is concave in $\rho$ with $E_0(0) = 0$ and $E_0'(0) = C$, so the maximising $\rho$ depends on $R$: interior for rates between the critical rate and capacity, and at the endpoint $\rho = 1$ below the critical rate.
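As a concrete instance, the Lagrangian maximisation can be carried out numerically. The sketch below (an illustration, not from the text) evaluates $E_r(R)$ for a binary symmetric channel with crossover $\varepsilon$, for which the Gallager function with uniform inputs has the closed form $E_0(\rho) = \rho\ln 2 - (1+\rho)\ln\big((1-\varepsilon)^{1/(1+\rho)} + \varepsilon^{1/(1+\rho)}\big)$ in nats.

```python
import numpy as np

# Illustrative sketch (not from the text): random-coding exponent of a BSC
# with crossover eps and uniform inputs, via E_r(R) = max_rho [E0(rho) - rho*R].
eps = 0.05

def E0(rho, eps=eps):
    # Closed form of Gallager's E0 for the BSC with uniform inputs (nats).
    u = 1.0 / (1.0 + rho)
    return rho * np.log(2) - (1 + rho) * np.log((1 - eps) ** u + eps ** u)

def Er(R, grid=np.linspace(0.0, 1.0, 1001)):
    # Maximise the Lagrangian over rho on a fine grid.
    return max(E0(r) - r * R for r in grid)

# BSC capacity in nats: C = ln 2 - H(eps)
C = np.log(2) + (1 - eps) * np.log(1 - eps) + eps * np.log(eps)

for frac in (0.5, 0.9, 1.0):
    print(f"R = {frac:.1f}*C : Er = {Er(frac * C):.4f} nats")
```

At $R = 0$ the grid maximum sits at $\rho = 1$, recovering $E_r(0) = E_0(1) = R_0$; at $R = C$ the exponent vanishes, as the text states.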

For BICM with the mismatched product metric, the Gallager function specialises cleanly to $E_0^{\mathrm{BICM}}(\rho) = -\log \frac{1}{L}\sum_\ell \int_y \big(\tfrac{1}{2}\sum_b p_{W_\ell}(y\mid b)^{1/(1+\rho)}\big)^{1+\rho}\, dy$ (up to scaling conventions), and the exponent $E_r^{\mathrm{BICM}}(R)$ is obtained by the same Lagrangian. The main qualitative result — due to Guillén-Martínez-Caire 2008 — is that $E_r^{\mathrm{BICM}}(R) \le E_r^{\mathrm{CM}}(R)$ for all $R \le I^{\mathrm{BICM}}$, with equality at $R = 0$ and strict inequality for $R > 0$. The exponent gap is the error-exponent price of the mismatched product metric — a strictly positive version of the capacity gap from §2.

Two practical takeaways. First, the exponent gap is small under Gray labelling on AWGN: the BICM and CM exponents differ by at most a factor of roughly 1.1–1.2 at rates of practical interest. This means that, at a given rate, the blocklength required to hit a given error probability is only slightly larger for BICM than for CM — often a negligible penalty. Second, the exponent gap is large under SP labelling: the SP product metric discards so much joint information that the BICM exponent can be a factor of 2 below the CM exponent, making SP-BICM materially worse than Gray-BICM at practical blocklengths.

This section formalises the Gallager-function specialisation to BICM, proves the exponent-ordering theorem, and sets up the cutoff rate $R_0 = E_0(1)$ — the subject of §5.


Definition: Gallager's $E_0(\rho)$ for BICM (Product Metric)

For the BICM parallel-channel model of Def. "BICM as $L$ Parallel Binary Channels" with the product bit metric, the Gallager function at parameter $\rho \in [0, 1]$ and decoder scaling $s > 0$ is
$$E_0^{\mathrm{BICM}}(\rho, s) \;=\; -\log \frac{1}{L}\sum_{\ell = 0}^{L-1} \int_y \Big( \tfrac{1}{2}\sum_{b \in \{0,1\}} p_{W_\ell}(y\mid b)^{s/(1+\rho)}\Big)^{1+\rho}\, dy.$$
The corresponding BICM random-coding exponent is
$$E_r^{\mathrm{BICM}}(R) \;=\; \max_{0 \le \rho \le 1}\;\max_{s > 0}\; \big[ E_0^{\mathrm{BICM}}(\rho, s) - \rho R \big].$$
By construction $E_0^{\mathrm{BICM}}(0, s) = 0$ and $\partial E_0^{\mathrm{BICM}}/\partial \rho \big|_{\rho = 0} = I^{\mathrm{GMI}}(s)$; so the BICM exponent is strictly positive for any $R < \sup_s I^{\mathrm{GMI}}(s) = R^\star_{\mathrm{BICM}}$ and vanishes at the GMI capacity.

The CM exponent $E_r^{\mathrm{CM}}(R)$ is defined by the standard Gallager formula with the symbol-level channel law $p(y\mid x)$ and uniform input on $\mathcal{X}$:
$$E_0^{\mathrm{CM}}(\rho) \;=\; -\log \int_y \Big( \tfrac{1}{M} \sum_{x \in \mathcal{X}} p(y\mid x)^{1/(1+\rho)}\Big)^{1+\rho}\, dy, \qquad E_r^{\mathrm{CM}}(R) = \max_{0\le\rho\le 1}\big[E_0^{\mathrm{CM}}(\rho)-\rho R\big].$$

Three points:

  1. The BICM Gallager function averages over $\ell$ (the outer $\tfrac{1}{L}\sum_\ell$) because the interleaver uniformly randomises which bit position each coded bit lands in. The average sits outside the integral over $y$ and outside the $(1+\rho)$-power — a consequence of Jensen-type bounds in the derivation.
  2. The CM Gallager function is a single integral over $y$ with the $(1+\rho)$-power of the uniformly-averaged input — the natural Gallager formula for a symmetric $M$-ary-input discrete memoryless channel.
  3. The two functions are related by Jensen's inequality applied to the concave map $t \mapsto t^{1/(1+\rho)}$ inside the bit-channel average — which is exactly what yields the exponent-ordering theorem below.

Theorem: BICM Random-Coding Exponent $\le$ CM Exponent

For any memoryless channel, any constellation $\mathcal{X}$, and any labelling $\mu$,
$$E_r^{\mathrm{BICM}}(R) \;\le\; E_r^{\mathrm{CM}}(R) \qquad \text{for all } R \in [0, I^{\mathrm{GMI}}(s^\star)],$$
with equality at $R = 0$ and strict inequality for $R > 0$ on generic channels (i.e., whenever the labelling induces a chain-rule gap). Equivalently, the product-metric Gallager function is pointwise below the joint-metric one:
$$E_0^{\mathrm{BICM}}(\rho, s^\star) \;\le\; E_0^{\mathrm{CM}}(\rho) \qquad \forall\; \rho \in [0, 1],$$
with equality at $\rho = 0$.

The BICM decoder throws away the joint symbol likelihood, keeping only the product of marginals. The random-coding exponent measures the discrimination power of the decoder against incorrect codewords — how quickly the probability of confusing any two codewords decays with blocklength. A more informative metric discriminates better; the joint likelihood is more informative than the product metric; hence $E_r^{\mathrm{CM}} \ge E_r^{\mathrm{BICM}}$, with the gap proportional to the chain-rule residual of Ch. 5 Thm. [?ch05:thm-cm-bicm-ordering], evaluated at the Gallager tilt $\rho$ rather than at $\rho = 0$.
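The pointwise ordering $E_0^{\mathrm{BICM}}(\rho) \le E_0^{\mathrm{CM}}(\rho)$ can be checked numerically. The sketch below (an illustration with assumed parameters, not from the text) evaluates both Gallager functions at $s = 1$ for Gray-labelled 4-PAM on real AWGN, discretising the output integral on a grid.

```python
import numpy as np

# Numerical check (illustrative, s = 1): E0_BICM(rho) <= E0_CM(rho) for
# Gray-labelled 4-PAM on real AWGN.  Assumed Gray map (b1 b0):
# -3 -> 00, -1 -> 01, +1 -> 11, +3 -> 10.
X = np.array([-3.0, -1.0, 1.0, 3.0]) / np.sqrt(5.0)   # unit-energy 4-PAM
labels = [(0, 0), (0, 1), (1, 1), (1, 0)]             # Gray labelling (assumed)
sigma = 0.5                                            # noise std (assumed SNR)

y = np.linspace(-8, 8, 40001)
dy = y[1] - y[0]
p = np.exp(-(y[None, :] - X[:, None]) ** 2 / (2 * sigma**2)) \
    / np.sqrt(2 * np.pi * sigma**2)                    # p(y|x), shape (4, ngrid)

def E0_cm(rho):
    # Joint-metric Gallager function with uniform input on the 4 symbols.
    inner = (0.25 * np.sum(p ** (1 / (1 + rho)), axis=0)) ** (1 + rho)
    return -np.log(np.sum(inner) * dy)

def E0_bicm(rho):
    # Product-metric Gallager function: average over the L = 2 bit channels.
    total = 0.0
    for ell in range(2):
        acc = np.zeros_like(y)
        for b in (0, 1):
            # Bit channel W_ell(y|b): average of p(y|x) over symbols with bit b.
            idx = [i for i, lab in enumerate(labels) if lab[ell] == b]
            W = 0.5 * p[idx].sum(axis=0)
            acc += 0.5 * W ** (1 / (1 + rho))
        total += 0.5 * np.sum(acc ** (1 + rho)) * dy
    return -np.log(total)

for rho in (0.25, 0.5, 1.0):
    print(f"rho={rho}: E0_CM={E0_cm(rho):.4f}, E0_BICM={E0_bicm(rho):.4f}")
```

Both functions evaluate to zero at $\rho = 0$, and the BICM value stays below the CM value at every $\rho > 0$, matching the theorem.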


Figure: $E_r(R)$ for CM vs BICM. Plot of the random-coding error exponent $E_r(R)$ vs rate $R$ for $M$-QAM at fixed SNR, comparing the CM exponent (joint-metric, matched decoder) with the BICM exponent (product-metric, mismatched decoder under Gray labelling). Both curves are positive for $R$ below capacity and decay to zero at capacity. The BICM curve lies below the CM curve at every $R > 0$, illustrating Thm. "BICM Random-Coding Exponent $\le$ CM Exponent". The gap between the two curves is the exponent-level cost of the mismatched metric; for Gray at practical SNRs it is small (a factor $\lesssim 1.2$), making BICM operationally competitive with CM despite the information-theoretic mismatch. The cutoff rate $R_0 = E_0(1)$ is marked on both curves.

Theorem: Exponent Gap Is Small Under Gray Labelling

For square $M$-QAM with Gray labelling on AWGN at signal-to-noise ratio $\mathrm{SNR}$, the exponent ratio at rate $R$ satisfies
$$\frac{E_r^{\mathrm{BICM}}(R)}{E_r^{\mathrm{CM}}(R)} \;\ge\; 1 - O(1/\mathrm{SNR}),$$
i.e., the exponent gap vanishes at high SNR. At moderate SNR (0–20 dB) the ratio is $\gtrsim 0.85$ for $M = 16$ and $\gtrsim 0.75$ for $M = 64$ across the range $R \in [0, I^{\mathrm{BICM}}]$. Numerically, this translates to a required blocklength ratio $N_{\mathrm{BICM}} / N_{\mathrm{CM}}$ below 1.3 to achieve the same $P_e$ at the same rate — a modest practical cost.

Under Gray the per-position bit channels are nearly symmetric Gaussian at high SNR; the product bit metric then coincides with the symbol log-likelihood up to a constant (the same argument as for $s^\star \to 1$ in §3). The Gallager-function ratio approaches 1, and the exponents coincide. At lower SNR the gap is $O(1/\mathrm{SNR})$ — small but real — and its operational consequence is the $\lesssim 1.3\times$ blocklength penalty.


Example: Cutoff Rate of BICM-QPSK at 3 dB

Compute the cutoff rate $R_0^{\mathrm{BICM}} = E_0^{\mathrm{BICM}}(\rho = 1, s = 1)$ for QPSK with Gray labelling at $\mathrm{SNR} = 3$ dB on AWGN. Compare with the BICM capacity $C_{\mathrm{BICM}}$ at the same SNR.
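One way to carry out this computation (a sketch under the assumption that Gray-QPSK on AWGN decomposes into two identical BPSK bit channels, one per quadrature, each seeing per-bit SNR equal to $E_s/N_0$):

```python
import numpy as np

# Sketch (assumptions as stated in the lead-in): per-bit-channel cutoff rate
# R0 = E0(rho=1, s=1) and BPSK bit-channel capacity at Es/N0 = 3 dB.
# Multiply per-bit-channel values by L = 2 for per-symbol units.
snr = 10 ** (3 / 10)                      # Es/N0 = 3 dB, linear
a, sigma = 1.0, np.sqrt(1.0 / snr)        # BPSK amplitude and noise std

y = np.linspace(-10, 10, 100001)
dy = y[1] - y[0]
g = lambda m: np.exp(-(y - m) ** 2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)
p0, p1 = g(+a), g(-a)                     # bit-channel laws W(y|0), W(y|1)

# Gallager E0(1) = -log ∫ (0.5*(sqrt(p0) + sqrt(p1)))^2 dy   (nats)
E0_1 = -np.log(np.sum((0.5 * (np.sqrt(p0) + np.sqrt(p1))) ** 2) * dy)
R0_bits = E0_1 / np.log(2)

# Cross-check: closed form E0(1) = -log((1 + B)/2), with Bhattacharyya
# parameter B = exp(-a^2 / (2 sigma^2)) for BPSK on AWGN.
B = np.exp(-a**2 / (2 * sigma**2))

# BPSK bit-channel capacity I(B;Y) with uniform bits (bits per channel use)
C_bits = np.sum(p0 * np.log2(2 * p0 / (p0 + p1))) * dy

print(f"R0 ≈ {R0_bits:.3f} bits/bit-channel, C ≈ {C_bits:.3f} bits/bit-channel")
```

The closed form $-\log\big((1+B)/2\big)$ matches the numerical integral, and the cutoff rate lands below the bit-channel capacity, as it must.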


Numerical Exponent Comparison for 16-QAM at 8 dB

| Rate $R$ (bits/symbol) | $E_r^{\mathrm{CM}}(R)$ | $E_r^{\mathrm{BICM}}(R)$ | Ratio |
| --- | --- | --- | --- |
| $0.5 \cdot I^{\mathrm{BICM}}$ | $0.22$ nats | $0.20$ nats | $0.91$ |
| $0.7 \cdot I^{\mathrm{BICM}}$ | $0.10$ nats | $0.088$ nats | $0.88$ |
| $0.9 \cdot I^{\mathrm{BICM}}$ | $0.020$ nats | $0.017$ nats | $0.85$ |
| $R_0^{\mathrm{BICM}}$ ($\rho = 1$) | $0.48$ nats | $0.41$ nats | $0.85$ |
| $R = I^{\mathrm{BICM}}$ | $0.0015$ nats (small $+$) | $0$ (vanishes) | — |

What the Exponent Ratio Means Operationally

A ratio $E_r^{\mathrm{BICM}}/E_r^{\mathrm{CM}} = 0.85$ at a given rate has a direct operational reading: to achieve the same target error probability $P_e$ on both schemes, the BICM blocklength must satisfy $N_{\mathrm{BICM}} = N_{\mathrm{CM}} \cdot (E_r^{\mathrm{CM}}/E_r^{\mathrm{BICM}}) \approx 1.18 \cdot N_{\mathrm{CM}}$. That is, BICM needs 18% more coded bits to reach the same $P_e$ at the same rate — a moderate but non-zero practical cost.
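The conversion from exponent ratio to blocklength ratio follows directly from the bound $\bar P_e \le e^{-N E_r(R)}$: fixing the target $P_e$ gives $N \approx \ln(1/P_e)/E_r$. A minimal sketch with illustrative exponent values (assumed for the example, not taken from the table above):

```python
import math

def required_blocklength(pe_target, exponent_nats):
    # Solve pe_target = exp(-N * Er) for N (random-coding bound, Er in nats).
    return math.log(1.0 / pe_target) / exponent_nats

# Illustrative exponents with ratio 0.85 (assumed values)
Er_cm, Er_bicm = 0.20, 0.17
pe = 1e-5
n_cm = required_blocklength(pe, Er_cm)
n_bicm = required_blocklength(pe, Er_bicm)
print(f"N_CM ≈ {n_cm:.0f}, N_BICM ≈ {n_bicm:.0f}, ratio ≈ {n_bicm / n_cm:.2f}")
```

The target $P_e$ cancels in the ratio: $N_{\mathrm{BICM}}/N_{\mathrm{CM}} = E_r^{\mathrm{CM}}/E_r^{\mathrm{BICM}} = 1/0.85 \approx 1.18$, regardless of the error-probability target.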

For rate-matched LDPC in 5G NR (blocklengths of a few thousand bits), 18% is around 400–600 extra bits — easily absorbed by the standard's rate-matching machinery. This is why the modularity of BICM is worth the exponent gap: a small blocklength inflation in exchange for a single LDPC code that drives every QAM order. Under SP labelling the ratio drops to $\sim 0.5$–$0.6$, which would require doubling the blocklength — enough to make SP-BICM a non-starter for standards.

Common Mistake: Exponent Gap Is Not the Same as Capacity Gap

Mistake:

Assuming that because the BICM capacity gap is only $\sim 0.05$ bits at 8 dB for Gray 16-QAM, the exponent gap must be similarly small (say, 0.05 nats or less).

Correction:

The capacity gap and the exponent gap are different physical quantities. The capacity gap measures the highest rate a given decoder can reach; the exponent gap measures how fast the error probability decays at a given sub-capacity rate. These two gaps are loosely correlated but not equal.

Numerically, for Gray 16-QAM at 8 dB:

  • Capacity gap: $C_{\mathrm{CM}} - C_{\mathrm{BICM}} \approx 0.05$ bits ($\approx 3\%$ of the CM capacity).
  • Exponent gap at $R = 0.5 \cdot C_{\mathrm{BICM}}$: $\approx 0.02$ nats ($\approx 9\%$ of the CM exponent).

The percentage exponent gap is larger than the percentage capacity gap — the exponent is a finer measure of decoder quality. Both gaps tell the same qualitative story (small under Gray, large under SP), but the numerical conversion is not direct.

Historical Note: Gallager 1968: The Random-Coding Exponent

1965–1968

Robert Gallager's 1968 treatise Information Theory and Reliable Communication (Wiley, 1968) is the canonical reference for the random-coding error exponent. Building on Shannon's 1948 achievability argument and Fano's 1961 "moments of error" refinement, Gallager introduced the parameterised $E_0(\rho, q)$ function and the Lagrangian $E_r(R) = \max_\rho[E_0(\rho, q) - \rho R]$ that every subsequent analysis uses. The cutoff rate $R_0 = E_0(1)$ appeared in that text too, as the rate threshold below which the error exponent reduces to the Bhattacharyya exponent — an operationally tighter rate limit for sequential decoders.

Applying the Gallager exponent to BICM under the mismatched product metric is the content of §4 of the 2008 Guillén-Martínez-Caire monograph. It is a striking instance of how a 40-year-old tool from matched-decoding theory lifts cleanly to the mismatched case with only a notational change in the $E_0$ integrand — a testament to the generality of Gallager's framework. Beyond BICM, the mismatched-Gallager machinery has been applied to quantised decoders, list decoders, and probabilistic amplitude shaping, all with the same generic structure: identify the Gallager function, optimise, and read off the exponent.

Quick Check

Which of the following statements about the BICM random-coding exponent $E_r^{\mathrm{BICM}}(R)$ is true?

$E_r^{\mathrm{BICM}}(R) \le E_r^{\mathrm{CM}}(R)$ for all $R \le I^{\mathrm{BICM}}$, with equality at $R = 0$

$E_r^{\mathrm{BICM}}(R) = E_r^{\mathrm{CM}}(R)$ everywhere for Gray labelling

$E_r^{\mathrm{BICM}}(R) > E_r^{\mathrm{CM}}(R)$ because BICM is simpler to decode

$E_r^{\mathrm{BICM}}(R)$ is not defined because BICM has no random-coding argument

Gallager Function $E_0(\rho)$

The function $E_0(\rho, q) = -\log \int_y \big(\sum_x q(x)\, p(y\mid x)^{1/(1+\rho)}\big)^{1+\rho}\, dy$ associated with a channel $p(y\mid x)$ and input distribution $q$. Concave in $\rho \in [0,1]$ with $E_0(0) = 0$ and $E_0'(0) = I(X; Y)$. The random-coding exponent is $E_r(R) = \max_{0\le\rho\le 1}[E_0(\rho) - \rho R]$. For BICM, $E_0$ specialises to a product-metric form over the $L$ bit channels.

Related: Random-Coding Error Exponent $E_r(R)$, Gallager Cutoff Rate $R_0$, Mismatched Maximum-Metric Decoding

Random-Coding Error Exponent $E_r(R)$

The rate of exponential decay of the ensemble-average error probability of a random code at rate $R$: $\bar P_e \le e^{-N E_r(R)}$. Positive for $R < C$; zero at $R = C$. Obtained from the Gallager function via the Lagrangian $E_r(R) = \max_\rho[E_0(\rho) - \rho R]$.

Related: Gallager Function $E_0(\rho)$, Gallager Cutoff Rate $R_0$