BICM as Mismatched Decoding

The Decoder BICM Actually Uses Is Wrong

If you wrote down, from scratch, the optimal maximum-likelihood decoder for the BICM channel, you would sum over all $M = 2^L$ constellation symbols compatible with a given binary codeword label — a joint symbol likelihood that couples the $L$ bits of every symbol through the Euclidean geometry of the constellation. No real receiver does this. What every 5G NR, Wi-Fi, and DVB-S2 BICM receiver actually runs is: compute $L$ soft per-bit LLRs from the demapper, de-interleave, and feed a standard binary decoder that sums the per-bit LLRs (i.e., multiplies the per-bit likelihoods). That is the product bit metric — and it is a mismatched metric: it is not the true likelihood of the codeword given $\mathbf{y}$.

The point of this section — and the central contribution of the 2008 Foundations & Trends monograph of Guillén i Fàbregas, Martínez, and Caire — is to take this observation seriously. Mismatched decoding has been studied since Merhav-Kaplan-Lapidoth-Shamai 1994: the rate achievable by a random code when the decoder uses a metric $q$ instead of the true likelihood $p$ is not the mutual information $I(Y;X)$, but the generalised mutual information $I^{\mathrm{GMI}}(s)$, parameterised by a decoder scaling $s$. What Guillén-Martínez-Caire showed is that this classical framework is exactly the right tool for analysing BICM.

Under their reframing, three previously fuzzy aspects of BICM become crystal-clear: (i) the Caire-Taricco-Biglieri capacity formula $\sum_\ell C_\ell$ is exactly the GMI at $s = 1$ for the product bit metric, turning a heuristic formula into a rigorous GMI expression; (ii) the decoder scaling $s$ is a tunable knob — when the demapper's LLRs are suboptimally computed (e.g., max-log, quantised, mismatched to the noise variance), a scaling $s \ne 1$ recovers part of the loss; (iii) the error-exponent analysis for BICM lifts cleanly from Gallager's $E_0(\rho)$ applied to the product metric — the topic of §4.

This section develops the mismatched-decoding framework, derives the BICM product metric as the specific mismatched metric that BICM uses, and introduces the GMI in enough detail to build the main achievability theorem of §3. The CommIT contribution is the 2008 monograph that put all of this on a unified rigorous footing.


Definition:

Mismatched Maximum-Metric Decoding

Let $\mathcal{C} \subset \mathcal{X}^N$ be a codebook of blocklength $N$ on a memoryless channel $p(y\mid x)$ with output $\mathbf{y} \in \mathcal{Y}^N$. A mismatched maximum-metric decoder with metric $q: \mathcal{Y} \times \mathcal{X} \to \mathbb{R}_{\ge 0}$ selects the codeword that maximises the product metric across symbols: $$\hat{\mathbf{x}} \;=\; \arg\max_{\mathbf{x} \in \mathcal{C}} \prod_{n=1}^{N} q(y_n, x_n) \;=\; \arg\max_{\mathbf{x} \in \mathcal{C}} \sum_{n=1}^{N} \log q(y_n, x_n).$$ The decoder is matched if $q(y, x) = p(y\mid x)$ (up to a codeword-independent normalisation); mismatched otherwise.
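To make the definition concrete, here is a minimal sketch (a toy discrete channel and codebook of our own construction, not an example from the monograph) showing that the same additive maximum-metric rule can pick different codewords under the true likelihood $p$ and a mismatched metric $q$:

```python
import numpy as np

# Toy memoryless channel: inputs {0,1}, outputs {0,1,2}.
# Row x of each matrix gives the metric value for (y, x), y = 0, 1, 2.
p = np.array([[0.6, 0.3, 0.1],     # true law p(y|x=0)
              [0.1, 0.2, 0.7]])    # true law p(y|x=1)
q = np.array([[0.6, 0.3, 0.1],     # receiver's metric q(y, x=0)
              [0.1, 0.45, 0.45]])  # q(y, x=1): overweights y = 1 for x = 1

codebook = np.array([[0, 0, 0],    # codeword 0
                     [1, 1, 1],    # codeword 1
                     [0, 1, 0],    # codeword 2
                     [1, 0, 1]])   # codeword 3

def decode(y_seq, metric):
    """Maximum-metric rule: argmax over codewords of sum_n log metric(y_n, x_n)."""
    scores = [np.sum(np.log(metric[cw, y_seq])) for cw in codebook]
    return int(np.argmax(scores))

y = np.array([1, 1, 0])
matched = decode(y, p)       # true likelihood  -> codeword 0
mismatched = decode(y, q)    # same additive rule, wrong metric -> codeword 2
```

Both decoders run the identical argmax-of-sums machinery; only the per-symbol metric table differs, and that is enough to flip the decision on some outputs. That is the sense in which mismatch costs rate.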

The mismatch metric $q$ is a design choice of the receiver — generally chosen for implementation convenience, not for information-theoretic optimality. Operational questions:

  • What rate is reliably achievable by a random code with a given mismatched decoder?
  • How far is this rate below the matched-decoder capacity $I(Y;X)$?
  • Is there a free scaling of $q$ that can improve the achievable rate?

Mismatched decoding is not a theoretical pathology. In BICM, as we will see below, the receiver knows the true channel law $p(y\mid x)$ but chooses to use the product bit metric because it decouples the demapper from the binary decoder — a massive engineering win at a small information-theoretic cost. The same is true for bit-wise LDPC decoders in 5G NR, sphere decoders under early termination, and max-log receivers in practice.


Definition:

The BICM Product Bit Metric

For the BICM system of Chapter 5, let $\mu: \{0,1\}^L \to \mathcal{X}$ be the labelling. The BICM bit metric (or product bit metric) is $$q_{\rm BICM}(y, x) \;=\; \prod_{\ell = 0}^{L-1} p_{W_\ell}\!\big(y \mid \mu^{-1}(x)_\ell \big) \;=\; \prod_{\ell = 0}^{L-1} q_\ell(y, b_\ell), \qquad b_\ell = \mu^{-1}(x)_\ell,$$ where each per-bit factor $q_\ell(y, b_\ell)$ is the marginal per-position transition law $p_{W_\ell}(y\mid b) = \tfrac{2}{M}\sum_{x'\in \mathcal{X}_\ell^{(b)}} p(y\mid x')$ from Def. (BICM as $L$ Parallel Binary Channels). Writing this in LLR form: if $\lambda_\ell(y) = \log \frac{p_{W_\ell}(y\mid 0)}{p_{W_\ell}(y\mid 1)}$ is the per-position LLR, then $$\log q_{\rm BICM}(y, x) \;=\; -\sum_{\ell: b_\ell = 1} \lambda_\ell(y) \;+\; \mathrm{const}(y).$$ The decoder adds these LLRs across all $N$ received symbols and selects the codeword with the largest total. This is exactly what a standard binary decoder does downstream of the demapper.
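The LLR decomposition above can be checked numerically. The sketch below uses an assumed toy setting (Gray-labelled 4-PAM on real AWGN with noise variance $\sigma^2 = 1$, our own parameters): it builds the marginal bit laws $p_{W_\ell}(y\mid b)$ and confirms that $\log q_{\rm BICM}(y,x)$ equals $-\sum_{\ell: b_\ell = 1}\lambda_\ell(y)$ plus a codeword-independent constant:

```python
import numpy as np

# Gray-labelled 4-PAM (L = 2 label bits): symbol -> (b0, b1).
labels = {-3: (0, 0), -1: (0, 1), +1: (1, 1), +3: (1, 0)}
points = sorted(labels)
sigma2 = 1.0          # assumed noise variance of the real AWGN channel
L = 2

def p_y_given_x(y, x):
    return np.exp(-(y - x) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

def p_bit(y, l, b):
    """Marginal per-position law p_{W_l}(y|b): uniform average over X_l^(b)."""
    return np.mean([p_y_given_x(y, x) for x in points if labels[x][l] == b])

def llr(y, l):
    """Per-position LLR lambda_l(y) = log p_{W_l}(y|0) / p_{W_l}(y|1)."""
    return np.log(p_bit(y, l, 0) / p_bit(y, l, 1))

def log_q_bicm(y, x):
    """Log of the product bit metric for symbol x."""
    return sum(np.log(p_bit(y, l, labels[x][l])) for l in range(L))

# Check log q_BICM(y,x) = -sum_{l: b_l = 1} lambda_l(y) + const(y),
# where const(y) = sum_l log p_{W_l}(y|0) does not depend on x.
y = 0.7
const = sum(np.log(p_bit(y, l, 0)) for l in range(L))
ok = all(
    np.isclose(log_q_bicm(y, x),
               const - sum(llr(y, l) for l in range(L) if labels[x][l] == 1))
    for x in points)
```

The identity holds for every symbol and every $y$, which is why the downstream binary decoder can work entirely from the $L$ LLRs without ever seeing the constellation.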

This metric is mismatched because the true symbol likelihood is $p(y\mid x) \ne \prod_\ell p_{W_\ell}(y\mid b_\ell)$ whenever the labelling induces dependencies among the bits conditional on $y$ — i.e., whenever $C_{\rm CM} > C_{\rm BICM}$, which is generic for $M \ge 16$.

Three practical subtleties to note:

  1. The LLR formula above is the exact marginal demapping; max-log receivers use $\lambda_\ell \approx \tfrac{1}{\sigma^2}\big(\min_{x' \in \mathcal{X}_\ell^{(1)}} \|y - x'\|^2 - \min_{x' \in \mathcal{X}_\ell^{(0)}} \|y - x'\|^2\big)$, which is a further approximation (an additional source of mismatch, studied in §5).
  2. The constant $\mathrm{const}(y)$ cancels in all codeword comparisons — it does not affect the decoder's decision.
  3. Calling $q_{\rm BICM}$ "mismatched" is a statement about the receiver's modelling of the channel. The transmitter has not done anything wrong; it is the decoder that has chosen a suboptimal metric in exchange for architectural simplicity.
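Subtlety 1 can be made concrete in a few lines. The sketch below is a self-contained toy (Gray-labelled 4-PAM on real AWGN with an assumed noise variance $\sigma^2 = 0.25$, our own parameters): it compares the exact marginal LLR against its max-log surrogate over a grid of channel outputs. The residual difference is the extra mismatch the max-log demapper introduces on top of the product metric:

```python
import numpy as np

# Gray-labelled 4-PAM: symbol -> (b0, b1); real AWGN with variance sigma^2.
labels = {-3: (0, 0), -1: (0, 1), +1: (1, 1), +3: (1, 0)}
sigma2 = 0.25         # assumed operating point; max-log tightens as SNR grows

def exact_llr(y, l):
    """Exact marginal LLR: log-ratio of summed Gaussian likelihoods."""
    num = sum(np.exp(-(y - x) ** 2 / (2 * sigma2))
              for x, b in labels.items() if b[l] == 0)
    den = sum(np.exp(-(y - x) ** 2 / (2 * sigma2))
              for x, b in labels.items() if b[l] == 1)
    return np.log(num / den)

def maxlog_llr(y, l):
    """Max-log surrogate: keep only the nearest symbol in each bit subset."""
    d0 = min((y - x) ** 2 for x, b in labels.items() if b[l] == 0)
    d1 = min((y - x) ** 2 for x, b in labels.items() if b[l] == 1)
    return (d1 - d0) / (2 * sigma2)

# Worst-case LLR error over a grid of outputs and both bit positions.
gap = max(abs(exact_llr(y, l) - maxlog_llr(y, l))
          for y in np.linspace(-4, 4, 81) for l in range(2))
```

With two symbols per bit subset, the max-log error is bounded by $\log 2$ per subset (the cost of dropping all but the dominant exponential), and the observed worst case sits just below that bound.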
🎓 CommIT Contribution (2008)

BICM as Mismatched Decoding (Generalised Mutual Information)

A. Guillén i Fàbregas, A. Martinez, G. Caire — Foundations and Trends in Communications and Information Theory, vol. 5, no. 1–2, pp. 1–153

The 153-page Foundations & Trends monograph of Albert Guillén i Fàbregas, Alfonso Martínez, and Giuseppe Caire is the definitive information-theoretic treatment of BICM. Building on the 1998 Caire-Taricco-Biglieri paper (Ch. 5), it recasts BICM as a mismatched-decoding problem and puts the entire analysis of capacity, error exponents, and PEP on the footing of Gallager's 1968 random-coding machinery applied to the product bit metric. Its technical contributions are four-fold and are the backbone of Chapters 7–9 of this book.

(i) BICM is a mismatched decoder; the natural rate is the GMI. The BICM receiver uses the product bit metric $q_{\rm BICM}(y, x) = \prod_\ell p_{W_\ell}(y \mid b_\ell)$ in place of the true symbol likelihood $p(y \mid x)$. Mismatched decoding has been studied since Merhav-Kaplan-Lapidoth-Shamai 1994 — the achievable rate is not $I(Y;X)$ but the generalised mutual information $I^{\mathrm{GMI}}(s)$, a function of the decoder scaling $s > 0$. Guillén-Martínez-Caire show that the Caire-Taricco-Biglieri capacity $\sum_\ell C_\ell$ is exactly $I^{\mathrm{GMI}}(1)$ for the product metric — elevating the 1998 formula from a heuristic derivation to a rigorous mismatched-decoding rate theorem.

(ii) The decoder scaling $s$ is a tunable knob. For Gray labelling on $M$-QAM at high SNR the optimal $s^\star \to 1$ — the BICM capacity is the GMI at $s = 1$ and no scaling helps. At low SNR, or with non-Gray labellings, $s^\star \ne 1$, and the largest achievable rate $\sup_{s > 0} I^{\mathrm{GMI}}(s)$ exceeds the naive BICM capacity $I^{\mathrm{GMI}}(1)$ by a measurable margin. This explains why, at very low SNR, one observes BICM rates slightly above the CTB formula.
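The effect of the scaling knob is easy to reproduce in a toy setting. The sketch below is our own assumed example, not one from the monograph: BPSK on real AWGN with true noise variance $1.0$, decoded with a Gaussian metric that wrongly assumes variance $0.25$. It estimates $I^{\mathrm{GMI}}(s)$ by Monte Carlo; because $q^s$ rescales the Gaussian exponent, the scaling $s^\star = 0.25$ exactly undoes the variance mismatch and recovers the matched rate, while $s = 1$ (the naive choice) loses rate:

```python
import numpy as np

rng = np.random.default_rng(2)

sigma2_true, sigma2_dec = 1.0, 0.25   # channel variance vs decoder's assumption
n = 200_000
x = rng.choice([-1.0, 1.0], size=n)   # uniform BPSK letters
y = x + np.sqrt(sigma2_true) * rng.normal(size=n)

def gmi(s):
    """Monte Carlo GMI (bits): E[ log2( q(Y,X)^s / E_Xbar q(Y,Xbar)^s ) ]."""
    q_tx = np.exp(-s * (y - x) ** 2 / (2 * sigma2_dec))
    q_avg = 0.5 * (np.exp(-s * (y - 1) ** 2 / (2 * sigma2_dec))
                   + np.exp(-s * (y + 1) ** 2 / (2 * sigma2_dec)))
    return np.mean(np.log2(q_tx / q_avg))

# Sweep the decoder scaling; s* = sigma2_dec / sigma2_true = 0.25 is matched.
rates = {s: gmi(s) for s in (0.1, 0.25, 0.5, 1.0)}
```

Here the scaling tap fully repairs the mismatch because the metric family is an exponential rescaling of the true likelihood; for the BICM product metric the repair is only partial, which is why $\sup_s I^{\mathrm{GMI}}(s)$ generally stays below $C_{\rm CM}$.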

(iii) BICM error exponent via Gallager's product-metric $E_0(\rho)$. Applying Gallager's random-coding bound to the product bit metric gives the BICM random-coding exponent $E_r^{\mathrm{BICM}}(R) = \max_\rho \big[E_0^{\mathrm{BICM}}(\rho) - \rho R\big]$ in closed form. A direct computation shows $E_r^{\mathrm{BICM}}(R) \le E_r^{\mathrm{CM}}(R)$ at every rate, with equality at $R = 0$, and the gap under Gray labelling is small over the operational range. This is the central result of §4 of the monograph and §4 of this chapter.

(iv) Extensions to fading, block fading, and MIMO. Chapters 6–8 of the monograph extend the GMI framework to outage analysis on block-fading channels (the BICM outage capacity reduces to a per-symbol GMI computation), to MIMO BICM (where the bit metric becomes a marginalised log-likelihood over a joint space), and to BICM-ID (with decoder feedback, treated in Ch. 8 of this book). The GMI framework scales; the Caire-Taricco-Biglieri formula does not, except through the GMI reinterpretation.

Why it redefined the theory of BICM. Before 2008, the BICM capacity formula $\sum_\ell I(Y; B_\ell)$ was known to be achievable by the parallel-channel argument of Chapter 5, but its information-theoretic converse — the statement that no mismatched decoder using the product metric can exceed it — was folkloric. Guillén-Martínez-Caire supplied the converse via the mismatched-decoding GMI, closed the capacity gap, and gave the unified framework under which all subsequent BICM papers (on error exponents, labelling design, iterative decoding, and standards analysis) have been written. This monograph is the single most authoritative reference on BICM and is the central CommIT contribution of Chapter 7 of this book.


CM MI, BICM GMI, and Shannon Capacity vs SNR

Three rate curves for $M$-QAM: (i) the unconstrained Shannon capacity $\log_2(1 + \mathrm{SNR})$; (ii) the CM capacity $C_{\rm CM} = I(Y; X)$ (the matched ML decoder on the constellation); and (iii) the BICM GMI at $s = 1$, which coincides with the Caire-Taricco-Biglieri capacity $\sum_\ell C_\ell$. The three curves converge at low SNR, where Shannon is tight; they separate at moderate SNR by $\lesssim 0.3$ dB (Gray QAM); they saturate at $\log_2 M$ at high SNR (the constellation-constrained ceiling). The BICM-to-CM gap is the mismatched-decoding penalty; the CM-to-Shannon gap is the shaping gap closed in Chapter 9 by probabilistic amplitude shaping.


Theorem: The BICM Product Metric Is Strictly Mismatched for $M \ge 16$

For any memoryless channel $p(y\mid x)$ with $M \ge 16$ constellation points $\mathcal{X}$ and any labelling $\mu$, the BICM product bit metric $q_{\rm BICM}(y, x) = \prod_\ell p_{W_\ell}(y\mid b_\ell)$ is strictly different from the true symbol likelihood $p(y\mid x)$ on a set of pairs $(x, y)$ of positive probability. Consequently, $q_{\rm BICM}$ is a strictly mismatched metric in the sense of Def. (Mismatched Maximum-Metric Decoding), and its associated achievable rate (computed rigorously in §3) is strictly less than $C_{\rm CM}$ for generic channels.

Equivalently, the chain-rule gap $$C_{\rm CM} - \sum_{\ell=0}^{L-1} I(Y; B_\ell) \;=\; \sum_{\ell = 1}^{L-1} I(B_\ell; B_{<\ell}\mid Y) \;>\; 0$$ is strictly positive unless the labelling induces conditional independence of the label bits given $Y$, which fails for every $M \ge 16$ on AWGN.

Two different constellation points can produce similar $y$'s only by accident of Euclidean geometry — that is, by having several label bits in common. Such coincidences are recorded in the joint $p(y\mid x)$ but lost in the product $\prod_\ell p_{W_\ell}(y\mid b_\ell)$, because the product factor $p_{W_\ell}$ has already averaged out the other bits. The mismatch is the price of that averaging.


Example: 16-QAM at 10 dB: The Size of the Mismatch

For 16-QAM with Gray labelling at $\mathrm{SNR} = 10$ dB on AWGN, compute (numerically) the CM capacity $C_{\rm CM}$, the BICM GMI at $s = 1$ ($= \sum_\ell C_\ell$), and the mismatch gap in bits and in dB of SNR-equivalent.
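One way to carry out this computation is the Monte Carlo sketch below (our own implementation, with an assumed unit-energy Gray 16-QAM mapping and $2 \times 10^5$ samples; the dB-equivalent of the gap would be read off a rate-vs-SNR curve, which is omitted here). It estimates $C_{\rm CM}$ from the symbol likelihoods and $\sum_\ell C_\ell$ from the marginal bit laws $p_{W_\ell}(y\mid b)$:

```python
import numpy as np

rng = np.random.default_rng(7)

# Unit-energy Gray 16-QAM: per-rail 4-PAM levels and Gray labels per rail.
pam = np.array([-3.0, -1.0, 1.0, 3.0]) / np.sqrt(10.0)
gray = [(0, 0), (0, 1), (1, 1), (1, 0)]
const = np.array([pam[i] + 1j * pam[q] for i in range(4) for q in range(4)])
labels = np.array([gray[i] + gray[q] for i in range(4) for q in range(4)])

M, L, n = 16, 4, 200_000
N0 = 10 ** (-10 / 10)            # SNR = 10 dB with E_s = 1

idx = rng.integers(M, size=n)
y = const[idx] + np.sqrt(N0 / 2) * (rng.normal(size=n) + 1j * rng.normal(size=n))

# Unnormalised symbol likelihoods p(y|x') for all 16 candidates: shape (n, 16).
lik = np.exp(-np.abs(y[:, None] - const[None, :]) ** 2 / N0)

# CM capacity C_CM = I(Y;X), Monte Carlo form (normalisations cancel).
c_cm = np.mean(np.log2(lik[np.arange(n), idx] / lik.mean(axis=1)))

# BICM capacity sum_l C_l via the marginal bit laws p_{W_l}(y|b).
c_bicm = 0.0
for l in range(L):
    is1 = labels[:, l] == 1
    p0, p1 = lik[:, ~is1].mean(axis=1), lik[:, is1].mean(axis=1)
    p_tx = np.where(labels[idx, l] == 1, p1, p0)
    c_bicm += np.mean(np.log2(p_tx / (0.5 * (p0 + p1))))

gap_bits = c_cm - c_bicm         # the mismatch penalty at s = 1
```

Because both estimates share the same channel samples, the common randomness largely cancels in `gap_bits`, so the small mismatch penalty is resolved far more accurately than either rate alone.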

⚠️ Engineering Note

The Product Bit Metric in 5G NR LDPC Decoding

Every 5G NR user-plane receiver is a BICM mismatched decoder. The constellation demapper produces one approximate LLR per coded bit (often max-log, often quantised to 5–6 bits), and the LDPC decoder treats these LLRs as independent per-bit observations of a binary channel. The decoder has no access to the joint symbol likelihood; it only sees the product-metric sum of per-bit LLRs. Hence, from §2's perspective, every 5G NR uplink/downlink is operating with the mismatched product bit metric $q_{\rm BICM}$ that this chapter analyses.

The information-theoretic consequence is that the achievable rate is $I^{\mathrm{GMI}}(s)$, not $C_{\rm CM}$. For the QPSK–1024-QAM range used by 5G NR with Gray labelling, the gap $C_{\rm CM} - \sup_s I^{\mathrm{GMI}}(s)$ ranges from $0$ (QPSK, exactly matched) to $\approx 0.1$ bits/symbol (1024-QAM at moderate SNR) — this is the fundamental limit of a 5G NR receiver architecture, below which 5G cannot go without abandoning BICM. The remaining gap to Shannon is covered by probabilistic amplitude shaping, which is being standardised for beyond-5G in the 3GPP R18 evaluations (see also Ch. 9).

Practical Constraints
  • LDPC decoder input is per-bit LLR, not per-symbol log-likelihood

  • Max-log demapper introduces an additional small mismatch on top of the product-metric mismatch

  • BICM-to-CM gap bounded by $\sim 0.1$ bits for 1024-QAM with Gray

  • GMI scaling $s \ne 1$ could, in principle, be implemented by an LLR-scaling tap; not done in current silicon

📋 Ref: 3GPP TS 38.212, Rel-17

Common Mistake: GMI Is a Lower Bound on Achievable Rate, Not the Ultimate Capacity

Mistake:

Concluding that $\sup_s I^{\mathrm{GMI}}(s)$ is the capacity of the mismatched BICM channel — i.e., that no code on the BICM channel could achieve a rate higher than $\sup_s I^{\mathrm{GMI}}(s)$.

Correction:

The GMI $\sup_s I^{\mathrm{GMI}}(s)$ is the largest rate achievable by a random code with i.i.d. uniform codeword letters under the given mismatched metric. It is an achievability result (a lower bound on the mismatched capacity). The true mismatched capacity — the largest rate achievable by any code — may be higher: structured codes (with non-i.i.d. letter distributions, or with constant-composition codewords) can in principle achieve rates above the GMI, up to the mismatch capacity studied by Csiszár and Narayan.

For BICM with Gray labelling on AWGN this gap is negligible in practice — the GMI and the Csiszár-Narayan capacity coincide to far more decimal places than any simulation can resolve — so the BICM community treats "BICM capacity $= \sup_s I^{\mathrm{GMI}}(s)$" as an equality. But be aware that it is formally a lower bound. See Csiszár-Körner §10.3 and Ganti-Lapidoth-Telatar (1999) for the more refined mismatched-capacity theory.


Quick Check

The "mismatch" in BICM's mismatched decoding refers to

the BICM decoder using the product bit metric $\prod_\ell p_{W_\ell}(y\mid b_\ell)$ instead of the true symbol likelihood $p(y\mid x)$

the receiver using the wrong noise variance $\sigma^2$ in the LLR formula

the binary decoder using the wrong code rate

the interleaver not being long enough, so successive bits remain correlated

Mismatched Decoding

A decoding framework in which the receiver's maximum-metric rule uses a metric $q(y, x)$ that differs from the true channel likelihood $p(y\mid x)$. Studied since Merhav-Kaplan-Lapidoth-Shamai (1994); the achievable rate with a random code is the generalised mutual information $I^{\mathrm{GMI}}(s)$, parameterised by a scaling $s > 0$.

Related: Generalised Mutual Information, The BICM Product Bit Metric, Random Coding

Product Bit Metric (BICM)

The decoding metric $q_{\rm BICM}(y, x) = \prod_{\ell=0}^{L-1} p_{W_\ell}(y\mid b_\ell)$ used by every BICM receiver. It replaces the joint symbol likelihood by a product of per-position marginals, a mismatched replacement that enables bit-wise decoding at a small information-theoretic cost.

Related: Mismatched Maximum-Metric Decoding, Bit Channel (BICM), Consistent-Gaussian LLR Assumption

Generalised Mutual Information (GMI)

The rate $I^{\mathrm{GMI}}(s) = \mathbb{E}\left[\log \frac{q(Y,X)^s}{\mathbb{E}_{\bar X}[q(Y,\bar X)^s]}\right]$ achievable by a random code with i.i.d. uniform letters under a mismatched metric $q$ and decoder scaling $s$. For BICM with $q = q_{\rm BICM}$ and $s = 1$, the GMI equals the Caire-Taricco-Biglieri BICM capacity $\sum_\ell C_\ell$.

Related: Mismatched Maximum-Metric Decoding, Product Bit Metric (BICM), BICM Capacity