BICM as Mismatched Decoding

The Decoder BICM Actually Uses Is Wrong

If you wrote down, from scratch, the optimal maximum-likelihood decoder for the BICM channel, you would sum over all $M = 2^L$ constellation symbols compatible with a given binary codeword label — a joint symbol likelihood that couples the $L$ bits of every symbol through the Euclidean geometry of the constellation. No real receiver does this. What every 5G NR, Wi-Fi, and DVB-S2 BICM receiver actually runs is: compute $L$ soft per-bit LLRs from the demapper, de-interleave, and feed a standard binary decoder that sums the per-bit LLRs (i.e., multiplies the per-bit likelihoods). That is the product bit metric — and it is a mismatched metric: it is not the true likelihood of the codeword given $\mathbf{y}$.

The point of this section — and the central contribution of the 2008 Foundations & Trends monograph of Guillén i Fàbregas, Martínez, and Caire — is to take this observation seriously. Mismatched decoding has been studied since Merhav-Kaplan-Lapidoth-Shamai 1994: the rate achievable by a random code when the decoder uses a metric $q$ instead of the true likelihood $p$ is not the mutual information $I(Y;X)$, but the generalised mutual information $I^{\mathrm{GMI}}(s)$, parameterised by a decoder scaling $s$. What Guillén-Martínez-Caire showed is that this classical framework is exactly the right tool for analysing BICM.

Under their reframing, three previously fuzzy aspects of BICM become crystal-clear: (i) the Caire-Taricco-Biglieri capacity formula $\sum_\ell C_\ell$ is exactly the GMI at $s = 1$ for the product bit metric, turning a heuristic formula into a rigorous GMI expression; (ii) the decoder scaling $s$ is a tunable knob — when the demapper's LLRs are suboptimally computed (e.g., max-log, quantised, mismatched to the noise variance), a scaling $s \ne 1$ recovers part of the loss; (iii) the error-exponent analysis for BICM lifts cleanly from Gallager's $E_0(\rho)$ applied to the product metric — the topic of §4.

This section develops the mismatched-decoding framework, derives the BICM product metric as the specific mismatched metric that BICM uses, and introduces the GMI in enough detail to build the main achievability theorem of §3. The CommIT contribution is the 2008 monograph that put all of this on a unified rigorous footing.


Definition:

Mismatched Maximum-Metric Decoding

Let $\mathcal{C} \subset \mathcal{X}^N$ be a codebook of blocklength $N$ on a memoryless channel $p(y\mid x)$ with output $\mathbf{y} \in \mathcal{Y}^N$. A mismatched maximum-metric decoder with metric $q: \mathcal{Y} \times \mathcal{X} \to \mathbb{R}_{\ge 0}$ selects the codeword that maximises the product metric across symbols: $$\hat{\mathbf{x}} \;=\; \arg\max_{\mathbf{x} \in \mathcal{C}} \prod_{n=1}^{N} q(y_n, x_n) \;=\; \arg\max_{\mathbf{x} \in \mathcal{C}} \sum_{n=1}^{N} \log q(y_n, x_n).$$ The decoder is matched if $q(y, x) = p(y\mid x)$ (up to a codeword-independent normalisation); mismatched otherwise.
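To make the definition concrete, here is a minimal sketch (a toy discrete channel and codebook of our own construction, not an example from the monograph) showing that the same additive maximum-metric rule can pick different codewords under the true likelihood $p$ and a mismatched metric $q$:

```python
import numpy as np

# Toy memoryless channel: inputs {0,1}, outputs {0,1,2}.
# Row x of each matrix gives the metric value for (y, x), y = 0, 1, 2.
p = np.array([[0.6, 0.3, 0.1],     # true law p(y|x=0)
              [0.1, 0.2, 0.7]])    # true law p(y|x=1)
q = np.array([[0.6, 0.3, 0.1],     # receiver's metric q(y, x=0)
              [0.1, 0.45, 0.45]])  # q(y, x=1): overweights y = 1 for x = 1

codebook = np.array([[0, 0, 0],    # codeword 0
                     [1, 1, 1],    # codeword 1
                     [0, 1, 0],    # codeword 2
                     [1, 0, 1]])   # codeword 3

def decode(y_seq, metric):
    """Maximum-metric rule: argmax over codewords of sum_n log metric(y_n, x_n)."""
    scores = [np.sum(np.log(metric[cw, y_seq])) for cw in codebook]
    return int(np.argmax(scores))

y = np.array([1, 1, 0])
matched = decode(y, p)       # true likelihood  -> codeword 0
mismatched = decode(y, q)    # same additive rule, wrong metric -> codeword 2
```

Both decoders run the identical argmax-of-sums machinery; only the per-symbol metric table differs, and that is enough to flip the decision on some outputs. That is the sense in which mismatch costs rate.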

The mismatch metric $q$ is a design choice of the receiver — generally chosen for implementation convenience, not for information-theoretic optimality. Operational questions:

  • What rate is reliably achievable by a random code with a given mismatched decoder?
  • How far is this rate below the matched-decoder capacity $I(Y;X)$?
  • Is there a free scaling of $q$ that can improve the achievable rate?

Mismatched decoding is not a theoretical pathology. In BICM, as we will see below, the receiver knows the true channel law $p(y\mid x)$ but chooses to use the product bit metric because it decouples the demapper from the binary decoder — a massive engineering win at a small information-theoretic cost. The same is true for bit-wise LDPC decoders in 5G NR, sphere decoders under early termination, and max-log receivers in practice.


Definition:

The BICM Product Bit Metric

For the BICM system of Chapter 5, let $\mu: \{0,1\}^L \to \mathcal{X}$ be the labelling. The BICM bit metric (or product bit metric) is $$q_{\rm BICM}(y, x) \;=\; \prod_{\ell = 0}^{L-1} p_{W_\ell}\!\big(y \mid \mu^{-1}(x)_\ell \big) \;=\; \prod_{\ell = 0}^{L-1} q_\ell(y, b_\ell), \qquad b_\ell = \mu^{-1}(x)_\ell,$$ where each per-bit factor $q_\ell(y, b_\ell)$ is the marginal per-position transition law $p_{W_\ell}(y\mid b) = \tfrac{2}{M}\sum_{x'\in \mathcal{X}_\ell^{(b)}} p(y\mid x')$ from Def. (BICM as $L$ Parallel Binary Channels). Writing this in LLR form: if $\lambda_\ell(y) = \log \frac{p_{W_\ell}(y\mid 0)}{p_{W_\ell}(y\mid 1)}$ is the per-position LLR, then $$\log q_{\rm BICM}(y, x) \;=\; -\sum_{\ell: b_\ell = 1} \lambda_\ell(y) \;+\; \mathrm{const}(y).$$ The decoder adds these LLRs across all $N$ received symbols and selects the codeword with the largest total. This is exactly what a standard binary decoder does downstream of the demapper.
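The LLR decomposition above can be checked numerically. The sketch below uses an assumed toy setting (Gray-labelled 4-PAM on real AWGN with noise variance $\sigma^2 = 1$, our own parameters): it builds the marginal bit laws $p_{W_\ell}(y\mid b)$ and confirms that $\log q_{\rm BICM}(y,x)$ equals $-\sum_{\ell: b_\ell = 1}\lambda_\ell(y)$ plus a codeword-independent constant:

```python
import numpy as np

# Gray-labelled 4-PAM (L = 2 label bits): symbol -> (b0, b1).
labels = {-3: (0, 0), -1: (0, 1), +1: (1, 1), +3: (1, 0)}
points = sorted(labels)
sigma2 = 1.0          # assumed noise variance of the real AWGN channel
L = 2

def p_y_given_x(y, x):
    return np.exp(-(y - x) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

def p_bit(y, l, b):
    """Marginal per-position law p_{W_l}(y|b): uniform average over X_l^(b)."""
    return np.mean([p_y_given_x(y, x) for x in points if labels[x][l] == b])

def llr(y, l):
    """Per-position LLR lambda_l(y) = log p_{W_l}(y|0) / p_{W_l}(y|1)."""
    return np.log(p_bit(y, l, 0) / p_bit(y, l, 1))

def log_q_bicm(y, x):
    """Log of the product bit metric for symbol x."""
    return sum(np.log(p_bit(y, l, labels[x][l])) for l in range(L))

# Check log q_BICM(y,x) = -sum_{l: b_l = 1} lambda_l(y) + const(y),
# where const(y) = sum_l log p_{W_l}(y|0) does not depend on x.
y = 0.7
const = sum(np.log(p_bit(y, l, 0)) for l in range(L))
ok = all(
    np.isclose(log_q_bicm(y, x),
               const - sum(llr(y, l) for l in range(L) if labels[x][l] == 1))
    for x in points)
```

The identity holds for every symbol and every $y$, which is why the downstream binary decoder can work entirely from the $L$ LLRs without ever seeing the constellation.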

This metric is mismatched because the true symbol likelihood is $p(y\mid x) \ne \prod_\ell p_{W_\ell}(y\mid b_\ell)$ whenever the labelling induces dependencies among the bits conditional on $y$ — i.e., whenever $C_{\rm CM} > C_{\rm BICM}$, which is generic for $M \ge 16$.

Three practical subtleties to note:

  1. The LLR formula above is the exact marginal demapping; max-log receivers use $\lambda_\ell \approx \tfrac{1}{\sigma^2}\big(\min_{x' \in \mathcal{X}_\ell^{(1)}} \|y - x'\|^2 - \min_{x' \in \mathcal{X}_\ell^{(0)}} \|y - x'\|^2\big)$, which is a further approximation (an additional source of mismatch, studied in §5).
  2. The constant $\mathrm{const}(y)$ cancels in all codeword comparisons — it does not affect the decoder's decision.
  3. Calling $q_{\rm BICM}$ "mismatched" is a statement about the receiver's modelling of the channel. The transmitter has not done anything wrong; it is the decoder that has chosen a suboptimal metric in exchange for architectural simplicity.
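Subtlety 1 can be made concrete in a few lines. The sketch below is a self-contained toy (Gray-labelled 4-PAM on real AWGN with an assumed noise variance $\sigma^2 = 0.25$, our own parameters): it compares the exact marginal LLR against its max-log surrogate over a grid of channel outputs. The residual difference is the extra mismatch the max-log demapper introduces on top of the product metric:

```python
import numpy as np

# Gray-labelled 4-PAM: symbol -> (b0, b1); real AWGN with variance sigma^2.
labels = {-3: (0, 0), -1: (0, 1), +1: (1, 1), +3: (1, 0)}
sigma2 = 0.25         # assumed operating point; max-log tightens as SNR grows

def exact_llr(y, l):
    """Exact marginal LLR: log-ratio of summed Gaussian likelihoods."""
    num = sum(np.exp(-(y - x) ** 2 / (2 * sigma2))
              for x, b in labels.items() if b[l] == 0)
    den = sum(np.exp(-(y - x) ** 2 / (2 * sigma2))
              for x, b in labels.items() if b[l] == 1)
    return np.log(num / den)

def maxlog_llr(y, l):
    """Max-log surrogate: keep only the nearest symbol in each bit subset."""
    d0 = min((y - x) ** 2 for x, b in labels.items() if b[l] == 0)
    d1 = min((y - x) ** 2 for x, b in labels.items() if b[l] == 1)
    return (d1 - d0) / (2 * sigma2)

# Worst-case LLR error over a grid of outputs and both bit positions.
gap = max(abs(exact_llr(y, l) - maxlog_llr(y, l))
          for y in np.linspace(-4, 4, 81) for l in range(2))
```

With two symbols per bit subset, the max-log error is bounded by $\log 2$ per subset (the cost of dropping all but the dominant exponential), and the observed worst case sits just below that bound.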
🎓 CommIT Contribution (2008)

BICM as Mismatched Decoding (Generalised Mutual Information)

A. Guillén i Fàbregas, A. Martinez, G. Caire — Foundations and Trends in Communications and Information Theory, vol. 5, no. 1–2, pp. 1–153

The 153-page Foundations & Trends monograph of Albert Guillén i Fàbregas, Alfonso Martínez, and Giuseppe Caire is the definitive information-theoretic treatment of BICM. Building on the 1998 Caire-Taricco-Biglieri paper (Ch. 5), it recasts BICM as a mismatched-decoding problem and puts the entire analysis of capacity, error exponents, and PEP on the footing of Gallager's 1968 random-coding machinery applied to the product bit metric. Its technical contributions are four-fold and are the backbone of Chapters 7–9 of this book.

(i) BICM is a mismatched decoder; the natural rate is the GMI. The BICM receiver uses the product bit metric $q_{\rm BICM}(y, x) = \prod_\ell p_{W_\ell}(y \mid b_\ell)$ in place of the true symbol likelihood $p(y \mid x)$. Mismatched decoding has been studied since Merhav-Kaplan-Lapidoth-Shamai 1994 — the achievable rate is not $I(Y;X)$ but the generalised mutual information $I^{\mathrm{GMI}}(s)$, a function of the decoder scaling $s > 0$. Guillén-Martínez-Caire show that the Caire-Taricco-Biglieri capacity $\sum_\ell C_\ell$ is exactly $I^{\mathrm{GMI}}(1)$ for the product metric — elevating the 1998 formula from a heuristic derivation to a rigorous mismatched-decoding rate theorem.

(ii) The decoder scaling $s$ is a tunable knob. For Gray labelling on $M$-QAM at high SNR the optimal $s^\star \to 1$ — the BICM capacity is the GMI at $s = 1$ and no scaling helps. At low SNR, or with non-Gray labellings, $s^\star \ne 1$, and the largest achievable rate $\sup_{s > 0} I^{\mathrm{GMI}}(s)$ exceeds the naive BICM capacity $I^{\mathrm{GMI}}(1)$ by a measurable margin. This explains why, at very low SNR, one observes BICM rates slightly above the CTB formula.
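The effect of the scaling knob is easy to reproduce in a toy setting. The sketch below is our own assumed example, not one from the monograph: BPSK on real AWGN with true noise variance $1.0$, decoded with a Gaussian metric that wrongly assumes variance $0.25$. It estimates $I^{\mathrm{GMI}}(s)$ by Monte Carlo; because $q^s$ rescales the Gaussian exponent, the scaling $s^\star = 0.25$ exactly undoes the variance mismatch and recovers the matched rate, while $s = 1$ (the naive choice) loses rate:

```python
import numpy as np

rng = np.random.default_rng(2)

sigma2_true, sigma2_dec = 1.0, 0.25   # channel variance vs decoder's assumption
n = 200_000
x = rng.choice([-1.0, 1.0], size=n)   # uniform BPSK letters
y = x + np.sqrt(sigma2_true) * rng.normal(size=n)

def gmi(s):
    """Monte Carlo GMI (bits): E[ log2( q(Y,X)^s / E_Xbar q(Y,Xbar)^s ) ]."""
    q_tx = np.exp(-s * (y - x) ** 2 / (2 * sigma2_dec))
    q_avg = 0.5 * (np.exp(-s * (y - 1) ** 2 / (2 * sigma2_dec))
                   + np.exp(-s * (y + 1) ** 2 / (2 * sigma2_dec)))
    return np.mean(np.log2(q_tx / q_avg))

# Sweep the decoder scaling; s* = sigma2_dec / sigma2_true = 0.25 is matched.
rates = {s: gmi(s) for s in (0.1, 0.25, 0.5, 1.0)}
```

Here the scaling tap fully repairs the mismatch because the metric family is an exponential rescaling of the true likelihood; for the BICM product metric the repair is only partial, which is why $\sup_s I^{\mathrm{GMI}}(s)$ generally stays below $C_{\rm CM}$.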

(iii) BICM error exponent via Gallager's product-metric $E_0(\rho)$. Applying Gallager's random-coding bound to the product bit metric gives the BICM random-coding exponent $E_r^{\mathrm{BICM}}(R) = \max_\rho \big[E_0^{\mathrm{BICM}}(\rho) - \rho R\big]$ in closed form. A direct computation shows $E_r^{\mathrm{BICM}}(R) \le E_r^{\mathrm{CM}}(R)$ at every rate, with equality at $R = 0$, and the gap under Gray labelling is small over the operational range. This is the central result of §4 of the monograph and §4 of this chapter.

(iv) Extensions to fading, block fading, and MIMO. Chapters 6–8 of the monograph extend the GMI framework to outage analysis on block-fading channels (the BICM outage capacity reduces to a per-symbol GMI computation), to MIMO BICM (where the bit metric becomes a marginalised log-likelihood over a joint space), and to BICM-ID (with decoder feedback, treated in Ch. 8 of this book). The GMI framework scales; the Caire-Taricco-Biglieri formula does not, except through the GMI reinterpretation.

Why it redefined the theory of BICM. Before 2008, the BICM capacity formula $\sum_\ell I(Y; B_\ell)$ was known to be achievable by the parallel-channel argument of Chapter 5, but its information-theoretic converse — the statement that no mismatched decoder using the product metric can exceed it — was folkloric. Guillén-Martínez-Caire supplied the converse via the mismatched-decoding GMI, closed the capacity gap, and gave the unified framework under which all subsequent BICM papers (on error exponents, labelling design, iterative decoding, and standards analysis) have been written. This monograph is the single most authoritative reference on BICM and is the central CommIT contribution of Chapter 7 of this book.


CM MI, BICM GMI, and Shannon Capacity vs SNR

Three rate curves for $M$-QAM: (i) the unconstrained Shannon capacity $\log_2(1 + \mathrm{SNR})$; (ii) the CM capacity $C_{\rm CM} = I(Y; X)$ (the matched ML decoder on the constellation); and (iii) the BICM GMI at $s = 1$, which coincides with the Caire-Taricco-Biglieri capacity $\sum_\ell C_\ell$. The three curves converge at low SNR, where Shannon is tight; they separate at moderate SNR by $\lesssim 0.3$ dB (Gray QAM); they saturate at $\log_2 M$ at high SNR (the constellation-constrained ceiling). The BICM-to-CM gap is the mismatched-decoding penalty; the CM-to-Shannon gap is the shaping gap closed in Chapter 9 by probabilistic amplitude shaping.


Theorem: The BICM Product Metric Is Strictly Mismatched for $M \ge 16$

For any memoryless channel $p(y\mid x)$ with $M \ge 16$ constellation points $\mathcal{X}$ and any labelling $\mu$, the BICM product bit metric $q_{\rm BICM}(y, x) = \prod_\ell p_{W_\ell}(y\mid b_\ell)$ is strictly different from the true symbol likelihood $p(y\mid x)$ on a set of pairs $(x, y)$ of positive probability. Consequently, $q_{\rm BICM}$ is a strictly mismatched metric in the sense of Def. (Mismatched Maximum-Metric Decoding), and its associated achievable rate (computed rigorously in §3) is strictly less than $C_{\rm CM}$ for generic channels.

Equivalently, the chain-rule gap $$C_{\rm CM} - \sum_{\ell=0}^{L-1} I(Y; B_\ell) \;=\; \sum_{\ell = 1}^{L-1} I(B_\ell; B_{<\ell}\mid Y) \;>\; 0$$ is strictly positive unless the labelling induces conditional independence of the label bits given $Y$, which fails for every $M \ge 16$ on AWGN.

Two different constellation points can produce similar $y$'s only by accident of Euclidean geometry — that is, by having several label bits in common. Such coincidences are recorded in the joint $p(y\mid x)$ but lost in the product $\prod_\ell p_{W_\ell}(y\mid b_\ell)$, because the product factor $p_{W_\ell}$ has already averaged out the other bits. The mismatch is the price of that averaging.


Example: 16-QAM at 10 dB: The Size of the Mismatch

For 16-QAM with Gray labelling at $\mathrm{SNR} = 10$ dB on AWGN, compute (numerically) the CM capacity $C_{\rm CM}$, the BICM GMI at $s = 1$ ($= \sum_\ell C_\ell$), and the mismatch gap in bits and in dB of SNR-equivalent.
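One way to carry out this computation is the Monte Carlo sketch below (our own implementation, with an assumed unit-energy Gray 16-QAM mapping and $2 \times 10^5$ samples; the dB-equivalent of the gap would be read off a rate-vs-SNR curve, which is omitted here). It estimates $C_{\rm CM}$ from the symbol likelihoods and $\sum_\ell C_\ell$ from the marginal bit laws $p_{W_\ell}(y\mid b)$:

```python
import numpy as np

rng = np.random.default_rng(7)

# Unit-energy Gray 16-QAM: per-rail 4-PAM levels and Gray labels per rail.
pam = np.array([-3.0, -1.0, 1.0, 3.0]) / np.sqrt(10.0)
gray = [(0, 0), (0, 1), (1, 1), (1, 0)]
const = np.array([pam[i] + 1j * pam[q] for i in range(4) for q in range(4)])
labels = np.array([gray[i] + gray[q] for i in range(4) for q in range(4)])

M, L, n = 16, 4, 200_000
N0 = 10 ** (-10 / 10)            # SNR = 10 dB with E_s = 1

idx = rng.integers(M, size=n)
y = const[idx] + np.sqrt(N0 / 2) * (rng.normal(size=n) + 1j * rng.normal(size=n))

# Unnormalised symbol likelihoods p(y|x') for all 16 candidates: shape (n, 16).
lik = np.exp(-np.abs(y[:, None] - const[None, :]) ** 2 / N0)

# CM capacity C_CM = I(Y;X), Monte Carlo form (normalisations cancel).
c_cm = np.mean(np.log2(lik[np.arange(n), idx] / lik.mean(axis=1)))

# BICM capacity sum_l C_l via the marginal bit laws p_{W_l}(y|b).
c_bicm = 0.0
for l in range(L):
    is1 = labels[:, l] == 1
    p0, p1 = lik[:, ~is1].mean(axis=1), lik[:, is1].mean(axis=1)
    p_tx = np.where(labels[idx, l] == 1, p1, p0)
    c_bicm += np.mean(np.log2(p_tx / (0.5 * (p0 + p1))))

gap_bits = c_cm - c_bicm         # the mismatch penalty at s = 1
```

Because both estimates share the same channel samples, the common randomness largely cancels in `gap_bits`, so the small mismatch penalty is resolved far more accurately than either rate alone.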

⚠️ Engineering Note

The Product Bit Metric in 5G NR LDPC Decoding

Every 5G NR user-plane receiver is a BICM mismatched decoder. The constellation demapper produces one approximate LLR per coded bit (often max-log, often quantised to 5–6 bits), and the LDPC decoder treats these LLRs as independent per-bit observations of a binary channel. The decoder has no access to the joint symbol likelihood; it only sees the product-metric sum of per-bit LLRs. Hence, from §2's perspective, every 5G NR uplink/downlink is operating with the mismatched product bit metric $q_{\rm BICM}$ that this chapter analyses.

The information-theoretic consequence is that the achievable rate is $I^{\mathrm{GMI}}(s)$, not $C_{\rm CM}$. For the QPSK–1024-QAM range used by 5G NR with Gray labelling, the gap $C_{\rm CM} - \sup_s I^{\mathrm{GMI}}(s)$ ranges from $0$ (QPSK, exactly matched) to $\approx 0.1$ bits/symbol (1024-QAM at moderate SNR) — this is the fundamental limit of a 5G NR receiver architecture, below which 5G cannot go without abandoning BICM. The remaining gap to Shannon is covered by probabilistic amplitude shaping, which is being standardised for beyond-5G in the 3GPP R18 evaluations (see also Ch. 9).

Practical Constraints
  • LDPC decoder input is per-bit LLR, not per-symbol log-likelihood

  • Max-log demapper introduces an additional small mismatch on top of the product-metric mismatch

  • BICM-to-CM gap bounded by $\sim 0.1$ bits for 1024-QAM with Gray

  • GMI scaling $s \ne 1$ could, in principle, be implemented by an LLR-scaling tap; not done in current silicon

📋 Ref: 3GPP TS 38.212, Rel-17

Common Mistake: GMI Is a Lower Bound on Achievable Rate, Not the Ultimate Capacity

Mistake:

Concluding that $\sup_s I^{\mathrm{GMI}}(s)$ is the capacity of the mismatched BICM channel — i.e., that no code on the BICM channel could achieve a rate higher than $\sup_s I^{\mathrm{GMI}}(s)$.

Correction:

The GMI $\sup_s I^{\mathrm{GMI}}(s)$ is the largest rate achievable by a random code with i.i.d. uniform codeword letters under the given mismatched metric. It is an achievability result (a lower bound on the mismatched capacity). The true mismatched capacity — the largest rate achievable by any code — may be higher: structured codes (with non-i.i.d. letter distributions, or with constant-composition codewords) can in principle achieve rates above the GMI, up to the mismatch capacity studied by Csiszár and Narayan.

For BICM with Gray labelling on AWGN this gap is negligible in practice — the GMI and the Csiszár-Narayan capacity coincide to far more decimal places than any simulation can resolve — so the BICM community treats "BICM capacity $= \sup_s I^{\mathrm{GMI}}(s)$" as an equality. But be aware that it is formally a lower bound. See Csiszár-Körner §10.3 and Ganti-Lapidoth-Telatar (1999) for the more refined mismatched-capacity theory.


Quick Check

The "mismatch" in BICM's mismatched decoding refers to

the BICM decoder using the product bit metric $\prod_\ell p_{W_\ell}(y\mid b_\ell)$ instead of the true symbol likelihood $p(y\mid x)$

the receiver using the wrong noise variance $\sigma^2$ in the LLR formula

the binary decoder using the wrong code rate

the interleaver not being long enough, so successive bits remain correlated

Mismatched Decoding

A decoding framework in which the receiver's maximum-metric rule uses a metric $q(y, x)$ that differs from the true channel likelihood $p(y\mid x)$. Studied since Merhav-Kaplan-Lapidoth-Shamai (1994); the achievable rate with a random code is the generalised mutual information $I^{\mathrm{GMI}}(s)$, parameterised by a scaling $s > 0$.

Related: Generalised Mutual Information, The BICM Product Bit Metric, Random Coding

Product Bit Metric (BICM)

The decoding metric $q_{\rm BICM}(y, x) = \prod_{\ell=0}^{L-1} p_{W_\ell}(y\mid b_\ell)$ used by every BICM receiver. It replaces the joint symbol likelihood by a product of per-position marginals, a mismatched replacement that enables bit-wise decoding at a small information-theoretic cost.

Related: Mismatched Maximum-Metric Decoding, Bit Channel (BICM), Consistent-Gaussian LLR Assumption

Generalised Mutual Information (GMI)

The rate $I^{\mathrm{GMI}}(s) = \mathbb{E}\left[\log \frac{q(Y,X)^s}{\mathbb{E}_{\bar X}[q(Y,\bar X)^s]}\right]$ achievable by a random code with i.i.d. uniform letters under a mismatched metric $q$ and decoder scaling $s$. For BICM with $q = q_{\rm BICM}$ and $s = 1$, the GMI equals the Caire-Taricco-Biglieri BICM capacity $\sum_\ell C_\ell$.

Related: Mismatched Maximum-Metric Decoding, Product Bit Metric (BICM), BICM Capacity