BICM as Mismatched Decoding
The Decoder BICM Actually Uses Is Wrong
If you wrote down, from scratch, the optimal maximum-likelihood decoder for the BICM channel, you would sum over all constellation symbols compatible with a given binary codeword label: a joint symbol likelihood that couples the bits of every symbol through the Euclidean geometry of the constellation. No real receiver does this. What every 5G NR, Wi-Fi, and DVB-S2 BICM receiver actually runs is: compute soft per-bit LLRs from the demapper, de-interleave, and feed a standard binary decoder that adds the per-bit LLRs (equivalently, multiplies the per-bit likelihoods). That is the product bit metric, and it is a mismatched metric: it is not the true likelihood of the codeword given the received sequence $\mathbf{y}$.
The point of this section, and the central contribution of the 2008 Foundations & Trends monograph of Guillén i Fàbregas, Martínez, and Caire, is to take this observation seriously. Mismatched decoding has been studied since Merhav-Kaplan-Lapidoth-Shamai 1994: the rate achievable by a random code when the decoder uses a metric $q(x, y)$ instead of the true likelihood $p(y \mid x)$ is not the mutual information $I(X;Y)$, but the generalised mutual information $I_{\mathrm{GMI}}(s)$, parameterised by a decoder scaling $s > 0$. What Guillén-Martínez-Caire showed is that this classical framework is exactly the right tool for analysing BICM.
Under their reframing, three previously fuzzy aspects of BICM become clear: (i) the Caire-Taricco-Biglieri capacity formula is exactly the GMI at $s = 1$ for the product bit metric, turning a heuristic formula into a rigorous GMI expression; (ii) the decoder scaling $s$ is a tunable knob: when the demapper's LLRs are suboptimally computed (e.g., max-log, quantised, mismatched to the noise variance), a scaling $s \neq 1$ recovers part of the loss; (iii) the error-exponent analysis for BICM lifts cleanly from Gallager's $E_0$ applied to the product metric, the topic of §4.
This section develops the mismatched-decoding framework, derives the BICM product metric as the specific mismatched metric that BICM uses, and introduces the GMI in enough detail to build the main achievability theorem of §3. The CommIT contribution is the 2008 monograph that put all of this on a unified rigorous footing.
Definition: Mismatched Maximum-Metric Decoding
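Let $\mathcal{C}$ be a codebook of blocklength $N$ on a memoryless channel $p(y \mid x)$ with output $\mathbf{y} = (y_1, \dots, y_N)$. A mismatched maximum-metric decoder with metric $q(x, y)$ selects the codeword that maximises the product metric across symbols:
$$\hat{\mathbf{x}} = \arg\max_{\mathbf{x} \in \mathcal{C}} \prod_{k=1}^{N} q(x_k, y_k).$$
The decoder is matched if $q(x, y) = p(y \mid x)$ (up to a codeword-independent normalisation); mismatched otherwise.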
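The decoding rule in this definition can be sketched in a few lines. The following is a minimal illustration, not a practical decoder: it searches a toy codebook exhaustively, and the names `max_metric_decode`, `matched_log_metric`, and `sign_log_metric` are hypothetical, as is the hard-decision metric chosen to play the role of a mismatched $q$.

```python
import numpy as np

def max_metric_decode(codebook, y, log_metric):
    """Return the index of the codeword maximising sum_k log q(x_k, y_k)."""
    scores = [sum(log_metric(x_k, y_k) for x_k, y_k in zip(cw, y))
              for cw in codebook]
    return int(np.argmax(scores))

def matched_log_metric(x, y, sigma2=0.5):
    # log p(y | x) for real AWGN, dropping the codeword-independent constant
    return -(y - x) ** 2 / (2.0 * sigma2)

def sign_log_metric(x, y):
    # Hypothetical mismatched metric: only the sign of y is used (hard decision)
    return 0.0 if np.sign(y) == np.sign(x) else -1.0

# Toy repetition code over BPSK symbols (assumed for illustration only)
codebook = [[+1, +1, +1], [-1, -1, -1]]
y = [-0.1, -0.2, 2.5]   # two weak negative samples, one strong positive sample

matched_choice = max_metric_decode(codebook, y, matched_log_metric)
mismatched_choice = max_metric_decode(codebook, y, sign_log_metric)
print(matched_choice, mismatched_choice)
```

Here the matched metric trusts the one reliable sample, while the sign metric takes a majority vote over unreliable ones, so the two decoders rank the codewords differently on the same observation.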
The mismatched metric $q$ is a design choice of the receiver, generally chosen for implementation convenience, not for information-theoretic optimality. Operational questions:
- What rate is reliably achievable by a random code with a given mismatched decoder?
- How far is this rate below the matched-decoder capacity $C$?
- Is there a free scaling $q \mapsto q^s$ of the metric that can improve the achievable rate?
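The third question can be made concrete with a small numerical sketch. It assumes the standard fixed-$s$ GMI expression from the mismatched-decoding literature, $I_{\mathrm{GMI}}(s) = \mathbb{E}\big[\log_2 \frac{q(X,Y)^s}{\sum_{x'} P(x')\, q(x',Y)^s}\big]$, and a deliberately simple setting that is not taken from this chapter: BPSK on real AWGN, with a receiver whose Gaussian metric uses the wrong noise variance. The function name `gmi_bpsk` and all parameter values are hypothetical.

```python
import numpy as np

def gmi_bpsk(s, sigma2_true, sigma2_assumed, n_grid=20001):
    """GMI (bits) at scaling s for BPSK when the metric assumes the wrong variance."""
    sigma = np.sqrt(sigma2_true)
    # Condition on X = +1 (the two inputs are symmetric) and integrate over y
    y = np.linspace(1.0 - 12.0 * sigma, 1.0 + 12.0 * sigma, n_grid)
    dy = y[1] - y[0]
    pdf = np.exp(-(y - 1.0) ** 2 / (2.0 * sigma2_true)) / np.sqrt(2.0 * np.pi * sigma2_true)
    # log of q(-1, y)^s / q(+1, y)^s for q(x, y) = exp(-(y - x)^2 / (2 * sigma2_assumed))
    log_ratio = -2.0 * s * y / sigma2_assumed
    # log2( q(+1, y)^s / (0.5 * sum_x' q(x', y)^s) ) = 1 - log2(1 + exp(log_ratio))
    integrand = 1.0 - np.logaddexp(0.0, log_ratio) / np.log(2.0)
    return float(np.sum(pdf * integrand) * dy)

sigma2_true, sigma2_assumed = 0.5, 1.0          # receiver overestimates the noise
s_grid = np.linspace(0.5, 4.0, 36)              # step 0.1
gmi = np.array([gmi_bpsk(s, sigma2_true, sigma2_assumed) for s in s_grid])
s_best = s_grid[np.argmax(gmi)]
print(f"GMI at s=1: {gmi_bpsk(1.0, sigma2_true, sigma2_assumed):.4f} bits")
print(f"best s on grid: {s_best:.1f}, GMI there: {gmi.max():.4f} bits")
```

In this toy case the scaled metric $q^s$ becomes proportional to the true likelihood when $s$ equals the ratio of assumed to true variance, so the sweep peaks there and the scaling removes the mismatch entirely. For the BICM product metric the scaling can at best recover part of the loss.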
Mismatched decoding is not a theoretical pathology. In BICM, as we will see below, the receiver knows the true channel law $p(y \mid x)$ but chooses to use the product bit metric because it decouples the demapper from the binary decoder: a massive engineering win at a small information-theoretic cost. The same is true for bit-wise LDPC decoders in 5G NR, sphere decoders under early termination, and max-log receivers in practice.
Definition: The BICM Product Bit Metric
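For the BICM system of Chapter 5, let $\mu : \{0,1\}^m \to \mathcal{X}$ be the labelling, write $b_j(x)$ for the $j$-th label bit of $x$, and let $\mathcal{X}_b^j$ be the set of symbols whose $j$-th label bit equals $b$. The BICM bit metric (or product bit metric) is
$$q(x, y) = \prod_{j=1}^{m} q_j\big(b_j(x), y\big), \qquad q_j(b, y) = \frac{1}{|\mathcal{X}_b^j|} \sum_{x' \in \mathcal{X}_b^j} p(y \mid x'),$$
where each per-bit factor $q_j$ is the marginal per-position transition law from Def. BICM as Parallel Binary Channels. Writing this in LLR form: if $\Lambda_j(y) = \log \frac{q_j(0, y)}{q_j(1, y)}$ is the per-position LLR, then
$$\log q(x, y) = \frac{1}{2} \sum_{j=1}^{m} (-1)^{b_j(x)} \Lambda_j(y) + c(y), \qquad c(y) = \frac{1}{2} \sum_{j=1}^{m} \log\big[q_j(0, y)\, q_j(1, y)\big].$$
The decoder adds these LLRs across all received symbols and selects the codeword with the largest total. This is exactly what a standard binary decoder does downstream of the demapper.
This metric is mismatched because the true symbol likelihood is $p(y \mid x) \neq q(x, y)$ (even after a codeword-independent normalisation) whenever the labelling induces dependencies among the bits conditional on $y$, i.e., whenever $P(b_1, \dots, b_m \mid y) \neq \prod_{j} P(b_j \mid y)$, which is generic for multi-bit labellings.
Three practical subtleties to note:
- The LLR formula above is the exact marginal demapping; max-log receivers use $\Lambda_j(y) \approx \max_{x' \in \mathcal{X}_0^j} \log p(y \mid x') - \max_{x' \in \mathcal{X}_1^j} \log p(y \mid x')$, which is a further approximation (an additional source of mismatch, studied in §5).
- The constant $c(y)$ cancels in all codeword comparisons, so it does not affect the decoder's decision.
- Calling $q$ "mismatched" is a statement about the receiver's modelling of the channel. The transmitter has not done anything wrong; it is the decoder that has chosen a suboptimal metric in exchange for architectural simplicity.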
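A short sketch may help make the demapper side of this definition concrete. It computes exact and max-log per-bit LLRs for one received sample of Gray-labelled 16-QAM. The sign convention (positive LLR favours bit 0), the unit-energy constellation, the noise variance, and the names `bit_llrs`, `points`, and `labels` are assumptions made for this illustration, not definitions from the chapter.

```python
import numpy as np

def bit_llrs(y, points, labels, sigma2, max_log=False):
    """Per-bit LLRs log q_j(0, y) / q_j(1, y) for one sample (complex AWGN, E|n|^2 = sigma2)."""
    loglik = -np.abs(y - points) ** 2 / sigma2        # log p(y | x) up to a constant
    combine = np.max if max_log else np.logaddexp.reduce
    # The two bit subsets have equal size, so the 1/|subset| factors cancel in the ratio
    return np.array([combine(loglik[labels[:, j] == 0]) - combine(loglik[labels[:, j] == 1])
                     for j in range(labels.shape[1])])

# Gray 16-QAM as the product of two Gray-labelled 4-PAM alphabets, unit average energy
pam = np.array([-3.0, -1.0, 1.0, 3.0]) / np.sqrt(10.0)
gray2 = np.array([[0, 0], [0, 1], [1, 1], [1, 0]])
points = np.array([a + 1j * b for a in pam for b in pam])
labels = np.array([np.concatenate([gray2[i], gray2[k]]) for i in range(4) for k in range(4)])

sigma2 = 0.1                                   # Es/N0 = 10 dB for unit-energy symbols
y = points[5] + (0.05 - 0.02j)                 # symbol 5 plus a small perturbation
exact = bit_llrs(y, points, labels, sigma2)
maxlog = bit_llrs(y, points, labels, sigma2, max_log=True)

# Product-metric score of every symbol, written in LLR form (the constant c(y) is dropped)
scores = 0.5 * ((1 - 2 * labels) @ exact)
print("exact LLRs  :", np.round(exact, 3))
print("max-log LLRs:", np.round(maxlog, 3))
print("symbol with the largest product metric:", int(np.argmax(scores)))
```

The `scores` line is the LLR form of $\log q(x, y)$: maximising it over symbols amounts to taking the sign of each LLR independently, which is the bit-wise decoupling the product metric buys.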
BICM as Mismatched Decoding (Generalised Mutual Information)
The 153-page Foundations & Trends monograph of Albert Guillén i Fàbregas, Alfonso Martínez, and Giuseppe Caire is the definitive information-theoretic treatment of BICM. Building on the 1998 Caire-Taricco-Biglieri paper (Ch. 5), it recasts BICM as a mismatched-decoding problem and puts the entire analysis of capacity, error exponents, and PEP on the footing of Gallager's 1968 random-coding machinery applied to the product bit metric. Its technical contributions are four-fold and are the backbone of Chapters 7–9 of this book.
(i) BICM is a mismatched decoder; the natural rate is the GMI. The BICM receiver uses the product bit metric $q(x, y)$ in place of the true symbol likelihood $p(y \mid x)$. Mismatched decoding has been studied since Merhav-Kaplan-Lapidoth-Shamai 1994: the achievable rate is not $I(X;Y)$ but the generalised mutual information $I_{\mathrm{GMI}}(s)$, a function of the decoder scaling $s$. Guillén-Martínez-Caire show that the Caire-Taricco-Biglieri capacity is exactly $I_{\mathrm{GMI}}(1)$ for the product metric, elevating the 1998 formula from a heuristic derivation to a rigorous mismatched-decoding rate theorem.
(ii) The decoder scaling $s$ is a tunable knob. For Gray labelling on $M$-QAM at high SNR the optimal $s^\star = 1$: the BICM capacity is the GMI at $s = 1$ and no scaling helps. At low SNR, or with non-Gray labellings, $s^\star \neq 1$, and the largest achievable rate exceeds the naive BICM capacity by a measurable margin. This explains why, at very low SNR, one observes BICM rates slightly above the CTB formula.
(iii) BICM error exponent via Gallager's product-metric $E_0$. Applying Gallager's random-coding bound to the product bit metric gives the BICM random-coding exponent $E_r^{\mathrm{BICM}}(R)$ in closed form. A direct computation shows $E_r^{\mathrm{BICM}}(R) \le E_r^{\mathrm{CM}}(R)$ at every rate, with equality at , and the gap under Gray labelling is small over the operational range. This is the central result of §4 of the monograph and §4 of this chapter.
(iv) Extensions to fading, block fading, and MIMO. Chapters 6–8 of the monograph extend the GMI framework to outage analysis on block-fading channels (the BICM outage capacity reduces to a per-symbol GMI computation), to MIMO BICM (where the bit metric becomes a marginalised log-likelihood over a joint space), and to BICM-ID (with decoder feedback, treated in Ch. 8 of this book). The GMI framework scales; the Caire-Taricco-Biglieri formula does not, except through the GMI reinterpretation.
Why it redefined the theory of BICM. Before 2008, the BICM capacity formula was known to be achievable by the parallel-channel argument of Chapter 5, but its information-theoretic converse — the statement that no mismatched decoder using the product metric can exceed it — was folkloric. Guillén-Martínez-Caire supplied the converse via the mismatched-decoding GMI, closed the capacity gap, and gave the unified framework under which all subsequent BICM papers (on error exponents, labelling design, iterative decoding, and standards analysis) have been written. This monograph is the single most authoritative reference on BICM and is the central CommIT contribution of Chapter 7 of this book.
CM MI, BICM GMI, and Shannon Capacity vs SNR
Three rate curves for $M$-QAM: (i) the unconstrained Shannon capacity $\log_2(1 + \mathrm{SNR})$; (ii) the CM capacity $C_{\mathrm{CM}}$ (the matched ML decoder on the constellation); and (iii) the BICM GMI at $s = 1$, which coincides with the Caire-Taricco-Biglieri capacity $C_{\mathrm{BICM}}$. The three curves converge at low SNR, where Shannon is tight; they separate at moderate SNR by dB (Gray QAM); the two constellation-constrained curves saturate at $\log_2 M$ bits at high SNR (the constellation-constrained ceiling). The BICM-to-CM gap is the mismatched-decoding penalty; the CM-to-Shannon gap is the shaping gap closed in Chapter 9 by probabilistic amplitude shaping.
Theorem: The BICM Product Metric Is Strictly Mismatched for
For any memoryless channel with constellation points and any labelling $\mu$, the BICM product bit metric $q(x, y)$ is strictly different from the true symbol likelihood $p(y \mid x)$ on a set of pairs $(x, y)$ of positive probability. Consequently, $q$ is a strictly mismatched metric in the sense of Def. Mismatched Maximum-Metric Decoding, and its associated achievable rate (computed rigorously in §3) is strictly less than $C_{\mathrm{CM}}$ for generic channels.
Equivalently, the chain-rule gap $I(X;Y) - \sum_{j=1}^{m} I(B_j; Y)$ is strictly positive unless the labelling induces conditional independence of the label bits given $Y$, which fails for every on AWGN.
Two different constellation points can produce similar $y$'s only by accident of Euclidean geometry, that is, by having several label bits in common. Such coincidences are recorded in the joint $p(y \mid x)$ but lost in the product $\prod_{j} q_j\big(b_j(x), y\big)$, because the product factor $q_j$ has already averaged out the other bits. The mismatch is the price of that averaging.
Pick two constellation points $x$ and $x'$ at the same Euclidean distance from some $y$ but whose labels agree in positions.
The product metric depends only on position-wise marginals; it does not see the joint structure.
The full likelihood $p(y \mid \cdot)$ evaluated at $x$ vs. $x'$ records the Euclidean geometry faithfully.
Write both metrics explicitly
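For any $(x, y)$, by definition of $q_j$ (with $\mathcal{X}_b^j$ the set of symbols whose $j$-th label bit equals $b$),
$$\prod_{j=1}^{m} q_j\big(b_j(x), y\big) = \prod_{j=1}^{m} \frac{1}{|\mathcal{X}_{b_j(x)}^j|} \sum_{x_j' \in \mathcal{X}_{b_j(x)}^j} p(y \mid x_j').$$
The product expands into a sum over index tuples $(x_1', \dots, x_m')$, only one of which equals the all-$x$ term; the rest are cross-terms involving $p(y \mid x')$ for $x' \neq x$.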
Identify where the mismatch lives
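The ratio $q_j\big(b_j(x), y\big) / p(y \mid x)$ equals 1 (up to the normalisation $1/|\mathcal{X}_{b_j(x)}^j|$) iff, for every $j$, the numerator reduces to exactly $p(y \mid x)$, i.e., iff for every $j$ the only symbol in the $j$-th subset with non-negligible $p(y \mid x')$ is $x$ itself. This is the asymptotic high-SNR regime (the decision region of $x$ dominates for all sensible $y$).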
Low-to-moderate SNR violates the condition
At moderate SNR the decision regions overlap; several $x' \neq x$ contribute non-negligibly to the $j$-th subset sum for each $j$. The ratio is bounded away from 1, and the two metrics induce different codeword rankings on codewords that would be tied under the matched metric. Hence $q$ is strictly mismatched.
Tie to the chain-rule gap
An equivalent information-theoretic statement is the chain-rule identity of Ch. 5:
$$I(X;Y) - \sum_{j=1}^{m} I(B_j; Y) = \sum_{j=2}^{m} I(B_j; B_1, \dots, B_{j-1} \mid Y).$$
The right-hand side vanishes iff, for every $j$, the $j$-th label bit is conditionally independent of the earlier label bits given $Y$. For generic QAM+AWGN with this fails at every SNR, so the right-hand side is strictly positive. The strict positivity of the chain-rule residual is precisely the strict mismatch of $q$.
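For readers who want the intermediate step, the chain-rule residual can be derived in three lines. The sketch below assumes uniform, mutually independent label bits $B_1, \dots, B_m$ (so that $H(B_j \mid B_1, \dots, B_{j-1}) = H(B_j)$) and writes $B^{j-1}$ for $(B_1, \dots, B_{j-1})$; this shorthand is introduced here and is not the chapter's notation.

```latex
\begin{align*}
I(X;Y) &= \sum_{j=1}^{m} I(B_j; Y \mid B^{j-1})
  && \text{(chain rule; $X$ and $(B_1,\dots,B_m)$ are in bijection)} \\
I(B_j; Y \mid B^{j-1}) - I(B_j; Y)
  &= \big[H(B_j) - H(B_j \mid Y, B^{j-1})\big] - \big[H(B_j) - H(B_j \mid Y)\big]
  && \text{(independent label bits)} \\
  &= H(B_j \mid Y) - H(B_j \mid Y, B^{j-1})
   = I(B_j; B^{j-1} \mid Y) \;\ge\; 0 \\
I(X;Y) - \sum_{j=1}^{m} I(B_j; Y)
  &= \sum_{j=2}^{m} I(B_j; B^{j-1} \mid Y)
  && \text{(the $j = 1$ term is zero)}
\end{align*}
```

Each summand is a conditional mutual information between label bits given the channel output, so it is zero exactly when observing $Y$ leaves the bits independent.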
Example: 16-QAM at 10 dB: The Size of the Mismatch
For 16-QAM with Gray labelling at $\mathrm{SNR} = 10$ dB on AWGN, compute (numerically) the CM capacity $C_{\mathrm{CM}}$, the BICM GMI at $s = 1$ ($= C_{\mathrm{BICM}}$), and the mismatch gap in bits and in dB of SNR-equivalent.
Read off the rates
At $\mathrm{SNR} = 10$ dB, numerical evaluation of the 16-QAM AWGN capacity integrals (see the interactive plot "CM MI, BICM GMI, and Shannon Capacity vs SNR") gives $C_{\mathrm{CM}}$ and $C_{\mathrm{BICM}}$. (The Shannon capacity at 10 dB is $\log_2(1 + 10) \approx 3.46$ bits, so both 16-QAM constraints are within bits of Shannon, and the BICM-to-CM gap is only bits: tiny.)
Convert to dB SNR equivalent
At this SNR, the capacity slope bits/dB. A rate loss of bits therefore corresponds to an SNR penalty of dB. This is the Gray-BICM penalty on AWGN at practical SNRs.
Contrast with SP labelling
Under SP labelling the BICM capacity is bits (see Example The Four Bit Channels of 16-QAM at 10 dB). The CM-to-SP gap is bits, or roughly dB of SNR. Gray's tight dB penalty is the whole reason Gray dominates the standards.
Tuning $s$ at moderate SNR
Forward reference: in §3 we will see that the largest achievable BICM rate is $\sup_{s > 0} I_{\mathrm{GMI}}(s)$, not just $I_{\mathrm{GMI}}(1)$. For Gray 16-QAM at 10 dB, $s^\star = 1$ to four decimal places, and $\sup_{s} I_{\mathrm{GMI}}(s) = I_{\mathrm{GMI}}(1)$. The decoder scaling only pays off at low SNR and with non-Gray labellings. This is the operational content of the GMI saddle-point result of §3.
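The rates in this example, and the location of the best scaling at 10 dB, can be recomputed with a short numerical sketch. It is not the procedure behind the interactive plot, and it rests on several stated assumptions: square Gray 16-QAM treated as two independent Gray 4-PAM dimensions, $\mathrm{SNR} = E_s/N_0$, a plain Riemann sum over the channel output, and the standard fixed-$s$ GMI expression $\mathbb{E}\big[\log_2 \frac{q(X,Y)^s}{\sum_{x'} P(x')\, q(x',Y)^s}\big]$, which for the product metric splits into one term per bit position. The names `rates_16qam`, `PAM4`, and `GRAY` are hypothetical.

```python
import numpy as np

PAM4 = np.array([-3.0, -1.0, 1.0, 3.0]) / np.sqrt(5.0)   # unit energy per real dimension
GRAY = np.array([[0, 0], [0, 1], [1, 1], [1, 0]])         # Gray labels of the PAM points

def rates_16qam(snr_db, s=1.0, n_grid=4001):
    """Return (C_CM, I_GMI(s)) in bits per 16-QAM symbol for the exact product metric."""
    sigma2 = 10.0 ** (-snr_db / 10.0)                     # per-dimension noise variance
    half_width = PAM4.max() + 10.0 * np.sqrt(sigma2)
    y = np.linspace(-half_width, half_width, n_grid)
    dy = y[1] - y[0]
    # loglik[k, i] = log p(y_k | x_i) for real AWGN
    loglik = (-(y[:, None] - PAM4[None, :]) ** 2 / (2.0 * sigma2)
              - 0.5 * np.log(2.0 * np.pi * sigma2))
    weight = np.exp(loglik) * dy / 4.0                    # P(x_i) p(y_k | x_i) dy
    log_py = np.logaddexp.reduce(loglik, axis=1) - np.log(4.0)
    c_cm = np.sum(weight * (loglik - log_py[:, None])) / np.log(2.0)
    gmi = 0.0
    for j in range(2):
        # log q_j(b, y): average the likelihood over the points whose j-th bit is b
        log_q = np.stack([np.logaddexp.reduce(loglik[:, GRAY[:, j] == b], axis=1) - np.log(2.0)
                          for b in (0, 1)], axis=1)
        log_den = np.logaddexp.reduce(s * log_q, axis=1) - np.log(2.0)
        log_num = s * log_q[:, GRAY[:, j]]                # column i: bit actually sent by x_i
        gmi += np.sum(weight * (log_num - log_den[:, None])) / np.log(2.0)
    return 2.0 * c_cm, 2.0 * gmi                          # two real dimensions per symbol

c_cm, c_bicm = rates_16qam(10.0)
s_grid = np.linspace(0.5, 1.5, 21)
gmi_vals = np.array([rates_16qam(10.0, s)[1] for s in s_grid])
s_star = s_grid[np.argmax(gmi_vals)]
print(f"C_CM = {c_cm:.4f}, C_BICM = I_GMI(1) = {c_bicm:.4f}, gap = {c_cm - c_bicm:.4f} bits")
print(f"best s on the grid: {s_star:.2f}")
```

The sweep only probes a coarse grid around $s = 1$ at a single SNR and with exact (not max-log) bit metrics, so it illustrates the statement of this example rather than the general saddle-point result of §3.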
The Product Bit Metric in 5G NR LDPC Decoding
Every 5G NR user-plane receiver is a BICM mismatched decoder. The constellation demapper produces one approximate LLR per coded bit (often max-log, often quantised to 5–6 bits), and the LDPC decoder treats these LLRs as independent per-bit observations of a binary channel. The decoder has no access to the joint symbol likelihood; it only sees the product-metric sum of LLRs. Hence, from §2's perspective, every 5G NR uplink/downlink is operating with the mismatched product bit metric that this chapter analyses.
The information-theoretic consequence is that the achievable rate is $C_{\mathrm{BICM}}$, not $C_{\mathrm{CM}}$. For the QPSK–1024-QAM range used by 5G NR with Gray labelling, the gap ranges from $0$ (QPSK, exactly matched) to bits/symbol (1024-QAM at moderate SNR). This is the fundamental limit of a 5G NR receiver architecture, below which 5G cannot go without abandoning BICM. The remaining gap to Shannon is covered by probabilistic amplitude shaping, which is being standardised for beyond-5G in the 3GPP R18 evaluations (see also Ch. 9).
- LDPC decoder input is per-bit LLR, not per-symbol log-likelihood
- Max-log demapper introduces an additional small mismatch on top of the product-metric mismatch
- BICM-to-CM gap bounded by bits for 1024-QAM with Gray
- GMI scaling $s$ could, in principle, be implemented by an LLR-scaling tap; not done in current silicon
Common Mistake: GMI Is a Lower Bound on Achievable Rate, Not the Ultimate Capacity
Mistake:
Concluding that $I_{\mathrm{GMI}}(1)$ is the capacity of the mismatched BICM channel, i.e., that no code on the BICM channel could achieve a rate higher than $I_{\mathrm{GMI}}(1)$.
Correction:
The GMI is the largest rate achievable by a random code with i.i.d. uniform codeword letters under the given mismatched metric. It is an achievability result (a lower bound on the mismatched capacity). The true mismatched capacity, the largest rate achievable by any code, may be higher: structured codes (with non-i.i.d. letter distributions, or with constant-composition codewords) can in principle achieve rates above the GMI. This is the setting of the Csiszár-Narayan mismatched-capacity theory.
For BICM with Gray labelling on AWGN this gap is negligible in practice (the GMI and the Csiszár-Narayan capacity coincide to far more decimal places than any simulation can resolve), so the BICM community treats "BICM capacity = $I_{\mathrm{GMI}}(1)$" as an equality. But be aware that it is formally a lower bound. See Csiszár-Körner §10.3 and Ganti-Lapidoth-Telatar (2000) for the more refined mismatched-capacity theory.
Quick Check
The "mismatch" in BICM's mismatched decoding refers to
the BICM decoder using the product bit metric instead of the true symbol likelihood
the receiver using the wrong noise variance in the LLR formula
the binary decoder using the wrong code rate
the interleaver not being long enough, so successive bits remain correlated
Exactly. The product bit metric is the natural output of a bit-wise demapper; the true likelihood requires joint $M$-ary processing. BICM deliberately uses the former for architectural modularity, paying an information-theoretic penalty quantified by the GMI of §3.
Mismatched Decoding
A decoding framework in which the receiver's maximum-metric rule uses a metric $q(x, y)$ that differs from the true channel likelihood $p(y \mid x)$. Studied since Merhav-Kaplan-Lapidoth-Shamai (1994); the achievable rate with a random code is the generalised mutual information $I_{\mathrm{GMI}}(s)$, parameterised by a scaling $s > 0$.
Related: Generalised Mutual Information, The BICM Product Bit Metric, Random Coding
Product Bit Metric (BICM)
The decoding metric used by every BICM receiver. It replaces the joint symbol likelihood by a product of per-position marginals, a mismatched replacement that enables bit-wise decoding at a small information-theoretic cost.
Related: Mismatched Maximum-Metric Decoding, Bit Channel (BICM), Consistent-Gaussian LLR Assumption
Generalised Mutual Information (GMI)
The rate achievable by a random code with i.i.d. uniform letters under a mismatched metric $q$ and decoder scaling $s$. For BICM with the product bit metric and $s = 1$, the GMI equals the Caire-Taricco-Biglieri BICM capacity $C_{\mathrm{BICM}}$.
Related: Mismatched Maximum-Metric Decoding, Product Bit Metric (BICM), BICM Capacity