Generalised Mutual Information
The Rate That Mismatched BICM Actually Achieves
We now turn §2's mismatched-decoding framework into a rate that we can compute and compare with the CM capacity. The object we need is the generalised mutual information — a one-parameter family of lower bounds on the mismatched capacity, indexed by a decoder scaling parameter $s > 0$.
The story goes like this. A random-coding argument over i.i.d. uniform codebooks, with a decoder that maximises the metric $\prod_i q(x_i, y_i)$ in place of the true likelihood, produces a reliability condition of the form $R < I^{\text{gmi}}(s)$ — a rate threshold that depends on $s$ through a Chernoff-style bound on the pairwise error probability. The largest such rate, optimised over $s$, is the best achievable rate the random-coding argument can deliver:
$$I^{\text{gmi}} = \sup_{s > 0} I^{\text{gmi}}(s).$$
This is Theorem 3.1 of the Guillén-Martínez-Caire monograph for BICM, and it is the central achievability result of this chapter.
Two operational facts stand out. First, at the "natural" choice $s = 1$, the GMI of the BICM product metric reduces exactly to the Caire-Taricco-Biglieri capacity of Chapter 5 — a beautiful conceptual closure: the heuristic BICM formula is a rigorous mismatched-decoding achievability result at $s = 1$. Second, there is a saddle-point structure to the optimisation in $s$: for Gray labelling on QAM at high SNR, $s^\star \approx 1$; at low SNR, $s^\star \ne 1$ and the sup is achieved strictly above $I^{\text{gmi}}(1)$. Operationally, the scaling $s$ acts as an inverse "temperature" on the decoder — at low temperature ($s$ large) the decoder is overconfident; at high temperature ($s$ small) it averages away useful signal. The optimum $s^\star$ balances these two effects.
This section develops the GMI formula, proves the achievability theorem, identifies the $s = 1$ specialisation with the CTB formula, and shows when and how the optimal scaling $s^\star$ departs from 1.
Definition: Generalised Mutual Information
Let $p(y|x)$ be a memoryless channel, $q(x, y) > 0$ a decoding metric, and $P_X$ an input distribution on $\mathcal{X}$. The generalised mutual information (GMI) at decoder scaling $s > 0$ is
$$I^{\text{gmi}}(s) = \mathbb{E}\!\left[\log_2 \frac{q(X, Y)^s}{\mathbb{E}_{X'}\!\left[q(X', Y)^s \mid Y\right]}\right],$$
where the inner expectation is over an independent copy $X'$ of the input. Equivalently, writing the inner expectation out,
$$I^{\text{gmi}}(s) = \mathbb{E}\!\left[\log_2 \frac{q(X, Y)^s}{\sum_{x' \in \mathcal{X}} P_X(x')\, q(x', Y)^s}\right].$$
The "mutual information" nomenclature is historical: at $q(x, y) = p(y|x)$ and $s = 1$, the GMI reduces to the classical mutual information $I(X; Y)$.
For the BICM product metric with uniform input over $\mathcal{X}$, $|\mathcal{X}| = 2^m$, the GMI simplifies because $q(x, y)^s = \prod_{\ell=1}^{m} q_\ell(b_\ell(x), y)^s$ and the inner expectation factorises over bit positions (uniform inputs make the label bits marginally i.i.d. uniform binary):
$$I^{\text{gmi}}(s) = \sum_{\ell=1}^{m} \mathbb{E}\!\left[\log_2 \frac{q_\ell(B_\ell, Y)^s}{\tfrac{1}{2} \sum_{b \in \{0,1\}} q_\ell(b, Y)^s}\right].$$
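To make the formula concrete, here is a minimal Monte-Carlo sketch of the per-position sum above, assuming Gray-labelled square QAM on complex AWGN with unit-energy constellation; the helper names (`gray_qam`, `gmi_bicm`) are ours, not the monograph's.

```python
# Monte-Carlo evaluation of the BICM product-metric GMI(s): a sketch, not a
# reference implementation.  Assumes Gray-labelled square QAM, complex AWGN.
import numpy as np

def gray_qam(nbits_axis):
    """Gray-labelled square QAM with 2*nbits_axis bits/symbol, unit energy."""
    L = 2 ** nbits_axis
    amps = np.arange(-(L - 1), L, 2, dtype=float)        # odd levels per axis
    gray = [i ^ (i >> 1) for i in range(L)]              # reflected Gray code
    bits = np.array([[(g >> k) & 1 for k in reversed(range(nbits_axis))]
                     for g in gray])
    pts = (amps[:, None] + 1j * amps[None, :]).ravel()
    lab = np.array([np.hstack([bits[i], bits[q]])
                    for i in range(L) for q in range(L)])
    return pts / np.sqrt(2 * (L * L - 1) / 3), lab       # E[|X|^2] = 1

def gmi_bicm(snr_db, s, nbits_axis=2, n=200_000, seed=0):
    """GMI(s) of the BICM product bit metric, in bits per symbol."""
    rng = np.random.default_rng(seed)            # fixed seed: smooth in s
    x, b = gray_qam(nbits_axis)
    sigma2 = 10 ** (-snr_db / 10)                # complex noise variance
    idx = rng.integers(len(x), size=n)
    y = x[idx] + np.sqrt(sigma2 / 2) * (rng.standard_normal(n)
                                        + 1j * rng.standard_normal(n))
    w = np.exp(-np.abs(y[:, None] - x[None, :]) ** 2 / sigma2)  # ~ p(y|x')
    gmi = 0.0
    for ell in range(b.shape[1]):
        q1 = w[:, b[:, ell] == 1].sum(axis=1)    # q_ell(1, y): subset sum
        q0 = w[:, b[:, ell] == 0].sum(axis=1)    # q_ell(0, y)
        qt = np.where(b[idx, ell] == 1, q1, q0)  # metric of the sent bit
        gmi += np.mean(np.log2(qt ** s / (0.5 * (q0 ** s + q1 ** s))))
    return gmi
```

The Gaussian normalisation constants cancel in the ratio, so `w` can omit them.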
Three points of intuition:
(i) The numerator is the metric evaluated on the transmitted bit $B_\ell$; the denominator averages over all possible inputs. The ratio is large when the true bit wins the metric competition by a wide margin — which is what enables reliable decoding.
(ii) At $s = 1$, $q(x, y) = p(y|x)$, uniform $P_X$: the GMI is Shannon's mutual information $I(X; Y)$. For mismatched $q$, the GMI is strictly less in general, and can be improved by choosing $s \ne 1$.
(iii) The GMI is concave in $s$ for the product metric (this follows from the convexity of the log-moment-generating function $s \mapsto \log \mathbb{E}[q(X', Y)^s]$), so $\sup_{s > 0}$ is attained at an interior point $s^\star$ satisfying the first-order condition $\tfrac{d}{ds} I^{\text{gmi}}(s) \big|_{s = s^\star} = 0$. The saddle-point structure makes the optimisation numerically painless, as the sketch below illustrates.
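Concavity reduces finding $s^\star$ to a one-dimensional bounded maximisation. A sketch reusing the `gmi_bicm` helper above (Gray 16-QAM at an assumed 10 dB); SciPy's bounded scalar minimiser is one convenient choice.

```python
# Locate s* by maximising the concave GMI(s) over a bracket.
from scipy.optimize import minimize_scalar

res = minimize_scalar(lambda s: -gmi_bicm(10.0, s),
                      bounds=(0.2, 2.0), method="bounded")
print(f"s* ~ {res.x:.3f}, GMI(s*) ~ {-res.fun:.4f} bits/symbol,"
      f" GMI(1) ~ {gmi_bicm(10.0, 1.0):.4f} bits/symbol")
```

Because `gmi_bicm` reuses a fixed seed, the objective is deterministic and smooth in $s$ (common random numbers), which keeps the scalar search stable.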
Theorem: BICM Achievability via GMI (Guillén-Martínez-Caire)
Consider a BICM system with labelling $\mu : \{0, 1\}^m \to \mathcal{X}$, memoryless channel $p(y|x)$, uniform inputs on $\mathcal{X}$, and the BICM product bit metric $q(x, y) = \prod_{\ell=1}^{m} q_\ell(b_\ell(x), y)$.
(Achievability.) For every rate $R < I^{\text{gmi}}$ there exists a sequence of block codes of blocklength $n$ whose error probability under the mismatched decoder tends to zero as $n \to \infty$. The largest such rate is
$$I^{\text{gmi}} = \sup_{s > 0} I^{\text{gmi}}(s).$$
(CTB specialisation.) At $s = 1$, the GMI reduces to the Caire-Taricco-Biglieri BICM capacity:
$$I^{\text{gmi}}(1) = \sum_{\ell=1}^{m} I(B_\ell; Y) = C_{\text{BICM}}.$$
(Gray-optimality of $s = 1$.) For Gray labelling on square $M$-QAM over AWGN at high SNR, $s^\star \to 1$, and therefore $I^{\text{gmi}} \to C_{\text{BICM}}$. For non-Gray labellings and/or low SNR, $s^\star \ne 1$ and $I^{\text{gmi}}$ exceeds $I^{\text{gmi}}(1) = C_{\text{BICM}}$ strictly.
The random-coding argument for mismatched decoding is almost word-for-word Shannon's 1948 random-coding argument, with two modifications. First, the maximum-likelihood decoder is replaced by the maximum-metric decoder; the metric is raised to the power $s$ before summing the log across symbols. Second, the Chernoff/tilting trick that bounds the pairwise error probability introduces $s$ as a free parameter — optimising over $s$ yields the best achievable rate from this argument.
The GMI reduces to Shannon's mutual information when the metric is the true likelihood and $s = 1$; it reduces to the CTB BICM capacity when the metric is the product bit metric and $s = 1$. The GMI framework thus unifies matched and mismatched decoding into a single scalar achievability formula.
Use a random codebook of rate $R$, letters i.i.d. uniform on $\mathcal{X}$, blocklength $n$.
Bound the pairwise error probability by a Chernoff bound parameterised by $s$.
Sum over the $2^{nR} - 1$ incorrect codewords via the union bound; the error probability tends to zero iff $R < I^{\text{gmi}}(s)$.
For the CTB specialisation, plug $s = 1$ and the product bit metric into the GMI formula — the inner expectation becomes $\prod_{\ell} \tfrac{1}{2} \sum_{b} q_\ell(b, Y)$ and the sum of log ratios equals $\sum_{\ell} I(B_\ell; Y)$.
Random code + mismatched decoder
Fix $R$ and $s > 0$. Draw $2^{nR}$ codewords i.i.d. uniform on $\mathcal{X}^n$. The decoder outputs
$$\hat{w} = \arg\max_{w} \prod_{i=1}^{n} q(x_i(w), y_i).$$
Conditional on transmitting $\mathbf{x}(1)$, the pairwise error probability of $\mathbf{x}(w)$ ($w \ne 1$) is
$$P\!\left[\prod_{i=1}^{n} q(X_i(w), y_i) \ge \prod_{i=1}^{n} q(x_i(1), y_i)\right].$$
Chernoff bound with the scaling $s$
For any $s > 0$, by Markov:
$$P[\text{pairwise error} \mid \mathbf{x}, \mathbf{y}] \le \mathbb{E}_{X'}\!\left[\left(\frac{\prod_i q(X_i', y_i)}{\prod_i q(x_i, y_i)}\right)^{\! s}\,\right].$$
Choosing the Chernoff parameter equal to the decoder scaling $s$ and using the i.i.d. codeword letters plus memoryless channel gives
$$P[\text{pairwise error} \mid \mathbf{x}, \mathbf{y}] \le \prod_{i=1}^{n} \frac{\mathbb{E}_{X'}\!\left[q(X', y_i)^s\right]}{q(x_i, y_i)^s},$$
and after standard manipulation (see Ganti-Lapidoth-Telatar 1999 or Ch. 3 of the monograph) the per-letter factor is exactly $\mathbb{E}_{X'}[q(X', y_i)^s] / q(x_i, y_i)^s$, whose normalised log-sum $\tfrac{1}{n} \sum_i \log_2 \tfrac{q(x_i, y_i)^s}{\mathbb{E}_{X'}[q(X', y_i)^s]}$ concentrates at $I^{\text{gmi}}(s)$ by the law of large numbers.
Union bound over codewords
Union-bounding over the $2^{nR} - 1$ incorrect codewords, the ensemble-average error probability is bounded by
$$\bar{P}_e \le 2^{nR} \cdot 2^{-n\left(I^{\text{gmi}}(s) + o(1)\right)}.$$
This tends to zero iff $R < I^{\text{gmi}}(s)$ (with the GMI measured in bits per symbol). Optimising over $s > 0$ gives the largest achievable rate $I^{\text{gmi}} = \sup_{s > 0} I^{\text{gmi}}(s)$.
CTB specialisation at $s = 1$
At $s = 1$, the inner expectation is, for the BICM product metric and uniform inputs,
$$\mathbb{E}_{X'}\!\left[q(X', Y)\right] = \prod_{\ell=1}^{m} \frac{1}{2} \sum_{b \in \{0,1\}} q_\ell(b, Y).$$
Therefore
$$I^{\text{gmi}}(1) = \sum_{\ell=1}^{m} \mathbb{E}\!\left[\log_2 \frac{q_\ell(B_\ell, Y)}{\tfrac{1}{2} \sum_{b} q_\ell(b, Y)}\right] = \sum_{\ell=1}^{m} I(B_\ell; Y).$$
This is the Caire-Taricco-Biglieri formula. The GMI at $s = 1$ equals the BICM capacity of Ch. 5 exactly — proving that the CTB formula is a rigorous mismatched-decoding achievability result.
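The specialisation is easy to check numerically: the $\ell$-th GMI term at $s = 1$ and the bit-channel mutual information $I(B_\ell; Y) = 1 - \mathbb{E}[H(B_\ell \mid Y)]$ are two different Monte-Carlo estimators of the same quantity. A sketch reusing `gray_qam` from the earlier snippet:

```python
# Sanity check of the s = 1 specialisation: per-position GMI terms vs.
# bit-channel mutual informations estimated from the bit posteriors.
import numpy as np

def ctb_check(snr_db, nbits_axis=2, n=200_000, seed=1):
    rng = np.random.default_rng(seed)
    x, b = gray_qam(nbits_axis)
    sigma2 = 10 ** (-snr_db / 10)
    idx = rng.integers(len(x), size=n)
    y = x[idx] + np.sqrt(sigma2 / 2) * (rng.standard_normal(n)
                                        + 1j * rng.standard_normal(n))
    w = np.exp(-np.abs(y[:, None] - x[None, :]) ** 2 / sigma2)
    gmi1 = ctb = 0.0
    for ell in range(b.shape[1]):
        q1 = w[:, b[:, ell] == 1].sum(axis=1)
        q0 = w[:, b[:, ell] == 0].sum(axis=1)
        qt = np.where(b[idx, ell] == 1, q1, q0)
        gmi1 += np.mean(np.log2(2.0 * qt / (q0 + q1)))   # GMI(1) term
        p1 = np.clip(q1 / (q0 + q1), 1e-12, 1 - 1e-12)   # P(B_ell = 1 | y)
        h = -(p1 * np.log2(p1) + (1 - p1) * np.log2(1 - p1))
        ctb += 1.0 - np.mean(h)                          # I(B_ell; Y)
    return gmi1, ctb    # both estimate sum_ell I(B_ell; Y)
```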
Gray-QAM asymptotics
For Gray labelling on square $M$-QAM over AWGN at high SNR, the per-position bit channels become asymptotically symmetric and Gaussian, so the GMI of the product metric approaches the GMI of the genuine channel, and $s^\star \to 1$ — i.e., $I^{\text{gmi}} \to I^{\text{gmi}}(1) = C_{\text{BICM}}$. For non-Gray labellings the asymmetry between the per-position channels forces $s^\star \ne 1$.
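A quick numerical illustration of the limit, reusing `gmi_bicm` and SciPy's scalar minimiser (Gray 16-QAM assumed): the saddle point drifts toward 1 as the SNR grows.

```python
# Gray high-SNR limit: s* -> 1 as SNR increases.
from scipy.optimize import minimize_scalar

for snr_db in (0.0, 5.0, 10.0, 15.0):
    res = minimize_scalar(lambda s, snr=snr_db: -gmi_bicm(snr, s),
                          bounds=(0.3, 2.0), method="bounded")
    print(f"{snr_db:5.1f} dB: s* ~ {res.x:.3f}")
```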
GMI vs Decoder Scaling at Fixed SNR
For fixed $M$-QAM and fixed SNR, the GMI of the BICM product metric is a concave function of the decoder scaling parameter $s$. This interactive plot sweeps $s$ and shows the GMI curve together with its maximiser $s^\star$ (marked by a red dot) and the value $I^{\text{gmi}}(1)$ (the CTB BICM capacity, marked by a horizontal dashed line). For Gray 16-QAM at 10 dB the curve is nearly flat near $s = 1$ and $s^\star \approx 1$. Drop the SNR to 0 dB and the peak shifts slightly to $s^\star < 1$ — you can read off the GMI gain $I^{\text{gmi}}(s^\star) - I^{\text{gmi}}(1)$. The saddle-point structure is what makes the optimisation cheap in practice.
GMI Saddle-Point: Finding $s^\star$
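For readers without the interactive page, a static stand-in for the sweep, assuming the `gmi_bicm` helper above (Gray 16-QAM at 10 dB; try 0 dB to see the peak move):

```python
# Static version of the GMI-vs-s sweep plot.
import numpy as np
import matplotlib.pyplot as plt

snr_db = 10.0
s_grid = np.linspace(0.3, 2.0, 60)
g = np.array([gmi_bicm(snr_db, s) for s in s_grid])
i = int(g.argmax())
plt.plot(s_grid, g, label="GMI(s)")
plt.axhline(gmi_bicm(snr_db, 1.0), ls="--", label="GMI(1) = CTB capacity")
plt.plot(s_grid[i], g[i], "ro", label=f"s* = {s_grid[i]:.2f}")
plt.xlabel("decoder scaling s")
plt.ylabel("bits/symbol")
plt.legend()
plt.show()
```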
Example: 64-QAM at Low SNR: $s^\star$ Departs from 1
For 64-QAM with Gray labelling on AWGN at $\mathrm{SNR} = 0$ dB, numerically locate the GMI saddle-point $s^\star$, compute $I^{\text{gmi}}(1)$ and $I^{\text{gmi}}(s^\star)$, and express the rate improvement in bits and in dB of SNR-equivalent.
Evaluate GMI$(s = 1)$ at 0 dB
At $\mathrm{SNR} = 0$ dB (linear SNR $= 1$), for 64-QAM with Gray labelling, direct numerical evaluation gives an $I^{\text{gmi}}(1)$ just under 1 bit/symbol (a heavily power-limited regime — only a small fraction of the 6 label bits per symbol is usable at this SNR).
Sweep $s$ and locate the peak
A numerical sweep of $s$ on a fine grid around $s = 1$ (say $s \in [0.5, 1.5]$ in steps of $0.01$) shows a clear peak at an $s^\star$ strictly below 1, with $I^{\text{gmi}}(s^\star) > I^{\text{gmi}}(1)$. The gain over $s = 1$ is a small fraction of a bit per symbol — small but numerically real.
Convert to SNR-equivalent
At this SNR, read off the local slope of the rate curve, $dI^{\text{gmi}}(1)/d\,\mathrm{SNR_{dB}}$, in bits/dB. A rate gain of $\Delta I$ bits/symbol then corresponds to $\Delta I$ divided by that slope in dB of SNR — here a small fraction of a dB. This is the decoder-scaling dividend: a quiet but measurable improvement at low SNR without changing the transmitter at all. In practice, 5G NR decoders do not implement this scaling — it is a research-grade optimisation. A runnable version of the whole pipeline follows below.
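An end-to-end sketch of this example, assuming the `gray_qam`/`gmi_bicm` helpers defined earlier: sweep $s$ for Gray 64-QAM at 0 dB, locate the peak, and convert the rate gain to an SNR-equivalent via a central-difference estimate of the slope in bits/dB. The fixed seed inside `gmi_bicm` gives common random numbers across the sweep, which stabilises the tiny gain estimate.

```python
# Worked example: 64-QAM Gray at 0 dB -- s*, rate gain, SNR-equivalent.
import numpy as np

snr_db, m_axis, n = 0.0, 3, 100_000           # 64-QAM = 3 bits per axis
s_grid = np.arange(0.5, 1.51, 0.01)
g = np.array([gmi_bicm(snr_db, s, nbits_axis=m_axis, n=n) for s in s_grid])
i = int(g.argmax())
gain = g[i] - gmi_bicm(snr_db, 1.0, nbits_axis=m_axis, n=n)
d = 0.5                                       # dB step for the slope estimate
slope = (gmi_bicm(snr_db + d, 1.0, nbits_axis=m_axis, n=n)
         - gmi_bicm(snr_db - d, 1.0, nbits_axis=m_axis, n=n)) / (2 * d)
print(f"s* ~ {s_grid[i]:.2f}, gain ~ {gain:.4f} bits/symbol"
      f" ~ {gain / slope:.3f} dB SNR-equivalent")
```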
Why $s^\star < 1$ here
At low SNR the LLRs produced by the max-log demapper are noisy and slightly overconfident. Choosing $s < 1$ down-scales the LLRs towards uniformity, effectively applying a "temperature" correction to the decoder that better matches the true posterior at low SNR. At high SNR this effect vanishes and $s^\star \to 1$.
Theorem: BICM GMI at $s = 1$ Equals the CTB Capacity
For any memoryless channel $p(y|x)$, any constellation $\mathcal{X}$ of size $2^m$, any labelling $\mu$, and uniform input,
$$I^{\text{gmi}}(1) = \sum_{\ell=1}^{m} I(B_\ell; Y) = C_{\text{BICM}}.$$
The Caire-Taricco-Biglieri 1998 BICM capacity is the GMI of the product bit metric at scaling $s = 1$.
At $s = 1$ the GMI's inner expectation collapses to $\mathbb{E}_{X'}[q(X', Y)]$ — and for uniform inputs with the product bit metric this factors across bit positions into a product of per-position mixtures $\tfrac{1}{2} \sum_{b} q_\ell(b, Y)$. The GMI then becomes a sum over bit positions of the mutual information of the $\ell$-th bit channel — exactly the CTB formula.
Start from the GMI formula at $s = 1$.
Use uniform inputs and the product-metric factorisation to split both numerator and denominator into a product over $\ell = 1, \dots, m$.
The $\log$ of the ratio of products becomes a sum of log-ratios — each one a per-position mutual information.
Write out the GMI at $s=1$
$$I^{\text{gmi}}(1) = \mathbb{E}\!\left[\log_2 \frac{q(X, Y)}{\mathbb{E}_{X'}[q(X', Y)]}\right].$$
Factorise numerator
$q(X, Y) = \prod_{\ell=1}^{m} q_\ell(B_\ell, Y)$ by definition; hence $\log_2 q(X, Y) = \sum_{\ell=1}^{m} \log_2 q_\ell(B_\ell, Y)$.
Factorise denominator
With uniform input on $\mathcal{X}$, the marginal over the $\ell$-th bit is uniform on $\{0, 1\}$, and the bits are independent. Therefore
$$\mathbb{E}_{X'}[q(X', Y)] = \prod_{\ell=1}^{m} \frac{1}{2} \sum_{b \in \{0,1\}} q_\ell(b, Y).$$
Combine
$$I^{\text{gmi}}(1) = \sum_{\ell=1}^{m} \mathbb{E}\!\left[\log_2 \frac{q_\ell(B_\ell, Y)}{\tfrac{1}{2} \sum_{b} q_\ell(b, Y)}\right] = \sum_{\ell=1}^{m} I(B_\ell; Y) = C_{\text{BICM}}. \qquad \blacksquare$$
At High SNR Under Gray, $s^\star \to 1$
For square $M$-QAM with Gray labelling on AWGN, the per-position bit channels are, at high SNR, symmetric binary-input quasi-Gaussian channels whose decision regions are dominated by a single pair of nearest-neighbour constellation points. The product metric is asymptotically proportional to the squared Euclidean distance from $y$ to the nearest constellation point — i.e., it coincides with the true symbol log-likelihood up to a constant. The mismatch vanishes, and the saddle-point of the GMI is at $s^\star = 1$.
This is the theoretical underpinning of the empirical observation from Chapter 5 that Gray-BICM and CM capacity differ by only a small fraction of a bit at moderate-to-high SNR on square QAM. The GMI framework makes this rigorous via the saddle-point $s^\star \to 1$; the CTB formula is exact as an achievability result in this regime, not just approximate.
At low SNR, or with non-Gray labellings, the per-position channels are far from symmetric Gaussian and the mismatch does not vanish. The saddle point moves away from $s = 1$, and the achievable rate $I^{\text{gmi}}(s^\star)$ exceeds the naive CTB capacity by a small but measurable amount. This is the operational payoff of the GMI framework: a tighter achievability bound than CTB at low SNR, obtained for free by tuning $s$.
Common Mistake: The $s$ Scaling Is Not the Same as Scaling the LLRs
Mistake:
Thinking that "GMI scaling $s$" is the same as the familiar engineering practice of multiplying the LLRs by a constant before feeding them to the LDPC decoder.
Correction:
The GMI's $s$ is applied inside the exponent of the metric: $q(x, y)^s = \prod_{\ell=1}^{m} q_\ell(b_\ell(x), y)^s$. In LLR language, this scales the log-metric, i.e., multiplies the per-bit LLR by $s$. So for the BICM product metric specifically, GMI scaling is indeed equivalent to scaling the LLRs — but this equivalence only holds because the BICM metric is a product. For non-product metrics (e.g., probabilistic-shaping LLRs with non-uniform priors) the $s$ scaling affects the metric in a more subtle way.
Also note that LLR scaling in practical receivers is typically done for fixed-point arithmetic reasons (avoiding saturation) or for estimator-mismatch correction (wrong noise variance). The GMI scaling $s^\star$ is an information-theoretic optimum; it is usually not what a real decoder's LLR-scale tap is tuned to. The two concepts coincide only in the idealised product-metric setting.
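The product-metric equivalence can be seen in one line: raising a bit metric to the power $s$ multiplies its LLR by $s$. A tiny numeric illustration with hypothetical bit-metric values `q0`, `q1`:

```python
# Scaling the metric exponent == scaling the per-bit LLR (product metric).
import numpy as np

q0, q1, s = 0.2, 0.7, 0.8               # hypothetical bit metrics and scaling
llr = np.log(q1 / q0)                   # LLR from the unscaled metric
llr_s = np.log(q1**s / q0**s)           # LLR from the s-scaled metric
assert np.isclose(llr_s, s * llr)
```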
LLR Scaling Taps in Practical Receivers
Commercial LDPC decoders in 5G NR, Wi-Fi, and DVB-S2/S2X implement an LLR-scaling tap that multiplies the per-bit LLR by a scalar gain before the min-sum iterations begin. The gain is typically set by one of:
- Saturation management — fixed-point LLRs must stay inside the decoder's word width (typically 5–6 bits). Scaling is chosen so that 99.9% of LLRs fit without clipping.
- Noise-variance correction — if the receiver's estimate of the noise variance $\sigma^2$ is off by a factor $\alpha$, the LLRs are rescaled by $\alpha$ to compensate.
- Damping / pre-conditioning — to improve LDPC convergence at low SNR, a sub-unity scale is sometimes used (damping).
None of these is the GMI-optimal $s^\star$ of §3. Implementing $s^\star$ would require estimating the bit-channel marginals in real time, computing the GMI curve, and locating its peak — a computation that no current silicon does. The rate gain from doing so is a small fraction of a dB SNR-equivalent at moderate SNR, well below the margin of standard imperfections; so for now, this remains a research knob.
- LLR-scaling taps exist in 5G/Wi-Fi/DVB decoders but are set for fixed-point arithmetic, not GMI optimality.
- The GMI-optimal $s^\star$ requires runtime estimation of the bit-channel marginals — not implemented.
- The GMI gain at moderate SNR is a small fraction of a dB, below the margin of other receiver imperfections.
Quick Check
At $s = 1$, the BICM GMI equals
the Caire-Taricco-Biglieri BICM capacity
the CM capacity
the Shannon capacity
the cutoff rate
At $s = 1$, the product-metric GMI factorises over bit positions and its $\ell$-th term is exactly $I(B_\ell; Y)$. This is the content of the theorem "BICM GMI at $s = 1$ Equals the CTB Capacity" — the CTB formula is the GMI at $s = 1$.
Quick Check
For Gray-labelled $M$-QAM on AWGN at high SNR, the GMI-optimal decoder scaling $s^\star$ is
$s^\star \to 1$, and the GMI approaches the CTB capacity
$s^\star \to 0$, because the decoder should ignore the LLRs
$s^\star \to \infty$, because the decoder should use only the most confident LLRs
a fixed $s^\star \ne 1$, independent of SNR
High-SNR Gray-QAM has symmetric, quasi-Gaussian per-position bit channels. The saddle-point equation $\tfrac{d}{ds} I^{\text{gmi}}(s) = 0$ is satisfied at $s = 1$, and the GMI equals $C_{\text{BICM}}$ exactly. This is the regime in which the CTB formula is tight.
Decoder Scaling Parameter $s$
The exponent parameter $s$ in the mismatched-decoding metric $q(x, y)^s$ that is optimised to maximise the generalised mutual information. For the BICM product metric, scaling by $s$ is equivalent to multiplying the per-bit LLRs by $s$. At $s = 1$ the GMI reduces to the Caire-Taricco-Biglieri BICM capacity.
Related: Generalised Mutual Information, Mismatched Maximum-Metric Decoding, Product Bit Metric (BICM)
GMI Saddle-Point $s^\star$
The unique maximiser $s^\star$ of $I^{\text{gmi}}(s)$ over $s > 0$; the decoder scaling that achieves the largest random-coding rate under the given mismatched metric. For Gray-QAM on AWGN at high SNR, $s^\star \approx 1$; at low SNR or with non-Gray labellings, $s^\star \ne 1$ and the GMI exceeds the CTB capacity strictly.
Related: Decoder Scaling Parameter , Generalised Mutual Information, BICM as Parallel Binary Channels