Generalised Mutual Information

The Rate That Mismatched BICM Actually Achieves

We now turn §2's mismatched-decoding framework into a rate that we can compute and compare with the CM capacity. The object we need is the generalised mutual information $I^{\mathrm{GMI}}(s)$: a one-parameter family of lower bounds on the mismatched capacity, indexed by a decoder scaling parameter $s > 0$.

The story goes like this. A random-coding argument over i.i.d. uniform codebooks, with a decoder that maximises $\prod_n q(y_n, x_n)^s$ in place of the true likelihood, produces a reliability condition of the form $R < I^{\mathrm{GMI}}(s)$: a rate threshold that depends on $s$ through a Chernoff-style bound on the pairwise error probability. The largest such rate, optimised over $s$, is the best achievable rate the random-coding argument can deliver:
$$R^\star \;=\; \sup_{s > 0} I^{\mathrm{GMI}}(s).$$
This is Theorem 3.1 of the Guillén-Martínez-Caire monograph for BICM, and it is the central achievability result of this chapter.

Two operational facts stand out. First, at the "natural" choice $s = 1$, the GMI of the BICM product metric reduces exactly to the Caire-Taricco-Biglieri capacity $\sum_\ell C_\ell$ of Chapter 5, a beautiful conceptual closure: the heuristic BICM formula is a rigorous mismatched-decoding achievability result at $s = 1$. Second, there is a saddle-point structure to the optimisation in $s$: for Gray labelling on QAM at high SNR, $s^\star = 1$; at low SNR, $s^\star \ne 1$ and the sup is achieved strictly above $I^{\mathrm{GMI}}(1)$. Operationally, the scaling $s$ acts as a "temperature" on the decoder: at low temperature ($s$ large) the decoder is overconfident; at high temperature ($s$ small) it averages away useful signal. The optimum balances these two effects.

This section develops the GMI formula, proves the achievability theorem, identifies the $s = 1$ specialisation with the CTB formula, and shows when and how the optimal scaling $s^\star$ departs from 1.


Definition: Generalised Mutual Information

Let $p(y\mid x)$ be a memoryless channel, $q(y, x)$ a decoding metric, and $P_X$ an input distribution on $\mathcal{X}$. The generalised mutual information (GMI) at decoder scaling $s > 0$ is
$$I^{\mathrm{GMI}}(s; q, P_X) \;=\; \mathbb{E}_{P_{X,Y}}\!\left[ \log \frac{q(Y, X)^s}{\mathbb{E}_{P_X}\!\left[ q(Y, \bar X)^s \right]} \right],$$
where the inner expectation is over an independent copy $\bar X \sim P_X$ of the input. Equivalently, writing $q_s(y) = \sum_{\bar x} P_X(\bar x)\, q(y, \bar x)^s$ for the metric's output mixture, the GMI decomposes entropy-style as
$$I^{\mathrm{GMI}}(s) \;=\; H_{P,q,s}(Y) - H_{P,q,s}(Y \mid X), \qquad H_{P,q,s}(Y) = -\mathbb{E}_P\!\left[\log q_s(Y)\right], \quad H_{P,q,s}(Y\mid X) = -\mathbb{E}_P\!\left[\log q(Y,X)^s\right].$$
The "mutual information" nomenclature is historical: at $s = 1$ and $q = p$, the GMI reduces to the classical mutual information $I(X; Y)$.

For the BICM product metric $q_{\rm BICM}$ with uniform input over $\mathcal{X}$, the GMI simplifies because $q_{\rm BICM}(y, x) = \prod_\ell p_{W_\ell}(y\mid b_\ell)$ and the inner expectation factorises over bit positions (uniform inputs make the label bits i.i.d.\ uniform binary):
$$I^{\mathrm{GMI}}_{\rm BICM}(s) \;=\; \sum_{\ell = 0}^{L-1} \mathbb{E}_{Y, B_\ell}\!\left[\log \frac{p_{W_\ell}(Y\mid B_\ell)^s}{\tfrac{1}{2}\sum_{b\in\{0,1\}} p_{W_\ell}(Y\mid b)^s}\right].$$
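As a sanity check on this factorised formula, here is a minimal Monte-Carlo sketch for Gray-labelled 4-PAM on real-valued AWGN. The constellation, seed, sample size, and function name are illustrative choices of mine, not from the text; the estimate at $s = 1$ approximates the CTB BICM capacity of this constellation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Gray-labelled 4-PAM (L = 2 bits/symbol), normalised to unit average energy.
points = np.array([-3.0, -1.0, 1.0, 3.0]) / np.sqrt(5.0)
labels = np.array([[0, 0], [0, 1], [1, 1], [1, 0]])   # Gray map per level

def bicm_gmi(s, snr_db, n=200_000):
    """Monte-Carlo estimate of the BICM GMI at scaling s, in bits/symbol."""
    sigma2 = 10.0 ** (-snr_db / 10.0)
    idx = rng.integers(0, 4, size=n)                   # uniform input symbols
    y = points[idx] + rng.normal(scale=np.sqrt(sigma2), size=n)
    # Symbol likelihoods p(y|x) for every constellation point.
    lik = np.exp(-(y[:, None] - points[None, :]) ** 2 / (2.0 * sigma2))
    gmi = 0.0
    for ell in range(2):
        is0 = labels[:, ell] == 0
        p0 = lik[:, is0].mean(axis=1)                  # p_{W_ell}(y | b = 0)
        p1 = lik[:, ~is0].mean(axis=1)                 # p_{W_ell}(y | b = 1)
        num = np.where(labels[idx, ell] == 0, p0, p1) ** s
        den = 0.5 * (p0 ** s + p1 ** s)                # mixture over b
        gmi += np.mean(np.log2(num / den))
    return gmi

print(bicm_gmi(1.0, 10.0))   # approximately the CTB capacity at 10 dB
```

The per-position bit-channel likelihoods $p_{W_\ell}(y\mid b)$ are obtained by averaging the symbol likelihoods over the points whose $\ell$-th label bit equals $b$, exactly as in the uniform-input definition of $W_\ell$.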

Three points of intuition:

(i) The numerator is the metric evaluated on the transmitted bit $B_\ell$; the denominator averages over all possible inputs. The ratio is large when the true $x$ wins the metric competition by a wide margin, which is what enables reliable decoding.

(ii) At $s = 1$, $q = p$, and uniform $P_X$: the GMI is Shannon's mutual information $I(Y; X)$. For mismatched $q$, the GMI is strictly less in general, and can be improved by choosing $s \ne 1$.

(iii) The GMI is concave in $s$ for the product metric (a consequence of the convexity of $s \mapsto \log \mathbb{E}[q^s]$), so $\sup_s I^{\mathrm{GMI}}(s)$ is attained at an interior point $s^\star$ satisfying the first-order condition $\partial I^{\mathrm{GMI}}/\partial s = 0$. The saddle-point structure makes the optimisation numerically painless.
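To see the first-order condition at work in a case where everything is closed-form, the sketch below (crossover probabilities are hypothetical) maximises the GMI of a BSC(0.1) decoded with a mismatched BSC(0.2) metric. Here $s^\star = \log_4 9 \approx 1.585$ makes $q^s$ proportional to the true likelihood, so the optimised GMI recovers the matched mutual information $1 - h(0.1)$:

```python
import numpy as np
from scipy.optimize import minimize_scalar

delta, delta_hat = 0.1, 0.2      # true / assumed crossover probabilities

def gmi(s):
    """Closed-form GMI(s) in bits: BSC(delta) decoded with a BSC(delta_hat) metric."""
    good, bad = (1.0 - delta_hat) ** s, delta_hat ** s
    denom = 0.5 * (good + bad)   # E over an independent uniform X-bar
    return ((1.0 - delta) * np.log2(good / denom)
            + delta * np.log2(bad / denom))

# GMI(s) is concave (log-sum-exp convexity), so a bounded 1-D search finds s*.
res = minimize_scalar(lambda s: -gmi(s), bounds=(1e-3, 10.0), method="bounded")
s_star = res.x
print(s_star, gmi(s_star), gmi(1.0))   # gmi(s*) > gmi(1) for this mismatch
```

The gain $\mathrm{GMI}(s^\star) - \mathrm{GMI}(1)$ is exactly the rate left on the table by decoding at the "naive" scaling $s = 1$.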


Theorem: BICM Achievability via GMI (Guillén-Martínez-Caire)

Consider a BICM system with labelling $\mu$, memoryless channel $p(y\mid x)$, uniform inputs $P_X$, and the BICM product bit metric $q_{\rm BICM}(y, x) = \prod_\ell p_{W_\ell}(y\mid b_\ell)$.

(Achievability.) For every rate
$$R \;<\; I^{\mathrm{GMI}}_{\rm BICM}(s) \quad \text{for some } s > 0,$$
there exists a sequence of block codes of blocklength $N$ whose error probability under the mismatched decoder $\hat x = \arg\max \prod_n q_{\rm BICM}(y_n, x_n)^s$ tends to zero as $N \to \infty$. The largest such rate is
$$R^\star_{\rm BICM} \;=\; \sup_{s > 0} I^{\mathrm{GMI}}_{\rm BICM}(s).$$

(CTB specialisation.) At $s = 1$, the GMI reduces to the Caire-Taricco-Biglieri BICM capacity:
$$I^{\mathrm{GMI}}_{\rm BICM}(1) \;=\; \sum_{\ell = 0}^{L-1} I(Y; B_\ell) \;=\; C_{\rm BICM}(\mu).$$

(Gray-optimality of $s = 1$.) For Gray labelling on square $M$-QAM over AWGN at high SNR, $s^\star \to 1$, and therefore $R^\star_{\rm BICM} \to C_{\rm BICM}(\mu)$. For non-Gray labellings and/or low SNR, $s^\star \ne 1$ and $R^\star_{\rm BICM}$ exceeds $C_{\rm BICM}(\mu)$ strictly.

The random-coding argument for mismatched decoding is almost word-for-word Shannon's 1948 random-coding argument, with two modifications. First, the maximum-likelihood decoder is replaced by the maximum-metric decoder; the per-symbol metric is raised to the power $s$ before the log-metrics are summed across the block. Second, the Chernoff/tilting trick that bounds the pairwise error probability introduces $s$ as a free parameter; optimising over $s$ yields the best achievable rate from this argument.

The GMI reduces to Shannon's mutual information when the metric is the true likelihood and $s = 1$; it reduces to the CTB BICM capacity when the metric is the product bit metric and $s = 1$. The GMI framework thus unifies matched and mismatched decoding into a single scalar achievability formula.


GMI$(s)$ vs Decoder Scaling $s$ at Fixed SNR

For fixed $M$-QAM and fixed SNR, the GMI of the BICM product metric is a concave function of the decoder scaling parameter $s > 0$. This interactive plot sweeps $s \in (0, 3]$ and shows the GMI curve together with its maximiser $s^\star$ (marked by a red dot) and the value $I^{\mathrm{GMI}}(1)$ (the CTB BICM capacity, marked by a horizontal dashed line). For Gray 16-QAM at 10 dB the curve is nearly flat near $s = 1$ and $s^\star \approx 1$. Drop the SNR to 0 dB and the peak shifts slightly to $s^\star \ne 1$, and you can read off the GMI gain. The saddle-point structure is what makes the optimisation cheap in practice.


GMI Saddle-Point: Finding $s^\star$

Animated walk-through of the saddle-point in the GMI computation. The video sweeps the decoder scaling parameter $s$ from $0$ to $3$ and shows the GMI function $I^{\mathrm{GMI}}(s)$ tracing out a concave curve. As the SNR is varied, the peak location $s^\star$ shifts: at high SNR with Gray labelling, $s^\star \approx 1$ and the curve is nearly flat at the peak; at low SNR, $s^\star$ moves away from 1 and the curve becomes sharply peaked. The saddle-point equation $\partial I^{\mathrm{GMI}}/\partial s = 0$ is illustrated geometrically: the optimal $s^\star$ is where the tangent is horizontal. This is the operational knob that the BICM decoder could tune for a small rate boost at low SNR.
The GMI $I^{\mathrm{GMI}}(s)$ is concave in $s$. Its maximum $\sup_s I^{\mathrm{GMI}}(s)$ is the achievable rate of the BICM mismatched decoder. At $s = 1$, the GMI equals $\sum_\ell I(Y; B_\ell)$, the Caire-Taricco-Biglieri formula.

Example: 64-QAM at Low SNR: $s^\star$ Departs from 1

For 64-QAM with Gray labelling on AWGN at $\text{SNR} = 0$ dB, numerically locate the GMI saddle-point $s^\star$, compute $I^{\mathrm{GMI}}(s^\star)$ and $I^{\mathrm{GMI}}(1)$, and express the rate improvement in bits and in dB of SNR-equivalent.
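A possible numerical sketch of this example (the grid, sample sizes, and helper names are my own choices, and the Monte-Carlo estimates carry noise on the order of $0.01$ bit, so the located $s^\star$ is approximate):

```python
import numpy as np

rng = np.random.default_rng(1)

def gray_pam(n_bits):
    """Amplitude levels of a Gray-labelled 2**n_bits-PAM and their bit labels."""
    m = 1 << n_bits
    levels = 2.0 * np.arange(m) - (m - 1)
    gray = np.arange(m) ^ (np.arange(m) >> 1)        # Gray code per level
    bits = (gray[:, None] >> np.arange(n_bits)) & 1  # shape (m, n_bits)
    return levels, bits

# Square 64-QAM = independent Gray 8-PAM on I and Q; unit average energy.
lev, pam_bits = gray_pam(3)
points = (lev[:, None] + 1j * lev[None, :]).ravel()
points = points / np.sqrt(np.mean(np.abs(points) ** 2))
labels = np.hstack([np.repeat(pam_bits, 8, axis=0),  # I-rail bits
                    np.tile(pam_bits, (8, 1))])      # Q-rail bits: 6 bits/symbol

def gmi(s, snr_db, n=50_000):
    """Monte-Carlo BICM GMI (bits/symbol) at decoder scaling s."""
    sigma2 = 10.0 ** (-snr_db / 10.0)
    idx = rng.integers(0, 64, size=n)
    noise = (rng.normal(size=n) + 1j * rng.normal(size=n)) * np.sqrt(sigma2 / 2.0)
    y = points[idx] + noise
    lik = np.exp(-np.abs(y[:, None] - points[None, :]) ** 2 / sigma2)
    total = 0.0
    for ell in range(6):
        p0 = lik[:, labels[:, ell] == 0].mean(axis=1)
        p1 = lik[:, labels[:, ell] == 1].mean(axis=1)
        num = np.where(labels[idx, ell] == 0, p0, p1) ** s
        total += np.mean(np.log2(num / (0.5 * (p0 ** s + p1 ** s))))
    return total

# Sweep s over a grid (1.0 included) and pick the empirical maximiser.
s_grid = np.linspace(0.4, 2.0, 17)
vals = np.array([gmi(s, 0.0) for s in s_grid])
s_star = s_grid[int(np.argmax(vals))]
print(s_star, vals.max(), gmi(1.0, 0.0))
```

The dB-equivalent of the rate gain can then be read off by finding the SNR at which `gmi(1.0, snr_db)` matches `vals.max()`.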


Theorem: BICM GMI at $s = 1$ Equals the CTB Capacity

For any memoryless channel $p(y\mid x)$, any constellation $\mathcal{X}$ of size $M = 2^L$, any labelling $\mu$, and uniform input,
$$I^{\mathrm{GMI}}_{\rm BICM}(s = 1) \;=\; \sum_{\ell = 0}^{L-1} I(Y; B_\ell) \;=\; C_{\rm BICM}(\mu).$$
The Caire-Taricco-Biglieri 1998 BICM capacity is the GMI of the product bit metric at scaling $s = 1$.

At $s = 1$ the GMI's inner expectation collapses to $\mathbb{E}_{\bar X}[q(y, \bar X)]$, and for uniform inputs with the product bit metric this factorises across bit positions into a product of per-position mixtures $\tfrac{1}{2}\bigl(p_{W_\ell}(y\mid 0) + p_{W_\ell}(y\mid 1)\bigr)$. The GMI then becomes a sum over bit positions of the mutual information of the $\ell$-th bit channel: exactly the CTB formula.
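In detail: because the labelling is a bijection between $\mathcal{X}$ and $\{0,1\}^L$, the sum over symbols of a product over bit positions interchanges into a product of per-position sums, and the $s = 1$ GMI collapses as claimed:

```latex
\mathbb{E}_{\bar X}\bigl[q_{\rm BICM}(y,\bar X)\bigr]
  = \frac{1}{2^L}\sum_{b_0,\dots,b_{L-1}\in\{0,1\}}\ \prod_{\ell=0}^{L-1} p_{W_\ell}(y\mid b_\ell)
  = \prod_{\ell=0}^{L-1} \tfrac{1}{2}\bigl(p_{W_\ell}(y\mid 0) + p_{W_\ell}(y\mid 1)\bigr)
% hence, since (1/2) \sum_b p_{W_\ell}(y \mid b) is the output density of bit channel \ell:
I^{\mathrm{GMI}}_{\rm BICM}(1)
  = \sum_{\ell=0}^{L-1} \mathbb{E}\!\left[\log
      \frac{p_{W_\ell}(Y\mid B_\ell)}{\tfrac{1}{2}\sum_{b\in\{0,1\}} p_{W_\ell}(Y\mid b)}\right]
  = \sum_{\ell=0}^{L-1} I(Y; B_\ell)
```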


At High SNR Under Gray Labelling, $s^\star \to 1$

For square $M$-QAM with Gray labelling on AWGN, the per-position bit channels $W_\ell$ are, at high SNR, symmetric binary-input quasi-Gaussian channels whose decision regions are dominated by a single pair of nearest-neighbour constellation points. The product metric is asymptotically proportional to the squared Euclidean distance from $y$ to the constellation point, i.e., it coincides with the true symbol log-likelihood up to a constant. The mismatch vanishes, and the saddle-point of the GMI is at $s^\star = 1$.

This is the theoretical underpinning of the empirical observation from Chapter 5 that Gray-BICM and CM capacity differ by only $\lesssim 0.05$ bits at moderate-to-high SNR on square QAM. The GMI framework makes this rigorous via the saddle-point $s^\star \to 1$; the CTB formula is exact as an achievability result in this regime, not just approximate.

At low SNR, or with non-Gray labellings, the per-position channels are far from symmetric Gaussian and the mismatch is $O(1)$. The saddle point moves away from $s = 1$, and the achievable rate $\sup_s I^{\mathrm{GMI}}(s)$ exceeds the naive CTB capacity by a small but measurable amount. This is the operational payoff of the GMI framework: a tighter achievability bound than CTB at low SNR, obtained for free by tuning $s$.

Common Mistake: The Scaling $s$ Is Not the Same as Scaling the LLRs

Mistake:

Thinking that "GMI scaling $s$" is the same as the familiar engineering practice of multiplying the LLRs by a constant before feeding them to the LDPC decoder.

Correction:

The GMI's $s$ is applied inside the exponent of the metric:
$$q(y, x)^s \;=\; \prod_\ell p_{W_\ell}(y\mid b_\ell)^s.$$
In LLR language, this scales the log-metric, i.e., multiplies the per-bit LLR by $s$. So for the BICM product metric specifically, GMI scaling is indeed equivalent to scaling the LLRs, but this equivalence only holds because the BICM metric is a product. For non-product metrics (e.g., probabilistic-shaping LLRs with non-uniform priors) the scaling affects the metric in a more subtle way.
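A three-line check of this equivalence (the likelihood values are illustrative numbers, not from the text): raising the per-bit metric to the power $s$ multiplies the per-bit LLR by exactly $s$.

```python
import numpy as np

# Hypothetical bit-channel likelihoods for one observed y.
p0, p1 = 0.7, 0.2    # p_{W_ell}(y | b = 0), p_{W_ell}(y | b = 1)

for s in (0.5, 1.0, 2.0):
    llr = np.log(p1 / p0)                        # LLR of the unscaled metric
    llr_s = np.log(p1 ** s) - np.log(p0 ** s)    # LLR after raising the metric to s
    assert np.isclose(llr_s, s * llr)            # exponent s <=> LLR gain s
```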

Also note that LLR scaling in practical receivers is typically done for fixed-point arithmetic reasons (avoiding saturation) or for estimator-mismatch correction (wrong noise variance). The GMI scaling $s^\star$ is an information-theoretic optimum; it is usually not what a real decoder's LLR-scale tap is tuned to. The two concepts coincide only in the idealised product-metric setting.

🔧Engineering Note

LLR Scaling Taps in Practical Receivers

Commercial LDPC decoders in 5G NR, Wi-Fi, and DVB-S2/S2X implement an LLR-scaling tap that multiplies the per-bit LLR by a scalar gain before the min-sum iterations begin. The gain is typically set by one of:

  1. Saturation management — fixed-point LLRs must stay inside the decoder's word width (typically 5–6 bits). Scaling is chosen so that 99.9% of LLRs fit without clipping.
  2. Noise-variance correction: if the receiver's estimate of $\sigma^2$ is off by a factor $\alpha$, the LLRs are scaled by $1/\alpha$ to compensate.
  3. Damping / pre-conditioning — to improve LDPC convergence at low SNR, a sub-unity scale is sometimes used (damping).

None of these is the GMI-optimal $s^\star$ of §3. Implementing $s^\star$ would require estimating the bit-channel marginals $p_{W_\ell}(y\mid b)$ in real time, computing the GMI curve, and locating its peak: a computation that no current silicon does. The rate gain from doing so is $\lesssim 0.1$ dB SNR-equivalent at moderate SNR, well below the margin of standard imperfections; so for now, this remains a research knob.

Practical Constraints
  • LLR-scaling taps exist in 5G/WiFi/DVB decoders but are set for fixed-point arithmetic, not GMI optimality

  • GMI-optimal $s^\star$ requires runtime estimation of bit-channel marginals; not implemented

  • GMI gain at moderate SNR is $\lesssim 0.1$ dB, below the margin of other receiver imperfections

📋 Ref: Implementation-specific; not standardised

Quick Check

At $s = 1$, the BICM GMI $I^{\mathrm{GMI}}_{\rm BICM}(s)$ equals

the Caire-Taricco-Biglieri BICM capacity $\sum_\ell I(Y; B_\ell)$

the CM capacity $I(Y; X)$

the Shannon capacity $\log_2(1 + \text{SNR})$

the cutoff rate $R_0^{\rm BICM}$

Quick Check

For Gray-labelled $M$-QAM on AWGN at high SNR, the GMI-optimal decoder scaling is

$s^\star \to 1$, and the GMI approaches the CTB capacity

$s^\star \to 0$, because the decoder should ignore the LLRs

$s^\star \to \infty$, because the decoder should use only the most confident LLRs

$s^\star = 0.5$ independent of SNR

Decoder Scaling Parameter $s$

The exponent parameter in the mismatched-decoding metric $q(y, x)^s$ that is optimised to maximise the generalised mutual information. For the BICM product metric, scaling $s$ is equivalent to multiplying the per-bit LLRs by $s$. At $s = 1$ the GMI reduces to the Caire-Taricco-Biglieri BICM capacity.

Related: Generalised Mutual Information, Mismatched Maximum-Metric Decoding, Product Bit Metric (BICM)

GMI Saddle-Point $s^\star$

The unique maximiser of $I^{\mathrm{GMI}}(s)$ over $s > 0$; the decoder scaling that achieves the largest random-coding rate under the given mismatched metric. For Gray-QAM on AWGN at high SNR, $s^\star \to 1$; at low SNR or with non-Gray labellings, $s^\star \ne 1$ and the GMI exceeds the CTB capacity strictly.

Related: Decoder Scaling Parameter $s$, Generalised Mutual Information, BICM as $L$ Parallel Binary Channels