Generalised Mutual Information
The Rate That Mismatched BICM Actually Achieves
We now turn §2's mismatched-decoding framework into a rate that we can compute and compare with the CM capacity. The object we need is the generalised mutual information — a one-parameter family of lower bounds on the mismatched capacity, indexed by a decoder scaling parameter $s > 0$.
The story goes like this. A random-coding argument over i.i.d. uniform codebooks, with a decoder that maximises the metric $\prod_i q(x_i, y_i)$ in place of the true likelihood, produces a reliability condition of the form $R < I^{\text{gmi}}(s)$ — a rate threshold that depends on $s$ through a Chernoff-style bound on the pairwise error probability. The largest such rate, optimised over $s$, is the best achievable rate the random-coding argument can deliver:
$$I^{\text{gmi}} = \sup_{s > 0} I^{\text{gmi}}(s).$$
This is Theorem 3.1 of the Guillén-Martínez-Caire monograph for BICM, and it is the central achievability result of this chapter.
Two operational facts stand out. First, at the "natural" choice $s = 1$, the GMI of the BICM product metric reduces exactly to the Caire-Taricco-Biglieri capacity of Chapter 5 — a beautiful conceptual closure: the heuristic BICM formula is a rigorous mismatched-decoding achievability result at $s = 1$. Second, there is a saddle-point structure to the optimisation in $s$: for Gray labelling on QAM at high SNR, $s^\star \approx 1$; at low SNR, $s^\star \ne 1$ and the sup is achieved strictly above $I^{\text{gmi}}(1)$. Operationally, the scaling $s$ acts as an inverse "temperature" on the decoder — at low temperature ($s$ large) the decoder is overconfident; at high temperature ($s$ small) it averages away useful signal. The optimum $s^\star$ balances these two effects.
This section develops the GMI formula, proves the achievability theorem, identifies the $s = 1$ specialisation with the CTB formula, and shows when and how the optimal scaling $s^\star$ departs from 1.
Definition: Generalised Mutual Information
Let $p(y|x)$ be a memoryless channel, $q(x, y) > 0$ a decoding metric, and $P_X$ an input distribution on $\mathcal{X}$. The generalised mutual information (GMI) at decoder scaling $s > 0$ is
$$I^{\text{gmi}}(s) = \mathbb{E}\!\left[\log_2 \frac{q(X, Y)^s}{\mathbb{E}_{X'}\!\left[q(X', Y)^s \mid Y\right]}\right],$$
where the inner expectation is over an independent copy $X'$ of the input. Equivalently, writing the inner expectation out,
$$I^{\text{gmi}}(s) = \mathbb{E}\!\left[\log_2 \frac{q(X, Y)^s}{\sum_{x' \in \mathcal{X}} P_X(x')\, q(x', Y)^s}\right].$$
The "mutual information" nomenclature is historical: at $q(x, y) = p(y|x)$ and $s = 1$, the GMI reduces to the classical mutual information $I(X; Y)$.
For the BICM product metric with uniform input over $\mathcal{X}$, $|\mathcal{X}| = 2^m$, the GMI simplifies because $q(x, y)^s = \prod_{\ell=1}^{m} q_\ell(b_\ell(x), y)^s$ and the inner expectation factorises over bit positions (uniform inputs make the label bits marginally i.i.d. uniform binary):
$$I^{\text{gmi}}(s) = \sum_{\ell=1}^{m} \mathbb{E}\!\left[\log_2 \frac{q_\ell(B_\ell, Y)^s}{\tfrac{1}{2} \sum_{b \in \{0,1\}} q_\ell(b, Y)^s}\right].$$
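To make the formula concrete, here is a minimal Monte-Carlo sketch of the per-position sum above, assuming Gray-labelled square QAM on complex AWGN with unit-energy constellation; the helper names (`gray_qam`, `gmi_bicm`) are ours, not the monograph's.

```python
# Monte-Carlo evaluation of the BICM product-metric GMI(s): a sketch, not a
# reference implementation.  Assumes Gray-labelled square QAM, complex AWGN.
import numpy as np

def gray_qam(nbits_axis):
    """Gray-labelled square QAM with 2*nbits_axis bits/symbol, unit energy."""
    L = 2 ** nbits_axis
    amps = np.arange(-(L - 1), L, 2, dtype=float)        # odd levels per axis
    gray = [i ^ (i >> 1) for i in range(L)]              # reflected Gray code
    bits = np.array([[(g >> k) & 1 for k in reversed(range(nbits_axis))]
                     for g in gray])
    pts = (amps[:, None] + 1j * amps[None, :]).ravel()
    lab = np.array([np.hstack([bits[i], bits[q]])
                    for i in range(L) for q in range(L)])
    return pts / np.sqrt(2 * (L * L - 1) / 3), lab       # E[|X|^2] = 1

def gmi_bicm(snr_db, s, nbits_axis=2, n=200_000, seed=0):
    """GMI(s) of the BICM product bit metric, in bits per symbol."""
    rng = np.random.default_rng(seed)            # fixed seed: smooth in s
    x, b = gray_qam(nbits_axis)
    sigma2 = 10 ** (-snr_db / 10)                # complex noise variance
    idx = rng.integers(len(x), size=n)
    y = x[idx] + np.sqrt(sigma2 / 2) * (rng.standard_normal(n)
                                        + 1j * rng.standard_normal(n))
    w = np.exp(-np.abs(y[:, None] - x[None, :]) ** 2 / sigma2)  # ~ p(y|x')
    gmi = 0.0
    for ell in range(b.shape[1]):
        q1 = w[:, b[:, ell] == 1].sum(axis=1)    # q_ell(1, y): subset sum
        q0 = w[:, b[:, ell] == 0].sum(axis=1)    # q_ell(0, y)
        qt = np.where(b[idx, ell] == 1, q1, q0)  # metric of the sent bit
        gmi += np.mean(np.log2(qt ** s / (0.5 * (q0 ** s + q1 ** s))))
    return gmi
```

The Gaussian normalisation constants cancel in the ratio, so `w` can omit them.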
Three points of intuition:
(i) The numerator is the metric evaluated on the transmitted bit $B_\ell$; the denominator averages over all possible inputs. The ratio is large when the true bit wins the metric competition by a wide margin — which is what enables reliable decoding.
(ii) At $s = 1$, $q(x, y) = p(y|x)$, uniform $P_X$: the GMI is Shannon's mutual information $I(X; Y)$. For mismatched $q$, the GMI is strictly less in general, and can be improved by choosing $s \ne 1$.
(iii) The GMI is concave in $s$ for the product metric (this follows from the convexity of the log-moment-generating function $s \mapsto \log \mathbb{E}[q(X', Y)^s]$), so $\sup_{s > 0}$ is attained at an interior point $s^\star$ satisfying the first-order condition $\tfrac{d}{ds} I^{\text{gmi}}(s) \big|_{s = s^\star} = 0$. The saddle-point structure makes the optimisation numerically painless, as the sketch below illustrates.
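Concavity reduces finding $s^\star$ to a one-dimensional bounded maximisation. A sketch reusing the `gmi_bicm` helper above (Gray 16-QAM at an assumed 10 dB); SciPy's bounded scalar minimiser is one convenient choice.

```python
# Locate s* by maximising the concave GMI(s) over a bracket.
from scipy.optimize import minimize_scalar

res = minimize_scalar(lambda s: -gmi_bicm(10.0, s),
                      bounds=(0.2, 2.0), method="bounded")
print(f"s* ~ {res.x:.3f}, GMI(s*) ~ {-res.fun:.4f} bits/symbol,"
      f" GMI(1) ~ {gmi_bicm(10.0, 1.0):.4f} bits/symbol")
```

Because `gmi_bicm` reuses a fixed seed, the objective is deterministic and smooth in $s$ (common random numbers), which keeps the scalar search stable.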
Theorem: BICM Achievability via GMI (Guillén-Martínez-Caire)
Consider a BICM system with labelling $\mu : \{0, 1\}^m \to \mathcal{X}$, memoryless channel $p(y|x)$, uniform inputs on $\mathcal{X}$, and the BICM product bit metric $q(x, y) = \prod_{\ell=1}^{m} q_\ell(b_\ell(x), y)$.
(Achievability.) For every rate $R < I^{\text{gmi}}$ there exists a sequence of block codes of blocklength $n$ whose error probability under the mismatched decoder tends to zero as $n \to \infty$. The largest such rate is
$$I^{\text{gmi}} = \sup_{s > 0} I^{\text{gmi}}(s).$$
(CTB specialisation.) At $s = 1$, the GMI reduces to the Caire-Taricco-Biglieri BICM capacity:
$$I^{\text{gmi}}(1) = \sum_{\ell=1}^{m} I(B_\ell; Y) = C_{\text{BICM}}.$$
(Gray-optimality of $s = 1$.) For Gray labelling on square $M$-QAM over AWGN at high SNR, $s^\star \to 1$, and therefore $I^{\text{gmi}} \to C_{\text{BICM}}$. For non-Gray labellings and/or low SNR, $s^\star \ne 1$ and $I^{\text{gmi}}$ exceeds $I^{\text{gmi}}(1) = C_{\text{BICM}}$ strictly.
The random-coding argument for mismatched decoding is almost word-for-word Shannon's 1948 random-coding argument, with two modifications. First, the maximum-likelihood decoder is replaced by the maximum-metric decoder; the metric is raised to the power $s$ before summing the log across symbols. Second, the Chernoff/tilting trick that bounds the pairwise error probability introduces $s$ as a free parameter — optimising over $s$ yields the best achievable rate from this argument.
The GMI reduces to Shannon's mutual information when the metric is the true likelihood and $s = 1$; it reduces to the CTB BICM capacity when the metric is the product bit metric and $s = 1$. The GMI framework thus unifies matched and mismatched decoding into a single scalar achievability formula.
Use a random codebook of rate $R$, letters i.i.d. uniform on $\mathcal{X}$, blocklength $n$.
Bound the pairwise error probability by a Chernoff bound parameterised by $s$.
Sum over the $2^{nR} - 1$ incorrect codewords via the union bound; the error probability tends to zero iff $R < I^{\text{gmi}}(s)$.
For the CTB specialisation, plug $s = 1$ and the product bit metric into the GMI formula — the inner expectation becomes $\prod_{\ell} \tfrac{1}{2} \sum_{b} q_\ell(b, Y)$ and the sum of log ratios equals $\sum_{\ell} I(B_\ell; Y)$.
Random code + mismatched decoder
Fix $R$ and $s > 0$. Draw $2^{nR}$ codewords i.i.d. uniform on $\mathcal{X}^n$. The decoder outputs
$$\hat{w} = \arg\max_{w} \prod_{i=1}^{n} q(x_i(w), y_i).$$
Conditional on transmitting $\mathbf{x}(1)$, the pairwise error probability of $\mathbf{x}(w)$ ($w \ne 1$) is
$$P\!\left[\prod_{i=1}^{n} q(X_i(w), y_i) \ge \prod_{i=1}^{n} q(x_i(1), y_i)\right].$$
Chernoff bound with the scaling $s$
For any $s > 0$, by Markov:
$$P[\text{pairwise error} \mid \mathbf{x}, \mathbf{y}] \le \mathbb{E}_{X'}\!\left[\left(\frac{\prod_i q(X_i', y_i)}{\prod_i q(x_i, y_i)}\right)^{\! s}\,\right].$$
Choosing the Chernoff parameter equal to the decoder scaling $s$ and using the i.i.d. codeword letters plus memoryless channel gives
$$P[\text{pairwise error} \mid \mathbf{x}, \mathbf{y}] \le \prod_{i=1}^{n} \frac{\mathbb{E}_{X'}\!\left[q(X', y_i)^s\right]}{q(x_i, y_i)^s},$$
and after standard manipulation (see Ganti-Lapidoth-Telatar 1999 or Ch. 3 of the monograph) the per-letter factor is exactly $\mathbb{E}_{X'}[q(X', y_i)^s] / q(x_i, y_i)^s$, whose normalised log-sum $\tfrac{1}{n} \sum_i \log_2 \tfrac{q(x_i, y_i)^s}{\mathbb{E}_{X'}[q(X', y_i)^s]}$ concentrates at $I^{\text{gmi}}(s)$ by the law of large numbers.
Union bound over codewords
Union-bounding over the $2^{nR} - 1$ incorrect codewords, the ensemble-average error probability is bounded by
$$\bar{P}_e \le 2^{nR} \cdot 2^{-n\left(I^{\text{gmi}}(s) + o(1)\right)}.$$
This tends to zero iff $R < I^{\text{gmi}}(s)$ (with the GMI measured in bits per symbol). Optimising over $s > 0$ gives the largest achievable rate $I^{\text{gmi}} = \sup_{s > 0} I^{\text{gmi}}(s)$.
CTB specialisation at $s = 1$
At $s = 1$, the inner expectation is, for the BICM product metric and uniform inputs,
$$\mathbb{E}_{X'}\!\left[q(X', Y)\right] = \prod_{\ell=1}^{m} \frac{1}{2} \sum_{b \in \{0,1\}} q_\ell(b, Y).$$
Therefore
$$I^{\text{gmi}}(1) = \sum_{\ell=1}^{m} \mathbb{E}\!\left[\log_2 \frac{q_\ell(B_\ell, Y)}{\tfrac{1}{2} \sum_{b} q_\ell(b, Y)}\right] = \sum_{\ell=1}^{m} I(B_\ell; Y).$$
This is the Caire-Taricco-Biglieri formula. The GMI at $s = 1$ equals the BICM capacity of Ch. 5 exactly — proving that the CTB formula is a rigorous mismatched-decoding achievability result.
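The specialisation is easy to check numerically: the $\ell$-th GMI term at $s = 1$ and the bit-channel mutual information $I(B_\ell; Y) = 1 - \mathbb{E}[H(B_\ell \mid Y)]$ are two different Monte-Carlo estimators of the same quantity. A sketch reusing `gray_qam` from the earlier snippet:

```python
# Sanity check of the s = 1 specialisation: per-position GMI terms vs.
# bit-channel mutual informations estimated from the bit posteriors.
import numpy as np

def ctb_check(snr_db, nbits_axis=2, n=200_000, seed=1):
    rng = np.random.default_rng(seed)
    x, b = gray_qam(nbits_axis)
    sigma2 = 10 ** (-snr_db / 10)
    idx = rng.integers(len(x), size=n)
    y = x[idx] + np.sqrt(sigma2 / 2) * (rng.standard_normal(n)
                                        + 1j * rng.standard_normal(n))
    w = np.exp(-np.abs(y[:, None] - x[None, :]) ** 2 / sigma2)
    gmi1 = ctb = 0.0
    for ell in range(b.shape[1]):
        q1 = w[:, b[:, ell] == 1].sum(axis=1)
        q0 = w[:, b[:, ell] == 0].sum(axis=1)
        qt = np.where(b[idx, ell] == 1, q1, q0)
        gmi1 += np.mean(np.log2(2.0 * qt / (q0 + q1)))   # GMI(1) term
        p1 = np.clip(q1 / (q0 + q1), 1e-12, 1 - 1e-12)   # P(B_ell = 1 | y)
        h = -(p1 * np.log2(p1) + (1 - p1) * np.log2(1 - p1))
        ctb += 1.0 - np.mean(h)                          # I(B_ell; Y)
    return gmi1, ctb    # both estimate sum_ell I(B_ell; Y)
```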
Gray-QAM asymptotics
For Gray labelling on square $M$-QAM over AWGN at high SNR, the per-position bit channels become asymptotically symmetric and Gaussian, so the GMI of the product metric approaches the GMI of the genuine channel, and $s^\star \to 1$ — i.e., $I^{\text{gmi}} \to I^{\text{gmi}}(1) = C_{\text{BICM}}$. For non-Gray labellings the asymmetry between the per-position channels forces $s^\star \ne 1$.
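A quick numerical illustration of the limit, reusing `gmi_bicm` and SciPy's scalar minimiser (Gray 16-QAM assumed): the saddle point drifts toward 1 as the SNR grows.

```python
# Gray high-SNR limit: s* -> 1 as SNR increases.
from scipy.optimize import minimize_scalar

for snr_db in (0.0, 5.0, 10.0, 15.0):
    res = minimize_scalar(lambda s, snr=snr_db: -gmi_bicm(snr, s),
                          bounds=(0.3, 2.0), method="bounded")
    print(f"{snr_db:5.1f} dB: s* ~ {res.x:.3f}")
```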
GMI vs Decoder Scaling at Fixed SNR
For fixed $M$-QAM and fixed SNR, the GMI of the BICM product metric is a concave function of the decoder scaling parameter $s$. This interactive plot sweeps $s$ and shows the GMI curve together with its maximiser $s^\star$ (marked by a red dot) and the value $I^{\text{gmi}}(1)$ (the CTB BICM capacity, marked by a horizontal dashed line). For Gray 16-QAM at 10 dB the curve is nearly flat near $s = 1$ and $s^\star \approx 1$. Drop the SNR to 0 dB and the peak shifts slightly to $s^\star < 1$ — you can read off the GMI gain $I^{\text{gmi}}(s^\star) - I^{\text{gmi}}(1)$. The saddle-point structure is what makes the optimisation cheap in practice.
GMI Saddle-Point: Finding $s^\star$
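For readers without the interactive page, a static stand-in for the sweep, assuming the `gmi_bicm` helper above (Gray 16-QAM at 10 dB; try 0 dB to see the peak move):

```python
# Static version of the GMI-vs-s sweep plot.
import numpy as np
import matplotlib.pyplot as plt

snr_db = 10.0
s_grid = np.linspace(0.3, 2.0, 60)
g = np.array([gmi_bicm(snr_db, s) for s in s_grid])
i = int(g.argmax())
plt.plot(s_grid, g, label="GMI(s)")
plt.axhline(gmi_bicm(snr_db, 1.0), ls="--", label="GMI(1) = CTB capacity")
plt.plot(s_grid[i], g[i], "ro", label=f"s* = {s_grid[i]:.2f}")
plt.xlabel("decoder scaling s")
plt.ylabel("bits/symbol")
plt.legend()
plt.show()
```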
Example: 64-QAM at Low SNR: $s^\star$ Departs from 1
For 64-QAM with Gray labelling on AWGN at $\mathrm{SNR} = 0$ dB, numerically locate the GMI saddle-point $s^\star$, compute $I^{\text{gmi}}(1)$ and $I^{\text{gmi}}(s^\star)$, and express the rate improvement in bits and in dB of SNR-equivalent.
Evaluate GMI$(s = 1)$ at 0 dB
At $\mathrm{SNR} = 0$ dB (linear SNR $= 1$), for 64-QAM with Gray labelling, direct numerical evaluation gives an $I^{\text{gmi}}(1)$ just under 1 bit/symbol (a heavily power-limited regime — only a small fraction of the 6 label bits per symbol is usable at this SNR).
Sweep $s$ and locate the peak
A numerical sweep of $s$ on a fine grid around $s = 1$ (say $s \in [0.5, 1.5]$ in steps of $0.01$) shows a clear peak at an $s^\star$ strictly below 1, with $I^{\text{gmi}}(s^\star) > I^{\text{gmi}}(1)$. The gain over $s = 1$ is a small fraction of a bit per symbol — small but numerically real.
Convert to SNR-equivalent
At this SNR, read off the local slope of the rate curve, $dI^{\text{gmi}}(1)/d\,\mathrm{SNR_{dB}}$, in bits/dB. A rate gain of $\Delta I$ bits/symbol then corresponds to $\Delta I$ divided by that slope in dB of SNR — here a small fraction of a dB. This is the decoder-scaling dividend: a quiet but measurable improvement at low SNR without changing the transmitter at all. In practice, 5G NR decoders do not implement this scaling — it is a research-grade optimisation. A runnable version of the whole pipeline follows below.
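An end-to-end sketch of this example, assuming the `gray_qam`/`gmi_bicm` helpers defined earlier: sweep $s$ for Gray 64-QAM at 0 dB, locate the peak, and convert the rate gain to an SNR-equivalent via a central-difference estimate of the slope in bits/dB. The fixed seed inside `gmi_bicm` gives common random numbers across the sweep, which stabilises the tiny gain estimate.

```python
# Worked example: 64-QAM Gray at 0 dB -- s*, rate gain, SNR-equivalent.
import numpy as np

snr_db, m_axis, n = 0.0, 3, 100_000           # 64-QAM = 3 bits per axis
s_grid = np.arange(0.5, 1.51, 0.01)
g = np.array([gmi_bicm(snr_db, s, nbits_axis=m_axis, n=n) for s in s_grid])
i = int(g.argmax())
gain = g[i] - gmi_bicm(snr_db, 1.0, nbits_axis=m_axis, n=n)
d = 0.5                                       # dB step for the slope estimate
slope = (gmi_bicm(snr_db + d, 1.0, nbits_axis=m_axis, n=n)
         - gmi_bicm(snr_db - d, 1.0, nbits_axis=m_axis, n=n)) / (2 * d)
print(f"s* ~ {s_grid[i]:.2f}, gain ~ {gain:.4f} bits/symbol"
      f" ~ {gain / slope:.3f} dB SNR-equivalent")
```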
Why $s^\star < 1$ here
At low SNR the LLRs produced by the max-log demapper are noisy and slightly overconfident. Choosing $s < 1$ down-scales the LLRs towards uniformity, effectively applying a "temperature" correction to the decoder that better matches the true posterior at low SNR. At high SNR this effect vanishes and $s^\star \to 1$.
Theorem: BICM GMI at $s = 1$ Equals the CTB Capacity
For any memoryless channel $p(y|x)$, any constellation $\mathcal{X}$ of size $2^m$, any labelling $\mu$, and uniform input,
$$I^{\text{gmi}}(1) = \sum_{\ell=1}^{m} I(B_\ell; Y) = C_{\text{BICM}}.$$
The Caire-Taricco-Biglieri 1998 BICM capacity is the GMI of the product bit metric at scaling $s = 1$.
At $s = 1$ the GMI's inner expectation collapses to $\mathbb{E}_{X'}[q(X', Y)]$ — and for uniform inputs with the product bit metric this factors across bit positions into a product of per-position mixtures $\tfrac{1}{2} \sum_{b} q_\ell(b, Y)$. The GMI then becomes a sum over bit positions of the mutual information of the $\ell$-th bit channel — exactly the CTB formula.
Start from the GMI formula at $s = 1$.
Use uniform inputs and the product-metric factorisation to split both numerator and denominator into a product over $\ell = 1, \dots, m$.
The $\log$ of the ratio of products becomes a sum of log-ratios — each one a per-position mutual information.
Write out the GMI at $s=1$
$$I^{\text{gmi}}(1) = \mathbb{E}\!\left[\log_2 \frac{q(X, Y)}{\mathbb{E}_{X'}[q(X', Y)]}\right].$$
Factorise numerator
$q(X, Y) = \prod_{\ell=1}^{m} q_\ell(B_\ell, Y)$ by definition; hence $\log_2 q(X, Y) = \sum_{\ell=1}^{m} \log_2 q_\ell(B_\ell, Y)$.
Factorise denominator
With uniform input on $\mathcal{X}$, the marginal over the $\ell$-th bit is uniform on $\{0, 1\}$, and the bits are independent. Therefore
$$\mathbb{E}_{X'}[q(X', Y)] = \prod_{\ell=1}^{m} \frac{1}{2} \sum_{b \in \{0,1\}} q_\ell(b, Y).$$
Combine
$$I^{\text{gmi}}(1) = \sum_{\ell=1}^{m} \mathbb{E}\!\left[\log_2 \frac{q_\ell(B_\ell, Y)}{\tfrac{1}{2} \sum_{b} q_\ell(b, Y)}\right] = \sum_{\ell=1}^{m} I(B_\ell; Y) = C_{\text{BICM}}. \qquad \blacksquare$$
At High SNR Under Gray, $s^\star \to 1$
For square $M$-QAM with Gray labelling on AWGN, the per-position bit channels are, at high SNR, symmetric binary-input quasi-Gaussian channels whose decision regions are dominated by a single pair of nearest-neighbour constellation points. The product metric is asymptotically proportional to the squared Euclidean distance from $y$ to the nearest constellation point — i.e., it coincides with the true symbol log-likelihood up to a constant. The mismatch vanishes, and the saddle-point of the GMI is at $s^\star = 1$.
This is the theoretical underpinning of the empirical observation from Chapter 5 that Gray-BICM and CM capacity differ by only a small fraction of a bit at moderate-to-high SNR on square QAM. The GMI framework makes this rigorous via the saddle-point $s^\star \to 1$; the CTB formula is exact as an achievability result in this regime, not just approximate.
At low SNR, or with non-Gray labellings, the per-position channels are far from symmetric Gaussian and the mismatch does not vanish. The saddle point moves away from $s = 1$, and the achievable rate $I^{\text{gmi}}(s^\star)$ exceeds the naive CTB capacity by a small but measurable amount. This is the operational payoff of the GMI framework: a tighter achievability bound than CTB at low SNR, obtained for free by tuning $s$.
Common Mistake: The $s$ Scaling Is Not the Same as Scaling the LLRs
Mistake:
Thinking that "GMI scaling $s$" is the same as the familiar engineering practice of multiplying the LLRs by a constant before feeding them to the LDPC decoder.
Correction:
The GMI's $s$ is applied inside the exponent of the metric: $q(x, y)^s = \prod_{\ell=1}^{m} q_\ell(b_\ell(x), y)^s$. In LLR language, this scales the log-metric, i.e., multiplies the per-bit LLR by $s$. So for the BICM product metric specifically, GMI scaling is indeed equivalent to scaling the LLRs — but this equivalence only holds because the BICM metric is a product. For non-product metrics (e.g., probabilistic-shaping LLRs with non-uniform priors) the $s$ scaling affects the metric in a more subtle way.
Also note that LLR scaling in practical receivers is typically done for fixed-point arithmetic reasons (avoiding saturation) or for estimator-mismatch correction (wrong noise variance). The GMI scaling $s^\star$ is an information-theoretic optimum; it is usually not what a real decoder's LLR-scale tap is tuned to. The two concepts coincide only in the idealised product-metric setting.
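The product-metric equivalence can be seen in one line: raising a bit metric to the power $s$ multiplies its LLR by $s$. A tiny numeric illustration with hypothetical bit-metric values `q0`, `q1`:

```python
# Scaling the metric exponent == scaling the per-bit LLR (product metric).
import numpy as np

q0, q1, s = 0.2, 0.7, 0.8               # hypothetical bit metrics and scaling
llr = np.log(q1 / q0)                   # LLR from the unscaled metric
llr_s = np.log(q1**s / q0**s)           # LLR from the s-scaled metric
assert np.isclose(llr_s, s * llr)
```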
LLR Scaling Taps in Practical Receivers
Commercial LDPC decoders in 5G NR, Wi-Fi, and DVB-S2/S2X implement an LLR-scaling tap that multiplies the per-bit LLR by a scalar gain before the min-sum iterations begin. The gain is typically set by one of:
- Saturation management — fixed-point LLRs must stay inside the decoder's word width (typically 5–6 bits). Scaling is chosen so that 99.9% of LLRs fit without clipping.
- Noise-variance correction — if the receiver's estimate of the noise variance $\sigma^2$ is off by a factor $\alpha$, the LLRs are rescaled by $\alpha$ to compensate.
- Damping / pre-conditioning — to improve LDPC convergence at low SNR, a sub-unity scale is sometimes used (damping).
None of these is the GMI-optimal $s^\star$ of §3. Implementing $s^\star$ would require estimating the bit-channel marginals in real time, computing the GMI curve, and locating its peak — a computation that no current silicon does. The rate gain from doing so is a small fraction of a dB SNR-equivalent at moderate SNR, well below the margin of standard imperfections; so for now, this remains a research knob.
- LLR-scaling taps exist in 5G/Wi-Fi/DVB decoders but are set for fixed-point arithmetic, not GMI optimality.
- The GMI-optimal $s^\star$ requires runtime estimation of the bit-channel marginals — not implemented.
- The GMI gain at moderate SNR is a small fraction of a dB, below the margin of other receiver imperfections.
Quick Check
At $s = 1$, the BICM GMI equals
the Caire-Taricco-Biglieri BICM capacity
the CM capacity
the Shannon capacity
the cutoff rate
At $s = 1$, the product-metric GMI factorises over bit positions and its $\ell$-th term is exactly $I(B_\ell; Y)$. This is the content of the theorem "BICM GMI at $s = 1$ Equals the CTB Capacity" — the CTB formula is the GMI at $s = 1$.
Quick Check
For Gray-labelled $M$-QAM on AWGN at high SNR, the GMI-optimal decoder scaling $s^\star$ is
$s^\star \to 1$, and the GMI approaches the CTB capacity
$s^\star \to 0$, because the decoder should ignore the LLRs
$s^\star \to \infty$, because the decoder should use only the most confident LLRs
a fixed $s^\star \ne 1$, independent of SNR
High-SNR Gray-QAM has symmetric, quasi-Gaussian per-position bit channels. The saddle-point equation $\tfrac{d}{ds} I^{\text{gmi}}(s) = 0$ is satisfied at $s = 1$, and the GMI equals $C_{\text{BICM}}$ exactly. This is the regime in which the CTB formula is tight.
Decoder Scaling Parameter $s$
The exponent parameter $s$ in the mismatched-decoding metric $q(x, y)^s$ that is optimised to maximise the generalised mutual information. For the BICM product metric, scaling by $s$ is equivalent to multiplying the per-bit LLRs by $s$. At $s = 1$ the GMI reduces to the Caire-Taricco-Biglieri BICM capacity.
Related: Generalised Mutual Information, Mismatched Maximum-Metric Decoding, Product Bit Metric (BICM)
GMI Saddle-Point $s^\star$
The unique maximiser $s^\star$ of $I^{\text{gmi}}(s)$ over $s > 0$; the decoder scaling that achieves the largest random-coding rate under the given mismatched metric. For Gray-QAM on AWGN at high SNR, $s^\star \approx 1$; at low SNR or with non-Gray labellings, $s^\star \ne 1$ and the GMI exceeds the CTB capacity strictly.
Related: Decoder Scaling Parameter , Generalised Mutual Information, BICM as Parallel Binary Channels