Ferkans — Interactive Telecom Tutor

Why Gray — and Why Not Always

If BICM capacity depends on the labelling only through the per-bit marginals, which labelling maximises the sum $\sum_\ell I(Y; B_\ell)$? On the AWGN channel with QAM, the empirical answer, first quantified by Caire-Taricco-Biglieri, is Gray labelling. On fading channels (Ch. 6) and in iterative-decoding setups (Ch. 8), the answer can flip. This section proves the Gray near-optimality theorem and carefully states what it does not claim.

The intuition for Gray's success is geometric: under Gray labelling, nearest-neighbour constellation points differ in exactly one label bit, so each bit has an approximately independent and equally useful view of the channel. Under SP labelling, the bits are hierarchical, with the top bit effectively noise-free and the bottom bit nearly useless in isolation — a structure that is powerful when conditioning is available (MLC) but punitive when it is not (BICM).


Gray vs SP Labelling on 16-QAM: The Capacity Consequence

This animation toggles the 16-QAM labelling between Gray and SP and shows the resulting per-bit capacities $C_{0}, \ldots, C_{3}$ as horizontal bars that grow and shrink in real time. Under Gray, the four bars are roughly equal; under SP, the top bars ($C_{0}, C_{1}$) are tall and the bottom bars ($C_{2}, C_{3}$) are short. The sum of the bar heights is the BICM capacity, and it is visibly larger under Gray.

Visualising the Gray-vs-SP BICM capacity gap on 16-QAM at $\text{SNR} = 10$ dB. Gray labelling spreads useful information across the four bits; SP concentrates it in two.
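The bar heights in the animation are per-bit mutual informations $I(Y; B_\ell)$. A minimal sketch of how such numbers can be computed, under stated assumptions: it uses a one-dimensional stand-in (unit-energy 4-PAM on real AWGN, with SNR taken as $E_s/\sigma^2$ per real dimension) rather than the 16-QAM set-up of the animation, and it uses natural-binary labelling as the non-Gray contrast. The function name `per_bit_capacities` and the grid size are illustrative choices, not taken from the tutorial's simulation.

```python
import math

def per_bit_capacities(labels, snr_db, n_grid=4000):
    """Per-bit mutual informations I(Y; B_l) for unit-energy 4-PAM on real AWGN.

    labels[k] is the 2-bit label of the k-th amplitude, left to right.
    Assumption: SNR = Es / sigma^2 per real dimension with Es = 1.
    Returns [C_msb, C_lsb] in bits.
    """
    points = [a / math.sqrt(5.0) for a in (-3.0, -1.0, 1.0, 3.0)]  # unit average energy
    sigma = 10 ** (-snr_db / 20.0)                                  # noise standard deviation
    n_bits = 2
    lo, hi = points[0] - 8 * sigma, points[-1] + 8 * sigma
    dy = (hi - lo) / n_grid
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    tiny = 1e-300                                   # guard against floating-point underflow
    caps = [0.0] * n_bits
    for k in range(n_grid):
        y = lo + (k + 0.5) * dy                     # midpoint rule
        lik = [norm * math.exp(-(y - x) ** 2 / (2 * sigma ** 2)) for x in points]
        p_y = sum(lik) / len(points)
        if p_y < tiny:
            continue
        for l in range(n_bits):
            shift = n_bits - 1 - l                  # l = 0 is the most significant bit
            for b in (0, 1):
                # p(y | B_l = b): average over the two points whose bit equals b
                p_yb = sum(p for p, lab in zip(lik, labels) if (lab >> shift) & 1 == b) / 2.0
                if p_yb > tiny:
                    caps[l] += 0.5 * p_yb * math.log2(p_yb / p_y) * dy
    return caps

gray = per_bit_capacities([0b00, 0b01, 0b11, 0b10], snr_db=10.0)
natural = per_bit_capacities([0b00, 0b01, 0b10, 0b11], snr_db=10.0)
print("Gray    per-bit:", [round(c, 3) for c in gray], "sum:", round(sum(gray), 3))
print("Natural per-bit:", [round(c, 3) for c in natural], "sum:", round(sum(natural), 3))
```

For 4-PAM the most significant bit is the same sign bit under both labellings, so the whole difference sits in the second bit. At 10 dB the Gray sum comes out larger, which is the one-dimensional analogue of the taller total under Gray in the animation.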

16-QAM Constellation with Gray and SP Labels

The same sixteen constellation points with two different labellings. Gray labelling assigns labels so that nearest neighbours differ in exactly one bit — trace any horizontal or vertical step and verify. SP labelling organises labels hierarchically down the Ungerboeck partition tree — the top-level bit $b_0$ selects between two maximally-separated subsets of eight points. Hover over a point to see its 4-bit label and the bit-wise difference to each of its nearest neighbours.
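The "trace any horizontal or vertical step and verify" check can also be done mechanically. The sketch below builds the Gray labelling as two independent 2-bit PAM Gray codes (one per axis) and counts the bit differences across every adjacent pair. The helper names are hypothetical, and the natural-binary labelling is included only as a non-Gray contrast; it is not the SP labelling shown in the figure.

```python
from itertools import product

GRAY_PAM = [0b00, 0b01, 0b11, 0b10]      # 2-bit Gray code along one axis
NATURAL_PAM = [0b00, 0b01, 0b10, 0b11]   # natural binary, for contrast only

def qam16_labels(pam_code):
    """Map grid position (i, q) to a 4-bit label: I bits high, Q bits low."""
    return {(i, q): (pam_code[i] << 2) | pam_code[q]
            for i, q in product(range(4), repeat=2)}

def hamming(a, b):
    """Hamming distance between two integer labels."""
    return bin(a ^ b).count("1")

def neighbour_bit_differences(labels):
    """Hamming distances over all horizontally or vertically adjacent pairs."""
    diffs = []
    for (i, q), lab in labels.items():
        for di, dq in ((1, 0), (0, 1)):      # right and up, so each pair is counted once
            nb = (i + di, q + dq)
            if nb in labels:
                diffs.append(hamming(lab, labels[nb]))
    return diffs

gray_diffs = neighbour_bit_differences(qam16_labels(GRAY_PAM))
nat_diffs = neighbour_bit_differences(qam16_labels(NATURAL_PAM))
print("Gray   :", sorted(set(gray_diffs)))   # [1]
print("Natural:", sorted(set(nat_diffs)))    # [1, 2]
```

All 24 nearest-neighbour pairs differ in exactly one bit under the Gray product labelling, while the natural-binary labelling has pairs that differ in two bits (the 01/10 boundary on each axis).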


Theorem: Gray Labelling Near-Optimality on AWGN

Let $\mathcal{X}$ be square $M$-QAM ($M = 4^k$ for integer $k$) with unit-energy constraint and let $\mu_G$ be its standard Gray labelling (two independent PAM Gray codes). Then, on the AWGN channel,

(a) Uniform high-SNR convergence: for all $\text{SNR}$ above any fixed threshold, $0 \;\le\; C_{\rm CM} - C_{\rm BICM}(\mu_G) \;\le\; c_M \cdot Q\bigl(d_{\min} \sqrt{\text{SNR}/2}\bigr)$ for a constant $c_M$ that depends on $M$ but not on SNR. In particular, the gap vanishes exponentially as $\text{SNR} \to \infty$.

(b) Uniform bound over all SNR: over the entire SNR range, $C_{\rm CM} - C_{\rm BICM}(\mu_G) \;\le\; 0.05 \text{ bits} \quad \text{for } M = 4, 16, 64, 256.$ This is not an SNR-dependent bound; it is a uniform numerical bound that holds across all SNRs, derived from numerical integration in Caire-Taricco-Biglieri §V.B.

(c) No Gray ordering is universally optimal at low SNR. For $M \ge 16$ there exist non-Gray labellings (e.g., "anti-Gray" constructions specifically designed for low SNR) that slightly exceed standard Gray's BICM capacity at very low SNR. The advantage is at most $\approx 0.02$ bits and occurs only in a regime where BICM is itself operationally irrelevant (capacities well below $1$ bit/symbol). For practical SNRs Gray wins uniformly.

Part (a) follows from the high-SNR asymptotics of Thm. "BICM Capacity: High-SNR Asymptotics": under Gray, nearest-neighbour flips correspond to one-bit errors, so the marginal per-bit entropies capture essentially all the joint entropy. Part (b) is the numerical content of the 1998 paper: it is this small number (well below a tenth of a bit) that turned BICM into a design-cookbook item. Part (c) is the caveat that keeps researchers honest: "Gray is optimal" is an overstatement; "Gray is near-optimal on AWGN in the practical SNR range" is the careful claim.
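To get a feel for how fast the bound in part (a) shrinks, the SNR-dependent factor $Q\bigl(d_{\min}\sqrt{\text{SNR}/2}\bigr)$ can be evaluated directly. This is a small sketch under two assumptions: the constellation is unit-average-energy square $M$-QAM, for which $d_{\min} = \sqrt{6/(M-1)}$, and the constant $c_M$ is left out because the theorem does not give its value. Function names are illustrative.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x) = P(N(0, 1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def d_min_square_qam(m):
    """Minimum distance of unit-average-energy square M-QAM."""
    return math.sqrt(6.0 / (m - 1))

def decay_term(m, snr_db):
    """Q(d_min * sqrt(SNR / 2)), the SNR-dependent factor in part (a)."""
    snr = 10 ** (snr_db / 10.0)
    return q_function(d_min_square_qam(m) * math.sqrt(snr / 2.0))

for snr_db in (10, 15, 20, 25):
    print(f"16-QAM, {snr_db:2d} dB: {decay_term(16, snr_db):.3e}")
```

Because the argument of $Q$ grows like $\sqrt{\text{SNR}}$, each 5 dB step shrinks the factor by a rapidly increasing ratio, which is what "vanishes exponentially" means in practice.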

Hint

For (a), use the high-SNR asymptotic from Thm. "BICM Capacity: High-SNR Asymptotics".

For (b), integrate the per-bit and symbol mutual informations numerically on a dense SNR grid and take the supremum.

For (c), construct the anti-Gray labelling by pairing each point with its Hamming-farthest partner, then numerically show the $< 0.02$-bit advantage at low SNR.

Proof

Part (a): high-SNR bound

Fix the Gray labelling $\mu_G$ and write $L = \log_2 M$ for the number of label bits. By Thm. "BICM Capacity: High-SNR Asymptotics", each gap term $I(B_\ell; B_{<\ell} \mid Y)$ decays as $O\bigl(Q(d_{\min}\sqrt{\text{SNR}/2})\bigr)$, because conditioning on $B_{<\ell}$ changes the posterior of $B_\ell$ only when a nearest-neighbour error occurs (Gray implies that the "first" error does not touch $B_\ell$ except with vanishing probability). Summing the $L-1$ such terms gives the stated upper bound, with $c_M$ equal to a combinatorial factor that depends on the number of dominant nearest neighbours of $\mathcal{X}$.
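For reference, the gap terms $I(B_\ell; B_{<\ell} \mid Y)$ come from the chain rule of mutual information. The derivation below is a sketch that assumes independent, uniformly distributed label bits $B_0, \ldots, B_{L-1}$ with $L = \log_2 M$; the indexing is chosen here for illustration and may differ from that of the referenced theorem.

$$
\begin{aligned}
C_{\rm CM} &= I(Y; B_0, \ldots, B_{L-1}) = \sum_{\ell=0}^{L-1} I(Y; B_\ell \mid B_{<\ell}), \\
C_{\rm BICM}(\mu) &= \sum_{\ell=0}^{L-1} I(Y; B_\ell), \\
I(Y; B_\ell \mid B_{<\ell}) - I(Y; B_\ell) &= I(B_\ell; B_{<\ell} \mid Y) - I(B_\ell; B_{<\ell}) = I(B_\ell; B_{<\ell} \mid Y), \\
C_{\rm CM} - C_{\rm BICM}(\mu) &= \sum_{\ell=1}^{L-1} I(B_\ell; B_{<\ell} \mid Y) \;\ge\; 0.
\end{aligned}
$$

The third line expands $I(B_\ell; Y, B_{<\ell})$ in two ways and uses $I(B_\ell; B_{<\ell}) = 0$ for independent bits. The $\ell = 0$ term is zero because $B_{<0}$ is empty, which leaves exactly $L-1$ terms and also gives the lower bound $0$ in part (a).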

Part (b): numerical bound over all SNR

For each $M \in \{4, 16, 64, 256\}$, Caire-Taricco-Biglieri (1998) compute $C_{\rm CM}(\text{SNR})$ and $C_{\rm BICM, Gray}(\text{SNR})$ on a dense SNR grid spanning $[-10, +40]$ dB by numerical integration of the AWGN likelihood over the $M$ constellation points. The supremum of the difference is tabulated (Table II of §V.B) and is bounded above by $0.05$ bits for all sizes up to 256-QAM. For QAM beyond $256$ the argument extends by the scaling behaviour of the Gray-decomposable rectangular constellations: each additional PAM level contributes at most $\approx 0.01$ bits to the gap.
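The grid-and-supremum procedure can be sketched in a few lines. The code below is not a reproduction of the tabulated QAM values: it assumes a single real dimension (unit-energy 4-PAM with a 2-bit Gray code, SNR taken as $E_s/\sigma^2$), a 1 dB grid step, and a plain midpoint rule, all chosen here for illustration. It only shows the mechanics of sweeping $[-10, +40]$ dB and taking the largest gap.

```python
import math

POINTS = [a / math.sqrt(5.0) for a in (-3.0, -1.0, 1.0, 3.0)]   # unit-energy 4-PAM
GRAY = [0b00, 0b01, 0b11, 0b10]                                  # 2-bit Gray code
TINY = 1e-300                                                    # underflow guard

def cm_and_bicm(snr_db, n_grid=1500):
    """Return (C_CM, C_BICM) in bits per real dimension at the given SNR in dB."""
    sigma = 10 ** (-snr_db / 20.0)                 # noise std for Es = 1
    lo, hi = POINTS[0] - 8 * sigma, POINTS[-1] + 8 * sigma
    dy = (hi - lo) / n_grid
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    c_cm = c_bicm = 0.0
    for k in range(n_grid):
        y = lo + (k + 0.5) * dy                    # midpoint rule
        lik = [norm * math.exp(-(y - x) ** 2 / (2 * sigma ** 2)) for x in POINTS]
        p_y = sum(lik) / 4.0
        if p_y < TINY:
            continue
        # Symbol-wise (CM) integrand: (1/M) * sum_x p(y|x) log2(p(y|x) / p(y))
        c_cm += sum(p * math.log2(p / p_y) for p in lik if p > TINY) / 4.0 * dy
        # Bit-wise (BICM) integrand: sum over bits and bit values
        for shift in (1, 0):
            for b in (0, 1):
                p_yb = sum(p for p, lab in zip(lik, GRAY) if (lab >> shift) & 1 == b) / 2.0
                if p_yb > TINY:
                    c_bicm += 0.5 * p_yb * math.log2(p_yb / p_y) * dy
    return c_cm, c_bicm

results = [(snr_db, *cm_and_bicm(snr_db)) for snr_db in range(-10, 41)]
gaps = [(cm - bicm, snr_db) for snr_db, cm, bicm in results]
worst_gap, worst_snr = max(gaps)
print(f"largest CM-BICM gap on the grid: {worst_gap:.4f} bits at {worst_snr} dB")
```

A finer grid or a better quadrature rule would be needed before reading the printed number as more than an estimate, and a two-dimensional integration would be needed for a QAM labelling that is not a product of two PAM labellings.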

Part (c): low-SNR anti-Gray construction

At very low SNR the mutual information is dominated by the second-order Taylor expansion of the log-likelihood, and the BICM capacity is approximately $C_{\rm BICM}(\mu) \approx \frac{\text{SNR}}{2 \ln 2} \cdot L - \frac{\text{SNR}^{2}}{16 \ln 2} \cdot f(\mu)$, where $f(\mu)$ is a fourth-order moment of the labelling. Because $f(\mu)$ enters with a negative sign, the best low-SNR labelling is the one that minimises it. Minimising $f(\mu)$ under the combinatorial constraint that $\mu$ be a valid binary labelling is a small integer program, and for $M \ge 16$ the optimum is not Gray but an anti-Gray labelling that maximises the average Hamming distance between nearest neighbours. The advantage scales as $O(\text{SNR}^{2})$ at low SNR, is about $0.02$ bits at $\text{SNR} = 0$ dB, disappears as the SNR grows, and is negligible at any operating rate above $1$ bit/symbol. $\blacksquare$


Example: 64-QAM Gray vs SP: The Numerical Showdown

For 64-QAM at $\text{SNR} = 15$ dB, tabulate the six per-bit capacities under Gray and SP labellings and compute the two BICM capacities and the CM capacity. Quantify the dB-of-SNR cost of each labelling.

Solution

Per-bit capacities at 15 dB

By numerical integration (see the interactive plot "Per-Level Bit-Channel Capacities $C_\ell$ vs. SNR"),

Gray: $(C_{0}, \ldots, C_{5}) \approx (0.98, 0.87, 0.54, 0.98, 0.87, 0.54)$, with the two triples corresponding to the $I$ and $Q$ 8-PAM components. Sum $C_{\rm BICM, Gray} \approx 4.78$ bits.
SP: $(C_{0}, \ldots, C_{5}) \approx (1.00, 0.99, 0.93, 0.73, 0.34, 0.08)$. Sum $C_{\rm BICM, SP} \approx 4.07$ bits.
CM: $C_{\rm CM} \approx 4.83$ bits (by direct numerical integration over the 64-point symbol constellation).
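The sums and the resulting capacity gaps can be rechecked with a few lines of arithmetic. The per-bit values are copied from the list above; the variable and function names are illustrative.

```python
# Per-bit capacities at 15 dB, copied from the example (bits per channel use).
gray = [0.98, 0.87, 0.54, 0.98, 0.87, 0.54]
sp = [1.00, 0.99, 0.93, 0.73, 0.34, 0.08]
c_cm = 4.83

def bicm_capacity(per_bit):
    """BICM capacity is the sum of the per-bit mutual informations."""
    return sum(per_bit)

c_gray = bicm_capacity(gray)
c_sp = bicm_capacity(sp)

print(f"Gray BICM: {c_gray:.2f} bits, gap to CM: {c_cm - c_gray:.2f} bits")
print(f"SP BICM:   {c_sp:.2f} bits, gap to CM: {c_cm - c_sp:.2f} bits")
print(f"Gray over SP: {c_gray - c_sp:.2f} bits")
```

The sums reproduce the quoted $4.78$ and $4.07$ bits. At 15 dB Gray sits $0.05$ bits below CM, while SP sits $0.76$ bits below.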

SNR-equivalent cost

The operating rate $\eta = 4$ bits/symbol sits in the waterfall region. The rate $4.78$ bits/symbol, which Gray-BICM delivers at $15$ dB, is reached by CM at about $14.7$ dB, by Gray-BICM at about $15.0$ dB, and by SP-BICM at about $17.0$ dB. Measured against CM:

Gray-BICM cost: $\approx 0.3$ dB of SNR, which is negligible.
SP-BICM cost: $\approx 2.3$ dB of SNR, which is substantial.

The design implication

For 64-QAM on AWGN, switching from Gray to SP costs about $2$ dB of SNR in a BICM system — a large penalty that immediately explains why no modern standard uses SP labelling with non-iterative BICM decoding. However, when the receiver iterates between decoder and demapper (BICM-ID, Ch. 8), SP becomes competitive again because the feedback provides conditioning that closes the MLC-style gap. We return to this in Ch. 8.

When SP Beats Gray: Iterative Decoding (Forward Ref)

The strict dominance of Gray over SP for non-iterative BICM reverses in BICM with iterative decoding (BICM-ID). The reason is a beautiful piece of symmetry: BICM-ID feeds soft a-priori information from the decoder back to the demapper, which effectively gives the demapper conditional rather than unconditional knowledge of the other label bits. Under Gray labelling this extra conditioning yields little benefit (the bits were already nearly independent). Under SP labelling, conditioning on the high-SNR bits dramatically improves the posteriors of the low-SNR bits — exactly the MLC conditioning benefit, now recovered by iteration.

The net effect: on some constellations, BICM-ID with SP labelling closes the CM-capacity gap entirely, matching the performance of MLC/MSD at comparable decoding complexity. This is one of the reasons why BICM-ID remains an active research area. We treat it in Chapter 8.

On Fading, the Labelling Question Reopens

On fading channels, the capacity argument has to be re-examined because the channel gain changes symbol by symbol. Caire-Taricco-Biglieri §IV shows that Gray labelling is still near-optimal in capacity, but the diversity order of a BICM system on a block-fading channel is driven by a completely different quantity: the minimum number of distinct bit positions involved in the code's free-distance events. This quantity depends on the labelling in a non-trivial way and is studied in detail in Chapter 6. The operational conclusion is that on AWGN the choice of labelling is a capacity question (Gray wins); on fading channels it is both a capacity question (Gray still wins) and a diversity-order question (labelling matters, code design matters more).


Common Mistake: "Gray Labelling Is Always Optimal"

Mistake:

Claiming that Gray labelling is optimal for every BICM configuration.

Correction:

Gray is near-optimal for non-iterative BICM on AWGN across the practical SNR range, and is the right default. But at very low SNR on a perfectly symmetric channel, anti-Gray constructions can slightly beat Gray (Thm. Gray Labelling Near-Optimality on AWGN, part (c)); with iterative decoding (BICM-ID, Ch. 8), SP can beat Gray by several dB on some constellations; on fading channels the diversity-optimal labelling may not coincide with the capacity-optimal one (Ch. 6). The universally correct statement is: Gray is the right default for non-iterative BICM on AWGN; revisit the labelling when decoding is iterative or the channel is fading.

Quick Check

Which of the following is the defining property of Gray labelling?

Every pair of constellation points differs in at most one label bit

Every pair of nearest-neighbour constellation points differs in exactly one label bit

Bits are assigned so that the top bit partitions the constellation into coarsest cosets

The sum of Hamming distances between all pairs of labels equals the sum of Euclidean distances

Correction:

Every pair of nearest-neighbour constellation points differs in exactly one label bit

The nearest-neighbour property is the defining feature: Gray labelling ensures that the Euclidean-distance-minimising error flips one bit, not several. This is what makes each bit see an approximately independent binary channel under BICM and what gives Gray its near-optimality on AWGN.

Quick Check

Which of the following scenarios can make SP labelling outperform Gray?

BICM with iterative decoder-demapper feedback (BICM-ID)

Non-iterative BICM on AWGN at moderate SNR

MIMO with spatial multiplexing

Whenever the constellation is non-square

Correction:

BICM with iterative decoder-demapper feedback (BICM-ID)

In BICM-ID, a-priori information from the decoder conditions the demapper's LLR computation, recovering the MLC-style conditioning benefit. Under SP the conditioning is very valuable (the level-wise capacities are highly asymmetric); under Gray the extra conditioning buys less (the levels are already roughly equal). Hence SP can exceed Gray under iterative BICM-ID.

Hamming Distance

The number of positions at which two binary strings of equal length differ. Central to Gray-labelling theory: Gray ensures that Euclidean nearest neighbours are at Hamming distance exactly one.
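As a small illustration of the definition, the sketch below computes the Hamming distance between bit strings and checks it on a binary-reflected Gray code, where consecutive codewords are at distance one. The function names are illustrative, and the 3-bit code is used here as the label sequence of one 8-PAM axis.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))

def gray_code(n_bits):
    """Binary-reflected Gray code as bit strings, in sequence order."""
    return [format(i ^ (i >> 1), f"0{n_bits}b") for i in range(2 ** n_bits)]

code = gray_code(3)   # labels for 8-PAM, left to right
steps = [hamming_distance(u, v) for u, v in zip(code, code[1:])]
print(code)    # ['000', '001', '011', '010', '110', '111', '101', '100']
print(steps)   # [1, 1, 1, 1, 1, 1, 1]
```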

Near-Optimality (BICM)

The property that $C_{\rm BICM}(\mu_G) / C_{\rm CM} \to 1$ for Gray labelling $\mu_G$ on square QAM as $\text{SNR} \to \infty$, and more strongly that the absolute gap is bounded by $\lesssim 0.05$ bits uniformly across all SNRs (Thm. Gray Labelling Near-Optimality on AWGN).

Related: Gray Labelling, BICM Capacity
