The Gray Labelling Near-Optimality Theorem

Why Gray — and Why Not Always

If BICM capacity depends on the labelling only through the per-bit marginals, which labelling maximises the sum I(Y;B)\sum_\ell I(Y; B_\ell)? On the AWGN channel with QAM, the empirical answer — first quantified by Caire-Taricco-Biglieri — is Gray labelling. On fading channels (Ch. 6) and in iterative-decoding setups (Ch. 8), the answer can flip. This section proves the Gray near-optimality theorem and carefully states what it does not claim.

The intuition for Gray's success is geometric: under Gray labelling, nearest-neighbour constellation points differ in exactly one label bit, so each bit has an approximately independent and equally useful view of the channel. Under SP labelling, the bits are hierarchical, with the top bit effectively noise-free and the bottom bit nearly useless in isolation — a structure that is powerful when conditioning is available (MLC) but punitive when it is not (BICM).

,

Gray vs SP Labelling on 16-QAM: The Capacity Consequence

This animation toggles the 16-QAM labelling between Gray and SP and shows the resulting per-bit capacities C0,,C3C_{0}, \ldots, C_{3} as horizontal bars that grow and shrink in real time. Under Gray, the four bars are roughly equal; under SP, the top bars (C0,C1C_{0}, C_{1}) are tall and the bottom bars (C2,C3C_{2}, C_{3}) are short. The total sum of bar heights is the BICM capacity — and it is visibly larger under Gray.
Visualising the Gray-vs-SP BICM capacity gap on 16-QAM at SNR=10\text{SNR} = 10 dB. Gray labelling spreads useful information across the four bits; SP concentrates it in two.

16-QAM Constellation with Gray and SP Labels

The same sixteen constellation points with two different labellings. Gray labelling assigns labels so that nearest neighbours differ in exactly one bit — trace any horizontal or vertical step and verify. SP labelling organises labels hierarchically down the Ungerboeck partition tree — the top-level bit b0b_0 selects between two maximally-separated subsets of eight points. Hover over a point to see its 4-bit label and the bit-wise difference to each of its nearest neighbours.

Parameters

Theorem: Gray Labelling Near-Optimality on AWGN

Let X\mathcal{X} be square MM-QAM (M=4kM = 4^k for integer kk) with unit-energy constraint and let μG\mu_G be its standard Gray labelling (two independent PAM Gray codes). Then, on the AWGN channel,

(a) Uniform high-SNR convergence: for all SNR\text{SNR} above any fixed threshold, 0    CCMCBICM(μG)    cMQ(dminSNR/2)0 \;\le\; C_{\rm CM} - C_{\rm BICM}(\mu_G) \;\le\; c_M \cdot Q\bigl(d_{\min} \sqrt{\text{SNR}/2}\bigr) for a constant cMc_M that depends on MM but not on SNR. In particular, the gap vanishes exponentially as SNR\text{SNR} \to \infty.

(b) Uniform bound over all SNR: over the entire SNR range, CCMCBICM(μG)    0.05 bitsfor M=4,16,64,256.C_{\rm CM} - C_{\rm BICM}(\mu_G) \;\le\; 0.05 \text{ bits} \quad \text{for } M = 4, 16, 64, 256. This is not an SNR-dependent bound; it is a uniform numerical bound that holds across all SNRs, derived from numerical integration in Caire-Taricco-Biglieri §V.B.

(c) No Gray ordering is universally optimal at low SNR. For M16M \ge 16 there exist non-Gray labellings (e.g., "anti-Gray" constructions specifically designed for low SNR) that slightly exceed standard Gray's BICM capacity at very low SNR. The advantage is at most 0.02\approx 0.02 bits and occurs only in a regime where BICM is itself operationally irrelevant (capacities well below 11 bit/symbol). For practical SNRs Gray wins uniformly.

Part (a) follows from the high-SNR asymptotic of Thm. TBICM Capacity — High-SNR Asymptotics: under Gray, nearest-neighbour flips correspond to one-bit errors, so the marginal per-bit entropies capture essentially all the joint entropy. Part (b) is the numerical content of the 1998 paper — it is this small number (which is well below a tenth of a bit) that turned BICM into a design-cookbook item. Part (c) is the caveat that keeps researchers honest: "Gray is optimal" is an overstatement; "Gray is near-optimal on AWGN in the practical SNR range" is the careful claim.

,

Example: 64-QAM Gray vs SP: The Numerical Showdown

For 64-QAM at SNR=15\text{SNR} = 15 dB, tabulate the six per-bit capacities under Gray and SP labellings and compute the two BICM capacities and the CM capacity. Quantify the dB-of-SNR cost of each labelling.

When SP Beats Gray: Iterative Decoding (Forward Ref)

The strict dominance of Gray over SP for non-iterative BICM reverses in BICM with iterative decoding (BICM-ID). The reason is a beautiful piece of symmetry: BICM-ID feeds soft a-priori information from the decoder back to the demapper, which effectively gives the demapper conditional rather than unconditional knowledge of the other label bits. Under Gray labelling this extra conditioning yields little benefit (the bits were already nearly independent). Under SP labelling, conditioning on the high-SNR bits dramatically improves the posteriors of the low-SNR bits — exactly the MLC conditioning benefit, now recovered by iteration.

The net effect: on some constellations, BICM-ID with SP labelling closes the CM-capacity gap entirely, matching the performance of MLC/MSD at comparable decoding complexity. This is one of the reasons why BICM- ID remains an active research area. We treat it in Chapter 8.

On Fading, the Labelling Question Reopens

On fading channels, the capacity argument has to be re-examined because the channel gain changes symbol by symbol. Caire-Taricco-Biglieri §IV shows that Gray labelling is still near-optimal in capacity, but the diversity order of a BICM system on a block-fading channel is driven by a completely different quantity: the minimum number of distinct bit positions involved in the code's free-distance events. This quantity depends on the labelling in a non-trivial way and is studied in detail in Chapter 6. The operational conclusion is that on AWGN the choice of labelling is a capacity question (Gray wins); on fading channels it is both a capacity question (Gray still wins) and a diversity-order question (labelling matters, code design matters more).

,

Common Mistake: "Gray Labelling Is Always Optimal"

Mistake:

Claiming that Gray labelling is optimal for every BICM configuration.

Correction:

Gray is near-optimal for non-iterative BICM on AWGN across the practical SNR range, and is the right default. But at very low SNR on a perfectly symmetric channel, anti-Gray constructions can slightly beat Gray (Thm. TGray Labelling Near-Optimality on AWGN(c)); with iterative decoding (BICM-ID, Ch. 8), SP can beat Gray by several dB on some constellations; on fading channels the diversity-optimal labelling may not coincide with the capacity-optimal one (Ch. 6). The universally correct statement is: Gray is the right default for non-iterative BICM on AWGN; revisit the labelling when decoding is iterative or the channel is fading.

Quick Check

Which of the following is the defining property of Gray labelling?

Every pair of constellation points differs in at most one label bit

Every pair of nearest-neighbour constellation points differs in exactly one label bit

Bits are assigned so that the top bit partitions the constellation into coarsest cosets

The sum of Hamming distances between all pairs of labels equals the sum of Euclidean distances

Quick Check

Which of the following scenarios can make SP labelling outperform Gray?

BICM with iterative decoder-demapper feedback (BICM-ID)

Non-iterative BICM on AWGN at moderate SNR

MIMO with spatial multiplexing

Whenever the constellation is non-square

Hamming Distance

The number of positions at which two binary strings of equal length differ. Central to Gray-labelling theory: Gray ensures that Euclidean nearest neighbours are at Hamming distance exactly one.

Related: Gray Labelling, The \ell-th BICM Bit Channel

Near-Optimality (BICM)

The property that CBICM(μG)/CCM1C_{\rm BICM}(\mu_G) / C_{\rm CM} \to 1 for Gray labelling μG\mu_G on square QAM as SNR\text{SNR} \to \infty, and more strongly that the absolute gap is bounded by 0.05\lesssim 0.05 bits uniformly across all SNRs (Thm. TGray Labelling Near-Optimality on AWGN).

Related: Gray Labelling, BICM Capacity