The Gray Labelling Near-Optimality Theorem
Why Gray — and Why Not Always
If BICM capacity depends on the labelling only through the per-bit marginals, which labelling maximises the sum ? On the AWGN channel with QAM, the empirical answer — first quantified by Caire-Taricco-Biglieri — is Gray labelling. On fading channels (Ch. 6) and in iterative-decoding setups (Ch. 8), the answer can flip. This section proves the Gray near-optimality theorem and carefully states what it does not claim.
The intuition for Gray's success is geometric: under Gray labelling, nearest-neighbour constellation points differ in exactly one label bit, so each bit has an approximately independent and equally useful view of the channel. Under SP labelling, the bits are hierarchical, with the top bit effectively noise-free and the bottom bit nearly useless in isolation — a structure that is powerful when conditioning is available (MLC) but punitive when it is not (BICM).
Gray vs SP Labelling on 16-QAM: The Capacity Consequence
16-QAM Constellation with Gray and SP Labels
The same sixteen constellation points with two different labellings. Gray labelling assigns labels so that nearest neighbours differ in exactly one bit — trace any horizontal or vertical step and verify. SP labelling organises labels hierarchically down the Ungerboeck partition tree — the top-level bit selects between two maximally-separated subsets of eight points. Hover over a point to see its 4-bit label and the bit-wise difference to each of its nearest neighbours.
Parameters
Theorem: Gray Labelling Near-Optimality on AWGN
Let be square -QAM ( for integer ) with unit-energy constraint and let be its standard Gray labelling (two independent PAM Gray codes). Then, on the AWGN channel,
(a) Uniform high-SNR convergence: for all above any fixed threshold, for a constant that depends on but not on SNR. In particular, the gap vanishes exponentially as .
(b) Uniform bound over all SNR: over the entire SNR range, This is not an SNR-dependent bound; it is a uniform numerical bound that holds across all SNRs, derived from numerical integration in Caire-Taricco-Biglieri §V.B.
(c) No Gray ordering is universally optimal at low SNR. For there exist non-Gray labellings (e.g., "anti-Gray" constructions specifically designed for low SNR) that slightly exceed standard Gray's BICM capacity at very low SNR. The advantage is at most bits and occurs only in a regime where BICM is itself operationally irrelevant (capacities well below bit/symbol). For practical SNRs Gray wins uniformly.
Part (a) follows from the high-SNR asymptotic of Thm. TBICM Capacity — High-SNR Asymptotics: under Gray, nearest-neighbour flips correspond to one-bit errors, so the marginal per-bit entropies capture essentially all the joint entropy. Part (b) is the numerical content of the 1998 paper — it is this small number (which is well below a tenth of a bit) that turned BICM into a design-cookbook item. Part (c) is the caveat that keeps researchers honest: "Gray is optimal" is an overstatement; "Gray is near-optimal on AWGN in the practical SNR range" is the careful claim.
For (a), use the high-SNR asymptotic from Thm. TBICM Capacity — High-SNR Asymptotics.
For (b), integrate the per-bit and symbol mutual informations numerically on a dense SNR grid and take the supremum.
For (c), construct the anti-Gray labelling by pairing each point with its Hamming-farthest partner, then numerically show the -bit advantage at low SNR.
Part (a): high-SNR bound
Fix Gray labelling . By Thm. TBICM Capacity — High-SNR Asymptotics, the gap term decays as because conditioning on changes the posterior of only when a nearest-neighbour error occurs (Gray implies the "first" error doesn't touch except with vanishing probability). Summing the such terms gives the stated upper bound with equal to a combinatorial factor depending on the number of dominant nearest neighbours of .
Part (b): numerical bound over all SNR
For each , the authors of Caire-Taricco-Biglieri (1998) compute and on a dense SNR grid spanning dB by numerical integration of the AWGN likelihood over the constellation points. The supremum of the difference is tabulated (Table II of §V.B) and is bounded above by bits for all up to -QAM. For QAM beyond the argument extends by the scaling behaviour of the Gray-decomposable rectangular constellations — each additional PAM level contributes at most bits to the gap.
Part (c): low-SNR anti-Gray construction
At very low SNR the mutual information is dominated by the second-order Taylor expansion of the log-likelihood, and the BICM capacity is approximately , where is a fourth-order moment of the labelling. Minimising under the combinatorial constraint that be a valid binary labelling is a small integer program, and for the optimum is not Gray but an anti-Gray labelling that maximises the average Hamming distance between nearest neighbours. The advantage is — about bits at dB — vanishing as and negligible at any operating rate above bit/symbol.
Example: 64-QAM Gray vs SP: The Numerical Showdown
For 64-QAM at dB, tabulate the six per-bit capacities under Gray and SP labellings and compute the two BICM capacities and the CM capacity. Quantify the dB-of-SNR cost of each labelling.
Per-bit capacities at 15 dB
By numerical integration (see simulation vs. SNR" data-ref-type="interactive_plot">📊Per-Level Bit-Channel Capacities vs. SNR),
- Gray: , with the two triples corresponding to the and 8-PAM components. Sum bits.
- SP: . Sum bits.
- CM: bits (by direct numerical integration over the 64-point symbol constellation).
SNR-equivalent cost
The operating rate bits/symbol sits in the waterfall region. The rate bits is reached by CM at about dB, by Gray-BICM at about dB, and by SP-BICM at about dB. So:
- Gray-BICM cost: dB of SNR — negligible.
- SP-BICM cost: dB of SNR — substantial.
The design implication
For 64-QAM on AWGN, switching from Gray to SP costs about dB of SNR in a BICM system — a large penalty that immediately explains why no modern standard uses SP labelling with non-iterative BICM decoding. However, when the receiver iterates between decoder and demapper (BICM-ID, Ch. 8), SP becomes competitive again because the feedback provides conditioning that closes the MLC-style gap. We return to this in Ch. 8.
When SP Beats Gray: Iterative Decoding (Forward Ref)
The strict dominance of Gray over SP for non-iterative BICM reverses in BICM with iterative decoding (BICM-ID). The reason is a beautiful piece of symmetry: BICM-ID feeds soft a-priori information from the decoder back to the demapper, which effectively gives the demapper conditional rather than unconditional knowledge of the other label bits. Under Gray labelling this extra conditioning yields little benefit (the bits were already nearly independent). Under SP labelling, conditioning on the high-SNR bits dramatically improves the posteriors of the low-SNR bits — exactly the MLC conditioning benefit, now recovered by iteration.
The net effect: on some constellations, BICM-ID with SP labelling closes the CM-capacity gap entirely, matching the performance of MLC/MSD at comparable decoding complexity. This is one of the reasons why BICM- ID remains an active research area. We treat it in Chapter 8.
On Fading, the Labelling Question Reopens
On fading channels, the capacity argument has to be re-examined because the channel gain changes symbol by symbol. Caire-Taricco-Biglieri §IV shows that Gray labelling is still near-optimal in capacity, but the diversity order of a BICM system on a block-fading channel is driven by a completely different quantity: the minimum number of distinct bit positions involved in the code's free-distance events. This quantity depends on the labelling in a non-trivial way and is studied in detail in Chapter 6. The operational conclusion is that on AWGN the choice of labelling is a capacity question (Gray wins); on fading channels it is both a capacity question (Gray still wins) and a diversity-order question (labelling matters, code design matters more).
Common Mistake: "Gray Labelling Is Always Optimal"
Mistake:
Claiming that Gray labelling is optimal for every BICM configuration.
Correction:
Gray is near-optimal for non-iterative BICM on AWGN across the practical SNR range, and is the right default. But at very low SNR on a perfectly symmetric channel, anti-Gray constructions can slightly beat Gray (Thm. TGray Labelling Near-Optimality on AWGN(c)); with iterative decoding (BICM-ID, Ch. 8), SP can beat Gray by several dB on some constellations; on fading channels the diversity-optimal labelling may not coincide with the capacity-optimal one (Ch. 6). The universally correct statement is: Gray is the right default for non-iterative BICM on AWGN; revisit the labelling when decoding is iterative or the channel is fading.
Quick Check
Which of the following is the defining property of Gray labelling?
Every pair of constellation points differs in at most one label bit
Every pair of nearest-neighbour constellation points differs in exactly one label bit
Bits are assigned so that the top bit partitions the constellation into coarsest cosets
The sum of Hamming distances between all pairs of labels equals the sum of Euclidean distances
The nearest-neighbour property is the defining feature: Gray labelling ensures that the Euclidean-distance-minimising error flips one bit, not several. This is what makes each bit see an approximately independent binary channel under BICM and what gives Gray its near-optimality on AWGN.
Quick Check
Which of the following scenarios can make SP labelling outperform Gray?
BICM with iterative decoder-demapper feedback (BICM-ID)
Non-iterative BICM on AWGN at moderate SNR
MIMO with spatial multiplexing
Whenever the constellation is non-square
In BICM-ID, a-priori information from the decoder conditions the demapper's LLR computation, recovering the MLC-style conditioning benefit. Under SP the conditioning is very valuable (the level-wise capacities are highly asymmetric); under Gray the extra conditioning buys less (the levels are already roughly equal). Hence SP can exceed Gray under iterative BICM-ID.
Hamming Distance
The number of positions at which two binary strings of equal length differ. Central to Gray-labelling theory: Gray ensures that Euclidean nearest neighbours are at Hamming distance exactly one.
Related: Gray Labelling, The -th BICM Bit Channel
Near-Optimality (BICM)
The property that for Gray labelling on square QAM as , and more strongly that the absolute gap is bounded by bits uniformly across all SNRs (Thm. TGray Labelling Near-Optimality on AWGN).
Related: Gray Labelling, BICM Capacity