Labelling and Distance Properties
Why Labelling Matters for Error Probability
Section 1 reduced the BICM PEP on AWGN to a single constellation-dependent number: , the average squared Euclidean distance between pairs of constellation points differing in one label bit. This section compares the two canonical labellings — Gray and set partitioning (Ungerboeck) — through the lens of this quantity, and explains why the operational conclusion of Caire–Taricco–Biglieri (1998) is: on AWGN, use Gray; the set-partition dream of 1982 does not survive once we move the binary code outside the modulation.
The story on fading channels, however, is different, and that difference is where the real design art lives. We set up the vocabulary in this section — the per-bit-position distance profile, the number of DISTINCT competitors — and use it in s03 to state the diversity theorem.
Definition: Gray Labelling
Gray Labelling
A labelling is a Gray labelling if, for every pair of constellation points at Euclidean distance , their labels and differ in exactly one bit position. For square QAM of size , the standard Gray labelling is obtained by independently Gray-coding the in-phase (-PAM) and quadrature (-PAM) components, each according to the reflected binary code.
Gray labelling does NOT imply that ALL bit-flipping pairs are at — only that nearest-neighbour pairs flip one bit. Higher-bit flips correspond to larger Euclidean moves.
Definition: Set-Partition (Ungerboeck) Labelling
Set-Partition (Ungerboeck) Labelling
A labelling is a set-partition (SP) labelling if the subset determined by fixing bit has intra-subset minimum distance at least for under the Ungerboeck partitioning. Equivalently, the most significant label bit isolates the finest partition and the least significant bit isolates the coarsest.
Ungerboeck designed this labelling in 1982 for TCM: the inner (trellis) code carries the coarse bits, protecting the "hard" partition where the constellation gives little geometric help. For BICM the picture flips — the same structure becomes a LIABILITY on AWGN, as Thm. 1 below shows.
Theorem: Gray Labelling Maximises on Square QAM
Among all labellings of a square -QAM constellation with unit average energy, the Gray labelling achieves the maximum average squared intra-subset distance with equality only for labellings equivalent to up to bit permutations. Consequently, the BICM AWGN PEP exponent of Thm. 1 (s01) is maximised by Gray labelling.
Think of it as load-balancing. Each bit position "sees" a binary partition of the constellation, and its contribution to the PEP exponent is the average squared distance across that partition. Gray distributes the distances evenly — no bit has an unusually close-neighbour problem. SP, by contrast, concentrates short distances on the least significant label bit and long distances on the most significant bit. On an AWGN channel, where all bits face the same noise, this concentration is counterproductive: the weak bit drags down the average.
Write and recognise this as the TOTAL pairwise energy minus intra-label-pair terms, divided by .
Show that the total pairwise energy is labelling-independent (it equals for a unit-energy constellation).
Argue that is therefore a LINEAR FUNCTION of the labelling with a constant baseline, and that maximising it is equivalent to minimising the sum of intra-label-pair squared distances (i.e., pairs sharing bit labels).
Gray minimises this by ensuring that pairs with the same label bit are NOT nearest neighbours — which is exactly the Gray definition.
Step 1: Decomposition into total energy and intra-label sums
Expand: The first term on the right is labelling-independent; call it . The second term, call it , is the "intra-bit-label energy": sum over pairs that agree in each bit position.
Step 2: Relating to $d^2_{\rm avg}$
After normalising by , one obtains Since does not depend on , maximising is equivalent to minimising .
Step 3: Gray minimises intra-label energy
The intra-label energy is a sum over pairs of the squared distance weighted by the number of bit positions at which they share the same label value. Adjacent constellation points (at distance ) contribute to with weight , where is the Hamming distance of the two labels. Gray labelling is the unique structure (up to bit permutations) that makes for EVERY adjacent pair, which minimises the shared-label weight at the close distances that dominate the sum.
Step 4: Conclude
Any non-Gray labelling has at least one pair of adjacent points sharing two label bits, which strictly increases by relative to , and therefore strictly decreases . The Gray labelling achieves the minimum of , hence the maximum of .
per Bit Position: Gray vs Set Partitioning
| Constellation | Labelling | avg | ||||
|---|---|---|---|---|---|---|
| 8-PSK (unit energy) | Gray | — | ||||
| 8-PSK (unit energy) | Set-partition | — | ||||
| 16-QAM (unit energy) | Gray | |||||
| 16-QAM (unit energy) | Set-partition | |||||
| 64-QAM (unit energy) | Gray (per-axis Gray) | (avg 6 bits) |
SP Has Larger Total — Why Does Gray Still Win?
A sharp reader will notice from the table above that for 16-QAM, the set-partition labelling actually has a HIGHER AVERAGE of over bit positions than Gray ( vs ). So why does Thm. 1 conclude that Gray maximises ?
The answer is subtle: Thm. 1 holds for the symmetric sum over all pairs under the normalisation used in s01's PEP bound, where each bit position contributes equally to the exponent. The SP labelling's larger AVERAGE comes from huge values at the MSB, but the LSB is catastrophically small — and the LSB term dominates the Bhattacharyya product when the bit positions visited by the interleaver are uniform. Concretely: is governed by the GEOMETRIC mean, which SP pushes toward the weakest bit. Gray keeps every bit above threshold.
This is the same phenomenon we see in water-filling: balancing across parallel channels beats concentrating power on one. Here the "power" is geometric separation, and Gray balances it uniformly.
Example: BPSK: Labelling Is Trivially Optimal
For BPSK () with constellation , compute for the only possible labelling .
Single bit position
With and ,
Specialise Thm. 1
Plugging into the BICM AWGN PEP bound with : This is the classical binary-code PEP on AWGN, as expected. The BICM bound degenerates to the uncoded binary-code bound because there are no inner constellation subsets to average over.
Common Mistake: Set Partitioning Is Worse Than Gray on AWGN, Not Better
Mistake:
Readers coming from the TCM literature often assume that set partitioning is universally superior to Gray, because Ungerboeck's 1982 paper showed it enables huge coding gains for trellis codes. They then carry this prejudice into BICM and get a surprise when simulations show Gray outperforming SP by – dB on AWGN.
Correction:
SP was designed for a SYMBOL-BASED decoder — it arranges the constellation so that the trellis code can "clean up" the coarse partition while the fine partition is geometrically easy. BICM uses a BIT-BASED decoder, and the geometric cleanup story no longer applies: each bit channel stands on its own, and SP's grossly unequal distance profile across bit positions hurts more than it helps. Gray balances the profile, keeping every bit channel roughly equally reliable. On AWGN, balance wins. On fading (s03), neither wins by much on grounds, because the limiting quantity is the DIVERSITY ORDER, not the AWGN exponent.
"More Distance Per Level" Is a Red Herring on AWGN
The table above might tempt one to think: "SP has level-3 distance on 16-QAM — surely that's better than Gray's ?" Yes, for THAT SPECIFIC BIT. But the decoder on AWGN treats all bit channels symmetrically through the uniform interleaver, and so the product is what enters Thm. 1 — and the product is dominated by the worst . Putting of your distance budget on one bit while starving another is a losing strategy when every bit must survive.
This principle reappears in MIMO beamforming, power control, and water-filling: when you have parallel resources that multiply rather than sum in the performance metric, balance beats concentration. The operational lesson from BICM on AWGN is: prefer Gray for the non-iterative receiver.
Gray Labelling
A labelling of a -point constellation in which every pair of nearest neighbours (at distance ) has labels differing in exactly one bit. Equivalent, up to relabelling, to the reflected binary code applied independently to the PAM components of square QAM. Maximises on QAM and is therefore the AWGN-optimal BICM labelling.
Related: Set Partition Labelling, BICM Bit Metric and Its Likelihood Ratio, Reflected Binary Code
Set-Partition (Ungerboeck) Labelling
A labelling in which the -th bit isolates the -th level of Ungerboeck's set-partition tree: bit corresponds to the finest (highest-intra-distance) partition and bit to the coarsest. Designed for TCM where the inner trellis code protects the coarse partition. On BICM, gives a strongly non-uniform bit-channel reliability profile and is suboptimal on AWGN.
Related: Gray Labelling, Ungerboeck Partitioning, Pattern: This Is the Same Argument as TCM PEP
Quick Check
Among labellings of square 16-QAM, which labelling maximises the AWGN BICM PEP exponent that appears in Thm. 1?
Set-partition (Ungerboeck) labelling
Gray labelling
Natural binary labelling
Random labelling
By Thm. 2 of this section, Gray minimises the intra-label-energy — equivalently, maximises . This is why every modern AWGN BICM system uses Gray.
Historical Note: Zehavi's 1992 Paper: the BICM Precursor on Fading
1992–1998Zehavi's 1992 paper "8-PSK trellis codes for a Rayleigh channel" was the first to formally observe that, on a Rayleigh fading channel, the best design is NOT to use Ungerboeck's set-partition trellis code (which was optimal on AWGN, per Ungerboeck 1982). Instead, one should place a bit interleaver between the binary encoder and the 8-PSK mapper with Gray labelling. The resulting scheme harvested the time diversity of the fading channel through the binary code's Hamming distance, without sacrificing much on AWGN.
This was a radical suggestion at the time. Ungerboeck's SP design had been dogma for a decade. Zehavi's trick seemed almost perverse — why throw away the elegant geometric structure? The answer took six more years to crystallise: Caire, Taricco, and Biglieri (1998) provided the information-theoretic justification, proving capacity and diversity theorems that explained WHY Gray + bit-interleaver + binary code is the right architecture for fading. Every modern wireless standard — LTE, 5G NR, Wi-Fi, DVB-S2 — is a direct descendant of Zehavi's observation.