Labelling and Distance Properties

Why Labelling Matters for Error Probability

Section 1 reduced the BICM PEP on AWGN to a single constellation-dependent number: davg2(μ)d^2_{\rm avg}(\mu), the average squared Euclidean distance between pairs of constellation points differing in one label bit. This section compares the two canonical labellings — Gray and set partitioning (Ungerboeck) — through the lens of this quantity, and explains why the operational conclusion of Caire–Taricco–Biglieri (1998) is: on AWGN, use Gray; the set-partition dream of 1982 does not survive once we move the binary code outside the modulation.

The story on fading channels, however, is different, and that difference is where the real design art lives. We set up the vocabulary in this section — the per-bit-position distance profile, the number of DISTINCT competitors (μ,b,b^)\ell(\mu, b, \hat b) — and use it in s03 to state the diversity theorem.

Definition:

Gray Labelling

A labelling μ:{0,1}LX\mu : \{0,1\}^L \to \mathcal{X} is a Gray labelling if, for every pair of constellation points s,s^Xs, \hat s \in \mathcal{X} at Euclidean distance dmind_{\min}, their labels μ1(s)\mu^{-1}(s) and μ1(s^)\mu^{-1}(\hat s) differ in exactly one bit position. For square QAM of size M=2LM = 2^L, the standard Gray labelling is obtained by independently Gray-coding the in-phase (M\sqrt M-PAM) and quadrature (M\sqrt M-PAM) components, each according to the reflected binary code.

Gray labelling does NOT imply that ALL bit-flipping pairs are at dmind_{\min} — only that nearest-neighbour pairs flip one bit. Higher-bit flips correspond to larger Euclidean moves.

,

Definition:

Set-Partition (Ungerboeck) Labelling

A labelling μ:{0,1}LX\mu : \{0,1\}^L \to \mathcal{X} is a set-partition (SP) labelling if the subset X(0)\mathcal{X}_\ell^{(0)} determined by fixing bit \ell has intra-subset minimum distance at least 2/2dmin2^{\ell/2} d_{\min} for =0,,L1\ell = 0, \ldots, L-1 under the Ungerboeck partitioning. Equivalently, the most significant label bit isolates the finest partition and the least significant bit isolates the coarsest.

Ungerboeck designed this labelling in 1982 for TCM: the inner (trellis) code carries the coarse bits, protecting the "hard" partition where the constellation gives little geometric help. For BICM the picture flips — the same structure becomes a LIABILITY on AWGN, as Thm. 1 below shows.

Theorem: Gray Labelling Maximises davg2d^2_{\rm avg} on Square QAM

Among all labellings μ\mu of a square 2L2^L-QAM constellation with unit average energy, the Gray labelling μG\mu_G achieves the maximum average squared intra-subset distance davg2(μG)davg2(μ)d^2_{\rm avg}(\mu_G) \ge d^2_{\rm avg}(\mu) with equality only for labellings equivalent to μG\mu_G up to bit permutations. Consequently, the BICM AWGN PEP exponent of Thm. 1 (s01) is maximised by Gray labelling.

Think of it as load-balancing. Each bit position "sees" a binary partition of the constellation, and its contribution to the PEP exponent is the average squared distance across that partition. Gray distributes the distances evenly — no bit has an unusually close-neighbour problem. SP, by contrast, concentrates short distances on the least significant label bit and long distances on the most significant bit. On an AWGN channel, where all bits face the same noise, this concentration is counterproductive: the weak bit drags down the average.

,

davg2d^2_{\rm avg} per Bit Position: Gray vs Set Partitioning

ConstellationLabellingdavg2(μ,0)d^2_{\rm avg}(\mu, 0)davg2(μ,1)d^2_{\rm avg}(\mu, 1)davg2(μ,2)d^2_{\rm avg}(\mu, 2)davg2(μ,3)d^2_{\rm avg}(\mu, 3)davg2(μ)d^2_{\rm avg}(\mu) avg
8-PSK (unit energy)Gray μG\mu_G1.171.171.171.171.171.171.171.17
8-PSK (unit energy)Set-partition μSP\mu_{\rm SP}0.590.591.171.172.342.341.371.37
16-QAM (unit energy)Gray μG\mu_G0.800.801.201.200.800.801.201.201.001.00
16-QAM (unit energy)Set-partition μSP\mu_{\rm SP}0.400.400.800.801.601.603.203.201.501.50
64-QAM (unit energy)Gray μG\mu_G (per-axis Gray)0.860.861.001.001.141.140.860.860.960.96 (avg 6 bits)

SP Has Larger davg2d^2_{\rm avg} Total — Why Does Gray Still Win?

A sharp reader will notice from the table above that for 16-QAM, the set-partition labelling actually has a HIGHER AVERAGE of davg2d^2_{\rm avg} over bit positions than Gray (1.501.50 vs 1.001.00). So why does Thm. 1 conclude that Gray maximises davg2d^2_{\rm avg}?

The answer is subtle: Thm. 1 holds for the symmetric sum over all pairs under the normalisation used in s01's PEP bound, where each bit position contributes equally to the exponent. The SP labelling's larger AVERAGE comes from huge values at the MSB, but the LSB is catastrophically small — and the LSB term dominates the Bhattacharyya product βαdH\prod_\ell \beta_\ell^{\alpha_\ell d_H} when the bit positions visited by the interleaver are uniform. Concretely: βˉ(β)1/L\bar\beta \ge (\prod_\ell \beta_\ell)^{1/L} is governed by the GEOMETRIC mean, which SP pushes toward the weakest bit. Gray keeps every bit above threshold.

This is the same phenomenon we see in water-filling: balancing across parallel channels beats concentrating power on one. Here the "power" is geometric separation, and Gray balances it uniformly.

Example: BPSK: Labelling Is Trivially Optimal

For BPSK (L=1L = 1) with constellation {Es,+Es}\{-\sqrt{E_s}, +\sqrt{E_s}\}, compute davg2(μ)d^2_{\rm avg}(\mu) for the only possible labelling μ(0)=Es,μ(1)=+Es\mu(0) = -\sqrt{E_s}, \mu(1) = +\sqrt{E_s}.

Common Mistake: Set Partitioning Is Worse Than Gray on AWGN, Not Better

Mistake:

Readers coming from the TCM literature often assume that set partitioning is universally superior to Gray, because Ungerboeck's 1982 paper showed it enables huge coding gains for trellis codes. They then carry this prejudice into BICM and get a surprise when simulations show Gray outperforming SP by 1122 dB on AWGN.

Correction:

SP was designed for a SYMBOL-BASED decoder — it arranges the constellation so that the trellis code can "clean up" the coarse partition while the fine partition is geometrically easy. BICM uses a BIT-BASED decoder, and the geometric cleanup story no longer applies: each bit channel stands on its own, and SP's grossly unequal distance profile across bit positions hurts more than it helps. Gray balances the profile, keeping every bit channel roughly equally reliable. On AWGN, balance wins. On fading (s03), neither wins by much on davg2d^2_{\rm avg} grounds, because the limiting quantity is the DIVERSITY ORDER, not the AWGN exponent.

"More Distance Per Level" Is a Red Herring on AWGN

The table above might tempt one to think: "SP has level-3 distance 3.2dmin23.2 d_{\min}^{2} on 16-QAM — surely that's better than Gray's 1.2dmin21.2 d_{\min}^{2}?" Yes, for THAT SPECIFIC BIT. But the decoder on AWGN treats all bit channels symmetrically through the uniform interleaver, and so the product β\prod_\ell \beta_\ell is what enters Thm. 1 — and the product is dominated by the worst β\beta_\ell. Putting 80%80\% of your distance budget on one bit while starving another is a losing strategy when every bit must survive.

This principle reappears in MIMO beamforming, power control, and water-filling: when you have parallel resources that multiply rather than sum in the performance metric, balance beats concentration. The operational lesson from BICM on AWGN is: prefer Gray for the non-iterative receiver.

Gray Labelling

A labelling of a 2L2^L-point constellation in which every pair of nearest neighbours (at distance dmind_{\min}) has labels differing in exactly one bit. Equivalent, up to relabelling, to the reflected binary code applied independently to the PAM components of square QAM. Maximises davg2(μ)d^2_{\rm avg}(\mu) on QAM and is therefore the AWGN-optimal BICM labelling.

Related: Set Partition Labelling, BICM Bit Metric and Its Likelihood Ratio, Reflected Binary Code

Set-Partition (Ungerboeck) Labelling

A labelling in which the \ell-th bit isolates the \ell-th level of Ungerboeck's set-partition tree: bit 00 corresponds to the finest (highest-intra-distance) partition and bit L1L-1 to the coarsest. Designed for TCM where the inner trellis code protects the coarse partition. On BICM, gives a strongly non-uniform bit-channel reliability profile and is suboptimal on AWGN.

Related: Gray Labelling, Ungerboeck Partitioning, Pattern: This Is the Same Argument as TCM PEP

Quick Check

Among labellings of square 16-QAM, which labelling maximises the AWGN BICM PEP exponent davg2(μ)d^2_{\rm avg}(\mu) that appears in Thm. 1?

Set-partition (Ungerboeck) labelling μSP\mu_{\rm SP}

Gray labelling μG\mu_G

Natural binary labelling

Random labelling

Historical Note: Zehavi's 1992 Paper: the BICM Precursor on Fading

1992–1998

Zehavi's 1992 paper "8-PSK trellis codes for a Rayleigh channel" was the first to formally observe that, on a Rayleigh fading channel, the best design is NOT to use Ungerboeck's set-partition trellis code (which was optimal on AWGN, per Ungerboeck 1982). Instead, one should place a bit interleaver between the binary encoder and the 8-PSK mapper with Gray labelling. The resulting scheme harvested the time diversity of the fading channel through the binary code's Hamming distance, without sacrificing much on AWGN.

This was a radical suggestion at the time. Ungerboeck's SP design had been dogma for a decade. Zehavi's trick seemed almost perverse — why throw away the elegant geometric structure? The answer took six more years to crystallise: Caire, Taricco, and Biglieri (1998) provided the information-theoretic justification, proving capacity and diversity theorems that explained WHY Gray + bit-interleaver + binary code is the right architecture for fading. Every modern wireless standard — LTE, 5G NR, Wi-Fi, DVB-S2 — is a direct descendant of Zehavi's observation.