Labelling with Iteration: Why SP Comes Back

Gray Won. Then Iteration Arrived.

Chapter 6's conclusion was clean: on AWGN, Gray labelling maximises davg2(μ)d^2_{\rm avg}(\mu); on fully-interleaved Rayleigh fading, Gray gives Lmin(μG)=1L_{\min}(\mu_G) = 1 and factors out of the diversity formula. Set partitioning lost on both counts. The engineering slogan was "always use Gray."

BICM-ID changes the rule. The FIRST iteration still sees the one-shot demapper — where Gray dominates. But as the iteration progresses, the demapper is fed ever-sharper a priori, and what matters is no longer the one-shot MI Tdem(0,SNR)T_{\rm dem}(0, \mathrm{SNR}) but the ENDPOINT Tdem(1,SNR)T_{\rm dem}(1, \mathrm{SNR}) and the STEEPNESS of the curve between the two endpoints. Set partitioning wins at both: its one-shot MI is lower (the first iteration is worse), but its ENDPOINT is much higher AND its slope is much steeper, so the tunnel it opens at high IAI_A is far wider than Gray's. Under iteration, set partitioning has the better convergence threshold.

The lesson is a broader one about design-criterion matching. The right criterion depends on what the receiver can do. Give the receiver one shot, maximise Tdem(0,SNR)T_{\rm dem}(0, \mathrm{SNR}) — that is Gray. Give it iteration, shape the curve up to Tdem(1,SNR)=1T_{\rm dem}(1, \mathrm{SNR}) = 1 — that is SP (or better, a labelling explicitly designed for BICM-ID). The one-shot design and the iterative design are DIFFERENT OPTIMISATIONS, and the objects that win them are different.

Theorem: SP Has a Better Endpoint Than Gray: Tdem(1,SNR)T_{\rm dem}(1, \mathrm{SNR})

Consider a constellation X\mathcal{X} of size M=2LM = 2^L on an AWGN channel at SNR γ\gamma. Under the set-partition labelling μSP\mu_{\rm SP} with Ungerboeck chain rule — each bit level doubles the sub-constellation minimum distance — the demapper-with-perfect- a-priori extrinsic MI at position \ell satisfies Tdem(1,γ;μSP,)=J ⁣(σSP),σSP=4dSPN0,T_{\rm dem}(1, \gamma; \mu_{\rm SP}, \ell) = J\!\left(\sigma_\ell^{\rm SP}\right), \quad \sigma_\ell^{\rm SP} = \tfrac{4\, d_\ell^{\rm SP}}{\sqrt{N_0}}, where dSPd_\ell^{\rm SP} is the minimum distance of the SP sub- constellation at level \ell, which equals 2dmin2^\ell d_{\min} by the Ungerboeck chain rule. Averaging over bit positions, Tdem(1,γ;μSP)1T_{\rm dem}(1, \gamma; \mu_{\rm SP}) \to 1 as γ\gamma \to \infty at every bit position, with slope set by the LARGEST minimum distance available.

By contrast, Gray labelling has dGray=dmind_\ell^{\rm Gray} = d_{\min} at every bit level (Gray does not magnify minimum distances when sub-constellations are conditioned), so Tdem(1,γ;μGray)=J(4dmin/N0)T_{\rm dem}(1, \gamma; \mu_{\rm Gray}) = J(4 d_{\min}/\sqrt{N_0}). At finite SNR, Tdem(1,γ;μSP)>Tdem(1,γ;μGray)T_{\rm dem}(1, \gamma; \mu_{\rm SP}) > T_{\rm dem}(1, \gamma; \mu_{\rm Gray}) by a margin that grows with constellation size.

When a priori on all OTHER bits is perfect, the demapper is reduced to a BI-AWGN decision between two points — the one whose label has the target bit value 0, vs. the one with bit value 1, with all other bits fixed. The discrimination difficulty is set by the distance between those two points. Under SP labelling, that distance is the LARGEST in the constellation (Ungerboeck's choice maximised it on purpose). Under Gray, the distance is whatever happens to lie between the two points — typically the minimum distance dmind_{\min}. The ratio can be 2\sqrt{2} for 16-QAM (a 3 dB difference) and larger for 64-QAM.

,

Definition:

Anti-Gray Labelling μaG\mu_{\rm aG}

The anti-Gray labelling is obtained by applying an XOR bit- complement operation on a rotated copy of the Gray labelling — a simple combinatorial rearrangement that minimises the number of adjacent-pair bit disagreements rather than maximising it. Under anti-Gray, bit flips on neighbouring constellation points flip MANY label bits rather than one. Operationally, anti-Gray is a compromise between Gray (one-shot optimal) and SP (iteration-optimal): it has a modestly higher Tdem(1,γ)T_{\rm dem}(1, \gamma) than Gray but a flatter curve overall, so it wins at moderate IAI_A levels and on short-iteration receivers.

Anti-Gray is rarely deployed in practice — the M16a labelling of [?chindapol-ritcey-2001] strictly dominates it for 16-QAM at all operating points — but it is a useful pedagogical labelling because its demapper curve LIES BETWEEN Gray and SP at every IAI_A. If you plot all three on the same EXIT chart, the picture gives an immediate sense of the labelling-design continuum.

BER vs Iteration Count for Gray, SP, and Anti-Gray

BER of 16-QAM BICM-ID at a fixed Eb/N0E_b/N_0, as a function of the outer iteration count t=0,1,,15t = 0, 1, \ldots, 15, for the three labelings (Gray μG\mu_G, set-partition μSP\mu_{\rm SP}, anti-Gray μaG\mu_{\rm aG}). At low SNR, Gray leads at t=0t = 0 but all three stall; at the SP convergence threshold, SP drops by 4 orders of magnitude in the first 5 iterations while Gray stalls; at high SNR all three eventually converge but SP reaches the error floor first. This plot is the direct operational complement to the EXIT chart of s03.

Parameters
4

Labelling Comparison for 16-QAM: One-Shot BICM vs. BICM-ID

LabellingTdem(0,5 dB)T_{\rm dem}(0, 5\text{ dB})Tdem(1,5 dB)T_{\rm dem}(1, 5\text{ dB})BICM thresh. (1-shot)BICM-ID thresh. (SP-style) + 8 iterBER at Eb/N0=6E_b/N_0 = 6 dB, 10 iter
Gray μG\mu_G0.860.825.5 dB5.4 dB5×104\sim 5 \times 10^{-4}
Set Partition μSP\mu_{\rm SP}0.660.997.2 dB4.7 dB2×106\sim 2 \times 10^{-6}
Anti-Gray μaG\mu_{\rm aG}0.780.916.1 dB5.0 dB3×105\sim 3 \times 10^{-5}
M16a (Chindapol–Ritcey)0.720.996.6 dB4.5 dB8×107\sim 8 \times 10^{-7}
Natural binary μNB\mu_{\rm NB}0.580.768.1 dB7.8 dB102\sim 10^{-2} (stalls)

Example: EXIT Slope Comparison: SP vs. Gray at IA=0.5I_A = 0.5

For 16-QAM at Eb/N0=4E_b/N_0 = 4 dB, numerically evaluate the slopes Tdem(IA,γ)/IA\partial T_{\rm dem}(I_A, \gamma)/\partial I_A at IA=0.5I_A = 0.5 for Gray μG\mu_G and set-partition μSP\mu_{\rm SP} labelings, and interpret the difference.

🔧Engineering Note

When to Use Which Labelling

The operational advice from two decades of BICM/BICM-ID deployment: use Gray by default. If the receiver has no iterative-demapper capability, Gray is uniquely optimal; if it has iteration but can only afford 1–2 passes (low-latency control channels, URLLC), Gray still wins because the first iteration is the most productive. Switch to SP (or a BICM-ID-optimised labelling like M16a) ONLY when (i) the operating SNR is within 1 dB of capacity, where closing the last 0.3 dB matters; (ii) the receiver can afford 5\geq 5 iterations; and (iii) the outer code is strong (LDPC or turbo, not short convolutional). DVB-S2X's very-low-SNR MODCODs meet all three conditions and use BICM-ID with an APSK-SP hybrid; 5G NR data channels meet (i) and (iii) but rarely (ii), so they stay with Gray.

Practical Constraints
  • Default choice: Gray (optimal for 1-shot decoding and short iteration)

  • Switch to SP only when SNR is near capacity AND iterations ≥ 5

  • Short-block regimes (< 1000 bits): stay with Gray regardless

  • APSK on satellite: use SP-like labelling for VL-SNR, Gray elsewhere

📋 Ref: DVB-S2X Annex E; 5G NR TS 38.212 §5.4.2 (BICM, no mandatory iteration)
,

Why SP's Endpoint Is Near 1, Not Exactly 1

Why does SP's Tdem(1,SNR)T_{\rm dem}(1, \mathrm{SNR}) approach but not quite equal 1 at finite SNR? Because even with perfect a priori on the other bits, the target bit is still observed through AWGN — its MI is bounded by the BI-AWGN capacity at the conditioned sub- constellation's minimum distance. At very high SNR that capacity approaches 1 bit; at Eb/N0=5E_b/N_0 = 5 dB for 16-QAM-SP the conditioned sub-constellation is two points at distance 2dmin2 d_{\min} (level-1 SP), and the per-bit MI is J(8dmin/N0)0.99J(8 d_{\min}/\sqrt{N_0}) \approx 0.99. So the curve ends at 0.99\approx 0.99, not 1. The SHORTFALL from 1 is the remaining noise margin, and it determines the error floor of BICM-ID after tunnel opens: a BER-per-iteration plot hits a plateau governed by this shortfall, not by the outer code's distance structure.

Common Mistake: Extending "Gray is Optimal" to Iterative Receivers

Mistake:

The Ch. 5–6 conclusion — Gray labelling is near-optimal for BICM — is so well-established in the literature that many designers apply it reflexively to BICM-ID systems as well, on the grounds that "Gray is always the safe choice." On an iterative receiver at high SNR, this reflex gives up 1.0–1.5 dB relative to SP.

Correction:

The Gray-optimality theorem of Ch. 5 is a ONE-SHOT result; it does not survive the feedback loop of BICM-ID. Under iteration, the relevant demapper metric is not the one-shot MI Tdem(0,SNR)T_{\rm dem}(0, \mathrm{SNR}) but the shape of the entire curve, and Gray's flat curve is its iterative DOWNFALL. When designing for BICM-ID, always plot the EXIT chart with the candidate labelings before committing; when designing for one-shot BICM, Gray is still the right default. This is a LABEL-AGNOSTIC receiver-architecture decision: pair the labelling to the decoder.

Quick Check

Why does set-partition labelling outperform Gray under BICM-ID?

SP has a smaller minimum Euclidean distance.

SP has a steeper demapper EXIT curve — the demapper benefits more per bit of a priori information.

SP uses more constellation points.

SP gives better one-shot BER.

Historical Note: Zehavi Thought SP Was Obsolete. BICM-ID Brought It Back.

1992–2001

Ephraim Zehavi's 1992 paper on bit-interleaved 8-PSK established that Gray beats SP for one-shot decoding on fading channels — the observation that Caire, Taricco, and Biglieri formalised into BICM in 1998. By the mid-1990s "Gray is always right" had become orthodoxy in the coded-modulation community, and set partitioning was relegated to legacy TCM designs (V.32, V.34 modems) that were being replaced by BICM. When Li and Ritcey showed in 1997 that iterative decoding with SP labelling could recover 2+ dB on Rayleigh fading, the initial reception was sceptical: "SP is obsolete." Ten Brink's EXIT-chart analysis (2001) then supplied the QUANTITATIVE explanation — SP's steep EXIT curve — and the community accepted that labelling-optimality is receiver-dependent. The story is a clean instance of Caire's "design-criterion depends on what the decoder can do" motif: change the decoder, and the optimal modulator changes too. SP did not come back because it had been undervalued; it came back because its design target (maximise sub-constellation minimum distance) aligned with a new decoder architecture (iterative with a priori).

,

Key Takeaway

The optimal labelling depends on the decoder. For one-shot BICM, maximise Tdem(0,SNR)T_{\rm dem}(0, \mathrm{SNR}) — use Gray. For BICM-ID with many iterations, reshape the whole demapper EXIT curve — use set partition or a designed BICM-ID labelling (M16a family). The mechanism is that SP's Ungerboeck chain rule doubles sub- constellation minimum distance at each bit level, which translates into a steep demapper EXIT slope and a high endpoint Tdem(1,SNR)1T_{\rm dem}(1, \mathrm{SNR}) \to 1. Gray has flat levels and a flat curve. Choose the labelling that matches the receiver's iteration budget.

Why This Matters: BICM-ID Labelling in Practice: DVB-S2X vs 5G NR

A concrete illustration of the Gray-vs-SP tradeoff appears in the BICM architectures of two major modern standards. DVB-S2X's very- low-SNR modes (QPSK 2/9 down to QPSK 1/5) operate within 0.5 dB of capacity on rain-faded satellite links; these modes specify an SP-inspired APSK labelling and a receiver that performs 3–8 iterations between the APSK demapper and the LDPC decoder, saving 0.3–0.5 dB over one-shot Gray. 5G NR, by contrast, targets throughput rather than coverage and operates 2–4 dB above capacity; NR's modulation spec mandates Gray labelling on 64-QAM and 256-QAM, and iterative demapping is OPTIONAL (most commercial chipsets do 1–2 passes as a CRC-failure fallback). The labelling choice is an engineering reflection of the operating regime: near capacity, iterate with SP; well above capacity, Gray with a stronger code.