Ferkans — Interactive Telecom Tutor

Gray Won. Then Iteration Arrived.

Chapter 6's conclusion was clean: on AWGN, Gray labelling maximises $d^2_{\rm avg}(\mu)$ ; on fully-interleaved Rayleigh fading, Gray gives $L_{\min}(\mu_G) = 1$ and factors out of the diversity formula. Set partitioning lost on both counts. The engineering slogan was "always use Gray."

BICM-ID changes the rule. The FIRST iteration still sees the one-shot demapper — where Gray dominates. But as the iteration progresses, the demapper is fed ever-sharper a priori, and what matters is no longer the one-shot MI $T_{\rm dem}(0, \mathrm{SNR})$ but the ENDPOINT $T_{\rm dem}(1, \mathrm{SNR})$ and the STEEPNESS of the curve between the two endpoints. Set partitioning wins at both: its one-shot MI is lower (the first iteration is worse), but its ENDPOINT is much higher AND its slope is much steeper, so the tunnel it opens at high $I_A$ is far wider than Gray's. Under iteration, set partitioning has the better convergence threshold.

The lesson is a broader one about design-criterion matching. The right criterion depends on what the receiver can do. Give the receiver one shot, maximise $T_{\rm dem}(0, \mathrm{SNR})$ — that is Gray. Give it iteration, shape the curve up to $T_{\rm dem}(1, \mathrm{SNR}) = 1$ — that is SP (or better, a labelling explicitly designed for BICM-ID). The one-shot design and the iterative design are DIFFERENT OPTIMISATIONS, and the objects that win them are different.

Theorem: SP Has a Better Endpoint Than Gray: $T_{\rm dem}(1, \mathrm{SNR})$

Consider a constellation $\mathcal{X}$ of size $M = 2^L$ on an AWGN channel at SNR $\gamma$. Under the set-partition labelling $\mu_{\rm SP}$ with the Ungerboeck chain rule (each bit level doubles the sub-constellation minimum distance), the demapper extrinsic MI under perfect a priori at position $\ell$ satisfies $T_{\rm dem}(1, \gamma; \mu_{\rm SP}, \ell) = J\!\left(\sigma_\ell^{\rm SP}\right), \quad \sigma_\ell^{\rm SP} = \tfrac{2\sqrt{2}\, d_\ell^{\rm SP}}{\sqrt{N_0}},$ where $d_\ell^{\rm SP}$ is the minimum distance of the SP sub-constellation at level $\ell$, which equals $2^\ell d_{\min}$ by the Ungerboeck chain rule. Averaging over bit positions, $T_{\rm dem}(1, \gamma; \mu_{\rm SP}) \to 1$ as $\gamma \to \infty$ at every bit position, with slope set by the LARGEST minimum distance available.

By contrast, Gray labelling has $d_\ell^{\rm Gray} = d_{\min}$ at every bit level (Gray does not magnify minimum distances when sub-constellations are conditioned), so $T_{\rm dem}(1, \gamma; \mu_{\rm Gray}) = J(2\sqrt{2}\, d_{\min}/\sqrt{N_0})$. At finite SNR, $T_{\rm dem}(1, \gamma; \mu_{\rm SP}) > T_{\rm dem}(1, \gamma; \mu_{\rm Gray})$ by a margin that grows with constellation size.

When a priori on all OTHER bits is perfect, the demapper is reduced to a BI-AWGN decision between two points — the one whose label has the target bit value 0, vs. the one with bit value 1, with all other bits fixed. The discrimination difficulty is set by the distance between those two points. Under SP labelling, that distance is the LARGEST in the constellation (Ungerboeck's choice maximised it on purpose). Under Gray, the distance is whatever happens to lie between the two points — typically the minimum distance $d_{\min}$ . The ratio can be $\sqrt{2}$ for 16-QAM (a 3 dB difference) and larger for 64-QAM.

Hint

With perfect a priori on $L - 1$ bits, the demapper-with-a-priori reduces to a TWO-POINT LLR computation between the two candidate symbols consistent with the known bits and the two values of the target bit.

For SP, the Ungerboeck chain rule ensures that a subset conditioned on $L - 1$ bits is a two-point subset at the LARGEST available Euclidean distance — the level- $\ell$ sub-constellation distance $2^\ell d_{\min}$ .

For Gray, the two conditioning-compatible points differ in exactly one bit with distance $d_{\min}$ , regardless of level.

Plug into the BI-AWGN LLR variance formula $\sigma^2 = 8 d^2/N_0$ and compute $J(\sigma)$ .
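The last hint step can be sketched numerically. The snippet below is a minimal illustration, not the tutor's own code: it evaluates $J(\sigma)$ by direct numerical integration over a consistent Gaussian LLR, then applies the variance formula $\sigma^2 = 8 d^2/N_0$ quoted above to two separations. The unit-energy square 16-QAM normalisation ($d_{\min} = 2/\sqrt{10}$), the 5 dB value of $E_s/N_0$, and all function names are assumptions made for the example.

```python
import math

def log2_one_plus_exp_neg(x):
    """Return log2(1 + exp(-x)) without overflow for large |x|."""
    if x >= 0:
        return math.log1p(math.exp(-x)) / math.log(2)
    return (-x + math.log1p(math.exp(x))) / math.log(2)

def j_function(sigma, steps=4000, span=10.0):
    """J(sigma): MI between a bit and a consistent Gaussian LLR
    L ~ N(sigma^2 / 2, sigma^2), by trapezoidal integration."""
    if sigma <= 0:
        return 0.0
    h = 2 * span / steps
    total = 0.0
    for i in range(steps + 1):
        z = -span + i * h                      # standard-normal variable
        weight = 0.5 if i in (0, steps) else 1.0
        pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
        llr = sigma * sigma / 2 + sigma * z    # LLR value at this z
        total += weight * pdf * log2_one_plus_exp_neg(llr)
    return 1.0 - total * h

def endpoint_mi(d, n0):
    """Endpoint MI of a two-point decision at distance d,
    using sigma^2 = 8 d^2 / N0 as quoted in the hint."""
    return j_function(math.sqrt(8.0 * d * d / n0))

# Assumed setup: unit-energy square 16-QAM at Es/N0 = 5 dB.
d_min = 2 / math.sqrt(10)
n0 = 1 / 10 ** (5 / 10)
mi_dmin = endpoint_mi(d_min, n0)        # separation d_min
mi_2dmin = endpoint_mi(2 * d_min, n0)   # separation 2 d_min
print(f"separation d_min:   {mi_dmin:.3f}")
print(f"separation 2 d_min: {mi_2dmin:.3f}")
```

Doubling the separation quadruples $\sigma^2$, which is why the larger-distance pair sits much closer to 1 bit at the same SNR.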

Proof

Step 1: Two-point reduction under perfect a priori

With $\lambda_{A,k} \to \pm \infty$ for all $k \ne \ell$, the a-priori probability $P(b_k \mid \lambda_{A,k})$ concentrates at the true value of $b_k$. The demapper sums in Definition "Demapper with A Priori Information" reduce to TWO terms: $s_0 \in \mathcal{X}_\ell^{(0)}$, the point consistent with all other (known) bits, and $s_1 \in \mathcal{X}_\ell^{(1)}$ analogously. The demapper LLR becomes $\lambda_\ell = |y - s_1|^2/N_0 - |y - s_0|^2/N_0$ (up to constants), which on AWGN is a BI-AWGN LLR with separation $\|s_0 - s_1\|$.
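The two-point reduction can be checked numerically. The sketch below is an illustration under stated assumptions rather than the tutor's implementation: it uses a hypothetical Gray-labelled 4-PAM constellation, the LLR convention $\log P(b{=}0)/P(b{=}1)$, and a large finite a-priori LLR (50) standing in for $\pm\infty$. The function name `extrinsic_llr` and all numeric values are chosen for the example.

```python
import math

def extrinsic_llr(y, points, labels, prior_llrs, target, n0):
    """Extrinsic LLR log P(b=0)/P(b=1) for bit `target`,
    using a priori LLRs on all other bit positions."""
    def log_sum_exp(values):
        m = max(values)
        return m + math.log(sum(math.exp(v - m) for v in values))

    metrics = {0: [], 1: []}
    for s, bits in zip(points, labels):
        metric = -abs(y - s) ** 2 / n0          # AWGN log-likelihood
        for k, b in enumerate(bits):
            if k != target:
                # log P(b_k = b) up to a constant: 0 if b = 0, -LLR if b = 1
                metric -= b * prior_llrs[k]
        metrics[bits[target]].append(metric)
    return log_sum_exp(metrics[0]) - log_sum_exp(metrics[1])

# Hypothetical 4-PAM with Gray labels (bit 0, bit 1).
points = [-3 + 0j, -1 + 0j, 1 + 0j, 3 + 0j]
labels = [(0, 0), (0, 1), (1, 1), (1, 0)]
y, n0 = -1.7 + 0.2j, 0.5

# Bit 0 is known to be 0 (a priori LLR +50); bit 1 is the target.
full = extrinsic_llr(y, points, labels, [50.0, 0.0], target=1, n0=n0)

# Two-point formula: s0 = -3 (label 00), s1 = -1 (label 01).
s0, s1 = points[0], points[1]
two_point = abs(y - s1) ** 2 / n0 - abs(y - s0) ** 2 / n0
print(full, two_point)
```

With the other bit pinned, the full sum and the two-point expression agree to within floating-point precision, because the terms inconsistent with the known bit are suppressed by the large a-priori penalty.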

Step 2: SP maximises the separation

For set-partition labelling with the Ungerboeck chain rule, the level- $\ell$ sub-constellation obtained by fixing the $L - \ell$ most-significant bits consists of two points at distance $2^\ell d_{\min}$ (this is the chain rule's defining property). Hence $\|s_0 - s_1\| = 2^\ell d_{\min}$ for the SP level- $\ell$ bit, and $\sigma_\ell^{\rm SP} = 2\sqrt{2} \cdot 2^\ell d_{\min}/\sqrt{N_0}$. The corresponding MI is $J(\sigma_\ell^{\rm SP})$.

Step 3: Gray gives only minimum distance

For Gray labelling, the two points that flip one label bit are adjacent in the labelling graph; by construction of Gray codes they are Euclidean neighbours at the minimum distance $d_{\min}$. Hence $\|s_0 - s_1\| = d_{\min}$ regardless of the level $\ell$. So $\sigma^{\rm Gray} = 2\sqrt{2}\, d_{\min}/\sqrt{N_0}$ and the per-bit endpoint MI is $J(2\sqrt{2}\, d_{\min}/\sqrt{N_0})$.

Step 4: Average over bit positions

Averaging $J(\sigma_\ell^{\rm SP})$ over $\ell$ yields, for 16-QAM at $E_s/N_0 = 5$ dB, $T_{\rm dem}(1, \gamma; \mu_{\rm SP}) \approx 0.99$ versus $T_{\rm dem}(1, \gamma; \mu_{\rm Gray}) \approx 0.82$ (numerical). The difference is about 0.17 bit of endpoint MI, enough to determine whether the tunnel is open in a near-capacity BICM-ID design. The same comparison extended to 64-QAM widens the gap to 0.25 bits. $\blacksquare$

Definition: Anti-Gray Labelling $\mu_{\rm aG}$

The anti-Gray labelling is obtained by applying an XOR bit-complement operation on a rotated copy of the Gray labelling: a simple combinatorial rearrangement that maximises the number of adjacent-pair bit disagreements rather than minimising it. Under anti-Gray, neighbouring constellation points differ in MANY label bits rather than one. Operationally, anti-Gray is a compromise between Gray (one-shot optimal) and SP (iteration-optimal): it has a modestly higher $T_{\rm dem}(1, \gamma)$ than Gray but a flatter curve than SP, so it wins at moderate $I_A$ levels and on short-iteration receivers.

Anti-Gray is rarely deployed in practice, since the M16a labelling of Chindapol and Ritcey (2001) strictly dominates it for 16-QAM at all operating points, but it is a useful pedagogical labelling because its demapper curve LIES BETWEEN Gray and SP at every $I_A$. If you plot all three on the same EXIT chart, the picture gives an immediate sense of the labelling-design continuum.
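The quantity this definition turns on, the number of label bits in which neighbouring constellation points disagree, is easy to count. The sketch below does not construct anti-Gray itself; as an illustration it compares a Gray labelling with natural binary on a square 16-QAM built as the product of two 4-PAM axis labellings. The helper name and the product construction are assumptions for the example.

```python
from itertools import product

def mean_neighbour_hamming(axis_labels):
    """Average Hamming distance between the labels of nearest-neighbour
    points in a square QAM formed from two identical PAM axis labellings."""
    m = len(axis_labels)
    label = {(i, q): axis_labels[i] + axis_labels[q]
             for i, q in product(range(m), repeat=2)}
    distances = []
    for (i, q), bits in label.items():
        # Visit each neighbour pair once: right and upper neighbour only.
        for ni, nq in ((i + 1, q), (i, q + 1)):
            if (ni, nq) in label:
                other = label[(ni, nq)]
                distances.append(sum(a != b for a, b in zip(bits, other)))
    return sum(distances) / len(distances)

gray_axis = ["00", "01", "11", "10"]      # one bit changes per step
natural_axis = ["00", "01", "10", "11"]   # the middle step changes two bits

print("Gray:          ", mean_neighbour_hamming(gray_axis))
print("Natural binary:", mean_neighbour_hamming(natural_axis))
```

Gray sits at the minimum of this count (exactly one bit per neighbour pair); labellings designed for iteration deliberately move away from that minimum.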

BER vs Iteration Count for Gray, SP, and Anti-Gray

BER of 16-QAM BICM-ID at a fixed $E_b/N_0$, as a function of the outer iteration count $t = 0, 1, \ldots, 15$, for the three labellings (Gray $\mu_G$, set-partition $\mu_{\rm SP}$, anti-Gray $\mu_{\rm aG}$). At low SNR, Gray leads at $t = 0$ but all three stall; at the SP convergence threshold, SP drops by 4 orders of magnitude in the first 5 iterations while Gray stalls; at high SNR all three eventually converge but SP reaches the error floor first. This plot is the direct operational complement to the EXIT chart of s03.

Labelling Comparison for 16-QAM: One-Shot BICM vs. BICM-ID

Labelling	$T_{\rm dem}(0, 5\text{ dB})$	$T_{\rm dem}(1, 5\text{ dB})$	BICM thresh. (one-shot)	BICM-ID thresh. (SP-style, 8 iter.)	BER at $E_b/N_0 = 6$ dB, 10 iter.
Gray $\mu_G$	0.86	0.82	5.5 dB	5.4 dB	$\sim 5 \times 10^{-4}$
Set Partition $\mu_{\rm SP}$	0.66	0.99	7.2 dB	4.7 dB	$\sim 2 \times 10^{-6}$
Anti-Gray $\mu_{\rm aG}$	0.78	0.91	6.1 dB	5.0 dB	$\sim 3 \times 10^{-5}$
M16a (Chindapol–Ritcey)	0.72	0.99	6.6 dB	4.5 dB	$\sim 8 \times 10^{-7}$
Natural binary $\mu_{\rm NB}$	0.58	0.76	8.1 dB	7.8 dB	$\sim 10^{-2}$ (stalls)
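The threshold columns are easiest to read as differences. The short sketch below copies the two threshold columns from the table and computes, for each labelling, how much iteration lowers its own threshold and how its BICM-ID threshold compares with one-shot Gray. The dictionary and variable names are illustrative only.

```python
# Thresholds in dB copied from the table: (one-shot BICM, BICM-ID).
thresholds_db = {
    "Gray": (5.5, 5.4),
    "Set Partition": (7.2, 4.7),
    "Anti-Gray": (6.1, 5.0),
    "M16a": (6.6, 4.5),
    "Natural binary": (8.1, 7.8),
}

gray_one_shot = thresholds_db["Gray"][0]
iteration_gain = {}   # one-shot threshold minus BICM-ID threshold
gain_vs_gray = {}     # one-shot Gray threshold minus BICM-ID threshold

for name, (one_shot, iterative) in thresholds_db.items():
    iteration_gain[name] = round(one_shot - iterative, 1)
    gain_vs_gray[name] = round(gray_one_shot - iterative, 1)
    print(f"{name:15s} iteration gain {iteration_gain[name]:4.1f} dB, "
          f"vs one-shot Gray {gain_vs_gray[name]:4.1f} dB")
```

Read this way, the table shows Gray gaining almost nothing from iteration, while SP and M16a recover more than 2 dB of their own one-shot penalty and end up below the one-shot Gray threshold.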

Example: EXIT Slope Comparison: SP vs. Gray at $I_A = 0.5$

For 16-QAM at $E_b/N_0 = 4$ dB, numerically evaluate the slopes $\partial T_{\rm dem}(I_A, \gamma)/\partial I_A$ at $I_A = 0.5$ for Gray $\mu_G$ and set-partition $\mu_{\rm SP}$ labellings, and interpret the difference.

Solution

Monte Carlo EXIT curve at $I_A = 0.5$

At $I_A = 0.5$, solve $\sigma_A = J^{-1}(0.5) \approx 2.04$. Generate $10^5$ Monte Carlo samples: bit $B$, consistent-Gaussian a priori $\Lambda_A$, channel observation $y$, demapper LLR $\Lambda_E$. Estimate $I(B; \Lambda_E)$ via histogram. Repeat at $I_A = 0.45$ and $I_A = 0.55$ to numerically differentiate.
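The a-priori half of this recipe can be sketched in a few lines. The snippet below is an illustration under stated assumptions, not the tutor's simulator: it inverts $J$ by bisection on a numerically integrated $J(\sigma)$, draws consistent-Gaussian a-priori LLRs (convention $\log P(b{=}0)/P(b{=}1)$), and checks that their MI lands on the target $I_A$. In place of the histogram estimator it uses the sample-mean estimator $1 - \mathbb{E}[\log_2(1 + e^{-(1-2B)\Lambda})]$, which is valid only for consistent LLRs; the demapper stage is omitted. All names and the seed are hypothetical.

```python
import math
import random

def log2_one_plus_exp_neg(x):
    """log2(1 + exp(-x)), written to avoid overflow."""
    if x >= 0:
        return math.log1p(math.exp(-x)) / math.log(2)
    return (-x + math.log1p(math.exp(x))) / math.log(2)

def j_function(sigma, steps=2000, span=10.0):
    """J(sigma) for L ~ N(sigma^2/2, sigma^2), trapezoidal integration."""
    if sigma <= 0:
        return 0.0
    h = 2 * span / steps
    total = 0.0
    for i in range(steps + 1):
        z = -span + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
        total += weight * pdf * log2_one_plus_exp_neg(sigma * sigma / 2 + sigma * z)
    return 1.0 - total * h

def j_inverse(target, lo=0.0, hi=20.0, iterations=50):
    """Solve J(sigma) = target by bisection (J is increasing in sigma)."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if j_function(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def sample_prior_llr(bit, sigma_a, rng):
    """Consistent Gaussian a-priori LLR: mean +/- sigma^2/2, std sigma."""
    return rng.gauss((1 - 2 * bit) * sigma_a ** 2 / 2, sigma_a)

def estimate_mi(bits, llrs):
    """Sample-mean MI estimate; assumes the LLRs are consistent."""
    loss = sum(log2_one_plus_exp_neg((1 - 2 * b) * l) for b, l in zip(bits, llrs))
    return 1.0 - loss / len(bits)

rng = random.Random(1)
sigma_a = j_inverse(0.5)
bits = [rng.randint(0, 1) for _ in range(100_000)]
llrs = [sample_prior_llr(b, sigma_a, rng) for b in bits]
mi_estimate = estimate_mi(bits, llrs)
print(f"sigma_A = {sigma_a:.3f}, estimated I_A = {mi_estimate:.3f}")
```

Repeating the same steps at $I_A = 0.45$ and $I_A = 0.55$, then feeding each batch through the demapper, gives the two points needed for the central difference.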

Slopes

Numerical result: $\partial T_{\rm dem}^{\rm SP}/\partial I_A \approx 0.62$ , $\partial T_{\rm dem}^{\rm Gray}/\partial I_A \approx 0.18$ . SP's slope is more than three times Gray's at $I_A = 0.5$ .

Interpretation

The steep SP slope means each "up" step on the EXIT staircase increases $I_E$ by a large amount — the iteration makes rapid progress. Gray's flat curve means each up step is small, and the staircase hugs the diagonal. Consequently, SP opens a much wider tunnel between itself and the inverted decoder curve at middle $I_A$ values, giving a lower convergence threshold. The slope at $I_A = 1/2$ is a good one-number summary of "how much BICM-ID gain is there for this labelling." $\blacksquare$
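The staircase argument can be made concrete with toy curves. In the sketch below every curve is a hypothetical straight-line or piecewise-linear shape chosen for illustration, not a measured EXIT curve: a "flat" demapper that starts high with small slope, a "steep" demapper that starts lower with large slope, and a made-up decoder transfer function. The function walks the EXIT staircase until it reaches $I_A = 1$ or stops making progress.

```python
def exit_trajectory(demapper, decoder, max_iterations=30, tol=1e-6):
    """Walk the EXIT staircase and return the demapper a-priori MI
    values visited, starting from I_A = 0."""
    i_a = 0.0
    visited = [i_a]
    for _ in range(max_iterations):
        i_e = demapper(i_a)          # vertical step: demapper extrinsic MI
        next_i_a = decoder(i_e)      # horizontal step: decoder output fed back
        if next_i_a - i_a < tol:     # no progress: converged or stuck
            break
        i_a = next_i_a
        visited.append(i_a)
    return visited

def clip01(x):
    return min(1.0, max(0.0, x))

# Hypothetical toy curves (assumptions for illustration only).
def flat_demapper(i_a):
    return 0.55 + 0.05 * i_a         # high start, small slope

def steep_demapper(i_a):
    return 0.40 + 0.55 * i_a         # lower start, large slope

def toy_decoder(i_e):
    return clip01((i_e - 0.3) / 0.5) # useless below 0.3, perfect above 0.8

flat_path = exit_trajectory(flat_demapper, toy_decoder)
steep_path = exit_trajectory(steep_demapper, toy_decoder)
print("flat  demapper ends at I_A =", round(flat_path[-1], 3))
print("steep demapper ends at I_A =", round(steep_path[-1], 3))
```

With these toy shapes the flat curve takes a large first step and then gets pinned at a crossing with the decoder curve, while the steep curve starts behind but keeps an open tunnel all the way to $I_A = 1$.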

🔧Engineering Note

When to Use Which Labelling

The operational advice from two decades of BICM/BICM-ID deployment: use Gray by default. If the receiver has no iterative-demapper capability, Gray is uniquely optimal; if it has iteration but can only afford 1–2 passes (low-latency control channels, URLLC), Gray still wins because the first iteration is the most productive. Switch to SP (or a BICM-ID-optimised labelling like M16a) ONLY when (i) the operating SNR is within 1 dB of capacity, where closing the last 0.3 dB matters; (ii) the receiver can afford $\geq 5$ iterations; and (iii) the outer code is strong (LDPC or turbo, not short convolutional). DVB-S2X's very-low-SNR MODCODs meet all three conditions and use BICM-ID with an APSK-SP hybrid; 5G NR data channels meet (i) and (iii) but rarely (ii), so they stay with Gray.

Practical Constraints

• Default choice: Gray (optimal for 1-shot decoding and short iteration)
• Switch to SP only when SNR is near capacity AND iterations ≥ 5
• Short-block regimes (< 1000 bits): stay with Gray regardless
• APSK on satellite: use SP-like labelling for VL-SNR, Gray elsewhere
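The rules above can be written as a small decision function. This is a sketch of the note's rule of thumb only: the function name, argument names, and return strings are hypothetical, and the numeric cut-offs (1 dB from capacity, 5 iterations, 1000 bits) are taken directly from the text.

```python
def choose_labelling(gap_to_capacity_db, iterations, block_bits, strong_outer_code):
    """Rule-of-thumb labelling choice following the engineering note.

    gap_to_capacity_db: operating SNR minus capacity-limit SNR, in dB
    iterations:         demapper/decoder passes the receiver can afford
    block_bits:         code block length in bits
    strong_outer_code:  True for LDPC or turbo, False for short convolutional
    """
    if block_bits < 1000:
        return "Gray"                      # short blocks: stay with Gray
    near_capacity = gap_to_capacity_db <= 1.0
    enough_iterations = iterations >= 5
    if near_capacity and enough_iterations and strong_outer_code:
        return "SP or BICM-ID-optimised"   # all three conditions hold
    return "Gray"                          # default choice

# Example operating points (illustrative numbers).
print(choose_labelling(0.5, 8, 16200, True))    # near capacity, many passes
print(choose_labelling(3.0, 2, 8448, True))     # well above capacity, few passes
print(choose_labelling(0.5, 8, 512, True))      # short block
```

Encoding the rule this way makes the asymmetry explicit: Gray is the fall-through default, and the switch away from it requires every condition to hold at once.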

📋 Ref: DVB-S2X Annex E; 5G NR TS 38.212 §5.4.2 (BICM, no mandatory iteration)

Why SP's Endpoint Is Near 1, Not Exactly 1

Why does SP's $T_{\rm dem}(1, \mathrm{SNR})$ approach but not quite equal 1 at finite SNR? Because even with perfect a priori on the other bits, the target bit is still observed through AWGN: its MI is bounded by the BI-AWGN capacity at the conditioned sub-constellation's minimum distance. At very high SNR that capacity approaches 1 bit; at $E_b/N_0 = 5$ dB for 16-QAM-SP the conditioned sub-constellation is two points at distance $2 d_{\min}$ (level-1 SP), and the per-bit MI is $J(4\sqrt{2}\, d_{\min}/\sqrt{N_0}) \approx 0.99$. So the curve ends at $\approx 0.99$, not 1. The SHORTFALL from 1 is the remaining noise margin, and it determines the error floor of BICM-ID after the tunnel opens: a BER-per-iteration plot hits a plateau governed by this shortfall, not by the outer code's distance structure.

Common Mistake: Extending "Gray is Optimal" to Iterative Receivers

Mistake:

The Ch. 5–6 conclusion — Gray labelling is near-optimal for BICM — is so well-established in the literature that many designers apply it reflexively to BICM-ID systems as well, on the grounds that "Gray is always the safe choice." On an iterative receiver at high SNR, this reflex gives up 1.0–1.5 dB relative to SP.

Correction:

The Gray-optimality theorem of Ch. 5 is a ONE-SHOT result; it does not survive the feedback loop of BICM-ID. Under iteration, the relevant demapper metric is not the one-shot MI $T_{\rm dem}(0, \mathrm{SNR})$ but the shape of the entire curve, and Gray's flat curve is its iterative DOWNFALL. When designing for BICM-ID, always plot the EXIT chart with the candidate labellings before committing; when designing for one-shot BICM, Gray is still the right default. The labelling is NOT a receiver-agnostic decision: pair the labelling to the decoder.

Quick Check

Why does set-partition labelling outperform Gray under BICM-ID?

SP has a smaller minimum Euclidean distance.

SP has a steeper demapper EXIT curve — the demapper benefits more per bit of a priori information.

SP uses more constellation points.

SP gives better one-shot BER.

Answer:

SP has a steeper demapper EXIT curve — the demapper benefits more per bit of a priori information.

Correct. SP concentrates bit discrimination across levels (Ungerboeck chain rule: level- $\ell$ sub-constellation distance is $2^\ell d_{\min}$ ), so learning one bit via iteration UNLOCKS larger distances for the others. Gray's distances are flat across levels, so iteration unlocks nothing.

Historical Note: Zehavi Thought SP Was Obsolete. BICM-ID Brought It Back.

1992–2001

Ephraim Zehavi's 1992 paper on bit-interleaved 8-PSK established that Gray beats SP for one-shot decoding on fading channels — the observation that Caire, Taricco, and Biglieri formalised into BICM in 1998. By the mid-1990s "Gray is always right" had become orthodoxy in the coded-modulation community, and set partitioning was relegated to legacy TCM designs (V.32, V.34 modems) that were being replaced by BICM. When Li and Ritcey showed in 1997 that iterative decoding with SP labelling could recover 2+ dB on Rayleigh fading, the initial reception was sceptical: "SP is obsolete." Ten Brink's EXIT-chart analysis (2001) then supplied the QUANTITATIVE explanation — SP's steep EXIT curve — and the community accepted that labelling-optimality is receiver-dependent. The story is a clean instance of Caire's "design-criterion depends on what the decoder can do" motif: change the decoder, and the optimal modulator changes too. SP did not come back because it had been undervalued; it came back because its design target (maximise sub-constellation minimum distance) aligned with a new decoder architecture (iterative with a priori).

Key Takeaway

The optimal labelling depends on the decoder. For one-shot BICM, maximise $T_{\rm dem}(0, \mathrm{SNR})$: use Gray. For BICM-ID with many iterations, reshape the whole demapper EXIT curve: use set partition or a designed BICM-ID labelling (M16a family). The mechanism is that SP's Ungerboeck chain rule doubles sub-constellation minimum distance at each bit level, which translates into a steep demapper EXIT slope and a high endpoint $T_{\rm dem}(1, \mathrm{SNR}) \to 1$. Gray has flat levels and a flat curve. Choose the labelling that matches the receiver's iteration budget.

Why This Matters: BICM-ID Labelling in Practice: DVB-S2X vs 5G NR

A concrete illustration of the Gray-vs-SP tradeoff appears in the BICM architectures of two major modern standards. DVB-S2X's very-low-SNR modes (QPSK 2/9 down to QPSK 1/5) operate within 0.5 dB of capacity on rain-faded satellite links; these modes specify an SP-inspired APSK labelling and a receiver that performs 3–8 iterations between the APSK demapper and the LDPC decoder, saving 0.3–0.5 dB over one-shot Gray. 5G NR, by contrast, targets throughput rather than coverage and operates 2–4 dB above capacity; NR's modulation spec mandates Gray labelling on 64-QAM and 256-QAM, and iterative demapping is OPTIONAL (most commercial chipsets do 1–2 passes as a CRC-failure fallback). The labelling choice is an engineering reflection of the operating regime: near capacity, iterate with SP; well above capacity, Gray with a stronger code.

