Set Partitioning of Signal Constellations

Why Partition a Constellation?

Here is the point of this chapter in one sentence: at equal transmit power, we can buy 3 to 6 dB of coding gain over the uncoded signal by splitting each symbol decision into two steps β€” first pick a subset of the constellation using a convolutional code, then pick a point inside that subset using the remaining uncoded bits β€” provided the subsets are chosen so that the minimum distance inside each subset is much larger than the minimum distance of the parent constellation.

The procedure that achieves this is called set partitioning. It was introduced by Ungerboeck in 1982 and it revolutionized modem design overnight β€” the V.32 (9.6 kbps) and V.34 (33.6 kbps) dial-up modems were direct instantiations of his idea. Before attacking the machinery, let us look at the geometric picture: each split of a constellation doubles the minimum squared Euclidean distance inside the resulting subsets, and the coding gain is essentially the number of levels of splitting times 3 dB (a bit less, in practice, because the final convolutional code imposes its own structure).

The coded-modulation insight from Β§Why Separate Coding and Modulation Is Suboptimal said: do not code in the binary label space, code in the signal space. Set partitioning is the constructive answer β€” it tells us which binary labels to use so that a convolutional code acting on the subset index maximizes free Euclidean distance in the underlying signal space.

Definition:

Coded Modulation

A coded modulation scheme jointly designs channel coding and modulation so that the code lives in the signal space itself: each encoder output is a point in a constellation XβŠ‚C\mathcal{X} \subset \mathbb{C}, not a binary symbol in {0,1}\{0,1\}. The design objective is to maximize the minimum Euclidean distance between distinct code sequences, not the Hamming distance between distinct binary codewords.

The contrast is with the classical "coding then modulation" pipeline (a binary encoder whose output is passed through a separate constellation mapper), which is generally suboptimal at moderate-to-high spectral efficiencies β€” as we argued in Β§Why Separate Coding and Modulation Is Suboptimal.

,

Definition:

Set Partition (Ungerboeck Chain)

Let X\mathcal{X} be a signal constellation with ∣X∣=2m|\mathcal{X}| = 2^m points and minimum squared distance Ξ”02\Delta_0^2. A set partition (or Ungerboeck chain) is a sequence of successive binary splits

X=D0β€…β€Šβ†’β€…β€Š{D1(0),D1(1)}β€…β€Šβ†’β€…β€Š{D2(0),D2(1),D2(2),D2(3)}β€…β€Šβ†’β€…β€Šβ‹―β€…β€Šβ†’β€…β€Š{Dm(0),…,Dm(2mβˆ’1)},\mathcal{X} = D_0 \;\to\; \{D_1^{(0)}, D_1^{(1)}\} \;\to\; \{D_2^{(0)}, D_2^{(1)}, D_2^{(2)}, D_2^{(3)}\} \;\to\; \cdots \;\to\; \{D_m^{(0)}, \ldots, D_m^{(2^m-1)}\},

where at level ii the parent set Diβˆ’1D_{i-1} is split into two disjoint equal-size children, and Ξ”i2\Delta_i^2 denotes the minimum squared Euclidean distance within any child at level ii (the intra-subset minimum squared Euclidean distance, or MSED).

A set partition is Ungerboeck-optimal if at every level ii the split is chosen so that Ξ”i2\Delta_i^2 is as large as possible.

Equivalently, level ii labels a point by the first ii bits of a binary address; points sharing an ii-bit prefix lie in the same subset Di(k)D_i^{(k)}. At level mm every subset contains a single point and Ξ”m2=∞\Delta_m^2 = \infty by convention.

Definition:

Intra-Subset Minimum Squared Euclidean Distance

For a subset DβŠ†XD \subseteq \mathcal{X} with ∣D∣β‰₯2|D| \geq 2, the intra-subset MSED is

Ξ”2(D)β€…β€Šβ‰œβ€…β€Šmin⁑x,xβ€²βˆˆDxβ‰ xβ€²βˆ₯xβˆ’xβ€²βˆ₯2.\Delta^2(D) \;\triangleq\; \min_{\substack{x, x' \in D \\ x \neq x'}} \|x - x'\|^2.

For a full partition level ii, Ξ”i2\Delta_i^2 denotes the smallest Ξ”2(D)\Delta^2(D) taken over all subsets DD at that level.

Intra-subset MSED is the only constellation property that matters for Ungerboeck's design: the convolutional code only picks which subset to transmit from, so the relevant distance between any two competing TCM sequences is governed by the intra-subset MSEDs along the two paths.

Definition:

Coset (Subset) of a Partition

A coset (or subset) at level ii of an Ungerboeck partition is one of the 2i2^i disjoint sets Di(k)D_i^{(k)}, k=0,…,2iβˆ’1k = 0, \ldots, 2^i - 1, produced by ii successive binary splits. Each coset has ∣X∣/2i=2mβˆ’i|\mathcal{X}|/2^i = 2^{m-i} points and intra-subset MSED Ξ”i2\Delta_i^2.

The word "coset" is borrowed from lattice terminology: Forney's lattice-theoretic reinterpretation of TCM (Chapter 4) shows that an Ungerboeck partition is, for 2D QAM constellations, literally a chain of sublattice cosets of Z2\mathbb{Z}^2. For now, "coset" and "subset" are synonymous.

Definition:

Mapping by Set Partitioning (Ungerboeck Labeling)

A binary labeling bmβˆ’1bmβˆ’2β‹―b0↦x∈Xb_{m-1} b_{m-2} \cdots b_0 \mapsto x \in \mathcal{X} is a mapping by set partitioning if the bit bmβˆ’1b_{m-1} (highest-order) selects the first split, bmβˆ’2b_{m-2} selects the second split given the first, and so on β€” i.e., the prefix bmβˆ’1β‹―bmβˆ’ib_{m-1} \cdots b_{m-i} identifies the coset Di(k)D_i^{(k)} containing xx.

Under this labeling, flipping a low-order bit bjb_j moves the point to a different representative inside the same level-(j+1)(j+1) subset, so the resulting Euclidean displacement is β‰₯Ξ”j+1\geq \Delta_{j+1}.

Set Partition

A sequence of successive binary splits of a signal constellation in which each split is chosen to maximize the minimum Euclidean distance inside the resulting children. Introduced by Ungerboeck (1982), the set partition is the constellation half of a TCM design; the convolutional code is the sequencing half.

Related: Set Partition (Ungerboeck Chain), Intra-Subset Minimum Squared Euclidean Distance

Intra-Subset Minimum Distance

The minimum Euclidean distance between any two points in a given subset of a partition. For Ungerboeck's partition of 8-PSK: level 1 gives Ξ”12=2\Delta_1^2 = 2, level 2 gives Ξ”22=4\Delta_2^2 = 4 (diametrically opposite pairs), so the MSED doubles at each level.

Related: Intra-Subset Minimum Squared Euclidean Distance, Set Partition (Ungerboeck Chain)

Coset (in Set Partitioning)

A subset of a constellation that arises at some level of an Ungerboeck partition. For a partition of X\mathcal{X} with ∣X∣=2m|\mathcal{X}|=2^m into 2i2^i cosets at level ii, each coset has 2mβˆ’i2^{m-i} points. The terminology matches Forney's lattice-theoretic reinterpretation (Chapter 4) in which cosets of sublattices are the Ungerboeck subsets for QAM constellations.

Related: Coset (Subset) of a Partition, Set Partition (Ungerboeck Chain)

Theorem: MSED Doubling Along the 8-PSK Ungerboeck Partition

Consider the 8-PSK constellation X={ej2Ο€k/8:k=0,…,7}\mathcal{X} = \{e^{j 2\pi k/8} : k = 0, \ldots, 7\} (unit-energy, Es=1E_s = 1) with minimum squared distance Ξ”02=2(1βˆ’cos⁑(Ο€/4))=2βˆ’2\Delta_0^2 = 2(1 - \cos(\pi/4)) = 2 - \sqrt{2}. Partition X\mathcal{X} in three levels by successively taking every other point. Then

Ξ”02=2βˆ’2β‰ˆ0.586,Ξ”12=2,Ξ”22=4,\Delta_0^2 = 2 - \sqrt{2} \approx 0.586, \qquad \Delta_1^2 = 2, \qquad \Delta_2^2 = 4,

so Ξ”12/Ξ”02=2+2β‰ˆ3.414\Delta_1^2 / \Delta_0^2 = 2 + \sqrt{2} \approx 3.414 and Ξ”22/Ξ”12=2\Delta_2^2 / \Delta_1^2 = 2. Equivalently, at each level the intra-subset MSED at least doubles (in fact the first split more than triples it).

The 8-PSK points are the 8th roots of unity. Taking every other one gives a rotated QPSK (4 points on the unit circle, 90∘90^\circ apart, so Ξ”12=2\Delta_1^2 = 2). Taking every other point of the QPSK gives a BPSK (2 diametrically opposite points, Ξ”22=4\Delta_2^2 = 4). Each split discards the nearest neighbours and keeps the farthest ones.

,

Ungerboeck Set-Partition Tree for 8-PSK

Animated construction of the 2-4-8 partition tree. At each level the 8-PSK constellation is split into two equal halves, and the intra-subset minimum squared distance Ξ”i2\Delta_i^2 is displayed. Watch Ξ”i2\Delta_i^2 double (in fact more than double at the first step) as ii increases.
At each split, the nearest-neighbour pairs are discarded β€” the intra-subset MSED grows from Ξ”02β‰ˆ0.586\Delta_0^2 \approx 0.586 to Ξ”12=2\Delta_1^2 = 2 to Ξ”22=4\Delta_2^2 = 4.

Intra-Subset MSED at Each Partition Level

For each level ii of the Ungerboeck partition of a chosen constellation, plot the intra-subset minimum squared Euclidean distance Ξ”i2\Delta_i^2. Try 8-PSK, 16-QAM, and 16-PSK and observe the geometry: for 2D QAM the ratios Ξ”i+12/Ξ”i2=2\Delta_{i+1}^2/\Delta_i^2 = 2 hold essentially throughout, while for MM-PSK the first split is asymmetric because adjacent and antipodal pairs are geometrically different.

Parameters

Example: Ungerboeck Partition of 16-QAM

Let X\mathcal{X} be the standard 16-QAM constellation {a+jb:a,b∈{Β±1,Β±3}}\{a + j b : a, b \in \{\pm 1, \pm 3\}\} with minimum inter-point distance d0=2d_0 = 2 (so Ξ”02=4\Delta_0^2 = 4). Construct the four-level Ungerboeck partition and report Ξ”02,Ξ”12,Ξ”22,Ξ”32\Delta_0^2, \Delta_1^2, \Delta_2^2, \Delta_3^2.

,

Historical Note: Ungerboeck 1982 β€” The Paper That Changed Modem Design

1982–1994

Gottfried Ungerboeck (IBM ZΓΌrich, later Broadcom Fellow) published "Channel Coding with Multilevel/Phase Signals" in IEEE Transactions on Information Theory in January 1982. The paper was met with initial skepticism β€” the prevailing view was that coding gain on AWGN was essentially capped at the 2–3 dB obtainable from short binary block codes, and that any further gain required expensive bandwidth expansion.

Ungerboeck's insight was geometric rather than algebraic: instead of coding in the binary message space, code in the signal space by labeling constellation points via a partition tree. His central claim β€” 3 to 6 dB of coding gain without any bandwidth expansion (the same symbol rate, the same constellation size chosen to compensate for the rate penalty of the code) β€” was verified experimentally almost immediately. By 1984 the V.32 modem standard (9.6 kbps over a 3 kHz telephone line) had adopted an 8-state 32-CROSS TCM; by 1994 the V.34 modem (33.6 kbps) used a 4D 16-state TCM with shell-mapping shaping. The V.34 modem was, for a decade, the most sophisticated digital communications system in consumer use.

Two follow-up papers in IEEE Communications Magazine (1987) made the ideas accessible to a much broader audience and are the most-cited Ungerboeck works today.

, ,

Common Mistake: A Constellation Admits Many Set Partitions β€” Not All Are Optimal

Mistake:

Assuming that any binary labeling of a constellation yields an Ungerboeck-optimal partition.

Correction:

A partition is Ungerboeck-optimal only if at every level the split maximizes the intra-subset MSED. For 8-PSK, the parity-by-angle split is optimal; splitting by "top half vs. bottom half" is not β€” it yields Ξ”12=2βˆ’2\Delta_1^2 = 2 - \sqrt{2}, no better than the parent. For 16-QAM, the checkerboard (parity of a+ba+b) is optimal; splitting by aβ‰₯0a \geq 0 vs. a<0a < 0 is not. In practice, construct the partition by always discarding nearest neighbours at each level; this gives the unique Ungerboeck chain up to symmetry.

Quick Check

For the Ungerboeck partition of unit-energy 8-PSK, what is the ratio Ξ”22/Ξ”02\Delta_2^2 / \Delta_0^2?

22

44

4/(2βˆ’2)β‰ˆ6.834/(2 - \sqrt{2}) \approx 6.83

88

Quick Check

Which of the following binary labelings of QPSK is an Ungerboeck labeling (mapping by set partitioning) rather than a Gray labeling?

(00,01,11,10)↔(1,j,βˆ’1,βˆ’j)(00,01,11,10) \leftrightarrow (1, j, -1, -j)

(00,10,11,01)↔(1,j,βˆ’1,βˆ’j)(00,10,11,01) \leftrightarrow (1, j, -1, -j)

(00,11,01,10)↔(1,j,βˆ’1,βˆ’j)(00,11,01,10) \leftrightarrow (1, j, -1, -j)

(01,10,00,11)↔(1,j,βˆ’1,βˆ’j)(01,10,00,11) \leftrightarrow (1, j, -1, -j)

Key Takeaway

The partition doubles the MSED. Each level of an Ungerboeck partition at least doubles (and in the first 8-PSK level more than triples) the intra-subset minimum squared distance. A TCM scheme that uses a rate-m~/(m~+1)\tilde{m}/(\tilde{m}+1) convolutional code to select a level-(m~+1)(\tilde{m}+1) coset inherits this distance amplification: the free Euclidean distance of the code is at worst Ξ”m~+1\Delta_{\tilde{m}+1}, regardless of how bad the convolutional code is. The coding gain is bought entirely by the partition geometry, not by the code.