Pairwise Error Probability and the Coding-Gain Criterion
From Distance to Design Criterion
We have seen that the gap to capacity in the bandwidth-limited regime is mostly a coding deficit. The question is: how do we design the constellation points to recover it?
The answer, at high SNR, comes from the pairwise error probability: the probability that the receiver chooses a wrong codeword $\mathbf{x}'$ when the true one is $\mathbf{x}$. This probability depends only on the Euclidean distance $d = \|\mathbf{x} - \mathbf{x}'\|$ and the noise variance $\sigma^2$, not on the structure of the constellation. Summing over all wrong codewords via the union bound then translates a geometric property of the point set into an asymptotic error exponent. The design criterion that falls out is the starting point for Ungerboeck's TCM in Chapter 2 and for the space-time coding criteria in Chapter 10.
Definition: Pairwise Error Probability on AWGN
Let $\mathbf{x}, \mathbf{x}' \in \mathbb{R}^n$ be two candidate codewords and let $\mathbf{y} = \mathbf{x} + \mathbf{n}$ with $\mathbf{n} \sim \mathcal{N}(\mathbf{0}, \sigma^2 I_n)$. The pairwise error probability (PEP) is the probability that the maximum-likelihood receiver, given the choice between $\mathbf{x}$ and $\mathbf{x}'$, prefers $\mathbf{x}'$:
$$P(\mathbf{x} \to \mathbf{x}') = \Pr\{\|\mathbf{y} - \mathbf{x}'\| < \|\mathbf{y} - \mathbf{x}\| \mid \mathbf{x} \text{ sent}\},$$
where $\mathbf{e} = \mathbf{x}' - \mathbf{x}$ is the error vector and $d = \|\mathbf{e}\|$. Equivalently,
$$P(\mathbf{x} \to \mathbf{x}') = Q\!\left(\frac{d}{2\sigma}\right).$$
The PEP depends on $\mathbf{x}$ and $\mathbf{x}'$ only through $d = \|\mathbf{x} - \mathbf{x}'\|$. This is a distinctive feature of the AWGN channel: the direction of the error vector is irrelevant, and only its length matters. Over fading channels this is no longer true, and the PEP will acquire additional structure (see Chapter 10).
Theorem: Pairwise Error Probability on the AWGN Channel
For $\mathbf{x}, \mathbf{x}'$ at Euclidean distance $d = \|\mathbf{x} - \mathbf{x}'\|$ transmitted over the AWGN channel with noise variance $\sigma^2$ per real dimension, the ML pairwise error probability is
$$P(\mathbf{x} \to \mathbf{x}') = Q\!\left(\frac{d}{2\sigma}\right) \le \exp\!\left(-\frac{d^2}{8\sigma^2}\right),$$
where the upper bound uses the Chernoff inequality $Q(x) \le e^{-x^2/2}$ for $x \ge 0$.
Project the noise onto the unit vector $\mathbf{e}/d$: the component along $\mathbf{e}$ is zero-mean Gaussian with variance $\sigma^2$. The ML detector picks $\mathbf{x}'$ when this component exceeds $d/2$, a one-dimensional Gaussian tail event. This is why only $d$ enters the PEP: the orthogonal noise components cannot cause the pair error.
Project the received vector onto the direction $\mathbf{e}/d$ of the error vector.
Show that the ML decision between $\mathbf{x}$ and $\mathbf{x}'$ depends only on this projection.
Use the fact that the projection of a spherically symmetric Gaussian onto any unit vector is a scalar Gaussian of the same variance.
Define the decision statistic $T = \|\mathbf{y} - \mathbf{x}\|^2 - \|\mathbf{y} - \mathbf{x}'\|^2$.
The ML detector chooses $\mathbf{x}'$ over $\mathbf{x}$ iff $\|\mathbf{y} - \mathbf{x}'\|^2 < \|\mathbf{y} - \mathbf{x}\|^2$, i.e. iff $T > 0$. Expanding both sides:
$$\|\mathbf{y}\|^2 - 2\,\mathbf{y}^\top\mathbf{x}' + \|\mathbf{x}'\|^2 < \|\mathbf{y}\|^2 - 2\,\mathbf{y}^\top\mathbf{x} + \|\mathbf{x}\|^2,$$
which simplifies to $2\,\mathbf{y}^\top(\mathbf{x}' - \mathbf{x}) > \|\mathbf{x}'\|^2 - \|\mathbf{x}\|^2$, or $\mathbf{y}^\top\mathbf{e} > \tfrac{1}{2}\left(\|\mathbf{x}'\|^2 - \|\mathbf{x}\|^2\right)$.
Substitute $\mathbf{y} = \mathbf{x} + \mathbf{n}$
Plugging in $\mathbf{y} = \mathbf{x} + \mathbf{n}$ and using $\tfrac{1}{2}(\|\mathbf{x}'\|^2 - \|\mathbf{x}\|^2) - \mathbf{x}^\top\mathbf{e} = \tfrac{1}{2}\|\mathbf{x}' - \mathbf{x}\|^2 = \tfrac{d^2}{2}$ (after expanding $\|\mathbf{x}' - \mathbf{x}\|^2 = \|\mathbf{x}'\|^2 - 2\,\mathbf{x}^\top\mathbf{x}' + \|\mathbf{x}\|^2$), the condition becomes
$$\mathbf{n}^\top\mathbf{e} > \frac{d^2}{2},$$
i.e., $\mathbf{n}^\top(\mathbf{e}/d) > d/2$, or equivalently: the projection of the noise onto the unit vector $\mathbf{e}/d$ exceeds $d/2$.
Evaluate the Gaussian tail
The scalar $Z = \mathbf{n}^\top\mathbf{e}/d$ is a linear combination of independent $\mathcal{N}(0, \sigma^2)$ variables with unit-norm weight vector $\mathbf{e}/d$, hence is itself $\mathcal{N}(0, \sigma^2)$. Therefore
$$P(\mathbf{x} \to \mathbf{x}') = \Pr\{Z > d/2\} = Q\!\left(\frac{d}{2\sigma}\right).$$
Applying the Chernoff bound gives the stated exponential form.
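The projection argument can be checked numerically. The following Monte Carlo sketch (illustrative parameters $d = 3$, $\sigma = 1$, not from the text) estimates the ML pair-error rate directly and compares it with $Q(d/2\sigma)$ and the Chernoff bound:

```python
import numpy as np
from math import erfc, sqrt, exp

def qfunc(x):
    """Gaussian tail Q(x) = P(N(0,1) > x)."""
    return 0.5 * erfc(x / sqrt(2))

rng = np.random.default_rng(0)
n, sigma = 4, 1.0
x  = np.zeros(n)                      # transmitted codeword
xp = np.array([3.0, 0.0, 0.0, 0.0])   # competitor at distance d = 3
d  = np.linalg.norm(xp - x)

# Monte Carlo: ML prefers xp whenever ||y - xp|| < ||y - x||
trials = 200_000
y = x + rng.normal(0.0, sigma, size=(trials, n))
errors = np.sum(np.linalg.norm(y - xp, axis=1) < np.linalg.norm(y - x, axis=1))

pep_mc = errors / trials
pep_theory = qfunc(d / (2 * sigma))          # exact: Q(d/2σ) = Q(1.5) ≈ 0.067
pep_chernoff = exp(-d**2 / (8 * sigma**2))   # Chernoff upper bound
print(pep_mc, pep_theory, pep_chernoff)
```

The empirical rate matches $Q(1.5)$ and sits well below the (loose) Chernoff exponential, as the theorem predicts.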
Pairwise Geometry: Decision Boundary and Noise Circle
Two constellation points at Euclidean distance $d$ in $\mathbb{R}^2$, with a noise cloud of standard deviation $\sigma$ centered on the transmitted point. The decision boundary is the perpendicular bisector; a pair error occurs whenever the noise pushes $\mathbf{y}$ across it. As the distance grows, the boundary moves away and the tail probability shrinks exponentially.
Definition: Minimum Distance and Asymptotic Coding Gain
Let $\mathcal{C} \subset \mathbb{R}^n$ be a finite constellation of equiprobable signal vectors of average energy $E_s$ per 2D dimension. Its minimum Euclidean distance is
$$d_{\min}(\mathcal{C}) = \min_{\substack{\mathbf{x}, \mathbf{x}' \in \mathcal{C} \\ \mathbf{x} \ne \mathbf{x}'}} \|\mathbf{x} - \mathbf{x}'\|.$$
The normalized minimum distance is $d_{\min}^2/E_s$, which is invariant under scaling of the constellation. The asymptotic coding gain of $\mathcal{C}$ over a baseline constellation $\mathcal{C}_0$ of the same spectral efficiency is
$$\gamma_c = \frac{d_{\min}^2(\mathcal{C})/E_s(\mathcal{C})}{d_{\min}^2(\mathcal{C}_0)/E_s(\mathcal{C}_0)}.$$
The asymptotic coding gain depends on the constellation only through the ratio $d_{\min}^2/E_s$. Rescaling the constellation does not change it; this is a sensible invariance: if you double every point, you double the distances and quadruple the energy, leaving the ratio $d_{\min}^2/E_s$ alone.
Theorem: Union Bound on Error Probability
For a constellation $\mathcal{C}$ transmitted over AWGN with ML decoding, the codeword-error probability satisfies the union bound
$$P_e \le \frac{1}{|\mathcal{C}|} \sum_{\mathbf{x} \in \mathcal{C}} \sum_{\mathbf{x}' \ne \mathbf{x}} Q\!\left(\frac{\|\mathbf{x} - \mathbf{x}'\|}{2\sigma}\right).$$
At high SNR the sum is dominated by pairs at minimum distance. Let $N_{\min}$ be the average multiplicity of nearest neighbors (per codeword). Then
$$P_e \lesssim N_{\min}\, Q\!\left(\frac{d_{\min}}{2\sigma}\right),$$
which makes $d_{\min}$ the dominant design parameter at high SNR.
Every wrong codeword contributes a term in the union bound, but at high SNR the smallest distances dominate because $Q(d/2\sigma)$ is exponentially decreasing in $d$. The minimum-distance term enters with the largest value, weighted by the multiplicity $N_{\min}$ per codeword, and everything else is exponentially smaller. At low SNR the full distance spectrum matters, and the union bound can be quite loose, a recurring theme in code performance analysis.
For each transmitted codeword, bound the probability that the ML detector picks any incorrect codeword by the union bound over all incorrect choices.
Average over the uniform choice of transmitted codeword.
At high SNR, argue that the dominant terms are those with $\|\mathbf{x} - \mathbf{x}'\| = d_{\min}$ and identify $N_{\min}$.
Per-codeword union bound
Conditioned on $\mathbf{x}$ transmitted, a codeword error occurs iff the ML detector picks some $\mathbf{x}' \ne \mathbf{x}$. The union bound gives
$$P_e(\mathbf{x}) \le \sum_{\mathbf{x}' \ne \mathbf{x}} Q\!\left(\frac{\|\mathbf{x} - \mathbf{x}'\|}{2\sigma}\right).$$
Average over equiprobable codewords
Averaging over $\mathbf{x}$ uniform on $\mathcal{C}$ gives
$$P_e \le \frac{1}{|\mathcal{C}|} \sum_{\mathbf{x} \in \mathcal{C}} \sum_{\mathbf{x}' \ne \mathbf{x}} Q\!\left(\frac{\|\mathbf{x} - \mathbf{x}'\|}{2\sigma}\right).$$
High-SNR behavior
Split the double sum into contributions from pairs at distance $d_{\min}$ (call this count $N$, so that the per-codeword average is $N_{\min} = N/|\mathcal{C}|$) and pairs at larger distances. The ratio of a distance-$d$ term to the $d_{\min}$-term under the Chernoff bound is $\exp\!\left(-\tfrac{d^2 - d_{\min}^2}{8\sigma^2}\right)$, which vanishes exponentially as $\sigma \to 0$. Therefore at high SNR
$$P_e \lesssim N_{\min}\, Q\!\left(\frac{d_{\min}}{2\sigma}\right),$$
which is the stated asymptotic.
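The dominance of the minimum-distance term is easy to see numerically. The sketch below (an illustrative choice, not from the text: 16-QAM on the grid $\{-3,-1,1,3\}^2$ with $\sigma = 0.4$) evaluates the full union bound and the nearest-neighbor approximation $N_{\min} Q(d_{\min}/2\sigma)$:

```python
import itertools, math

def qfunc(x):
    """Gaussian tail Q(x) = P(N(0,1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2))

sigma = 0.4
pts = [(i, q) for i in (-3, -1, 1, 3) for q in (-3, -1, 1, 3)]  # 16-QAM
M = len(pts)

# Full union bound: average over codewords, sum over wrong codewords
union = sum(qfunc(math.dist(p, q) / (2 * sigma))
            for p in pts for q in pts if p != q) / M

# Dominant term: N_min * Q(d_min / 2σ)
dmin = min(math.dist(p, q) for p, q in itertools.combinations(pts, 2))
nmin = sum(1 for p in pts for q in pts
           if p != q and abs(math.dist(p, q) - dmin) < 1e-9) / M
approx = nmin * qfunc(dmin / (2 * sigma))

print(union, approx, nmin)  # N_min = 3 for 16-QAM
```

At this SNR the next distance class ($d = 2\sqrt{2}$) already contributes well under 1% of the bound, so the single-term approximation is essentially exact.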
Example: BPSK and QPSK Have the Same Asymptotic Coding Gain
Compute $d_{\min}^2/E$ for BPSK (points $\pm a$ on the real line) and QPSK (points $(\pm a, \pm a)$). Interpret the result.
BPSK
BPSK has two points at $\pm a$, so $d_{\min} = 2a$ and $E = a^2$. Hence $d_{\min}^2/E = 4$.
QPSK
QPSK has four points at $(\pm a, \pm a)$. The minimum distance is between adjacent points (e.g., $(a, a)$ and $(a, -a)$), which is $2a$, so $d_{\min}^2 = 4a^2$. The energy is $E = 2a^2$ (both coordinates squared and summed). Hence $d_{\min}^2/E = 2$.
Normalize by dimension
BPSK uses one real dimension per symbol with $d_{\min}^2/E = 4$. QPSK uses two real dimensions with $d_{\min}^2/E = 2$, which is $d_{\min}^2/(E/2) = 4$ per real dimension. Normalizing by dimension, both schemes have the same coding gain per real dimension, which is why the $P_b$-versus-$E_b/N_0$ curves of BPSK and QPSK coincide at high SNR. QPSK just packs two BPSK signals orthogonally onto one complex dimension; it is the same code geometry, doubled up.
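The per-dimension normalization can be verified in a few lines (a sketch with $a = 1$; the ratio is scale-invariant anyway):

```python
import numpy as np

bpsk = np.array([[1.0], [-1.0]])                                  # points ±a, a = 1
qpsk = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]], dtype=float)  # (±a, ±a)

def dmin2_per_dim_energy(pts):
    """d_min^2 divided by average energy per real dimension."""
    diffs = pts[:, None, :] - pts[None, :, :]
    d2 = np.sum(diffs**2, axis=-1)
    dmin2 = d2[d2 > 0].min()
    energy_per_dim = np.mean(np.sum(pts**2, axis=1)) / pts.shape[1]
    return dmin2 / energy_per_dim

fig_bpsk = dmin2_per_dim_energy(bpsk)
fig_qpsk = dmin2_per_dim_energy(qpsk)
print(fig_bpsk, fig_qpsk)  # both equal 4
```

Both figures come out to 4, confirming that the two schemes share the same geometry per real dimension.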
Brute-Force Minimum Distance Computation
Complexity: $O(M^2)$ for a constellation of $M$ points in $\mathbb{R}^n$. For small constellations this is fine. For larger constellations, structured codes (lattices, trellis codes) admit decomposition-based algorithms that bypass the quadratic cost.
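A minimal sketch of the quadratic pair scan, using 16-QAM as an illustrative test constellation (the function name and test points are ours, not from the text):

```python
import itertools, math

def min_distance(points):
    """Brute-force minimum Euclidean distance: O(M^2) scan over all pairs."""
    best = math.inf
    for p, q in itertools.combinations(points, 2):
        d = math.dist(p, q)
        if d < best:
            best = d
    return best

# 16-QAM on the grid {-3, -1, 1, 3}^2 has d_min = 2 (adjacent grid points)
qam16 = [(i, q) for i in (-3, -1, 1, 3) for q in (-3, -1, 1, 3)]
print(min_distance(qam16))  # 2.0
```

For a lattice code, by contrast, $d_{\min}$ equals the shortest nonzero lattice vector and needs no pair scan at all, which is one form of the decomposition shortcut mentioned above.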
Hamming Distance vs. Euclidean Distance
Binary coding is dominated by the Hamming distance of the code: the minimum number of coordinate positions in which two codewords differ. Coded modulation is dominated by the Euclidean distance of the transmitted constellation: the minimum distance between two transmitted signal vectors.
These are not the same thing. Two binary codewords differing in many positions may map to two QAM sequences that happen to be close in signal space, if the QAM labeling is poorly chosen. The Ungerboeck insight (Chapter 2) is precisely that the code must be designed to maximize the Euclidean $d_{\min}$ of the transmitted signal, not the Hamming distance of the binary label, and that this requires a coordinated choice of code and labeling.
Common Mistake: Union bound is loose at low SNR
Mistake:
Assuming that the union-bound expression $N_{\min}\, Q(d_{\min}/2\sigma)$ is accurate at all SNRs.
Correction:
The union bound is asymptotically tight as SNR $\to \infty$. At low SNR (where the error probability is large), the bound can exceed 1, and the full distance spectrum contributes meaningfully. Improved bounds (Gallager, Poltyrev, Divsalar) exist but are more intricate; in practice the first-order union-bound expression guides design but must be validated by simulation at the operating SNR of interest.
The Role of $d_{\min}$ in Modern Standards
Design tables for DVB-S2X, 5G NR LDPC codes, and Wi-Fi 6/7 explicitly report the minimum Euclidean distance between coded QAM sequences, not just the minimum Hamming distance of the binary LDPC code. For high-rate QAM constellations, the effective Euclidean distance of the code-plus-labeling combination is the quantity that sets the operating SNR. Gray labeling achieves close to the best effective $d_{\min}$ among binary labelings; set-partitioning labeling (Ungerboeck) does better when coupled with a matched code.
- 5G NR uses QPSK / 16-QAM / 64-QAM / 256-QAM with Gray labels in the base tables; 1024-QAM was added in Rel-17 for IAB
- DVB-S2X uses APSK with specially-designed labels that are approximately Gray on the amplitude bits and rotationally structured on the phase bits
- LDPC designers tune the degree distribution to match a given QAM order so as to maximize the effective $d_{\min}$ at the target error rate
Quick Check
If you double the minimum Euclidean distance of a constellation at fixed average energy, by roughly how many dB does the high-SNR error probability improve, asymptotically?
3 dB
6 dB
10 dB
It depends on the multiplicity $N_{\min}$.
Pairwise error probability is $Q(d_{\min}/2\sigma)$. Doubling $d_{\min}$ quadruples its square, which is a 6 dB improvement in the effective SNR. This is why every factor of $\sqrt{2}$ in minimum distance is called "3 dB of coding gain" and every factor of 2 is "6 dB."
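The dB arithmetic in one line: a factor-$g$ increase in $d_{\min}$ at fixed energy scales the effective SNR by $g^2$, i.e. a gain of $20\log_{10} g$ dB.

```python
import math

# Coding gain in dB for a factor-g increase in d_min at fixed average energy:
# gain_dB = 10 * log10(g^2) = 20 * log10(g)
gain_sqrt2 = 20 * math.log10(math.sqrt(2))  # factor sqrt(2) -> ~3.01 dB
gain_2     = 20 * math.log10(2.0)           # factor 2       -> ~6.02 dB
print(round(gain_sqrt2, 2), round(gain_2, 2))
```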
Key Takeaway
At high SNR, the minimum Euclidean distance $d_{\min}$ is the design parameter. The pairwise error probability depends only on $d = \|\mathbf{x} - \mathbf{x}'\|$, and the union bound shows that $P_e \lesssim N_{\min}\, Q(d_{\min}/2\sigma)$. The coding-gain design criterion on AWGN is therefore: maximize $d_{\min}^2/E_s$ at the target spectral efficiency. This is the criterion that Ungerboeck's TCM optimizes; it will generalize to a determinant-and-rank criterion on fading MIMO channels (Chapter 10).
Pairwise error probability (PEP)
The probability that the ML detector, given a binary choice between two candidate codewords, picks the wrong one under AWGN: $P(\mathbf{x} \to \mathbf{x}') = Q(\|\mathbf{e}\|/2\sigma)$, where $\mathbf{e} = \mathbf{x}' - \mathbf{x}$ is the error vector.
Related: Union Bound, Minimum Euclidean Distance, ML Detection
Minimum Euclidean distance
The smallest distance between any two distinct signal vectors in a finite constellation. Denoted $d_{\min}$; the normalized quantity $d_{\min}^2/E_s$ determines asymptotic coding gain.
Historical Note: Massey's 1974 Zürich Paper, The Seeds of Coded Modulation
At the 1974 Zürich Seminar on Digital Communications, James Massey gave a talk that proved seminal: he argued that the conventional separation of coding (over a binary channel) from modulation (over a Gaussian channel) was responsible for a large part of the gap to capacity in bandwidth-limited systems, and that a joint design of code and modulator could close much of that gap. The argument rested on the observation that the Hamming-distance criterion of classical coding theory was the wrong criterion when the downstream channel was AWGN with a higher-order constellation. This insight took eight years to become an engineering reality (Ungerboeck's TCM paper in 1982), but Massey's talk is the moment the coded-modulation program was first announced.
Why This Matters: On AWGN We Optimize $d_{\min}$; On Fading We Will Optimize Rank and Determinant
On AWGN, the PEP is $Q(d/2\sigma)$: one number, the minimum distance. On a Rayleigh fading MIMO channel, the PEP depends on the full singular-value spectrum of the error matrix $\mathbf{E} = \mathbf{X} - \mathbf{X}'$, and the analog of $d_{\min}$ is a pair of numbers: the rank of $\mathbf{E}$ (diversity order) and the product of the nonzero eigenvalues of $\mathbf{E}\mathbf{E}^H$ (coding gain). Chapter 10 will rederive the fading PEP from scratch; the structural parallel with what we have just done is worth keeping in mind.