BICM Pairwise Error Probability

From Capacity to Error Probability

Chapter 5 told us what rate a BICM transmitter can support — the BICM capacity $C_{\rm BICM}(\mu)$. The punchline there was that with a Gray labelling, $C_{\rm BICM}$ sits within a few tenths of a bit of the CM capacity $C_{\rm CM}$ over essentially every SNR of practical interest. Capacity, however, is only half of the design story. What the hardware engineer actually has to hit is a target bit-error rate — $10^{-5}$ for broadband data, $10^{-3}$ for voice, perhaps $10^{-9}$ for optical backhaul — at some specified SNR. Two BICM systems with the same capacity can reach those targets at wildly different SNRs, depending on how the labelling, the binary code, and the interleaver interact.

We now turn to the error-probability side of the Caire–Taricco–Biglieri analysis. The backbone of this analysis is the pairwise error probability (PEP): the probability that a maximum-likelihood decoder prefers a wrong codeword $\hat{\mathbf{c}}$ over the transmitted one $\mathbf{c}$ when those are the only two candidates. From the PEP we assemble union bounds on the codeword and bit error probabilities. The technique — Chernoff bounding applied to the log-likelihood metric — is the same one we used for TCM in Ch. 2. The novelty is that for BICM, the metric is a BIT metric, not a symbol metric, and that is what makes the labelling $\mu$ play a role through the average squared distance rather than the minimum distance.

Definition:

BICM Codeword Pair and Hamming Distance

Let $\mathbf{c} = (c_1, \ldots, c_K) \in \{0,1\}^K$ and $\hat{\mathbf{c}} = (\hat c_1, \ldots, \hat c_K) \in \{0,1\}^K$ be two distinct BICM codewords (at the output of the binary encoder, before mapping). Their Hamming distance is $$d_H(\mathbf{c}, \hat{\mathbf{c}}) \triangleq |\{i : c_i \ne \hat c_i\}|,$$ the number of bit positions at which they disagree. For a binary linear code, the distribution of $d_H$ over all nonzero codewords is captured by the weight enumerator $W(z) = \sum_{d \ge 1} W_d z^d$, where $W_d$ is the number of codewords of Hamming weight $d$. The smallest $d$ with $W_d > 0$ is the minimum Hamming distance $d_{H,\min}$ (or simply $d_H$ when the context is clear).
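To make the weight enumerator tangible, here is a minimal Python sketch that enumerates the codewords of the classical (7,4) Hamming code (chosen purely as a familiar illustration, not a code used elsewhere in this chapter) and tallies $W_d$:

```python
# A minimal sketch (illustrative code choice): enumerate all 2^4 codewords of
# the (7,4) Hamming code and tally W_d, the number of codewords of weight d.
import itertools
from collections import Counter

import numpy as np

# Generator matrix in systematic form [I | P]; H = [P^T | I] has all nonzero
# 3-bit columns, so this is indeed the (7,4) Hamming code.
G = np.array([[1, 0, 0, 0, 1, 1, 0],
              [0, 1, 0, 0, 1, 0, 1],
              [0, 0, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]], dtype=int)

weights = Counter()
for u in itertools.product([0, 1], repeat=4):   # all information words
    c = np.mod(np.array(u) @ G, 2)              # encode over GF(2)
    weights[int(c.sum())] += 1

print(dict(sorted(weights.items())))
# {0: 1, 3: 7, 4: 7, 7: 1}, i.e. W(z) = 1 + 7z^3 + 7z^4 + z^7 and d_H,min = 3.
```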

The interleaver $\pi$ permutes $\mathbf{c}$ before mapping, so the two codewords at the encoder output correspond to two constellation-symbol sequences at the mapper output. For the error-probability analysis we model the interleaver as placing the $d_H$ differing bit positions at $d_H$ DIFFERENT symbol times — the fully-interleaved assumption of s03.


Definition:

Pairwise Error Probability

Given a channel and a maximum-likelihood (ML) decoder, the pairwise error probability (PEP) between two codewords $\mathbf{c}$ and $\hat{\mathbf{c}}$ is $$P(\mathbf{c} \to \hat{\mathbf{c}}) \triangleq \Pr\!\left(\text{ML decoder prefers } \hat{\mathbf{c}} \text{ over } \mathbf{c} \mid \mathbf{c} \text{ sent}\right).$$ The codeword error probability $P_e$ is upper-bounded by the union over all competing codewords, $$P_e \le \sum_{\hat{\mathbf{c}} \ne \mathbf{c}} P(\mathbf{c} \to \hat{\mathbf{c}}),$$ and the bit error probability $P_b$ is upper-bounded by the same sum, weighted by the number of information-bit errors each pairwise error event causes.

The PEP is a two-codeword metric. At high SNR it dominates the error probability because only the nearest competitor matters; at low SNR the union bound over-counts and becomes loose (s04).


Definition:

BICM Bit Metric and Its Likelihood Ratio

For a BICM receiver, the ML decoder operates on the bit metric $$\lambda_\ell(y; b) \triangleq \log \sum_{s \in \mathcal{X}_\ell^{(b)}} p(y \mid s), \qquad b \in \{0,1\},$$ where $\mathcal{X}_\ell^{(b)}$ is the subset of constellation points whose $\ell$-th label bit equals $b$. The log-likelihood ratio for bit position $\ell$ is $\Lambda_\ell(y) = \lambda_\ell(y; 0) - \lambda_\ell(y; 1)$. A binary ML decoder on the interleaved coded stream then computes $\sum_i \Lambda_{\ell_i}(y_i)\,(1 - 2\hat c_i)$ for each candidate codeword $\hat{\mathbf{c}}$ and picks the maximiser.

In the high-SNR regime the "max-log" approximation $\lambda_\ell(y; b) \approx \max_{s \in \mathcal{X}_\ell^{(b)}} \log p(y \mid s)$ is common; it reduces the bit metric to a nearest-subset computation. All PEP bounds in this section hold for the exact bit metric; the max-log version introduces a small additional loss that is characterised in [?fabregas-martinez-caire-2008].
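As a concrete illustration, the sketch below computes the exact bit metric and its max-log approximation on AWGN for a toy Gray-labelled 4-PAM constellation; the constellation, labelling, received sample, and noise level are all assumptions made for the demo, not values fixed by the text.

```python
# A minimal sketch of the bit metric lambda_l(y; b) and the LLR Lambda_l(y) on
# AWGN, in exact (log-sum) and max-log form. The Gray-labelled 4-PAM
# constellation, N0, and the sample y are assumptions made for this demo.
import numpy as np
from scipy.special import logsumexp

points = np.array([-3, -1, 1, 3]) / np.sqrt(5.0)   # unit-energy 4-PAM
labels = [(0, 0), (0, 1), (1, 1), (1, 0)]          # Gray: neighbours differ in 1 bit

def bit_metric(y, ell, b, N0, exact=True):
    """lambda_l(y; b) = log sum_{s in X_l^(b)} p(y|s), up to a constant."""
    subset = points[[lab[ell] == b for lab in labels]]   # the subset X_l^(b)
    log_liks = -np.abs(y - subset) ** 2 / N0             # log p(y|s) + const
    return logsumexp(log_liks) if exact else log_liks.max()  # max-log if not exact

def llr(y, ell, N0, exact=True):
    """Lambda_l(y) = lambda_l(y; 0) - lambda_l(y; 1)."""
    return bit_metric(y, ell, 0, N0, exact) - bit_metric(y, ell, 1, N0, exact)

y, N0 = 0.35, 0.5
for ell in (0, 1):
    print(ell, llr(y, ell, N0), llr(y, ell, N0, exact=False))
```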


Definition:

Average Squared Intra-Subset Euclidean Distance $d^2_{\rm avg}(\mu, \ell)$

For a labelling $\mu : \{0,1\}^L \to \mathcal{X}$ and bit position $\ell \in \{0, \ldots, L-1\}$, define the average squared intra-subset Euclidean distance $$d^2_{\rm avg}(\mu, \ell) \triangleq \frac{1}{|\mathcal{X}_\ell^{(0)}|\,|\mathcal{X}_\ell^{(1)}|} \sum_{s \in \mathcal{X}_\ell^{(0)}} \sum_{\hat s \in \mathcal{X}_\ell^{(1)}} \|s - \hat s\|^2.$$ That is, $d^2_{\rm avg}(\mu, \ell)$ is the mean squared Euclidean distance over all pairs of constellation points that differ in bit position $\ell$. The position-averaged quantity is $$d^2_{\rm avg}(\mu) \triangleq \frac{1}{L} \sum_{\ell=0}^{L-1} d^2_{\rm avg}(\mu, \ell).$$ This quantity replaces the minimum squared distance $d_{\min}^2$ of uncoded modulation analysis in the BICM PEP exponent.

Notice that $d^2_{\rm avg}$ is an AVERAGE, not a minimum. This is the fundamental operational difference between BICM PEP analysis and uncoded symbol-error analysis, and it is exactly what makes Gray labelling attractive on AWGN: Gray spreads pairs of "bit-different" symbols uniformly over all Euclidean distances, so the average is dominated by close neighbours and does not explode.

Theorem: BICM Pairwise Error Probability on AWGN

Consider a BICM system with labelling $\mu$ and constellation $\mathcal{X}$ of size $M = 2^L$ with unit average energy per symbol, on an AWGN channel $y = s + w$ with $w \sim \mathcal{CN}(0, N_0)$. For two BICM codewords $\mathbf{c}, \hat{\mathbf{c}}$ at Hamming distance $d_H$, the PEP under ML decoding of the bit metric is bounded by $$P(\mathbf{c} \to \hat{\mathbf{c}}) \le Q\!\left(\sqrt{\tfrac{1}{2}\, d_H\, d^2_{\rm avg}(\mu)\, \tfrac{E_s}{N_0}}\right) \le \tfrac{1}{2} \exp\!\left(-\tfrac{d_H}{4}\, d^2_{\rm avg}(\mu)\, \tfrac{E_s}{N_0}\right).$$ Equivalently, the error exponent per unit $E_s/N_0$ grows linearly in $d_H$ with slope $d^2_{\rm avg}(\mu)/4$.

The point is that the binary code contributes its Hamming distance $d_H$ and the labelling/constellation contributes its average squared distance $d^2_{\rm avg}(\mu)$. These multiply in the exponent — exactly the same structure as the TCM free-distance bound in Ch. 2, where the trellis contributed its number of error events and the constellation contributed $d_{\min}^2$. The difference is only that for BICM the "per-event distance" averages over many label pairs rather than taking the minimum.
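In numbers, the following sketch evaluates both forms of the Thm. 1 bound and folds them into the union bound of the PEP definition; the value $d^2_{\rm avg} = 0.8$ and the weight enumerator are illustrative placeholders, not results derived in this section.

```python
# A minimal sketch: evaluate the Q-form and Chernoff-form PEP bounds of Thm. 1
# and fold them into the union bound P_e <= sum_d W_d * PEP(d). The value
# d2_avg = 0.8 and the weight enumerator are illustrative placeholders.
import numpy as np
from scipy.stats import norm

def pep_q(d_H, d2_avg, es_n0):
    """Q-form bound: Q(sqrt(d_H * d2_avg * (Es/N0) / 2))."""
    return norm.sf(np.sqrt(0.5 * d_H * d2_avg * es_n0))

def pep_chernoff(d_H, d2_avg, es_n0):
    """Chernoff form: (1/2) exp(-(d_H / 4) * d2_avg * (Es/N0))."""
    return 0.5 * np.exp(-0.25 * d_H * d2_avg * es_n0)

def union_bound(W, d2_avg, es_n0):
    """P_e <= sum_d W_d * PEP(d), with W = {d: W_d}."""
    return sum(W_d * pep_q(d, d2_avg, es_n0) for d, W_d in W.items())

es_n0 = 10 ** (8.0 / 10)        # Es/N0 = 8 dB
W = {3: 7, 4: 7, 7: 1}          # e.g. the (7,4) Hamming enumerator above
print(pep_q(3, 0.8, es_n0), pep_chernoff(3, 0.8, es_n0))
print(union_bound(W, 0.8, es_n0))
```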


Key Takeaway

The BICM AWGN PEP factorises into code and constellation contributions. The Hamming distance $d_H$ of the binary code and the average squared distance $d^2_{\rm avg}(\mu)$ of the labelled constellation enter multiplicatively in the exponent. Pick a code with large $d_H$ and a labelling with large $d^2_{\rm avg}$ — AND these can be designed independently, which is the whole point of the BICM architecture.

BICM Pairwise Error Probability on AWGN

Exact $Q$-form PEP from Thm. 1 as a function of $E_s/N_0$ (in dB), for two Gray-labelled QAM sizes and Hamming distances $d_H \in \{2, \ldots, 20\}$. The slope at high SNR is the same for all $d_H$ (one-sided AWGN exponent); the shift is $10\log_{10}(d_H) + 10\log_{10}(d^2_{\rm avg}(\mu))$ dB. Use the plot to read off the SNR required for $10^{-5}$ PEP — this is your design SNR for the nearest competitor.

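The caption's "read off the design SNR" step is easy to script. A minimal sketch, root-finding on the $Q$-form bound of Thm. 1 for a target PEP; again the $d^2_{\rm avg}$ value is an illustrative placeholder:

```python
# A minimal sketch of the "read off the design SNR" step: root-find on Es/N0
# (in dB) until the Q-form PEP bound hits a target such as 1e-5. The value
# d2_avg = 0.8 is again an illustrative placeholder.
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def pep_at_db(snr_db, d_H, d2_avg):
    return norm.sf(np.sqrt(0.5 * d_H * d2_avg * 10 ** (snr_db / 10)))

def design_snr_db(d_H, d2_avg, target=1e-5):
    """Es/N0 (dB) at which the Thm. 1 Q-form bound equals the target PEP."""
    return brentq(lambda s: pep_at_db(s, d_H, d2_avg) - target, -10.0, 40.0)

for d_H in (2, 4, 8, 16):
    print(d_H, round(design_snr_db(d_H, d2_avg=0.8), 2), "dB")
# Each doubling of d_H lowers the required SNR by ~3 dB = 10 log10(2),
# matching the horizontal shift described in the caption.
```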

Pairwise Error: Euclidean Distance Depends on Labelling

A short animation showing two BICM codewords differing in $d_H$ bit positions. On the left, the Gray-labelled mapper produces two constellation sequences whose per-symbol squared Euclidean distance averages to $d^2_{\rm avg}(\mu_G) \approx d_{\min}^2$. On the right, the set-partition (Ungerboeck) mapper produces sequences whose per-symbol distance averages to a LARGER value at certain bit levels. The accumulated squared distance — plotted as the growing gap between the two trajectories — is what Thm. 1 turns into an error-probability bound.
Upper panel: bit sequences $\mathbf{c}$ and $\hat{\mathbf{c}}$ with $d_H = 6$. Lower panel: the corresponding $\mathbf{s}, \hat{\mathbf{s}}$ trajectories in the signal constellation. The running sum $\sum_i \|s_i - \hat s_i\|^2$ governs the Gaussian PEP.

Example: Computing $d^2_{\rm avg}(\mu_G)$ for Gray-Labelled 16-QAM

Compute $d^2_{\rm avg}(\mu_G, \ell)$ for each bit position $\ell \in \{0, 1, 2, 3\}$ of a unit-average-energy 16-QAM constellation with Gray labelling, and the overall $d^2_{\rm avg}(\mu_G)$. Compare with $d_{\min}^2$.
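A worked sketch of this computation, assuming the standard product Gray map built from two Gray-labelled 4-PAM axes (one of several equivalent Gray labellings of 16-QAM):

```python
# A worked sketch for this example: build unit-energy 16-QAM as two
# Gray-labelled 4-PAM axes (the specific Gray order below is an assumed,
# standard choice) and apply the definition of d2_avg(mu, ell) directly.
import itertools
import numpy as np

pam = np.array([-3, -1, 1, 3]) / np.sqrt(10.0)        # per-axis levels, Es = 1
gray = {0: (0, 0), 1: (0, 1), 2: (1, 1), 3: (1, 0)}   # level index -> 2 Gray bits

points, labels = [], []
for i, q in itertools.product(range(4), repeat=2):
    points.append(pam[i] + 1j * pam[q])               # I + jQ constellation point
    labels.append(gray[i] + gray[q])                  # 4-bit label (b0 b1 b2 b3)
points = np.array(points)

def d2_avg(ell):
    """Mean ||s - s_hat||^2 over all cross pairs s in X_l^(0), s_hat in X_l^(1)."""
    s0 = points[[lab[ell] == 0 for lab in labels]]
    s1 = points[[lab[ell] == 1 for lab in labels]]
    return np.mean(np.abs(s0[:, None] - s1[None, :]) ** 2)

per_bit = [d2_avg(ell) for ell in range(4)]
print(per_bit, np.mean(per_bit))
# For this Gray map: 2.8, 2.0, 2.8, 2.0 per position, so d2_avg(mu_G) = 2.4,
# against d_min^2 = (2 / sqrt(10))^2 = 0.4.
```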

Common Mistake: $d^2_{\rm avg}$ Is an AVERAGE, Not the Minimum

Mistake:

A common mistake when reading the BICM PEP bound for the first time is to conclude, by analogy with uncoded QAM, that the exponent is governed by $d_{\min}^2$ and that labelling only matters insofar as it affects $d_{\min}$ of the underlying constellation.

Correction:

The BICM exponent involves $d^2_{\rm avg}(\mu)$, not $d_{\min}^2$. Two labellings on the same constellation (same $d_{\min}$) can produce dramatically different $d^2_{\rm avg}$. For 16-QAM, the natural binary labelling has $d^2_{\rm avg}$ smaller than Gray's, because bit positions with close neighbours get more weight. On AWGN, this difference can be 0.5–1 dB. On fading (s03), a completely different story unfolds — there, the MIN distance over label pairs matters via $L_{\min}(\mu)$.

Pattern: This Is the Same Argument as TCM PEP

The proof of Thm. 1 — Chernoff bound on a sum of log-likelihoods, followed by Bhattacharyya averaging — is the same pattern we used in Ch. 2 to bound the TCM pairwise error along a trellis error event. The only change is the decoding metric: in TCM we use the symbol metric $-\|y - s\|^2$; in BICM we use the bit metric $\lambda_\ell(y; b)$. Because the bit metric averages over a subset of symbols rather than pointing to a single one, the Bhattacharyya factor becomes a subset-wise average, and the constellation contribution to the exponent becomes $d^2_{\rm avg}$ rather than $d_{\min}^2$.

This is a useful principle to internalise: every coded-modulation PEP bound has the same skeleton — Chernoff, Bhattacharyya, sum over error-event positions. What changes between schemes is only which distance metric lives inside the Bhattacharyya factor.

Quick Check

The BICM AWGN PEP bound of Thm. 1 scales, for large $E_s/N_0$, as $\exp(-\alpha E_s/N_0)$ with exponent $\alpha$ equal to:

$d_H\, d_{\min}^2 / 4$

$d_H\, d^2_{\rm avg}(\mu) / 4$

$L_{\min}(\mu)\, d^2_{\rm avg}(\mu) / 4$

$d_{H,{\rm free}}\, d_{\min}^2 / 2$

Pairwise Error Probability (PEP)

The probability that a maximum-likelihood decoder prefers a specific wrong codeword $\hat{\mathbf{c}}$ over the transmitted one $\mathbf{c}$, given that only those two are candidates. On AWGN the PEP is a $Q$-function of the Euclidean distance between the codewords; for BICM, the Euclidean distance is replaced by the bit-metric Bhattacharyya factor.

Related: Bhattacharyya Factor, Chernoff Bound, BICM Codeword Pair and Hamming Distance

Bhattacharyya Factor

For a binary-input channel with transition densities $p(y \mid 0)$ and $p(y \mid 1)$, the Bhattacharyya factor is $\beta = \int \sqrt{p(y \mid 0)\, p(y \mid 1)}\, dy \in [0, 1]$. Its negative logarithm is the Chernoff-optimal error exponent for hypothesis testing between equiprobable binary inputs. For a BICM bit channel, $\beta_\ell$ depends on the labelling $\mu$ and the noise level.

Related: Chernoff Bound, Pattern: This Is the Same Argument as TCM PEP, Bit Metric
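To connect this glossary entry back to the PEP machinery, here is a Monte Carlo sketch estimating the per-position Bhattacharyya factor $\beta_\ell$ of a BICM bit channel on AWGN, using the identity $\beta = \mathbb{E}_{y \sim p(y \mid 0)}\big[\sqrt{p(y \mid 1)/p(y \mid 0)}\big]$; the Gray-labelled 4-PAM constellation and the value of $N_0$ are illustrative assumptions.

```python
# A Monte Carlo sketch of the per-position Bhattacharyya factor beta_l for a
# BICM bit channel on AWGN, via beta = E_{y~p(y|0)}[ sqrt(p(y|1)/p(y|0)) ].
# Gray-labelled 4-PAM and N0 = 0.5 are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
points = np.array([-3, -1, 1, 3]) / np.sqrt(5.0)   # unit-energy 4-PAM
labels = [(0, 0), (0, 1), (1, 1), (1, 0)]          # Gray labels
N0 = 0.5

def subset_density(y, ell, b):
    """p_l(y|b): uniform Gaussian mixture over X_l^(b), up to a common constant."""
    subset = points[[lab[ell] == b for lab in labels]]
    return np.mean(np.exp(-np.abs(y[:, None] - subset) ** 2 / N0), axis=1)

def beta(ell, n=200_000):
    s = rng.choice(points[[lab[ell] == 0 for lab in labels]], size=n)  # bit = 0 sent
    y = s + rng.normal(scale=np.sqrt(N0 / 2), size=n)                  # real AWGN
    return np.mean(np.sqrt(subset_density(y, ell, 1) / subset_density(y, ell, 0)))

print([round(beta(ell), 4) for ell in (0, 1)])   # beta_l in [0, 1]; smaller is better
```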