BICM Pairwise Error Probability

From Capacity to Error Probability

Chapter 5 told us what rate a BICM transmitter can support — the BICM capacity $C_{\rm BICM}(\mu)$. The punchline there was that with a Gray labelling, $C_{\rm BICM}$ sits within a few tenths of a bit of the CM capacity $C_{\rm CM}$ over essentially every SNR of practical interest. Capacity, however, is only half of the design story. What the hardware engineer actually has to hit is a target bit-error rate — $10^{-5}$ for broadband data, $10^{-3}$ for voice, perhaps $10^{-9}$ for optical backhaul — at some specified SNR. Two BICM systems with the same capacity can reach those targets at wildly different SNRs, depending on how the labelling, the binary code, and the interleaver interact.

We now turn to the error-probability side of the Caire–Taricco–Biglieri analysis. The backbone of this analysis is the pairwise error probability (PEP): the probability that a maximum-likelihood decoder prefers a wrong codeword $\hat{\mathbf{c}}$ over the transmitted one $\mathbf{c}$ when those are the only two candidates. From the PEP we assemble union bounds on the codeword and bit error probabilities. The technique — Chernoff bounding applied to the log-likelihood metric — is the same one we used for TCM in Ch. 2. The novelty is that for BICM, the metric is a BIT metric, not a symbol metric, and that is what makes the labelling $\mu$ play a role through the average squared distance rather than the minimum distance.

Definition:

BICM Codeword Pair and Hamming Distance

Let $\mathbf{c} = (c_1, \ldots, c_K) \in \{0,1\}^K$ and $\hat{\mathbf{c}} = (\hat c_1, \ldots, \hat c_K) \in \{0,1\}^K$ be two distinct BICM codewords (at the output of the binary encoder, before mapping). Their Hamming distance is $$d_H(\mathbf{c}, \hat{\mathbf{c}}) \triangleq |\{i : c_i \ne \hat c_i\}|,$$ the number of bit positions at which they disagree. For a binary linear code, the distribution of $d_H$ over all nonzero codewords is captured by the weight enumerator $W(z) = \sum_{d \ge 1} W_d z^d$, where $W_d$ is the number of codewords of Hamming weight $d$. The smallest $d$ with $W_d > 0$ is the minimum Hamming distance $d_{H,\min}$ (or simply $d_H$ when the context is clear).
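To make the weight enumerator tangible, here is a minimal Python sketch that enumerates the codewords of the classical (7,4) Hamming code (chosen purely as a familiar illustration, not a code used elsewhere in this chapter) and tallies $W_d$:

```python
# A minimal sketch (illustrative code choice): enumerate all 2^4 codewords of
# the (7,4) Hamming code and tally W_d, the number of codewords of weight d.
import itertools
from collections import Counter

import numpy as np

# Generator matrix in systematic form [I | P]; H = [P^T | I] has all nonzero
# 3-bit columns, so this is indeed the (7,4) Hamming code.
G = np.array([[1, 0, 0, 0, 1, 1, 0],
              [0, 1, 0, 0, 1, 0, 1],
              [0, 0, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]], dtype=int)

weights = Counter()
for u in itertools.product([0, 1], repeat=4):   # all information words
    c = np.mod(np.array(u) @ G, 2)              # encode over GF(2)
    weights[int(c.sum())] += 1

print(dict(sorted(weights.items())))
# {0: 1, 3: 7, 4: 7, 7: 1}, i.e. W(z) = 1 + 7z^3 + 7z^4 + z^7 and d_H,min = 3.
```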

The interleaver $\pi$ permutes $\mathbf{c}$ before mapping, so the two codewords at the encoder output correspond to two constellation-symbol sequences at the mapper output. For the error-probability analysis we model the interleaver as placing the $d_H$ differing bit positions at $d_H$ DIFFERENT symbol times — the fully-interleaved assumption of s03.


Definition:

Pairwise Error Probability

Given a channel and a maximum-likelihood (ML) decoder, the pairwise error probability (PEP) between two codewords $\mathbf{c}$ and $\hat{\mathbf{c}}$ is $$P(\mathbf{c} \to \hat{\mathbf{c}}) \triangleq \Pr\!\left(\text{ML decoder prefers } \hat{\mathbf{c}} \text{ over } \mathbf{c} \mid \mathbf{c} \text{ sent}\right).$$ The codeword error probability $P_e$ is upper-bounded by the union over all competing codewords, $$P_e \le \sum_{\hat{\mathbf{c}} \ne \mathbf{c}} P(\mathbf{c} \to \hat{\mathbf{c}}),$$ and the bit error probability $P_b$ is upper-bounded by the same sum, weighted by the number of information-bit errors each pairwise error event causes.

The PEP is a two-codeword metric. At high SNR it dominates the error probability because only the nearest competitor matters; at low SNR the union bound over-counts and becomes loose (s04).


Definition:

BICM Bit Metric and Its Likelihood Ratio

For a BICM receiver, the ML decoder operates on the bit metric $$\lambda_\ell(y; b) \triangleq \log \sum_{s \in \mathcal{X}_\ell^{(b)}} p(y \mid s), \qquad b \in \{0,1\},$$ where $\mathcal{X}_\ell^{(b)}$ is the subset of constellation points whose $\ell$-th label bit equals $b$. The log-likelihood ratio for bit position $\ell$ is $\Lambda_\ell(y) = \lambda_\ell(y; 0) - \lambda_\ell(y; 1)$. A binary ML decoder on the interleaved coded stream then computes $\sum_i \Lambda_{\ell_i}(y_i)\,(1 - 2\hat c_i)$ for each candidate codeword $\hat{\mathbf{c}}$ and picks the maximiser.

In the high-SNR regime the "max-log" approximation $\lambda_\ell(y; b) \approx \max_{s \in \mathcal{X}_\ell^{(b)}} \log p(y \mid s)$ is common; it reduces the bit metric to a nearest-subset computation. All PEP bounds in this section hold for the exact bit metric; the max-log version introduces a small additional loss that is characterised in [?fabregas-martinez-caire-2008].
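As a concrete illustration, the sketch below computes the exact bit metric and its max-log approximation on AWGN for a toy Gray-labelled 4-PAM constellation; the constellation, labelling, received sample, and noise level are all assumptions made for the demo, not values fixed by the text.

```python
# A minimal sketch of the bit metric lambda_l(y; b) and the LLR Lambda_l(y) on
# AWGN, in exact (log-sum) and max-log form. The Gray-labelled 4-PAM
# constellation, N0, and the sample y are assumptions made for this demo.
import numpy as np
from scipy.special import logsumexp

points = np.array([-3, -1, 1, 3]) / np.sqrt(5.0)   # unit-energy 4-PAM
labels = [(0, 0), (0, 1), (1, 1), (1, 0)]          # Gray: neighbours differ in 1 bit

def bit_metric(y, ell, b, N0, exact=True):
    """lambda_l(y; b) = log sum_{s in X_l^(b)} p(y|s), up to a constant."""
    subset = points[[lab[ell] == b for lab in labels]]   # the subset X_l^(b)
    log_liks = -np.abs(y - subset) ** 2 / N0             # log p(y|s) + const
    return logsumexp(log_liks) if exact else log_liks.max()  # max-log if not exact

def llr(y, ell, N0, exact=True):
    """Lambda_l(y) = lambda_l(y; 0) - lambda_l(y; 1)."""
    return bit_metric(y, ell, 0, N0, exact) - bit_metric(y, ell, 1, N0, exact)

y, N0 = 0.35, 0.5
for ell in (0, 1):
    print(ell, llr(y, ell, N0), llr(y, ell, N0, exact=False))
```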


Definition:

Average Squared Intra-Subset Euclidean Distance $d^2_{\rm avg}(\mu, \ell)$

For a labelling $\mu : \{0,1\}^L \to \mathcal{X}$ and bit position $\ell \in \{0, \ldots, L-1\}$, define the average squared intra-subset Euclidean distance $$d^2_{\rm avg}(\mu, \ell) \triangleq \frac{1}{|\mathcal{X}_\ell^{(0)}|\,|\mathcal{X}_\ell^{(1)}|} \sum_{s \in \mathcal{X}_\ell^{(0)}} \sum_{\hat s \in \mathcal{X}_\ell^{(1)}} \|s - \hat s\|^2.$$ That is, $d^2_{\rm avg}(\mu, \ell)$ is the mean squared Euclidean distance over all pairs of constellation points that differ in bit position $\ell$. The position-averaged quantity is $$d^2_{\rm avg}(\mu) \triangleq \frac{1}{L} \sum_{\ell=0}^{L-1} d^2_{\rm avg}(\mu, \ell).$$ This quantity replaces the minimum squared distance $d_{\min}^2$ of uncoded modulation analysis in the BICM PEP exponent.

Notice that $d^2_{\rm avg}$ is an AVERAGE, not a minimum. This is the fundamental operational difference between BICM PEP analysis and uncoded symbol-error analysis, and it is exactly what makes Gray labelling attractive on AWGN: Gray spreads pairs of "bit-different" symbols uniformly over all Euclidean distances, so the average is dominated by close neighbours and does not explode.

Theorem: BICM Pairwise Error Probability on AWGN

Consider a BICM system with labelling $\mu$ and constellation $\mathcal{X}$ of size $M = 2^L$ with unit average energy per symbol, on an AWGN channel $y = s + w$ with $w \sim \mathcal{CN}(0, N_0)$. For two BICM codewords $\mathbf{c}, \hat{\mathbf{c}}$ at Hamming distance $d_H$, the PEP under ML decoding of the bit metric is bounded by $$P(\mathbf{c} \to \hat{\mathbf{c}}) \le Q\!\left(\sqrt{\tfrac{1}{2}\, d_H\, d^2_{\rm avg}(\mu)\, \tfrac{E_s}{N_0}}\right) \le \tfrac{1}{2} \exp\!\left(-\tfrac{d_H}{4}\, d^2_{\rm avg}(\mu)\, \tfrac{E_s}{N_0}\right).$$ Equivalently, the error exponent per unit $E_s/N_0$ grows linearly in $d_H$ with slope $d^2_{\rm avg}(\mu)/4$.

The point is that the binary code contributes its Hamming distance $d_H$ and the labelling/constellation contributes its average squared distance $d^2_{\rm avg}(\mu)$. These multiply in the exponent — exactly the same structure as the TCM free-distance bound in Ch. 2, where the trellis contributed its number of error events and the constellation contributed $d_{\min}^2$. The difference is only that for BICM the "per-event distance" averages over many label pairs rather than taking the minimum.
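In numbers, the following sketch evaluates both forms of the Thm. 1 bound and folds them into the union bound of the PEP definition; the value $d^2_{\rm avg} = 0.8$ and the weight enumerator are illustrative placeholders, not results derived in this section.

```python
# A minimal sketch: evaluate the Q-form and Chernoff-form PEP bounds of Thm. 1
# and fold them into the union bound P_e <= sum_d W_d * PEP(d). The value
# d2_avg = 0.8 and the weight enumerator are illustrative placeholders.
import numpy as np
from scipy.stats import norm

def pep_q(d_H, d2_avg, es_n0):
    """Q-form bound: Q(sqrt(d_H * d2_avg * (Es/N0) / 2))."""
    return norm.sf(np.sqrt(0.5 * d_H * d2_avg * es_n0))

def pep_chernoff(d_H, d2_avg, es_n0):
    """Chernoff form: (1/2) exp(-(d_H / 4) * d2_avg * (Es/N0))."""
    return 0.5 * np.exp(-0.25 * d_H * d2_avg * es_n0)

def union_bound(W, d2_avg, es_n0):
    """P_e <= sum_d W_d * PEP(d), with W = {d: W_d}."""
    return sum(W_d * pep_q(d, d2_avg, es_n0) for d, W_d in W.items())

es_n0 = 10 ** (8.0 / 10)        # Es/N0 = 8 dB
W = {3: 7, 4: 7, 7: 1}          # e.g. the (7,4) Hamming enumerator above
print(pep_q(3, 0.8, es_n0), pep_chernoff(3, 0.8, es_n0))
print(union_bound(W, 0.8, es_n0))
```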


Key Takeaway

The BICM AWGN PEP factorises into code and constellation contributions. The Hamming distance $d_H$ of the binary code and the average squared distance $d^2_{\rm avg}(\mu)$ of the labelled constellation enter multiplicatively in the exponent. Pick a code with large $d_H$ and a labelling with large $d^2_{\rm avg}$ — AND these can be designed independently, which is the whole point of the BICM architecture.

BICM Pairwise Error Probability on AWGN

Exact $Q$-form PEP from Thm. 1 as a function of $E_s/N_0$ (in dB), for two Gray-labelled QAM sizes and Hamming distances $d_H \in \{2, \ldots, 20\}$. The slope at high SNR is the same for all $d_H$ (one-sided AWGN exponent); the shift is $10\log_{10}(d_H) + 10\log_{10}(d^2_{\rm avg}(\mu))$ dB. Use the plot to read off the SNR required for $10^{-5}$ PEP — this is your design SNR for the nearest competitor.

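The caption's "read off the design SNR" step is easy to script. A minimal sketch, root-finding on the $Q$-form bound of Thm. 1 for a target PEP; again the $d^2_{\rm avg}$ value is an illustrative placeholder:

```python
# A minimal sketch of the "read off the design SNR" step: root-find on Es/N0
# (in dB) until the Q-form PEP bound hits a target such as 1e-5. The value
# d2_avg = 0.8 is again an illustrative placeholder.
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def pep_at_db(snr_db, d_H, d2_avg):
    return norm.sf(np.sqrt(0.5 * d_H * d2_avg * 10 ** (snr_db / 10)))

def design_snr_db(d_H, d2_avg, target=1e-5):
    """Es/N0 (dB) at which the Thm. 1 Q-form bound equals the target PEP."""
    return brentq(lambda s: pep_at_db(s, d_H, d2_avg) - target, -10.0, 40.0)

for d_H in (2, 4, 8, 16):
    print(d_H, round(design_snr_db(d_H, d2_avg=0.8), 2), "dB")
# Each doubling of d_H lowers the required SNR by ~3 dB = 10 log10(2),
# matching the horizontal shift described in the caption.
```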

Pairwise Error: Euclidean Distance Depends on Labelling

A short animation showing two BICM codewords differing in $d_H$ bit positions. On the left, the Gray-labelled mapper produces two constellation sequences whose per-symbol squared Euclidean distance averages to $d^2_{\rm avg}(\mu_G) \approx d_{\min}^2$. On the right, the set-partition (Ungerboeck) mapper produces sequences whose per-symbol distance averages to a LARGER value at certain bit levels. The accumulated squared distance — plotted as the growing gap between the two trajectories — is what Thm. 1 turns into an error-probability bound.
Upper panel: bit sequences $\mathbf{c}$ and $\hat{\mathbf{c}}$ with $d_H = 6$. Lower panel: the corresponding $\mathbf{s}, \hat{\mathbf{s}}$ trajectories in the signal constellation. The running sum $\sum_i \|s_i - \hat s_i\|^2$ governs the Gaussian PEP.

Example: Computing $d^2_{\rm avg}(\mu_G)$ for Gray-Labelled 16-QAM

Compute $d^2_{\rm avg}(\mu_G, \ell)$ for each bit position $\ell \in \{0, 1, 2, 3\}$ of a unit-average-energy 16-QAM constellation with Gray labelling, and the overall $d^2_{\rm avg}(\mu_G)$. Compare with $d_{\min}^2$.
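A worked sketch of this computation, assuming the standard product Gray map built from two Gray-labelled 4-PAM axes (one of several equivalent Gray labellings of 16-QAM):

```python
# A worked sketch for this example: build unit-energy 16-QAM as two
# Gray-labelled 4-PAM axes (the specific Gray order below is an assumed,
# standard choice) and apply the definition of d2_avg(mu, ell) directly.
import itertools
import numpy as np

pam = np.array([-3, -1, 1, 3]) / np.sqrt(10.0)        # per-axis levels, Es = 1
gray = {0: (0, 0), 1: (0, 1), 2: (1, 1), 3: (1, 0)}   # level index -> 2 Gray bits

points, labels = [], []
for i, q in itertools.product(range(4), repeat=2):
    points.append(pam[i] + 1j * pam[q])               # I + jQ constellation point
    labels.append(gray[i] + gray[q])                  # 4-bit label (b0 b1 b2 b3)
points = np.array(points)

def d2_avg(ell):
    """Mean ||s - s_hat||^2 over all cross pairs s in X_l^(0), s_hat in X_l^(1)."""
    s0 = points[[lab[ell] == 0 for lab in labels]]
    s1 = points[[lab[ell] == 1 for lab in labels]]
    return np.mean(np.abs(s0[:, None] - s1[None, :]) ** 2)

per_bit = [d2_avg(ell) for ell in range(4)]
print(per_bit, np.mean(per_bit))
# For this Gray map: 2.8, 2.0, 2.8, 2.0 per position, so d2_avg(mu_G) = 2.4,
# against d_min^2 = (2 / sqrt(10))^2 = 0.4.
```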

Common Mistake: $d^2_{\rm avg}$ Is an AVERAGE, Not the Minimum

Mistake:

A common mistake when reading the BICM PEP bound for the first time is to conclude, by analogy with uncoded QAM, that the exponent is governed by $d_{\min}^2$ and that labelling only matters insofar as it affects $d_{\min}$ of the underlying constellation.

Correction:

The BICM exponent involves $d^2_{\rm avg}(\mu)$, not $d_{\min}^2$. Two labellings on the same constellation (same $d_{\min}$) can produce dramatically different $d^2_{\rm avg}$. For 16-QAM, the natural binary labelling has $d^2_{\rm avg}$ smaller than Gray's, because bit positions with close neighbours get more weight. On AWGN, this difference can be 0.5–1 dB. On fading (s03), a completely different story unfolds — there, the MIN distance over label pairs matters via $L_{\min}(\mu)$.

Pattern: This Is the Same Argument as TCM PEP

The proof of Thm. 1 — Chernoff bound on a sum of log-likelihoods, followed by Bhattacharyya averaging — is the same pattern we used in Ch. 2 to bound the TCM pairwise error along a trellis error event. The only change is the decoding metric: in TCM we use the symbol metric $-\|y - s\|^2$; in BICM we use the bit metric $\lambda_\ell(y; b)$. Because the bit metric averages over a subset of symbols rather than pointing to a single one, the Bhattacharyya factor becomes a subset-wise average, and the constellation contribution to the exponent becomes $d^2_{\rm avg}$ rather than $d_{\min}^2$.

This is a useful principle to internalise: every coded-modulation PEP bound has the same skeleton — Chernoff, Bhattacharyya, sum over error-event positions. What changes between schemes is only which distance metric lives inside the Bhattacharyya factor.

Quick Check

The BICM AWGN PEP bound of Thm. 1 scales, for large $E_s/N_0$, as $\exp(-\alpha E_s/N_0)$ with exponent $\alpha$ equal to:

$d_H\, d_{\min}^2 / 4$

$d_H\, d^2_{\rm avg}(\mu) / 4$

$L_{\min}(\mu)\, d^2_{\rm avg}(\mu) / 4$

$d_{H,{\rm free}}\, d_{\min}^2 / 2$

Pairwise Error Probability (PEP)

The probability that a maximum-likelihood decoder prefers a specific wrong codeword $\hat{\mathbf{c}}$ over the transmitted one $\mathbf{c}$, given that only those two are candidates. On AWGN the PEP is a $Q$-function of the Euclidean distance between the codewords; for BICM, the Euclidean distance is replaced by the bit-metric Bhattacharyya factor.

Related: Bhattacharyya Factor, Chernoff Bound, BICM Codeword Pair and Hamming Distance

Bhattacharyya Factor

For a binary-input channel with transition densities $p(y \mid 0)$ and $p(y \mid 1)$, the Bhattacharyya factor is $\beta = \int \sqrt{p(y \mid 0)\, p(y \mid 1)}\, dy \in [0, 1]$. Its negative logarithm is the Chernoff-optimal error exponent for hypothesis testing between equiprobable binary inputs. For a BICM bit channel, $\beta_\ell$ depends on the labelling $\mu$ and the noise level.

Related: Chernoff Bound, Pattern: This Is the Same Argument as TCM PEP, Bit Metric
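To connect this glossary entry back to the PEP machinery, here is a Monte Carlo sketch estimating the per-position Bhattacharyya factor $\beta_\ell$ of a BICM bit channel on AWGN, using the identity $\beta = \mathbb{E}_{y \sim p(y \mid 0)}\big[\sqrt{p(y \mid 1)/p(y \mid 0)}\big]$; the Gray-labelled 4-PAM constellation and the value of $N_0$ are illustrative assumptions.

```python
# A Monte Carlo sketch of the per-position Bhattacharyya factor beta_l for a
# BICM bit channel on AWGN, via beta = E_{y~p(y|0)}[ sqrt(p(y|1)/p(y|0)) ].
# Gray-labelled 4-PAM and N0 = 0.5 are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
points = np.array([-3, -1, 1, 3]) / np.sqrt(5.0)   # unit-energy 4-PAM
labels = [(0, 0), (0, 1), (1, 1), (1, 0)]          # Gray labels
N0 = 0.5

def subset_density(y, ell, b):
    """p_l(y|b): uniform Gaussian mixture over X_l^(b), up to a common constant."""
    subset = points[[lab[ell] == b for lab in labels]]
    return np.mean(np.exp(-np.abs(y[:, None] - subset) ** 2 / N0), axis=1)

def beta(ell, n=200_000):
    s = rng.choice(points[[lab[ell] == 0 for lab in labels]], size=n)  # bit = 0 sent
    y = s + rng.normal(scale=np.sqrt(N0 / 2), size=n)                  # real AWGN
    return np.mean(np.sqrt(subset_density(y, ell, 1) / subset_density(y, ell, 0)))

print([round(beta(ell), 4) for ell in (0, 1)])   # beta_l in [0, 1]; smaller is better
```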