Ferkans — Interactive Telecom Tutor

ex-ch05-01

Easy

For 16-QAM with Gray labelling (two PAM Gray codes, one per axis), verify that every pair of nearest-neighbour constellation points (at distance $\Delta$ ) differs in exactly one label bit. How many nearest-neighbour pairs are there total?

Hint

Draw the $4 \times 4$ grid and label each point with a 4-bit label $(b_0^I, b_1^I, b_0^Q, b_1^Q)$, using the 4-PAM Gray code $(-3, -1, +1, +3) \to (00, 01, 11, 10)$ on each axis.

List the $I$ -direction neighbours and the $Q$ -direction neighbours.

There are $12$ horizontal nearest-neighbour pairs and $12$ vertical ones, $24$ in total.

Solution

Label the grid

The 4-PAM Gray code is $(-3, -1, +1, +3) \to (00, 01, 11, 10)$. A 16-QAM point at $(x, y)$ gets the 4-bit label $(\text{Gray}(x), \text{Gray}(y))$; concatenating the two sublabels gives $(b_0^I, b_1^I, b_0^Q, b_1^Q)$.

I-direction neighbours

Horizontal neighbours differ only in their 2-bit $I$ sublabel. The PAM Gray transitions are $(00 \leftrightarrow 01)$ , $(01 \leftrightarrow 11)$ , $(11 \leftrightarrow 10)$ — each differing in exactly one of $b_0^I, b_1^I$ . The $Q$ sublabel is the same on both ends. So each pair differs in exactly one bit. With 4 rows and 3 horizontal pairs per row, there are $4 \cdot 3 = 12$ horizontal pairs.

Q-direction neighbours

By symmetry, $12$ vertical pairs, each also at Hamming distance $1$ .

Total

$24$ nearest-neighbour pairs, all at Hamming distance exactly one. This is the Gray property. $\blacksquare$
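The check can be automated. The sketch below (the names `PAM_GRAY` and `nearest_neighbour_pairs` are illustrative) builds the 16-QAM labels from the per-axis Gray code, enumerates the right and upper neighbour of every point, and reports the pair count and the set of Hamming distances:

```python
import itertools

# 4-PAM Gray code from the text: (-3, -1, +1, +3) -> (00, 01, 11, 10)
PAM_GRAY = {-3: (0, 0), -1: (0, 1), 1: (1, 1), 3: (1, 0)}
LEVELS = sorted(PAM_GRAY)

def label(x, y):
    """4-bit label (b0^I, b1^I, b0^Q, b1^Q) of the 16-QAM point (x, y)."""
    return PAM_GRAY[x] + PAM_GRAY[y]

def hamming(a, b):
    """Number of positions in which two equal-length bit tuples differ."""
    return sum(u != v for u, v in zip(a, b))

def nearest_neighbour_pairs():
    """Unordered pairs of grid points at distance Delta = 2 (right and upper neighbours)."""
    pairs = []
    for x, y in itertools.product(LEVELS, LEVELS):
        if x + 2 in PAM_GRAY:
            pairs.append(((x, y), (x + 2, y)))   # horizontal (I-direction) pair
        if y + 2 in PAM_GRAY:
            pairs.append(((x, y), (x, y + 2)))   # vertical (Q-direction) pair
    return pairs

pairs = nearest_neighbour_pairs()
distances = [hamming(label(*p), label(*q)) for p, q in pairs]
print(len(pairs), set(distances))   # 24 {1}
```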

ex-ch05-02

Easy

State the BICM capacity formula and explain, in one sentence each, why the formula (a) is an upper bound on achievable rate under any product-form demapping metric, and (b) is achieved under exact marginal demapping.

Hint

Use the result of Thm. "The BICM Capacity Decomposition".

Solution

The formula

$C_{\rm BICM}(\mu) = \sum_{\ell = 0}^{L-1} I(Y; B_\ell)$ .

Upper bound under product-form decoding

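Any product-form decoding metric is mismatched relative to the true symbol likelihood $p(y \mid x)$. In the mismatched-decoding framework of Merhav-Kaplan-Lapidoth-Shamai, the rate achievable with such a metric is its generalised mutual information (GMI), and for product metrics the GMI is at most the sum of the bit-wise mutual informations, with equality for the matched bit metric $q_\ell(y, b) = p_{W_\ell}(y \mid b)$ (see ex-ch05-09).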

Achievability under exact marginal demapping

With exact marginal likelihoods $p_{W_\ell}(y \mid b)$ and an ideal interleaver, the decoder sees a memoryless binary channel that mixes the $L$ bit channels, with average mutual information $\tfrac{1}{L} \sum_\ell I(Y; B_\ell)$ per coded bit. Shannon's channel-coding theorem guarantees reliable communication at any coded-bit rate below this value, and multiplying the coded-bit rate by $L$ gives the information rate in bits per symbol.

ex-ch05-03

Medium

Prove that for any labelling $\mu$ and any channel $p(y \mid x)$ with uniform inputs, $\sum_{\ell = 0}^{L-1} I(Y; B_\ell) \;\le\; I(Y; X),$ with equality if and only if the label bits are jointly independent given $Y$ .

Hint

Apply the chain rule to $I(Y; X) = I(Y; B_0, B_1, \ldots, B_{L-1})$ .

Express the difference as a sum of conditional mutual informations.

Use the a priori independence of the label bits ( $I(B_\ell; B_{<\ell}) = 0$ ) to rewrite each difference term.

Solution

Chain rule

$I(Y; X) = I(Y; B_0, \ldots, B_{L-1}) = \sum_{\ell} I(Y; B_\ell \mid B_0, \ldots, B_{\ell-1})$ .

Subtract

The difference is $I(Y; X) - \sum_\ell I(Y; B_\ell) = \sum_{\ell \ge 1} [I(Y; B_\ell \mid B_{<\ell}) - I(Y; B_\ell)]$. (The $\ell = 0$ term cancels, since $B_{<0}$ is empty.)

Rewrite each term

Using the identity $I(Y; B_\ell \mid B_{<\ell}) - I(Y; B_\ell) = I(B_\ell; B_{<\ell} \mid Y) - I(B_\ell; B_{<\ell})$ and the prior independence $I(B_\ell; B_{<\ell}) = 0$, the difference equals $\sum_{\ell \ge 1} I(B_\ell; B_{<\ell} \mid Y) \ge 0$.

Equality condition

Each term $I(B_\ell; B_{<\ell} \mid Y) = 0$ iff $B_\ell$ and $B_{<\ell}$ are conditionally independent given $Y$ ; all terms vanish iff $B_0, \ldots, B_{L-1}$ are jointly conditionally independent given $Y$ . $\blacksquare$
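The identity can also be checked numerically. The sketch below assumes a randomly drawn discrete memoryless channel instead of AWGN, uniform inputs, and the natural-binary labelling with $b_0$ as the most significant bit (any bijective labelling gives i.i.d. uniform bits). It computes $I(Y; X) - \sum_\ell I(Y; B_\ell)$ and $\sum_{\ell \ge 1} I(B_\ell; B_{<\ell} \mid Y)$ exactly from the joint pmf; the two should agree up to rounding:

```python
import numpy as np

def entropy(p):
    """Entropy in bits of a pmf given as an array of any shape."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def gap_check(W, L):
    """For a DMC W[x, y] with 2**L uniform inputs (natural binary labels, b_0 = MSB),
    return (I(Y;X) - sum_l I(Y;B_l), sum_{l>=1} I(B_l; B_<l | Y))."""
    M, ny = W.shape
    assert M == 2 ** L
    P = W / M                                   # joint p(x, y) with uniform X
    H_Y = entropy(P.sum(axis=0))
    I_YX = entropy(P.sum(axis=1)) + H_Y - entropy(P)
    xs = np.arange(M)
    sum_I_YB, sum_cmi = 0.0, 0.0
    for l in range(L):
        b = (xs >> (L - 1 - l)) & 1             # bit l of each label
        prefix = xs >> (L - l)                  # integer value of (B_0, ..., B_{l-1})
        P_acy = np.zeros((2, 2 ** l, ny))       # joint p(b_l, b_<l, y)
        np.add.at(P_acy, (b, prefix), P)
        P_ay = P_acy.sum(axis=1)                # joint p(b_l, y)
        sum_I_YB += entropy(P_ay.sum(axis=1)) + H_Y - entropy(P_ay)
        if l >= 1:
            # I(A; C | Y) = H(A, Y) + H(C, Y) - H(A, C, Y) - H(Y)
            sum_cmi += entropy(P_ay) + entropy(P_acy.sum(axis=0)) - entropy(P_acy) - H_Y
    return I_YX - sum_I_YB, sum_cmi

rng = np.random.default_rng(0)
W = rng.dirichlet(np.ones(6), size=8)   # random channel: 8 inputs (L = 3), 6 outputs
gap, cmi_sum = gap_check(W, L=3)
print(f"gap = {gap:.6f}, sum of conditional MIs = {cmi_sum:.6f}")
```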

ex-ch05-04

Medium

Compute the BICM bit-channel transition $p_{W_\ell}(y \mid b)$ for 4-PAM with Gray labelling $(-3, -1, +1, +3) \to (00, 01, 11, 10)$ under AWGN with noise variance $\sigma^2$ . Do this explicitly for the first bit $(\ell = 0)$ and the second bit $(\ell = 1)$ .

Hint

For $\ell = 0$ , $\mathcal{X}_0^{(0)} = \{-3, -1\}$ and $\mathcal{X}_0^{(1)} = \{+1, +3\}$ .

For $\ell = 1$ , $\mathcal{X}_1^{(0)} = \{-3, +3\}$ and $\mathcal{X}_1^{(1)} = \{-1, +1\}$ .

The Gaussian density is $\phi(y - x) = \frac{1}{\sqrt{2\pi \sigma^2}} \exp(-\tfrac{1}{2\sigma^2}(y-x)^2)$ .

Solution

Bit 0 (MSB of Gray code)

$\mathcal{X}_0^{(0)} = \{-3, -1\}$ (labels $00, 01$) and $\mathcal{X}_0^{(1)} = \{+1, +3\}$ (labels $11, 10$). Therefore $p_{W_0}(y \mid 0) = \tfrac12 [\phi(y+3) + \phi(y+1)],\qquad p_{W_0}(y \mid 1) = \tfrac12 [\phi(y-1) + \phi(y-3)].$ Each conditional density is an equal-weight mixture of the two points on one side of zero. The channel is output-symmetric, $p_{W_0}(y \mid 0) = p_{W_0}(-y \mid 1)$, and its decision boundary is $y = 0$.

Bit 1 (LSB of Gray code)

$\mathcal{X}_1^{(0)} = \{-3, +3\}$ (labels $00, 10$) and $\mathcal{X}_1^{(1)} = \{-1, +1\}$ (labels $01, 11$). Therefore $p_{W_1}(y \mid 0) = \tfrac12 [\phi(y+3) + \phi(y-3)],\qquad p_{W_1}(y \mid 1) = \tfrac12 [\phi(y+1) + \phi(y-1)].$ Each conditional density is again a two-component Gaussian mixture, but now the two components sit on opposite sides of zero. The decision boundaries are at $y \approx \pm 2$ (exactly $\pm 2$ under the nearest-point, i.e. max-log, rule).

Remark

The first bit is the easier one: all points carrying $b_0 = 0$ lie on one side of $y = 0$, so the bit channel behaves much like a binary antipodal channel. The second bit is harder: the points carrying $b_1 = 0$ (the outer pair) and $b_1 = 1$ (the inner pair) are interleaved along the line, so the $b_1$ decision needs two boundaries ($b_1 = 1$ for $|y| < 2$, $b_1 = 0$ otherwise). This mirrors the numerical table of Thm. "The BICM Capacity Decomposition".
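A short numerical sketch of these transitions follows. It assumes an arbitrary noise variance $\sigma^2 = 0.5$ (the helper names are illustrative), evaluates $p_{W_\ell}(y \mid b)$ and the exact bit LLR, and checks the output symmetry of $W_0$ and the sign pattern of the $b_1$ LLR:

```python
import numpy as np

SIGMA2 = 0.5   # assumed noise variance sigma^2; the checks below hold for any sigma^2 > 0

def phi(u):
    """Gaussian density with variance SIGMA2, evaluated at u."""
    return np.exp(-np.square(u) / (2 * SIGMA2)) / np.sqrt(2 * np.pi * SIGMA2)

# X_l^(b) for 4-PAM Gray labelling (-3, -1, +1, +3) -> (00, 01, 11, 10)
SUBSETS = {
    0: {0: [-3, -1], 1: [1, 3]},
    1: {0: [-3, 3], 1: [-1, 1]},
}

def p_bit(y, l, b):
    """Bit-channel transition p_{W_l}(y | b): equal-weight mixture over X_l^(b)."""
    return np.mean([phi(y - x) for x in SUBSETS[l][b]], axis=0)

def llr(y, l):
    """Exact bit LLR log p_{W_l}(y | 0) - log p_{W_l}(y | 1); positive favours b = 0."""
    return np.log(p_bit(y, l, 0)) - np.log(p_bit(y, l, 1))

y = np.linspace(-5.0, 5.0, 101)
print(np.allclose(p_bit(y, 0, 0), p_bit(-y, 0, 1)))   # True: W_0 is output-symmetric
print(llr(0.0, 1) < 0 < llr(4.0, 1))                   # True: inner pair near 0, outer pair far out
```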

ex-ch05-05

Medium

For 4-PAM under SP labelling, define $\mathcal{X}_0^{(0)} = \{-3, -1\}$ and $\mathcal{X}_0^{(1)} = \{+1, +3\}$ (same as Gray), but $\mathcal{X}_1^{(0)} = \{-3, +1\}$ and $\mathcal{X}_1^{(1)} = \{-1, +3\}$ (the two cosets with spacing $4$, offset from each other by $2$). Write the bit-channel transitions and argue why the $\ell = 1$ channel has lower unconditional capacity than its Gray counterpart.

Hint

Compare which two points are averaged together for bit $\ell = 1$ in each case.

Gray groups outer pair / inner pair; SP groups $(-3, +1)$ / $(-1, +3)$ — points at distance $4$ .

Solution

SP transitions for bit 1

$p_{W_1}^{\rm SP}(y \mid 0) = \tfrac12 [\phi(y+3) + \phi(y-1)]$ and $p_{W_1}^{\rm SP}(y \mid 1) = \tfrac12 [\phi(y+1) + \phi(y-3)]$ .

Why SP is worse here

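Under SP, each $b_1$-subset consists of two points at distance $4$, so both mixtures are bimodal, and $p_{W_1}^{\rm SP}(y \mid 1) = p_{W_1}^{\rm SP}(y - 2 \mid 0)$: the two conditional densities are shifts of each other by only $2$. Their modes alternate along the line ($-3, -1, +1, +3$ carry $b_1 = 0, 1, 0, 1$), so the bit decision needs three boundaries ($y = -2, 0, +2$) instead of Gray's two, and the number of nearest-neighbour pairs with opposite $b_1$ rises from $2$ to $3$. Consequently $I(Y; B_1^{\rm SP})$ is significantly less than $I(Y; B_1^{\rm Gray})$ (see ex-ch05-11 for numbers).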

The geometric intuition

SP labelling encodes bit $b_1$ in a long-range structure (same-bit points are $4$ apart), so $b_1$ alternates along the line and the two unconditional bit-channel densities are interleaved bimodal mixtures that are hard to tell apart. Gray also groups non-adjacent points (outer pair vs inner pair) but needs only two decision boundaries, giving a cleaner unconditional binary channel. An MLC/MSD decoder that conditions on $b_0$ reduces every $b_1$ decision to a choice between two single points and never sees the bimodal structure, whereas BICM, which decodes each bit unconditionally, pays the bimodal price.
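The boundary count is easy to confirm. This sketch evaluates the max-log LLR of $b_1$ as a squared-distance difference (the LLR up to the positive factor $1/(2\sigma^2)$) on a fine grid of $y$, and counts its sign changes for the Gray and SP subsets defined in this exercise:

```python
import numpy as np

POINTS = np.array([-3, -1, 1, 3])
# Bit 1 of each point, in the order of POINTS
BIT1_GRAY = np.array([0, 1, 1, 0])   # (00, 01, 11, 10): outer vs inner
BIT1_SP = np.array([0, 1, 0, 1])     # X_1^(0) = {-3, +1}, X_1^(1) = {-1, +3}

def maxlog_llr(y, bits):
    """Max-log LLR of bit 1 up to the factor 1/(2 sigma^2); positive favours b = 0."""
    d2 = (y[:, None] - POINTS[None, :]) ** 2
    d0 = np.where(bits == 0, d2, np.inf).min(axis=1)   # nearest point with b = 0
    d1 = np.where(bits == 1, d2, np.inf).min(axis=1)   # nearest point with b = 1
    return d1 - d0

def count_boundaries(bits):
    """Number of sign changes of the max-log LLR over y in [-6, 6]."""
    y = np.linspace(-6, 6, 12001)
    s = np.sign(maxlog_llr(y, bits))
    s = s[s != 0]   # drop grid points that fall exactly on a boundary
    return int(np.count_nonzero(np.diff(s)))

print(count_boundaries(BIT1_GRAY), count_boundaries(BIT1_SP))   # 2 3
```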

ex-ch05-06

Medium

Use the interactive plot "CM vs BICM Capacity for QAM on AWGN", or direct numerical integration, to verify that for 16-QAM at $\text{SNR} = 10$ dB, $C_{\rm CM} - C_{\rm BICM, Gray} < 0.05$ bits. Then find the SNR at which the gap is maximised and report its value.

Hint

The maximum gap typically occurs in the mid-SNR 'waterfall' region, not at very low or very high SNR.

For 16-QAM, the maximum gap occurs near $\text{SNR} = 4$ dB and is just under $0.05$ bits.

Solution

At $\text{SNR} = 10$ dB

The simulation reports $C_{\rm CM} \approx 3.218$ and $C_{\rm BICM, Gray} \approx 3.176$ bits/symbol, giving a gap of $0.042$ bits, well below the 0.05-bit ceiling.

Find the max gap

Sweeping the plot from $\text{SNR} = -5$ to $+25$ dB, the gap $C_{\rm CM} - C_{\rm BICM, Gray}$ peaks at approximately $\text{SNR} = 4$ dB with a value of $\approx 0.047$ bits/symbol. At higher SNR the gap decays exponentially (Thm. "Gray Labelling Near-Optimality on AWGN"); at lower SNR both capacities go to zero and the gap shrinks.

Implication

The uniform bound of Thm. "Gray Labelling Near-Optimality on AWGN"(b) ($\le 0.05$ bits for $M \le 256$) is essentially tight for 16-QAM: the numerically computed maximum gap is within a hair of it, and the gap is negligible at the operating point of any real 16-QAM system.
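For readers without access to the interactive plot, a Monte Carlo sketch of both capacities follows. It assumes unit-energy 16-QAM, $\text{SNR} = E_s/N_0$ (noise variance $1/(2\,\text{SNR})$ per real dimension, as in ex-ch05-08) and a fixed random seed; the estimates carry sampling noise, so they should match the plotted values only approximately:

```python
import numpy as np

def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along an axis (entries may be -inf)."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m + np.log(np.sum(np.exp(a - m), axis=axis, keepdims=True)), axis=axis)

def qam16_gray():
    """Unit-energy square 16-QAM; label = (b0^I, b1^I, b0^Q, b1^Q) from per-axis 4-PAM Gray."""
    gray = {-3: (0, 0), -1: (0, 1), 1: (1, 1), 3: (1, 0)}
    levels = (-3, -1, 1, 3)
    pts = np.array([complex(xi, xq) for xi in levels for xq in levels]) / np.sqrt(10.0)
    labels = np.array([gray[xi] + gray[xq] for xi in levels for xq in levels])
    return pts, labels

def cm_bicm_capacity(pts, labels, snr_db, n=200_000, seed=0):
    """Monte Carlo estimates of (C_CM, C_BICM) in bits/symbol on complex AWGN,
    assuming uniform inputs, E_s = 1 and SNR = E_s / N_0."""
    rng = np.random.default_rng(seed)
    M, L = labels.shape
    n0 = 10 ** (-snr_db / 10)
    idx = rng.integers(M, size=n)
    noise = np.sqrt(n0 / 2) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
    y = pts[idx] + noise
    ll = -np.abs(y[:, None] - pts[None, :]) ** 2 / n0   # log p(y | x) + const, shape (n, M)
    lse_all = logsumexp(ll, axis=1)
    # C_CM = log2 M - E[log2 (sum_x' p(y|x') / p(y|x))]
    c_cm = np.log2(M) - np.mean(lse_all - ll[np.arange(n), idx]) / np.log(2)
    c_bicm = 0.0
    for l in range(L):
        same = labels[None, :, l] == labels[idx, l][:, None]   # points sharing the sent bit l
        lse_same = logsumexp(np.where(same, ll, -np.inf), axis=1)
        # I(Y; B_l) = 1 - E[-log2 P(B_l = b_l | Y)]
        c_bicm += 1.0 - np.mean(lse_all - lse_same) / np.log(2)
    return c_cm, c_bicm

c_cm, c_bicm = cm_bicm_capacity(*qam16_gray(), snr_db=10.0)
print(f"C_CM = {c_cm:.3f}, C_BICM = {c_bicm:.3f}, gap = {c_cm - c_bicm:.3f} bits")
```

Sweeping `snr_db` over a grid reproduces the search for the maximum gap.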

ex-ch05-07

Medium

A BICM system uses a rate-$R_c = 3/4$ LDPC code driving 256-QAM ($L = 8$) on AWGN. What is the spectral efficiency $\eta$? Using the interactive plot "CM vs BICM Capacity for QAM on AWGN", estimate the SNR at which reliable communication becomes possible (the Shannon limit of BICM-Gray-256-QAM at this rate).

Hint

$\eta = R_c \cdot L$ .

Find the SNR where $C_{\rm BICM, Gray}$ crosses $\eta$ .

Solution

Spectral efficiency

$\eta = 0.75 \cdot 8 = 6$ bits/symbol.

BICM Shannon limit

From the plot, for $M = 256$ the BICM-Gray curve reaches $6$ bits/symbol at approximately $\text{SNR} \approx 19$ dB. (Compare the Shannon limit $\log_2(1 + \text{SNR}) = 6 \Rightarrow \text{SNR} = 63 \approx 18$ dB — the gap is the 256-QAM modulation-capacity penalty of $\approx 1$ dB.)

Implication

A capacity-approaching LDPC code on BICM-Gray-256-QAM cannot achieve vanishing error probability below $19$ dB of SNR. Adding the usual finite-length penalty of $1$–$1.5$ dB puts a typical rate-$3/4$ 256-QAM waterfall at $\approx 20$–$20.5$ dB for BER $10^{-5}$.
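The arithmetic of this exercise as a small sketch (function names are illustrative): $\eta = R_c L$, and the Gaussian-input Shannon limit solves $\log_2(1 + \text{SNR}) = \eta$. The BICM-Gray threshold of $\approx 19$ dB read from the plot sits about $1$ dB above this value.

```python
import math

def spectral_efficiency(code_rate, bits_per_symbol):
    """eta = R_c * L, information bits per complex symbol."""
    return code_rate * bits_per_symbol

def shannon_limit_db(eta):
    """SNR in dB at which log2(1 + SNR) = eta on the complex AWGN channel."""
    return 10 * math.log10(2 ** eta - 1)

eta = spectral_efficiency(3 / 4, 8)
print(eta, round(shannon_limit_db(eta), 2))   # 6.0 17.99
```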

ex-ch05-08

Hard

Derive the high-SNR asymptotic behaviour of $C_{\rm BICM}(\mu_G)$ for square 16-QAM with Gray labelling: $C_{\rm BICM, Gray}(\text{SNR}) = 4 - \Theta(Q(\sqrt{\text{SNR}/5}))$ as $\text{SNR} \to \infty$ . (The constant inside the $\Theta$ depends on the average number of nearest neighbours.)

Hint

At high SNR, the bit-channel capacity $C_\ell \to 1$ bit; the correction is the binary-channel entropy at crossover probability $\approx Q(\sqrt{\text{SNR}/5})$ .

$d_{\min}/\sqrt{E_s/2} = 2/\sqrt{5}$ for unit-energy 16-QAM.

Binary entropy $h_2(p) \approx -p \log_2 p$ for small $p$ .

Solution

Per-bit crossover probability

Under Gray labelling, the dominant error event for each bit is a single nearest-neighbour flip, occurring with probability $P_\ell \approx Q(d_{\min}/(2\sigma)) = Q(\sqrt{\text{SNR}/5})$ (using $d_{\min}^2 = 2E_s/5$ for square 16-QAM, i.e. $d_{\min}^2 = 2/5$ at $E_s = 1$, and $\sigma^2 = 1/(2\,\text{SNR})$ per real dimension).

Bit-channel entropy

By Fano's inequality, each bit channel has conditional entropy $H(B_\ell \mid Y) \le h_2(P_\ell)$, so $I(Y; B_\ell) = 1 - H(B_\ell \mid Y) \ge 1 - h_2(P_\ell) = 1 + P_\ell \log_2 P_\ell + (1 - P_\ell) \log_2 (1 - P_\ell)$. For small $P_\ell$, the correction $h_2(P_\ell) = -P_\ell \log_2 P_\ell + O(P_\ell)$ tends to $0$ as $\text{SNR} \to \infty$.

Sum

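$C_{\rm BICM, Gray} = \sum_{\ell=0}^{3} I(Y; B_\ell) \ge 4 - \sum_\ell h_2(P_\ell)$. This hard-decision bound carries an extra factor of order $\log_2(1/P_\ell)$, but the soft-decision deficit decays at the same exponential rate as $P_\ell$, consistent with $C_{\rm BICM, Gray} = 4 - \Theta(Q(\sqrt{\text{SNR}/5}))$. Since $Q(\sqrt{\text{SNR}/5}) \le \tfrac12 e^{-\text{SNR}/10}$, the gap to the full capacity $L = 4$ vanishes exponentially in $\text{SNR}$.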
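To see how quickly the hard-decision bound closes, the sketch below tabulates $P = Q(\sqrt{\text{SNR}/5})$ and the deficit bound $\sum_\ell h_2(P_\ell)$. It assumes the same crossover probability $P$ for all four bits; in Gray 16-QAM the two bits on each axis have different numbers of decision boundaries, so the true per-bit error probabilities differ by small constant factors.

```python
import math

def qfunc(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def h2(p):
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def gray16_deficit_bound(snr_db):
    """Approximate upper bound 4 * h2(P) on 4 - C_BICM,Gray for 16-QAM,
    using P = Q(sqrt(SNR/5)) as the crossover probability of every bit."""
    p = qfunc(math.sqrt(10 ** (snr_db / 10) / 5))
    return 4 * h2(p), p

for snr_db in (10, 15, 20, 25):
    bound, p = gray16_deficit_bound(snr_db)
    print(f"{snr_db:2d} dB  P = {p:.1e}  deficit <= {bound:.1e} bits")
```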

Comparison to CM

The CM capacity behaves the same way at high SNR (the same nearest-neighbour analysis applies to the $M$-ary symbol channel). Since $C_{\rm CM} \le 4$, the gap $C_{\rm CM} - C_{\rm BICM, Gray}$ is at most $4 - C_{\rm BICM, Gray}$ and therefore also $O(Q(\sqrt{\text{SNR}/5}))$, which is the exponential-convergence result of Thm. "Gray Labelling Near-Optimality on AWGN"(a). $\blacksquare$

ex-ch05-09

Hard

Mismatched-decoding GMI for BICM. Show that the generalised mutual information (GMI) under the product bit metric $q(y, \mathbf{b}) = \prod_{\ell} q_\ell(y, b_\ell)$ with $q_\ell(y, b_\ell) = p_{W_\ell}(y \mid b_\ell)$ equals the BICM capacity $\sum_\ell I(Y; B_\ell)$ . (Sketch the argument; the full proof is in Ch. 7.)

Hint

The GMI formula is $\text{GMI} = \sup_{s \ge 0} \mathbb{E}[\log \frac{q(Y, B)^s}{\mathbb{E}_{B'}[q(Y, B')^s]}]$ .

For a product metric, both numerator and denominator factorise across $\ell$ .

The supremum over $s$ is achieved at $s = 1$ when the metric matches the true marginal.

Solution

GMI formula under product metric

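$\text{GMI}(q) = \sup_{s \ge 0} \mathbb{E}\bigl[\log \tfrac{q(Y, B)^s}{\mathbb{E}_{B'}[q(Y, B')^s]}\bigr]$. Under i.i.d. uniform labels, $\mathbb{E}_{B'}[q(Y, B')^s] = \prod_\ell \mathbb{E}_{B'_\ell}[q_\ell(Y, B'_\ell)^s]$ and $\log q(Y, B) = \sum_\ell \log q_\ell(Y, B_\ell)$, so for every fixed $s$ the objective splits into a sum of per-bit terms $\text{GMI}_\ell(s)$. Hence $\text{GMI}(q) \le \sum_\ell \sup_{s \ge 0} \text{GMI}_\ell(s)$, with equality if a single $s$ maximises every term.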

Per-bit GMI

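For each $\ell$, with $q_\ell$ equal to the exact bit-channel law $p_{W_\ell}$, setting $s = 1$ gives $\text{GMI}_\ell(1) = \mathbb{E}\bigl[\log \tfrac{p_{W_\ell}(Y \mid B_\ell)}{p(Y)}\bigr] = I(Y; B_\ell)$, and no other $s$ does better, because the GMI of any metric cannot exceed the mutual information of the channel it is used on. So $s = 1$ maximises every per-bit term simultaneously.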

Sum

$\text{GMI}(q) = \sum_\ell I(Y; B_\ell) = C_{\rm BICM}(\mu)$ . This matches the BICM capacity formula and confirms that the BICM capacity is the correct operational quantity under product-metric decoding. $\blacksquare$
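The claim that $s = 1$ maximises each per-bit term can be checked numerically. The sketch assumes 4-PAM with Gray labelling and noise variance $\sigma^2 = 0.5$, uses the matched metric $q_\ell = p_{W_\ell}$, and integrates over a fine $y$ grid. The GMI should be $0$ at $s = 0$, peak at $s = 1$ (where it equals $I(Y; B_\ell)$) and fall off on either side:

```python
import numpy as np

SIGMA2 = 0.5                                       # assumed noise variance
Y = np.linspace(-12.0, 12.0, 24001)                # integration grid for y
DY = Y[1] - Y[0]
SUBSETS = {0: ([-3, -1], [1, 3]), 1: ([-3, 3], [-1, 1])}   # 4-PAM Gray: (X_l^(0), X_l^(1))

def phi(u):
    """Gaussian density with variance SIGMA2."""
    return np.exp(-u ** 2 / (2 * SIGMA2)) / np.sqrt(2 * np.pi * SIGMA2)

def bit_channel(l):
    """Array of shape (2, len(Y)): rows are p_{W_l}(y | 0) and p_{W_l}(y | 1)."""
    return np.array([np.mean([phi(Y - x) for x in xs], axis=0) for xs in SUBSETS[l]])

def per_bit_gmi(l, s):
    """Per-bit GMI (bits) for a uniform bit with metric q_l = p_{W_l} and parameter s."""
    p = bit_channel(l)                                        # true law, also the metric
    log_q = np.log(p)
    log_den = np.log(0.5 * np.exp(s * log_q).sum(axis=0))     # log E_{B'}[q(y, B')^s]
    integrand = 0.5 * (p * (s * log_q - log_den)).sum(axis=0)  # average over B, then y
    return integrand.sum() * DY / np.log(2)

for s in (0.5, 1.0, 2.0):
    print(s, round(per_bit_gmi(0, s), 4))
```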

ex-ch05-10

Medium

Suppose the BICM demapper uses the max-log approximation. Argue that the resulting generalised mutual information is strictly below $C_{\rm BICM}(\mu)$ for any noise level $\sigma^2 > 0$ . What asymptotic regime makes the loss vanish?

Hint

Max-log replaces $\log \sum_i \exp(-d_i/\sigma^2)$ with $-\min_i d_i/\sigma^2$ .

The error in this approximation is largest when multiple $d_i$ are comparable (i.e., moderate SNR).

Asymptotically, the nearest-neighbour term dominates the sum — this is the high-SNR regime.

Solution

The max-log approximation

Max-log: $\log \sum_i e^{-d_i/\sigma^2} \approx -\min_i d_i/\sigma^2$. Since $\log \sum_i e^{v_i} \ge \max_i v_i$, the true log-sum always exceeds the approximation by an offset between $0$ and $\log n$ (for $n$ terms); the offset is small only when one $d_i$ is much smaller than the others.

GMI under max-log

The max-log metric $q^{\rm ML}_\ell$ is mismatched (it is not the true marginal likelihood) for every $\sigma^2 > 0$. The optimal $s$ in the GMI formula is then generally different from $1$, and the resulting per-bit GMI is strictly less than $I(Y; B_\ell)$.

Vanishing loss at high SNR

As $\sigma^2 \to 0$, the nearest point $x^*$ in each subset dominates the likelihood sum, so the max-log approximation becomes exact in the limit and $\text{GMI}^{\rm ML} \to \sum_\ell I(Y; B_\ell) = C_{\rm BICM}(\mu)$. The loss is therefore a low-to-moderate-SNR effect; at the SNRs where high-order MCS entries operate, max-log is nearly lossless.
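The size of the max-log offset is easy to see on a toy example. The distances below are hypothetical; the sketch compares the exact log-sum with its max-log approximation as $\sigma^2$ shrinks. The offset stays in $[0, \log n]$, and it vanishes as $\sigma^2 \to 0$ only because the smallest distance is unique; with $k$ tied minima it would tend to $\log k$ instead:

```python
import numpy as np

def log_sum_exp(v):
    """Numerically stable log(sum(exp(v))) for a 1-D array."""
    m = np.max(v)
    return m + np.log(np.sum(np.exp(v - m)))

def maxlog_offset(d, sigma2):
    """Exact log-sum minus its max-log approximation, for terms -d_i / sigma2."""
    v = -np.asarray(d, dtype=float) / sigma2
    return log_sum_exp(v) - v.max()

d = [1.0, 1.5, 4.0, 9.0]        # hypothetical squared distances to the points of one subset
for sigma2 in (2.0, 0.5, 0.1):
    print(sigma2, round(maxlog_offset(d, sigma2), 4))
```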

ex-ch05-11

Medium

Why SP loses in BICM. Compute (numerically or via direct reasoning) the unconditional bit-0 capacity $I(Y; B_0)$ for 4-PAM at $\text{SNR} = 10$ dB under both Gray and SP labellings. Explain why they are equal for $\ell = 0$ but differ for $\ell = 1$ .

Hint

For 4-PAM, the Gray and SP labellings assign the same subsets $\mathcal{X}_0^{(b)}$ ; they only differ at $\ell = 1$ .

Bit 0 in both labellings partitions the 4-PAM as $\{-3, -1\}$ vs $\{+1, +3\}$ .

Solution

Bit-0 is identical

For 4-PAM, Gray ($00, 01, 11, 10$) and SP ($00, 01, 10, 11$, the natural-binary labelling defined in ex-ch05-05) produce the same top-level partition $\{-3, -1\}$ vs $\{+1, +3\}$. Hence $I(Y; B_0^{\rm Gray}) = I(Y; B_0^{\rm SP})$, around $0.92$ bits at $\text{SNR} = 10$ dB.

Bit-1 differs

The bit-1 partitions differ: Gray gives $\{-3, +3\}$ vs $\{-1, +1\}$ (outer/inner), while SP gives $\{-3, +1\}$ vs $\{-1, +3\}$ (two cosets with spacing $4$, offset by $2$). Because $b_1^{\rm SP}$ alternates along the line, the marginal posterior $P(B_1^{\rm SP} = 0 \mid y)$ stays closer to $\tfrac12$ over more of the $y$ axis than $P(B_1^{\rm Gray} = 0 \mid y)$ does. Numerically, $I(Y; B_1^{\rm Gray}) \approx 0.66$ while $I(Y; B_1^{\rm SP}) \approx 0.35$.

Sum

$C_{\rm BICM, Gray}^{\text{4-PAM}} \approx 0.92 + 0.66 = 1.58$ bits vs $C_{\rm BICM, SP}^{\text{4-PAM}} \approx 0.92 + 0.35 = 1.27$ bits, so the SP capacity is $\approx 0.3$ bits lower. This is the MLC-style hierarchy that SP creates: under SP the information is concentrated in bit $0$, and bit $1$ carries little without conditioning.
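A numerical-integration sketch for these per-bit mutual informations follows. It assumes the real-valued convention $\text{SNR} = E_s/\sigma^2$ with $E_s = 5$ for the points $\pm 1, \pm 3$; the exercise does not state its convention, so the printed values need not reproduce $0.92$, $0.66$ and $0.35$ exactly, but the qualitative pattern (equal $b_0$ capacities, Gray ahead on $b_1$) should not depend on it:

```python
import numpy as np

POINTS = np.array([-3.0, -1.0, 1.0, 3.0])
LABELS = {
    "Gray": [(0, 0), (0, 1), (1, 1), (1, 0)],   # (-3, -1, +1, +3) -> (00, 01, 11, 10)
    "SP": [(0, 0), (0, 1), (1, 0), (1, 1)],     # X_1^(0) = {-3, +1}, X_1^(1) = {-1, +3}
}

def bit_mutual_information(labels, snr_db, n_grid=40001):
    """[I(Y; B_0), I(Y; B_1)] in bits for labelled 4-PAM on real AWGN, by numerical
    integration. Assumes SNR = E_s / sigma^2 with E_s = mean(x^2) = 5."""
    sigma2 = np.mean(POINTS ** 2) / 10 ** (snr_db / 10)
    sigma = np.sqrt(sigma2)
    y = np.linspace(POINTS[0] - 10 * sigma, POINTS[-1] + 10 * sigma, n_grid)
    dy = y[1] - y[0]
    lik = np.exp(-(y[None, :] - POINTS[:, None]) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
    p_y = lik.mean(axis=0)                              # output density, uniform inputs
    bits = np.array(labels)
    result = []
    for l in range(bits.shape[1]):
        mi = 0.0
        for b in (0, 1):
            p_yb = lik[bits[:, l] == b].mean(axis=0)    # p_{W_l}(y | b)
            with np.errstate(divide="ignore", invalid="ignore"):
                terms = np.where(p_yb > 0, p_yb * np.log2(p_yb / p_y), 0.0)
            mi += 0.5 * terms.sum() * dy
        result.append(mi)
    return result

for name, labels in LABELS.items():
    print(name, np.round(bit_mutual_information(labels, snr_db=10.0), 3))
```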

ex-ch05-12

Medium

A wireless system uses adaptive modulation with BICM and a single LDPC code at rate $R_c = 3/4$ . The channel quality indicator reports three regimes: low, medium, high. Design an MCS table with three entries $(R_c, M)$ using modulations 4-QAM, 16-QAM, 64-QAM. For each MCS, report the Shannon-limit SNR (BICM-Gray capacity) at which reliable communication is possible.

Hint

Low regime: 4-QAM at rate $3/4$ gives $\eta = 1.5$ .

Medium regime: 16-QAM at rate $3/4$ gives $\eta = 3$ .

High regime: 64-QAM at rate $3/4$ gives $\eta = 4.5$ .

Use the interactive plot to find the SNR threshold for each.

Solution

Low regime: QPSK + rate 3/4

$\eta = 0.75 \cdot 2 = 1.5$ bits/symbol. BICM-Gray-4-QAM reaches $1.5$ bits at $\text{SNR} \approx 3.8$ dB. (Gray QPSK is exactly two independent BI-AWGN channels, one per component, so its BICM capacity equals its CM capacity.)

Medium regime: 16-QAM + rate 3/4

$\eta = 0.75 \cdot 4 = 3$ bits/symbol. BICM-Gray-16-QAM reaches $3$ bits at $\text{SNR} \approx 9.3$ dB.

High regime: 64-QAM + rate 3/4

$\eta = 0.75 \cdot 6 = 4.5$ bits/symbol. BICM-Gray-64-QAM reaches $4.5$ bits at $\text{SNR} \approx 14.5$ dB.

Design note

The three Shannon-limit thresholds are approximately $3.8$ dB, $9.3$ dB and $14.5$ dB. With an implementation margin of $1$–$1.5$ dB above each limit, a scheduler would switch from QPSK to 16-QAM at roughly $10.3$–$10.8$ dB and from 16-QAM to 64-QAM at roughly $15.5$–$16$ dB. The same rate-$3/4$ LDPC code drives all three entries, which is the engineering point of BICM.
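The resulting table can be encoded as a simple lookup. In this sketch the thresholds are the Shannon-limit values quoted in this solution, the $1.2$ dB margin is an assumed value from the $1$–$1.5$ dB range, and `select_mcs` is a hypothetical helper, not a standardised procedure:

```python
# (name, bits per symbol L, BICM-Gray Shannon-limit SNR in dB at R_c = 3/4), sorted by threshold
MCS_TABLE = [
    ("QPSK", 2, 3.8),
    ("16-QAM", 4, 9.3),
    ("64-QAM", 6, 14.5),
]
CODE_RATE = 0.75

def select_mcs(snr_db, margin_db=1.2):
    """Return (name, eta) of the highest-rate entry whose threshold plus margin
    is at most snr_db, or None if even the lowest entry does not fit."""
    best = None
    for name, bits, limit_db in MCS_TABLE:
        if snr_db >= limit_db + margin_db:
            best = (name, CODE_RATE * bits)
    return best

print(select_mcs(12.0))   # ('16-QAM', 3.0)
```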

ex-ch05-13

Hard

Prove the prior-independence step in the gap formula. Show carefully that $I(B_\ell; B_{<\ell}) = 0$ when the label bits are i.i.d. uniform. Then conclude that $I(Y; B_\ell \mid B_{<\ell}) - I(Y; B_\ell) = I(B_\ell; B_{<\ell} \mid Y)$.

Hint

$B_\ell$ and $B_{<\ell}$ are independent a priori by construction.

Use the interaction-information identity $I(A; B \mid C) - I(A; B) = I(A; C \mid B) - I(A; C)$, which follows by expanding both sides in entropies: each equals $H(A \mid B) + H(A \mid C) - H(A \mid B, C) - H(A)$.

The key identity is $I(Y; B_\ell, B_{<\ell}) = I(Y; B_\ell) + I(Y; B_{<\ell} \mid B_\ell) = I(Y; B_{<\ell}) + I(Y; B_\ell \mid B_{<\ell})$ .

Solution

Prior independence

By construction of the BICM encoder (uniformly distributed coded bits and an ideal interleaver), the prior on $(B_0, \ldots, B_{L-1})$ is i.i.d. uniform on $\{0, 1\}^L$. Hence $B_\ell$ is independent of the tuple $B_{<\ell}$, so $I(B_\ell; B_{<\ell}) = 0$.

Chain-rule identity for two decompositions of $I(Y; B_\ell, B_{<\ell})$

$I(Y; B_\ell, B_{<\ell}) = I(Y; B_{<\ell}) + I(Y; B_\ell \mid B_{<\ell})$ (decompose the joint via chain rule with $B_{<\ell}$ first). Equally, $I(Y; B_\ell, B_{<\ell}) = I(Y; B_\ell) + I(Y; B_{<\ell} \mid B_\ell)$ (decompose with $B_\ell$ first). Equating gives $I(Y; B_\ell \mid B_{<\ell}) - I(Y; B_\ell) = I(Y; B_{<\ell} \mid B_\ell) - I(Y; B_{<\ell}).$

Convert the right-hand side to a conditional mutual information

Apply the identity $I(U; V \mid W) - I(U; V) = I(U; W \mid V) - I(U; W)$, obtained from $I(U; V \mid W) + I(U; W) = I(U; V, W) = I(U; V) + I(U; W \mid V)$, with $U = B_{<\ell}$, $V = Y$, $W = B_\ell$: $I(B_{<\ell}; Y \mid B_\ell) - I(B_{<\ell}; Y) = I(B_{<\ell}; B_\ell \mid Y) - I(B_{<\ell}; B_\ell)$. The left-hand side is the right-hand side of the previous step, and $I(B_{<\ell}; B_\ell) = 0$ by prior independence, so $I(Y; B_\ell \mid B_{<\ell}) - I(Y; B_\ell) = I(B_{<\ell}; B_\ell \mid Y) \ge 0. \quad \blacksquare$

ex-ch05-14

Easy

List three concrete engineering benefits of BICM over MLC that motivated every modern standard's adoption of BICM. (A one-sentence answer per benefit is sufficient.)

Hint

See the comparison table "TCM / MLC / BICM — A Structural Side-by-Side" and the engineering notes of §5.

Solution

Benefit 1: single-code rate adaptation

A single binary code plus rate matching drives every modulation in the MCS table, whereas MLC needs $L$ component codes per modulation and a dedicated rate-allocation step for each. This cuts the number of distinct codes the encoder and decoder hardware must support by roughly a factor of $L$.

Benefit 2: no error propagation

The BICM decoder is a single run of a binary decoder over the full LLR stream. MLC with multistage decoding (MSD) is sequential: a decision error at stage 0 propagates to stage 1 and beyond, degrading the later stages and raising the error floor unless mitigated, for example by iterating between stages.

Benefit 3: labelling and code decouple

Changing the labelling (e.g., from Gray to quasi-Gray for APSK) affects only the demapper's look-up; the LDPC decoder and the encoder are unchanged. In MLC, changing the partitioning changes every per-level code's design.

ex-ch05-15

Medium

In 5G NR, suppose an MCS index $I_{\rm MCS} = 17$ maps to ($M = 64$-QAM, code rate $R_c \approx 0.6$); these values are illustrative, since the actual entries depend on which MCS table of 3GPP TS 38.214 is configured. The spectral efficiency is $\eta \approx 3.6$ bits/symbol. Using the BICM capacity plot, estimate the minimum SNR for reliable reception. Then add a typical implementation margin and discuss whether this MCS makes sense at a post-equalisation $\text{SNR}$ of $12$ dB.

Hint

Find the SNR at which BICM-Gray-64-QAM reaches 3.6 bits/symbol.

Typical implementation margin: 1-1.5 dB for a 5G NR LDPC at codeword length $\sim 8000$ .

Solution

Shannon limit

From the plot, BICM-Gray-64-QAM reaches $\eta = 3.6$ bits at approximately $\text{SNR} = 11$ dB.

Implementation margin

Adding $1.2$ dB for a realistic LDPC code gives $11 + 1.2 = 12.2$ dB as the effective waterfall for a block error rate of about $10\%$ (the usual NR target for an initial transmission).

Decision at $\text{SNR} = 12$ dB

We are at $12$ dB — $0.2$ dB below the effective waterfall. HARQ retransmissions will almost certainly be needed. A more prudent MCS at this SNR is $I_{\rm MCS} = 15$ (rate $\approx 0.5$ , 64-QAM), which has its waterfall at $\approx 10$ dB and gives a margin of $2$ dB.

Remark

In 5G NR, the outer-loop link adaptation algorithm continually adjusts MCS based on the moving-average block error rate; over time it will back off from $I_{\rm MCS} = 17$ to $15$ at this SNR.

ex-ch05-16

Hard

Anti-Gray at low SNR. For 16-QAM, construct an "anti-Gray" labelling that maximises the average Hamming distance between nearest neighbours. Compare numerically the BICM capacities of Gray and anti-Gray at $\text{SNR} = -3$ dB. Confirm that anti-Gray slightly exceeds Gray in this regime (the advantage should be $\le 0.02$ bits).

Hint

Anti-Gray labelling for a 16-QAM square grid: pair each point with a partner at maximum Hamming distance (= 4 bits all different).

One construction: use the same 4-PAM labelling on both axes, chosen so that adjacent levels differ in as many bits as possible, e.g. $(-3, -1, +1, +3) \to (00, 11, 01, 10)$.

At $\text{SNR} = -3$ dB, the mutual informations are below $1$ bit/symbol; the anti-Gray BICM capacity is $\approx 0.015$ - $0.020$ bits above Gray.

Solution

Anti-Gray labelling construction

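One simple anti-Gray construction keeps the product structure of Gray labelling but replaces the per-axis 4-PAM Gray code with $(-3, -1, +1, +3) \to (00, 11, 01, 10)$; the point $(x, y)$ carries the concatenation of its $x$ and $y$ sublabels, $(b_0^I, b_1^I, b_0^Q, b_1^Q)$. Adjacent levels on each axis now differ in $2$, $1$ and $2$ bits, so nearest neighbours differ in $5/3$ bits on average instead of $1$. This is the best a product labelling can do: two consecutive 2-bit flips would return to a label already used, so at most two of the three adjacent pairs on an axis can differ in both bits.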
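The Hamming statistics of a product labelling can be verified directly. The sketch assumes the per-axis anti-Gray 4-PAM code $(-3, -1, +1, +3) \to (00, 11, 01, 10)$, one labelling whose adjacent levels differ in $2$, $1$ and $2$ bits. It computes the average nearest-neighbour Hamming distance for Gray and anti-Gray, and checks all $4!$ per-axis labellings to confirm that $5/3$ is the maximum among product labellings:

```python
import itertools

LEVELS = [-3, -1, 1, 3]
GRAY_PAM = {-3: (0, 0), -1: (0, 1), 1: (1, 1), 3: (1, 0)}
ANTI_GRAY_PAM = {-3: (0, 0), -1: (1, 1), 1: (0, 1), 3: (1, 0)}   # assumed anti-Gray code

def average_neighbour_hamming(pam):
    """Mean Hamming distance over the nearest-neighbour pairs of 16-QAM for the
    product labelling (pam[x], pam[y]); also returns the number of pairs."""
    total, count = 0, 0
    for x, y in itertools.product(LEVELS, LEVELS):
        for nx, ny in ((x + 2, y), (x, y + 2)):          # right and upper neighbours
            if nx in pam and ny in pam:
                a, b = pam[x] + pam[y], pam[nx] + pam[ny]
                total += sum(u != v for u, v in zip(a, b))
                count += 1
    return total / count, count

print(average_neighbour_hamming(GRAY_PAM))        # (1.0, 24)
print(average_neighbour_hamming(ANTI_GRAY_PAM))   # (1.6666666666666667, 24)

# Exhaustive check over all 4! ways of assigning the 2-bit labels to the 4 levels
best = max(average_neighbour_hamming(dict(zip(LEVELS, perm)))[0]
           for perm in itertools.permutations([(0, 0), (0, 1), (1, 0), (1, 1)]))
```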

Numerical comparison at $\text{SNR} = -3$ dB

Gray: $C_{\rm BICM, Gray} \approx 0.512$ bits/symbol. Anti-Gray: $C_{\rm BICM, AG} \approx 0.528$ bits/symbol. Anti-Gray is ahead by $\approx 0.016$ bits.

Interpretation

At very low SNR the per-bit channels are nearly useless in isolation (conditional entropy near 1); spreading bit-related information across more label transitions (anti-Gray) extracts slightly more unconditional mutual information than concentrating it on single-bit transitions (Gray). The advantage is small and reverses at higher SNR where Gray's one-bit-flip structure becomes more valuable.

Practical irrelevance

At $\text{SNR} = -3$ dB, operating a BICM system at $\eta = 0.5$ bits means using 16-QAM at rate $R_c = 1/8$ , which is far below anything a practical system would consider. The anti-Gray advantage exists but is operationally irrelevant.

ex-ch05-17

Medium

Explain why the bit interleaver is essential to the BICM framework. What would go wrong in the capacity analysis of §5.3 if the interleaver were removed (i.e., each $L$ -tuple of consecutive coded bits was mapped directly to a constellation symbol)?

Hint

Without interleaving, consecutive coded bits share the same symbol's noise realisation.

The ideal-interleaver theorem (Thm. "Ideal Interleaver $\Rightarrow$ Memoryless Parallel Bit Channels") breaks down.

The decoder's scalar binary-channel model fails; memory effects appear.

Solution

Without interleaving, the bit channel becomes correlated

If $L$ consecutive coded bits share the same symbol, their LLRs are all functions of the same received $y$ and are strongly correlated. The binary decoder assumes that the LLRs at nearby codeword positions are independent, so without the interleaver it effectively faces a binary channel with memory.

Capacity under memory

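A binary decoder that treats a channel with memory as memoryless is mismatched to it, and the capacity analysis of §5.3, which models the demapper output as $L$ memoryless parallel bit channels, no longer describes what the decoder sees. Restoring that memoryless model is exactly what the interleaver does.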

Diversity on fading

On a fading channel, the interleaver is even more critical: without it, a symbol fade takes down all $L$ consecutive coded bits at once, destroying the diversity order of the binary code. The interleaver spreads the fade across different codeword positions, restoring full diversity (Ch. 6).

Punchline

The interleaver is what converts the $M$ -ary symbol channel into $L$ marginally-independent binary channels. Take it out and the BICM paradigm collapses.
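As an illustration of what an interleaver does, here is a minimal row-column block interleaver. This is an assumed textbook design chosen for clarity, with hypothetical sizes; practical systems use their own permutations. Coded bits are written row by row and read column by column, so the $L$ bits that share a 16-QAM symbol come from codeword positions $12$ apart, and no two consecutive coded bits share a symbol:

```python
import numpy as np

def block_interleaver(n_rows, n_cols):
    """Row-column permutation: coded bits are written row by row and read column
    by column. Returns perm with interleaved[k] = coded[perm[k]]."""
    return np.arange(n_rows * n_cols).reshape(n_rows, n_cols).T.ravel()

L = 4                                             # bits per 16-QAM symbol
perm = block_interleaver(n_rows=L, n_cols=12)     # hypothetical size: 48 coded bits
coded = np.arange(48)                             # stand-in: each entry is its codeword position
interleaved = coded[perm]
symbols = interleaved.reshape(-1, L)              # row j feeds constellation symbol j
print(symbols[:2].tolist())                       # [[0, 12, 24, 36], [1, 13, 25, 37]]

deinterleaved = np.empty_like(interleaved)
deinterleaved[perm] = interleaved                 # receiver side: undo the permutation
```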

ex-ch05-18

Hard

Compute the BICM capacity of 8-PSK with Gray labelling at $\text{SNR} = 6$ dB and compare to CM. Report both values, the gap, and the dB-of-SNR equivalent.

Hint

Gray labelling for 8-PSK: the point at phase $\pi k/4$, $k = 0, \ldots, 7$, carries the 3-bit label $(b_0 b_1 b_2) = \text{Gray}(k)$, so adjacent phases differ in exactly one bit.

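The CM capacity of 8-PSK exceeds that of QPSK at 6 dB because its larger alphabet is a better approximation to a Gaussian input; QPSK is capped at $2$ bits and already bends towards that ceiling.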

The Gray-BICM gap on 8-PSK is noticeably larger than on square QAM, because 8-PSK does not split into independent PAM components.

Solution

8-PSK CM capacity at 6 dB

Numerical integration gives $C_{\rm CM}^{\rm 8-PSK} \approx 1.81$ bits/symbol at $\text{SNR} = 6$ dB. (For reference: Shannon limit is $\log_2(5) = 2.32$ ; QPSK CM is $\approx 1.60$ .)

8-PSK Gray-BICM capacity at 6 dB

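Under Gray labelling, $b_0$ and $b_1$ each split the circle into two half-planes (the $b_1$ partition is the $b_0$ partition rotated by $\pi/2$), while $b_2$ splits it into alternating pairs of adjacent points. By the rotational symmetry of 8-PSK and of circular Gaussian noise, $I(Y; B_0) = I(Y; B_1)$, and the alternating bit $b_2$ is the weakest. The three bit-channel capacities sum to $C_{\rm BICM, Gray}^{\rm 8-PSK} \approx 1.70$ bits/symbol.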
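The labelling and the rotation argument can be checked combinatorially. The sketch assumes point $k$ sits at phase $\pi k/4$ with label $\text{Gray}(k) = k \oplus \lfloor k/2 \rfloor$ and $b_0$ as the most significant bit. It verifies the Gray property around the circle and that rotating the $b_0 = 1$ set by $-\pi/2$ gives the $b_1 = 1$ set, so the $b_0$ and $b_1$ bit channels are rotated copies of each other under circular Gaussian noise:

```python
def gray(k):
    """Binary-reflected Gray code of k."""
    return k ^ (k >> 1)

M, L = 8, 3
labels = [gray(k) for k in range(M)]           # point k sits at phase pi * k / 4

def bit(label, l):
    """Bit l of a 3-bit label, with b0 as the most significant bit."""
    return (label >> (L - 1 - l)) & 1

# Gray property on the circle, including the wrap-around pair (7, 0)
assert all(bin(labels[k] ^ labels[(k + 1) % M]).count("1") == 1 for k in range(M))

ones = [{k for k in range(M) if bit(labels[k], l)} for l in range(L)]
rotated = {(k - 2) % M for k in ones[0]}       # rotate the b0 = 1 set by -pi/2
print(sorted(ones[0]), sorted(ones[1]), sorted(ones[2]), rotated == ones[1])
# [4, 5, 6, 7] [2, 3, 4, 5] [1, 2, 5, 6] True
```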

Gap

$C_{\rm CM} - C_{\rm BICM, Gray} \approx 0.11$ bits, between two and three times the 16-QAM gap of ex-ch05-06. In SNR terms, CM reaches $1.70$ bits at about $5.3$ dB, so the BICM penalty is $\approx 0.7$ dB of SNR.

Why 8-PSK is worse than 16-QAM

8-PSK does not decompose into independent PAM Gray components; its three bit channels are more strongly coupled at the symbol level than the four bit channels of 16-QAM, which split into two independent 4-PAM systems. The product decoding metric behind the BICM formula therefore loses more information on 8-PSK than on square QAM. This is relevant to DVB-S2, where 8-PSK with a rate-$5/6$ LDPC code is used at medium SNR and a BICM penalty of this kind is a known design cost.

ex-ch05-19

Medium

In the BICM receiver (Algorithm "Max-Log Per-Bit LLR Computation at the BICM Demapper"), the max-log LLR computation costs $O(M \cdot L)$ per received symbol. For square QAM, suggest a reduction to $O(L \sqrt{M})$ using constellation geometry.

Hint

For square QAM, the minimum-distance search in each subset $\mathcal{X}_\ell^{(b)}$ has an exploitable structure.

Separate the $I$ and $Q$ dimensions.

Solution

Separate I and Q

For square $M$ -QAM, the label decomposes as $(b_0^I, \ldots, b_{L/2 - 1}^I, b_0^Q, \ldots, b_{L/2 - 1}^Q)$ , and the likelihood factorises: $p(y \mid x) = p(y_I \mid x_I) \cdot p(y_Q \mid x_Q)$ . So each LLR depends only on one of $y_I, y_Q$ and the corresponding $\sqrt{M}$ -PAM subset — an $O(\sqrt{M})$ search per LLR.

Total cost

$L$ LLRs $\times O(\sqrt{M})$ per LLR = $O(L \sqrt{M})$ total. For 256-QAM, $L = 8$ , $\sqrt{M} = 16$ , giving $128$ distance computations — compared to $M \cdot L = 2048$ for the naive algorithm. A $16\times$ speedup.
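The per-axis reduction can be checked against the brute-force search. The sketch assumes square $M$-QAM with the same $\sqrt{M}$-PAM Gray labelling on both axes and the label ordered as ($I$ bits, $Q$ bits); LLRs are returned as squared-distance differences (the max-log LLR up to the factor $1/N_0$). The $Q$-axis term is common to both subsets of an $I$ bit and cancels, which is why the two functions agree:

```python
import numpy as np

def pam_gray(m):
    """m-PAM levels -(m-1), ..., m-1 and their Gray labels as bit arrays (MSB first)."""
    k = int(np.log2(m))
    levels = np.arange(m) * 2.0 - (m - 1)
    g = np.arange(m) ^ (np.arange(m) >> 1)
    bits = (g[:, None] >> np.arange(k - 1, -1, -1)) & 1      # shape (m, k)
    return levels, bits

def maxlog_llr_bruteforce(y, M):
    """O(M L): search the full 2-D constellation for every bit."""
    m = int(round(np.sqrt(M)))
    levels, bits = pam_gray(m)
    xi, xq = np.meshgrid(levels, levels, indexing="ij")
    pts = (xi + 1j * xq).ravel()                               # index a*m + b <-> (levels[a], levels[b])
    labels = np.hstack([np.repeat(bits, m, axis=0), np.tile(bits, (m, 1))])
    d2 = np.abs(y - pts) ** 2
    return np.array([d2[labels[:, l] == 1].min() - d2[labels[:, l] == 0].min()
                     for l in range(labels.shape[1])])

def maxlog_llr_per_axis(y, M):
    """O(L sqrt(M)): each bit depends only on one real dimension."""
    m = int(round(np.sqrt(M)))
    levels, bits = pam_gray(m)
    out = []
    for comp in (y.real, y.imag):
        d2 = (comp - levels) ** 2                              # sqrt(M) distances per axis
        for l in range(bits.shape[1]):
            out.append(d2[bits[:, l] == 1].min() - d2[bits[:, l] == 0].min())
    return np.array(out)

y = 0.7 - 2.3j
print(np.allclose(maxlog_llr_bruteforce(y, 16), maxlog_llr_per_axis(y, 16)))   # True
```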

Practical implementations

Practical demappers often go further, e.g. with a decision-region trick that computes max-log LLRs directly as piecewise-linear functions of $y_I$ and $y_Q$, bringing the cost to $O(L)$ per symbol for modulations up to 256-QAM. This is essentially the floor, since $L$ LLRs must be output.

ex-ch05-20

Hard

Forward to Chapter 6. For a rate-$1/2$ BICM system using 16-QAM with Gray labelling over a Rayleigh block-fading channel with $M_b$ independent fading blocks per codeword, what is the expected diversity order $d$? Use the Caire-Taricco-Biglieri 1998 §IV result: $d$ is governed by the number of distinct fading realisations that affect the bit positions in which two codewords at Hamming distance $d_H$ differ. Assume the binary code has free distance $d_f = 10$.

Hint

The diversity order on an $M_b$-block fading channel is at most the number of distinct blocks touched by the $d_f$ differing bits, and hence at most $\min(d_f, M_b)$.

Chapter 6 will cover this properly; here it's a forward-reference warmup.

Solution

Diversity-order formula

The BICM pairwise error probability on a Rayleigh block-fading channel decays as $\text{SNR}^{-d}$, where the diversity order $d$ is at most the minimum number of distinct fading realisations affecting the dominant codeword pair at Hamming distance $d_f$.

For our system

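Let the codeword length be $N$, and let the interleaver spread the $N$ coded bits uniformly across the $M_b$ fading blocks, so each block gets $N/M_b$ bits. With $d_f = 10$, the $10$ differing bits of the dominant codeword pair fall into at most $\min(d_f, M_b) = \min(10, M_b)$ distinct blocks, so $d \le \min(10, M_b)$. Whether the bound is met depends on the code and the interleaver, and the code rate further limits the diversity achievable on block-fading channels.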

Interpretation

If $M_b \ge 10$, the code's full free-distance diversity can be exploited; if $M_b < 10$, the number of independent fades limits the diversity to at most $M_b$. In 5G NR with OFDM over a frequency-selective channel, the interleaving is designed to span as many coherence bandwidths as possible, making $M_b$ large. Chapter 6 develops the full analysis.

ex-ch05-21

Challenge

Research-level. The Caire-Taricco-Biglieri 1998 paper establishes Gray near-optimality by numerical computation across all SNRs. Can you derive an analytical upper bound on $C_{\rm CM} - C_{\rm BICM}(\mu_G)$ for square QAM that is a closed-form function of SNR and $L$, rather than a tabulated number?

Show Hint

Use the union bound for nearest-neighbour errors and the convexity of binary entropy.

Bound $I(B_\ell; B_{<\ell} \mid Y)$ by the entropy $H(B_\ell \mid Y)$ , which in turn is bounded by $h_2(\text{symbol-error prob})$ .

Partial credit: derive the high-SNR asymptotic in closed form (done in Thm. "BICM Capacity — High-SNR Asymptotics").

Solution

Decompose the gap by level

$C_{\rm CM} - C_{\rm BICM}(\mu_G) = \sum_\ell I(B_\ell; B_{<\ell} \mid Y) \le \sum_\ell H(B_\ell \mid Y)$ .

Per-bit entropy bound

Under Gray labelling, each bit channel is approximately a BI-AWGN channel with an effective SNR $\text{SNR}_\ell^{\rm eff}$ that depends on the PAM level. By Fano's inequality its conditional entropy is at most $h_2(P_\ell)$, where $P_\ell \approx Q(\sqrt{\text{SNR}_\ell^{\rm eff}})$ is its bit-error probability. Summing gives a closed-form, but loose, upper bound $\sum_\ell h_2(P_\ell)$.

Tightening via conditional structure

Observe that $I(B_\ell; B_{<\ell} \mid Y) \le \min\{H(B_\ell \mid Y), H(B_{<\ell} \mid Y)\}$, so a level adds little to the gap whenever either its own bit or the preceding bits are nearly determined by $Y$. Refinements in Fàbregas-Martínez-Caire 2008 §3.5 give sharper bounds via the Taylor expansion of mutual information around Gaussian inputs. A fully analytical uniform bound remains an open problem; the result of Caire-Taricco-Biglieri 1998 is a numerical table, and researchers have since produced asymptotic closed-form expressions (Martínez-Fàbregas-Caire 2009) and tight machine-learning-assisted bounds (Alvarado et al. 2015).

What to hand in for this challenge

A valid partial answer is the high-SNR asymptotic $C_{\rm CM} - C_{\rm BICM, Gray} = \Theta(Q(d_{\min} \sqrt{\text{SNR}/2}))$ from Thm. "BICM Capacity — High-SNR Asymptotics". A uniform-in-SNR closed-form bound is a research-level open question; the state of the art is in Alvarado, Brännström, Agrell (2015) "A note on the capacity of BICM in fading channels" and subsequent papers.

ex-ch05-22

Medium

Write a short essay (2-3 paragraphs) explaining why the Caire-Taricco-Biglieri 1998 paper is considered "the foundational BICM paper." Your essay should touch on: (i) what the field looked like before the paper (Zehavi 1992 motivation, TCM and MLC alternatives), (ii) the three main technical contributions of the paper, (iii) its direct influence on DVB-S2 (2003), LTE (2008), and 5G NR (2018).

Show Hint

Consult the historical notes of §1 and §5 and the CommIT contribution block in §1.

Solution

Sample essay

Before Caire-Taricco-Biglieri 1998, coded modulation for wireless channels was dominated by two paradigms: Ungerboeck TCM (optimised for AWGN, monolithic per-channel) and Imai-Hirasawa MLC (multi-code, set-partitioning-based). Both paradigms tightly coupled code design to constellation design, which made rate adaptation, and MCS tables spanning several modulations, prohibitively complex. Zehavi's 1992 paper had shown that a simple combination of bit interleaver, binary code and 8-PSK could outperform TCM on Rayleigh fading, but without any theoretical justification for how well it could do or why.

Caire-Taricco-Biglieri supplied that theory. Their paper's three contributions, namely (i) formalising BICM as $L$ parallel independent binary channels with capacity $\sum_\ell I(Y; B_\ell)$, (ii) showing that Gray labelling brings the BICM capacity within a small fraction of a bit of the CM capacity on square QAM, and (iii) deriving the diversity-order analysis on fading channels, together turned BICM from a plausible heuristic into a capacity-theoretic design rule. The paper's single most influential piece of design advice, "use a powerful binary code, a bit interleaver, and Gray labelling", is the architectural basis of every modern wireless MCS.

The direct industrial consequences were: DVB-S2 (2003) adopted LDPC + BICM as its coded-modulation architecture with quasi-Gray APSK, replacing TCM/convolutional/BICM-hybrid designs considered in the standardisation; LTE (2008) used turbo codes + BICM + Gray QAM; 5G NR (2018) switched back to LDPC + BICM + Gray QAM, with two LDPC base graphs (BG1 and BG2) selected by code-block size and code rate. Every one of these standards uses exactly the architecture Caire-Taricco-Biglieri 1998 justified theoretically. In the history of wireless standards, few academic papers have had so direct an impact, and this one is the flagship CommIT contribution of the entire Ferkans CM book.

Exercises

ex-ch05-01

Label the grid

I-direction neighbours

Q-direction neighbours

Total

ex-ch05-02

The formula

Upper bound under product-form decoding

Achievability under exact marginal demapping

ex-ch05-03

Chain rule

Subtract

Rewrite each term

Equality condition

ex-ch05-04

Bit 0 (MSB of Gray code)

Bit 1 (LSB of Gray code)

Remark

ex-ch05-05

SP transitions for bit 1

Why SP is worse here

The geometric intuition

ex-ch05-06

At $\text{SNR} = 10$ dB

Find the max gap

Implication

ex-ch05-07

Spectral efficiency

BICM Shannon limit

Implication

ex-ch05-08

Per-bit crossover probability

Bit-channel entropy

Sum

Comparison to CM

ex-ch05-09

GMI formula under product metric

Per-bit GMI

Sum

ex-ch05-10

The max-log approximation

GMI under max-log

Vanishing loss at high SNR

ex-ch05-11

Bit-0 is identical

Bit-1 differs

Sum

ex-ch05-12

Low regime: QPSK + rate 3/4

Medium regime: 16-QAM + rate 3/4

High regime: 64-QAM + rate 3/4

Design note

ex-ch05-13

Prior independence

Chain-rule identity for two decompositions of $I(Y; B_\ell, B_{<\ell})$

Convert the right-hand side to a conditional mutual information

ex-ch05-14

Benefit 1: single-code rate adaptation

Benefit 2: no error propagation

Benefit 3: labelling and code decouple

ex-ch05-15

Shannon limit

Implementation margin

Decision at $\text{SNR} = 12$ dB

Remark

ex-ch05-16

Anti-Gray labelling construction

Numerical comparison at $\text{SNR} = -3$ dB

Interpretation

Practical irrelevance

ex-ch05-17

Without interleaving, the bit channel becomes correlated

Capacity under memory

Diversity on fading

Punchline

ex-ch05-18

8-PSK CM capacity at 6 dB

8-PSK Gray-BICM capacity at 6 dB

Gap