Exercises
ex-ch03-01
Easy: Write out the partition chain for 16-QAM under the Ungerboeck rule and compute the intra-subset squared distance $d_i^2$ at each level (normalise so that $E_s = 1$).
Each Ungerboeck partition step doubles the squared intra-subset distance.
For 16-QAM, $d_0^2 = 6/15 = 0.4$ at unit average energy (the nearest-neighbour spacing of a standard $4\times 4$ grid).
Level 0: full 16-QAM
At unit $E_s$ the nearest-neighbour squared distance of square $M$-QAM is $d_0^2 = 6/(M-1)$. For $M = 16$, $d_0^2 = 6/15 = 0.4$.
Doubling rule
Each Ungerboeck partition step on square QAM doubles the squared distance (checkerboard sublattice). So $d_1^2 = 0.8$, $d_2^2 = 1.6$, $d_3^2 = 3.2$.
Verification
At level 4 each coset contains a single point – no further splitting is possible. The rate-allocation story will use $d_3^2 = 3.2$, i.e. an antipodal (BPSK-like) channel at the bottom level.
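The chain can be checked numerically; a minimal sketch, assuming the standard normalisation $d_0^2 = 6/(M-1)$ for square $M$-QAM at $E_s = 1$ and the doubling rule:

```python
# Minimal sketch: intra-subset squared distances along the Ungerboeck
# partition chain of square M-QAM at unit average energy.  Assumes
# d0^2 = 6/(M-1) (standard grid normalisation) and the doubling rule.
def partition_distances(M):
    d2 = 6 / (M - 1)          # nearest-neighbour squared distance at Es = 1
    out = []
    while M > 1:              # one partition step per label bit
        out.append(d2)
        d2 *= 2
        M //= 2
    return out

print(partition_distances(16))  # [0.4, 0.8, 1.6, 3.2]
```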
ex-ch03-02
Easy: Confirm that the total MLC rate at the capacity allocation is $\sum_i R_i = I(X;Y)$. Give the one-line derivation from the chain rule.
Apply the chain rule to $I(B_0,\dots,B_{m-1}; Y)$.
Use the bijection $X = \mu(B_0,\dots,B_{m-1})$ under partition-based labelling.
Chain rule and bijection
By the chain rule, $I(B_0,\dots,B_{m-1}; Y) = \sum_i I(B_i; Y \mid B_{<i})$. Under partition-based labelling the map $\mu$ is a bijection, so $I(B_0,\dots,B_{m-1}; Y) = I(X;Y)$.
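The one-line derivation can be sanity-checked on a toy discrete channel; a sketch (the channel and its crossover probability are illustrative, not from the text):

```python
from math import log2

# Toy check of the chain rule I(X;Y) = I(B0;Y) + I(B1;Y | B0) for the
# bijection X = 2*B0 + B1, on a hypothetical discrete channel where Y = X
# with probability 0.9 and is one of the other three symbols otherwise.
pxy = {(x, y): 0.25 * (0.9 if x == y else 0.1 / 3)
       for x in range(4) for y in range(4)}

def mi(joint):
    """I(A;B) from a joint pmf {(a, b): p}."""
    pa, pb = {}, {}
    for (a, b), p in joint.items():
        pa[a] = pa.get(a, 0.0) + p
        pb[b] = pb.get(b, 0.0) + p
    return sum(p * log2(p / (pa[a] * pb[b])) for (a, b), p in joint.items() if p > 0)

I_xy = mi(pxy)

pb0y = {}
for (x, y), p in pxy.items():                       # marginalise out B1 = x & 1
    pb0y[(x >> 1, y)] = pb0y.get((x >> 1, y), 0.0) + p
I_b0 = mi(pb0y)

I_b1_cond = sum(                                    # average conditional MI over B0
    0.5 * mi({(x & 1, y): pxy[(x, y)] / 0.5
              for (x, y) in pxy if x >> 1 == b0})
    for b0 in (0, 1))

print(round(I_xy, 6), round(I_b0 + I_b1_cond, 6))   # equal by the chain rule
```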
ex-ch03-03
Medium: For 8-PSK at the stated SNR, estimate each of $C_0, C_1, C_2$ from the interactive plot in s02 and give the capacity-rule rate allocation. What is the total rate, and how far below Shannon is it?
Read the plot at the stated SNR.
Compute $\log_2(1 + \mathrm{snr})$ bits for the Shannon capacity.
Read off the plot
At the stated SNR, read the three level capacities $C_0$, $C_1$, $C_2$ off the plot. The effective per-level SNRs scale with the intra-level squared distances $d_0^2 = 2-\sqrt{2}$, $d_1^2 = 2$, $d_2^2 = 4$, so $C_0 < C_1 < C_2$.
Allocation and total
The capacity-rule allocation is $R_i = C_i$, totalling $C_0 + C_1 + C_2$ bits/symbol.
Gap to Shannon
Shannon gives $\log_2(1+\mathrm{snr})$ bits. The 8-PSK MLC/MSD capacity falls only slightly below this at the stated SNR – the modulation-capacity loss of 8-PSK is very small there, confirming that MLC has essentially reached the Shannon limit.
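The plot readings can be cross-checked in code; a Monte Carlo sketch, assuming the Ungerboeck index labelling $k = b_0 + 2b_1 + 4b_2$ on $s_k = e^{j2\pi k/8}$, unit $E_s$, and $\mathrm{snr} = E_s/N_0$ for complex AWGN:

```python
import cmath, math, random

# Monte Carlo estimate of the per-level binary capacities
# C_i = I(B_i; Y | B_{<i}) for 8-PSK under the partition labelling
# k = b0 + 2*b1 + 4*b2 on points s_k = exp(j*2*pi*k/8).  Accuracy is
# roughly +/- 0.01 bit at the default sample size.
random.seed(1)
PTS = [cmath.exp(2j * math.pi * k / 8) for k in range(8)]

def level_capacities(snr_db, n=60000):
    sigma = math.sqrt(1 / (2 * 10 ** (snr_db / 10)))   # per-dimension noise std
    caps = [0.0, 0.0, 0.0]
    for _ in range(n):
        k = random.randrange(8)
        y = PTS[k] + complex(random.gauss(0, sigma), random.gauss(0, sigma))
        lik = [math.exp(-abs(y - p) ** 2 / (2 * sigma ** 2)) for p in PTS]
        for i in range(3):
            mask = (1 << i) - 1                        # bits below level i are known
            sub = [j for j in range(8) if j & mask == k & mask]
            num = sum(lik[j] for j in sub if (j >> i) & 1 == (k >> i) & 1)
            caps[i] += math.log2(2 * num / sum(lik[j] for j in sub))
    return [c / n for c in caps]

c = level_capacities(10.0)
print([round(x, 2) for x in c], round(sum(c), 2))      # C0 <= C1 <= C2, sum ~ I(X;Y)
```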
ex-ch03-04
Medium: Show that $C_{\mathrm{CM}} - C_{\mathrm{BICM}} = \sum_i I(B_i; B_{<i} \mid Y)$, where $B_{<i} = (B_0,\dots,B_{i-1})$. Interpret the term on the right.
Use the definitions $C_{\mathrm{CM}} = I(X;Y)$ and $C_{\mathrm{BICM}} = \sum_i I(B_i;Y)$.
Apply the chain rule to $I(B_i; Y, B_{<i})$ in both orders.
Use the independence of the a priori label bits: $I(B_i; B_{<i}) = 0$.
Decompose each term
The chain rule gives $I(B_i; Y, B_{<i}) = I(B_i;Y) + I(B_i; B_{<i} \mid Y)$. Also $I(B_i; Y, B_{<i}) = I(B_i; B_{<i}) + I(B_i; Y \mid B_{<i})$. Equating and using a priori independence $I(B_i; B_{<i}) = 0$, we get $I(B_i; Y \mid B_{<i}) - I(B_i; Y) = I(B_i; B_{<i} \mid Y)$.
Sum over levels
Summing from $i = 0$ to $m-1$ (the $i = 0$ term has empty $B_{<i}$ and both sides agree) gives $I(X;Y) - \sum_i I(B_i;Y) = \sum_i I(B_i; B_{<i} \mid Y)$.
Interpretation
The gap is the total information that previous label bits carry about the current label bit given the channel output – that is, the a posteriori correlation between label bits that BICM throws away by treating each bit position as independent. Gray labelling makes this correlation small; partition-based labelling makes it large (which is why MLC/MSD exploits it).
ex-ch03-05
Medium: In MSD with partition-based labelling of 8-PSK, assume the stage-0 code operates with a nonzero residual BER (pessimistic case). Using the error-propagation bound of Thm thm-msd-error-propagation, give an upper bound on the stage-1 BER in terms of the genie BER and the stage-0 BER. At what stage-0 BER does the propagation term begin to dominate the genie term?
Apply the bound of Thm thm-msd-error-propagation directly.
Set the two terms equal and solve for the stage-0 BER.
Apply the bound
At the stated pessimistic stage-0 BER, the propagation term of the bound dominates the genie term by two orders of magnitude.
Find the crossover
The propagation term dominates once the stage-0 BER exceeds the crossover value obtained by equating the two terms. So as long as the stage-0 decoder achieves a BER below that crossover, propagation is negligible at this hypothetical operating point. In practice, at the capacity-rule operating point the stage-0 BER is driven far lower by the outer code, so propagation is essentially absent.
ex-ch03-06
Easy: A designer builds an MLC system for 16-QAM at a stated target aggregate rate in bits/symbol. From the plots in s02, read off approximately how this rate should be split across the four levels when operating at the capacity threshold.
Compute the four $C_i$ values at the SNR where their sum equals the target rate.
Match each level rate $R_i$ to its corresponding $C_i$.
Find the target SNR
From the plot, find the SNR at which the four-level sum $\sum_i C_i$ first reaches the target rate.
Read the per-level capacities
At that SNR, read the four binary sub-channel capacities $C_0, C_1, C_2, C_3$ off the plot; their sum matches the target rate by construction.
Allocation
Set $R_i = C_i$ at that SNR. The level-0 binary code has the lowest rate; the level-3 code is essentially uncoded. This asymmetric split is characteristic of the capacity rule for QAM.
ex-ch03-07
Hard: Suppose $B_0$ and $B_1$ are independent Bernoulli(1/2) and $X = \mu(B_0, B_1)$ is a 4-PAM point under a specific (non-Gray) permutation of labels. Compute $I(B_0;Y)$, $I(B_1;Y)$, and $I(B_1;Y \mid B_0)$ numerically for a given SNR and compare the sums $I(B_0;Y) + I(B_1;Y)$ (BICM) and $I(B_0;Y) + I(B_1;Y \mid B_0)$ (MLC).
Compute each mutual information numerically by integrating over $y$.
The four points are at the 4-PAM amplitudes $\{\pm 1, \pm 3\}$ (up to normalisation); identify the mapping from $(B_0, B_1)$.
Use $I(B_i;Y) = \mathbb{E}\!\left[\log_2 \frac{p(Y \mid B_i)}{p(Y)}\right]$ and similar for the conditional.
Identify the mapping
Write out the map $\mu$ explicitly and list the four label–point pairs. The result is a non-Gray labelling: some adjacent amplitudes differ in both label bits.
Numerical computation (sketch)
At the given SNR, numerical integration yields values for $I(X;Y)$, $I(B_0;Y)$, $I(B_1;Y)$, and $I(B_1;Y \mid B_0)$. (Numerical values depend on the exact integration scheme; the qualitative ordering is robust.)
Comparison
The BICM sum $I(B_0;Y) + I(B_1;Y)$ falls short of the MLC/MSD sum $I(B_0;Y) + I(B_1;Y \mid B_0)$, which equals $I(X;Y)$. The gap is substantial because the labelling is not Gray; replacing it with Gray labelling would reduce the gap considerably.
ex-ch03-08
Easy: State and briefly justify the claim that under Gray labelling of 16-QAM the quantity $\sum_i I(B_i; B_{<i} \mid Y)$ is small at high SNR.
At high SNR, conditioning on Y almost determines X.
Under Gray labelling, adjacent constellation points differ in a single bit.
High-SNR limit of $I(B_i; B_{<i} \mid Y)$
At high SNR the channel output almost uniquely determines the transmitted point $X$, hence the label vector $(B_0,\dots,B_{m-1})$. When the label is essentially determined by $Y$, the conditional mutual information $I(B_i; B_{<i} \mid Y)$ tends to zero because both arguments become deterministic functions of $Y$.
Role of Gray labelling
Gray labelling ensures that the dominant confusions at moderate SNR (between neighbouring constellation points) flip only one bit at a time. So conditioning on the correctly-decoded other bits does not significantly change the posterior of the remaining bit: the correlation is already weak even at moderate SNR.
Conclusion
The CM-BICM gap is small under Gray labelling, consistent with the interactive plot in s04 showing the two curves nearly coincident.
ex-ch03-09
Hard: For the Ungerboeck partition chain of 8-PSK, suppose we use MLC but with a Gray labelling map instead of partition-based. Does the capacity rule still give the same total capacity? Does MSD still achieve it?
Is the map $\mu$ still a bijection?
Does MSD at level $i$ see a binary channel of capacity $I(B_i; Y \mid B_{<i})$ under Gray?
Chain-rule decomposition is labelling-agnostic
For any bijective labelling $\mu$, the chain rule gives $\sum_i I(B_i; Y \mid B_{<i}) = I(X;Y)$. So the capacity rule always sums to $I(X;Y)$ – the total capacity is labelling-independent.
Per-level capacities change
Under Gray labelling the per-level distribution is different – the individual $I(B_i; Y \mid B_{<i})$ values change. In particular, under Gray the unconditional values $I(B_i;Y)$ are larger, but the conditional-on-history gains are smaller (in aggregate, still summing to the same $I(X;Y)$).
MSD still works
The MSD algorithm is agnostic to the specific labelling – it computes LLRs by summing over the current coset, regardless of what bits label it. As long as $\mu$ is a bijection and the capacity-rule allocation is used, MSD achieves $I(X;Y)$.
Practical caveat
The partition-based labelling is special in that the first $i$ decoded bits pick out a specific set-partitioning coset with known distance structure, making the stage-$i$ binary channel analytically tractable. Under Gray labelling the stage-$i$ "coset" (preimage of a specific $b_{<i}$) is a more complex subset of the constellation and LLR computation is less clean. This is why partition-based labelling is the conventional choice for MLC even though the capacity rule holds for any bijection.
ex-ch03-10
Medium: A 5G NR system uses $M$-QAM with one LDPC code per MCS (no MLC), and its MCS table specifies one (modulation, LDPC-rate) pair per spectral efficiency. If the designer instead wanted to deploy MLC, how would the MCS table change? Estimate the size of the new table for the modulations QPSK, 16-QAM, 64-QAM, 256-QAM.
Count levels per modulation: $m = \log_2 M$.
Each level gets its own LDPC rate.
Level counts
QPSK: $m = 2$. 16-QAM: $m = 4$. 64-QAM: $m = 6$. 256-QAM: $m = 8$. Total across modulations: $2+4+6+8 = 20$ per-level code rates.
Codebook growth
A BICM MCS table for these four modulations has 4 code rates (one per modulation, picked to match at each SNR). An MLC MCS table would require 20 code rates – a 5× growth in the codebook, plus the associated storage and verification overhead.
System impact
Each MCS in 5G requires (i) standardised code specification (base graph + lifting + rate-matching), (ii) encoder/decoder hardware support, and (iii) conformance testing. Multiplying this effort by five is why 5G NR did not adopt MLC – the capacity gain was judged not worth the engineering cost.
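The table-size estimate is easy to script; a sketch (the modulation list mirrors the exercise, not any official MCS table):

```python
# Back-of-envelope MCS-table count: one code rate per modulation (BICM)
# vs one per (modulation, level) pair (MLC).  Level count m = log2(M).
mods = {"QPSK": 4, "16-QAM": 16, "64-QAM": 64, "256-QAM": 256}
levels = {name: M.bit_length() - 1 for name, M in mods.items()}
bicm_rates = len(mods)
mlc_rates = sum(levels.values())
print(levels)                    # {'QPSK': 2, '16-QAM': 4, '64-QAM': 6, '256-QAM': 8}
print(bicm_rates, mlc_rates)     # 4 20
```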
ex-ch03-11
Easy: Write down the complexity of MSD as a function of the number of levels $m$ and the binary-decode complexity $D$, and compare with the complexity of a joint ML decoder over all levels at a given block length $n$.
MSD is sequential; joint ML is exhaustive.
MSD complexity
MSD runs $m$ binary decoders sequentially, plus per-symbol LLR computations. Complexity: $O(mDn + n\,2^m)$. The second term is the total LLR work, linear in $n$.
Joint ML complexity
Joint ML evaluates the likelihood for every combination of the $m$ binary codewords: $\prod_i 2^{nR_i} = 2^{n \sum_i R_i}$ candidates. Exponential in $n$.
Comparison
For typical block lengths $n$ in the hundreds to thousands and aggregate rates of a few bits per symbol, joint ML requires on the order of $2^{1000}$ or more likelihood evaluations – utterly infeasible. MSD's cost, linear in $n$, is a few hundred thousand operations at typical parameters – comfortably within modern hardware budgets.
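The comparison can be made concrete with a back-of-envelope script; the cost model ($D$ operations per decoded bit, a $2^m$-term LLR sum per symbol) is an illustrative assumption, not a measured figure:

```python
# Rough operation counts for MSD vs joint ML, under an assumed cost model:
# m sequential binary decoders of per-bit cost D plus a 2^m-term LLR sum per
# symbol (MSD), versus one candidate per combination of the m binary
# codewords (joint ML).  Parameter values below are illustrative.
def msd_ops(m, n, D):
    return m * D * n + n * 2 ** m           # decoder work + LLR work, linear in n

def joint_ml_log2_candidates(m, n, rates):
    assert len(rates) == m
    return n * sum(rates)                    # log2 of prod_i 2^(n*R_i)

print(msd_ops(3, 1000, 50))                              # 158000
print(joint_ml_log2_candidates(3, 1000, [0.2, 0.7, 0.95]))
```

Even with generous constants, the MSD count stays polynomial while the joint-ML candidate count has an exponent of thousands of bits.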
ex-ch03-12
Hard: Prove that if the label bits are conditionally independent given $Y$, then $C_{\mathrm{BICM}} = C_{\mathrm{CM}}$.
Conditional independence means $p(b_0,\dots,b_{m-1} \mid y) = \prod_i p(b_i \mid y)$.
Equivalently, $I(B_i; B_{<i} \mid Y) = 0$ for every $i$.
Translate conditional independence
Conditional independence of $(B_0,\dots,B_{m-1})$ given $Y$ means $I(B_i; B_{<i} \mid Y) = 0$ for all $i$. By the chain rule, $I(X;Y) = \sum_i I(B_i; Y \mid B_{<i})$, and each term expands as $I(B_i;Y) + I(B_i; B_{<i} \mid Y) - I(B_i; B_{<i}) = I(B_i;Y)$, using a priori independence of the label bits. Hence $\sum_i I(B_i;Y) = I(X;Y)$.
Apply exercise ex-ch03-04
By the identity in ex-ch03-04, $C_{\mathrm{CM}} - C_{\mathrm{BICM}} = \sum_i I(B_i; B_{<i} \mid Y) = 0$. So $C_{\mathrm{BICM}} = C_{\mathrm{CM}}$.
Remark
Pathological channels (e.g., where $Y$ is itself a vector with independent components, one per bit) can meet this condition. For scalar AWGN channels with a single $Y$ and uniform inputs it almost never holds – the bits typically remain conditionally correlated given $Y$, so $C_{\mathrm{BICM}} < C_{\mathrm{CM}}$ strictly.
ex-ch03-13
Medium: The capacity rule sets $R_i = C_i$ for every level. What is the error exponent of MLC/MSD at the aggregate rate $C - \epsilon$ (with $\epsilon > 0$ small)?
Each stage $i$ uses a code at rate $R_i < C_i$ with block length $n$.
Binary codes at rate $R_i < C_i$ have error probability decaying as $e^{-n E_r(R_i)}$.
A union bound over stages preserves exponential decay.
Per-stage exponent
At stage $i$ with rate $R_i < C_i$ and block length $n$, the Gallager random-coding exponent bounds the stage-$i$ error probability by $P_i \le e^{-n E_r^{(i)}(R_i)}$.
Union bound
The aggregate error probability is at most $\sum_i P_i \le m\,e^{-n \min_i E_r^{(i)}(R_i)}$. The effective error exponent of MLC/MSD is $\min_i E_r^{(i)}(R_i)$.
Interpretation
The MLC/MSD error exponent is dominated by the worst (smallest-exponent) level – typically the top level, where the exponent is smallest and the binary channel is weakest. Designers protect this level with the strongest code (lowest rate, highest redundancy), which exactly matches the capacity rule's allocation.
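The union-bound step can be written out explicitly; a LaTeX sketch using the notation above ($E_r^{(i)}$ for the stage-$i$ Gallager exponent):

```latex
P_e \;\le\; \sum_{i=0}^{m-1} P_i
    \;\le\; \sum_{i=0}^{m-1} e^{-n E_r^{(i)}(R_i)}
    \;\le\; m\, e^{-n \min_i E_r^{(i)}(R_i)},
\qquad\text{so}\qquad
\liminf_{n\to\infty} -\tfrac{1}{n}\log P_e \;\ge\; \min_i E_r^{(i)}(R_i).
```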
ex-ch03-14
Challenge: Consider 8-PSK with the Ungerboeck chain and uniform inputs. Derive a closed-form lower bound on the level-0 capacity $C_0$ at a given SNR, using the binary-input AWGN channel formula with intra-level squared distance $d_0^2 = 2-\sqrt{2}$. Compare your bound numerically against the exact $C_0$ at several SNRs using the simulator.
The binary-input AWGN capacity is a lower bound on $C_0$.
Evaluate it at the effective SNR set by $d_0^2$ and $\mathrm{snr}$.
Compare with the plot output numerically.
Antipodal lower bound
Condition on the specific sub-pair of 8-PSK points selected by the other two label bits. With $B_0$ determining which of the two sub-constellations (rotated QPSKs at the minimum intra-level distance $d_0$) is used, the worst-case binary channel is antipodal with squared distance $d_0^2 = 2-\sqrt{2}$. So $C_0$ is lower-bounded by the binary-input AWGN capacity evaluated at the effective SNR set by $d_0^2$ and $\mathrm{snr}$.
Numerical comparison
Across the tested SNRs the antipodal bound sits a few tenths of a bit below the simulator's exact $C_0$ (because the level-0 sub-constellation is QPSK-like, not antipodal), and the gap persists over the range.
Remark on the bound's slackness
The antipodal bound is loose by a roughly constant number of bits across SNRs. A tighter calculation uses the fact that the sub-constellation carries two bits of uncertainty, not zero – the "pseudo-antipodal" exact integral takes this into account. The operational lesson: even crude BI-AWGN bounds stay within a fraction of a bit of the exact $C_0$ for 8-PSK.
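The bound can be evaluated numerically; a Monte Carlo sketch of the binary-input AWGN capacity, where the effective-SNR convention $s = d_0^2\,\mathrm{snr}/2$ is an assumption of this sketch (check it against the simulator's convention before comparing absolute values):

```python
import math, random

# Monte Carlo estimate of the BI-AWGN capacity C_BI = I(B;Y) for antipodal
# signalling +/- a in real Gaussian noise, with snr_eff = a^2 / sigma^2.
def bi_awgn_capacity(snr_eff, n=100000):
    random.seed(0)
    a, sigma = math.sqrt(snr_eff), 1.0
    acc = 0.0
    for _ in range(n):
        y = a + random.gauss(0, sigma)           # condition on B = +1 by symmetry
        acc += math.log2(2 / (1 + math.exp(-2 * a * y / sigma ** 2)))
    return acc / n

# Antipodal lower bound on C_0 for 8-PSK: squared distance d0^2 = 2 - sqrt(2).
d0_sq = 2 - math.sqrt(2)
for snr_db in (3, 6, 9):
    s = d0_sq * 10 ** (snr_db / 10) / 2          # effective SNR (assumed convention)
    print(snr_db, round(bi_awgn_capacity(s), 3))
```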
ex-ch03-15
Medium: A designer argues that MLC should be preferred over BICM because "$C_{\mathrm{CM}} \ge C_{\mathrm{BICM}}$ always." Give three practical counter-arguments a standards committee would accept.
Think about the codebook size, rate adaptation, and interleaver depth.
Counter-argument 1: codebook growth
MLC needs $\log_2 M$ LDPC rates per modulation; BICM needs one. For 5G NR with $m$ ranging from 2 to 8 across four modulations, MLC's codebook is roughly five times larger – more standard tables, more encoder/decoder circuitry, more conformance testing.
Counter-argument 2: rate adaptation
Adaptive modulation picks a new MCS every slot. BICM changes one rate; MLC must change all $m$ rates jointly, each computed via the capacity rule at the current SNR. The real-time rate-update logic is substantially more complex.
Counter-argument 3: interleaver depth for fading
In frequency-selective or time-selective fading, BICM's single interleaver can spread code bits across uncorrelated fading instances – a diversity gain essential to wireless. MLC's row-structured MSD decoder is incompatible with deep bit interleaving, forfeiting this diversity.
Final remark
On top of these, Gray-BICM's residual capacity gap is small (a fraction of a bit, shrinking further at high SNR), and BICM-ID (iterative demapping) recovers most of it. So the practical capacity loss is negligible, while the complexity saving is substantial.
ex-ch03-16
Medium: Using the interactive MSD error-propagation plot, determine the largest prior-stage BER for which the effective stage-1 BER is no more than twice the genie BER. Assume 8-PSK.
The effective stage-1 BER is roughly the genie BER plus a propagation term driven by the prior-stage BER.
Solve for the critical ratio.
Set up the critical ratio
The "twice genie" condition requires the propagation term of the bound to be no larger than the genie term. Setting the two terms equal and solving gives the critical prior-stage BER.
Estimate the ratio for 8-PSK at moderate SNR
For 8-PSK level 1 at the capacity threshold, the squared-distance ratio between the correct and wrong coset translates into an effective SNR loss of a few dB, and the wrong-coset BER at the operating point can be read off the plot. The critical prior-stage BER then follows from the twice-genie condition.
Design implication
As long as the stage-0 outer code achieves a BER below the critical value, propagation at stage 1 is benign. This is easily achieved by the low-rate LDPC code the capacity rule assigns to level 0.
ex-ch03-17
Easy: Name three niches where MLC still beats BICM and explain briefly why.
Think about constellations where Gray labelling doesn't exist, and contexts where partition structure is natural.
APSK constellations
Amplitude–phase shift keying (used in DVB-S2X) has a ring structure that prevents a consistent Gray labelling across rings. MLC with partition-based labelling recovers the capacity that BICM leaves on the table.
Lattice coded modulation
Lattices come equipped with a natural partition chain (the sub-lattice sequence). MLC operates at each level on the coset-index binary channel, and the capacity rule here is the fundamental rate-allocation theorem for lattice-coded modulation (Ch. 4).
High-dimensional constellations
For high-dimensional constellations (e.g., lattice points, Leech-lattice slices), Gray labelling becomes combinatorially awkward while the partition structure remains clean. MLC is the natural framework.
Remark on the QAM workhorse
Outside these niches – for 2D QAM on scalar AWGN – Gray-BICM essentially matches MLC and wins on complexity.
ex-ch03-18
Challenge: Derive the MLC capacity rule from the converse direction: assume an MLC scheme with rates $R_0,\dots,R_{m-1}$ is decodable by MSD with vanishing error probability. Show that $R_i \le I(B_i; Y \mid B_{<i})$ for every $i$.
At stage $i$, the decoder sees a binary channel of capacity $I(B_i; Y \mid B_{<i})$ (conditional on correct history).
Apply the converse to the binary channel coding theorem at each stage.
Handle the conditioning by noting that history errors vanish under the assumed decodability.
Setup: conditional decoding at stage $i$
By assumption, the aggregate error probability tends to zero as $n \to \infty$. Let $E_i$ be the stage-$i$ error event. Then $\Pr(E_i) \le \Pr(\text{any stage errs}) \to 0$, so each stage's conditional error probability (conditional on correct previous stages) also vanishes.
Fano's inequality at each stage
At stage $i$ with conditional error probability $P_i$, Fano's inequality gives $H(B_i^n \mid Y^n, B_{<i}^n) \le 1 + nR_i P_i$, which tends to zero per symbol as $P_i \to 0$. Combined with $H(B_i^n) = nR_i$ and single-letterisation of the memoryless channel, $I(B_i^n; Y^n \mid B_{<i}^n) \le n\,I(B_i; Y \mid B_{<i})$, this yields $nR_i \le n\,I(B_i; Y \mid B_{<i}) + 1 + nR_i P_i$.
Let $n \to \infty$
Dividing by $n$ and letting $n \to \infty$ with $P_i \to 0$, $R_i \le I(B_i; Y \mid B_{<i})$. This holds for every $i$.
Combined with achievability
Combined with the achievability argument in Thm thm-msd-capacity-achieving, the optimal rate allocation is $R_i = I(B_i; Y \mid B_{<i})$, with total $I(X;Y)$.
ex-ch03-19
Medium: The text claims that at low SNR, 8-PSK's $C_0$ is a small fraction of a bit while $C_2$ rises toward $1$ bit almost immediately. Use the low-SNR linearisation of the binary-input AWGN capacity, $C_{\mathrm{BI}}(s) \approx s/(2\ln 2)$ bits for small $s$, to quantify this. Estimate $C_0$ and $C_2$ at $-2$ dB.
Level 0 has squared distance $d_0^2 = 2-\sqrt{2}$; level 2 has $d_2^2 = 4$.
Compute the effective binary SNR for each level.
Per-level SNR at $\ntn{snr} = -2$ dB
$\mathrm{snr} = 10^{-0.2} \approx 0.631$. Per-level: the effective binary SNRs scale as $d_i^2\,\mathrm{snr}$, so level 2 sees roughly $6.8\times$ the effective SNR of level 0.
Capacity estimates
The low-SNR approximation gives $C_0$ as a small fraction of a bit. For level 2 the effective SNR exceeds the expansion's range; the exact binary-input AWGN formula is needed there and gives a value approaching $1$ bit.
Commentary
At $-2$ dB 8-PSK is deep in the power-limited regime: level 0 is essentially useless (a small fraction of a bit), level 1 intermediate, level 2 only partial. The total is far below the $3$ bits/symbol maximum. 8-PSK is a bandwidth-limited modulation operated at power-limited SNR, which is the worst of both worlds; in practice, one would drop to QPSK at this SNR, as 5G NR does.
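The linearisation is a one-liner to check; in the sketch below the per-level effective SNR $s_i = d_i^2\,\mathrm{snr}/2$ is an assumed convention, so treat the absolute numbers accordingly:

```python
import math

# Low-SNR linearisation C_BI(s) ~ s / (2 ln 2) bits, valid only for s << 1.
def c_lowsnr(s):
    return s / (2 * math.log(2))

snr = 10 ** (-2 / 10)                 # -2 dB
d_sq = {0: 2 - math.sqrt(2), 2: 4.0}  # 8-PSK intra-level squared distances
for lvl, d2 in d_sq.items():
    s = d2 * snr / 2                  # effective binary SNR (assumed convention)
    print(lvl, round(s, 3), round(c_lowsnr(s), 3))
```

For level 2 the computed $s$ exceeds $1$, so the linear value is out of range – the exact BI-AWGN integral is needed there, as the solution notes.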
ex-ch03-20
Hard: Sketch the capacity-rule curves $C_i(\mathrm{snr})$ and their sum for 8-PSK over the plotted SNR range, and identify the SNR at which the sum reaches the stated target in bits/symbol. Compare with the SNR at which uncoded 8-PSK achieves a reference error rate.
Use the interactive plot binary_partition_capacity.
Uncoded 8-PSK at the reference error rate requires a substantially higher SNR; obtain it from the standard symbol-error-rate formula.
Read from the plot
The target rate is reached at an SNR readable from the plot; the corresponding allocation $R_i = C_i$ can be read off at the same point.
Uncoded 8-PSK benchmark
Uncoded 8-PSK at the reference error rate needs a substantially higher SNR (since $d_{\min}^2 = 2-\sqrt{2}$ for 8-PSK at $E_s = 1$).
Coding gain to capacity
At a comparable spectral efficiency, the MLC/MSD operating point sits several dB below uncoded 8-PSK (which carries the slightly higher spectral efficiency of $3$ bits). This is a very large coding gain – comparable to what TCM (Ch. 2) offered, at much lower complexity.
ex-ch03-21
Medium: Compare the asymptotic ($\mathrm{snr} \to \infty$) behaviour of $C_{\mathrm{CM}}$ and $C_{\mathrm{BICM}}$ for 16-QAM. Does the gap vanish, stay constant, or grow?
At high SNR both capacities approach $\log_2 M = 4$ bits.
Look at the rate of approach.
Both approach $\log_2 M$
As $\mathrm{snr} \to \infty$, both $C_{\mathrm{CM}}$ and $C_{\mathrm{BICM}}$ tend to $\log_2 M = 4$ bits/symbol. The constellation is fully resolvable.
Rate of approach
At high SNR both curves saturate exponentially fast: $\log_2 M - C = O(e^{-c\,\mathrm{snr}})$ for some constant $c$ depending on the minimum distance. The CM–BICM gap at fixed SNR is dominated by the second-order terms in this expansion, which also decay exponentially – so the gap vanishes.
Empirical check
From the plot, at high SNR the two curves are visually indistinguishable and the gap falls below plotting resolution. At low SNR (near threshold) the gap is largest, consistent with the plot's peak of a few tenths of a bit at moderate SNRs.
ex-ch03-22
Challenge: Suppose a system uses MLC with ideal capacity-achieving codes at every level, and a BICM competitor uses an ideal capacity-achieving code at the BICM rate. For 16-QAM at $10$ dB, compute the operational SNR gap between the two schemes (i.e., the dB difference in required SNR to achieve the same spectral efficiency $\eta$). Is this gap worth the complexity difference?
Use the interactive plot to find the SNR at which $C_{\mathrm{BICM}}$ equals the 16-QAM CM capacity at 10 dB.
Compare with $10$ dB.
MLC operating point
At $10$ dB, read $\eta = C_{\mathrm{CM}}$ off the plot. With ideal codes MLC operates exactly at this point.
BICM operating point for the same $\eta$
BICM at the same $\eta$ needs the SNR at which $C_{\mathrm{BICM}}(\mathrm{snr}) = \eta$. From the plot, this sits only a small fraction of a dB above $10$ dB.
Interpretation
The BICM-vs-MLC operational gap is a small fraction of a dB for 16-QAM at this spectral efficiency. This is smaller than a typical link-budget margin (a few dB) and far smaller than the gap to Shannon from the modulation loss alone. The capacity-rule gain of MLC is simply not worth the codebook growth – which is why 5G NR is BICM.