Exercises
ex-ch05-01
EasyFor 16-QAM with Gray labelling (two PAM Gray codes, one per axis), verify that every pair of nearest-neighbour constellation points (at distance ) differs in exactly one label bit. How many nearest-neighbour pairs are there total?
Draw the 4x4 grid and label each point with a 4-bit label using the 4-PAM Gray code .
List the -direction neighbours and the -direction neighbours.
There are horizontal nearest-neighbour pairs and vertical ones.
Label the grid
The four 4-PAM Gray codes are . A 16-QAM point at has 4-bit label , concatenation giving .
I-direction neighbours
Horizontal neighbours differ only in their 2-bit sublabel. The PAM Gray transitions are , , β each differing in exactly one of . The sublabel is the same on both ends. So each pair differs in exactly one bit. With 4 rows and 3 horizontal pairs per row, there are horizontal pairs.
Q-direction neighbours
By symmetry, vertical pairs, each also at Hamming distance .
Total
nearest-neighbour pairs, all at Hamming distance exactly one. This is the Gray property.
ex-ch05-02
EasyState the BICM capacity formula and explain, in one sentence each, why the formula (a) is an upper bound on achievable rate under any product-form demapping metric, and (b) is achieved under exact marginal demapping.
Use the result of Thm. TThe BICM Capacity Decomposition.
The formula
.
Upper bound under product-form decoding
Any product-form decoding metric is mismatched relative to the true joint symbol likelihood, and the generalised mutual information (GMI) under such a metric equals the sum of marginals β a standard result from Merhav-Kaplan-Lapidoth- Shamai on mismatched decoding.
Achievability under exact marginal demapping
With exact marginal likelihoods , the decoder operates on a memoryless binary mixture channel with capacity ; Shannon's theorem delivers reliable coded-bit rate up to this capacity, and coded-bit rate is the information rate.
ex-ch05-03
MediumProve that for any labelling and any channel with uniform inputs, with equality if and only if the label bits are jointly independent given .
Apply the chain rule to .
Express the difference as a sum of conditional mutual informations.
Use the a priori independence of the label bits () to rewrite each difference term.
Chain rule
.
Subtract
Difference . (The term cancels.)
Rewrite each term
Using the identity and the prior-independence , we get .
Equality condition
Each term iff and are conditionally independent given ; all terms vanish iff are jointly conditionally independent given .
ex-ch05-04
MediumCompute the BICM bit-channel transition for 4-PAM with Gray labelling under AWGN with noise variance . Do this explicitly for the first bit and the second bit .
For , and .
For , and .
The Gaussian density is .
Bit 0 (MSB of Gray code)
(labels ) and (labels ). Therefore Each half corresponds to "two points on one side of zero." This is an approximately symmetric binary-input channel with the decision boundary at .
Bit 1 (LSB of Gray code)
(labels ) and (labels ). Therefore Each half is a two-component Gaussian mixture. The decision boundaries are at .
Remark
The first bit is the easier bit: points that carry are all on one side of , so the posterior resembles a binary-input antipodal channel. The second bit is harder: points carrying (the outer pair) and (the inner pair) are interleaved, and the channel has two decision regions. This mirrors the numerical table of Thm. TThe BICM Capacity Decomposition.
ex-ch05-05
MediumFor 4-PAM under SP labelling, define and (same as Gray), but (the coarser coset) and (the finer coset). Write the bit-channel transitions and argue why the channel has lower unconditional capacity than its Gray counterpart.
Compare which two points are averaged together for bit in each case.
Gray groups outer pair / inner pair; SP groups / β points at distance .
SP transitions for bit 1
and .
Why SP is worse here
Under SP, each -subset consists of two points at distance β both mixtures are bimodal with widely separated modes. The two conditional distributions are in fact identical up to a shift by 4, so the total variation between and is small, and the mutual information is significantly less than .
The geometric intuition
SP labelling encodes bit in a long-range structure (points separated by ), which makes each unconditional bit-channel distribution bimodal and hard to distinguish. Gray encodes bit in short-range structure (inner pair vs outer pair), giving a cleaner unconditional binary channel. The MLC decoder (which conditions on ) turns the SP bimodal channel into a unimodal one and reaps the hierarchy benefit β but BICM, being unconditional, pays the bimodal price.
ex-ch05-06
MediumUse the interactive plot πCM vs BICM Capacity for QAM on AWGN or direct numerical integration to verify that for 16-QAM at dB, bits. Then find the SNR at which the gap is maximised and report its value.
The maximum gap typically occurs in the mid-SNR 'waterfall' region, not at very low or very high SNR.
For 16-QAM, the maximum gap is near dB and is bits.
At $\ntn{snr} = 10$ dB
The simulation reports and bits/symbol, giving a gap of bits, well below the 0.05-bit ceiling.
Find the max gap
Sweeping the plot from to dB, the gap peaks at approximately dB with a value of bits/symbol. At higher SNR the gap decays exponentially (Thm. TGray Labelling Near-Optimality on AWGN); at lower SNR both capacities go to zero and the gap shrinks.
Implication
The uniform bound in Thm. TGray Labelling Near-Optimality on AWGN(b) ( bits for ) is essentially tight for 16-QAM; the achievable numerical gap is within a hair of the theoretical upper bound and is negligible at the operating point of any real 16-QAM system.
ex-ch05-07
MediumA BICM system uses a rate- LDPC code driving 256-QAM () on AWGN. What is the spectral efficiency ? Using the interactive plot πCM vs BICM Capacity for QAM on AWGN, estimate the SNR at which reliable communication is possible (Shannon limit of BICM-Gray-256-QAM at this rate).
.
Find the SNR where crosses .
Spectral efficiency
bits/symbol.
BICM Shannon limit
From the plot, for the BICM-Gray curve reaches bits/symbol at approximately dB. (Compare the Shannon limit dB β the gap is the 256-QAM modulation-capacity penalty of dB.)
Implication
A capacity-approaching LDPC on BICM-Gray-256-QAM cannot achieve vanishing error below dB of SNR. Adding the standard finite-length penalty of β dB puts a typical rate- 256-QAM waterfall at β dB for BER .
ex-ch05-08
HardDerive the high-SNR asymptotic behaviour of for square 16-QAM with Gray labelling: as . (The constant inside the depends on the average number of nearest neighbours.)
At high SNR, the bit-channel capacity bit; the correction is the binary-channel entropy at crossover probability .
for unit-energy 16-QAM.
Binary entropy for small .
Per-bit crossover probability
Under Gray labelling, the dominant error event for each bit is a single nearest-neighbour flip, occurring with probability (using for unit-energy 16-QAM and per real dim).
Bit-channel entropy
Each bit channel has conditional entropy , so . For small , the leading correction is as .
Sum
. The exponential decay of makes the gap to the full capacity vanish super-exponentially in .
Comparison to CM
The CM capacity behaves exactly the same way at high SNR (same nearest-neighbour analysis, just on the -ary symbol channel), so the gap also decays as β the exponential-convergence result of Thm. TGray Labelling Near-Optimality on AWGN(a).
ex-ch05-09
HardMismatched-decoding GMI for BICM. Show that the generalised mutual information (GMI) under the product bit metric with equals the BICM capacity . (Sketch the argument; the full proof is in Ch. 7.)
The GMI formula is .
For a product metric, both numerator and denominator factorise across .
The supremum over is achieved at when the metric matches the true marginal.
GMI formula under product metric
. Under i.i.d.\ uniform labels, , and . The GMI factorises.
Per-bit GMI
For each , with being the exact marginal likelihood, the per-bit GMI reduces to at . (The supremum over is achieved at because the metric is matched β the per-bit channel law.)
Sum
. This matches the BICM capacity formula and confirms that the BICM capacity is the correct operational quantity under product-metric decoding.
ex-ch05-10
MediumSuppose the BICM demapper uses the max-log approximation. Argue that the resulting generalised mutual information is strictly below for any noise level . What asymptotic regime makes the loss vanish?
Max-log replaces with .
The error in this approximation is largest when multiple are comparable (i.e., moderate SNR).
Asymptotically, the nearest-neighbour term dominates the sum β this is the high-SNR regime.
The max-log approximation
Max-log: . This is exact only when one is much smaller than the others; otherwise, the true log-sum is strictly greater by a positive offset.
GMI under max-log
The max-log metric is strictly mismatched (not the true marginal likelihood) for . The GMI formula is then suprema over in general, and yields a value strictly less than .
Vanishing loss at high SNR
As , the nearest-neighbour dominates the likelihood, so the max-log approximation becomes exact. In this regime . The loss is therefore a low-to-moderate-SNR effect; at the high SNRs where modern MCS schemes actually operate, max-log is essentially lossless.
ex-ch05-11
MediumWhy SP loses in BICM. Compute (numerically or via direct reasoning) the unconditional bit-0 capacity for 4-PAM at dB under both Gray and SP labellings. Explain why they are equal for but differ for .
For 4-PAM, the Gray and SP labellings assign the same subsets ; they only differ at .
Bit 0 in both labellings partitions the 4-PAM as vs .
Bit-0 is identical
For 4-PAM, both Gray () and SP ( β or whatever SP tree is used) produce the same top- level partition vs . Hence β around bits at dB.
Bit-1 differs
The bit-1 partitioning differs: Gray gives vs (inner/outer), while SP gives vs (cosets of distance ). The SP unconditional bit channel is symmetric (each -set has points at distances and , a distance-4 pair), which makes the marginal posterior flatter in than . Numerically, while .
Sum
bits vs bits. The SP capacity is bits lower. This is exactly the MLC-style hierarchy that SP creates β information is concentrated in bit under SP, but bit is useless without conditioning.
ex-ch05-12
MediumA wireless system uses adaptive modulation with BICM and a single LDPC code at rate . The channel quality indicator reports three regimes: low, medium, high. Design an MCS table with three entries using modulations 4-QAM, 16-QAM, 64-QAM. For each MCS, report the Shannon-limit SNR (BICM-Gray capacity) at which reliable communication is possible.
Low regime: 4-QAM at rate gives .
Medium regime: 16-QAM at rate gives .
High regime: 64-QAM at rate gives .
Use the interactive plot to find the SNR threshold for each.
Low regime: QPSK + rate 3/4
bits/symbol. BICM-Gray-4-QAM reaches bits at dB (the QPSK channel is essentially BI-AWGN per component, so the BICM gap to CM is trivial).
Medium regime: 16-QAM + rate 3/4
bits/symbol. BICM-Gray-16-QAM reaches bits at dB.
High regime: 64-QAM + rate 3/4
bits/symbol. BICM-Gray-64-QAM reaches bits at dB.
Design note
The three MCS thresholds are approximately dB / dB / dB. A scheduler should switch from QPSK to 16-QAM around dB and from 16-QAM to 64-QAM around dB, with appropriate implementation margin (1β1.5 dB above each Shannon limit). The same LDPC code at rate drives all three β that is the engineering point of BICM.
ex-ch05-13
HardProve the prior-independence step in the gap formula. Show carefully that when the label bits are i.i.d.\ uniform. Then conclude that .
and are independent a priori by construction.
Use the identity β no, this is not quite right; use a cleaner identity.
The key identity is .
Prior independence
By construction of the BICM encoder (uniformly distributed coded bits + ideal interleaver), the prior on is i.i.d.\ uniform on . Hence any subset of label bits is mutually independent, so .
Chain-rule identity for two decompositions of $\ntn{mi}(Y; B_\ell, B_{<\ell})$
(decompose the joint via chain rule with first). Equally, (decompose with first). Equating gives
Convert the right-hand side to a conditional mutual information
Using the symmetric form with , , : . Using (prior independence),
ex-ch05-14
EasyList three concrete engineering benefits of BICM over MLC that motivated every modern standard's adoption of BICM. (A one- sentence answer per benefit is sufficient.)
See the comparison table of TCM / MLC / BICM β A Structural Side-by-Side and the engineering notes of Β§5.
Benefit 1: single-code rate adaptation
A single binary code + rate matching drives every modulation in the MCS table; MLC would require codes per modulation and a dedicated rate-allocation step for each. This reduces encoder/decoder hardware complexity by a factor of .
Benefit 2: no error propagation
The BICM decoder is a single run of a binary decoder over the full LLR stream. MLC/MSD is sequential: an error at stage 0 propagates to stage 1 and causes catastrophic error-floor behaviour requiring iterative workarounds.
Benefit 3: labelling and code decouple
Changing the labelling (e.g., from Gray to quasi-Gray for APSK) affects only the demapper's look-up; the LDPC decoder and the encoder are unchanged. In MLC, changing the partitioning changes every per-level code's design.
ex-ch05-15
MediumIn 5G NR, an MCS index maps to -QAM, code rate ). The spectral efficiency is bits/symbol. Using the BICM capacity plot, estimate the minimum SNR for reliable reception. Then add a typical implementation margin and discuss whether this MCS makes sense at post-equalisation dB.
Find the SNR at which BICM-Gray-64-QAM reaches 3.6 bits/symbol.
Typical implementation margin: 1-1.5 dB for a 5G NR LDPC at codeword length .
Shannon limit
From the plot, BICM-Gray-64-QAM reaches bits at approximately dB.
Implementation margin
Adding dB for a realistic LDPC: dB is the effective waterfall for BER (the NR BLER target for initial transmission).
Decision at $\ntn{snr} = 12$ dB
We are at dB β dB below the effective waterfall. HARQ retransmissions will almost certainly be needed. A more prudent MCS at this SNR is (rate , 64-QAM), which has its waterfall at dB and gives a margin of dB.
Remark
In 5G NR, the outer-loop link adaptation algorithm continually adjusts MCS based on the moving-average block error rate; over time it will back off from to at this SNR.
ex-ch05-16
HardAnti-Gray at low SNR. For 16-QAM, construct an "anti-Gray" labelling that maximises the average Hamming distance between nearest neighbours. Compare numerically the BICM capacities of Gray and anti-Gray at dB. Confirm that anti-Gray slightly exceeds Gray in this regime (the advantage should be bits).
Anti-Gray labelling for a 16-QAM square grid: pair each point with a partner at maximum Hamming distance (= 4 bits all different).
One construction: take the standard Gray labelling and XOR each label with its row-and-column parity to scramble neighbourhoods.
At dB, the mutual informations are below bit/symbol; the anti-Gray BICM capacity is - bits above Gray.
Anti-Gray labelling construction
One valid anti-Gray 16-QAM labelling maps to point where (rows become Gray-flipped ) and (columns become Gray-flipped ). Neighbours now differ in 2 bits instead of 1.
Numerical comparison at $\ntn{snr} = -3$ dB
Gray: bits/symbol. Anti-Gray: bits/symbol. Anti-Gray is ahead by bits.
Interpretation
At very low SNR the per-bit channels are nearly useless in isolation (conditional entropy near 1); spreading bit-related information across more label transitions (anti-Gray) extracts slightly more unconditional mutual information than concentrating it on single-bit transitions (Gray). The advantage is small and reverses at higher SNR where Gray's one-bit-flip structure becomes more valuable.
Practical irrelevance
At dB, operating a BICM system at bits means using 16-QAM at rate , which is far below anything a practical system would consider. The anti-Gray advantage exists but is operationally irrelevant.
ex-ch05-17
MediumExplain why the bit interleaver is essential to the BICM framework. What would go wrong in the capacity analysis of Β§5.3 if the interleaver were removed (i.e., each -tuple of consecutive coded bits was mapped directly to a constellation symbol)?
Without interleaving, consecutive coded bits share the same symbol's noise realisation.
The ideal-interleaver theorem (Thm. Memoryless Parallel Bit Channels" data-ref-type="theorem">TIdeal Interleaver Memoryless Parallel Bit Channels) breaks down.
The decoder's scalar binary-channel model fails; memory effects appear.
Without interleaving, the bit channel becomes correlated
If consecutive coded bits share the same symbol, then
their LLRs are deterministic functions of the same β
highly correlated. The binary decoder, which assumes i.i.d.
bit-channel outputs, encounters a memory-full binary
channel.
Capacity under memory
A binary decoder that treats a memory-full channel as memoryless operates at a smaller generalised mutual information. For small this loss is significant; it is exactly what the interleaver recovers.
Diversity on fading
On a fading channel, the interleaver is even more critical: without it, a symbol fade takes down all consecutive coded bits at once, destroying the diversity order of the binary code. The interleaver spreads the fade across different codeword positions, restoring full diversity (Ch. 6).
Punchline
The interleaver is what converts the -ary symbol channel into marginally-independent binary channels. Take it out and the BICM paradigm collapses.
ex-ch05-18
HardCompute the BICM capacity of 8-PSK with Gray labelling at dB and compare to CM. Report both values, the gap, and the dB-of-SNR equivalent.
Gray labelling for 8-PSK: bits such that phase = .
The CM capacity of 8-PSK is larger than QPSK's at 6 dB because of the tighter constellation.
The Gray-BICM gap on 8-PSK is the largest of any Gray-labelled constellation (up to bits at mid-SNR).
8-PSK CM capacity at 6 dB
Numerical integration gives bits/symbol at dB. (For reference: Shannon limit is ; QPSK CM is .)
8-PSK Gray-BICM capacity at 6 dB
Under Gray labelling (bit 0 splits upper/lower half-circle, bit 1 splits right/left, bit 2 splits alternating), the three bit-channel capacities are roughly for a sum of bits/symbol.
Gap
bits β an order of magnitude larger than the 16-QAM gap. In SNR terms, the rate bits is reached by CM at about dB, so the BICM penalty is dB of SNR.
Why 8-PSK is worse than 16-QAM
8-PSK does not decompose into independent PAM Gray components; its three bit channels are more strongly correlated at the symbol level than the four bit channels of 16-QAM (which split into two independent 4-PAM systems). The BICM formula's product-decoding assumption loses more information on 8-PSK than on square QAM. This is a DVB-S2- relevant observation: 8-PSK + rate-5/6 LDPC is used at medium SNR, and the dB BICM penalty is a known design cost.
ex-ch05-19
MediumIn the BICM receiver (Algorithm AMax-Log Per-Bit LLR Computation at the BICM Demapper), the max-log LLR computation has cost per received symbol. Suggest a reduction to using constellation geometry.
For square QAM, the minimum-distance search in each subset has an exploitable structure.
Separate the and dimensions.
Separate I and Q
For square -QAM, the label decomposes as , and the likelihood factorises: . So each LLR depends only on one of and the corresponding -PAM subset β an search per LLR.
Total cost
LLRs per LLR = total. For 256-QAM, , , giving distance computations β compared to for the naive algorithm. A speedup.
Practical implementations
5G NR demappers use further refinements (e.g., the "decision- region" trick that directly computes max-log LLRs from truncated projections of ), bringing the cost to per symbol for modulations up to 256-QAM. This is the cost floor.
ex-ch05-20
HardForward to Chapter 6. For a rate- BICM system using 16-QAM with Gray labelling over a Rayleigh block-fading channel with independent fading blocks per codeword, what is the expected diversity order ? Use the Caire-Taricco-Biglieri 1998 Β§IV result: is the minimum number of distinct bit positions where two codewords at Hamming distance differ. Assume the binary code has free distance .
The diversity order on an -block fading channel is the minimum of and .
Chapter 6 will cover this properly; here it's a forward-reference warmup.
Diversity-order formula
The BICM pairwise error probability on a Rayleigh block- fading channel decays as , where the diversity order is at most the minimum number of distinct fading realisations affecting the dominant codeword pair at Hamming distance .
For our system
Codeword length . Interleaver spreads the coded bits uniformly across fading blocks, so each block gets bits. With , the differing bits of the dominant codeword-pair fall into at most distinct blocks β so .
Interpretation
If , the code's full free-distance diversity is exploited. If , the fading correlation limits the diversity to . In 5G NR with OFDM over a frequency- selective channel, the interleaver is chosen to span all available coherence bandwidths, ensuring is as large as possible. Chapter 6 develops the full analysis.
ex-ch05-21
ChallengeResearch-level. The Caire-Taricco-Biglieri 1998 paper proves Gray near-optimality by numerical computation across all SNRs. Can you derive an analytical upper bound on for square QAM that is a closed-form function of SNR and , not a tabulated number?
Use the union bound for nearest-neighbour errors and the convexity of binary entropy.
Bound by the entropy , which in turn is bounded by .
Partial credit: derive the high-SNR asymptotic in closed form (done in Thm. TBICM Capacity β High-SNR Asymptotics).
Decompose the gap by level
.
Per-bit entropy bound
Under Gray labelling, each bit channel is approximately a BI-AWGN channel with effective SNR depending on the PAM level. Its conditional entropy is where . Summing gives a closed-form β but loose β upper bound .
Tightening via conditional structure
Observe that is bounded below (Cauchy- Schwarz-like). Refinements in FΓ bregas-MartΓnez-Caire 2008 Β§3.5 give sharper bounds via the Taylor expansion of mutual information around Gaussian inputs. A fully analytical uniform bound remains an open problem; the result of Caire-Taricco-Biglieri 1998 is a numerical table, and researchers have since produced asymptotic closed-form expressions (MartΓnez-FΓ bregas-Caire 2009) and tight machine-learning-assisted bounds (Alvarado et al. 2015).
What to hand in for this challenge
A valid partial answer is the high-SNR asymptotic from Thm. TBICM Capacity β High-SNR Asymptotics. An uniform-in-SNR closed-form bound is a research-level open question β the state of the art is in Alvarado, BrΓ€nnstrΓΆm, Agrell (2015) "A note on the capacity of BICM in fading channels" and subsequent papers.
ex-ch05-22
MediumWrite a short essay (2-3 paragraphs) explaining why the Caire-Taricco-Biglieri 1998 paper is considered "the foundational BICM paper." Your essay should touch on: (i) what the field looked like before the paper (Zehavi 1992 motivation, TCM and MLC alternatives), (ii) the three main technical contributions of the paper, (iii) its direct influence on DVB-S2 (2003), LTE (2008), and 5G NR (2018).
Consult the historical notes of Β§1 and Β§5 and the commit_contribution block in Β§1.
Sample essay
Before Caire-Taricco-Biglieri 1998, coded modulation for wireless channels was dominated by two paradigms: Ungerboeck TCM (optimised for AWGN, monolithic per-channel) and Imai-Hirasawa MLC (multi-code, set-partitioning-based). Both paradigms tightly coupled code design to constellation design, making rate adaptation and multi-modulation MCS tables prohibitive. Zehavi's 1992 paper had shown that a simple bit-interleaver + binary code + 8-PSK combination could outperform TCM on Rayleigh fading β but without any theoretical justification for how well it could do or why.
Caire-Taricco-Biglieri supplied that theory. Their paper's three contributions β (i) formalising BICM as parallel independent binary channels with capacity ; (ii) proving Gray labelling makes the BICM capacity within fractions of a bit of CM capacity on square QAM; (iii) deriving the diversity-order analysis on fading channels β together turned BICM from a plausible heuristic into a capacity-theoretic design rule. The paper's single most influential design advice β "use a powerful binary code, a bit interleaver, and Gray labelling" β is the architectural basis of every modern wireless MCS.
The direct industrial consequences were: DVB-S2 (2003) adopted LDPC + BICM as its coded-modulation architecture with quasi-Gray APSK, replacing TCM/convolutional/BICM- hybrid designs considered in the standardisation; LTE (2008) used turbo codes + BICM + Gray QAM; 5G NR (2018) switched back to LDPC + BICM + Gray QAM with a single base graph per code length. Every one of these standards uses exactly the architecture Caire-Taricco-Biglieri 1998 justified theoretically. In the history of wireless standards, few academic papers have had so direct an impact β and this one is the flagship CommIT contribution of the entire Ferkans CM book.