Independent Parallel Bit Channels
The Central Modelling Device of BICM
The capacity analysis of BICM rests on a single simplifying assumption: that the bit interleaver is so large and well chosen that any two bits belonging to the same constellation symbol come from effectively independent positions in the coded stream. Under this assumption, the decoder sees not an $M$-ary channel but $m$ separate binary channels operating in parallel. Each binary channel is characterised by a single scalar per-bit metric, and their capacities add.
This section makes that picture precise. We define the per-bit channel carefully, prove that ideal interleaving forces it to be memoryless, and derive the soft LLR metric that the demapper uses. The derivation looks innocent, but it contains a subtle twist that every student of BICM must see: the bit channels are marginally independent from the decoder's viewpoint, but they are not jointly independent, because they all depend on the same underlying symbol $X$. The BICM capacity we will derive in §3 is the correct capacity for a decoder that processes the bits as if they were marginally independent. This is a mismatched decoding rule, and the capacity is the corresponding generalised mutual information.
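The two capacities at stake can be stated side by side in advance. The following is a sketch in the notation used throughout this section ($B_1, \dots, B_m$ are the independent, uniform label bits of the symbol $X$):

```latex
% Chain rule for the CM capacity versus the BICM sum of marginals.
% Because the label bits are mutually independent, conditioning on the
% earlier bits cannot decrease any term, so the BICM sum never exceeds
% the CM capacity.
C_{\mathrm{CM}} \;=\; I(X;Y)
  \;=\; \sum_{i=1}^{m} I\!\left(B_i;\,Y \,\middle|\, B_1,\dots,B_{i-1}\right)
  \;\ge\; \sum_{i=1}^{m} I(B_i;\,Y)
  \;=\; C_{\mathrm{BICM}}.
```

MLC with multistage decoding realises the conditional terms; BICM keeps only the marginal ones, and the difference between the two sums is the gap discussed throughout this section.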
Definition: The $i$-th BICM Bit Channel
Fix a constellation $\mathcal{X}$ of size $M = 2^m$, a labelling $\mu : \{0,1\}^m \to \mathcal{X}$, a memoryless channel law $p(y \mid x)$, and uniform inputs on $\mathcal{X}$. For each bit position $i \in \{1, \dots, m\}$, the $i$-th BICM bit channel is the scalar binary-input channel with:
- Input alphabet $\{0,1\}$, with uniform prior $P(B_i = 0) = P(B_i = 1) = 1/2$.
- Output $Y$ (taking values in the original channel's output alphabet).
- Transition law $p_i(y \mid b) = \frac{1}{2^{m-1}} \sum_{x \in \mathcal{X}_i^b} p(y \mid x)$, where $\mathcal{X}_i^b \subset \mathcal{X}$ is the subset of constellation points whose $i$-th label bit is $b$.
The soft bit metric or log-likelihood ratio is $$L_i(y) = \log \frac{\sum_{x \in \mathcal{X}_i^1} p(y \mid x)}{\sum_{x \in \mathcal{X}_i^0} p(y \mid x)}.$$
This is the quantity the demapper computes and passes (through the de-interleaver) to the binary decoder.
The definition averages over the other bits: the bits at positions $j \neq i$ are marginalised out uniformly. This is what makes the bit channel scalar; it is also what costs capacity relative to MLC, which keeps those bits as conditioning variables.
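To make the definition concrete, here is a minimal numerical sketch (not from the text; the unit-energy normalisation, the Gray map, and all identifiers are assumptions) that computes the exact per-bit LLRs for Gray-labelled 16-QAM on a complex AWGN channel, marginalising the other bits exactly as the definition prescribes:

```python
import numpy as np

# Sketch: exact per-bit LLRs for Gray-labelled 16-QAM on complex AWGN.
# Convention: L_i(y) = log( sum_{x in X_i^1} p(y|x) / sum_{x in X_i^0} p(y|x) ).

M, m = 16, 4
levels = np.array([-3, -1, 1, 3]) / np.sqrt(10)   # unit-energy 16-QAM levels
gray2 = [0b00, 0b01, 0b11, 0b10]                  # 2-bit Gray code for 4-PAM

# Build the constellation: bits (b1 b2) select the I level, (b3 b4) the Q level.
points = np.empty(M, dtype=complex)
labels = np.empty((M, m), dtype=int)
idx = 0
for gi, li in zip(gray2, levels):
    for gq, lq in zip(gray2, levels):
        points[idx] = li + 1j * lq
        labels[idx] = [(gi >> 1) & 1, gi & 1, (gq >> 1) & 1, gq & 1]
        idx += 1

def exact_llrs(y, noise_var):
    """Per-bit LLRs for one received sample y, marginalising the other bits."""
    logp = -np.abs(y - points) ** 2 / noise_var       # log p(y|x) up to a constant
    w = np.exp(logp - logp.max())                     # numerically stabilised
    return np.array([np.log(w[labels[:, i] == 1].sum() /
                            w[labels[:, i] == 0].sum()) for i in range(m)])

print(exact_llrs(0.9 + 0.9j, noise_var=0.1))   # four soft metrics, one per bit
```

The sample $y = 0.9 + 0.9j$ sits near the corner point labelled $1010$, so the four LLRs come out with alternating signs; the sign of each LLR is the hard decision for that bit, and the magnitude is its reliability.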
Theorem: Ideal Interleaver $\Rightarrow$ Memoryless Parallel Bit Channels
Consider the BICM encoder of the BICM Encoder definition with an ideal interleaver, i.e., a random permutation selected uniformly from the symmetric group on the coded bits, revealed to both transmitter and receiver. Then from the binary decoder's viewpoint, the per-bit LLR stream is produced by a memoryless binary-input channel whose transition law is the mixture $$\tilde{p}(y \mid b) = \frac{1}{m} \sum_{i=1}^{m} p_i(y \mid b).$$ Equivalently, the BICM bit channel is the time-averaged random parallel channel over the $m$ bit positions, with each position selected with probability $1/m$.
The interleaver shuffles coded bits so thoroughly that each coded bit is, from the decoder's perspective, equally likely to have been mapped to any of the $m$ label positions inside any constellation symbol, with all other bits of that symbol drawn independently from the interleaver's output. This converts the structured joint symbol channel into the mixture channel above, which is memoryless by construction.
Fix a coded-bit position $k$. Conditional on the interleaver, it is mapped to some label position $i$ inside some constellation symbol.
The other bits of that symbol come from other parts of the codeword and, through the interleaver's randomness, are uniform and independent at long block lengths.
Average the symbol channel transition law over this randomness.
Conditional transition law
Fix any coded-bit index $k$ and condition on the interleaver and on the event that coded bit $k$ is mapped to label position $i$ of constellation symbol $t$. Let $B$ denote the value of that bit and $\bar{B} \in \{0,1\}^{m-1}$ denote the other bits of symbol $t$. Then the output $Y_t$ received from symbol $t$ satisfies $$p(y_t \mid B = b, \bar{B} = \bar{b}) = p\big(y_t \mid x = \mu_i(b, \bar{b})\big),$$ where $\mu_i(b, \bar{b})$ denotes the constellation point whose $i$-th label bit is $b$ and whose remaining bits are $\bar{b}$.
Ideal interleaver $\Rightarrow$ uniform other bits
Under the ideal-interleaver assumption, the other bits of symbol $t$ originate from coded positions that are effectively independent of position $k$, and in the capacity-approaching random-coding regime they are asymptotically uniform i.i.d.\ on $\{0,1\}$. Hence $P(\bar{B} = \bar{b}) = 2^{-(m-1)}$, and the transition law becomes the per-bit channel law $$p_i(y \mid b) = \frac{1}{2^{m-1}} \sum_{\bar{b}} p\big(y \mid \mu_i(b, \bar{b})\big) = \frac{1}{2^{m-1}} \sum_{x \in \mathcal{X}_i^b} p(y \mid x),$$ which is exactly the $i$-th BICM bit channel of the definition above.
Average over label position
The interleaver is uniform, so each of the $m$ label positions inside symbol $t$ is selected with probability $1/m$. Averaging over this uniform choice gives $$\tilde{p}(y \mid b) = \frac{1}{m} \sum_{i=1}^{m} p_i(y \mid b).$$ This is the channel seen by the binary decoder.
Memorylessness
The claim of memorylessness requires that the joint law of two coded bits factorise into the product of individual laws. This holds because (i) different coded bits end up in different symbols (with probability approaching one as the block length grows), and (ii) conditional on the interleaver, the noise processes of different symbols are independent. The ideal-interleaver mixture channel is therefore i.i.d.\ in the temporal direction.
Max-Log Per-Bit LLR Computation at the BICM Demapper
Complexity: $O(mM)$ arithmetic operations per received symbol. This is the max-log approximation: it replaces $\log \sum_j e^{a_j}$ by $\max_j a_j$, so each LLR becomes a difference of two maxima, $$L_i^{\max}(y) = \max_{x \in \mathcal{X}_i^1} \log p(y \mid x) \;-\; \max_{x \in \mathcal{X}_i^0} \log p(y \mid x),$$ which on the AWGN channel is a scaled difference of two minimum squared distances. For moderate to high SNR the approximation is tight to within a fraction of a dB; for 5G NR and Wi-Fi receivers this is the standard implementation.
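A side-by-side sketch of the exact and max-log demappers, assuming Gray-labelled 4-PAM on real AWGN (the constellation, noise model, and all identifiers are illustrative choices, not from the text):

```python
import numpy as np

# Exact vs. max-log demapping for Gray-labelled 4-PAM on real AWGN.
# Exact:   L_i(y) = log-sum-exp over X_i^1 minus log-sum-exp over X_i^0.
# Max-log: replace each log-sum-exp by a max, leaving a difference of
#          two best per-coset metrics (scaled squared distances).

points = np.array([-3.0, -1.0, 1.0, 3.0]) / np.sqrt(5.0)  # unit-energy 4-PAM
labels = np.array([[0, 0], [0, 1], [1, 1], [1, 0]])        # Gray map, MSB first

def llr(y, s2, maxlog=False):
    d2 = (y - points) ** 2                 # squared distances to all M points
    metric = -d2 / (2.0 * s2)              # log p(y|x) up to a constant
    out = []
    for i in range(labels.shape[1]):
        m1, m0 = metric[labels[:, i] == 1], metric[labels[:, i] == 0]
        if maxlog:
            out.append(m1.max() - m0.max())          # one comparison per coset
        else:
            out.append(np.log(np.exp(m1).sum()) - np.log(np.exp(m0).sum()))
    return np.array(out)

y, s2 = 0.5, 0.05
print(llr(y, s2), llr(y, s2, maxlog=True))   # near-identical at this SNR
```

At this SNR the nearest point in each coset dominates the sum, so the max-log output differs from the exact LLR by only a few thousandths; the approximation degrades gracefully as the SNR drops and several points contribute comparably.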
Per-Level Bit-Channel Capacities vs. SNR
Each of the $m$ BICM bit channels has its own capacity $C_i = I(B_i; Y)$, which depends on the labelling. Under Gray labelling, all $m$ curves look qualitatively similar: each bit sees an "average" binary-input AWGN-like channel. Under SP labelling, the curves split: the top-level bit has near-Shannon capacity (because it lives on well-separated cosets), while the bottom-level bit has near-zero capacity at low SNR (its cosets overlap heavily). This split is exactly why SP is optimal for MLC (one matches a low-rate code to the weak level and a high-rate code to the strong level) and suboptimal for BICM (the sum of unconditional capacities is smaller).
Example: The Four Bit Channels of 16-QAM at 10 dB
For 16-QAM at $10$ dB under Gray labelling, compute the four bit-channel capacities and compare to the SP labelling. Explain the pattern.
Gray labelling: approximate symmetry
Under Gray labelling, 16-QAM decomposes as two independent Gray-coded 4-PAM components. The four BICM bit channels are therefore two copies each of the two 4-PAM Gray bit channels. Numerically, at $10$ dB the two distinct 4-PAM bit-channel capacities are obtained by one-dimensional integration of the definition, and the $Q$ component contributes the same pair again; the four capacities add to give the Gray BICM sum.
SP labelling: dramatic split
Under SP labelling the capacities split into a highly asymmetric set: the top bit channels are nearly saturated (their cosets are maximally separated), while the bottom-level bit channel is close to useless in isolation, because its two cosets overlap in Euclidean space. The sum of the four capacities therefore falls below the Gray sum.
The punchline
Gray labelling is not maximising any single $C_i$; it is maximising the sum $\sum_{i=1}^m C_i$. Each Gray bit sees a somewhat less-than-perfect binary channel, but the four of them add up to more than the SP sum. Geometrically, Gray spreads information evenly across bits; SP concentrates it in the top bits. Adding $m$ uniformly useful sub-channels beats adding an almost-saturated sub-channel to an almost-useless one, at least in the unconditional capacity sense. MLC recovers the SP "wasted" bits through conditioning; BICM cannot.
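The punchline can be checked numerically. The sketch below makes several simplifying assumptions not from the text: real 4-PAM instead of 16-QAM (the 16-QAM behaviour follows per quadrature component), a natural labelling standing in for set partitioning, and Monte Carlo estimation in place of exact integration.

```python
import numpy as np

# Monte Carlo estimate of the marginal bit capacities I(B_i; Y) for
# 4-PAM on real AWGN, comparing Gray labelling with a natural
# (set-partitioning-style) labelling.

rng = np.random.default_rng(0)
points = np.array([-3.0, -1.0, 1.0, 3.0]) / np.sqrt(5.0)   # unit energy
GRAY = np.array([[0, 0], [0, 1], [1, 1], [1, 0]])
NATURAL = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])        # SP-style stand-in

def bit_capacities(labels, s2, n=200_000):
    sym = rng.integers(0, 4, size=n)
    y = points[sym] + rng.normal(scale=np.sqrt(s2), size=n)
    lik = np.exp(-(y[:, None] - points[None, :]) ** 2 / (2 * s2))  # ∝ p(y|x)
    caps = []
    for i in range(2):
        p0 = lik[:, labels[:, i] == 0].sum(axis=1)   # ∝ p_i(y|0)
        p1 = lik[:, labels[:, i] == 1].sum(axis=1)   # ∝ p_i(y|1)
        pb = np.where(labels[sym, i] == 1, p1, p0)   # p_i(y | sent bit)
        caps.append(np.mean(np.log2(2 * pb / (p0 + p1))))
    return np.array(caps)

s2 = 0.1                                   # Es/N0 = 10 dB for unit-energy points
gray, sp = bit_capacities(GRAY, s2), bit_capacities(NATURAL, s2)
print("Gray:", gray, "sum", gray.sum())
print("SP  :", sp, "sum", sp.sum())
```

Running this shows the pattern claimed above: the SP-style capacities split into a strong top bit and a weak bottom bit, while the Gray pair is more even and has the larger sum.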
BICM as Mismatched Decoding
Strictly speaking, the BICM decoder treats the $m$ label bits of each symbol as if they were outputs of $m$ independent binary channels, whereas they are in fact deterministic functions of the same symbol $X$. This makes the BICM decoding metric a mismatched metric: it is not the true likelihood of the codeword given $y$. The capacity of a channel under a mismatched decoder is the generalised mutual information (GMI), and the BICM capacity formula is in fact the GMI under the product-form decoding metric. Chapter 7 revisits this mismatch perspective rigorously following Martínez, Guillén i Fàbregas, and Caire (2008). For now, remember: the capacity we are about to derive is operationally correct for the BICM decoder as implemented, not for an optimal $M$-ary ML decoder, which would of course achieve $I(X;Y)$.
Common Mistake: "Independent Bit Channels" Means Marginal, Not Joint
Mistake:
Assuming that the $m$ bit positions of a constellation symbol are jointly independent given $Y$.
Correction:
They are marginally independent in the sense that the BICM capacity formula treats each $I(B_i; Y)$ separately, but they are all deterministic functions of the same symbol $X$, so jointly conditional on $Y$ they have a specific joint distribution that generally does not factorise. The quantity $I(X; Y) = I(B_1, \dots, B_m; Y)$ (the CM capacity) is therefore greater than or equal to $\sum_{i=1}^m I(B_i; Y)$ (the BICM capacity), with equality only when the joint conditional really does factorise, which is a knife-edge condition.
The BICM decoder acts as if the bits were jointly independent; that mismatch is precisely what costs it the gap to CM capacity.
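The gap is easy to exhibit numerically. The Monte Carlo sketch below uses an assumed setup not from the text (natural-labelled 4-PAM on real AWGN, chosen because its mismatch gap is large enough to see clearly; all identifiers are illustrative) to estimate both sides of the inequality:

```python
import numpy as np

# Monte Carlo check that the CM capacity I(X;Y) exceeds the BICM sum of
# marginal bit capacities, for natural-labelled 4-PAM on real AWGN.

rng = np.random.default_rng(1)
points = np.array([-3.0, -1.0, 1.0, 3.0]) / np.sqrt(5.0)   # unit energy
labels = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])        # natural labelling

n, s2 = 200_000, 0.1
sym = rng.integers(0, 4, size=n)
y = points[sym] + rng.normal(scale=np.sqrt(s2), size=n)
lik = np.exp(-(y[:, None] - points[None, :]) ** 2 / (2 * s2))  # ∝ p(y|x)

# CM capacity: the decoder uses the full joint symbol likelihood.
cm = np.mean(np.log2(4 * lik[np.arange(n), sym] / lik.sum(axis=1)))

# BICM capacity: the decoder treats the two label bits marginally.
bicm = 0.0
for i in range(2):
    p0 = lik[:, labels[:, i] == 0].sum(axis=1)    # ∝ p_i(y|0)
    p1 = lik[:, labels[:, i] == 1].sum(axis=1)    # ∝ p_i(y|1)
    pb = np.where(labels[sym, i] == 1, p1, p0)
    bicm += np.mean(np.log2(2 * pb / (p0 + p1)))

print(f"CM = {cm:.3f} bits/symbol, BICM = {bicm:.3f} bits/symbol")
```

The same samples feed both estimators, so the comparison is not clouded by Monte Carlo noise; with a Gray labelling the gap would shrink to a sliver, which is exactly why Gray is the labelling of choice for BICM.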
Quick Check
Under the ideal-interleaver assumption, the BICM bit channel seen by the binary decoder is
an $M$-ary channel whose output is $y$ and whose input is the $m$-tuple $(b_1, \dots, b_m)$
a uniform mixture of the $m$ per-position binary channels $p_i(y \mid b)$, each selected with probability $1/m$
a deterministic channel (no noise) because interleaving removes correlations
a BSC with crossover probability equal to the symbol error rate of the constellation
The ideal interleaver distributes coded bits uniformly over the $m$ label positions, so the decoder sees the mixture $\tilde{p}(y \mid b) = \frac{1}{m} \sum_{i=1}^m p_i(y \mid b)$. This is the key modelling reduction of BICM: the $M$-ary symbol channel becomes a scalar binary channel.
Bit Channel (BICM)
The scalar binary-input channel seen by bit position $i$ after marginalising out the other label bits and the interleaver. Its transition law is the average of the full-symbol likelihood over the two subsets $\mathcal{X}_i^0$ and $\mathcal{X}_i^1$. The sum of its capacities across $i = 1, \dots, m$ is the BICM capacity.
Related: The $i$-th BICM Bit Channel, BICM Capacity, Max-Log Per-Bit LLR Computation at the BICM Demapper
Log-Likelihood Ratio (LLR)
For a binary-input channel, $L(y) = \log \frac{p(y \mid b = 1)}{p(y \mid b = 0)}$. The sufficient soft statistic for optimal binary decoding; modern BICM receivers compute one LLR per bit position per received symbol and feed them to a soft-input binary decoder.
Related: Bit Channel (BICM), Demapper with A Priori Information