Independent Parallel Bit Channels

The Central Modelling Device of BICM

The capacity analysis of BICM rests on a single simplifying assumption: that the bit interleaver is so large and well chosen that any two bits belonging to the same constellation symbol come from effectively independent positions in the coded stream. Under this assumption, the decoder sees not an $M$-ary channel but $L$ separate binary channels operating in parallel. Each binary channel is characterised by a single scalar per-bit metric, and their capacities add.

This section makes that picture precise. We define the per-bit channel carefully, prove that ideal interleaving forces it to be memoryless, and derive the soft LLR metric that the demapper uses. The derivation looks innocent, but it contains a subtle twist that every student of BICM must see: the bit channels are marginally independent from the decoder's viewpoint, but they are not jointly independent, since they all depend on the same underlying $y$. The BICM capacity we will derive in §3 is the correct capacity for a decoder that processes the bits as if they were marginally independent. This is a mismatched decoding rule, and the capacity is the corresponding generalised mutual information.

Definition:

The $\ell$-th BICM Bit Channel

Fix a constellation $\mathcal{X}$ of size $M = 2^L$, a labelling $\mu$, a memoryless channel law $p(y \mid x)$, and uniform inputs on $\mathcal{X}$. For each bit position $\ell \in \{0, \ldots, L-1\}$, the $\ell$-th BICM bit channel $W_\ell$ is the scalar binary-input channel with:

  • Input alphabet $\{0,1\}$, with uniform prior $P(B_\ell = 0) = P(B_\ell = 1) = \tfrac12$.
  • Output $Y$ (taking values in the original channel's output alphabet).
  • Transition law $$p_{W_\ell}(y \mid b) = \frac{1}{M/2} \sum_{x \in \mathcal{X}_\ell^{(b)}} p(y \mid x),$$ where $\mathcal{X}_\ell^{(b)} = \{x \in \mathcal{X} : \mu^{-1}(x)_\ell = b\}$ is the subset of constellation points whose $\ell$-th label bit is $b$.

The soft bit metric or log-likelihood ratio is $$\lambda_\ell(y) = \log \frac{p_{W_\ell}(y \mid 0)}{p_{W_\ell}(y \mid 1)} = \log \frac{\sum_{x \in \mathcal{X}_\ell^{(0)}} p(y \mid x)}{\sum_{x \in \mathcal{X}_\ell^{(1)}} p(y \mid x)}.$$

This is the quantity the demapper computes and passes (through the de-interleaver) to the binary decoder.

The definition averages over the $M/2$ symbols in each subset: the $L-1$ bits at positions $\ell' \ne \ell$ are marginalised out uniformly. This is what makes the bit channel scalar; it is also what costs capacity relative to MLC, which keeps those bits as conditioning variables.
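The subset sums can be made concrete in a few lines. The sketch below (assuming, for illustration only, a real 4-PAM constellation with a Gray labelling) computes the exact LLR by summing Gaussian likelihoods over the two label subsets $\mathcal{X}_\ell^{(0)}$ and $\mathcal{X}_\ell^{(1)}$:

```python
import numpy as np

# Illustrative 4-PAM constellation with Gray labelling (an assumption of this
# sketch): label (b0, b1) -> point; adjacent points differ in exactly one bit.
CONST = {(0, 0): -3.0, (0, 1): -1.0, (1, 1): +1.0, (1, 0): +3.0}

def exact_llr(y, sigma2, ell):
    """lambda_ell(y) = log sum_{X_ell^(0)} p(y|x) - log sum_{X_ell^(1)} p(y|x)
    for real AWGN with noise variance sigma2 and uniform symbols."""
    logp = {0: [], 1: []}
    for label, x in CONST.items():
        # Gaussian log-likelihood up to a constant that cancels in the ratio.
        logp[label[ell]].append(-(y - x) ** 2 / (2 * sigma2))
    # Numerically stable log-sum-exp over each subset.
    return np.logaddexp.reduce(logp[0]) - np.logaddexp.reduce(logp[1])

# y near -3 (label 00): both bits favour 0, so both LLRs come out positive.
print(exact_llr(-2.9, 0.5, 0), exact_llr(-2.9, 0.5, 1))
```

Note that the uniform $1/(M/2)$ factor in the transition law cancels in the LLR, which is why the code never needs it.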

Theorem: Ideal Interleaver $\Rightarrow$ Memoryless Parallel Bit Channels

Consider the BICM encoder of Definition (BICM Encoder) with an ideal interleaver, i.e., a random permutation selected uniformly from the symmetric group on $N$ elements, revealed to both transmitter and receiver. Then, from the binary decoder's viewpoint, the per-bit LLR stream $\{\lambda_n\}_{n=1}^N$ is produced by a memoryless binary-input channel $W_{\rm BICM}$ whose transition law is the mixture $$p_{W_{\rm BICM}}(y \mid b) = \frac{1}{L} \sum_{\ell=0}^{L-1} p_{W_\ell}(y \mid b).$$ Equivalently, the BICM bit channel is the time-averaged random parallel channel over the $L$ bit positions, with each position selected with probability $1/L$.

The interleaver shuffles coded bits so thoroughly that the $n$-th coded bit is, from the decoder's perspective, equally likely to have been mapped to any of the $L$ label positions inside any constellation symbol, with all other bits of that symbol drawn independently from the interleaver's output. This converts the structured joint symbol channel into the mixture channel above, which is memoryless by construction.
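The uniform-position claim is easy to check empirically. The toy simulation below (block length and trial count are arbitrary choices of this sketch) draws many independent uniform permutations and records which label position a fixed coded bit lands in:

```python
import numpy as np

rng = np.random.default_rng(0)
N, L, trials = 32, 4, 50_000   # short block of N coded bits, L bits per symbol

# Under a uniformly random permutation, coded bit 0 is equally likely to land
# at any stream index, hence at any of the L label positions (index mod L).
pos = np.array([rng.permutation(N)[0] % L for _ in range(trials)])
freq = np.bincount(pos, minlength=L) / trials
print(freq)   # each entry close to 1/L = 0.25
```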


Max-Log Per-Bit LLR Computation at the BICM Demapper

Complexity: $O(M \cdot L)$ arithmetic operations per received symbol
Input: received symbol $y \in \mathbb{C}$; constellation $\mathcal{X} = \{x_1, \ldots, x_M\}$; labelling $\mu$; noise variance $\sigma^2$
Output: vector of LLRs $(\lambda_0, \lambda_1, \ldots, \lambda_{L-1})$
1. Compute $d_i \leftarrow \|y - x_i\|^2$ for $i = 1, \ldots, M$
2. for $\ell = 0, \ldots, L-1$ do
3. $\quad d_\ell^{(0)} \leftarrow \min_{i : \mu^{-1}(x_i)_\ell = 0} d_i$
4. $\quad d_\ell^{(1)} \leftarrow \min_{i : \mu^{-1}(x_i)_\ell = 1} d_i$
5. $\quad \lambda_\ell \leftarrow \bigl(d_\ell^{(1)} - d_\ell^{(0)}\bigr) / \sigma^2$
6. end for
7. return $(\lambda_0, \ldots, \lambda_{L-1})$

This is the max-log approximation: it replaces $\log \sum_i \exp(-d_i / \sigma^2)$ by $-\min_i d_i / \sigma^2$. For moderate to high SNR the approximation is tight to within a fraction of a dB; for 5G NR and Wi-Fi receivers this is the standard implementation.
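The routine translates directly into code. The sketch below instantiates it for a 16-QAM built as two Gray 4-PAMs; the particular bit-to-I/Q assignment (bits 0 and 2 on I, bits 1 and 3 on Q) is an illustrative assumption, not necessarily the 3GPP mapping:

```python
import numpy as np
from itertools import product

# Gray 16-QAM: bits (b0, b2) select the I coordinate, (b1, b3) the Q coordinate.
GRAY4 = {(0, 0): -3.0, (0, 1): -1.0, (1, 1): +1.0, (1, 0): +3.0}
LABELS = list(product((0, 1), repeat=4))
POINTS = np.array([GRAY4[(b[0], b[2])] + 1j * GRAY4[(b[1], b[3])]
                   for b in LABELS])

def maxlog_llrs(y, sigma2):
    """Steps 1-7 of the routine: lambda_ell = (d_ell^(1) - d_ell^(0)) / sigma2."""
    d = np.abs(y - POINTS) ** 2                      # step 1: all M distances
    llrs = []
    for ell in range(4):                             # step 2: loop over levels
        d0 = min(d[i] for i, lab in enumerate(LABELS) if lab[ell] == 0)
        d1 = min(d[i] for i, lab in enumerate(LABELS) if lab[ell] == 1)
        llrs.append((d1 - d0) / sigma2)              # step 5
    return llrs

# y exactly on the point labelled (0,1,0,0): signs follow the transmitted bits.
print(maxlog_llrs(-3 + 3j, 0.5))   # -> [32.0, -32.0, 8.0, 8.0]
```

The magnitudes also make sense: the two sign bits (decided by which half-plane $y$ falls in) get LLR magnitude 32, while the inner/outer bits, whose competing subset is only distance 2 away, get magnitude 8.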

Per-Level Bit-Channel Capacities $C_\ell$ vs. SNR

Each of the $L = \log_2 M$ BICM bit channels has its own capacity $C_\ell = I(Y; B_\ell)$, which depends on the labelling. Under Gray labelling, all $L$ curves look qualitatively similar: each bit sees an "average" binary-input AWGN-like channel. Under SP labelling, the curves split: the top-level bit $b_0$ has near-Shannon capacity (because it lives on well-separated cosets), while the bottom-level bit $b_{L-1}$ has near-zero capacity at low SNR (its cosets overlap heavily). This split is exactly why SP is optimal for MLC (one matches a high-rate code to $b_0$ and a low-rate code to $b_{L-1}$) and suboptimal for BICM (the sum of unconditional capacities is smaller).


Example: The Four Bit Channels of 16-QAM at 10 dB

For 16-QAM at $\mathrm{SNR} = 10$ dB under Gray labelling, compute the four bit-channel capacities $C_0, C_1, C_2, C_3$ and compare to the SP labelling. Explain the pattern.
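One way to attack this exercise numerically (a Monte Carlo sketch; the bit-to-coordinate assignment and sample count are assumptions of the sketch) uses the identity $C_\ell = 1 - \mathbb{E}[\log_2(1 + e^{-\tilde\lambda_\ell})]$, where $\tilde\lambda_\ell$ is the exact LLR with its sign flipped when the transmitted bit is 1:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(1)
# Gray 16-QAM (assumed mapping): (b0, b2) on I, (b1, b3) on Q.
GRAY4 = {(0, 0): -3.0, (0, 1): -1.0, (1, 1): +1.0, (1, 0): +3.0}
LABELS = np.array(list(product((0, 1), repeat=4)))
POINTS = np.array([GRAY4[(b[0], b[2])] + 1j * GRAY4[(b[1], b[3])]
                   for b in LABELS])

Es = np.mean(np.abs(POINTS) ** 2)          # average symbol energy (= 10 here)
N0 = Es / 10 ** (10.0 / 10)                # complex-noise variance at 10 dB SNR

n = 200_000
idx = rng.integers(0, 16, n)               # uniform symbols
y = POINTS[idx] + np.sqrt(N0 / 2) * (rng.standard_normal(n)
                                     + 1j * rng.standard_normal(n))
ll = -np.abs(y[:, None] - POINTS[None, :]) ** 2 / N0   # log p(y|x_i) + const

caps = []
for ell in range(4):
    m0 = LABELS[:, ell] == 0
    lam = (np.logaddexp.reduce(ll[:, m0], axis=1)
           - np.logaddexp.reduce(ll[:, ~m0], axis=1))    # exact LLR
    lam_tx = np.where(LABELS[idx, ell] == 0, lam, -lam)  # sign-adjusted
    caps.append(1 - np.mean(np.logaddexp(0, -lam_tx)) / np.log(2))
print(caps, sum(caps))
```

With Gray labelling the four estimates come out as two nearly equal pairs (I and Q are symmetric); swapping in an SP labelling splits the levels apart, as the surrounding text predicts.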

BICM as Mismatched Decoding

Strictly speaking, the BICM decoder treats the $L$ label bits of each symbol as if they were outputs of independent binary channels, whereas their soft metrics are all computed from the same received value $y$. This makes the BICM decoding metric a mismatched metric: it is not the true likelihood of the codeword given $y$. The capacity of a channel under a mismatched decoder is the generalised mutual information (GMI), and the BICM capacity formula $\sum_\ell I(Y; B_\ell)$ is in fact the GMI under the product-form decoding metric. Chapter 7 revisits this mismatch perspective rigorously following Martínez–Guillén i Fàbregas–Caire (2008). For now, remember: the capacity we are about to derive is operationally correct for the BICM decoder as implemented, not for an optimal $M$-ary ML decoder, which would of course achieve $C_{\rm CM}$.

Common Mistake: "Independent Bit Channels" β€” Marginal, Not Joint

Mistake:

Assuming that the $L$ bit positions $B_0, B_1, \ldots, B_{L-1}$ of a constellation symbol are jointly independent given $Y$.

Correction:

They are marginally independent in the sense that the BICM capacity formula treats each $I(Y; B_\ell)$ separately, but they all ride on the same symbol and are observed through the same $y$, so conditional on $Y$ they have a joint distribution that generally does not factorise. The quantity $I(Y; B_0, B_1, \ldots, B_{L-1})$ (the CM capacity) is therefore greater than or equal to $\sum_\ell I(Y; B_\ell)$ (the BICM capacity), with equality only when the joint conditional really does factorise, which is a knife-edge condition.

The BICM decoder acts as if the bits were jointly independent; that mismatch is precisely what costs it the gap to CM capacity.
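The gap is easy to exhibit numerically. The Monte Carlo sketch below (assumed parameters: real-AWGN 4-PAM at about 6 dB SNR with a natural, SP-style labelling, chosen because it makes the gap pronounced) estimates both $I(Y;X)$ and $\sum_\ell I(Y;B_\ell)$ from the same samples:

```python
import numpy as np

rng = np.random.default_rng(2)
PAM = np.array([-3.0, -1.0, 1.0, 3.0])
LAB = np.array([(0, 0), (0, 1), (1, 0), (1, 1)])   # natural (SP-style) labels
sigma2 = np.mean(PAM ** 2) / 10 ** (6.0 / 10)      # real noise variance, 6 dB

n = 300_000
idx = rng.integers(0, 4, n)
y = PAM[idx] + np.sqrt(sigma2) * rng.standard_normal(n)
ll = -(y[:, None] - PAM[None, :]) ** 2 / (2 * sigma2)   # log p(y|x) + const

# CM capacity: I(Y;X) = log2 M - E[log2(sum_x p(y|x) / p(y|x_sent))]
cm = 2 - np.mean(np.logaddexp.reduce(ll, axis=1)
                 - ll[np.arange(n), idx]) / np.log(2)

# BICM capacity: sum over levels of I(Y;B_ell), computed from exact LLRs
bicm = 0.0
for ell in range(2):
    m0 = LAB[:, ell] == 0
    lam = (np.logaddexp.reduce(ll[:, m0], axis=1)
           - np.logaddexp.reduce(ll[:, ~m0], axis=1))
    lam_tx = np.where(LAB[idx, ell] == 0, lam, -lam)
    bicm += 1 - np.mean(np.logaddexp(0, -lam_tx)) / np.log(2)
print(cm, bicm)   # cm exceeds bicm: the joint channel beats the marginal sum
```

Rerunning with the Gray labelling from the earlier sketches makes the two numbers nearly coincide, which is exactly the near-knife-edge behaviour the text describes.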

Quick Check

Under the ideal-interleaver assumption, the BICM bit channel seen by the binary decoder is

  • an $M$-ary channel whose output is $y$ and whose input is the $L$-tuple $(B_0, \ldots, B_{L-1})$
  • a uniform mixture of the $L$ per-position binary channels $W_\ell$, each selected with probability $1/L$
  • a deterministic channel (no noise), because interleaving removes correlations
  • a BSC with crossover probability equal to the symbol error rate of the constellation

Bit Channel (BICM)

The scalar binary-input channel $W_\ell$ seen by bit position $\ell$ after marginalising out the other $L-1$ label bits and the interleaver. Its transition law is the average of the full-symbol likelihood over each of the two subsets $\mathcal{X}_\ell^{(0)}$ and $\mathcal{X}_\ell^{(1)}$. The sum of its capacities across $\ell$ is the BICM capacity.

Related: The $\ell$-th BICM Bit Channel, BICM Capacity, Max-Log Per-Bit LLR Computation at the BICM Demapper

Log-Likelihood Ratio (LLR)

For a binary-input channel, $\lambda = \log \frac{P(B=0 \mid y)}{P(B=1 \mid y)}$. The sufficient soft statistic for optimal binary decoding; modern BICM receivers compute one LLR per bit position per received symbol and feed them to a soft-input binary decoder.

Related: Bit Channel (BICM), Demapper with A Priori Information