Iterative Demapping: Closing the BICM Loop
The Open Loop in One-Shot BICM
Look back at the BICM receiver of Chapter 5. A channel observation $y$ enters the demapper, which produces soft per-bit LLRs $L_i$, one per label bit. The deinterleaver sends these LLRs to a SISO decoder, which produces per-bit a posteriori LLRs and decisions. At that point the receiver stops. The decoder's a posteriori output — which contains information the demapper did not have — is DISCARDED before it can help the demapper do a better job on the same observation.
This is exactly the suboptimality argument that motivated the whole Chapter 5 discussion of the gap between BICM and CM capacity. The BICM demapper treats each label bit as independent, because it has no reason to believe the bits are correlated; but the binary code MAKES them correlated, and the decoder knows that correlation. If the demapper could exploit what the decoder knows, its per-bit LLRs would be sharper, and the decoder would then produce sharper outputs, and so on. That feedback loop is BICM with Iterative Decoding (BICM-ID), introduced independently by Li–Ritcey and by ten Brink–Speidel–Yan in 1997–1998. The point is simple: nothing in the demapper computation prevents us from incorporating a priori bit probabilities into the sums. Doing so gives a refined LLR; passing the refinement through the decoder gives an even better one; iterating closes the gap to CM capacity.
What is new is not any single computation — the demapper with a priori and the SISO decoder both existed before BICM-ID — but the ARCHITECTURE: the extrinsic information exchanged between the two boxes, iterated to a fixed point. The analysis of that iteration is the content of this chapter.
Definition: Demapper with A Priori Information
Let $\mathbf{b} = (b_0, \dots, b_{m-1})$ be the label of a transmitted constellation symbol $x \in \mathcal{X}$, and let $L^a = (L^a_0, \dots, L^a_{m-1})$ be the vector of a-priori LLRs supplied to the demapper (one per label bit, from the decoder in the previous iteration). The demapper with a priori computes, for each bit position $i$, $$L_i(y) = \log \frac{\sum_{x \in \mathcal{X}_i^0} p(y \mid x) \prod_{j \ne i} P_a(b_j(x))}{\sum_{x \in \mathcal{X}_i^1} p(y \mid x) \prod_{j \ne i} P_a(b_j(x))},$$ where $P_a(b_j(x)) = e^{(1 - b_j(x)) L^a_j} / (1 + e^{L^a_j})$ is the Bernoulli probability induced by the a-priori LLR $L^a_j$, $\mathcal{X}_i^b$ is the set of symbols whose $i$-th label bit equals $b$, and $b_j(x)$ is the $j$-th label bit of $x$.
Two limiting cases check the definition: if all $L^a_j = 0$, the a-priori probabilities are uniform and the formula reduces to the one-shot BICM LLR of Ch. 5. If $|L^a_j| \to \infty$ for all $j \ne i$, the sums collapse to a single term each and $L_i(y)$ becomes the CM max-log LLR against the known symbol — the demapper has become a genie.
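The definition maps directly onto code. The sketch below is an illustrative implementation under stated assumptions — complex AWGN with $p(y \mid x) \propto e^{-|y-x|^2/N_0}$ and the sign convention $L = \log P(b{=}0)/P(b{=}1)$; the function name and array layout are ours, not the chapter's:

```python
import numpy as np

def demapper_llr(y, points, labels, La, N0):
    """Demapper-with-a-priori LLRs (illustrative sketch).

    y      : complex channel observation
    points : (M,) complex constellation symbols x
    labels : (M, m) array of 0/1 label bits b_j(x)
    La     : (m,) a-priori LLRs, convention L = log P(b=0)/P(b=1)
    N0     : complex-noise variance of the AWGN channel
    """
    M, m = labels.shape
    like = np.exp(-np.abs(y - points) ** 2 / N0)   # p(y|x) up to a constant
    # Bernoulli probabilities induced by the a-priori LLRs, per point and bit
    Pa = np.where(labels == 0,
                  1.0 / (1.0 + np.exp(-La)),       # P(b_j = 0)
                  1.0 / (1.0 + np.exp(La)))        # P(b_j = 1)
    L = np.empty(m)
    for i in range(m):
        prior = np.prod(np.delete(Pa, i, axis=1), axis=1)  # product over j != i
        in0 = labels[:, i] == 0
        L[i] = np.log(like[in0] @ prior[in0]) - np.log(like[~in0] @ prior[~in0])
    return L
```

With `La` all zero this reduces to the one-shot BICM demapper, and with large `|La|` on the other bits the sums collapse to a single term each — the two limiting cases above.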
Definition: Extrinsic Information
At any SISO box — demapper or decoder — the input is a vector of a-priori LLRs $L^a$ and a set of channel observations, and the output is a vector of a-posteriori LLRs $L^p$. The extrinsic LLR at bit position $i$ is $$L^e_i = L^p_i - L^a_i.$$ Equivalently, $L^e_i$ is the log-likelihood ratio of $b_i$ given ALL OTHER evidence — the channel observations and the a-priori LLRs of all bits $j \ne i$. The extrinsic component is the "new" information the box has produced; subtracting $L^a_i$ is what prevents the iterative scheme from feeding a box its own previous output back to itself, which would create a positive-feedback loop.
The extrinsic/a-priori decomposition is one of the great conceptual inventions of iterative coding theory. It works because LLRs add under independence, so subtracting the a-priori from the a-posteriori yields the contribution of everything BUT that a-priori. In BICM-ID the decoder extrinsic LLRs become the demapper's a-priori LLRs in the next iteration, and vice versa — a strict separation that turns a feedback system into an iteration between two well-defined maps.
BICM-ID Iteration (Single Pass)
Complexity: Per iteration: the demapper costs $O(2^m)$ per bit for a constellation with $M = 2^m$ points (less with max-log and smart subset indexing); the SISO decoder depends on the code (BCJR: linear in the number of trellis states for a convolutional code; LDPC BP: linear in the number of graph edges per iteration). Total cost is the per-iteration cost times the number of outer iterations, typically 5–10 in practice.

The structure is a serial concatenation: demapper (inner) and decoder (outer). The interleaver randomises the bit order between them and is essential for making the extrinsic-information assumption of independence approximately valid. Without a long enough interleaver, bits correlated inside a code constraint get mapped to neighbouring constellation positions and the extrinsic information becomes contaminated by its own past — the classical iterative-decoder convergence failure.
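The single pass can be made concrete with a deliberately tiny toy (entirely our own construction, not a standard receiver): a rate-1/2 repetition "code" — each info bit sent twice, so the SISO decoder extrinsic for one copy is simply the demapper LLR of the other copy — over a QPSK labelling whose bits do not decouple across axes, with an identity interleaver since there are only four coded bits:

```python
import numpy as np

a = 1 / np.sqrt(2)
# Non-Gray QPSK labelling: bit 0 partitions the points into antipodal pairs,
# so the two label bits are coupled (unlike Gray, where each bit owns one axis)
points = np.array([a*(1+1j), a*(-1-1j), a*(-1+1j), a*(1-1j)])
labels = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])   # rows = (b0, b1)
N0 = 1.0

def demap(y, La):
    """Extrinsic demapper LLRs for one symbol, given a-priori La (length 2)."""
    like = np.exp(-np.abs(y - points) ** 2 / N0)
    Pa = np.where(labels == 0, 1 / (1 + np.exp(-La)), 1 / (1 + np.exp(La)))
    L = np.zeros(2)
    for i in range(2):
        prior = Pa[:, 1 - i]                  # product over j != i (here m = 2)
        in0 = labels[:, i] == 0
        L[i] = np.log(like[in0] @ prior[in0]) - np.log(like[~in0] @ prior[~in0])
    return L

# Transmit u = (0, 0): coded bits (u0, u1, u0, u1) on two symbols, noise-free
y = [points[0], points[0]]
La = [np.zeros(2), np.zeros(2)]               # demapper a priori, starts at zero
for _ in range(3):
    Le = [demap(y[0], La[0]), demap(y[1], La[1])]   # demapper extrinsic out
    La = [Le[1].copy(), Le[0].copy()]         # decoder extrinsic = partner's LLR
```

Each pass sharpens the LLRs because partial knowledge of one bit reshapes the other bit's subset geometry; with a Gray labelling the same loop would leave the LLRs unchanged.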
BICM-ID Receiver Block Diagram
Theorem: Fixed-Point Rate of a Converged BICM-ID Receiver
Consider BICM-ID with a binary linear code of rate $R$ and a labelling $\mu$ on a memoryless channel with input SNR $\gamma$. Assume the Gaussian-LLR model holds (s02) and the interleaver is long. Let $(I_A^*, I_E^*)$ be a fixed point of the iteration — i.e., a point satisfying $I_E^* = T_{\mathrm{dem}}(I_A^*)$ and $I_A^* = T_{\mathrm{dec}}(I_E^*)$. If $(I_A^*, I_E^*) = (1, 1)$, the receiver achieves arbitrarily low BER and the effective information rate through the combined channel is $R\,m$ bits per channel use. If instead the first fixed point strictly below $(1, 1)$ is attracting, the iteration stalls there and the BER is bounded away from zero.
A fixed point is where the two maps cross. The iteration walks from $(0, 0)$ — no a priori, no extrinsic — up the staircase until it hits a crossing. If the crossing is $(1, 1)$, the code's bits are all perfectly known and the BER is zero. If it is anything less, the bits are only partially known and there is an error floor. The whole BICM-ID design story is about arranging the two curves so that the FIRST crossing encountered on the way up is $(1, 1)$ itself.
A fixed point is invariant under one round-trip of the iteration: after demapper + deinterleaver + decoder + interleaver, the MI values are unchanged.
The BER is a strictly decreasing function of the a-posteriori MI, which at a fixed point is determined by $(I_A^*, I_E^*)$ through the LLR variances (see s02).
At $(I_A^*, I_E^*) = (1, 1)$ the a-posteriori LLR has infinite variance, hence the BER is zero.
Step 1: Fixed-point characterisation
Writing one round-trip of the BICM-ID iteration as the composition $T = T_{\mathrm{dec}} \circ T_{\mathrm{dem}}$, the fixed points are solutions of $T(I) = I$. Equivalently, they are the crossings of the demapper curve $I_E = T_{\mathrm{dem}}(I_A)$ and the INVERTED decoder curve $I_E = T_{\mathrm{dec}}^{-1}(I_A)$ on the $(I_A, I_E)$-plane.
Step 2: BER at a fixed point
Under the Gaussian-LLR model, the a-posteriori LLR of a bit at the decoder output has mean $\mu = \sigma^2/2$ and variance $\sigma^2$, with $\sigma$ determined by the a-priori MI and channel observation MI. The bit error rate is $\mathrm{BER} = Q(\sigma/2)$, a strictly decreasing function of $\sigma$. At $(I_A^*, I_E^*) = (1, 1)$, $\sigma \to \infty$ and $\mathrm{BER} \to 0$.
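In code, the fixed-point BER under the consistent-Gaussian assumption is a one-liner (function names are ours):

```python
import math

def qfunc(x):
    """Gaussian tail function Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def ber_consistent_gaussian(sigma):
    """BER of a sign decision on a consistent Gaussian LLR ~ N(sigma^2/2, sigma^2):
    an error occurs when the LLR falls below zero, i.e. Q(mu/sigma) = Q(sigma/2)."""
    return qfunc(sigma / 2.0)
```

At $\sigma = 0$ this gives 0.5 (no information) and it decreases monotonically toward 0 as $\sigma \to \infty$, matching the theorem's limiting behaviour.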
Step 3: Achievability of rate $R$
The iteration passes exactly $R\,m$ bits per channel use from transmitter to receiver with vanishing BER; the overhead is zero because the iteration itself does not require any rate-costly signalling. The fixed point at $(1, 1)$ therefore achieves rate $R\,m$ at the operating SNR, which is the information-theoretic rate of the iterative scheme.
Why Extrinsic — Not A Posteriori
The subtraction $L^e_i = L^p_i - L^a_i$ is not a mere bookkeeping convention. It is the MECHANISM that prevents BICM-ID from becoming a runaway positive-feedback system. If the demapper received $L^p$ from the decoder (the full a-posteriori), it would effectively be counting the channel observation twice: once directly through $p(y \mid x)$, and a second time through the a-priori term, which the decoder computed FROM that same $y$. The "new" estimate would merely re-emphasise the old one, and after a few iterations every bit would be declared confident and wrong in equal measure. Passing only the extrinsic part keeps each iteration honest: the demapper learns from code structure (via the decoder's extrinsic), not from its own prior LLR.
This is the same principle that makes turbo decoding work, and it generalises to turbo equalisation, iterative MIMO detection, and iterative source-channel decoding. Memorise it.
Common Mistake: Feeding the A-Posteriori Back (Positive Feedback)
Mistake:
In a first implementation, it is tempting to simplify the BICM-ID receiver by feeding the decoder's a-posteriori LLRs back to the demapper — after all, these are "the best estimates we have" of the coded bits. Surely using better a priori can only help?
Correction:
The a-posteriori LLR for bit $i$ contains the CHANNEL observation (through the demapper) plus the code-induced cross-information from the other bits. Feeding it back as a-priori to the demapper makes the demapper multiply $p(y \mid x)$ by an a-priori factor that already contains $y$ — the channel is now counted twice. The iteration no longer tracks MI correctly: confidence inflates without justification, LLRs diverge to $\pm\infty$, and decisions become arbitrary. The correct quantity to exchange is the EXTRINSIC part $L^e_i = L^p_i - L^a_i$, which removes the double-counted contribution. Every EXIT-chart analysis, every density-evolution proof, every convergence theorem in this chapter relies on extrinsic feedback. Posterior feedback is a bug.
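The runaway is easy to reproduce numerically. In the toy below (our own construction), one bit is repeated across two BPSK observations with channel LLR 1.0 each; extrinsic exchange is stationary after one pass, while a-posteriori feedback inflates the prior by the channel LLR on every pass, with no new evidence arriving:

```python
import numpy as np

# One bit, repeated over two BPSK uses; each copy has channel LLR 1.0.
# Correct exchange: each side passes only its channel LLR (the extrinsic).
# Buggy exchange: each side passes its full a-posteriori back.
Lch = np.array([1.0, 1.0])       # channel LLRs for the two copies
La = np.zeros(2)                 # a priori, extrinsic exchange
La_bug = np.zeros(2)             # a priori, posterior-feedback exchange
for _ in range(10):
    Lapp = Lch + La
    La = (Lapp - La)[::-1]       # extrinsic to the partner: stays at Lch
    Lapp_bug = Lch + La_bug
    La_bug = Lapp_bug[::-1]      # BUG: full a-posteriori fed back
```

The correct exchange settles at an a-posteriori of 2.0 — the two honest observations combined — while the buggy loop's confidence grows by 1.0 per pass without bound.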
Example: Demapper-with-A-Priori for QPSK with Gray Labelling
Consider QPSK with Gray labelling: the four constellation points are $(\pm 1 \pm j)/\sqrt{2}$ with labels $(b_0, b_1)$, where $b_0$ sets the sign of the I component and $b_1$ the sign of the Q component. The channel is AWGN, $y = x + n$ with $n \sim \mathcal{CN}(0, N_0)$. The decoder supplies a-priori LLRs $L^a_0, L^a_1$. Compute the demapper extrinsic LLR for bit 0.
Factor the bit subsets
Bit 0 is the I-axis LSB of Gray QPSK: $b_0 = 0 \Rightarrow x_I = +1/\sqrt{2}$, $b_0 = 1 \Rightarrow x_I = -1/\sqrt{2}$. Bit 1 is the Q-axis LSB, independent. Thus the a-priori factors across axes: $P_a(b_0, b_1) = P_a(b_0)\,P_a(b_1)$.
Separate I and Q contributions
Because $p(y \mid x) = p(y_I \mid x_I)\,p(y_Q \mid x_Q)$ and the label bits decouple across axes, the demapper-with-a-priori LLR for bit 0 reduces to $$L_0(y) = \log \frac{p(y_I \mid x_I = +1/\sqrt{2})}{p(y_I \mid x_I = -1/\sqrt{2})} = \frac{2\sqrt{2}\, y_I}{N_0}.$$ Note that $L^a_1$ (the Q-axis a priori) drops out because the Q-axis subsets are identical on both sides of the bit-0 partition — Gray decoupling.
Extrinsic is the full LLR (for Gray QPSK)
The input a-priori for bit 0 was $L^a_0$, not included in the demapper's bit-0 computation (it would be double-counting), so $L^e_0 = L_0(y)$. Equivalently, for Gray QPSK the demapper extrinsic is just the channel LLR, independent of the a priori: $L^e_0 = 2\sqrt{2}\, y_I / N_0$. BICM-ID on Gray QPSK is pointless because the two label bits are already independent bit channels. This is exactly the sense in which Gray is "one-shot optimal" and BICM-ID offers nothing extra — a pattern that generalises (s04) to all labellings that fully decouple bit channels.
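The decoupling is easy to verify numerically. The sketch below (names ours) computes the bit-0 demapper extrinsic for Gray QPSK via the full four-point sums and confirms it ignores the Q-axis a priori entirely:

```python
import numpy as np

a = 1 / np.sqrt(2)
# Gray QPSK: bit 0 sets the sign of I, bit 1 the sign of Q
points = np.array([a*(1+1j), a*(1-1j), a*(-1+1j), a*(-1-1j)])
labels = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])   # rows = (b0, b1)

def demap_bit0(y, La1, N0):
    """Demapper extrinsic for bit 0, given a-priori La1 on bit 1."""
    like = np.exp(-np.abs(y - points) ** 2 / N0)
    P1 = np.where(labels[:, 1] == 0,
                  1 / (1 + np.exp(-La1)),   # P(b1 = 0)
                  1 / (1 + np.exp(La1)))    # P(b1 = 1)
    in0 = labels[:, 0] == 0
    return np.log(like[in0] @ P1[in0]) - np.log(like[~in0] @ P1[~in0])
```

For any value of `La1` the result equals the closed form `2 * sqrt(2) * y.real / N0`: the Q-axis factors appear identically in numerator and denominator and cancel.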
Example: Demapper-with-A-Priori for 16-QAM with Set-Partition Labelling
Consider 16-QAM with the set-partition labelling: the MSB splits the constellation into two 8-point QAM lattices of large minimum distance, and successively finer bits resolve inside each subset. Suppose the decoder supplies an a-priori LLR $L^a_0 \to +\infty$ for the MSB (perfect knowledge that bit 0 = 0). Describe what the demapper LLRs for bits 1, 2, 3 become, and contrast with the case $L^a_0 = 0$.
Zero a priori: one-shot BICM demapper
With $L^a = \mathbf{0}$, the demapper sums over all 16 points when computing $L_i(y)$ for any bit $i$. For the LSB of the I-axis under SP, pairs of neighbours at minimum distance alternate between bit 0 = 0 and bit 0 = 1, so the two subsets $\mathcal{X}_i^0$ and $\mathcal{X}_i^1$ are finely interleaved — the demapper cannot tell "bit 1 = 0" from "bit 1 = 1" with high confidence. This is exactly why SP labelling has lower BICM capacity than Gray (Ch. 5).
Perfect a priori on bit 0: constellation shrinks to 8 points
With $L^a_0 \to +\infty$, the demapper sums ONLY over the 8 points with $b_0 = 0$ — the 8-QAM sub-constellation of larger minimum distance. For a fixed label bit $i \in \{1, 2, 3\}$, the two subsets $\mathcal{X}_i^0$ and $\mathcal{X}_i^1$ are now separated by the SP sub-partition's minimum distance, a factor of $\sqrt{2}$ to $2$ larger than the original. The demapper LLRs become much sharper.
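The distance claim can be checked directly. The snippet below is our own stand-in for the chapter's SP map — a 16-QAM grid whose MSB is the checkerboard parity, i.e. the first level of a set-partition labelling — and compares the minimum distance of the full constellation with that of the sub-lattice selected by perfect MSB knowledge:

```python
import numpy as np

# 16-QAM on the grid {-3,-1,1,3}^2 (unnormalised for clarity)
pts = np.array([complex(i, q) for i in (-3, -1, 1, 3) for q in (-3, -1, 1, 3)])
# First SP partition level: checkerboard parity splits the grid into two
# interleaved 8-point sub-lattices (our stand-in for the MSB)
msb = np.array([(ii + qq) % 2 for ii in range(4) for qq in range(4)])

def min_dist(points):
    """Minimum pairwise Euclidean distance of a set of complex points."""
    return min(abs(p - q) for k, p in enumerate(points) for q in points[k + 1:])

d_full = min_dist(pts)            # 2: horizontal/vertical neighbours on the grid
d_sub = min_dist(pts[msb == 0])   # 2*sqrt(2): only diagonal neighbours survive
```

Conditioning on the MSB multiplies the operative minimum distance by $\sqrt{2}$; each further partition level repeats the trick, which is the gain mechanism Section 4 quantifies.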
The BICM-ID gain mechanism for SP
This is the quantitative version of the "SP wins under iteration" story. SP concentrates bit-discrimination information in the MSB; learning the MSB via iteration unlocks larger minimum distances for the remaining bits. Gray distributes discrimination evenly, so learning one bit does not unlock much for the others — and BICM-ID gains little. Section 4 quantifies this via EXIT curve steepness.
Quick Check
In BICM-ID, what is exchanged between the demapper and the decoder on each iteration?
The channel LLRs (demapper computes them once and decoder reuses them every iteration).
The full a-posteriori LLRs from each box.
The extrinsic LLRs: a-posteriori minus a-priori at each box's output.
The hard decisions after each decoder pass.
Correct. The extrinsic decomposition enforces that each box's output reflects only what that box has LEARNED, not what it was told. This is the fundamental mechanism of iterative decoding.
Key Takeaway
BICM-ID is a two-box serial iteration with extrinsic-only exchange. The demapper with a priori sharpens per-bit LLRs using the code's cross-bit correlations (via decoder extrinsic), and the SISO decoder sharpens a-posteriori LLRs using the refined per-bit LLRs (via demapper extrinsic). Subtracting the a-priori at each output is what keeps the iteration from double-counting. Gray labelling gets nothing from the loop — it fully decouples bits already. Set-partition labelling gets a lot — it concentrates bit information and the loop unlocks the concentration.
Why This Matters: BICM-ID in DVB-S2X Very-Low-SNR Modes
The DVB-S2X standard extends DVB-S2 with a set of very-low-SNR MODCODs (down to QPSK at very low code rates, operating at SNRs well below 0 dB) for sub-1 m satellite terminals in tropical-scintillation environments. At these SNRs, the one-shot BICM receiver leaves several tenths of a dB on the table relative to CM capacity — enough that the standard includes an optional iterative demapping mode. The receiver performs 3–5 passes between the APSK demapper and the LDPC decoder before declaring a codeword. In practice this closes about 0.3 dB of the gap on the VL-SNR modes and pushes the link budget just over the rain-fade edge that the system was designed for.
Extrinsic Information
At any soft-in/soft-out decoding box, the extrinsic LLR for a bit is the a-posteriori LLR minus the a-priori LLR for that same bit. It represents the box's learning — the information contributed by the box from sources other than its own input for that bit. In BICM-ID, extrinsic LLRs are the only quantity exchanged between demapper and decoder, to prevent positive feedback.
Related: A-Posteriori LLR, SISO Decoding, Turbo Principle
BICM-ID (Bit-Interleaved Coded Modulation with Iterative Decoding)
A BICM receiver architecture that iterates between a demapper-with-a-priori and a soft-in/soft-out decoder, exchanging extrinsic information through a bit interleaver. Introduced by Li and Ritcey (1997) and ten Brink–Speidel–Yan (1998), BICM-ID can close most of the gap between BICM capacity and CM capacity when combined with a set-partition (or similar) labelling.
Related: The Open Loop in One-Shot BICM, Iterative Decoding, Exit Chart