Multistage Decoding and Error Propagation

Decoding Level by Level

The capacity rule of s02 is an achievability statement: there exist codes at rates $R_i = C_i$ that, together with the right receiver, approach the CM capacity. What is the right receiver?

An optimal joint decoder would weigh all $2^{nR_0} \times 2^{nR_1} \times \cdots \times 2^{nR_{L-1}}$ possible label-bit sequences against the received signal: exponentially many in the total rate. That is out of reach in practice. Multistage decoding (MSD) instead exploits the conditioning structure of the capacity rule directly: decode the levels one at a time, passing the decoded bits as side information to the next stage. The receiver's work at stage $i$ is a single-level binary decode of complexity $O(2^{nR_i})$ (already exponential, but managed by the binary-decoder pipeline), and the total complexity is the sum, not the product.

The price of this simplification is conceptually subtle: if a stage makes a decoding error, the next stage receives a wrong history and operates on a shifted coset. This is error propagation, and understanding it is what this section is about.


Multistage Decoding (MSD) for Multilevel Codes

Complexity: $O\bigl(\sum_{i=0}^{L-1} n \cdot 2^{L-i} + D_i(n, R_i)\bigr)$, where $D_i(n, R_i)$ is the decoding cost of the binary code $\mathcal{C}_i$. The first term sums per-symbol LLR computations over the shrinking coset sizes $2^{L-i}$; the second term is the per-stage binary decode.
Input: Received signal $\mathbf{y} = (y^{(1)}, \ldots, y^{(n)})$, $n$ channel uses, $L$ binary codes $\mathcal{C}_0, \ldots, \mathcal{C}_{L-1}$ of rates $R_0, \ldots, R_{L-1}$, partition-based labelling $\mu$.
Output: Estimates $\hat{b}_0, \hat{b}_1, \ldots, \hat{b}_{L-1}$ of the binary codewords at all levels.
1. Stage 0 (top level, coarsest partition).
2. \quad For each $t = 1, \ldots, n$, compute the bit-level log-likelihood ratio
3. \quad\quad $\Lambda_0^{(t)} = \log \dfrac{\sum_{x \in \mathcal{A}_1^{(0)}} p(y^{(t)} \mid x)}{\sum_{x \in \mathcal{A}_1^{(1)}} p(y^{(t)} \mid x)}$
4. \quad Feed $\{\Lambda_0^{(t)}\}$ to the decoder of $\mathcal{C}_0$; obtain $\hat{b}_0 = (\hat b_0^{(1)}, \ldots, \hat b_0^{(n)})$.
5. For $i = 1, 2, \ldots, L - 1$ do:
6. \quad For each $t = 1, \ldots, n$, compute the conditional LLR
7. \quad\quad $\Lambda_i^{(t)} = \log \dfrac{\sum_{x \in \mathcal{A}_{i+1}^{(\hat b_0^{(t)}, \ldots, \hat b_{i-1}^{(t)}, 0)}} p(y^{(t)} \mid x)}{\sum_{x \in \mathcal{A}_{i+1}^{(\hat b_0^{(t)}, \ldots, \hat b_{i-1}^{(t)}, 1)}} p(y^{(t)} \mid x)}$
8. \quad Feed $\{\Lambda_i^{(t)}\}$ to the decoder of $\mathcal{C}_i$; obtain $\hat{b}_i = (\hat b_i^{(1)}, \ldots, \hat b_i^{(n)})$.
9. End for.
10. Return $(\hat{b}_0, \hat{b}_1, \ldots, \hat{b}_{L-1})$.

At stage $i$ the LLR computation involves a sum over the $2^{L-i}$ points inside the current coset, a natural consequence of marginalising over the still-unknown levels $i+1, \ldots, L-1$. The MLC/MSD combination therefore has linear complexity in $L$, compared with joint decoding's exponential complexity in the total rate.
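A minimal, self-contained sketch of the algorithm above, assuming 8-PSK ($L = 3$) with natural set-partition labelling over AWGN. The per-level "decoders" are hard decisions on the LLRs, standing in for real binary decoders of the $\mathcal{C}_i$; the constellation and these stand-ins are illustrative choices, not fixed by the text.

```python
import numpy as np

# Minimal MSD sketch: 8-PSK (L = 3 levels), natural set-partition
# labelling, AWGN. Hard decisions stand in for the binary decoders.

L = 3
M = 2 ** L
points = np.exp(2j * np.pi * np.arange(M) / M)  # unit-energy 8-PSK

def label_bit(m, i):
    """Label bit of level i for point index m (bit 0 = coarsest partition)."""
    return (m >> i) & 1

def stage_llrs(y, history, i, sigma2):
    """Conditional LLRs for level i given decoded bits of levels 0..i-1.

    Marginalises p(y | x) over the 2^(L-i) points of the current coset,
    split by the value of label bit i (steps 2-3 and 6-7 of the algorithm).
    """
    llr = np.empty(len(y))
    for t, yt in enumerate(y):
        num = den = 1e-300  # guard against underflowing likelihood sums
        for m in range(M):
            # keep only points consistent with the decoded history at symbol t
            if any(label_bit(m, j) != history[j][t] for j in range(i)):
                continue
            lik = np.exp(-abs(yt - points[m]) ** 2 / sigma2)
            if label_bit(m, i) == 0:
                num += lik
            else:
                den += lik
        llr[t] = np.log(num) - np.log(den)
    return llr

def msd_decode(y, sigma2):
    """Decode the L levels in order, passing estimates down as history."""
    history = []
    for i in range(L):
        llr = stage_llrs(y, history, i, sigma2)
        history.append((llr < 0).astype(int))  # stand-in for decoder of C_i
    return history

# Usage: random symbols through AWGN, then per-level error rates.
rng = np.random.default_rng(0)
n, snr_db = 5000, 12.0
sigma2 = 10 ** (-snr_db / 10)   # noise variance for unit symbol energy
m_tx = rng.integers(0, M, size=n)
noise = np.sqrt(sigma2 / 2) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
b_hat = msd_decode(points[m_tx] + noise, sigma2)
for i in range(L):
    print(f"level {i}: BER = {np.mean(b_hat[i] != ((m_tx >> i) & 1)):.4f}")
```

Running it shows the expected set-partitioning behaviour: level 0 (smallest intra-coset distance after marginalisation) is the least reliable, and later levels benefit from the shrinking cosets.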


Effective BER at MSD Stages with Error Propagation

The plot shows the effective bit-error rate at each MSD stage as a function of SNR, assuming the previous stage completes with a given residual BER. For small prior-stage BER (say $10^{-4}$) the penalty is negligible. As the prior BER grows, the later-stage curve shifts rightward: with some probability the wrong coset is assumed and the effective squared distance is halved, costing up to $3$ dB. This is the quantitative statement of error propagation.

[Interactive figure: effective BER per MSD stage vs SNR; adjustable prior-stage residual BER, shown here at $10^{-3}$.]
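A hedged sketch of how such curves can be regenerated, under a simple two-term mixture model: with probability $1 - p$ the stage sees the genie squared distance $d^2$, and with probability $p$ the halved distance $d^2/2$. The Q-function pairwise-error model below is an illustrative assumption, not the text's exact BER expression.

```python
from math import erfc, sqrt

def qfunc(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * erfc(x / sqrt(2))

def effective_ber(snr_db, p_prev, d2=2.0):
    """Two-term mixture model of a stage's BER under error propagation:
    genie distance d2 w.p. 1 - p_prev, halved distance w.p. p_prev."""
    snr = 10 ** (snr_db / 10)
    ber_genie = qfunc(sqrt(d2 * snr / 2))
    ber_wrong = qfunc(sqrt((d2 / 2) * snr / 2))
    return (1 - p_prev) * ber_genie + p_prev * ber_wrong

for snr_db in (6, 8, 10, 12, 14):
    print(f"{snr_db:>2} dB: {effective_ber(snr_db, p_prev=1e-3):.2e}")
```

At high SNR the mixture is dominated by the $p \cdot Q(\cdot)$ term, which is exactly the rightward curve shift described above.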

Theorem: MSD Achieves the CM Capacity

Let $\{C_i\}_{i=0}^{L-1}$ be the binary sub-channel capacities defined in s02. For every $\epsilon > 0$ there exist binary codes of lengths $n_i$ and rates $R_i = C_i - \epsilon$ such that the MSD algorithm above recovers the transmitted label bits with aggregate error probability going to zero as $\min_i n_i \to \infty$. The total rate $\sum_i R_i \to \sum_i C_i = C_{\rm CM}$. Consequently, MSD achieves the full CM capacity, and does so with per-stage complexity determined by binary decoding alone.

At stage 0 the decoder operates on a binary channel of capacity $C_0$ at rate $C_0 - \epsilon$, so its error probability vanishes. At stage $i > 0$ the decoder operates on a channel whose capacity is $C_i$ conditional on correct history. Because history errors vanish by the previous stages' reliability, the stage-$i$ error probability vanishes as well. A union bound over $L$ stages then gives a vanishing total error probability.
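One way to make the union-bound step explicit (this decomposition is implied rather than spelled out above): the first stage to fail necessarily had a correct history, so

$$\Pr(\text{any stage errs}) \;\le\; \sum_{i=0}^{L-1} \Pr\bigl(\hat b_i \ne b_i \,\big|\, \hat b_j = b_j \text{ for all } j < i\bigr),$$

and each summand is a genie-aided stage error probability, vanishing because stage $i$ runs at rate $C_i - \epsilon$ on a channel of conditional capacity $C_i$.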


Theorem: Error-Propagation Bound at Stage $i$

Let $p_{<i} = \Pr(\text{at least one of } \hat b_0, \ldots, \hat b_{i-1} \text{ is in error on a given symbol})$ and let $P_{e,i}^{\rm genie}$ be the stage-$i$ bit-error probability when the history is delivered by a genie. The actual stage-$i$ bit-error probability in MSD satisfies

$$P_{e,i} \;\le\; (1 - p_{<i}) \, P_{e,i}^{\rm genie} \;+\; p_{<i} \, P_{e,i}^{\rm wrong\text{-}coset} \;\le\; P_{e,i}^{\rm genie} + p_{<i},$$

where $P_{e,i}^{\rm wrong\text{-}coset} \le 1$ is the worst-case bit-error probability when the stage-$i$ decoder operates on the wrong coset.

On a given symbol, either the history is correct (probability $1 - p_{<i}$, genie-like performance) or it is wrong (probability $p_{<i}$, at most worst-case error). The bound says the extra error due to propagation is at most the probability of a history error.

Example: Quantifying the Propagation Penalty at Level 1

Suppose an MSD receiver for 8-PSK operates with stage-0 residual BER $p_{<1} = 10^{-3}$. The genie stage-1 BER at some SNR is $P_{e,1}^{\rm genie} = 10^{-4}$ (hypothetical). Under a pessimistic model, a history error at stage 0 causes stage 1 to operate with effective squared distance halved (from $d_1^2 = 2$ to $d_1^2 / 2 = 1$). What is the total stage-1 BER, and by how many dB is stage 1 degraded compared with the genie case?
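One way to work the numbers, assuming the Q-function pairwise-error model from the figure sketch above. The inverse-Q step recovers an operating point consistent with the stated genie BER; these modelling choices are assumptions, not given in the example.

```python
from statistics import NormalDist
from math import sqrt, log10

norm = NormalDist()
qfunc = lambda x: 1 - norm.cdf(x)
qinv = lambda p: norm.inv_cdf(1 - p)

p_prev, ber_genie = 1e-3, 1e-4

# Q-argument consistent with the genie BER at squared distance d1^2 = 2.
a = qinv(ber_genie)
# Halving the squared distance scales the Q-argument by 1/sqrt(2).
ber_wrong = qfunc(a / sqrt(2))

total = (1 - p_prev) * ber_genie + p_prev * ber_wrong
print(f"wrong-coset BER ~ {ber_wrong:.2e}")
print(f"total stage-1 BER ~ {total:.2e}  (bound: {ber_genie + p_prev:.1e})")
print(f"worst-case penalty: {10 * log10(2):.2f} dB when the coset is wrong")
```

Under these assumptions the total stage-1 BER is about $1.04 \times 10^{-4}$, only a few percent above the genie value (the generic bound $P_{e,1}^{\rm genie} + p_{<1} \approx 1.1 \times 10^{-3}$ is much looser), while the worst-case penalty, realised only when the coset is actually wrong, is the $10 \log_{10} 2 \approx 3$ dB from the halved squared distance.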

MSD as Outer-Inner Code Concatenation

The MSD structure is best understood as a concatenation where the constellation plays the role of the inner code and each level's binary code plays the role of an outer code.

At stage 0 the inner "code" is the partition $\mathcal{A}_0 = \mathcal{X}$ viewed as a binary-input channel ($b_0$ selects one of the two sub-constellations $\mathcal{A}_1^{(0)}$ and $\mathcal{A}_1^{(1)}$); the outer code $\mathcal{C}_0$ protects $b_0$. At stage 1, given $\hat b_0$, the inner code is the binary partition of $\mathcal{A}_1^{(\hat b_0)}$ into two halves; $\mathcal{C}_1$ protects $b_1$. And so on.
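To make the "binary-input channel" reading concrete, here is a short Monte Carlo estimate of the stage-0 inner-channel capacity $C_0 = I(b_0; Y)$, assuming the same illustrative 8-PSK/AWGN setup as the decoder sketch above (equiprobable points; the SNR value is arbitrary):

```python
import numpy as np

# Monte Carlo estimate of the stage-0 inner-channel capacity I(b0; Y)
# for 8-PSK over AWGN: the binary-input channel seen by the outer code C0.
rng = np.random.default_rng(1)
M, n, snr_db = 8, 200_000, 10.0
sigma2 = 10 ** (-snr_db / 10)
pts = np.exp(2j * np.pi * np.arange(M) / M)

m = rng.integers(0, M, size=n)
y = pts[m] + np.sqrt(sigma2 / 2) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))

lik = np.exp(-np.abs(y[:, None] - pts[None, :]) ** 2 / sigma2)  # p(y|x) up to a constant
even = lik[:, 0::2].mean(axis=1)  # p(y | b0 = 0): average over the even-index coset
odd = lik[:, 1::2].mean(axis=1)   # p(y | b0 = 1): average over the odd-index coset
p_cond = np.where((m & 1) == 0, even, odd)
print("C0 ~", np.mean(np.log2(p_cond / (0.5 * (even + odd)))), "bits per symbol")
```

The estimate is exactly the mutual information that the capacity rule assigns to level 0, computed without any reference to a particular outer code: the concatenation view separates the inner channel from whatever $\mathcal{C}_0$ is layered on top.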

This is the same pattern as Reed–Solomon over Galois fields combined with a binary inner code — except here the "Galois field" structure is the Euclidean-distance structure of the constellation. The point is that the concatenation turns the non-binary CM problem into a sequence of binary coding problems, one per level — and that is why we can re-use the entire binary coding toolchain.


Common Mistake: MSD is not joint ML, but achieves the same capacity

Mistake:

Believing that because MSD is a sequential decoder it cannot achieve the CM capacity — surely a joint ML decoder over all levels simultaneously would do better.

Correction:

Joint ML over all levels minimises block error probability at a given finite code length. But from an information-theoretic standpoint both receivers achieve the same capacity $C_{\rm CM} = \sum_i C_i$. The chain rule of mutual information is blind to how the decoder operates; it only describes what rates are achievable, given some decoder that approaches the binary-channel thresholds at each stage. MSD does exactly that. The finite-length gap between MSD and joint ML is usually small and can be closed by iterative feedback between stages (turbo-MSD).

Why This Matters: From MSD to Iterative Bit-Level Decoders

Modern receivers for coded modulation — whether MLC-style or BICM-style — rarely run pure MSD any more. Instead they iterate between the detector (LLR computation) and the outer code, passing extrinsic information back and forth. This is the bit-interleaved iterative decoding (BIID) pattern that powers every LDPC/turbo-code-based wireless modem from 3G onwards.

The MSD analysis of this section is the information-theoretic skeleton of that iterative receiver: the capacity rule still dictates the asymptotic rates, but at finite length the iterations recover the joint-ML performance (or most of it) at modest complexity. We revisit this pattern in the BICM chapters (Chs 5–9) and in the MIMO detection chapters of the MIMO book.


Multistage decoding (MSD)

A sequential decoder for multilevel codes that decodes one binary code per level in order, using the decoded bits from previous stages as side information for the next. Achieves the CM capacity when paired with the capacity-rule rate allocation.

Related: Multilevel Code (MLC) Encoder, The Capacity Rule for MLC, Error propagation (in MSD)

Error propagation (in MSD)

The phenomenon whereby decoding errors at an early MSD stage are passed to later stages as incorrect history, causing them to operate on the wrong coset. Quantified by the bound $P_{e,i} \le P_{e,i}^{\rm genie} + p_{<i}$.

Related: MSD Achieves the CM Capacity, Genie Aided

Quick Check

How does the computational complexity of MSD scale with the number of levels $L = \log_2 M$, compared with a joint ML decoder over all levels?

Both scale the same way, linearly in $L$

MSD scales linearly in $L$; joint ML scales exponentially in $\sum_i R_i$

MSD is exponential in $L$; joint ML is linear

Both are exponential, but MSD has a smaller exponent