Multistage Decoding and Error Propagation
Decoding Level by Level
The capacity rule of s02 is an achievability statement: there exist codes at rates $R_\ell \approx C_\ell$ that, together with the right receiver, approach the CM capacity $C_{\mathrm{CM}}$. What is the right receiver?
An optimal joint decoder would weigh all possible label-bit sequences against the received signal — exponentially many in the total rate. That is out of reach in practice. Multistage decoding (MSD) instead exploits the conditioning structure of the capacity rule directly: decode the levels one at a time, passing the decoded bits as side information to the next stage. The receiver's work at stage $\ell$ is a single-level binary decode of complexity $O(2^{nR_\ell})$ under brute-force ML (already exponential, but managed by the binary-decoder pipeline), and the total complexity is the sum $\sum_\ell O(2^{nR_\ell})$, not the product $O(2^{n\sum_\ell R_\ell})$.
The price of this simplification is conceptually subtle: if a stage makes a decoding error, the next stage receives a wrong history and operates on a shifted coset. This is error propagation, and understanding it is what this section is about.
Multistage Decoding (MSD) for Multilevel Codes
Complexity: $\sum_{\ell=0}^{L-1}\bigl(n\,2^{L-\ell} + \chi(\mathcal{C}_\ell)\bigr)$, where $\chi(\mathcal{C}_\ell)$ is the decoding cost of the binary code $\mathcal{C}_\ell$. The first term sums per-symbol LLR computations over the shrinking coset sizes $2^{L-\ell}$; the second term is the per-stage binary decode. At stage $\ell$ the LLR computation involves a sum over the $2^{L-\ell}$ points inside the current coset — a natural consequence of marginalising over the still-unknown levels $\ell+1, \dots, L-1$. The MLC/MSD combination therefore has linear complexity in $L$, compared with joint decoding's exponential complexity in the total rate.
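To make the stage-by-stage flow concrete, here is a minimal sketch of the MSD loop in Python. The unit-energy 8-PSK constellation, the natural labeling $m = b_0 + 2b_1 + 4b_2$, and the placeholder `decode_level` (an uncoded hard decision standing in for the binary decoder of $\mathcal{C}_\ell$) are illustrative assumptions, not a prescribed implementation.

```python
import numpy as np

# Sketch of the MSD receiver loop for 8-PSK (assumed setup: unit-energy points,
# natural labeling m = b0 + 2*b1 + 4*b2, complex AWGN with noise level N0, and
# an uncoded hard decision standing in for the per-level binary decoder).
CONST = np.exp(2j * np.pi * np.arange(8) / 8)                          # 8-PSK points
BITS = np.array([[(m >> l) & 1 for l in range(3)] for m in range(8)])  # m -> (b0,b1,b2)

def level_llrs(y, history, level, N0):
    """LLRs for bit `level`, marginalising over the points of the coset
    consistent with the already-decoded history bits."""
    in_coset = np.all(BITS[None, :, :level] == history[:, None, :], axis=2)  # (n_sym, 8)
    metric = np.exp(-np.abs(y[:, None] - CONST[None, :]) ** 2 / N0)          # likelihoods
    num = (metric * (in_coset & (BITS[:, level] == 0))).sum(axis=1)
    den = (metric * (in_coset & (BITS[:, level] == 1))).sum(axis=1)
    return np.log(num + 1e-300) - np.log(den + 1e-300)

def decode_level(llrs):
    """Placeholder for the binary decoder of code C_l (here: hard decision)."""
    return (llrs < 0).astype(int)

def msd_receive(y, N0, n_levels=3):
    """Decode level by level, feeding each stage's decisions forward as history."""
    history = np.zeros((len(y), 0), dtype=int)
    for level in range(n_levels):
        bits_hat = decode_level(level_llrs(y, history, level, N0))
        history = np.hstack([history, bits_hat[:, None]])  # side info for next stage
    return history                                         # (n_sym, n_levels) label bits

# Example: decode a few noisy symbols at Es/N0 = 10 dB and compare with the truth.
rng = np.random.default_rng(0)
m, N0 = rng.integers(8, size=5), 0.1
y = CONST[m] + np.sqrt(N0 / 2) * (rng.standard_normal(5) + 1j * rng.standard_normal(5))
print(msd_receive(y, N0))
print(BITS[m])
```

Note how the coset mask shrinks by a factor of two at each stage, matching the $2^{L-\ell}$ term in the complexity count above.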
Effective BER at MSD Stages with Error Propagation
The plot shows the effective bit-error rate at each MSD stage as a function of SNR, assuming the previous stage completes with a given residual BER $p_{\mathrm{prev}}$. For small prior-stage BER (say $10^{-5}$) the penalty is negligible. As the prior BER grows, the later-stage curve shifts rightward: with probability $p_{\mathrm{prev}}$ the wrong coset is assumed and the effective squared distance is halved, costing up to $10\log_{10}2 \approx 3$ dB. This is the quantitative statement of error propagation.
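Curves of this kind can be reproduced with a short script. The model below is an assumption for illustration (not the figure's actual data): the genie BER is taken as $Q(\sqrt{\delta^2\,\mathrm{SNR}})$, and a history error halves $\delta^2$, i.e. shifts the same curve 3 dB to the right.

```python
import numpy as np
from scipy.stats import norm

def effective_ber(snr_db, p_prev, d2=2.0):
    """Effective stage BER with error propagation, under an assumed Q-function
    model: genie BER = Q(sqrt(d2*SNR)); a wrong history halves the squared
    distance, so the wrong-coset BER is the same curve shifted 3 dB right."""
    snr = 10.0 ** (np.asarray(snr_db) / 10.0)
    p_genie = norm.sf(np.sqrt(d2 * snr))            # Q(x) = norm.sf(x)
    p_wrong = norm.sf(np.sqrt(0.5 * d2 * snr))
    return (1.0 - p_prev) * p_genie + p_prev * p_wrong

snr_grid = np.arange(4.0, 14.0, 2.0)
for p_prev in (0.0, 1e-5, 1e-2):
    print(p_prev, np.round(effective_ber(snr_grid, p_prev), 10))
```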
Theorem: MSD Achieves the CM Capacity
Let $C_0, C_1, \dots, C_{L-1}$ be the binary sub-channel capacities defined in s02. For every $\varepsilon > 0$ there exist binary codes $\mathcal{C}_0, \dots, \mathcal{C}_{L-1}$ of lengths $n$ and rates $R_\ell \ge C_\ell - \varepsilon$ such that the MSD algorithm above recovers the transmitted label bits with aggregate error probability going to zero as $n \to \infty$. The total rate is $R = \sum_{\ell=0}^{L-1} R_\ell \ge C_{\mathrm{CM}} - L\varepsilon$, where $C_{\mathrm{CM}} = \sum_\ell C_\ell$. Consequently, MSD achieves the full CM capacity, and does so with per-stage complexity determined by binary decoding alone.
At stage 0 the decoder operates on a binary channel of capacity $C_0$ at rate $R_0 < C_0$, so its error probability vanishes. At stage $\ell$ the decoder operates on a channel whose capacity $C_\ell$ is conditional on correct history $b_0, \dots, b_{\ell-1}$. Because history errors vanish by the previous stages' reliability, the stage-$\ell$ error probability vanishes as well. A union bound over stages then gives a vanishing total error probability.
Use the binary noisy-channel coding theorem at each stage.
Condition on the event 'all previous stages decoded correctly' and bound its complement by a sum of per-stage error probabilities.
A union bound over stages preserves exponential decay of error probability.
Per-stage error probability decays exponentially
At stage $\ell$, given correct decoded history $\hat b_0 = b_0, \dots, \hat b_{\ell-1} = b_{\ell-1}$, the LLR computation produces likelihoods for a binary channel of capacity $C_\ell$. Using a capacity-achieving binary code at rate $R_\ell < C_\ell$ with length $n$, the stage-$\ell$ conditional error probability satisfies $P_{e,\ell} \le 2^{-n E_r(R_\ell)}$ for some $E_r(R_\ell) > 0$ (Gallager random-coding exponent).
Union bound across stages
Let $E_\ell$ be the event that stage $\ell$ produces a wrong codeword. The aggregate error event is $E = E_0 \cup E_1 \cup \dots \cup E_{L-1}$. By the chain of conditional union bounds,
$$P(E) \le \sum_{\ell=0}^{L-1} P\bigl(E_\ell \mid E_0^c, \dots, E_{\ell-1}^c\bigr) \le \sum_{\ell=0}^{L-1} 2^{-n E_r(R_\ell)} \le L\, 2^{-n \min_\ell E_r(R_\ell)}.$$
This tends to zero as $n \to \infty$ because $L$ is a constant (depending only on the constellation) and $\min_\ell E_r(R_\ell) > 0$.
Total rate approaches capacity
The total information rate is $R = \sum_{\ell=0}^{L-1} R_\ell \ge \sum_\ell C_\ell - L\varepsilon = C_{\mathrm{CM}} - L\varepsilon$. As $\varepsilon \to 0$, the aggregate rate approaches $C_{\mathrm{CM}}$, with vanishing aggregate error probability.
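The chain-rule bookkeeping behind the proof can be checked numerically. The sketch below estimates the per-level capacities $C_\ell$ and the CM capacity $I(X;Y)$ for 8-PSK by Monte Carlo (unit symbol energy, natural labeling, complex AWGN — my assumed setup) and verifies that the levels sum to the CM capacity.

```python
import numpy as np

rng = np.random.default_rng(0)
CONST = np.exp(2j * np.pi * np.arange(8) / 8)
BITS = np.array([[(m >> l) & 1 for l in range(3)] for m in range(8)])

def capacities(snr_db, n=200_000):
    """Monte Carlo estimate of C_l = I(b_l; Y | b_0..b_{l-1}) and of the CM
    capacity I(X;Y) for 8-PSK over complex AWGN (unit symbol energy)."""
    N0 = 10 ** (-snr_db / 10)
    m = rng.integers(8, size=n)
    y = CONST[m] + np.sqrt(N0 / 2) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
    metric = np.exp(-np.abs(y[:, None] - CONST[None, :]) ** 2 / N0)    # (n, 8)
    # CM capacity: I(X;Y) = 3 - E[-log2 P(x | y)] for uniform inputs.
    p_x = metric[np.arange(n), m] / metric.sum(axis=1)
    c_cm = 3 + np.mean(np.log2(p_x))
    # Per-level capacities, conditioning on the *true* history (genie).
    c_levels = []
    for l in range(3):
        coset = np.all(BITS[None, :, :l] == BITS[m][:, None, :l], axis=2)  # (n, 8)
        same_bit = BITS[:, l][None, :] == BITS[m, l][:, None]
        num = (metric * (coset & same_bit)).sum(axis=1)
        den = (metric * coset).sum(axis=1)
        c_levels.append(1 + np.mean(np.log2(num / den)))   # 1 - E[-log2 P(b_l|y,hist)]
    return c_levels, c_cm

levels, cm = capacities(10.0)
print([round(c, 3) for c in levels], round(sum(levels), 3), round(cm, 3))
```

Up to Monte Carlo noise, the printed sum of the three levels matches the CM capacity, which is the chain rule in action.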
Theorem: Error-Propagation Bound at Stage $\ell$
Let $p_{<\ell}$ be the probability that the decoded history $\hat b_0, \dots, \hat b_{\ell-1}$ contains an error, and let $p_\ell^{\mathrm{genie}}$ be the stage-$\ell$ bit-error probability when the history is delivered by a genie. The actual stage-$\ell$ bit-error probability in MSD satisfies
$$p_\ell \le p_\ell^{\mathrm{genie}} + p_{<\ell}\, p_\ell^{\mathrm{wrong}},$$
where $p_\ell^{\mathrm{wrong}}$ is the worst-case bit-error probability when the stage-$\ell$ decoder operates on the wrong coset.
On a given symbol, either the history is correct (probability $1 - p_{<\ell}$, genie-like performance) or it is wrong (probability $p_{<\ell}$, at most worst-case error). The bound says the extra error due to propagation is at most the probability of a history error.
Condition on the event $H$ that the history is correct.
Apply the law of total probability.
Bound the conditional error under $H^c$ by $p_\ell^{\mathrm{wrong}}$ (worst-case).
Conditioning on the history event
Let $H$ be the event that the decoded history on a given symbol matches the true history. By the law of total probability,
$$p_\ell = P(H)\, P(\text{error} \mid H) + P(H^c)\, P(\text{error} \mid H^c).$$
Under $H$ the stage-$\ell$ decoder sees the genie channel, giving $P(\text{error} \mid H) = p_\ell^{\mathrm{genie}}$.
Worst-case bound on the propagated term
Under $H^c$, the stage-$\ell$ decoder operates on a shifted coset and its bit-error probability is at most $1$ (a trivial upper bound) and at least $p_\ell^{\mathrm{genie}}$. Using $P(H^c) = p_{<\ell}$ and bounding the worst-case conditional error by $p_\ell^{\mathrm{wrong}}$,
$$p_\ell \le (1 - p_{<\ell})\, p_\ell^{\mathrm{genie}} + p_{<\ell}\, p_\ell^{\mathrm{wrong}} \le p_\ell^{\mathrm{genie}} + p_{<\ell}\, p_\ell^{\mathrm{wrong}}.$$
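As a sanity check on this algebra, the toy simulation below measures all four quantities on uncoded 4-PAM (my assumed setup, not the text's): stage 0 selects the coset, stage 1 picks the point inside it. With the empirical wrong-history conditional error standing in for the worst case $p^{\mathrm{wrong}}$, the measured stage-1 BER must sit below the bound.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy check of the propagation bound (illustration under assumed parameters):
# 4-PAM {-3,-1,1,3}, set-partition labels b0 = m & 1 (coset select), b1 = m >> 1.
POINTS = np.array([-3.0, -1.0, 1.0, 3.0])
B0, B1 = np.array([0, 1, 0, 1]), np.array([0, 0, 1, 1])

def stage1(d2, b0_side):
    """Nearest point inside the assumed coset -> decision on b1."""
    masked = np.where(B0[None, :] == b0_side[:, None], d2, np.inf)
    return B1[masked.argmin(axis=1)]

n, sigma = 1_000_000, 0.7
m = rng.integers(4, size=n)
y = POINTS[m] + sigma * rng.standard_normal(n)
d2 = (y[:, None] - POINTS[None, :]) ** 2

# Stage 0: minimum-distance decision between the two cosets.
b0_hat = (np.where(B0 == 1, d2, np.inf).min(axis=1)
          < np.where(B0 == 0, d2, np.inf).min(axis=1)).astype(int)

p0 = np.mean(b0_hat != B0[m])                        # history-error probability
p1_actual = np.mean(stage1(d2, b0_hat) != B1[m])     # real MSD stage 1
p1_genie = np.mean(stage1(d2, B0[m]) != B1[m])       # genie-aided stage 1
wrong = b0_hat != B0[m]
p_cond = np.mean(stage1(d2, b0_hat)[wrong] != B1[m][wrong])  # error given wrong history

print(f"p1_actual = {p1_actual:.4e}")
print(f"bound     = {p1_genie + p0 * p_cond:.4e}"
      f"  (genie {p1_genie:.1e} + {p0:.1e} * {p_cond:.2f})")
```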
Example: Quantifying the Propagation Penalty at Level 1
Suppose an MSD receiver for 8-PSK operates with stage-0 residual BER $p_0 = 10^{-5}$. The genie stage-1 BER at some SNR is $p_1^{\mathrm{genie}} = 10^{-6}$ (hypothetical). Under a pessimistic model, a history error at stage 0 causes stage 1 to operate with effective squared distance halved (from $\delta_1^2$ to $\delta_1^2/2$). What is the total stage-1 BER, and by how many dB is stage 1 degraded compared with the genie case?
Apply the propagation bound
By Theorem thm-msd-error-propagation,
$$p_1 \le p_1^{\mathrm{genie}} + p_0\, p_1^{\mathrm{wrong}}.$$
Estimate the wrong-coset BER
With squared distance halved the $Q$-function argument is divided by $\sqrt{2}$, costing about $3$ dB. At SNRs around the genie operating point, $p_1^{\mathrm{wrong}}$ is typically an order of magnitude larger than $p_1^{\mathrm{genie}}$ — say $p_1^{\mathrm{wrong}} = 10^{-5}$ for this numerical example.
Combine
Substituting,
$$p_1 \le 10^{-6} + 10^{-5} \times 10^{-5} = 10^{-6} + 10^{-10} \approx 1.0001 \times 10^{-6}.$$
The propagation term is four orders of magnitude smaller than the genie term at this operating point — negligible. The rule of thumb: as long as $p_{<\ell}\, p_\ell^{\mathrm{wrong}} \ll p_\ell^{\mathrm{genie}}$, propagation causes only a negligible fraction of a dB of shift. For the capacity-rule rate allocation (where each stage operates at its own threshold) this condition is typically satisfied by a wide margin.
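The rule of thumb can be turned into a quick calculation. The helper below converts the bounded total BER into an equivalent SNR shift, assuming the stage's BER curve locally behaves like $Q(\sqrt{c\,\mathrm{SNR}})$ (an assumed model; the first set of numbers is the hypothetical one from the example above).

```python
import math
from scipy.stats import norm

def propagation_shift_db(p_genie, p_prev, p_wrong):
    """Equivalent SNR penalty of error propagation under an assumed local model
    BER = Q(sqrt(c * SNR)): invert Q at the genie and bounded-total BERs and
    compare the SNRs needed to reach them, in dB."""
    p_total = p_genie + p_prev * p_wrong
    a_genie, a_total = norm.isf(p_genie), norm.isf(p_total)   # Q^{-1}(p)
    return 10.0 * math.log10((a_genie / a_total) ** 2)

print(f"{propagation_shift_db(1e-6, 1e-5, 1e-5):.2e} dB")  # the worked example: ~0
print(f"{propagation_shift_db(1e-6, 1e-2, 1e-4):.3f} dB")  # much dirtier stage 0
```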
MSD as Outer-Inner Code Concatenation
The MSD structure is best understood as a concatenation where the constellation plays the role of the inner code and each level's binary code plays the role of an outer code.
At stage 0 the inner "code" is the partition viewed as a binary-input channel ($b_0$ selects one of the two sub-constellations $\mathcal{A}(0)$ and $\mathcal{A}(1)$); the outer code $\mathcal{C}_0$ protects $b_0$. At stage 1, given $b_0$, the inner code is the binary partition of $\mathcal{A}(b_0)$ into two halves; $\mathcal{C}_1$ protects $b_1$. And so on.
This is the same pattern as Reed–Solomon over Galois fields combined with a binary inner code — except here the "Galois field" structure is the Euclidean-distance structure of the constellation. The point is that the concatenation turns the non-binary CM problem into a sequence of binary coding problems, one per level — and that is why we can re-use the entire binary coding toolchain.
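The Euclidean-distance structure the analogy refers to is easy to exhibit: the snippet below computes the minimum intra-coset squared distance at each level of the 8-PSK partition chain (unit energy and natural labeling assumed, as before).

```python
import numpy as np
from itertools import combinations

pts = np.exp(2j * np.pi * np.arange(8) / 8)   # unit-energy 8-PSK

def min_d2(subset):
    """Minimum squared Euclidean distance within a subset of point indices."""
    return min(abs(pts[a] - pts[b]) ** 2 for a, b in combinations(subset, 2))

for level in range(3):
    # Fixing bits b_0..b_{level-1} partitions the constellation into cosets.
    cosets = {}
    for m in range(8):
        cosets.setdefault(m % (2 ** level), []).append(m)
    print(f"level {level}: min intra-coset squared distance = "
          f"{min(min_d2(s) for s in cosets.values()):.3f}")
# Prints 0.586, 2.000, 4.000: each partition step widens the distance,
# which is what lets each level carry a different code rate.
```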
Common Mistake: MSD is not joint ML, but achieves the same capacity
Mistake:
Believing that because MSD is a sequential decoder it cannot achieve the CM capacity — surely a joint ML decoder over all levels simultaneously would do better.
Correction:
Joint ML over all levels minimises block error probability at a given finite code length. But from an information-theoretic standpoint both receivers achieve the same capacity $C_{\mathrm{CM}} = \sum_\ell C_\ell$. The chain rule of mutual information is blind to how the decoder operates — it only describes what rates are achievable, given some decoder that approaches the binary-channel thresholds at each stage. MSD does exactly that. The finite-length gap between MSD and joint ML is usually small and can be closed by iterative feedback between stages (turbo-MSD).
Why This Matters: From MSD to Iterative Bit-Level Decoders
Modern receivers for coded modulation — whether MLC-style or BICM-style — rarely run pure MSD any more. Instead they iterate between the detector (LLR computation) and the outer code, passing extrinsic information back and forth. This is the bit-interleaved iterative decoding (BIID) pattern that underlies the LDPC/turbo-code-based wireless modems deployed from 3G onwards.
The MSD analysis of this section is the information-theoretic skeleton of that iterative receiver: the capacity rule still dictates the asymptotic rates, but at finite length the iterations recover the joint-ML performance (or most of it) at modest complexity. We revisit this pattern in the BICM chapters (Chs 5–9) and in the MIMO detection chapters of book MIMO.
Multistage decoding (MSD)
A sequential decoder for multilevel codes that decodes one binary code per level in order, using the decoded bits from previous stages as side information for the next. Achieves the CM capacity when paired with the capacity-rule rate allocation.
Related: Multilevel Code (MLC) Encoder, The Capacity Rule for MLC, Error propagation (in MSD)
Error propagation (in MSD)
The phenomenon whereby decoding errors at an early MSD stage are passed to later stages as incorrect history, causing them to operate on the wrong coset. Quantified by the bound $p_\ell \le p_\ell^{\mathrm{genie}} + p_{<\ell}\, p_\ell^{\mathrm{wrong}}$.
Related: MSD Achieves the CM Capacity, Genie Aided
Quick Check
How does the computational complexity of MSD scale with the number of levels $L$, compared with a joint ML decoder over all levels?
Both scale the same way — linearly in $L$
MSD scales linearly in $L$; joint ML scales exponentially in $L$
MSD is exponential in $L$; joint ML is linear
Both are exponential, but MSD has a smaller exponent
MSD runs $L$ separate binary decoders in sequence — a linear sum of per-level costs. Joint ML decoding would evaluate likelihoods for every combination of codewords simultaneously, giving a complexity proportional to the product of codebook sizes, i.e., exponential in the total rate $\sum_\ell R_\ell$. The separation of levels is the key complexity win of MLC/MSD.
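A back-of-envelope comparison makes the answer vivid (the parameters are assumed for illustration): count, in $\log_2$, how many codewords a brute-force decoder would have to score.

```python
# Assumed example parameters: L = 3 levels, block length n = 1000.
n, rates = 1000, [0.5, 0.75, 0.9]

per_stage = [n * r for r in rates]   # log2 codebook size searched at each stage
print("MSD, per stage (log2 #codewords):", per_stage)   # largest single search: 900
print("Joint ML (log2 #codewords):", sum(per_stage))    # 2150: the product of sizes
# MSD pays the sum of per-stage costs; joint ML pays the product of codebook
# sizes. Practical binary decoders shrink each stage further, to near-linear in n.
```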