The Capacity Rule for Rate Allocation
Why a Rate Rule Is Even Possible
Before stating the capacity rule, consider what we are asking. MLC uses $\ell$ separate binary codes, one per label bit, each with its own rate $R_i$. In principle the designer has $\ell$ degrees of freedom. The question is: what rates should we pick?
The answer is striking: the chain rule of mutual information gives a complete decomposition of the capacity of the non-binary channel into a sum of binary capacities. This is not an approximation; it is exact. The cost is that the binary channels are not independent: level $i$ must condition on levels $0, \dots, i-1$. This conditioning is the reason multistage decoding exists, and it is what separates MLC/MSD from BICM.
With the chain-rule decomposition in hand, the rate rule writes itself: pick $R_i$ equal to the capacity $C_i$ of the $i$-th conditioned binary sub-channel. The sum telescopes to the full CM capacity. No single choice gives more, and any other choice gives less; this is the content of the main theorem of this section.
Definition: Binary Sub-Channel at Level $i$
Fix a partition chain, a partition-based labelling $\mu$, and the AWGN channel $Y = X + N$ with $N \sim \mathcal{N}(0, \sigma^2)$. Let $X = \mu(B_0, \dots, B_{\ell-1})$, where each $B_i$ is uniform on $\{0, 1\}$ and the bits are independent. The binary sub-channel at level $i$ is the channel with input $B_i$ and output $(Y, B_0, \dots, B_{i-1})$: the channel sees the received signal and a genie-provided history of the previously decoded bits. Its capacity is

$$C_i = I(Y; B_i \mid B_0, \dots, B_{i-1}),$$

with the understanding that $C_0 = I(Y; B_0)$.
The "genie-provided history" terminology is standard but misleading at first reading. What is really happening is that MSD decodes level $i$ using the decoded bits from levels $0, \dots, i-1$. At rates below the capacity rule the probability of a history error is driven to zero, so the decoded bits equal the transmitted bits with high probability, and the genie assumption is justified in the information-theoretic limit.
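The conditional capacities $C_i$ have no closed form for PSK, but they are straightforward to estimate by Monte Carlo. The sketch below is an illustration, not code from the text: it assumes 8-PSK with the natural set-partitioning labelling (bit 0 is the LSB of the point index), unit symbol energy, and a complex AWGN channel, and it estimates each $C_i = I(Y; B_i \mid B_0, \dots, B_{i-1})$ by averaging the log-ratio of conditional output densities, with the genie history taken to be the true transmitted bits.

```python
import numpy as np

def mlc_subchannel_capacities(snr_db, n=200_000, levels=3, seed=0):
    """Monte Carlo estimate of C_i = I(Y; B_i | B_0..B_{i-1}) for 8-PSK
    with the natural set-partitioning labelling on a complex AWGN
    channel with E_s = 1 (illustrative assumptions)."""
    rng = np.random.default_rng(seed)
    M = 2 ** levels
    pts = np.exp(2j * np.pi * np.arange(M) / M)      # unit-energy 8-PSK
    sigma2 = 10 ** (-snr_db / 10)                    # noise variance
    k = rng.integers(M, size=n)                      # transmitted indices
    noise = np.sqrt(sigma2 / 2) * (rng.standard_normal(n)
                                   + 1j * rng.standard_normal(n))
    y = pts[k] + noise
    # likelihood p(y | x) up to a common constant, for every candidate x
    like = np.exp(-np.abs(y[:, None] - pts[None, :]) ** 2 / sigma2)
    idx = np.arange(M)[None, :]
    caps = []
    for i in range(levels):
        # candidates agreeing with the true bits b_0..b_{i-1} (the "genie")
        hist = (idx % (1 << i)) == (k[:, None] % (1 << i))
        # candidates additionally agreeing with the true bit b_i
        full = (idx % (1 << (i + 1))) == (k[:, None] % (1 << (i + 1)))
        num = (like * full).sum(axis=1) / full.sum(axis=1)
        den = (like * hist).sum(axis=1) / hist.sum(axis=1)
        caps.append(float(np.mean(np.log2(num / den))))
    return caps
```

The three estimates come out increasing in $i$ (level 2 is the widest-spaced BPSK), and their sum approaches $\log_2 8 = 3$ bits as the SNR grows.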
Theorem: The Capacity Rule for MLC
Let $\mathcal{A}$ be a constellation with $|\mathcal{A}| = 2^{\ell}$, let $\mu$ be a partition-based labelling, and let $X = \mu(B_0, \dots, B_{\ell-1})$ with i.i.d.\ uniform label bits. Then

$$I(Y; X) = \sum_{i=0}^{\ell-1} C_i.$$
Consequently, an MLC scheme with binary codes of rates $R_i = C_i$ at each level achieves the full constellation capacity $I(Y; X)$. No other choice of rates satisfying $\sum_i R_i = I(Y; X)$ can be decoded reliably by multistage decoding unless $R_i \le C_i$ at every level.
The equality is the chain rule of mutual information applied to the bijective correspondence $X \leftrightarrow (B_0, \dots, B_{\ell-1})$. The "no-other-choice" part is the weak converse for binary channels applied level-by-level: with MSD as the receiver, each stage $i$ sees a binary channel of capacity $C_i$, and a rate above $C_i$ at any level cannot be decoded with vanishing error probability.
Start with the chain rule: $I(Y; B_0, \dots, B_{\ell-1}) = \sum_{i=0}^{\ell-1} I(Y; B_i \mid B_0, \dots, B_{i-1})$.
Argue that $I(Y; X) = I(Y; B_0, \dots, B_{\ell-1})$ because $\mu$ is a bijection between label vectors and points.
For the second claim, note that MSD at stage $i$ sees a binary channel of capacity $C_i$ and apply the binary channel coding theorem.
Chain rule gives the decomposition
By the chain rule of mutual information,

$$I(Y; B_0, \dots, B_{\ell-1}) = \sum_{i=0}^{\ell-1} I(Y; B_i \mid B_0, \dots, B_{i-1}).$$

Each term on the right is $C_i$ by definition (def-binary-subchannel).
Label vector ↔ constellation point bijection
Since $\mu$ is a bijection, the random variables $X$ and $(B_0, \dots, B_{\ell-1})$ carry the same information: $X$ is a deterministic function of the label vector, and vice versa. Mutual information is invariant under one-to-one functions, so

$$I(Y; X) = I(Y; B_0, \dots, B_{\ell-1}).$$

Combining with the previous step yields $I(Y; X) = \sum_{i=0}^{\ell-1} C_i$, as claimed.
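The bijection step and the chain rule can be checked exactly on any toy channel. The sketch below uses a made-up $4$-input, $4$-output discrete channel (an illustrative assumption, not the AWGN channel of the text) with the bijective labelling $x = b_0 + 2 b_1$, and verifies $I(Y; X) = I(Y; B_0) + I(Y; B_1 \mid B_0)$ to machine precision.

```python
import numpy as np

def mutual_info(pxy):
    """Mutual information in bits of a joint pmf given as a 2-D array."""
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    m = pxy > 0
    return float(np.sum(pxy[m] * np.log2(pxy[m] / (px * py)[m])))

# Arbitrary 4-input, 4-output transition matrix W[x, y] (rows sum to 1)
W = np.array([[0.70, 0.10, 0.10, 0.10],
              [0.10, 0.70, 0.10, 0.10],
              [0.05, 0.05, 0.80, 0.10],
              [0.10, 0.20, 0.20, 0.50]])
p_xy = np.full(4, 0.25)[:, None] * W        # uniform input, x = b0 + 2*b1

b0 = np.arange(4) & 1                       # low label bit of each x
b1 = np.arange(4) >> 1                      # high label bit of each x

# Joint pmf of (B0, Y): collapse x onto its low bit
p_b0y = np.zeros((2, 4))
np.add.at(p_b0y, b0, p_xy)

# I(Y; B1 | B0) = I(B1; (B0, Y)) because B0 and B1 are independent
p_b1_b0y = np.zeros((2, 8))
for x in range(4):
    p_b1_b0y[b1[x], b0[x] * 4:b0[x] * 4 + 4] += p_xy[x]

C0 = mutual_info(p_b0y)
C1 = mutual_info(p_b1_b0y)
total = mutual_info(p_xy)
assert abs(total - (C0 + C1)) < 1e-12       # chain rule holds exactly
```

Because the label bits are independent, conditioning on $B_0$ costs nothing on the input side; the whole gap between $C_0 + C_1$ and the naive unconditional sum sits in the output statistics.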
Achievability of $R_i = C_i$ via MSD
Fix $\epsilon > 0$ and choose binary codes of rates $R_i = C_i - \epsilon$ and block length $n$. By the noisy-channel coding theorem for binary channels, there exist codes whose error probability on the level-$i$ binary sub-channel goes to zero as $n \to \infty$.
Multistage decoding decodes $B_0$ first (against the unconditional channel $B_0 \to Y$), then $B_1$ using $\hat{B}_0$ as side information, and so on. Each stage operates at a rate strictly below its channel's capacity, so by a union bound over the $\ell$ stages, the aggregate error probability is at most $\ell \, e^{-n E(\epsilon)}$ for some positive error exponent $E(\epsilon)$. Letting $\epsilon \to 0$, the aggregate rate approaches $\sum_i C_i = I(Y; X)$.
Tightness (the converse direction)
Suppose an MLC/MSD scheme uses rates $R_0, \dots, R_{\ell-1}$ with $R_j > C_j$ for some level $j$. At stage $j$ the decoder sees a binary channel of capacity $C_j$ (conditional on the decoded history, which is correct with high probability under reasonable operating assumptions). By the converse to the binary channel coding theorem, the error probability at stage $j$ is bounded away from zero for all code lengths $n$. Hence no MSD-achievable rate vector has $R_i > C_i$ at any level.
The converse combined with achievability establishes that the rate vector $(C_0, \dots, C_{\ell-1})$ is the unique maximiser of $\sum_i R_i$ subject to $R_i \le C_i$ for all $i$, and its sum is $I(Y; X)$.
Key Takeaway
The capacity rule is the chain rule in disguise. For any partition-based labelling of any constellation, the full CM capacity decomposes exactly into a sum of conditional binary capacities, and MLC with $R_i = C_i$ achieves the sum. The rule is optimal, not heuristic; this is what makes MLC fundamentally different from ad hoc rate allocations.
Three Binary Sub-Channel Capacities vs. SNR for 8-PSK
For 8-PSK with the Ungerboeck partition chain, the plot shows the three binary sub-channel capacities $C_0, C_1, C_2$ as functions of SNR, together with their sum (the MLC/MSD capacity) and the unconstrained Shannon limit $\log_2(1 + \mathrm{SNR})$. Observe that $C_2$ saturates to $1$ bit almost immediately: level 2 is effectively BPSK with squared distance $d_2^2 = 4$. $C_1$ saturates next. $C_0$ is the bottleneck; the capacity of level 0 is what throttles the total at low-to-medium SNR.
Example: 8-PSK Rate Allocation at a Given SNR
Using the Ungerboeck partition chain of 8-PSK and operating at a fixed SNR $E_s/\sigma^2$ (equivalently, a fixed noise variance $\sigma^2$ at unit symbol energy), compute the binary sub-channel capacities $C_0, C_1, C_2$, the total MLC/MSD capacity, and the implied rate allocation when the designer picks $R_i = C_i$.
Intra-level squared distances and effective SNRs
At unit $E_s$ the intra-level squared distances are $d_0^2 = 4\sin^2(\pi/8) \approx 0.586$, $d_1^2 = 2$, $d_2^2 = 4$. The effective SNR per dimension for an antipodal binary channel of squared distance $d^2$ is $d^2 / (4\sigma^2)$; across the three levels it therefore spans a factor of $d_2^2 / d_0^2 \approx 6.8$, which is why the levels demand such different code rates.
Capacity of a binary antipodal sub-channel
The capacity of a binary-input AWGN channel with squared distance $d^2$ and noise variance $\sigma^2$ per dimension is well-approximated for our purpose by $C \approx 1 - H_b\big(Q(d / 2\sigma)\big)$, where $H_b$ is the binary entropy function and $Q$ is the Gaussian tail function. (This is the capacity of the binary symmetric channel obtained by hard decisions, a lower bound on the true soft-decision capacity.)
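As a quick sketch, assuming the hard-decision form $1 - H_b(Q(d / 2\sigma))$ just stated, the approximation takes only a few lines of standard-library Python:

```python
import math

def hb(p):
    """Binary entropy H_b(p) in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def qfunc(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def binary_capacity_approx(d2, sigma2):
    """Hard-decision approximation to the binary antipodal capacity:
    C ~ 1 - H_b(Q(d / (2 * sigma))). Lower-bounds the soft capacity."""
    return 1.0 - hb(qfunc(math.sqrt(d2) / (2.0 * math.sqrt(sigma2))))
```

Feeding the three 8-PSK distances at a common $\sigma^2$ reproduces the ordering $C_0 < C_1 < C_2$ seen in the plot.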
Plug in and compute
- Level 0: $C_0 = 1 - H_b\big(Q(d_0 / 2\sigma)\big)$ with $d_0^2 \approx 0.586$, the smallest of the three.
- Level 1: $C_1 = 1 - H_b\big(Q(d_1 / 2\sigma)\big)$ with $d_1^2 = 2$.
- Level 2: $C_2 = 1 - H_b\big(Q(d_2 / 2\sigma)\big)$ with $d_2^2 = 4$, close to $1$ bit at any moderate SNR.
Total capacity and rate allocation
Summing the three capacities gives the total MLC/MSD capacity in bits/symbol. The gap to the unconstrained Shannon AWGN capacity at the same SNR is the modulation-capacity loss of 8-PSK: the penalty for restricting the input to eight equiprobable points on a circle.
The capacity-rule allocation is $(R_0, R_1, R_2) = (C_0, C_1, C_2)$. Notice how asymmetric it is: level 0 is a low-rate, heavily protected code; level 2 is nearly uncoded. A designer using the same-rate allocation $R_i = \tfrac{1}{3} \sum_j C_j$ would lose at level 0: that rate exceeds $C_0$, so reliable decoding is impossible there. This is the operational meaning of "no other allocation works."
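The asymmetry can be made concrete. The sketch below evaluates the hard-decision approximation at an assumed operating point of 10 dB (my illustrative choice, not a value from the text) and compares the capacity-rule allocation with a naive equal split:

```python
import math

def hb(p):
    """Binary entropy H_b(p) in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def cap(d2, sigma2):
    # hard-decision approximation: 1 - H_b(Q(d / (2 * sigma)))
    q = 0.5 * math.erfc(math.sqrt(d2) / (2 * math.sqrt(sigma2)) / math.sqrt(2))
    return 1.0 - hb(q)

snr_db = 10.0                                       # assumed SNR, E_s = 1
sigma2 = 10 ** (-snr_db / 10)
d2s = [4 * math.sin(math.pi / 8) ** 2, 2.0, 4.0]    # Ungerboeck 8-PSK chain
caps = [cap(d2, sigma2) for d2 in d2s]
total = sum(caps)
equal = total / 3                                   # naive same-rate split

print("capacity-rule rates:", [round(c, 3) for c in caps])
print("total:", round(total, 3), "bits/symbol")
print("equal-rate", round(equal, 3),
      "exceeds C_0" if equal > caps[0] else "fits within C_0")
```

At this assumed SNR the equal split lands well above $C_0$, so level 0 would be undecodable no matter how good its code is.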
Rate Allocation in Practice
In a real system one does not get to pick $R_i$ continuously. The available binary codes (LDPC, polar, convolutional) come in discrete rate classes. The designer picks the available rate closest to $C_i$ from below at each level (rounding down preserves reliable decoding). This discretisation costs a small amount of capacity, usually a fraction of a dB, and is the practical reason that the theoretical MLC-capacity curve in the next section is not fully reached by a specific implementation.
Common Mistake: The unconditional is NOT the capacity rule
Mistake:
Allocating rate $R_i = I(Y; B_i)$, the unconditional mutual information between $Y$ and the $i$-th label bit, and expecting MSD to achieve the sum.
Correction:
The capacity rule uses the conditional mutual information $C_i = I(Y; B_i \mid B_0, \dots, B_{i-1})$ at every level except $i = 0$. The sum of unconditional terms $\sum_i I(Y; B_i)$ is precisely the BICM capacity $C_{\mathrm{BICM}}$, which is generally less than $I(Y; X)$, so the unconditional allocation shortchanges the scheme. Section s04 makes this gap explicit.
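The gap is easy to exhibit numerically. The sketch below is illustrative (8-PSK, natural set-partitioning labelling, complex AWGN; my assumptions, not code from the text): it estimates both sums by Monte Carlo and shows $\sum_i I(Y; B_i) \le \sum_i I(Y; B_i \mid B_0, \dots, B_{i-1})$.

```python
import numpy as np

def mlc_vs_bicm(snr_db, n=150_000, levels=3, seed=1):
    """Monte Carlo: conditional sum (MLC/MSD capacity) vs. unconditional
    sum (BICM capacity) for 8-PSK with the natural labelling."""
    rng = np.random.default_rng(seed)
    M = 2 ** levels
    pts = np.exp(2j * np.pi * np.arange(M) / M)      # unit-energy 8-PSK
    sigma2 = 10 ** (-snr_db / 10)
    k = rng.integers(M, size=n)
    y = pts[k] + np.sqrt(sigma2 / 2) * (rng.standard_normal(n)
                                        + 1j * rng.standard_normal(n))
    like = np.exp(-np.abs(y[:, None] - pts[None, :]) ** 2 / sigma2)
    idx = np.arange(M)[None, :]
    kk = k[:, None]

    def info(num_mask, den_mask):
        # E[log2 p(y | finer conditioning) / p(y | coarser conditioning)]
        num = (like * num_mask).sum(1) / num_mask.sum(1)
        den = (like * den_mask).sum(1) / den_mask.sum(1)
        return float(np.mean(np.log2(num / den)))

    every = np.ones((n, M), dtype=bool)
    mlc = sum(info((idx % (1 << (i + 1))) == (kk % (1 << (i + 1))),
                   (idx % (1 << i)) == (kk % (1 << i)))
              for i in range(levels))
    bicm = sum(info(((idx >> i) & 1) == ((kk >> i) & 1), every)
               for i in range(levels))
    return mlc, bicm
```

At moderate SNR the two sums differ visibly for this labelling; Gray labelling would shrink the gap, which is exactly why practical BICM systems use Gray labels.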
Quick Check
Which identity lies at the heart of the MLC capacity rule?
The data processing inequality
Fano's inequality
The chain rule of mutual information: $I(Y; B_0, \dots, B_{\ell-1}) = \sum_{i=0}^{\ell-1} I(Y; B_i \mid B_0, \dots, B_{i-1})$
Jensen's inequality for concave functions
The capacity rule is the chain rule of mutual information applied to the bijective label-to-constellation map. Every other information-theoretic tool in the proof (the noisy-channel coding theorem and its converse) is applied separately at each level; the decomposition itself is the chain rule.
Capacity rule
The rate-allocation rule for multilevel coding: set the rate of the level-$i$ binary code equal to the conditional mutual information $C_i = I(Y; B_i \mid B_0, \dots, B_{i-1})$. The rates sum to the full CM capacity $I(Y; X)$.
Related: Multilevel Code (MLC) Encoder, MSD Achieves the CM Capacity, Chain Rule
From the Rule to an LDPC Rate Table
The DVB-S2 standard's MODCOD table is, from a coded-modulation point of view, a pragmatic instantiation of the capacity rule with one code rate per constellation (that is, BICM-style, not MLC). The table lists the LDPC code rate and the constellation size as a pair, chosen to cover the expected operating range in small SNR steps. An MLC-native DVB-S2 would instead store three (for 8-PSK) or four (for 16-APSK) rates per modulation, one per level.
The fact that DVB-S2 shipped with the simpler BICM-style single-rate design, not an MLC-style multi-rate design, is the clearest practical evidence that the BICM–MLC gap was judged too small to justify the complexity. We will quantify this gap in s04.
- DVB-S2 LDPC code rates: 1/4, 1/3, 2/5, 1/2, 3/5, 2/3, 3/4, 4/5, 5/6, 8/9, 9/10
- A full MLC implementation would need a separate rate per level, multiplying the code-table size by the number of levels per constellation