The Capacity Rule for Rate Allocation

Why a Rate Rule Is Even Possible

Before stating the capacity rule, consider what we are asking. MLC has $L$ separate binary codes, each with its own rate $R_i$. In principle the designer has $L$ degrees of freedom. The question is: what rates should we pick?

The answer is striking: the chain rule of mutual information gives a complete decomposition of the capacity of the non-binary channel into a sum of $L$ binary capacities. This is not an approximation; it is exact. The cost is that the binary channels are not independent: level $i$ must condition on levels $0, 1, \ldots, i-1$. This conditioning is the reason multistage decoding exists, and it is what separates MLC/MSD from BICM.

With the chain-rule decomposition in hand, the rate rule writes itself: pick $R_i$ equal to the capacity of the $i$-th conditioned binary sub-channel. The sum telescopes to the full CM capacity. No single choice gives more, and any other choice gives less; this is the content of the main theorem of this section.


Definition: Binary Sub-Channel at Level $i$

Fix a partition chain, a partition-based labelling $\mu$, and the AWGN channel $Y = X + W$ with $W \sim \mathcal{N}(0, N_0/2)$. Let $X = \mu(B_0, B_1, \ldots, B_{L-1})$ where each $B_i$ is uniform on $\{0, 1\}$ and the bits are independent. The binary sub-channel at level $i$ is the channel with input $B_i$ and output $(Y, B_0, B_1, \ldots, B_{i-1})$; that is, the channel sees the received signal $Y$ and a genie-provided history of the previously decoded bits. Its capacity is

$$C_i \;\triangleq\; I(Y; B_i \mid B_0, B_1, \ldots, B_{i-1})$$

with the understanding that $C_0 = I(Y; B_0)$.

The "genie-provided history" terminology is standard but misleading at first reading. What is really happening is that MSD decodes level $i$ using the decoded bits from levels $0, \ldots, i-1$. At rates below the capacity rule the probability of a history error is driven to zero, so the decoded bits equal the transmitted bits with high probability, and the genie assumption is justified in the information-theoretic limit.
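As a concrete sketch of how a multistage decoder consumes the decoded history, the following computes the per-level soft decision (LLR) for 8-PSK, assuming a natural labelling $m = b_0 + 2b_1 + 4b_2$ (which realises the Ungerboeck partition for PSK); the function name and interface are illustrative, not from the text:

```python
import numpy as np

def msd_llr(y, level, known_bits, N0):
    """LLR of label bit `level` given the received sample y and the decoded
    history `known_bits` (bits 0..level-1), for unit-energy 8-PSK with
    natural labelling m = b0 + 2*b1 + 4*b2."""
    pts = np.exp(2j * np.pi * np.arange(8) / 8)
    # candidate symbols consistent with the decoded history
    idx = [m for m in range(8)
           if all(((m >> j) & 1) == b for j, b in enumerate(known_bits))]
    num = [m for m in idx if ((m >> level) & 1) == 0]
    den = [m for m in idx if ((m >> level) & 1) == 1]
    metric = lambda s: np.logaddexp.reduce([-abs(y - pts[m])**2 / N0 for m in s])
    return metric(num) - metric(den)   # > 0 favours bit value 0
```

Note how `known_bits` halves the candidate set at each level: this pruning is exactly the conditioning $I(Y; B_i \mid B_0, \ldots, B_{i-1})$ in operational form.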

Theorem: The Capacity Rule for MLC

Let $\mathcal{X}$ be a constellation with $|\mathcal{X}| = M = 2^L$, let $\mu$ be a partition-based labelling, and let $X = \mu(B_0, \ldots, B_{L-1})$ with i.i.d.\ uniform label bits. Then

$$C_{\rm CM} \;\triangleq\; I(Y; X) \;=\; \sum_{i=0}^{L-1} I(Y; B_i \mid B_0, \ldots, B_{i-1}) \;=\; \sum_{i=0}^{L-1} C_i.$$

Consequently, an MLC scheme with binary codes of rates $R_i = C_i$ at each level achieves the full constellation capacity $C_{\rm CM}$. No other choice of rates $(R_0, \ldots, R_{L-1})$ satisfying $\sum_i R_i \le C_{\rm CM}$ can be decoded reliably by multistage decoding unless $R_i \le C_i$ at every level.

The equality is the chain rule of mutual information applied to the bijective correspondence $X \leftrightarrow (B_0, \ldots, B_{L-1})$. The "no-other-choice" part is the weak converse for binary channels applied level-by-level: with MSD as the receiver, each stage sees a binary channel of capacity $C_i$, and a rate above $C_i$ at any level cannot be decoded with vanishing error probability.
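The chain-rule identity can be checked numerically. The following Monte Carlo sketch does so for QPSK, the smallest non-trivial case ($L = 2$), with natural labelling at an arbitrarily chosen 5 dB; the estimators are standard but the specific setup is mine, not from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200_000
N0 = 10 ** (-0.5)                    # Es/N0 = 5 dB with Es = 1 (arbitrary choice)
pts = np.exp(2j * np.pi * np.arange(4) / 4)   # QPSK, natural labelling m = b0 + 2*b1
m = rng.integers(0, 4, N)
y = pts[m] + rng.normal(0, np.sqrt(N0 / 2), N) + 1j * rng.normal(0, np.sqrt(N0 / 2), N)
lik = np.exp(-np.abs(y[:, None] - pts[None, :]) ** 2 / N0)  # p(y|x) up to a constant

def h2(q):
    """Binary entropy in bits, clipped away from 0 and 1 for stability."""
    q = np.clip(q, 1e-12, 1 - 1e-12)
    return -(q * np.log2(q) + (1 - q) * np.log2(1 - q))

# left-hand side of the chain rule: C_CM = I(Y;X)
c_cm = 2 - np.mean(np.log2(lik.sum(1) / lik[np.arange(N), m]))
# right-hand side: C0 = 1 - H(B0|Y) and C1 = 1 - H(B1|Y,B0)
c0 = 1 - h2(lik[:, [0, 2]].sum(1) / lik.sum(1)).mean()          # b0 = 0 for m in {0,2}
hist = lik * ((np.arange(4) & 1)[None, :] == (m & 1)[:, None])  # condition on true b0
c1 = 1 - h2(hist[:, [0, 1]].sum(1) / hist.sum(1)).mean()        # b1 = 0 for m in {0,1}
```

Up to Monte Carlo noise, `c0 + c1` matches `c_cm`: the two sides are computed by independent formulas, so the agreement is the chain rule, not an artefact of the code.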


Key Takeaway

The capacity rule is the chain rule in disguise. For any partition-based labelling of any constellation, the full CM capacity decomposes exactly into a sum of $L$ conditional binary capacities, and MLC with $R_i = C_i$ achieves the sum. The rule is optimal, not heuristic; this is what makes MLC fundamentally different from ad hoc rate allocations.

Three Binary Sub-Channel Capacities vs. SNR for 8-PSK

For 8-PSK with the Ungerboeck partition chain, the plot shows the three binary sub-channel capacities $C_0, C_1, C_2$ as functions of SNR, together with their sum (the MLC/MSD capacity) and the Shannon limit $\log_2(1 + \mathrm{SNR})$. Observe that $C_2$ saturates to $1$ bit almost immediately: level 2 is effectively BPSK with squared distance $d_2^2 = 4$. $C_1$ saturates next. $C_0$ is the bottleneck; the capacity of level 0 is what throttles the total at low-to-medium SNR.


Example: 8-PSK Rate Allocation at $\mathrm{SNR} = 8$ dB

Using the Ungerboeck partition chain of 8-PSK and operating at $\mathrm{SNR} = 8$ dB (equivalently $E_s/N_0 = 10^{0.8} \approx 6.31$), compute the binary sub-channel capacities $C_0, C_1, C_2$, the total MLC/MSD capacity, and the implied rate allocation $(R_0, R_1, R_2)$ when the designer picks $R_i = C_i$.
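One way to carry out this computation is Monte Carlo estimation of each $C_i = 1 - H(B_i \mid Y, B_0, \ldots, B_{i-1})$. A sketch, assuming unit-energy 8-PSK with natural labelling $m = b_0 + 2b_1 + 4b_2$ (which realises the Ungerboeck partition) and complex AWGN:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 400_000
N0 = 10 ** (-0.8)                    # Es/N0 = 10^{0.8} ~ 6.31 (8 dB) with Es = 1
pts = np.exp(2j * np.pi * np.arange(8) / 8)   # unit-energy 8-PSK, natural labelling
m = rng.integers(0, 8, N)
y = pts[m] + rng.normal(0, np.sqrt(N0 / 2), N) + 1j * rng.normal(0, np.sqrt(N0 / 2), N)
lik = np.exp(-np.abs(y[:, None] - pts[None, :]) ** 2 / N0)  # p(y|x) up to a constant

def h2(q):
    """Binary entropy in bits."""
    q = np.clip(q, 1e-12, 1 - 1e-12)
    return -(q * np.log2(q) + (1 - q) * np.log2(1 - q))

caps = []
for i in range(3):
    # keep only candidate symbols whose lower label bits match the transmitted ones
    ok = np.ones((N, 8), bool)
    for j in range(i):
        ok &= ((np.arange(8) >> j) & 1)[None, :] == ((m >> j) & 1)[:, None]
    p = lik * ok
    q = p[:, ((np.arange(8) >> i) & 1) == 0].sum(1) / p.sum(1)
    caps.append(1 - h2(q).mean())    # C_i = 1 - H(B_i | Y, B_0..B_{i-1})

total = sum(caps)                    # MLC/MSD capacity; the rule sets R_i = C_i
```

Consistent with the figure, `caps[2]` comes out essentially saturated at 1 bit and `caps[0]` is the smallest of the three; the allocation is then read off directly as $R_i = C_i$.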

Rate Allocation in Practice

In a real system one does not get to pick $R_i$ continuously. The available binary codes (LDPC, polar, convolutional) come in discrete rate classes: perhaps rates $1/4, 1/3, 1/2, 2/3, 3/4, 5/6, 7/8, 1$. The designer picks the available rate closest to $C_i$ from below at each level (rounding down ensures reliable decoding). This discretisation costs a small amount of capacity, usually a fraction of a dB, and is the practical reason that the theoretical MLC-capacity curve in the next section is not fully reached by a specific implementation.
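The rounding-down step can be sketched as follows; the rate list is the one quoted above, while the capacity values are purely hypothetical, chosen only to illustrate the mechanism:

```python
from fractions import Fraction

AVAILABLE = [Fraction(*r) for r in
             [(1, 4), (1, 3), (1, 2), (2, 3), (3, 4), (5, 6), (7, 8), (1, 1)]]

def rate_below(c):
    """Largest available code rate not exceeding the sub-channel capacity c."""
    feasible = [r for r in AVAILABLE if r <= c]
    return max(feasible) if feasible else None   # None: no usable rate at this level

caps = [0.55, 0.93, 0.999]            # hypothetical C_i values, illustration only
rates = [rate_below(c) for c in caps]             # -> [1/2, 7/8, 7/8]
loss = sum(caps) - float(sum(rates))              # capacity given up to discretisation
```

Exact `Fraction` arithmetic avoids the float-comparison pitfall where a nominal rate like $5/6$ would otherwise be stored as an approximation.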

Common Mistake: The unconditional $I(Y; B_i)$ is NOT the capacity rule

Mistake:

Allocating rate $R_i = I(Y; B_i)$, the unconditional mutual information between $Y$ and the $i$-th label bit, and expecting MSD to achieve the sum.

Correction:

The capacity rule uses the conditional mutual information $C_i = I(Y; B_i \mid B_0, \ldots, B_{i-1})$ at every level except $i = 0$. The sum of unconditional $I(Y; B_i)$ is precisely the BICM capacity $C_{\rm BICM}$, which is generally less than $C_{\rm CM}$, so the unconditional allocation shortchanges the scheme. Section s04 makes this gap explicit.
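The gap is easy to exhibit numerically. A Monte Carlo sketch for unit-energy 8-PSK with the natural (set-partitioning) labelling at an assumed 8 dB, comparing $\sum_i I(Y; B_i)$ against $I(Y; X)$; the setup details are mine, not from the text:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200_000
N0 = 10 ** (-0.8)                    # Es/N0 = 8 dB with Es = 1
pts = np.exp(2j * np.pi * np.arange(8) / 8)   # 8-PSK, natural labelling
m = rng.integers(0, 8, N)
y = pts[m] + rng.normal(0, np.sqrt(N0 / 2), N) + 1j * rng.normal(0, np.sqrt(N0 / 2), N)
lik = np.exp(-np.abs(y[:, None] - pts[None, :]) ** 2 / N0)

def h2(q):
    """Binary entropy in bits."""
    q = np.clip(q, 1e-12, 1 - 1e-12)
    return -(q * np.log2(q) + (1 - q) * np.log2(1 - q))

# CM capacity: I(Y;X)
c_cm = 3 - np.mean(np.log2(lik.sum(1) / lik[np.arange(N), m]))

# BICM capacity: sum of UNconditional I(Y;B_i) = 1 - H(B_i|Y), no history
c_bicm = 0.0
for i in range(3):
    bit0 = ((np.arange(8) >> i) & 1) == 0
    c_bicm += 1 - h2(lik[:, bit0].sum(1) / lik.sum(1)).mean()
```

The only difference from the capacity-rule computation is that the posterior for bit $i$ is taken over all eight candidate symbols instead of the history-consistent subset, and that difference alone costs `c_cm - c_bicm` bits per symbol.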

Quick Check

Which identity lies at the heart of the MLC capacity rule?

The data processing inequality $I(X; Y) \geq I(X; g(Y))$

Fano's inequality

The chain rule of mutual information: $I(Y; B_0, \ldots, B_{L-1}) = \sum_i I(Y; B_i \mid B_0, \ldots, B_{i-1})$

Jensen's inequality for concave functions

Capacity rule

The rate-allocation rule for multilevel coding: set the rate of the level-$i$ binary code equal to the conditional mutual information $C_i = I(Y; B_i \mid B_0, \ldots, B_{i-1})$. The rates sum to the full CM capacity $I(Y; X)$.

Related: Multilevel Code (MLC) Encoder, MSD Achieves the CM Capacity, Chain Rule

⚠️ Engineering Note

From the Rule to an LDPC Rate Table

The DVB-S2 standard's MODCOD table is, from a coded-modulation point of view, a pragmatic instantiation of the capacity rule with one code rate per constellation (that is, BICM-style, not MLC). The table lists the LDPC code rate and the constellation size as a pair, chosen to cover the expected operating $E_s/N_0$ range in roughly $1$ dB steps. An MLC-native DVB-S2 would instead store three (for 8-PSK) or four (for 16-APSK) rates per modulation, one per level.

The fact that DVB-S2 shipped with the simpler BICM-style single-rate design, not an MLC-style multi-rate design, is the clearest practical evidence that the BICM–MLC gap was judged too small to justify the complexity. We will quantify this gap in s04.

Practical Constraints
  • DVB-S2 LDPC code rates: 1/4, 1/3, 2/5, 1/2, 3/5, 2/3, 3/4, 4/5, 5/6, 8/9, 9/10

  • A full MLC implementation would need a separate rate per level, multiplying the code-table size by $L = \log_2 M$

📋 Ref: ETSI EN 302 307, §5.4