CCDM and Distribution Matching

From Theory to Practice: Distribution Matching

The point is that Maxwell-Boltzmann shaping (§1) tells us what distribution to target, and Probabilistic Amplitude Shaping (§2) tells us how to compose shaping with FEC — but neither tells us how to TRANSFORM a uniform bit stream into a stream of MB-shaped amplitudes. That is the job of a distribution matcher. The CCDM (constant-composition distribution matcher) is the dominant practical choice, achieving the MB rate asymptotically.

Definition:

Distribution Matcher (DM)

A distribution matcher is an invertible mapping from a uniform binary stream {0,1}k\{0, 1\}^k to an amplitude sequence aAn\mathbf{a} \in \mathcal{A}^n such that the empirical distribution of a\mathbf{a} approximates a target distribution PAP_A. The DM is INVERTIBLE — the inverse is computed at the receiver after FEC decoding to recover the original bits.

Definition:

Constant-Composition Distribution Matcher (CCDM)

A constant-composition DM (CCDM) selects output sequences of length nn with the same empirical histogram — the "composition". Given a target distribution PAP_A on an A|\mathcal{A}|-ary alphabet, the CCDM chooses counts (na1,,naA)(n_{a_1}, \ldots, n_{a_{|\mathcal{A}|}}) with nai=nPA(ai)n_{a_i} = \lfloor n \cdot P_A(a_i) \rfloor and assigns arithmetic-coding rank to each of the (nna1,)\binom{n}{n_{a_1}, \ldots} legal sequences.

Theorem: CCDM Rate Approaches MB Rate

For a target distribution PAP_A with entropy H(PA)H(P_A), the CCDM of length nn achieves rate RCCDM(n)=log2(nna1,,naA)n=H(PA)O ⁣(lognn).R_{\rm CCDM}(n) = \frac{\lfloor \log_2 \binom{n}{n_{a_1}, \ldots, n_{a_{|\mathcal{A}|}}} \rfloor}{n} = H(P_A) - O\!\left(\frac{\log n}{n}\right). The rate loss is O(logn/n)O(\log n / n), vanishing as nn \to \infty.

CCDM Output vs Target MB Distribution

Compare the target Maxwell-Boltzmann distribution (green bars) with the actual CCDM output distribution at a chosen block length (blue bars). As nn grows, the two match more closely — the KL divergence decays as logn/n\log n / n.

Parameters
200

CCDM Arithmetic Encoding

Complexity: O(nA)O(n |\mathcal{A}|) time, O(n)O(n) memory.
Input: Target counts (na1,,naA)(n_{a_1}, \ldots, n_{a_{|\mathcal{A}|}})
summing to nn; uniform kk-bit message m\mathbf{m}.
Output: nn-symbol amplitude sequence a\mathbf{a}.
1. Compute index I(m){0,1,,Mmax1}I(\mathbf{m}) \in \{0, 1, \ldots, M_{\max} - 1\}
where Mmax=(nna1,,naA)M_{\max} = \binom{n}{n_{a_1}, \ldots, n_{a_{|\mathcal{A}|}}}.
2. for j=1,,nj = 1, \ldots, n do
3. \quad For each candidate symbol aia_i, compute the number
NiN_i of sequences starting with aia_i in the remaining position.
4. \quad Find the smallest ii^* such that iiNi>I\sum_{i \le i^*} N_i > I.
5. \quad Emit aj=aia_j = a_{i^*}; update IIi<iNiI \gets I - \sum_{i < i^*} N_i.
6. \quad Decrement nain_{a_{i^*}}.
7. end for

The inverse at the receiver reconstructs II from a\mathbf{a} by accumulating i<ajNi\sum_{i < a_j} N_i at each step. Then m=I\mathbf{m} = I in binary.

Example: CCDM for 16-QAM at n=200n = 200

For 16-QAM per-axis amplitudes {3,1,+1,+3}\{-3, -1, +1, +3\} (i.e., A=4| \mathcal{A}| = 4), and target PA=(0.35,0.15,0.15,0.35)P_A = (0.35, 0.15, 0.15, 0.35) with H(PA)=1.88H(P_A) = 1.88 bits/symbol, find the CCDM rate for n=200n = 200. Compare with the entropy.

Common Mistake: CCDM Short-Block Rate Loss Is Real

Mistake:

"CCDM achieves the entropy H(PA)H(P_A) exactly for any block length."

Correction:

The rate loss is O(logn/n)O(\log n / n). At n=100n = 100 the loss can be 5-10% of the target entropy — a meaningful slice of the shaping gain. For URLLC or short-packet applications, this loss limits the achievable shaping gain. Alternatives: shell mapping (Imai-Hirasawa) and hierarchical DM (Amjad-Böcherer 2013) that reduce the short- block loss at increased complexity.

Historical Note: A Line of Distribution-Matching Algorithms

1977–2016

Distribution matching has a 40-year history:

  • Imai-Hirasawa 1977: shell mapping (sphere-packing lower bound).
  • Laroia-Farvardin-Tretter 1994: trellis shaping (convolutional approach).
  • Böcherer 2013 (thesis): CCDM as the practical modern algorithm.
  • Amjad-Böcherer 2013: hierarchical DM for short blocks.
  • Schulte-Böcherer 2016: formal CCDM rate-loss analysis. Today, CCDM is the dominant choice in production optical and 5G research. Hierarchical DMs are used for very short blocks.

Key Takeaway

CCDM is the practical algorithm that realises MB shaping. It converts a uniform bit stream into an amplitude sequence of target composition with rate H(PA)O(logn/n)H(P_A) - O(\log n/n). The short-block rate loss limits the achievable shaping gain for URLLC; hierarchical DM offers a more complex but tighter alternative.