Probabilistic Shaping: Closing the Gap

The 1.53 dB Shaping Gap

We have now seen every modern standard customise BICM — 5G NR, Wi-Fi, DVB-S2 — and we have seen the AMC envelope approach Shannon within a dB or two. The last remaining gap has a specific cause and a specific remedy: shaping.

With a uniformly distributed QAM input (which every standard of the previous sections uses), the mutual information is bounded by $\log_2 M$ bits per symbol, where $M$ is the modulation order. In the high-SNR regime, with $M$ large enough that the $\log_2 M$ ceiling is not the limit, uniform signalling on a square constellation needs a factor of $\pi e / 6$ (about 1.53 dB) more power than a Gaussian input to reach the same rate. This is the "cubic shaping gap" between a uniform distribution over a square constellation and the truly optimal Gaussian input.

Closing that 1.53 dB gap is worth the fight. On a 400 Gbps optical coherent link, 1.5 dB is a 50% reach extension. On a satellite link with a fixed EIRP budget, 1.5 dB is an extra 40% throughput without more power or bandwidth. The solution is probabilistic shaping (PS): instead of driving each QAM point with equal probability, weight outer (high-energy) points lower and inner (low-energy) points higher, producing an approximately Gaussian envelope. The specific distribution that maximises entropy at a fixed average-power constraint is the Maxwell-Boltzmann distribution, and the practical scheme that implements it within the BICM framework is the Probabilistic Amplitude Shaping (PAS) architecture of Böcherer, Steiner, and Schulte (2015).

This section explains how PAS works, proves the 1.53 dB bound, and walks through why PAS is now standard in 400G coherent optical transmission (OIF 400ZR) and is being proposed for 6G. Shaping is the modern capstone of BICM design.


Definition:

Maxwell-Boltzmann (MB) Distribution

For a finite constellation $\mathcal{X} \subset \mathbb{C}$ (e.g., a $\sqrt{M} \times \sqrt{M}$ square QAM), the Maxwell-Boltzmann (MB) distribution with parameter $\lambda > 0$ is
$$p_\lambda(x) = \frac{\exp(-\lambda |x|^2)}{\sum_{x' \in \mathcal{X}} \exp(-\lambda |x'|^2)}, \qquad x \in \mathcal{X}.$$
Outer (high-energy) constellation points receive exponentially lower probability than inner points. The parameter $\lambda$ controls the average energy:
$$\mathbb{E}_{p_\lambda}[|X|^2] = \frac{\sum_{x} |x|^2 \exp(-\lambda |x|^2)}{\sum_{x} \exp(-\lambda |x|^2)}.$$
Increasing $\lambda$ shrinks the average energy (fewer outer points); decreasing $\lambda$ toward 0 recovers the uniform distribution.

The MB distribution is the finite-alphabet analogue of the Gaussian distribution on $\mathbb{R}^2$: both maximise differential (resp. discrete) entropy subject to a fixed second moment. On a QAM lattice, MB "samples" the Gaussian density at lattice points. As $M \to \infty$ and with appropriate normalisation, the MB distribution converges to a continuous Gaussian.
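The definition can be explored numerically with a minimal sketch like the one below. It assumes a square QAM grid with odd-integer coordinates (spacing 2), and the helper names `square_qam`, `mb_pmf`, `mean_energy` and `entropy_bits` are illustrative, not taken from any library.

```python
import math

def square_qam(M):
    """Points of a sqrt(M) x sqrt(M) QAM grid with odd-integer coordinates (assumed spacing 2)."""
    m = math.isqrt(M)
    if m * m != M:
        raise ValueError("M must be a perfect square")
    levels = range(-(m - 1), m, 2)            # e.g. -3, -1, 1, 3 for 16-QAM
    return [complex(re, im) for re in levels for im in levels]

def energy(x):
    """Squared magnitude |x|^2, computed without a square root."""
    return x.real ** 2 + x.imag ** 2

def mb_pmf(points, lam):
    """Maxwell-Boltzmann PMF, p(x) proportional to exp(-lam * |x|^2)."""
    energies = [energy(x) for x in points]
    e_min = min(energies)                     # shift the exponent for numerical stability
    weights = [math.exp(-lam * (e - e_min)) for e in energies]
    total = sum(weights)
    return [w / total for w in weights]

def mean_energy(points, pmf):
    """Average symbol energy E[|X|^2] under the given PMF."""
    return sum(p * energy(x) for x, p in zip(points, pmf))

def entropy_bits(pmf):
    """Discrete entropy H(p) in bits."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)

points = square_qam(64)
for lam in (0.0, 0.02, 0.05, 0.1):
    pmf = mb_pmf(points, lam)
    print(f"lambda={lam:.2f}  E={mean_energy(points, pmf):6.2f}  H={entropy_bits(pmf):.3f} bits")
```

At $\lambda = 0$ the sketch reproduces uniform 64-QAM (6 bits, average energy 42 on this grid); larger $\lambda$ trades entropy for lower average energy.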


Theorem: Maxwell-Boltzmann Maximises Entropy at Fixed Second Moment

Let $\mathcal{X}$ be a finite set and let $E > 0$. Among all probability mass functions $p$ on $\mathcal{X}$ satisfying $\sum_x p(x) |x|^2 = E$, the entropy $H(p) = -\sum_x p(x) \log p(x)$ is maximised uniquely by the Maxwell-Boltzmann distribution $p_\lambda$ with $\lambda = \lambda(E)$ chosen such that $\mathbb{E}_{p_\lambda}[|X|^2] = E$. The maximum entropy is
$$H^\star(E) = \lambda E + \log \sum_{x \in \mathcal{X}} \exp(-\lambda |x|^2).$$

The proof is a textbook Lagrangian argument: maximise entropy subject to an energy constraint by introducing a multiplier $\lambda$ for the constraint and solving. The exponential form $p(x) \propto \exp(-\lambda |x|^2)$ falls out automatically. This is the finite-alphabet version of Shannon's theorem that the Gaussian distribution maximises differential entropy at fixed variance, with the MB distribution on a lattice playing the role of the Gaussian on $\mathbb{R}^2$.
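The Lagrangian steps can be written out as follows. This is a sketch of the standard argument; the normalisation multiplier $\mu$ and the partition-function symbol $Z(\lambda)$ are notation introduced here for illustration.

```latex
% Lagrangian with multipliers \lambda (energy) and \mu (normalisation)
\mathcal{L}(p, \lambda, \mu)
  = -\sum_{x} p(x) \log p(x)
    - \lambda \Big( \sum_{x} p(x) |x|^2 - E \Big)
    - \mu \Big( \sum_{x} p(x) - 1 \Big)

% Stationarity with respect to each p(x)
\frac{\partial \mathcal{L}}{\partial p(x)}
  = -\log p(x) - 1 - \lambda |x|^2 - \mu = 0
  \;\Longrightarrow\;
  p(x) = e^{-1-\mu} \exp(-\lambda |x|^2)

% Normalisation fixes e^{-1-\mu} = 1 / Z(\lambda)
Z(\lambda) = \sum_{x' \in \mathcal{X}} \exp(-\lambda |x'|^2),
\qquad
p_\lambda(x) = \frac{\exp(-\lambda |x|^2)}{Z(\lambda)}

% Substituting p_\lambda back into the entropy
H(p_\lambda)
  = -\sum_{x} p_\lambda(x) \big( -\lambda |x|^2 - \log Z(\lambda) \big)
  = \lambda E + \log Z(\lambda)
```

Uniqueness follows because the entropy is strictly concave in $p$ and the constraints are linear, so the stationary point is the only maximiser.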


Theorem: Asymptotic Shaping Gain Approaches $\pi e / 6$

Let $\mathcal{X}_M$ be a $\sqrt{M} \times \sqrt{M}$ square QAM constellation on a uniform grid with unit spacing, and let $C(\mathcal{X}_M, p, \text{SNR})$ denote the achievable rate with input distribution $p$. For a target rate $R < \log_2 M$, let $\text{SNR}_p(R)$ be the smallest SNR at which $C(\mathcal{X}_M, p, \text{SNR}) \ge R$, and define the shaping gain as the ratio of required SNRs,
$$G_s(M, R) \triangleq \frac{\text{SNR}_{p_{\rm unif}}(R)}{\text{SNR}_{p_{\rm MB}}(R)},$$
with $\lambda$ chosen so that the MB-shaped constellation achieves the same information rate $R$ as the uniform one. Then in the high-rate, high-SNR limit, with $M$ growing so that neither scheme is limited by the $\log_2 M$ ceiling,
$$\lim_{R \to \infty} \lim_{M \to \infty} G_s(M, R) = \frac{\pi e}{6} \approx 1.4233 \quad (1.5329 \text{ dB}).$$
Equivalently, MB-shaped QAM requires 1.53 dB less SNR than uniform QAM to achieve the same rate in the high-SNR limit.

The bound comes from two facts: (i) on the AWGN channel under an average-power constraint, the circularly symmetric Gaussian is the capacity-achieving input distribution; (ii) a uniform distribution on a square is suboptimal because, for the same entropy, it spends more average energy than a Gaussian. The factor $\pi e / 6$ is the ratio of the normalised second moment of an $N$-cube ($1/12$) to that of an $N$-sphere in the limit $N \to \infty$ ($1/(2\pi e)$). In two dimensions alone, replacing the square boundary by a circle of equal area gains only $\pi/3$ (about 0.2 dB); the full $\pi e / 6$ requires the Gaussian-like weighting that MB shaping provides. This 1.53 dB is the universal "cubic shaping gap" that no amount of coding can close without shaping.
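One way to see where the constant $\pi e / 6$ comes from is to compare, per real dimension, a uniform input and a Gaussian input with the same differential entropy. This is a sketch under the usual high-resolution approximation, where the discrete constellation is treated as a continuous density; the symbols $a$, $\sigma_U^2$ and $\sigma_G^2$ are notation introduced here.

```latex
% Uniform input on an interval of width a (one real dimension)
h(U) = \log_2 a, \qquad \sigma_U^2 = \frac{a^2}{12}

% Gaussian input with the same differential entropy
\tfrac{1}{2} \log_2 \big( 2 \pi e \, \sigma_G^2 \big) = \log_2 a
  \;\Longrightarrow\;
  \sigma_G^2 = \frac{a^2}{2 \pi e}

% Power ratio at equal entropy
\frac{\sigma_U^2}{\sigma_G^2} = \frac{2 \pi e}{12} = \frac{\pi e}{6} \approx 1.4233,
\qquad
10 \log_{10} \frac{\pi e}{6} \approx 1.53 \text{ dB}
```

The same ratio applies to each of the two real dimensions of a square QAM constellation, so it carries over unchanged to the complex case.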


Probabilistic Amplitude Shaping (PAS) Architecture

The Böcherer-Steiner-Schulte Probabilistic Amplitude Shaping (PAS) architecture. Uniform information bits enter a distribution matcher (DM) that outputs a stream of amplitude symbols with an MB-shaped probability distribution. The binary labels of these amplitudes, optionally together with some additional uniform information bits, are passed through a systematic BICM encoder (LDPC), which appends parity bits. Because the encoder is systematic, the amplitude labels pass through unchanged. The QAM mapper is then driven by two streams: the amplitude bits carry the shaped part, while the parity bits (and any extra uniform bits) select the signs and carry the uniform part. The output is a QAM sequence whose per-symbol distribution is approximately MB. The receiver inverts this pipeline: LDPC decoding, then the distribution dematcher, then delivery of the information bits.
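The amplitude/sign split at the mapper can be illustrated with a toy sketch for one real dimension (8-ASK). It implements neither a distribution matcher nor an LDPC code: a shuffled constant-composition sequence stands in for the DM output, pseudo-random bits stand in for the approximately uniform parity bits, and the composition `[450, 300, 200, 50]` is a hypothetical example.

```python
import random
from collections import Counter

rng = random.Random(0)

amplitudes = [1, 3, 5, 7]                 # amplitude levels of 8-ASK
composition = [450, 300, 200, 50]         # hypothetical shaped composition, n = 1000

# Stand-in for the distribution matcher output: a sequence with the target composition.
amp_seq = [a for a, count in zip(amplitudes, composition) for _ in range(count)]
rng.shuffle(amp_seq)

# Stand-in for the LDPC parity bits: approximately uniform bits that select the signs.
sign_bits = [rng.getrandbits(1) for _ in amp_seq]

# PAS mapper for one real dimension: bit 0 -> positive sign, bit 1 -> negative sign.
symbols = [(1 - 2 * b) * a for a, b in zip(amp_seq, sign_bits)]

amp_counts = Counter(abs(s) for s in symbols)
avg_energy = sum(s * s for s in symbols) / len(symbols)
print("amplitude counts:", dict(sorted(amp_counts.items())))
print("average energy:", avg_energy)
```

Whatever values the sign bits take, the amplitude statistics and the average energy of the block are untouched, which is the property that lets the parity bits ride on the signs.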

Definition:

Constant-Composition Distribution Matcher (CCDM)

A constant-composition distribution matcher (CCDM) is a bijective (losslessly invertible) block code that maps $k$ uniformly distributed input bits to a sequence of $n$ amplitude symbols from $\mathcal{A} = \{a_1, \ldots, a_K\}$ with a fixed composition: the number $n_i$ of output symbols equal to $a_i$ is the same for every codeword. The input-output rate is
$$\frac{k}{n} \approx H(p) \text{ bits/symbol},$$
where $p = (n_1/n, \ldots, n_K/n)$ is the target MB distribution quantised to integer counts $n_i$ that sum to $n$. The standard CCDM implementation is arithmetic coding over the multinomial distribution restricted to the fixed composition; the encoder and decoder are streaming with $O(\log n)$ state.

The key operational property: every codeword of the CCDM has exactly the target type $(n_1, \ldots, n_K)$, so the average energy of every block is exactly the energy of the quantised MB target. The number of input bits is $k = \lfloor \log_2 \binom{n}{n_1, \ldots, n_K} \rfloor$, so the rate $k/n$ falls below $H(p)$ by the rate loss $H(p) - k/n$, which approaches zero as $n \to \infty$.
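The dependence of the rate loss on block length can be checked with a short sketch. It only counts constant-composition sequences and does not implement the arithmetic-coding matcher itself; the base composition `(9, 6, 4, 1)` and the function names are hypothetical choices for illustration.

```python
from math import factorial, log2

def ccdm_input_bits(counts):
    """Largest k such that 2**k does not exceed the number of sequences with this composition."""
    num_sequences = factorial(sum(counts))
    for c in counts:
        num_sequences //= factorial(c)    # exact: every partial quotient is an integer
    return num_sequences.bit_length() - 1 # floor(log2(num_sequences))

def type_entropy_bits(counts):
    """Entropy H(p) in bits of the empirical distribution p_i = n_i / n."""
    n = sum(counts)
    return -sum((c / n) * log2(c / n) for c in counts if c > 0)

def rate_loss(counts):
    """Rate loss H(p) - k/n in bits per amplitude symbol."""
    n = sum(counts)
    return type_entropy_bits(counts) - ccdm_input_bits(counts) / n

base = (9, 6, 4, 1)                       # hypothetical composition over 4 amplitudes
for scale in (1, 5, 25):
    counts = tuple(scale * c for c in base)
    n = sum(counts)
    k = ccdm_input_bits(counts)
    print(f"n={n:4d}  k={k:4d}  k/n={k / n:.4f}  rate loss={rate_loss(counts):.4f} bits/symbol")
```

The target distribution is the same in all three rows, so the shrinking rate loss is purely a block-length effect.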

CCDM is not the only distribution matcher; alternatives include hierarchical DMs (Steiner-Böcherer-Liva 2018) that trade rate loss for shorter processing blocks, and shell mappers (Laroia 1994) that predate PAS but are conceptually similar. The CCDM is the choice in the 2015 PAS paper because of its simplicity.


Probabilistic Shaping Gain: Shannon vs Uniform vs MB-Shaped QAM

Achievable rate curves for uniform $M$-QAM, MB-shaped $M$-QAM (optimal $\lambda$ per SNR), and the Shannon bound $\log_2(1 + \text{SNR})$. Observe that MB-shaped QAM recovers $\sim 1.3$-$1.5$ dB over uniform QAM at high SNR, closing nearly all of the asymptotic $\pi e / 6$ (about 1.53 dB) shaping gap. At low SNR the shaping gain is smaller because the uniform distribution is already near-optimal there. As the rate approaches the $\log_2 M$ ceiling the gain also vanishes, because the optimal $\lambda$ tends to 0 and the MB distribution converges to uniform. The sweet spot for shaping in practice is at rates close to $\log_2 M - 1$ bit per symbol, where the shaping gain is largest.


Probabilistic Shaping via Maxwell-Boltzmann Distribution

Animated visualisation of uniform QAM transforming into MB-shaped QAM as the shaping parameter $\lambda$ is increased. Each constellation point is drawn as a disk whose area is proportional to the MB probability $p_\lambda(x) \propto \exp(-\lambda |x|^2)$. Outer points shrink; inner points grow. The instantaneous average energy is shown on the side panel, together with the achievable rate, which first increases (as $\lambda$ optimally shapes the distribution) and then decreases (as extreme shaping collapses onto the inner points and wastes the constellation).
Maxwell-Boltzmann shaping of 64-QAM. As $\lambda$ varies, the outer points lose probability mass and the distribution approaches a 2D Gaussian on the lattice. Achievable rate peaks at an intermediate $\lambda^\star(\text{SNR})$, the rate-adaptive setting that underpins PAS.

Example: Shaping Gain at 18 dB SNR for 256-QAM

At $\text{SNR} = 18$ dB, compute the achievable rate for (a) uniform 256-QAM, (b) optimally MB-shaped 256-QAM, (c) Shannon bound. Quantify the shaping gain.
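One way to work this example numerically is sketched below. It assumes an AWGN channel and uses the fact that square QAM with a product MB distribution splits into two independent 16-ASK dimensions, each seeing the same SNR. The mutual information is obtained as $h(Y) - h(N)$ with a simple trapezoidal integration; the grid step, the tail width and the $\lambda$ search grid are arbitrary choices, and all function names are illustrative.

```python
import math

def ask_mutual_information(levels, pmf, snr, step=0.05, tail=8.0):
    """I(X;Y) in bits for a real ASK input over unit-variance AWGN at linear SNR `snr`."""
    power = sum(p * a * a for a, p in zip(levels, pmf))
    scale = math.sqrt(snr / power)            # enforce E[X^2] = snr with noise variance 1
    x = [scale * a for a in levels]
    lo, hi = min(x) - tail, max(x) + tail     # integrate 8 noise std devs past the outer points
    steps = int(round((hi - lo) / step))
    norm = 1.0 / math.sqrt(2.0 * math.pi)
    h_y = 0.0
    for i in range(steps + 1):
        y = lo + i * step
        f = norm * sum(p * math.exp(-0.5 * (y - xj) ** 2) for xj, p in zip(x, pmf))
        if f > 0.0:
            weight = 0.5 if i in (0, steps) else 1.0   # trapezoidal end-point weights
            h_y -= weight * f * math.log2(f) * step
    h_noise = 0.5 * math.log2(2.0 * math.pi * math.e)  # differential entropy of N(0, 1)
    return h_y - h_noise

def mb_pmf_1d(levels, lam):
    """One-dimensional Maxwell-Boltzmann PMF over ASK levels."""
    weights = [math.exp(-lam * a * a) for a in levels]
    total = sum(weights)
    return [w / total for w in weights]

levels = list(range(-15, 16, 2))              # 16-ASK: one real dimension of 256-QAM
snr = 10 ** (18 / 10)                         # 18 dB as a linear ratio

rate_uniform = 2 * ask_mutual_information(levels, mb_pmf_1d(levels, 0.0), snr)
candidates = [0.002 * i for i in range(26)]   # coarse grid search, lambda = 0 ... 0.05
rate_shaped, best_lam = max(
    (2 * ask_mutual_information(levels, mb_pmf_1d(levels, lam), snr), lam)
    for lam in candidates
)
rate_shannon = math.log2(1 + snr)

print(f"(a) uniform 256-QAM   : {rate_uniform:.3f} bits/symbol")
print(f"(b) MB-shaped 256-QAM : {rate_shaped:.3f} bits/symbol (lambda = {best_lam:.3f})")
print(f"(c) Shannon bound     : {rate_shannon:.3f} bits/symbol")
print(f"shaping gain at 18 dB : {rate_shaped - rate_uniform:.3f} bits/symbol")
```

The gain here is expressed as a rate difference at fixed SNR. To express it in dB instead, one would search for the SNR at which each scheme reaches the same target rate.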

⚠️Engineering Note

OIF 400ZR: PAS in Coherent Optical Transmission

OIF 400ZR (Implementation Agreement, 2020) is the first commercial standard to specify probabilistic amplitude shaping as mandatory. The specification targets 400 Gbps point-to-point coherent optical transmission at $\sim 120$ km reach over conventional single-mode fibre (SMF). Key design choices:

  • Modulation: dual-polarisation 16-QAM at baseline (DP-16QAM). Rate per polarisation: 3.17 bits/symbol (vs. 4 for uniform DP-16QAM). The shaped rate of 3.17 was chosen to match the chromatic dispersion and fibre nonlinearity budget for the target reach.
  • Shaping: constant-composition distribution matcher (CCDM) at block length $n = 272$ with target MB distribution.
  • Forward error correction: a staircase outer code concatenated with a Hamming inner code, with $\sim 15\%$ overhead. Post-FEC BER target: $10^{-15}$.
  • Concatenation: uniform bits are the sign + parity; shaped amplitude bits are the DM output. Exactly the PAS template.

The commercial impact: 400ZR modules based on this spec are now the dominant 400G DCI optic, shipping in hundreds of thousands of units annually. The shaping gain alone accounts for about 30% of the reach advantage over pre-400ZR 400G modulation formats.

PAS is now being proposed for 6G (3GPP Release 20+ study items) as an optional mode for eMBB at high SNR. The main holdup is the DM block length: CCDMs need $n \sim 10^3$ to approach the shaping gain within 0.2 dB, but cellular block sizes are typically $n \lesssim 500$. Hierarchical DMs (Steiner-Böcherer-Liva 2018) and shell DMs are being evaluated as alternatives.

Practical Constraints
  • Per-symbol distribution MB with per-block CCDM
  • DM block length $n = 272$ in 400ZR
  • Post-FEC BER $\le 10^{-15}$
  • Shaping adds 1.3-1.5 dB at high SNR

📋 Ref: OIF Implementation Agreement 400ZR, 2020

Historical Note: PAS: From 2015 Paper to 400ZR in 5 Years

2015-2020

Probabilistic shaping has a longer history — Forney and Ungerboeck (1998) reviewed shell-mapping and trellis shaping methods; Kschischang and Pasupathy (1993) studied the information-theoretic shaping gain. But the Probabilistic Amplitude Shaping (PAS) architecture, which reconciled shaping with the BICM framework in a way that let LDPC decoders stay off-the-shelf, is a single-paper contribution:

G. Böcherer, F. Steiner, and P. Schulte, "Bandwidth Efficient and Rate-Matched Low-Density Parity-Check Coded Modulation," IEEE Trans. Commun., Dec. 2015.

The trick is the "sign-parity" decomposition: the systematic LDPC encoder's information bits carry the MB-shaped amplitude labels, while the parity bits, which are approximately uniform, select the signs. Because the MB distribution is symmetric about the origin, the sign of each component is uniform and independent of its amplitude, so using parity bits as signs does not disturb the MB distribution. The receiver does ordinary LDPC decoding and then runs a distribution dematcher on the decoded amplitude bits.

The PAS architecture met industry adoption faster than any other BICM extension in history. Within three years (2018) the TU Munich / DSI Lab spinoffs had produced commercial modems. By 2020, OIF 400ZR mandated PAS in every compliant 400G coherent module. ATSC 3.0 (2017 terrestrial broadcast TV) had already shipped a related geometric shaping scheme. As of 2025 PAS is on the 6G study work-item list for eMBB extensions.


Common Mistake: Shaping Loses at Low SNR

Mistake:

A common assumption is that MB-shaped QAM is always at least as good as uniform QAM. At low SNR this is false.

Correction:

At low SNR the achievable rate is almost entirely determined by the average energy, not by the shape of the input distribution, so the uniform distribution is already near-optimal and the theoretical shaping gain is negligible. Matching a rate far below $\log_2 M$ requires heavy MB shaping ($\lambda$ large), which collapses the distribution onto the inner points and wastes the constellation almost entirely (effective rate $< 2$ bits/symbol). Once the finite-length rate loss of the distribution matcher is included, the shaped system can end up worse than a smaller uniform constellation.

The operational rule: MB shaping provides gain only when the target rate is within $\sim 1$ bit/symbol of $\log_2 M$. Below that, stick with uniform QAM and use a smaller constellation. PAS implementations disable shaping for low-MCS modes for exactly this reason.

Why This Matters: Forward Reference to Chapter 19: Probabilistic and Geometric Shaping

This section introduces probabilistic shaping as a concrete BICM extension currently used in 400G optical and proposed for 6G. Chapter 19 of this book takes the topic much further:

  • Rate-adaptive PAS: varying $\lambda$ per block to match the channel without changing the code or modulation.
  • Geometric shaping: rearranging the constellation points geometrically (e.g., non-uniform QAM with point-specific spacing) rather than weighting them probabilistically. Used in ATSC 3.0 terrestrial broadcast.
  • Hierarchical and shell DMs: alternatives to CCDM with shorter block lengths.
  • Joint shaping and coding: trellis shaping and Voronoi constellations that close the last 0.3 dB.

The 1.53 dB gap we introduced here is the starting point for all of Chapter 19.

Quick Check

The asymptotic shaping gain of an MB-shaped square QAM over uniform QAM is $\pi e / 6$ (about 1.53 dB). What is the operational interpretation of this bound?

It is the ratio of the average energy of a uniform distribution on a square to that of a Gaussian distribution at equal entropy

It is the rate loss of BICM relative to CM

It is the Gray labelling gap to CM capacity

It is the LDPC decoding threshold gap to Shannon

Maxwell-Boltzmann Distribution

A probability mass function on a finite set $\mathcal{X}$ of the form $p_\lambda(x) \propto \exp(-\lambda |x|^2)$ with $\lambda > 0$. Maximises entropy subject to a fixed second moment. The finite-alphabet analogue of the Gaussian distribution.

Related: Probabilistic Shaping, Probabilistic Amplitude Shaping (PAS), Constant-Composition Distribution Matcher (CCDM)

Probabilistic Amplitude Shaping (PAS)

An architecture introduced by Böcherer, Steiner, and Schulte (2015) that implements probabilistic shaping within the BICM framework. A distribution matcher converts uniform bits to MB-shaped amplitude bits; a systematic LDPC code adds uniform parity; signs and parity drive the QAM mapper. Now standard in 400G optical transmission and proposed for 6G.

Related: Maxwell-Boltzmann Distribution, Constant-Composition Distribution Matcher (CCDM), BICM Is a Paradigm, Not a Specification

Distribution Matcher (DM)

A bijective block code that maps uniform input bits to an output sequence with a target non-uniform distribution. The CCDM (constant-composition DM) is the canonical example, using arithmetic coding over a fixed-type multinomial. Rate-optimal DMs approach the target entropy as the block length grows.

Related: Constant-Composition Distribution Matcher (CCDM), Probabilistic Amplitude Shaping (PAS), Probabilistic Shaping

Key Takeaway

Probabilistic shaping closes most of the 1.53 dB shaping gap that uniform-QAM BICM leaves to Shannon. The Maxwell-Boltzmann distribution maximises entropy at fixed energy, which follows from a direct Lagrangian argument. The PAS architecture (Böcherer-Steiner-Schulte 2015) realises MB-shaped QAM within the BICM framework by feeding MB-shaped amplitude bits through a systematic LDPC encoder. PAS is now mandatory in 400G optical (OIF 400ZR) and is being proposed for 6G. Chapter 19 takes this much further.