Ferkans — Interactive Telecom Tutor

The 1.53 dB Shaping Gap

We have now seen every modern standard customise BICM — 5G NR, Wi-Fi, DVB-S2 — and we have seen the AMC envelope approach Shannon within a dB or two. The last remaining gap has a specific cause and a specific remedy: shaping.

With a uniformly distributed QAM input (which every standard of the previous sections uses), the mutual information can never exceed the input entropy $\log_2 M$ bits per symbol, and even below that ceiling the uniform input is energy-inefficient: at high SNR it needs a factor $\pi e / 6 \approx 1.42$ (1.53 dB) more SNR than the Gaussian input to carry the same rate. This is the "cubic shaping gap" between a uniform distribution over a square constellation and the truly optimal Gaussian input.

Closing that 1.53 dB gap is worth the fight. At high SNR, where capacity grows by about one bit per 2D symbol for every 3 dB, 1.53 dB is worth roughly 0.5 bit per 2D symbol. On an amplified optical link limited by accumulated ASE noise, where SNR falls in proportion to the number of spans, the same factor of about 1.42 buys roughly 40% more reach; on a power-limited satellite link it delivers the same throughput with about 30% less transmit power. The solution is probabilistic shaping (PS): instead of driving each QAM point with equal probability, weight outer (high-energy) points lower and inner (low-energy) points higher, producing an approximately Gaussian envelope. The distribution that maximises entropy on a given constellation under a fixed average-power constraint is the Maxwell-Boltzmann distribution, and the practical scheme that implements it within the BICM framework is the Probabilistic Amplitude Shaping (PAS) architecture of Böcherer, Steiner, and Schulte (2015).

This section explains how PAS works, proves the 1.53 dB bound, and walks through why PAS is used in commercial coherent optical transceivers and is being proposed for 6G. Shaping is the modern capstone of BICM design.


Definition:
Maxwell-Boltzmann (MB) Distribution

For a finite constellation $\mathcal{X} \subset \mathbb{C}$ (e.g., a $\sqrt{M} \times \sqrt{M}$ square QAM), the Maxwell-Boltzmann (MB) distribution with parameter $\lambda > 0$ is $p_\lambda(x) = \frac{\exp(-\lambda |x|^2)}{\sum_{x' \in \mathcal{X}} \exp(-\lambda |x'|^2)}, \qquad x \in \mathcal{X}.$ Outer (high-energy) constellation points receive exponentially lower probability than inner points. The parameter $\lambda$ controls the average energy: $\mathbb{E}_{p_\lambda}[|X|^2] = \frac{\sum_{x} |x|^2 \exp(-\lambda|x|^2)}{\sum_{x} \exp(-\lambda|x|^2)}.$ Increasing $\lambda$ shrinks the average energy (fewer outer points); decreasing $\lambda$ toward 0 recovers the uniform distribution.

The MB distribution is the finite-alphabet analogue of the Gaussian distribution on $\mathbb{R}^2$: the MB distribution maximises (discrete) entropy and the Gaussian maximises differential entropy, each subject to a fixed second moment. On a QAM lattice, MB "samples" the Gaussian density at lattice points. As $M \to \infty$ and with appropriate normalisation, the MB distribution converges to a continuous Gaussian.
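A minimal NumPy sketch of the definition, assuming the square constellation sits on the odd-integer grid $\{\pm 1, \pm 3, \ldots\}^2$ (any other fixed spacing only rescales $\lambda$). The helpers `square_qam`, `mb_pmf` and `mean_energy` are illustrative names, not a library API.

```python
import numpy as np

def square_qam(M):
    """Points of square M-QAM on the odd-integer grid {+-1, +-3, ...}^2."""
    L = int(round(np.sqrt(M)))
    if L * L != M:
        raise ValueError("M must be a perfect square")
    pam = np.arange(-(L - 1), L, 2, dtype=float)
    i, q = np.meshgrid(pam, pam)
    return (i + 1j * q).ravel()

def mb_pmf(points, lam):
    """Maxwell-Boltzmann pmf p(x) proportional to exp(-lam |x|^2) on the points."""
    energy = np.abs(points) ** 2
    # Subtracting the minimum energy avoids underflow; it cancels on normalisation.
    w = np.exp(-lam * (energy - energy.min()))
    return w / w.sum()

def mean_energy(points, p):
    """Average energy E_p[|X|^2]."""
    return float(np.sum(p * np.abs(points) ** 2))

pts = square_qam(64)
for lam in (0.0, 0.02, 0.05, 0.1):
    p = mb_pmf(pts, lam)
    print(f"lambda={lam:.2f}  E|X|^2={mean_energy(pts, p):6.2f}  max p={p.max():.4f}")
```

At $\lambda = 0$ the sketch reproduces uniform 64-QAM, whose average energy on this grid is 42; each increase of $\lambda$ lowers the average energy and raises the probability of the four innermost points.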


Theorem: Maxwell-Boltzmann Maximises Entropy at Fixed Second Moment

Let $\mathcal{X}$ be a finite set and let $E$ satisfy $\min_x |x|^2 < E < \frac{1}{|\mathcal{X}|}\sum_{x \in \mathcal{X}} |x|^2$, i.e. lie strictly between the minimum energy and the average energy of the uniform distribution. Among all probability mass functions $p$ on $\mathcal{X}$ satisfying $\sum_x p(x) |x|^2 = E$, the entropy $H(p) = -\sum_x p(x) \log p(x)$ (natural logarithm, in nats) is maximised uniquely by the Maxwell-Boltzmann distribution $p_\lambda$ with $\lambda = \lambda(E) > 0$ chosen such that $\mathbb{E}_{p_\lambda} [|X|^2] = E$. The maximum entropy is $H^\star(E) = \lambda E + \log \sum_{x \in \mathcal{X}} \exp(-\lambda |x|^2).$

The proof is a textbook Lagrangian: maximise entropy subject to an energy constraint by introducing a multiplier $\lambda$ for the constraint and solving. The exponential form $p(x) \propto \exp(-\lambda |x|^2)$ falls out automatically. This is the finite-alphabet version of Shannon's theorem that the Gaussian distribution maximises differential entropy at fixed variance, with the MB on a lattice playing the role of the Gaussian on $\mathbb{R}^2$.

Hint

Set up the Lagrangian $L(p) = H(p) - \lambda (\mathbb{E}|X|^2 - E) - \mu (\sum p - 1)$ .

Take $\partial L / \partial p(x) = 0$ for each $x \in \mathcal{X}$ .

Solve for $p(x)$ in terms of $\lambda, \mu$ — observe the exponential form.

Use the normalisation constraint to eliminate $\mu$ .

The uniqueness follows from strict concavity of entropy.

Proof

Step 1: Lagrangian

Maximise $H(p) = -\sum_x p(x) \log p(x)$ subject to $\sum_x p(x) |x|^2 = E$ and $\sum_x p(x) = 1$ . Form $L(p; \lambda, \mu) = -\sum_x p(x) \log p(x) - \lambda \left(\sum_x p(x) |x|^2 - E\right) - \mu \left(\sum_x p(x) - 1\right).$

Step 2: Stationarity

$\partial L / \partial p(x) = -\log p(x) - 1 - \lambda |x|^2 - \mu = 0$ , so $p(x) = e^{-1 - \mu} e^{-\lambda |x|^2} = C \exp(-\lambda |x|^2),$ where $C = e^{-1 - \mu}$ is a normalisation constant.

Step 3: Normalisation

$\sum_x p(x) = C \sum_x \exp(-\lambda |x|^2) = 1$ , so $C = 1 / \sum_{x'} \exp(-\lambda |x'|^2)$ . This gives exactly the Maxwell-Boltzmann distribution $p_\lambda(x) = \exp(-\lambda|x|^2) / Z(\lambda)$ with $Z(\lambda) = \sum_x \exp(-\lambda |x|^2)$ .

Step 4: Lagrange multiplier from energy constraint

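The multiplier $\lambda$ solves $\mathbb{E}_{p_\lambda}[|X|^2] = -Z'(\lambda)/Z(\lambda) = E$. A solution exists and is unique because $\lambda \mapsto \mathbb{E}_{p_\lambda}[|X|^2]$ is continuous and strictly decreasing (its derivative is $-\mathrm{Var}_{p_\lambda}(|X|^2) < 0$), falling from the uniform average energy at $\lambda = 0$ to $\min_x |x|^2$ as $\lambda \to \infty$; every $E$ strictly between these two values is reached by exactly one $\lambda > 0$.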
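Step 4 translates directly into a root-finder. The sketch below (hypothetical helper names `mb_energy` and `solve_lambda`) brackets the root and bisects, relying only on the monotonicity argument above.

```python
import numpy as np

def mb_energy(energies, lam):
    """E_{p_lam}[|X|^2] for the MB pmf on a finite set with the given |x|^2 values."""
    w = np.exp(-lam * (energies - energies.min()))   # shift for numerical stability
    return float(np.dot(w, energies) / w.sum())

def solve_lambda(energies, target, rel_tol=1e-12):
    """Find the unique lam > 0 with E_{p_lam}[|X|^2] = target (Step 4).

    Relies on E(lam) being continuous and strictly decreasing, from the uniform
    mean at lam = 0 towards min |x|^2 as lam -> infinity.
    """
    energies = np.asarray(energies, dtype=float)
    if not energies.min() < target < energies.mean():
        raise ValueError("need min |x|^2 < target < uniform mean energy")
    lo, hi = 0.0, 1.0
    while mb_energy(energies, hi) > target:   # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if mb_energy(energies, mid) > target:
            lo = mid   # energy still too high: need a larger lam
        else:
            hi = mid
    return 0.5 * (lo + hi)

# 16-QAM on the odd-integer grid: uniform mean energy is 10, minimum is 2.
pam = np.array([-3.0, -1.0, 1.0, 3.0])
energies = (pam[:, None] ** 2 + pam[None, :] ** 2).ravel()
lam = solve_lambda(energies, target=6.0)
print(lam, mb_energy(energies, lam))
```

For 16-QAM on the odd-integer grid the energies are 2, 10 and 18 with multiplicities 4, 8 and 4, so with $u = e^{-8\lambda}$ the target $E = 6$ reduces to $3u^2 + 2u - 1 = 0$, giving exactly $\lambda = \ln 3 / 8 \approx 0.137$; the bisection should return that value.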

Step 5: Entropy expression

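Substituting $p_\lambda$ into $H(p)$, $H(p_\lambda) = -\sum_x p_\lambda(x) [-\lambda |x|^2 - \log Z(\lambda)] = \lambda E + \log Z(\lambda).$ Because $H$ is strictly concave and both constraints are linear in $p$, this stationary point is the unique global maximiser. $\;\;\blacksquare$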
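A quick numerical check of Step 5 and of the maximality claim, assuming an 8-PAM alphabet (one real dimension) and $\lambda = 0.05$, both arbitrary illustrative choices. It evaluates $H(p_\lambda)$ directly and via $\lambda E + \ln Z$, then perturbs $p_\lambda$ along a direction that preserves both constraints; by strict concavity the entropy must drop.

```python
import numpy as np

x = np.array([-7, -5, -3, -1, 1, 3, 5, 7], dtype=float)   # 8-PAM, one real dimension
e = x ** 2                                                # energies |x|^2

def entropy_nats(p):
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

lam = 0.05
w = np.exp(-lam * e)
Z = w.sum()
p = w / Z
E = float(p @ e)

H = entropy_nats(p)
H_closed = lam * E + np.log(Z)          # Step 5: H = lam * E + ln Z (nats)
print(H, H_closed)

# Direction d supported on x = 1, 3, 5 with sum(d) = 0 and sum(d * e) = 0,
# so p + eps * d keeps both the normalisation and the energy constraint.
ia, ib, ic = 4, 5, 6
d = np.zeros_like(p)
d[ia], d[ib], d[ic] = e[ic] - e[ib], -(e[ic] - e[ia]), e[ib] - e[ia]

for eps in (1e-4, -1e-4):
    q = p + eps * d
    print(eps, entropy_nats(q) - H)     # negative: p_lambda is the maximiser
```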


Theorem: Asymptotic Shaping Gain Approaches $\pi e / 6$

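Let $\mathcal{X}_M$ be a $\sqrt{M} \times \sqrt{M}$ square QAM constellation on a uniform grid with unit spacing, used with uniform probabilities so that it carries $\log_2 M$ bits per symbol. Let $p_{\rm MB}$ be the Maxwell-Boltzmann distribution on the same unit-spaced grid (extended far enough that truncation is negligible) whose entropy is also $\log_2 M$. Define the shaping gain as the ratio of the average energies needed for this common entropy, $G_s(M) \triangleq \frac{\mathbb{E}_{p_{\rm unif}}[|X|^2]}{\mathbb{E}_{p_{\rm MB}}[|X|^2]}.$ Because $I(X;Y) \to H(X)$ as $\text{SNR} \to \infty$, $G_s(M)$ is the SNR that MB shaping saves at rate $\log_2 M$ in the high-SNR limit, and $\lim_{M \to \infty} G_s(M) = \frac{\pi e}{6} \approx 1.4233 \; (1.5329 \text{ dB}).$ Equivalently, MB-shaped QAM requires up to 1.53 dB less SNR than uniform QAM to achieve the same rate in the high-SNR limit.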

The bound comes from two facts: (i) at high SNR the achievable rate approaches the input entropy, and the Gaussian is the distribution that attains a given entropy with the least energy; (ii) a uniform distribution on a square is suboptimal because a square does not match a circular Gaussian envelope. The factor $\pi e / 6$ is the ratio of the second moment of a uniform distribution to that of a Gaussian with the same differential entropy. Geometrically, it is the $N \to \infty$ limit of the second-moment ratio of an $N$-cube to an $N$-sphere of equal volume; in 2D alone, the square-to-circle ratio is only $\pi/3$ (about 0.2 dB). This 1.53 dB is the universal "cubic shaping gap" that no amount of coding can close without shaping.

Hint

Use the decomposition $I(X;Y) = H(X) - H(X|Y)$; at high SNR, $H(X|Y) \to 0$.

So at high SNR, comparing two input distributions at the same rate means comparing the second moments they need to reach the same entropy.

In the limit $M \to \infty$, the discrete entropies converge to the differential entropies of the Gaussian and of the uniform-over-square distribution.

Differential entropy of a 2D Gaussian with variance $\sigma^2$ per dimension: $\log_2(2\pi e \sigma^2)$. Differential entropy of the uniform distribution over a square with per-dimension second moment $\sigma^2$: $\log_2(12 \sigma^2)$. Equate the two and compute the second-moment ratio.

Proof

Step 1: High-SNR capacity decomposition

For a channel $Y = X + W$ with Gaussian noise $W$ and a discrete input $X$, $I(X;Y) = H(X) - H(X|Y)$, and $H(X|Y) \to 0$ at high SNR, so $I(X;Y) \to H(X)$. The achievable rate is then set by the input entropy alone, and shaping reduces the energy needed to achieve a given $H(X)$.

Step 2: Energy at fixed entropy

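For a 2D Gaussian with variance $\sigma^2$ per dimension, the differential entropy is $h = \log_2(2\pi e \sigma^2)$ bits. For a uniform distribution on a square of side $a$, the per-dimension second moment is $\sigma^2 = a^2/12$ and $h = \log_2 a^2 = \log_2(12\sigma^2)$. On a grid with unit spacing (cell area 1), a distribution that varies slowly from point to point has discrete entropy $H \approx h$, so for large $M$ matching the discrete entropies of MB-shaped and uniform QAM amounts to matching these differential entropies.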

Step 3: Second-moment ratio at matched entropy

To match entropies, $2\pi e \sigma_G^2 = 12 \sigma_U^2$. The 2D shaping gain of the circular Gaussian over the uniform-over-square distribution at the same entropy is therefore $G_s = \frac{\sigma_U^2}{\sigma_G^2} = \frac{2\pi e}{12} = \frac{\pi e}{6} \approx 1.4233.$ In dB: $10 \log_{10}(\pi e/6) \approx 1.53$ dB.

Step 4: Finite-$M$ approximation

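For finite $M$, uniform unit-spaced $\sqrt{M}$-PAM has per-dimension second moment $(M-1)/12$ rather than the continuous value $M/12$, while the MB distribution on the unit-spaced grid follows the Gaussian entropy-energy relation almost exactly. Hence $G_s(M) \approx \frac{\pi e}{6}\left(1 - \frac{1}{M}\right)$, which converges to $\pi e / 6$ as $M \to \infty$. At $M = 256$ this is about 1.52 dB, already within 0.1 dB of the asymptote. $\blacksquare$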
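A numerical sketch of the finite-$M$ statement, assuming a 1D unit-spaced grid $\mathbb{Z} + \tfrac{1}{2}$ truncated at $\pm 400$ (wide enough that truncation is negligible here). Because both uniform square QAM and the MB distribution factor into two identical real dimensions, the per-dimension energy ratio equals the 2D shaping gain. `shaping_gain_db` is a hypothetical helper name.

```python
import numpy as np

def mb_entropy_energy(lam, grid):
    """Entropy (bits) and mean energy of p(x) proportional to exp(-lam x^2) on a 1D grid."""
    w = np.exp(-lam * grid ** 2)
    p = w / w.sum()
    nz = p[p > 0]
    return float(-np.sum(nz * np.log2(nz))), float(p @ grid ** 2)

def shaping_gain_db(levels, half_width=400, iters=100):
    """High-SNR shaping gain of MB over uniform `levels`-PAM at equal entropy.

    Square QAM with M = levels**2 is the product of two such PAMs and has the same gain.
    """
    grid = np.arange(-half_width, half_width, dtype=float) + 0.5   # unit spacing
    target = np.log2(levels)                      # uniform PAM entropy, bits/dim
    lo, hi = 1e-9, 50.0                           # entropy is decreasing in lam > 0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        h, _ = mb_entropy_energy(mid, grid)
        lo, hi = (mid, hi) if h > target else (lo, mid)
    _, e_mb = mb_entropy_energy(0.5 * (lo + hi), grid)
    e_unif = (levels ** 2 - 1) / 12.0             # uniform unit-spaced PAM
    return 10 * np.log10(e_unif / e_mb)

for levels in (4, 8, 16, 32):
    approx = 10 * np.log10(np.pi * np.e / 6 * (1 - 1 / levels ** 2))
    print(levels ** 2, round(shaping_gain_db(levels), 4), round(approx, 4))
```

The computed gains should track $10\log_{10}[(\pi e/6)(1 - 1/M)]$, about 1.25 dB for 16-QAM and 1.52 dB for 256-QAM, creeping up towards 1.53 dB as $M$ grows.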


Probabilistic Amplitude Shaping (PAS) Architecture. In the Böcherer-Steiner-Schulte PAS architecture, uniform information bits enter a **distribution matcher** (DM) that outputs a stream of amplitude symbols with an MB-shaped probability distribution. The binary labels of these amplitudes (optionally together with some additional uniform information bits) are encoded by a **systematic** BICM encoder (LDPC), which appends parity bits. The amplitude bits and sign bits together drive the QAM mapper: amplitudes carry the shaped part, while the signs, taken from the parity bits (and any additional uniform information bits), carry the uniform part. The output is a QAM sequence whose per-symbol distribution is approximately MB. The receiver inverts this pipeline: LDPC decoding, then distribution dematching, then delivery of the information bits.
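A toy sketch of one real dimension of the PAS mapper, to make the sign-parity split concrete. Assumptions, all hypothetical: a random binary parity matrix (`toy_parity`) stands in for the systematic LDPC encoder, amplitudes use a natural-binary label rather than a Gray label, and the amplitude sequence is handed over as if it came from a DM. With 2 amplitude bits and 1 parity bit per symbol the toy code has rate 2/3, the PAS configuration for 8-ASK per dimension (64-QAM) in which every sign is a parity bit.

```python
import random

def amplitude_label(a, bits_per_amp):
    """Natural-binary label of amplitude a in {1, 3, ..., 2**(bits_per_amp + 1) - 1}."""
    idx = (a - 1) // 2
    return [(idx >> (bits_per_amp - 1 - i)) & 1 for i in range(bits_per_amp)]

def toy_parity(info_bits, n_parity, seed=0):
    """Hypothetical stand-in for a systematic LDPC encoder: parity = G_p * info (mod 2)."""
    rng = random.Random(seed)          # fixed seed -> same G_p at transmitter and receiver
    parity = []
    for _ in range(n_parity):
        row = [rng.randint(0, 1) for _ in info_bits]
        parity.append(sum(r & b for r, b in zip(row, info_bits)) % 2)
    return parity

def pas_map(amplitudes, bits_per_amp, seed=0):
    """One real dimension of PAS: |x| comes from the DM, the sign from a parity bit."""
    info = [b for a in amplitudes for b in amplitude_label(a, bits_per_amp)]
    parity = toy_parity(info, len(amplitudes), seed)   # one parity bit per symbol
    return [a if s == 0 else -a for a, s in zip(amplitudes, parity)]

def pas_check(symbols, bits_per_amp, seed=0):
    """Noise-free receiver check: the signs must equal the parity of the amplitude labels."""
    amps = [abs(x) for x in symbols]
    info = [b for a in amps for b in amplitude_label(a, bits_per_amp)]
    signs = [0 if x > 0 else 1 for x in symbols]
    return toy_parity(info, len(symbols), seed) == signs

# DM output for 8-ASK (amplitudes 1, 3, 5, 7) with an MB-like composition 4:2:1:1.
amps = [1, 3, 1, 5, 1, 3, 7, 1]
tx = pas_map(amps, bits_per_amp=2)
print(tx)
```

The amplitudes, and therefore their shaped composition, pass through untouched; only the signs come from the encoder, which is why the parity bits cannot disturb the amplitude distribution.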

Definition:
Constant-Composition Distribution Matcher (CCDM)

A constant-composition distribution matcher (CCDM) is a bijective (losslessly invertible) block code that maps $k$ uniformly distributed input bits to a sequence of $n$ amplitude symbols from $\mathcal{A} = \{a_1, \ldots, a_K\}$ with a fixed composition — the number $n_i$ of output symbols equal to $a_i$ is fixed across all codewords. The input-output rate is $\frac{k}{n} \approx H(p) \text{ bits/symbol},$ where $p = (n_1/n, \ldots, n_K/n)$ is the target MB distribution quantised to $n$ -tuple counts. The standard CCDM implementation is arithmetic coding over the multinomial distribution restricted to the fixed composition; the encoder and decoder are streaming with $O(\log n)$ state.
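Arithmetic coding is the standard CCDM route. A simpler way to see that a constant-composition bijection exists is exact enumerative coding, which ranks all sequences of the target composition lexicographically using Python's big integers. This is a swapped-in technique, not the arithmetic-coding CCDM itself, and it is far too slow for long blocks; loosely speaking, arithmetic-coding CCDM computes a finite-precision approximation of the same ranking. In the sketch, symbol index $i$ stands for amplitude $a_{i+1}$, and all function names are hypothetical.

```python
from math import factorial

def multinomial(counts):
    """Number of sequences of length sum(counts) with exactly this composition."""
    total = factorial(sum(counts))
    for c in counts:
        total //= factorial(c)
    return total

def unrank(index, counts):
    """Return the index-th sequence (lexicographic order) with composition `counts`."""
    counts = list(counts)
    seq = []
    for _ in range(sum(counts)):
        for sym in range(len(counts)):
            if counts[sym] == 0:
                continue
            counts[sym] -= 1
            block = multinomial(counts)      # completions that place `sym` here
            if index < block:
                seq.append(sym)
                break
            index -= block
            counts[sym] += 1
    return seq

def rank(seq, counts):
    """Inverse of unrank: position of `seq` among all sequences of this composition."""
    counts = list(counts)
    index = 0
    for sym in seq:
        for smaller in range(sym):
            if counts[smaller] > 0:
                counts[smaller] -= 1
                index += multinomial(counts)
                counts[smaller] += 1
        counts[sym] -= 1
    return index

def dm_encode(bits, counts):
    """k = floor(log2 multinomial) uniform bits -> one constant-composition sequence."""
    k = multinomial(counts).bit_length() - 1
    assert len(bits) == k
    index = int("".join(map(str, bits)), 2) if k else 0
    return unrank(index, counts)

def dm_decode(seq, counts):
    """Dematcher: constant-composition sequence -> the k input bits."""
    k = multinomial(counts).bit_length() - 1
    index = rank(seq, counts)
    return [int(b) for b in format(index, f"0{k}b")] if k else []

counts = [4, 3, 1]                  # n = 8 symbols; 8!/(4!3!1!) = 280 sequences, k = 8
bits = [1, 0, 1, 1, 0, 0, 1, 0]
seq = dm_encode(bits, counts)
print(seq, dm_decode(seq, counts))
```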

The key operational property: every codeword of the CCDM has exactly the target type $(n_1, \ldots, n_K)$, so every block has exactly the average energy of the quantised MB target. With $k = \lfloor \log_2 \binom{n}{n_1, \ldots, n_K} \rfloor$ input bits, the rate loss is $H(p) - k/n$, which approaches zero as $n \to \infty$ (roughly like $(\log n)/n$).
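The rate-loss expression is easy to evaluate exactly with integer arithmetic. The sketch below (hypothetical helper `ccdm_rate_loss`) assumes the illustrative target $p = (1/2, 1/4, 1/4)$, scaled to growing block lengths $n$.

```python
from math import factorial, log2

def ccdm_rate_loss(counts):
    """Return (k, k/n, H(p) - k/n) for a constant-composition matcher with these counts."""
    n = sum(counts)
    num = factorial(n)
    for c in counts:
        num //= factorial(c)               # multinomial coefficient, exact integer
    k = num.bit_length() - 1               # floor(log2 multinomial)
    h = -sum(c / n * log2(c / n) for c in counts if c)   # H(p) in bits/symbol
    return k, k / n, h - k / n

for scale in (1, 10, 100, 1000):
    counts = [2 * scale, scale, scale]     # p = (1/2, 1/4, 1/4), H(p) = 1.5 bits
    k, rate, loss = ccdm_rate_loss(counts)
    print(sum(counts), k, round(rate, 4), round(loss, 4))
```

For $n = 4$ the multinomial is 12, so $k = 3$ and the loss is $1.5 - 0.75 = 0.75$ bit/symbol; by $n = 4000$ it has fallen to a few thousandths of a bit.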

CCDM is not the only distribution matcher; alternatives include hierarchical DMs (Steiner-Böcherer-Liva 2018) that trade rate loss for shorter processing blocks, and shell mappers (Laroia 1994) that predate PAS but are conceptually similar. The CCDM is the choice in the 2015 PAS paper because of its simplicity.


Probabilistic Shaping Gain: Shannon vs Uniform vs MB-Shaped QAM

Achievable rate curves for uniform $M$ -QAM, MB-shaped $M$ -QAM (optimal $\lambda$ per SNR), and the Shannon bound $\log_2(1 + \text{SNR})$ . Observe that MB-shaped QAM recovers $\sim 1.3$ - $1.5$ dB over uniform QAM at high SNR, closing nearly all of the asymptotic $\pi e / 6 \approx 1.53$ dB shaping gap. At low SNR, the shaping gain is smaller because the uniform distribution is already near-optimal — the MB distribution converges to uniform as $\lambda \to 0$ . The sweet spot for shaping in practice is at rates close to $\log_2 M - 1$ bit per symbol, where the shaping gain is largest.


Probabilistic Shaping via Maxwell-Boltzmann Distribution

Animated visualisation of uniform QAM transforming into MB-shaped QAM as the shaping parameter $\lambda$ is increased. Each constellation point is drawn as a disk whose area is proportional to the MB probability $p_\lambda(x) \propto \exp(-\lambda |x|^2)$. Outer points shrink; inner points grow. The side panel shows the instantaneous average energy together with the achievable rate, which first increases (as $\lambda$ approaches the optimal shaping) and then decreases (as extreme shaping collapses onto the inner points and wastes the constellation).

Maxwell-Boltzmann shaping of 64-QAM. As $\lambda$ varies, the outer points lose probability mass and the distribution approaches a 2D Gaussian on the lattice. The achievable rate peaks at an intermediate $\lambda^\star(\text{SNR})$, the rate-adaptive setting that underpins PAS.

Example: Shaping Gain at 18 dB SNR for 256-QAM

At $\text{SNR} = 18$ dB, compute the achievable rate for (a) uniform 256-QAM, (b) optimally MB-shaped 256-QAM, (c) Shannon bound. Quantify the shaping gain.

Solution

Shannon bound

$\text{SNR} = 18$ dB $= 63.1$ linear. Shannon capacity $\log_2(1 + 63.1) = 6.00$ bits/2D symbol.

Uniform 256-QAM BICM capacity

For uniform Gray-labelled 256-QAM at 18 dB, the BICM capacity is approximately $5.55$ bits/2D symbol (see Fig. 2 of the Böcherer-Steiner-Schulte paper). At 18 dB, 256-QAM is still far from saturating at 8 bits/symbol; most of its gap to Shannon is the shaping loss of the uniform input, with a smaller contribution from bit-wise (BICM) rather than symbol-wise (CM) decoding.

MB-shaped 256-QAM

With $\lambda^\star = 0.044$ (picked to target rate 5.9), the MB-shaped 256-QAM achieves approximately $5.88$ bits/2D symbol. This is within $0.12$ bits of the Shannon bound — versus uniform 256-QAM's $0.45$ bit gap.

Shaping gain in dB

$G_s \approx 10 \log_{10}(\text{SNR}_{\rm unif} / \text{SNR}_{\rm MB})$ at rate 5.55: uniform needs 18 dB, MB-shaped 256-QAM needs about 16.7 dB (the Shannon bound needs $10 \log_{10}(2^{5.55} - 1) \approx 16.6$ dB). Shaping gain $\approx 1.3$ dB at this rate, consistent with approaching the asymptotic $1.53$ dB in the high-SNR regime.

Operational use

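In coherent optical transmission, an SNR gain of this size converts directly into extra reach, or into a higher net rate at fixed reach; vendor-proprietary long-haul transceivers use probabilistic shaping to exploit it. (The interoperable OIF 400ZR specification, by contrast, uses uniform DP-16QAM.) $\blacksquare$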
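A sketch for reproducing the flavour of this example numerically. Assumptions: it computes coded-modulation (symbol-wise) mutual information rather than the BICM rate quoted above; it works per real dimension (256-QAM is the product of two 16-PAMs, so 2D rates are twice the 1D ones); the noise expectation uses Gauss-Hermite quadrature (`numpy.polynomial.hermite.hermgauss`); and $\lambda$ is found by a coarse grid search. Here $\lambda$ refers to the unnormalised odd-integer grid, so its value is not comparable to $\lambda^\star = 0.044$, and the printed rates need not match the figures above exactly. Helper names are hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def pam_mi_bits(points, p, snr_db, n_quad=100):
    """I(X;Y) in bits per real dimension for Y = X + W, W ~ N(0, sigma^2).

    Points are rescaled so that E[X^2] = 1, hence sigma^2 = 1 / SNR.
    """
    x = points / np.sqrt(np.dot(p, points ** 2))
    sigma2 = 10 ** (-snr_db / 10)
    t, wq = hermgauss(n_quad)                      # nodes/weights for weight exp(-t^2)
    noise = np.sqrt(2 * sigma2) * t                # W = sqrt(2) * sigma * t
    d = (x[:, None] - x[None, :])[:, :, None]      # d[i, j] = x_i - x_j
    # log f(w + d_ij) - log f(w) for the Gaussian density f
    expo = -(2 * noise[None, None, :] * d + d ** 2) / (2 * sigma2)
    m = expo.max(axis=1, keepdims=True)            # log-sum-exp over j for stability
    lse = m[:, 0, :] + np.log(np.sum(p[None, :, None] * np.exp(expo - m), axis=1))
    inner = (lse / np.log(2)) @ (wq / np.sqrt(np.pi))   # E_W[log2 sum_j ...] per x_i
    return float(-np.dot(p, inner))

def mb(points, lam):
    w = np.exp(-lam * points ** 2)
    return w / w.sum()

pts = np.arange(-15, 16, 2, dtype=float)           # 16-PAM = one dimension of 256-QAM
snr_db = 18.0
uniform = pam_mi_bits(pts, mb(pts, 0.0), snr_db)
best_mi, best_lam = max((pam_mi_bits(pts, mb(pts, lam), snr_db), lam)
                        for lam in np.linspace(0.0, 0.1, 41))
shannon = 0.5 * np.log2(1 + 10 ** (snr_db / 10))
print(f"per 2D symbol: uniform {2 * uniform:.3f}, MB {2 * best_mi:.3f} "
      f"(lambda={best_lam:.4f}), Shannon {2 * shannon:.3f}")
```

Because $\lambda = 0$ is on the search grid, the shaped rate can only match or beat the uniform one, and both must stay below the Shannon bound.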

⚠️Engineering Note

OIF 400ZR and PAS in Coherent Optical Transmission

OIF 400ZR (Implementation Agreement, 2020) specifies 400 Gbps point-to-point coherent optical transmission for data-centre interconnect at up to $\sim 120$ km over conventional SMF. Notably, it does not use probabilistic shaping. Key design choices:

Modulation: uniform dual-polarisation 16-QAM (DP-16QAM), 4 bits/symbol per polarisation, at about 60 GBd.
Shaping: none; every constellation point is used with equal probability, so no distribution matcher is needed.
FEC: concatenated FEC with a hard-decision staircase outer code and a soft-decision Hamming inner code, about 15% total overhead. Post-FEC BER target: $10^{-15}$.

The commercial impact: 400ZR modules based on this spec are now the dominant 400G DCI optic, shipping in hundreds of thousands of units annually. Probabilistic shaping, by contrast, appears in vendor-proprietary coherent transceivers for metro and long-haul links, where the PAS template (DM-shaped amplitude bits, uniform sign and parity bits) lets one modem adjust its net rate in fine steps to the available SNR without changing the FEC code or the QAM order.

PAS is now being proposed for 6G (3GPP Release 20+ study items) as an optional mode for eMBB at high SNR. The main holdup is the DM block length: CCDMs need $n \sim 10^3$ to approach the shaping gain within 0.2 dB, but cellular block sizes are typically $n \lesssim 500$. Hierarchical DMs (Steiner-Böcherer-Liva 2018) and shell DMs are being evaluated as alternatives.

Practical Constraints

• PAS: per-symbol MB distribution, realised by a per-block DM such as CCDM
• 400ZR: uniform DP-16QAM, no DM
• 400ZR post-FEC BER $\le 10^{-15}$
• Shaping adds 1.3-1.5 dB at high SNR

📋 Ref: OIF Implementation Agreement 400ZR, 2020


Historical Note: PAS: From 2015 Paper to Commercial Modems in 5 Years

2015-2020

Probabilistic shaping has a longer history — Forney and Ungerboeck (1998) reviewed shell-mapping and trellis shaping methods; Kschischang and Pasupathy (1993) studied the information-theoretic shaping gain. But the Probabilistic Amplitude Shaping (PAS) architecture, which reconciled shaping with the BICM framework in a way that let LDPC decoders stay off-the-shelf, is a single-paper contribution:

G. Böcherer, F. Steiner, and P. Schulte, "Bandwidth Efficient and Rate-Matched Low-Density Parity-Check Coded Modulation," IEEE Trans. Commun., Dec. 2015.

The trick is the "sign-parity" decomposition. The MB distribution is symmetric ($p_\lambda(x) = p_\lambda(-x)$), so it factors into a non-uniform amplitude distribution and an independent, uniform sign. The systematic LDPC encoder's information bits carry the MB-shaped amplitude labels; its parity bits are approximately uniform and are used as sign bits, so they do not disturb the MB distribution. The receiver does ordinary LDPC decoding and then runs a distribution dematcher on the decoded amplitude bits.

The PAS architecture was adopted by industry unusually quickly. Within three years (2018) the TU Munich / DSI Lab spinoffs had produced commercial modems, and vendor-proprietary coherent optical transceivers with probabilistic shaping were shipping by the end of the decade; the interoperable OIF 400ZR specification (2020), by contrast, kept uniform DP-16QAM. ATSC 3.0 (2017 terrestrial broadcast TV) had already shipped a related geometric shaping scheme. As of 2025 PAS is on the 6G study work-item list for eMBB extensions.


Common Mistake: Shaping Loses at Low SNR

Mistake:

A common assumption is that switching on MB shaping always helps, whatever the operating point. At low SNR this is false.

Correction:

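At low SNR the uniform distribution is already near-optimal: the noise blurs the constellation so much that reweighting the points gains almost nothing, and the shaping gain shrinks towards zero. Since $\lambda = 0$ belongs to the MB family, an MB distribution with $\lambda$ optimised for the actual SNR cannot do worse than uniform; the loss comes from shaping too hard. A large $\lambda$ (for example one tuned for a higher-rate target) concentrates probability on the inner points, lowers the input entropy, and costs rate; in the extreme the distribution collapses onto the four innermost points of a square QAM and the entropy is capped at 2 bits/symbol. In a real PAS system the DM rate loss and complexity are paid regardless, so a vanishing theoretical gain is not worth harvesting.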

The operational rule: MB shaping provides gain only when the target rate is within $\sim 1$ bit/symbol of $\log_2 M$ . Below that, stick with uniform QAM and use a smaller constellation. PAS implementations disable shaping for low-MCS modes for exactly this reason.

Why This Matters: Forward Reference to Chapter 19: Probabilistic and Geometric Shaping

This section introduces probabilistic shaping as a concrete BICM extension currently used in 400G optical and proposed for 6G. Chapter 19 of this book takes the topic much further:

Rate-adaptive PAS: varying $\lambda$ per block to match the channel without changing the code or modulation.
Geometric shaping: rearranging the constellation points geometrically (e.g., non-uniform QAM with point-specific spacing) rather than weighting them probabilistically. Used in ATSC 3.0 terrestrial broadcast.
Hierarchical and shell DMs: alternatives to CCDM with shorter block lengths.
Joint shaping and coding: trellis shaping and Voronoi constellations that close the last 0.3 dB.

The 1.53 dB gap we introduced here is the starting point for all of Chapter 19.

Quick Check

The asymptotic shaping gain of an MB-shaped square QAM over uniform QAM is $\pi e / 6 \approx 1.53$ dB. What is the operational interpretation of this bound?

It is the ratio of the second moment of a uniform distribution over a 2D square to that of a 2D Gaussian at equal entropy

It is the rate loss of BICM relative to CM

It is the Gray labelling gap to CM capacity

It is the LDPC decoding threshold gap to Shannon

Answer:

It is the ratio of the second moment of a uniform distribution over a 2D square to that of a 2D Gaussian at equal entropy

Exactly. At equal entropy (matched rate), the circular Gaussian-like distribution needs a lower second moment than the uniform-over-square distribution, and the ratio of the two second moments is $\pi e / 6 \approx 1.42$, i.e. about 1.53 dB.

Maxwell-Boltzmann Distribution

A probability mass function on a finite set $\mathcal{X}$ of the form $p_\lambda(x) \propto \exp(-\lambda |x|^2)$ with $\lambda > 0$. Maximises entropy subject to a fixed second moment. The finite-alphabet analogue of the Gaussian distribution.

Probabilistic Amplitude Shaping (PAS)

An architecture introduced by Böcherer, Steiner, and Schulte (2015) that implements probabilistic shaping within the BICM framework. A distribution matcher converts uniform bits to MB-shaped amplitudes; a systematic LDPC code adds uniform parity bits, which select the signs, while the amplitudes drive the rest of the QAM mapping. Used in commercial coherent optical transceivers and proposed for 6G.

Distribution Matcher (DM)

A bijective block code that maps uniform input bits to an output sequence with a target non-uniform distribution. The CCDM (constant-composition DM) is the canonical example, using arithmetic coding over a fixed-type multinomial. Rate-optimal DMs approach the target entropy as the block length grows.

Key Takeaway

Probabilistic shaping closes the 1.53 dB shaping gap that uniform QAM leaves to Shannon. The Maxwell-Boltzmann distribution maximises entropy at fixed energy, by a direct Lagrangian argument. The PAS architecture (Böcherer-Steiner-Schulte 2015) realises MB-shaped QAM within the BICM framework by feeding MB-shaped amplitude bits through a systematic LDPC encoder whose uniform parity bits select the signs. PAS is used in commercial coherent optical transceivers and is being proposed for 6G. Chapter 19 takes this much further.
