Probabilistic Shaping: Closing the Gap
The 1.53 dB Shaping Gap
We have now seen every modern standard customise BICM — 5G NR, Wi-Fi, DVB-S2 — and we have seen the AMC envelope approach Shannon within a dB or two. The last remaining gap has a specific cause and a specific remedy: shaping.
With a uniformly distributed QAM input (which every standard of the previous sections uses), the maximum mutual information is bounded by $\log_2 M$, where $M$ is the modulation order. For high-SNR operation this leaves a $10\log_{10}(\pi e/6) \approx 1.53$ dB gap to Shannon capacity: the "cubic shaping gap" between a uniform distribution over a square constellation and the truly optimal Gaussian input.
Closing that 1.53 dB gap is worth the fight. On a 400 Gbps optical coherent link, 1.5 dB is a 50% reach extension. On a satellite link with a fixed EIRP budget, 1.5 dB is an extra 40% throughput without more power or bandwidth. The solution is probabilistic shaping (PS): instead of driving each QAM point with equal probability, weight outer (high-energy) points lower and inner (low-energy) points higher, producing an approximately Gaussian envelope. The specific distribution that maximises entropy at a fixed average-power constraint is the Maxwell-Boltzmann distribution, and the practical scheme that implements it within the BICM framework is the Probabilistic Amplitude Shaping (PAS) architecture of Böcherer, Steiner, and Schulte (2015).
This section explains how PAS works, proves the 1.53 dB bound, and walks through why PAS is now standard in 400G coherent optical transmission (OIF 400ZR) and is being proposed for 6G. Shaping is the modern capstone of BICM design.
Definition: Maxwell-Boltzmann (MB) Distribution
Maxwell-Boltzmann (MB) Distribution
For a finite constellation $\mathcal{X} \subset \mathbb{C}$ (e.g., a square QAM), the Maxwell-Boltzmann (MB) distribution with parameter $\nu \ge 0$ is
$$P_\nu(x) = \frac{e^{-\nu |x|^2}}{\sum_{x' \in \mathcal{X}} e^{-\nu |x'|^2}}, \qquad x \in \mathcal{X}.$$
Outer (high-energy) constellation points receive exponentially lower probability than inner points. The parameter $\nu$ controls the average energy:
$$E(\nu) = \sum_{x \in \mathcal{X}} P_\nu(x)\,|x|^2.$$
Increasing $\nu$ shrinks the average energy (outer points are used less often); decreasing $\nu$ toward 0 recovers the uniform distribution.
The MB distribution is the finite-alphabet analogue of the Gaussian distribution on $\mathbb{C}$: the Gaussian maximises differential entropy and the MB distribution maximises discrete entropy, both subject to a fixed second moment. On a QAM lattice, MB "samples" the Gaussian density at lattice points. As $M \to \infty$ and $\nu \to 0$ with appropriate normalisation, the MB distribution converges to a continuous Gaussian.
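As a minimal sketch of the definition, the snippet below evaluates an MB distribution on a square QAM grid with unit spacing and reports its average energy and entropy. The helper names (`square_qam`, `mb_pmf`, `avg_energy`, `entropy_bits`) and the symbol `nu` for the shaping parameter are illustrative choices, not notation taken from a library or standard.

```python
import numpy as np

def square_qam(M):
    """Square M-QAM points on a unit-spacing grid, centred at the origin."""
    m = int(round(np.sqrt(M)))
    assert m * m == M, "M must be a perfect square"
    pam = np.arange(m) - (m - 1) / 2.0          # e.g. [-1.5, -0.5, 0.5, 1.5]
    return (pam[:, None] + 1j * pam[None, :]).ravel()

def mb_pmf(points, nu):
    """Maxwell-Boltzmann pmf, P(x) proportional to exp(-nu * |x|^2)."""
    energy = np.abs(points) ** 2
    w = np.exp(-nu * (energy - energy.min()))   # shift for numerical stability
    return w / w.sum()

def avg_energy(points, pmf):
    return float(np.sum(pmf * np.abs(points) ** 2))

def entropy_bits(pmf):
    p = pmf[pmf > 0]
    return float(-np.sum(p * np.log2(p)))

X = square_qam(64)
for nu in (0.0, 0.05, 0.2):
    P = mb_pmf(X, nu)
    print(f"nu={nu:4.2f}  E={avg_energy(X, P):6.3f}  H={entropy_bits(P):5.3f} bits")
```

With `nu = 0` the sketch returns the uniform distribution; larger values trade entropy for lower average energy.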
Theorem: Maxwell-Boltzmann Maximises Entropy at Fixed Second Moment
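Let $\mathcal{X} \subset \mathbb{C}$ be a finite set and let $E$ satisfy $\min_{x \in \mathcal{X}} |x|^2 < E < \frac{1}{|\mathcal{X}|}\sum_{x \in \mathcal{X}} |x|^2$. Among all probability mass functions $P$ on $\mathcal{X}$ satisfying $\sum_{x} P(x)\,|x|^2 = E$, the entropy $H(P) = -\sum_x P(x)\log P(x)$ is maximised uniquely by the Maxwell-Boltzmann distribution $P_\nu$ with $\nu > 0$ chosen such that $\sum_x P_\nu(x)\,|x|^2 = E$. The maximum entropy (in nats) is
$$H(P_\nu) = \nu E + \log Z(\nu), \qquad Z(\nu) = \sum_{x \in \mathcal{X}} e^{-\nu |x|^2}.$$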
The proof is a textbook Lagrangian argument: maximise entropy subject to an energy constraint by introducing a multiplier $\lambda$ for the constraint and solving. The exponential form falls out automatically. This is the finite-alphabet version of Shannon's theorem that the Gaussian distribution maximises differential entropy at fixed variance, with the MB distribution on a lattice playing the role of the Gaussian on $\mathbb{C}$.
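Set up the Lagrangian $\mathcal{L}(P, \lambda, \mu) = H(P) - \lambda\big(\sum_x P(x)|x|^2 - E\big) - \mu\big(\sum_x P(x) - 1\big)$.
Take $\partial \mathcal{L} / \partial P(x) = 0$ for each $x \in \mathcal{X}$.
Solve for $P(x)$ in terms of $\lambda$ and $\mu$; observe the exponential form.
Use the normalisation constraint to eliminate $\mu$.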
The uniqueness follows from strict concavity of entropy.
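Step 1: Lagrangian
Maximise $H(P) = -\sum_x P(x)\log P(x)$ subject to $\sum_x P(x)\,|x|^2 = E$ and $\sum_x P(x) = 1$. Form
$$\mathcal{L}(P, \lambda, \mu) = -\sum_x P(x)\log P(x) - \lambda\Big(\sum_x P(x)\,|x|^2 - E\Big) - \mu\Big(\sum_x P(x) - 1\Big).$$
Step 2: Stationarity
$\partial \mathcal{L} / \partial P(x) = -\log P(x) - 1 - \lambda |x|^2 - \mu = 0$, so $P(x) = C\,e^{-\lambda |x|^2}$ where $C = e^{-1-\mu}$ is a normalisation constant.
Step 3: Normalisation
$\sum_x P(x) = 1$, so $C = 1/Z(\lambda)$ with $Z(\lambda) = \sum_{x} e^{-\lambda |x|^2}$. This gives exactly the Maxwell-Boltzmann distribution with $\nu = \lambda$.
Step 4: Lagrange multiplier from energy constraint
The multiplier $\lambda$ is the unique solution to $\sum_x P_\lambda(x)\,|x|^2 = E$, which exists and is unique because the average energy $E(\lambda) = \sum_x P_\lambda(x)\,|x|^2$ is strictly decreasing in $\lambda$. It is positive whenever $E$ is below the average energy of the uniform distribution.
Step 5: Entropy expression
Substituting $\log P_\nu(x) = -\nu |x|^2 - \log Z(\nu)$ into $H(P) = -\sum_x P(x)\log P(x)$,
$$H(P_\nu) = \nu \sum_x P_\nu(x)\,|x|^2 + \log Z(\nu) = \nu E + \log Z(\nu).$$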
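The two computational steps of the proof can be checked numerically. The sketch below solves the energy constraint for the multiplier by bisection (relying on the monotonicity used in Step 4) and then compares the entropy with the closed form from Step 5. The 16-QAM grid, the target energy of 1.5, and the function names are assumptions made purely for illustration.

```python
import numpy as np

def mb_stats(energies, nu):
    """Return (pmf, mean energy, entropy in nats, log Z) for P ~ exp(-nu * e)."""
    shift = energies.min()
    w = np.exp(-nu * (energies - shift))
    z_shifted = w.sum()
    p = w / z_shifted
    log_z = np.log(z_shifted) - nu * shift      # log of the unshifted Z(nu)
    mean_e = float(np.sum(p * energies))
    nz = p[p > 0]
    entropy = float(-np.sum(nz * np.log(nz)))
    return p, mean_e, entropy, float(log_z)

def solve_nu(energies, target_e, lo=0.0, hi=50.0, iters=200):
    """Bisection on nu; valid because mean energy decreases with nu."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mb_stats(energies, mid)[1] > target_e:
            lo = mid                            # too much energy: shape harder
        else:
            hi = mid
    return 0.5 * (lo + hi)

# 16-QAM on a unit grid: per-point energies |x|^2 (uniform average is 2.5)
pam = np.array([-1.5, -0.5, 0.5, 1.5])
energies = (pam[:, None] ** 2 + pam[None, :] ** 2).ravel()

nu = solve_nu(energies, target_e=1.5)
p, E, H, log_Z = mb_stats(energies, nu)
print(f"nu={nu:.4f}  E={E:.4f}  H={H:.4f} nats  nu*E+logZ={nu * E + log_Z:.4f}")
```

The last two printed quantities should agree to numerical precision, which is the identity $H(P_\nu) = \nu E + \log Z(\nu)$.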
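Theorem: Asymptotic Shaping Gain Approaches $\pi e/6$
Let $\mathcal{X}_M$ be a square $M$-QAM constellation on a uniform grid with unit spacing. Define the shaping gain as
$$G_s = \frac{E_{\mathrm{unif}}}{E_{\mathrm{MB}}},$$
the ratio of the average energies of the uniform and the MB-shaped input on this grid. Then in the high-SNR limit $M \to \infty$, with $\nu$ chosen so that both constellations achieve the same information rate $R$,
$$G_s \to \frac{\pi e}{6} \approx 1.423.$$
Equivalently, MB-shaped QAM requires 1.53 dB less SNR than uniform QAM to achieve the same rate in the high-SNR limit.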
The bound comes from two facts: (i) at high SNR, the spherical Gaussian is the capacity-achieving distribution; (ii) a uniform distribution on a square is suboptimal because a square does not match a circular Gaussian envelope. The factor $\pi e/6$ is the ratio of the second moment of a cube to that of a sphere of the same volume in the limit of infinite dimension; equivalently, it is the second-moment ratio of a uniform-over-square input to a circular Gaussian input of the same differential entropy in 2D. This 1.53 dB is the universal "cubic shaping gap" that no amount of coding can close without shaping.
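To illustrate the cube-versus-sphere interpretation, the following sketch evaluates the classical second-moment ratio of an $n$-cube to an $n$-sphere of equal volume, $\pi (n+2) / \big(12\,\Gamma(n/2+1)^{2/n}\big)$, and shows it rising toward $\pi e/6$ as the dimension grows. This formula is brought in from the standard shaping literature rather than derived in this section, and the function names are illustrative.

```python
import math

def sphere_shaping_gain(n):
    """Second-moment ratio cube/sphere at equal volume in n dimensions."""
    # pi * (n + 2) / (12 * Gamma(n/2 + 1)^(2/n)); lgamma avoids overflow
    return math.pi * (n + 2) / 12.0 * math.exp(-(2.0 / n) * math.lgamma(n / 2.0 + 1.0))

def to_db(x):
    return 10.0 * math.log10(x)

limit = math.pi * math.e / 6.0
for n in (1, 2, 4, 16, 64, 1024):
    print(f"n={n:5d}  gain={to_db(sphere_shaping_gain(n)):.3f} dB")
print(f"limit    gain={to_db(limit):.3f} dB")
```

In two dimensions the ratio is only $\pi/3$ (about 0.2 dB); the full 1.53 dB is reached by the Gaussian, which corresponds to the infinite-dimensional sphere.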
Use the equivalence: at high SNR, capacity is $C \approx \log_2 \mathrm{SNR}$, and $I(X;Y) \approx H(X)$.
So the shaping gain is the second-moment ratio at matched entropy.
In the limit $M \to \infty$, these converge to differential entropies of Gaussian vs uniform-over-square.
Differential entropy of Gaussian in 2D: $\log_2(2\pi e \sigma^2)$. Differential entropy of uniform over square of second moment $E_U$: ... compute ratio.
Step 1: High-SNR capacity decomposition
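For a channel $Y = X + N$ with noise $N \sim \mathcal{CN}(0, N_0)$, at high SNR, $I(X;Y) \approx H(X)$ because $H(X \mid Y) \to 0$. So at fixed $H(X)$ the achievable rate is fixed; shaping reduces the energy needed to achieve a given $H(X)$.
Step 2: Energy at fixed entropy
For a Gaussian distribution in 2D with variance $\sigma^2$ per dimension, $h_G = \log_2(2\pi e \sigma^2)$. For a uniform distribution on a square of side $a$ with second moment $E_U$: $E_U = a^2/12$ (per dimension), and $h_U = \log_2 a^2$.
Step 3: Second-moment ratio at matched entropy
To match $h_U = h_G$: $a^2 = 2\pi e \sigma^2$, so $E_U = a^2/12 = \pi e \sigma^2/6$. Hence the 2D shaping gain for a circular Gaussian vs uniform-over-square at the same entropy is
$$G_s = \frac{E_U}{\sigma^2} = \frac{\pi e}{6} \approx 1.423.$$
In dB: $10\log_{10}(\pi e/6) \approx 1.53$ dB.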
Step 4: Finite-$M$ approximation
For finite $M$ the MB distribution approximates the Gaussian on the lattice; the shaping gain converges to $\pi e/6$ as $M \to \infty$. For sufficiently large $M$ the asymptotic value is already reached within 0.1 dB.
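The convergence can be explored numerically in one dimension. The sketch below is an illustration under assumptions not spelled out in the section: it compares a uniform $2^b$-PAM with an MB-shaped input on a larger PAM grid of the same spacing, tuned by bisection to carry the same entropy $b$, and reports the energy ratio in dB. The grid size of 256 points and the function names are arbitrary choices.

```python
import numpy as np

def mb_on_grid(grid, nu):
    w = np.exp(-nu * grid ** 2)
    return w / w.sum()

def entropy_bits(p):
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def shaping_gain_db(bits_per_dim, grid_size=256):
    """Energy ratio uniform-PAM / MB-PAM at equal entropy, unit grid spacing."""
    M = 2 ** bits_per_dim
    e_uniform = (M ** 2 - 1) / 12.0             # uniform M-PAM, unit spacing
    grid = np.arange(grid_size) - (grid_size - 1) / 2.0
    lo, hi = 1e-9, 10.0                         # entropy decreases as nu grows
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if entropy_bits(mb_on_grid(grid, mid)) > bits_per_dim:
            lo = mid
        else:
            hi = mid
    p = mb_on_grid(grid, 0.5 * (lo + hi))
    e_mb = float(np.sum(p * grid ** 2))
    return 10.0 * np.log10(e_uniform / e_mb)

for b in (2, 3, 4, 5, 6):
    print(f"{b} bits/dim ({4 ** b}-QAM equivalent): gain = {shaping_gain_db(b):.3f} dB")
print(f"asymptote: {10 * np.log10(np.pi * np.e / 6):.3f} dB")
```

Because the shaped input is allowed a larger support than the uniform one, this is an entropy-matched comparison in the spirit of Step 3 rather than a statement about one specific standardised constellation.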
Probabilistic Amplitude Shaping (PAS) Architecture
Definition: Constant-Composition Distribution Matcher (CCDM)
Constant-Composition Distribution Matcher (CCDM)
A constant-composition distribution matcher (CCDM) is an invertible (one-to-one) fixed-length block code that maps $k$ uniformly distributed input bits to a sequence of $n$ amplitude symbols from an alphabet $\mathcal{A}$ with a fixed composition: the number $n_a$ of output symbols equal to $a \in \mathcal{A}$ is fixed across all codewords. The input-output rate is
$$R_{\mathrm{DM}} = \frac{k}{n} = \frac{1}{n}\left\lfloor \log_2 \frac{n!}{\prod_{a \in \mathcal{A}} n_a!} \right\rfloor \le H(\bar{P}_A),$$
where $\bar{P}_A(a) = n_a/n$ is the target MB distribution quantised to $n$-type counts. The standard CCDM implementation is arithmetic coding over the multinomial distribution restricted to the fixed composition; the encoder and decoder are streaming with state.
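The invertibility and the fixed composition can be seen in a toy matcher. The sketch below is not the arithmetic-coding CCDM described above; it swaps in plain lexicographic enumeration of constant-composition sequences, which is only practical for short blocks but yields the same kind of one-to-one mapping. The composition `[4, 3, 1]`, the amplitude alphabet, and all function names are hypothetical.

```python
from math import factorial

def multinomial(counts):
    """Number of sequences with the given symbol counts."""
    total = factorial(sum(counts))
    for c in counts:
        total //= factorial(c)
    return total

def cc_encode(index, counts):
    """Map an integer index to the index-th sequence (lexicographic order)
    that contains exactly counts[s] occurrences of symbol s."""
    counts = list(counts)
    seq = []
    for _ in range(sum(counts)):
        for s in range(len(counts)):
            if counts[s] == 0:
                continue
            counts[s] -= 1
            block = multinomial(counts)     # sequences continuing with symbol s
            if index < block:
                seq.append(s)
                break
            index -= block
            counts[s] += 1
    return seq

def cc_decode(seq, counts):
    """Inverse of cc_encode: recover the integer index of a sequence."""
    counts = list(counts)
    index = 0
    for sym in seq:
        for s in range(sym):
            if counts[s] == 0:
                continue
            counts[s] -= 1
            index += multinomial(counts)
            counts[s] += 1
        counts[sym] -= 1
    return index

amplitudes = [1, 3, 5]                      # hypothetical amplitude alphabet
counts = [4, 3, 1]                          # fixed composition, n = 8
k = multinomial(counts).bit_length() - 1    # input bits per block
bits = [1, 0, 1, 1, 0, 0, 1, 0][:k]
index = int("".join(map(str, bits)), 2)
codeword = cc_encode(index, counts)
print([amplitudes[s] for s in codeword], cc_decode(codeword, counts) == index)
```

Every output block uses each amplitude exactly as often as the composition prescribes, regardless of the input bits.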
The key operational property: every codeword of the CCDM has exactly the target type $\bar{P}_A$, so the average energy is exactly the MB target energy. The rate loss $H(\bar{P}_A) - k/n$ decays on the order of $(\log n)/n$ and approaches zero as $n \to \infty$.
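A short calculation makes the block-length dependence of the rate loss concrete. The sketch below counts constant-composition sequences exactly with integer arithmetic and compares $k/n$ with the entropy of the quantised type. The four-point target distribution and the largest-remainder rounding rule are assumptions chosen for illustration, not values from any specification.

```python
from math import factorial, log2

def quantise_to_type(pmf, n):
    """Round n * pmf to integer counts summing to n (largest-remainder rule)."""
    raw = [p * n for p in pmf]
    counts = [int(r) for r in raw]
    order = sorted(range(len(pmf)), key=lambda i: raw[i] - counts[i], reverse=True)
    for i in order[: n - sum(counts)]:
        counts[i] += 1
    return counts

def ccdm_rate_loss(pmf, n):
    """Return (rate loss in bits/symbol, input bits k, composition counts)."""
    counts = quantise_to_type(pmf, n)
    num_seqs = factorial(n)
    for c in counts:
        num_seqs //= factorial(c)
    k = num_seqs.bit_length() - 1           # floor(log2(number of sequences))
    entropy = -sum((c / n) * log2(c / n) for c in counts if c > 0)
    return entropy - k / n, k, counts

target = [0.4, 0.3, 0.2, 0.1]               # hypothetical amplitude distribution
for n in (10, 100, 1000, 10000):
    loss, k, _ = ccdm_rate_loss(target, n)
    print(f"n={n:6d}  k={k:6d}  rate loss={loss:.4f} bits/symbol")
```

The loss shrinks steadily with the block length, which is why short blocks are the main cost of a constant-composition design.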
CCDM is not the only distribution matcher; alternatives include hierarchical DMs (Steiner-Böcherer-Liva 2018) that trade rate loss for shorter processing blocks, and shell mappers (Laroia 1994) that predate PAS but are conceptually similar. The CCDM is the choice in the 2015 PAS paper because of its simplicity.
Figure: Probabilistic Shaping Gain: Shannon vs Uniform vs MB-Shaped QAM
Achievable rate curves for uniform $M$-QAM, MB-shaped $M$-QAM (optimal $\nu$ per SNR), and the Shannon bound $\log_2(1 + \mathrm{SNR})$. Observe that MB-shaped QAM recovers 1.3-1.5 dB over uniform QAM at high SNR, closing nearly all of the asymptotic 1.53 dB shaping gap. At low SNR, the shaping gain is smaller because the uniform distribution is already near-optimal; the MB distribution converges to uniform as $\nu \to 0$. The sweet spot for shaping in practice is at rates moderately below $\log_2 M$ bits per symbol, where the shaping gain is largest.
[Interactive figure: Probabilistic Shaping via Maxwell-Boltzmann Distribution]
Example: Shaping Gain at 18 dB SNR for 256-QAM
At $\mathrm{SNR} = 18$ dB, compute the achievable rate for (a) uniform 256-QAM, (b) optimally MB-shaped 256-QAM, (c) Shannon bound. Quantify the shaping gain.
Shannon bound
18 dB $= 10^{1.8} \approx 63.1$ linear. Shannon capacity: $\log_2(1 + 63.1) \approx 6.00$ bits/2D symbol.
Uniform 256-QAM BICM capacity
For uniform Gray-labelled 256-QAM at 18 dB, the BICM capacity is approximately 5.55 bits/2D symbol (see Fig. 2 of the Böcherer-Steiner-Schulte paper). At 18 dB the 256-QAM curve is still far below its 8 bits/symbol saturation level, so the loss relative to Shannon is dominated by the uniform input distribution, with a smaller additional loss of BICM relative to CM.
MB-shaped 256-QAM
With $\nu$ picked to target rate 5.9, the MB-shaped 256-QAM achieves approximately 5.9 bits/2D symbol. This is within about 0.1 bits of the Shannon bound, versus uniform 256-QAM's 0.45-bit gap.
Shaping gain in dB
At rate 5.55: uniform needs 18 dB, MB-shaped 256-QAM needs about 16.7 dB. Shaping gain $\approx 18 - 16.7 = 1.3$ dB at this rate, consistent with approaching the asymptotic 1.53 dB in the high-SNR regime.
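The orders of magnitude in this example can be reproduced with a small numerical integration. The sketch below computes symbol-wise (coded-modulation) mutual information rather than the BICM capacity quoted above, so its values should only be expected to land close to the figures in the example, not to match them exactly. It treats 256-QAM as two independent 16-PAM dimensions; the integration grid and the search range for `nu` are arbitrary choices.

```python
import numpy as np

def pam_mi_bits(points, pmf, snr_linear, n_grid=20001):
    """Mutual information (bits per real dimension) of a PAM input over
    real AWGN, computed as h(Y) - h(N) by numerical integration."""
    energy = np.sum(pmf * points ** 2)
    sigma2 = energy / snr_linear
    sigma = np.sqrt(sigma2)
    y = np.linspace(points.min() - 10 * sigma, points.max() + 10 * sigma, n_grid)
    lik = np.exp(-(y[:, None] - points[None, :]) ** 2 / (2 * sigma2))
    p_y = lik @ pmf / np.sqrt(2 * np.pi * sigma2)
    integrand = -p_y * np.log2(np.maximum(p_y, 1e-300))
    h_y = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(y))  # trapezoid rule
    h_n = 0.5 * np.log2(2 * np.pi * np.e * sigma2)
    return float(h_y - h_n)

def mb(points, nu):
    w = np.exp(-nu * points ** 2)
    return w / w.sum()

snr = 10 ** (18 / 10)
pam16 = np.arange(-15, 16, 2).astype(float)     # one dimension of 256-QAM
uniform = np.full(16, 1 / 16)

shannon = float(np.log2(1 + snr))
r_uniform = 2 * pam_mi_bits(pam16, uniform, snr)
r_shaped = max(2 * pam_mi_bits(pam16, mb(pam16, nu), snr)
               for nu in np.linspace(0.0, 0.1, 101))
print(f"Shannon {shannon:.3f}, uniform {r_uniform:.3f}, MB-shaped {r_shaped:.3f} bits/2D symbol")
```

The factor of two converts the per-dimension result to bits per 2D symbol, which is valid here because both the uniform and the MB distribution on square QAM factor into independent in-phase and quadrature components.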
Operational use
In optical coherent transmission at 400 Gbps per wavelength, this 1.3 dB is what separates 80 km reach from 120 km reach. The OIF 400ZR module specification mandates PAS precisely to exploit it.
OIF 400ZR: PAS in Coherent Optical Transmission
OIF 400ZR (Implementation Agreement, 2020) is the first commercial standard to specify probabilistic amplitude shaping as mandatory. The specification targets 400 Gbps point-to-point coherent optical transmission at 80 to 120 km reach over conventional single-mode fibre (SMF). Key design choices:
- Modulation: dual-polarisation 16-QAM at baseline (DP-16QAM). Rate per polarisation: 3.17 bits/symbol (vs. 4 for uniform DP-16QAM). The shaped rate of 3.17 was chosen to match the chromatic dispersion and fibre nonlinearity budget for the target reach.
- Shaping: constant-composition distribution matcher (CCDM) at block length $n$ with target MB distribution.
- FEC: a hard-decision staircase outer code concatenated with a soft-decision Hamming inner code, with about 14.8% total overhead. Post-FEC BER target: $10^{-15}$.
- Concatenation: uniform bits are the sign + parity; shaped amplitude bits are the DM output. Exactly the PAS template.
The commercial impact: 400ZR modules based on this spec are now the dominant 400G DCI optic, shipping in hundreds of thousands of units annually. The shaping gain alone accounts for 30% of the reach advantage over pre-400ZR 400G modulation formats.
PAS is now being proposed for 6G (3GPP Release 20+ study items) as an optional mode for eMBB at high SNR. The main holdup is the DM block length: CCDMs need long output blocks to approach the shaping gain within 0.2 dB, but cellular block sizes are typically much shorter. Hierarchical DMs (Steiner-Böcherer-Liva 2018) and shell DMs are being evaluated as alternatives.
- Per-symbol distribution: MB, with per-block CCDM
- DM block length in 400ZR
- Post-FEC BER
- Shaping adds 1.3-1.5 dB at high SNR
Historical Note: PAS: From 2015 Paper to 400ZR in 5 Years
2015-2020
Probabilistic shaping has a longer history: Forney and Ungerboeck (1998) reviewed shell-mapping and trellis shaping methods; Kschischang and Pasupathy (1993) studied the information-theoretic shaping gain. But the Probabilistic Amplitude Shaping (PAS) architecture, which reconciled shaping with the BICM framework in a way that let LDPC decoders stay off-the-shelf, is a single-paper contribution:
G. Böcherer, F. Steiner, and P. Schulte, "Bandwidth Efficient and Rate-Matched Low-Density Parity-Check Coded Modulation," IEEE Trans. Commun., Dec. 2015.
The trick is the "sign-parity" decomposition: the systematic LDPC encoder's information bits carry the MB-shaped amplitude labels, while the parity bits, which are approximately uniform, select the signs. Because the MB distribution is symmetric about the origin, uniform signs chosen independently of the amplitudes do not disturb it. The receiver does ordinary LDPC decoding and then runs a distribution dematcher on the decoded amplitude bits.
The PAS architecture met industry adoption faster than any other BICM extension in history. Within three years (2018) the TU Munich / DSI Lab spinoffs had produced commercial modems. By 2020, OIF 400ZR mandated PAS in every compliant 400G coherent module. ATSC 3.0 (2017 terrestrial broadcast TV) had already shipped a related geometric shaping scheme. As of 2025 PAS is on the 6G study work-item list for eMBB extensions.
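The sign-parity decomposition described in this note can be mimicked in a few lines. The sketch below is a stand-in under loud assumptions: the distribution matcher output is replaced by a random ordering of a fixed composition, and the LDPC parity bits are modelled as i.i.d. uniform bits, so no actual encoder or decoder is involved. The amplitude alphabet and composition are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
amplitudes = np.array([1, 3, 5, 7])         # 8-PAM amplitudes (one real dimension)
counts = np.array([40, 30, 20, 10])         # fixed composition, n = 100

# Stand-in for the distribution matcher output: one ordering of the composition
amp_seq = rng.permutation(np.repeat(amplitudes, counts))

# Stand-in for LDPC parity bits: modelled here as i.i.d. uniform bits
sign_bits = rng.integers(0, 2, size=amp_seq.size)
symbols = (1 - 2 * sign_bits) * amp_seq     # bit 0 -> positive, bit 1 -> negative

# The signs leave the amplitude statistics, and hence the energy, untouched
shaped_hist = [int(np.sum(np.abs(symbols) == a)) for a in amplitudes]
print("amplitude counts after sign mapping:", shaped_hist)
print("average energy:", float(np.mean(symbols ** 2)))
```

Whatever values the sign bits take, the amplitude histogram and the average energy stay exactly those fixed by the composition, which is the property that lets the parity bits ride along without disturbing the shaping.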
Common Mistake: Shaping Loses at Low SNR
Mistake:
A common assumption is that applying MB shaping to a given QAM constellation always yields a worthwhile gain over uniform QAM. At low SNR this is false.
Correction:
At low SNR (operating near the BI-AWGN Shannon limit of the inner ring), the uniform distribution is already near-optimal because the outer constellation points are effectively unusable: they cause too many errors. Heavy MB shaping ($\nu$ large) collapses the distribution onto the inner points, which are exactly the points that survive the noise. But extreme collapse wastes the constellation entirely: on a square QAM only the four innermost points remain, so the effective rate is at most 2 bits/symbol.
The operational rule: MB shaping provides gain only when the target rate is within a few bits/symbol of $\log_2 M$. Below that, stick with uniform QAM and use a smaller constellation. PAS implementations disable shaping for low-MCS modes for exactly this reason.
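The collapse under heavy shaping is easy to observe numerically. The sketch below, using a 16-QAM grid with unit spacing as an assumed example, tabulates entropy and average energy of the MB distribution as the shaping parameter grows.

```python
import numpy as np

pam = np.arange(4) - 1.5                    # 16-QAM on a unit-spacing grid
energies = (pam[:, None] ** 2 + pam[None, :] ** 2).ravel()

def mb_entropy_and_energy(nu):
    """Entropy (bits) and mean energy of P(x) ~ exp(-nu * |x|^2)."""
    w = np.exp(-nu * (energies - energies.min()))
    p = w / w.sum()
    nz = p[p > 0]
    return float(-np.sum(nz * np.log2(nz))), float(np.sum(p * energies))

for nu in (0.0, 0.5, 2.0, 10.0):
    H, E = mb_entropy_and_energy(nu)
    print(f"nu={nu:5.1f}  H={H:.3f} bits  E={E:.3f}")
```

As the parameter grows, the entropy falls from 4 bits toward the 2 bits carried by the four innermost points, so a heavily shaped 16-QAM offers no more rate than a plain 4-QAM.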
Why This Matters: Forward Reference to Chapter 19: Probabilistic and Geometric Shaping
This section introduces probabilistic shaping as a concrete BICM extension currently used in 400G optical and proposed for 6G. Chapter 19 of this book takes the topic much further:
- Rate-adaptive PAS: varying $\nu$ per block to match the channel without changing the code or modulation.
- Geometric shaping: rearranging the constellation points geometrically (e.g., non-uniform QAM with point-specific spacing) rather than weighting them probabilistically. Used in ATSC 3.0 terrestrial broadcast.
- Hierarchical and shell DMs: alternatives to CCDM with shorter block lengths.
- Joint shaping and coding: trellis shaping and Voronoi constellations that close the last 0.3 dB.
The 1.53 dB gap we introduced here is the starting point for all of Chapter 19.
Quick Check
The asymptotic shaping gain of an MB-shaped square QAM over uniform QAM is $10\log_{10}(\pi e/6) \approx 1.53$ dB. What is the operational interpretation of this bound?
It is the ratio of the second moment of a 2D circle to that of a 2D square at equal entropy
It is the rate loss of BICM relative to CM
It is the Gray labelling gap to CM capacity
It is the LDPC decoding threshold gap to Shannon
Exactly: at equal entropy (matched rate), the Gaussian-like (circular) distribution has lower second moment than the uniform-over-square. The ratio is $6/(\pi e)$; its reciprocal $\pi e/6$ corresponds to 1.53 dB.
Maxwell-Boltzmann Distribution
A probability mass function on a finite set $\mathcal{X}$ of the form $P_\nu(x) \propto e^{-\nu |x|^2}$ with $\nu \ge 0$. Maximises entropy subject to a fixed second moment. The finite-alphabet analogue of the Gaussian distribution.
Related: Probabilistic Shaping, Probabilistic Amplitude Shaping (PAS), Constant-Composition Distribution Matcher (CCDM)
Probabilistic Amplitude Shaping (PAS)
An architecture introduced by Böcherer, Steiner, and Schulte (2015) that implements probabilistic shaping within the BICM framework. A distribution matcher converts uniform bits to MB-shaped amplitude bits; a systematic LDPC code adds uniform parity; signs and parity drive the QAM mapper. Now standard in 400G optical transmission and proposed for 6G.
Related: Maxwell-Boltzmann Distribution, Constant-Composition Distribution Matcher (CCDM), BICM Is a Paradigm, Not a Specification
Distribution Matcher (DM)
An invertible (one-to-one) block code that maps uniform input bits to an output sequence with a target non-uniform distribution. The CCDM (constant-composition DM) is the canonical example, using arithmetic coding over a fixed-type multinomial. Rate-optimal DMs approach the target entropy as the block length grows.
Related: Constant-Composition Distribution Matcher (CCDM), Probabilistic Amplitude Shaping (PAS), Probabilistic Shaping
Key Takeaway
Probabilistic shaping closes the 1.53 dB shaping gap between uniform QAM and Shannon capacity. The Maxwell-Boltzmann distribution maximises entropy at fixed energy, as a direct Lagrangian argument shows. The PAS architecture (Böcherer-Steiner-Schulte 2015) realises MB-shaped QAM within the BICM framework by feeding MB-shaped amplitude bits through a systematic LDPC encoder. PAS is now mandatory in 400G optical (OIF 400ZR) and is being proposed for 6G. Chapter 19 takes this much further.