Maxwell-Boltzmann Shaping
From the 1.53 dB Ceiling to a Constructive Recipe
Chapter 4 told us, with a beautiful entropy-power argument, that on a bounded 2D lattice we leave up to $1.53$ dB on the table by transmitting uniform QAM. Chapter 9 showed that 400G coherent optical has actually claimed part of that gain in the field. But how do you claim it? What is the concrete input distribution that the system designer should target?
The answer is the Maxwell-Boltzmann (MB) distribution $P_\lambda(x) \propto e^{-\lambda |x|^2}$. It is the minimum-change, most natural recipe: keep the same QAM grid, just weight inner points exponentially more than outer points. The parameter $\lambda \ge 0$ is the shaping knob: $\lambda = 0$ recovers uniform QAM, while large $\lambda$ concentrates the probability on the innermost points of the constellation. For every operating SNR there is a unique optimal $\lambda^*$ that extracts the maximum shaping gain.
This is not an ad-hoc choice. MB is the unique max-entropy distribution on a finite alphabet subject to an average-power constraint. It follows from exactly the same variational principle that gives the Gaussian on $\mathbb{R}$ (fixed variance), the geometric distribution on $\mathbb{N}$ (fixed mean), and the exponential on $\mathbb{R}_+$ (fixed mean). An exponential form drops out of the KKT conditions. The Gaussian capacity-achieving theorem on AWGN thus maps, lattice by lattice, onto MB as the finite-constellation analogue.
Two payoffs from this section:
- Theoretical: a complete derivation of MB from Lagrangian KKT, and a numerical read of MB entropy vs $\lambda$ for 16-/64-/256-QAM.
- Operational: a plot of achievable rates — Shannon vs uniform QAM vs MB-shaped QAM — showing where the 1.53 dB is recovered and where it isn't (hint: at low SNR, shaping barely helps).
Definition: Maxwell-Boltzmann Distribution on a Constellation
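Let $\mathcal{X} \subset \mathbb{C}$ be a finite constellation (e.g., square $M$-QAM). For any $\lambda \ge 0$, the Maxwell-Boltzmann (MB) distribution on $\mathcal{X}$ with parameter $\lambda$ is
$$P_\lambda(x) = \frac{e^{-\lambda |x|^2}}{Z(\lambda)}, \qquad x \in \mathcal{X}.$$
Here $Z(\lambda) = \sum_{x' \in \mathcal{X}} e^{-\lambda |x'|^2}$ is the partition function. The expected energy under $P_\lambda$ is
$$E(\lambda) = \sum_{x \in \mathcal{X}} |x|^2 \, P_\lambda(x),$$
and its entropy (in bits) is
$$H(\lambda) = -\sum_{x \in \mathcal{X}} P_\lambda(x) \log_2 P_\lambda(x).$$
Limits. As $\lambda \to 0$, $P_\lambda$ tends to the uniform distribution; as $\lambda \to \infty$, $P_\lambda$ collapses onto the minimum-energy points (for square QAM, the innermost ring around the constellation centre). The map $\lambda \mapsto E(\lambda)$ between shaping parameter and average energy is strictly decreasing and bijective from $[0, \infty)$ onto $(E_{\min}, E(0)]$, where $E_{\min} = \min_{x \in \mathcal{X}} |x|^2$ and $E(0)$ is the uniform-distribution energy.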
The MB distribution is the discrete finite-alphabet analogue of the 2D Gaussian on $\mathbb{C}$: both are proportional to $e^{-\lambda |x|^2}$ up to normalisation, and both emerge from the same Lagrangian stationary-point equation. For $M \to \infty$ with a fixed average power, MB on the $M$-QAM lattice converges weakly to the continuous Gaussian.
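The definition above can be turned into a few lines of code. The following is a minimal sketch, assuming a square QAM grid on odd integers; the function names (`square_qam`, `mb_distribution`, `mean_energy`, `entropy_bits`) and the value `lam = 0.1` are illustrative choices, not taken from the text.

```python
import math

def square_qam(M):
    """Points of a square M-QAM grid on odd integers (assumed layout)."""
    side = math.isqrt(M)
    if side * side != M:
        raise ValueError("M must be a perfect square")
    levels = [2 * k - side + 1 for k in range(side)]  # e.g. -3, -1, 1, 3
    return [complex(i, q) for i in levels for q in levels]

def energy(x):
    """Squared norm |x|^2 of a complex constellation point."""
    return x.real ** 2 + x.imag ** 2

def mb_distribution(points, lam):
    """Return (probabilities, Z) with P(x) = exp(-lam * |x|^2) / Z."""
    weights = [math.exp(-lam * energy(x)) for x in points]
    Z = sum(weights)
    return [w / Z for w in weights], Z

def mean_energy(points, probs):
    """Expected energy sum_x |x|^2 P(x)."""
    return sum(p * energy(x) for x, p in zip(points, probs))

def entropy_bits(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

points = square_qam(16)
for lam in (0.0, 0.1):  # 0.1 is an arbitrary illustrative value
    probs, Z = mb_distribution(points, lam)
    print(lam, Z, mean_energy(points, probs), entropy_bits(probs))
```

At `lam = 0.0` the sketch reproduces uniform 16-QAM; any positive `lam` lowers both the average energy and the entropy.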
Theorem: Maxwell-Boltzmann is Capacity-Achieving on a Finite Constellation with Average-Power Constraint
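Let $\mathcal{X}$ be a finite constellation and let $E_s$ be an average-power budget. Consider the AWGN channel $Y = X + N$, $N \sim \mathcal{CN}(0, \sigma^2)$, with the input restricted to $\mathcal{X}$ and subject to $\mathbb{E}[|X|^2] \le E_s$. At high SNR, among all probability mass functions $p$ on $\mathcal{X}$ with $\sum_x p(x) |x|^2 \le E_s$, mutual information is maximised by the Maxwell-Boltzmann distribution $P_{\lambda^*}$ with $\lambda^*$ chosen so that $E(\lambda^*) = E_s$ (if $E_s$ is at least the uniform-distribution energy, the constraint is inactive and $\lambda^* = 0$).
The maximum entropy (bits) is
$$H(P_{\lambda^*}) = \log_2 Z(\lambda^*) + \lambda^* E_s \log_2 e.$$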
At high SNR, $I(X;Y) \approx H(X)$ because $H(X \mid Y) \to 0$. So maximising mutual information reduces to maximising entropy at fixed energy, a textbook Lagrangian. The KKT stationarity equation gives an exponential form $p(x) \propto e^{-\lambda |x|^2}$, the MB distribution. The same Lagrangian machinery on a continuous alphabet with a second-moment constraint yields the Gaussian.
Note the high-SNR qualifier: at finite SNR the optimum deviates slightly from MB (the Arimoto-Blahut iteration converges to a different vector because $H(X \mid Y) > 0$). But the MB distribution remains a very good approximation, and in the high-SNR limit it is exactly optimal.
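Use $I(X;Y) = H(X) - H(X \mid Y)$ and $H(X \mid Y) \to 0$ at high SNR.
Maximise $H(p)$ subject to $\sum_x p(x) |x|^2 = E_s$ and $\sum_x p(x) = 1$ via a Lagrangian $\mathcal{L}$.
Set $\partial \mathcal{L} / \partial p(x) = 0$ and solve for $p(x)$.
Normalise via the constraint $\sum_x p(x) = 1$.
The Lagrange multiplier $\lambda$ is determined uniquely by the energy constraint.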
Step 1: High-SNR reduction to entropy maximisation
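Let $p$ be any distribution on $\mathcal{X}$ with energy $\sum_x p(x) |x|^2 \le E_s$. Then $I(X;Y) = H(X) - H(X \mid Y)$, where $X$ is discrete and $Y$ is continuous. At high SNR the receiver identifies $X$ with high probability, so $H(X \mid Y) \to 0$ as $\mathrm{SNR} \to \infty$. Hence asymptotically $I(X;Y) \to H(X)$, and the optimisation reduces to $\max_p H(p)$ subject to the constraints.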
Step 2: Lagrangian for constrained max-entropy
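Form the Lagrangian (entropy in nats)
$$\mathcal{L}(p, \lambda, \nu) = -\sum_{x} p(x) \ln p(x) - \lambda \Big( \sum_{x} p(x) |x|^2 - E_s \Big) - \nu \Big( \sum_{x} p(x) - 1 \Big).$$
The Lagrange multipliers $\lambda$ (power constraint) and $\nu$ (normalisation) are unknowns to be determined from the constraints.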
Step 3: Stationarity gives MB form
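Stationarity: $\partial \mathcal{L} / \partial p(x) = -\ln p(x) - 1 - \lambda |x|^2 - \nu = 0$, so $p(x) = C \, e^{-\lambda |x|^2}$, where $C = e^{-(1 + \nu)}$ absorbs the constant. This is exactly the MB functional form.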
Step 4: Normalisation gives partition function
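The constraint $\sum_x p(x) = 1$ fixes the prefactor $C = 1 / Z(\lambda)$ with $Z(\lambda) = \sum_{x} e^{-\lambda |x|^2}$. So $p(x) = e^{-\lambda |x|^2} / Z(\lambda) = P_\lambda(x)$.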
Step 5: Energy constraint fixes $\lambda$
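The remaining constraint $E(\lambda) = E_s$ defines $\lambda^*$ uniquely because the map $\lambda \mapsto E(\lambda)$ is strictly decreasing (by convexity of $\ln Z(\lambda)$ in $\lambda$: $E(\lambda) = -\frac{d}{d\lambda} \ln Z(\lambda)$, and $\frac{d^2}{d\lambda^2} \ln Z(\lambda)$ is the variance of $|X|^2$ under $P_\lambda$, which is positive whenever the constellation has at least two distinct energy levels). So for every achievable $E_s$ between the minimum point energy and the uniform-distribution energy there exists a unique $\lambda^*$ and hence a unique MB distribution.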
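Because the energy is strictly decreasing in $\lambda$, the energy constraint can be inverted numerically by bisection. The sketch below assumes a 64-QAM grid on odd integers and a target energy of 20; both the helper names (`energy_of_lambda`, `solve_lambda`) and the target value are illustrative assumptions.

```python
import math

def energy_of_lambda(energies, lam):
    """E(lam) = sum_e e*exp(-lam*e) / sum_e exp(-lam*e) over per-point energies."""
    e_min = min(energies)
    # Shifting the exponent by e_min avoids underflow and leaves the ratio unchanged.
    w = [math.exp(-lam * (e - e_min)) for e in energies]
    return sum(e * wi for e, wi in zip(energies, w)) / sum(w)

def solve_lambda(energies, target, iterations=200):
    """Bisection for the lam >= 0 with E(lam) = target (E is decreasing in lam)."""
    e_min = min(energies)
    e_unif = sum(energies) / len(energies)
    if not (e_min < target <= e_unif):
        raise ValueError("target must lie in (E_min, E_uniform]")
    lo, hi = 0.0, 1.0
    while energy_of_lambda(energies, hi) > target:  # grow the bracket
        hi *= 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if energy_of_lambda(energies, mid) > target:
            lo = mid  # energy still too high: need a larger lam
        else:
            hi = mid
    return 0.5 * (lo + hi)

levels = range(-7, 8, 2)  # -7, -5, ..., 7 (assumed 64-QAM grid)
energies_64 = [float(i * i + q * q) for i in levels for q in levels]
lam_star = solve_lambda(energies_64, 20.0)  # 20.0 is an illustrative target
print(lam_star, energy_of_lambda(energies_64, lam_star))
```

A fixed iteration count is used instead of a tolerance test so the loop always terminates, whatever the size of the bracket.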
Step 6: Maximum entropy value
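Substituting $\ln P_{\lambda^*}(x) = -\lambda^* |x|^2 - \ln Z(\lambda^*)$ into $H = -\sum_x P_{\lambda^*}(x) \ln P_{\lambda^*}(x)$:
$$H(P_{\lambda^*}) = \lambda^* E_s + \ln Z(\lambda^*) \quad \text{(nats)}.$$
In bits, $H(P_{\lambda^*}) = \big( \lambda^* E_s + \ln Z(\lambda^*) \big) / \ln 2$.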
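The closed form can be sanity-checked numerically. The following sketch, assuming 16-QAM on odd integers and an arbitrary `lam = 0.1`, compares the direct entropy sum with the closed form, and then perturbs the MB probabilities in a direction that preserves both normalisation and energy to confirm that entropy can only decrease.

```python
import math

levels = range(-3, 4, 2)  # -3, -1, 1, 3 (assumed 16-QAM grid)
energies = [i * i + q * q for i in levels for q in levels]
lam = 0.1  # illustrative value, not taken from the text

weights = [math.exp(-lam * e) for e in energies]
Z = sum(weights)
probs = [w / Z for w in weights]
E = sum(p * e for p, e in zip(probs, energies))

def entropy_bits(ps):
    return -sum(p * math.log2(p) for p in ps if p > 0)

H_direct = entropy_bits(probs)
H_closed = (math.log(Z) + lam * E) / math.log(2)

# Perturbation (t, -2t, t) on points with energies 2, 10, 18:
# the sum is zero and the energy change is 2t - 20t + 18t = 0.
i2, i10, i18 = energies.index(2), energies.index(10), energies.index(18)
t = 1e-3
perturbed = list(probs)
perturbed[i2] += t
perturbed[i10] -= 2 * t
perturbed[i18] += t
H_perturbed = entropy_bits(perturbed)

print(H_direct, H_closed, H_perturbed)
```

The perturbed distribution satisfies the same two constraints as MB, so its lower entropy is a small numerical illustration of the max-entropy property rather than a proof.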
Maxwell-Boltzmann Distribution on $M$-QAM
Visualisation of the MB distribution on a square $M$-QAM grid. Each constellation point is drawn as a disk whose area is proportional to the MB probability $P_\lambda(x)$. At $\lambda = 0$ all disks are equal (uniform); as $\lambda$ grows, outer (high-energy) points shrink and inner points grow, producing a Gaussian-like envelope on the lattice. Observe that the effect is barely visible for very small $\lambda$ (the MB distribution is locally a smooth perturbation of uniform), becomes pronounced at moderate $\lambda$, and collapses onto the inner ring for large $\lambda$. In practice, shaped 64-QAM is operated at SNRs between 12 and 25 dB.
Example: MB Entropy for 64-QAM at
Compute the entropy in bits/symbol of the MB distribution on 64-QAM (square grid, points at $\pm 1, \pm 3, \pm 5, \pm 7$ per dimension) for the shaping parameter $\lambda$ of this example. Compare to the uniform-QAM entropy of 6 bits/symbol.
Enumerate the 64 QAM points
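Each point has coordinates $(i, q)$ with $i, q \in \{\pm 1, \pm 3, \pm 5, \pm 7\}$. The squared norm is $|x|^2 = i^2 + q^2$. There are 4 points with $|x|^2 = 2$ (the inner ring $(\pm 1, \pm 1)$), 8 points with $|x|^2 = 10$, 4 with $|x|^2 = 18$, and so on up to $|x|^2 = 98$ with $(\pm 7, \pm 7)$.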
Compute the partition function
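$Z(\lambda) = \sum_k n_k \, e^{-\lambda E_k}$, where $n_k$ is the multiplicity of the energy level $E_k$. For uniform signalling $Z(0) = 64$; any $\lambda > 0$ gives a smaller value.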
Compute the average energy
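$E(\lambda) = \frac{1}{Z(\lambda)} \sum_k n_k E_k \, e^{-\lambda E_k}$. Uniform 64-QAM on this grid has $E(0) = 42$. At the $\lambda$ of this example the shaping has cut average energy by almost a factor 3. This is the raw energy reduction; the entropy also drops, so the fair power saving is the one at equal rate (see the operational comparison).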
Compute the entropy
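$H(\lambda) = \log_2 Z(\lambda) + \lambda E(\lambda) \log_2 e$ bits/symbol. So MB-shaped 64-QAM at this $\lambda$ carries fewer than $6$ bits/symbol at average energy $E(\lambda)$, versus uniform 64-QAM at $6$ bits/symbol at average energy $42$.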
Operational comparison
To transmit bits/symbol reliably, uniform 64-QAM would need to drop to 32-QAM (5 bits/symbol, ) — or use a rate- code on 64-QAM at . MB-shaped 64-QAM achieves the same rate at , which is a power saving of dB over the 32-QAM route and dB over the brute-force rate-controlled route. The 1.6 dB saving matches the shaping-gain prediction of roughly 1.5 dB.
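The steps of this example can be reproduced with a short script. This is a sketch under stated assumptions: the grid is 64-QAM on odd integers, and `LAM_DEMO = 0.05` is an arbitrary illustrative value, not the shaping parameter used in the worked example. The names `multiplicity` and `mb_summary` are likewise illustrative.

```python
import math
from collections import Counter

levels = (-7, -5, -3, -1, 1, 3, 5, 7)
# Multiplicity n_k of each squared norm E_k = i^2 + q^2 on the grid.
multiplicity = Counter(i * i + q * q for i in levels for q in levels)

def mb_summary(lam):
    """Return (Z, E, H_bits) of the MB distribution via the multiplicity table."""
    Z = sum(n * math.exp(-lam * e) for e, n in multiplicity.items())
    E = sum(n * e * math.exp(-lam * e) for e, n in multiplicity.items()) / Z
    H = (math.log(Z) + lam * E) / math.log(2)  # closed form from Step 6
    return Z, E, H

LAM_DEMO = 0.05  # illustrative assumption; NOT the value used in the text
for lam in (0.0, LAM_DEMO):
    Z, E, H = mb_summary(lam)
    print(f"lambda={lam:.3f}  Z={Z:.3f}  E={E:.3f}  H={H:.3f} bits")
```

Summing over energy levels rather than over all 64 points gives the same result with fewer terms, which is the point of the multiplicity bookkeeping in the example.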
Shaping Gain vs SNR: Shannon, Uniform QAM, and MB-Shaped QAM
Achievable rate (bits/2D symbol) versus $\mathrm{SNR}$ in dB for three input distributions: (i) Shannon bound (continuous Gaussian input); (ii) uniform $M$-QAM; (iii) optimally MB-shaped $M$-QAM (i.e., $\lambda$ chosen per SNR to maximise mutual information). The shaping gain (in dB at a fixed target rate) grows from nearly zero at low SNR to its maximum at the high-SNR "knee" of each QAM curve. The optimally-shaped curve approaches the Shannon bound closely for large $M$. Comparing the 64-QAM and 256-QAM curves shows how larger constellations close the gap faster but at higher SNR.
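Curves of this kind can be approximated numerically. The sketch below is one possible approach, not the method behind the plot: it uses the fact that MB on a square QAM grid factorises into two independent PAM dimensions, so the 2D rate is twice the per-dimension mutual information, and it evaluates $I(X;Y) = h(Y) - h(N)$ by trapezoidal integration. The function name, the integration settings, and `LAM_DEMO` are assumptions; the shaped curve here uses one fixed $\lambda$ rather than the per-SNR optimum.

```python
import math

def pam_mutual_information(points, probs, snr_db, steps_per_sigma=50):
    """I(X;Y) in bits for real AWGN Y = X + N, with SNR = E[X^2] / sigma^2."""
    energy = sum(p * x * x for x, p in zip(points, probs))
    sigma2 = energy / (10 ** (snr_db / 10))
    sigma = math.sqrt(sigma2)
    y_lo = min(points) - 8 * sigma  # 8-sigma tails carry negligible mass
    y_hi = max(points) + 8 * sigma
    n = int((y_hi - y_lo) / sigma * steps_per_sigma) + 1
    dy = (y_hi - y_lo) / n
    norm = 1.0 / math.sqrt(2 * math.pi * sigma2)
    h_y = 0.0
    for k in range(n + 1):
        y = y_lo + k * dy
        py = norm * sum(p * math.exp(-(y - x) ** 2 / (2 * sigma2))
                        for x, p in zip(points, probs))
        if py > 0.0:
            w = 0.5 if k in (0, n) else 1.0  # trapezoid end-point weights
            h_y -= w * py * math.log2(py) * dy
    h_n = 0.5 * math.log2(2 * math.pi * math.e * sigma2)
    return h_y - h_n

pam8 = [-7, -5, -3, -1, 1, 3, 5, 7]  # one dimension of 64-QAM
uniform = [1 / 8] * 8
LAM_DEMO = 0.05  # illustrative, fixed shaping parameter (not optimised per SNR)
w = [math.exp(-LAM_DEMO * x * x) for x in pam8]
shaped = [wi / sum(w) for wi in w]

for snr_db in (5, 10, 15, 20):
    shannon = math.log2(1 + 10 ** (snr_db / 10))  # bits per 2D symbol
    r_uniform = 2 * pam_mutual_information(pam8, uniform, snr_db)
    r_shaped = 2 * pam_mutual_information(pam8, shaped, snr_db)
    print(f"{snr_db:>2} dB  Shannon={shannon:.3f}  uniform={r_uniform:.3f}  MB={r_shaped:.3f}")
```

To approximate the optimally shaped curve, one would sweep $\lambda$ at each SNR and keep the largest rate.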
Operational Reading of the Shaping Curve
The shaping-gain plot tells a sharp story. At low SNR (below the uniform-QAM knee), the uniform distribution is already near-optimal because the outer constellation points are unusable — noise dominates and only the inner ring survives. There MB barely beats uniform.
At the high-SNR knee, where the uniform rate is close to $\log_2 M$, the shaping gain is largest: here MB squeezes the most out of the constellation, approaching the $1.53$ dB asymptote. This is the design sweet spot.
In deployment:
- 400ZR optical: DP-16QAM at dB, target rate bits/polarisation. Uniform rate limit bits/pol is at the knee; shaping saves dB.
- 6G eMBB (proposed): 256-QAM or 1024-QAM at - dB. Same knee logic; dB savings.
- Satellite DVB-S2X: 32-APSK or 64-APSK; shaping is optional (Annex extension), used for highest-rate MODCODs where the knee is near the link budget.
A rough rule: shaping pays off when the uniform BICM rate is within about 1 bit/symbol of $\log_2 M$. Below that, switch to a smaller constellation.
Common Mistake: MB Distribution is NOT a Discrete Gaussian
Mistake:
A common confusion is to think that the Maxwell-Boltzmann distribution on a constellation is the same thing as a "discretised Gaussian" obtained by restricting $\mathcal{CN}(0, 2\sigma^2)$ to the constellation lattice and renormalising. They look similar, since both are proportional to $e^{-c |x|^2}$ for some constant $c > 0$, but they are not the same distribution.
Correction:
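A Gaussian restricted to $\mathcal{X}$ and renormalised has the form $e^{-|x|^2 / (2\sigma^2)} \big/ \sum_{x' \in \mathcal{X}} e^{-|x'|^2 / (2\sigma^2)}$, which after simplification is $e^{-\lambda |x|^2} / Z(\lambda)$. This is MB with $\lambda = 1 / (2\sigma^2)$. So up to the parameter mapping $\lambda \leftrightarrow 1 / (2\sigma^2)$, the two distributions are the same functional form, and the confusion is harmless.
The subtle but real difference lies in how the parameter is set: the discrete MB distribution is optimised to maximise entropy on the finite grid, with $\lambda$ solved from the discrete energy constraint, not chosen to match the continuous Gaussian on $\mathbb{C}$. If $\sigma^2$ is taken from the continuous Gaussian with the target power, the sampled distribution does not in general meet that power on the grid. For large $M$ the two converge; for small $M$ (e.g., 16-QAM), MB entropy at a target energy exceeds the entropy of the clipped-Gaussian approximation by a small but non-trivial amount. Always specify MB and solve $\lambda$ from the energy constraint rather than setting $\lambda$ by eyeballing the Gaussian envelope.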
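The gap between the two parameter choices can be made concrete. The sketch below assumes 16-QAM on odd integers and a target energy of 6 (an illustrative value). The "naive" choice takes $\lambda = 1/E_s$, which is the exponent of a continuous complex Gaussian with total power $E_s$; the MB choice solves the discrete energy constraint by bisection. All names are illustrative.

```python
import math

levels = range(-3, 4, 2)  # -3, -1, 1, 3 (assumed 16-QAM grid)
energies = [float(i * i + q * q) for i in levels for q in levels]

def mb_stats(lam):
    """Return (E, H_bits) of the MB distribution with parameter lam."""
    w = [math.exp(-lam * e) for e in energies]
    Z = sum(w)
    E = sum(e * wi for e, wi in zip(energies, w)) / Z
    H = (math.log(Z) + lam * E) / math.log(2)
    return E, H

def solve_lambda(target, iterations=200):
    """Bisection on the decreasing map lam -> E(lam)."""
    lo, hi = 0.0, 1.0
    while mb_stats(hi)[0] > target:
        hi *= 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if mb_stats(mid)[0] > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

TARGET = 6.0  # illustrative target energy (uniform 16-QAM has energy 10)

lam_naive = 1.0 / TARGET          # exponent of a continuous Gaussian with power TARGET
E_naive, H_naive = mb_stats(lam_naive)

lam_mb = solve_lambda(TARGET)     # lam solved from the discrete energy constraint
E_mb, H_mb = mb_stats(lam_mb)

print(f"naive: lam={lam_naive:.4f}  E={E_naive:.3f}  H={H_naive:.3f}")
print(f"MB:    lam={lam_mb:.4f}  E={E_mb:.3f}  H={H_mb:.3f}")
```

In this setting the sampled Gaussian undershoots the target energy on the grid and therefore carries less entropy than the MB distribution that uses the full budget.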
Historical Note: From Shannon 1948 to Forney's Sphere Bound to MB
1948-1984: Shannon's 1948 capacity theorem establishes that the Gaussian distribution is optimal on AWGN, leaving open the question of constellation design for practical codes. Kschischang and Pasupathy (1993, 2016) gave the first systematic treatment of the shaping gain as the ratio of spherical to cubic second moments, independent of any code. Forney (1984, 1992) developed the lattice-theoretic framework: shape the constellation bounding region as a sphere-like fundamental domain (Voronoi region of a dense lattice), recovering up to $1.53$ dB asymptotically. This was the "geometric" side of shaping, before it had that name.
The "probabilistic" side emerged from information theory: the MB distribution was introduced by Kschischang and Pasupathy in the shaping-gain analysis, and later by Calderbank-Ozarow (1990) in non-equiprobable signalling. The key 1998 survey of Forney and Ungerboeck pulled it all together: for a finite constellation, MB is the max-entropy-at-fixed-energy distribution, and it achieves the same asymptotic gain as sphere shaping.
The practical adoption had to wait another 17 years: shaping remained theoretical until Böcherer, Steiner, and Schulte (2015) showed how to reconcile MB with the systematic LDPC + BICM infrastructure of 5G-era systems. We pick up that story in Section 2.
Why High-SNR Optical Was First to Deploy MB Shaping
Probabilistic shaping with the MB distribution only pays off at high SNR, specifically when the uniform BICM rate is within about 1 bit per 2D symbol of the cardinality bound $\log_2 M$. The first commercial domain where this was routinely the operating point was coherent optical transmission:
- Optical AWGN channel: dominated by amplified spontaneous emission (ASE) from erbium-doped fibre amplifiers. At the target km reach of 400ZR, per-polarisation SNR is around dB — right at the DP-16QAM knee.
- Flexibility: optical links are upgraded in discrete Gbps steps, and the fibre reach varies by deployment. Shaping provides continuous rate adaptation (by varying ) without changing hardware.
- DSP budget: coherent modems already run GHz ASICs with sophisticated DSP; adding a CCDM block is a small incremental cost.
In contrast, cellular has historically operated at lower SNR per stream (most UEs are not at the knee) and uses discrete MCS indices for AMC. These conditions made PAS low-priority for LTE and 5G NR Release 15-17. The 6G study item (Release 20+) revisits PAS for the highest-MCS modes.
- Shaping pays when the uniform BICM rate is within 1 bit/symbol of $\log_2 M$
- Requires average SNR at the constellation knee (the value depends on the constellation size, e.g. 16-QAM versus 64-QAM)
- Adds CCDM computational complexity per block
Quick Check
In the Lagrangian derivation of the MB distribution, the exponential form $e^{-\lambda |x|^2}$ arises from which source?
The KKT stationarity condition applied to an entropy objective with a quadratic energy constraint
A Gaussian assumption on the transmitted waveform
The central limit theorem applied to a uniform QAM distribution
Arithmetic coding of the information bits
Correct. The stationarity equation gives $\ln p(x) = -\lambda |x|^2 - (1 + \nu)$, i.e. $p(x) \propto e^{-\lambda |x|^2}$, which after normalisation is the MB distribution. The quadratic constraint on $\mathbb{E}[|X|^2]$ is what produces the exponential-squared form (rather than an exponential-linear form that would arise from an $\mathbb{E}[|X|]$ constraint).
Maxwell-Boltzmann Distribution
A probability mass function on a finite constellation $\mathcal{X}$ of the form $P_\lambda(x) = e^{-\lambda |x|^2} / Z(\lambda)$, where $\lambda \ge 0$ is a shaping parameter and $Z(\lambda)$ is the partition function. Uniquely maximises entropy subject to a fixed-energy constraint; the finite-alphabet analogue of the Gaussian distribution on $\mathbb{C}$.
Related: Probabilistic Shaping, Constant-Composition Distribution Matcher (CCDM), Probabilistic Amplitude Shaping (PAS) Architecture
Partition Function
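The normalisation constant $Z(\lambda) = \sum_{x \in \mathcal{X}} e^{-\lambda |x|^2}$ of the MB distribution. The name is borrowed from statistical mechanics, where the analogous $Z(\beta) = \sum_i e^{-\beta E_i}$ normalises the Boltzmann distribution over energy states. The first log-derivative, $-\frac{d}{d\lambda} \ln Z(\lambda)$, gives the expected energy; the second, $\frac{d^2}{d\lambda^2} \ln Z(\lambda)$, gives the energy variance.
Related: MB Distribution, Lagrangian, Convexity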
Key Takeaway
The Maxwell-Boltzmann distribution is the unique capacity-achieving input distribution on a finite constellation at high SNR, derived by a textbook Lagrangian that places an exponential weight on each constellation point according to its squared norm. The shaping parameter $\lambda$ is a one-dimensional knob that continuously interpolates between the uniform distribution and one concentrated on the innermost points. At the high-SNR knee of any QAM curve, MB shaping buys a gain close to the $1.53$ dB asymptote from Chapter 4.