The Gaussian Maximizes Entropy
The Most Important Fact for Gaussian Channels
We now prove what is arguably the single most important result for Gaussian channel analysis: among all distributions with a given variance $\sigma^2$, the Gaussian uniquely maximizes differential entropy. This has far-reaching consequences:
- It implies that Gaussian noise is the worst-case additive noise for communication — it maximizes uncertainty for a given power.
- It leads directly to the AWGN capacity formula $C = \frac{1}{2}\log\left(1 + \frac{P}{N}\right)$.
- It shows that Gaussian inputs are optimal for Gaussian channels.
Every time you see $\log(1 + \mathrm{SNR})$ in a capacity expression, this theorem is working behind the scenes.
Theorem: Gaussian Maximizes Differential Entropy Under Variance Constraint
Let $X$ be any continuous random variable with mean $\mu$ and variance $\sigma^2$. Then:

$$h(X) \leq \frac{1}{2}\log(2\pi e \sigma^2),$$

with equality if and only if $X \sim \mathcal{N}(\mu, \sigma^2)$.
The Gaussian is the "most random" distribution you can have for a given power (variance). Any other distribution with the same variance is more structured and therefore has lower entropy. Intuitively, the Gaussian spreads its probability mass as smoothly as possible subject to the power constraint.
Write $h(X)$ in terms of a KL divergence
Let $f$ be the PDF of $X$ and let $\phi$ be the PDF of $\mathcal{N}(\mu, \sigma^2)$. Both have mean $\mu$ and variance $\sigma^2$. We compute:

$$0 \leq D(f \| \phi) = \int f \log \frac{f}{\phi}\, dx = -h(X) - \int f \log \phi\, dx.$$
Evaluate the cross-entropy term
Since $\log \phi(x) = -\frac{1}{2}\log(2\pi\sigma^2) - \frac{(x - \mu)^2}{2\sigma^2}$:

$$-\int f(x) \log \phi(x)\, dx = \frac{1}{2}\log(2\pi\sigma^2) + \frac{\mathbb{E}_f[(X - \mu)^2]}{2\sigma^2} = \frac{1}{2}\log(2\pi\sigma^2) + \frac{1}{2} = \frac{1}{2}\log(2\pi e \sigma^2).$$

This equals $h(\phi)$: the cross-entropy depends on $f$ only through the mean and variance, which match those of $\phi$ by assumption.
Conclude
Combining, $D(f \| \phi) = \frac{1}{2}\log(2\pi e \sigma^2) - h(X) \geq 0$, so $h(X) \leq \frac{1}{2}\log(2\pi e \sigma^2)$, with equality if and only if $D(f \| \phi) = 0$, i.e. $f = \phi$, i.e. $X \sim \mathcal{N}(\mu, \sigma^2)$.
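Both steps of the proof can be checked numerically. A minimal sketch (assuming NumPy and SciPy; the particular distributions and the variance value are illustrative choices): the Monte Carlo cross-entropy $-\mathbb{E}_f[\log \phi]$ should match $\frac{1}{2}\log(2\pi e \sigma^2)$ for every zero-mean law with variance $\sigma^2$, while $h(X)$ attains that value only for the Gaussian.

```python
import numpy as np
from scipy import stats

sigma2 = 2.0                                            # common variance
bound = 0.5 * np.log(2 * np.pi * np.e * sigma2)         # (1/2) log(2 pi e sigma^2)
phi = stats.norm(loc=0.0, scale=np.sqrt(sigma2))        # matching Gaussian density

# Non-Gaussian competitors, reparameterized to have variance sigma2:
# Laplace(scale=b) has variance 2*b^2; Uniform on [-a, a] has variance a^2/3.
dists = {
    "gaussian": phi,
    "laplace":  stats.laplace(scale=np.sqrt(sigma2 / 2)),
    "uniform":  stats.uniform(loc=-np.sqrt(3 * sigma2), scale=2 * np.sqrt(3 * sigma2)),
}

rng = np.random.default_rng(0)
for name, d in dists.items():
    x = d.rvs(size=1_000_000, random_state=rng)
    cross = -np.mean(phi.logpdf(x))                     # estimate of -E_f[log phi]
    print(f"{name:9s} h(X) = {float(d.entropy()):.4f}  "
          f"-E[log phi] = {cross:.4f}  bound = {bound:.4f}")
```

All three cross-entropy columns agree with the bound, and only the Gaussian's $h(X)$ reaches it.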
Gaussian Noise Is the Worst-Case Noise
Consider an additive noise channel $Y = X + Z$, where $Z$ has variance $N$ and is independent of $X$. The mutual information is:

$$I(X;Y) = h(Y) - h(Y \mid X) = h(Y) - h(Z).$$

For fixed noise power $N$, $h(Z) \leq \frac{1}{2}\log(2\pi e N)$, with equality for Gaussian $Z$. So Gaussian noise maximizes the noise entropy term, which, for a given output entropy $h(Y)$, minimizes the mutual information. The Gaussian is the hardest noise to communicate through.
Consequently, the AWGN capacity $\frac{1}{2}\log\left(1 + \frac{P}{N}\right)$ is a lower bound on the capacity of any additive noise channel with the same noise power: any other noise distribution yields higher capacity.
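This is easy to probe numerically in one dimension, since $h(Y)$ can be computed by convolving densities on a grid. A sketch under stated assumptions (NumPy/SciPy; the input is fixed to $X \sim \mathcal{N}(0, P)$, which need not be optimal for non-Gaussian noise, so the non-Gaussian rows are lower bounds on those channels' capacities):

```python
import numpy as np
from scipy import stats

P, N = 1.0, 1.0                         # signal and noise power
dx = 0.01
x = np.arange(-12, 12, dx)              # grid wide enough for all tails

def diff_entropy(pdf):
    """Differential entropy (nats) of a density sampled on the grid."""
    p = pdf / (pdf.sum() * dx)          # renormalize against grid truncation
    mask = p > 0
    return -np.sum(p[mask] * np.log(p[mask])) * dx

f_X = stats.norm(0, np.sqrt(P)).pdf(x)
noises = {
    "gaussian": stats.norm(0, np.sqrt(N)).pdf(x),
    "uniform":  stats.uniform(-np.sqrt(3 * N), 2 * np.sqrt(3 * N)).pdf(x),
    "laplace":  stats.laplace(0, np.sqrt(N / 2)).pdf(x),
}
for name, f_Z in noises.items():
    f_Y = np.convolve(f_X, f_Z, mode="same") * dx     # density of Y = X + Z
    I = diff_entropy(f_Y) - diff_entropy(f_Z)         # I(X;Y) = h(Y) - h(Z)
    print(f"{name:9s} I(X;Y) = {I:.4f} nats  (AWGN value: {0.5 * np.log(1 + P / N):.4f})")
```

The Gaussian row reproduces $\frac{1}{2}\log(1 + P/N)$; the uniform and Laplace rows come out strictly larger.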
Example: From Maximum Entropy to the AWGN Capacity Formula
Derive the capacity of the additive white Gaussian noise (AWGN) channel $Y = X + Z$, where $Z \sim \mathcal{N}(0, N)$ is independent of $X$ and the input satisfies the power constraint $\mathbb{E}[X^2] \leq P$.
Mutual information
$I(X;Y) = h(Y) - h(Y \mid X) = h(Y) - h(Z) = h(Y) - \frac{1}{2}\log(2\pi e N)$.
Bound on $h(Y)$
Since $X$ and $Z$ are independent, $\operatorname{Var}(Y) = \operatorname{Var}(X) + \operatorname{Var}(Z) \leq P + N$.
By the maximum entropy theorem, $h(Y) \leq \frac{1}{2}\log\big(2\pi e (P + N)\big)$, with equality when $Y$ is Gaussian, which happens when $X$ is Gaussian.
Capacity
$$C = \max_{\mathbb{E}[X^2] \leq P} I(X;Y) = \frac{1}{2}\log\big(2\pi e (P + N)\big) - \frac{1}{2}\log(2\pi e N) = \frac{1}{2}\log\left(1 + \frac{P}{N}\right),$$
achieved by $X \sim \mathcal{N}(0, P)$.
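The closed form is simple enough to tabulate directly. A minimal helper (assuming NumPy; the function name and the SNR grid are illustrative choices, not from the text):

```python
import numpy as np

def awgn_capacity(P, N, base=2):
    """Capacity of the real AWGN channel: (1/2) log(1 + P/N)."""
    return 0.5 * np.log1p(P / N) / np.log(base)

# Capacity in bits per channel use at a few SNRs (noise power normalized to 1)
for snr_db in [0, 10, 20, 30]:
    snr = 10 ** (snr_db / 10)
    print(f"SNR = {snr_db:2d} dB  ->  C = {awgn_capacity(snr, 1.0):.3f} bits/use")
```

Note the factor $\frac{1}{2}$: this is the real-channel formula; complex baseband channels drop it.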
Gaussian vs Other Distributions: Entropy Comparison
Compare the differential entropy of several distributions (Gaussian, uniform, triangular, Laplace), all with the same variance $\sigma^2$. The Gaussian always has the highest differential entropy.
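The comparison does not require simulation; each family's differential entropy has a closed form once its shape parameter is matched to the target variance. A minimal sketch (assuming NumPy; the formulas are the standard ones, e.g. a uniform on $[-a, a]$ has $h = \log 2a$ with $a^2/3 = \sigma^2$):

```python
# Closed-form differential entropies (nats) at a common variance sigma2.
import numpy as np

sigma2 = 1.0
entropies = {
    "gaussian":   0.5 * np.log(2 * np.pi * np.e * sigma2),
    "triangular": 0.5 + 0.5 * np.log(6 * sigma2),   # symmetric on [-a, a], a^2/6 = sigma2
    "laplace":    1.0 + 0.5 * np.log(2 * sigma2),   # scale b, 2*b^2 = sigma2
    "uniform":    0.5 * np.log(12 * sigma2),        # support [-a, a], a^2/3 = sigma2
}
for name, h in sorted(entropies.items(), key=lambda kv: -kv[1]):
    print(f"{name:10s} h = {h:.4f} nats")
```

The printed ordering is Gaussian > triangular > Laplace > uniform, consistent with the theorem.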
Why This Matters: The AWGN Capacity Formula in Wireless Systems
The formula $C = \frac{1}{2}\log_2(1 + \mathrm{SNR})$ (or $\log_2(1 + \mathrm{SNR})$ for complex channels) is the starting point for all wireless system design. Modern systems like 5G NR and Wi-Fi 7 use adaptive modulation and coding to approach this limit. For MIMO channels with $N_t$ transmit and $N_r$ receive antennas, the capacity (with equal power allocation and unit noise power) becomes $C = \log_2 \det\!\left(I_{N_r} + \frac{P}{N_t} H H^\dagger\right)$, which decomposes into parallel AWGN channels via the SVD of $H$, each with its own SNR determined by the corresponding singular value. See Chapters 15-16.
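The log-det/SVD equivalence is a two-line identity to verify numerically. A sketch under assumptions not fixed by the text (equal power split across the $N_t$ transmit antennas, unit noise power, one random i.i.d. complex Gaussian channel draw):

```python
import numpy as np

rng = np.random.default_rng(1)
Nt, Nr, P = 4, 4, 10.0
H = (rng.standard_normal((Nr, Nt)) + 1j * rng.standard_normal((Nr, Nt))) / np.sqrt(2)

# log-det form: C = log2 det(I + (P/Nt) H H^H)
C_logdet = np.log2(np.linalg.det(np.eye(Nr) + (P / Nt) * H @ H.conj().T)).real

# SVD form: a sum of parallel AWGN capacities, one per singular value of H
s = np.linalg.svd(H, compute_uv=False)
C_svd = np.sum(np.log2(1 + (P / Nt) * s**2))

print(f"log-det: {C_logdet:.4f}   SVD sum: {C_svd:.4f}   bits/use")
```

Both numbers agree, because $\det(I + c\,HH^\dagger) = \prod_i (1 + c\,\sigma_i^2)$, where the $\sigma_i$ here are the singular values of $H$.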
Key Takeaway
The Gaussian maximizes differential entropy under a variance constraint. This single theorem implies: (1) Gaussian noise is worst-case for communication, (2) Gaussian inputs achieve capacity for Gaussian channels, and (3) the AWGN capacity is $C = \frac{1}{2}\log\left(1 + \frac{P}{N}\right)$. Whenever you see $\log(1 + \mathrm{SNR})$, this theorem is at work.
The Gap Between Theory and Practice
The AWGN capacity is an asymptotic limit — achievable only in the limit of infinite blocklength. Practical systems operate at finite blocklength and incur a coding gap:
- Turbo codes (1993): within ~0.5 dB of capacity at BER $10^{-5}$
- LDPC codes (Gallager 1960, rediscovered 1990s): within ~0.1 dB
- Polar codes (Arıkan 2009): provably achieve capacity, but convergence is slow for short blocklengths
The gap decreases approximately as $\sqrt{V/n}$, where $V$ is the channel dispersion and $n$ is the blocklength (Chapter 26, finite-blocklength theory); a numerical sketch follows the list below.
- Finite blocklength incurs a rate penalty proportional to $1/\sqrt{n}$
- Practical decoders have complexity constraints limiting blocklength
- Latency requirements upper-bound the blocklength in real-time systems
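A rough numerical sketch of the normal approximation (assuming NumPy/SciPy; the AWGN dispersion formula $V = \frac{\mathrm{SNR}(\mathrm{SNR}+2)}{2(\mathrm{SNR}+1)^2}\log_2^2 e$ is quoted from the finite-blocklength literature, not from this section, and should be treated as an assumption here):

```python
import numpy as np
from scipy.stats import norm

snr, eps = 10.0, 1e-3                                  # SNR (linear) and block error rate
C = 0.5 * np.log2(1 + snr)                             # capacity, bits/use
V = snr * (snr + 2) / (2 * (snr + 1) ** 2) * np.log2(np.e) ** 2   # dispersion, bits^2/use

for n in [100, 1_000, 10_000, 100_000]:
    R = C - np.sqrt(V / n) * norm.isf(eps)             # Q^{-1}(eps) = norm.isf(eps)
    print(f"n = {n:6d}:  R ~ {R:.3f} bits/use  (C = {C:.3f})")
```

The penalty term shrinks like $1/\sqrt{n}$: going from $n = 100$ to $n = 10{,}000$ recovers most of the gap to capacity.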
Quick Check
Among all additive noise distributions with variance $N$ (independent of the input $X$), which minimizes the channel capacity?
Gaussian noise
Uniform noise
Laplace noise
The noise distribution does not affect capacity
Gaussian noise maximizes $h(Z)$ for a given variance, which reduces $I(X;Y)$ in the decomposition $I(X;Y) = h(Y) - h(Z)$. The story is more subtle than that single term, however: Gaussian noise minimizes capacity because it maximizes the noise entropy and because it forces the optimal input to be Gaussian as well. The full argument uses the EPI (Section 2.4).