The Gaussian Maximizes Entropy

The Most Important Fact for Gaussian Channels

We now prove what is arguably the single most important result for Gaussian channel analysis: among all distributions with a given variance $\sigma^2$, the Gaussian $\mathcal{N}(0, \sigma^2)$ uniquely maximizes differential entropy. This has far-reaching consequences:

  • It implies that Gaussian noise is the worst-case additive noise for communication — it maximizes uncertainty for a given power.
  • It leads directly to the AWGN capacity formula $C = \frac{1}{2}\log(1 + \text{SNR})$.
  • It shows that Gaussian inputs are optimal for Gaussian channels.

Every time you see $\log(1 + \text{SNR})$ in a capacity expression, this theorem is working behind the scenes.

Theorem: Gaussian Maximizes Differential Entropy Under Variance Constraint

Let $X$ be any continuous random variable with mean $\mu$ and variance $\sigma^2$. Then:

$$h(X) \leq \frac{1}{2}\log(2\pi e \sigma^2),$$

with equality if and only if $X \sim \mathcal{N}(\mu, \sigma^2)$.
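
A sketch of the standard argument: let $f$ be the density of $X$ and let $g$ be the density of $\mathcal{N}(\mu, \sigma^2)$. Non-negativity of relative entropy gives

$$
0 \leq D(f \,\|\, g) = \int f(x)\log\frac{f(x)}{g(x)}\,dx = -h(X) - \int f(x)\log g(x)\,dx.
$$

Because $\log g(x)$ is quadratic in $x$, the cross term depends on $f$ only through its mean and variance, which match those of $g$:

$$
-\int f(x)\log g(x)\,dx = \frac{1}{2}\log(2\pi\sigma^2) + \frac{\mathbb{E}[(X-\mu)^2]}{2\sigma^2}\log e = \frac{1}{2}\log(2\pi e\,\sigma^2).
$$

Hence $h(X) \leq \frac{1}{2}\log(2\pi e\,\sigma^2)$, with equality iff $D(f \,\|\, g) = 0$, i.e. $f = g$ almost everywhere.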

The Gaussian is the "most random" distribution you can have for a given power (variance). Any other distribution with the same variance is more structured and therefore has lower entropy. Intuitively, the Gaussian spreads its probability mass as smoothly as possible subject to the power constraint.

Gaussian Noise Is the Worst-Case Noise

Consider an additive noise channel $Y = X + Z$ where $Z$ has variance $\sigma^2$ and is independent of $X$. The mutual information is:

$$I(X;Y) = h(Y) - h(Y \mid X) = h(Y) - h(Z).$$

For fixed noise power $\sigma^2$, $h(Z) \leq \frac{1}{2}\log(2\pi e \sigma^2)$ with equality for Gaussian $Z$. So Gaussian noise maximizes the noise entropy, which minimizes the mutual information. The Gaussian is the hardest noise to communicate through.

Consequently, the AWGN capacity $C = \frac{1}{2}\log(1 + \text{SNR})$ is in fact a lower bound on the capacity of any additive noise channel with the same noise power: any other noise distribution yields higher capacity.

Example: From Maximum Entropy to the AWGN Capacity Formula

Derive the capacity of the additive white Gaussian noise (AWGN) channel $Y = X + Z$, where $Z \sim \mathcal{N}(0, \sigma^2)$, $\mathbb{E}[X^2] \leq P$, and $X \perp Z$.
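
A compact version of the argument, applying the maximum-entropy theorem to the output $Y$:

$$
\begin{aligned}
I(X;Y) &= h(Y) - h(Y \mid X) = h(Y) - h(Z) \\
&\leq \tfrac{1}{2}\log\bigl(2\pi e\,(P + \sigma^2)\bigr) - \tfrac{1}{2}\log(2\pi e\,\sigma^2) \qquad \bigl(\operatorname{Var}(Y) \leq P + \sigma^2\bigr) \\
&= \tfrac{1}{2}\log\Bigl(1 + \frac{P}{\sigma^2}\Bigr) = \tfrac{1}{2}\log(1 + \text{SNR}).
\end{aligned}
$$

The bound on $h(Y)$ is met with equality by $X \sim \mathcal{N}(0, P)$, which makes $Y \sim \mathcal{N}(0, P + \sigma^2)$, so the maximum over all inputs with $\mathbb{E}[X^2] \leq P$ is $C = \tfrac{1}{2}\log(1 + P/\sigma^2)$.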

Gaussian vs Other Distributions: Entropy Comparison

Compare the differential entropy of several distributions (Gaussian, uniform, triangular, Laplace) all with the same variance $\sigma^2$. The Gaussian always has the highest differential entropy.

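For a quick numerical check, here is a minimal sketch (in Python; the function name and structure are ours) that evaluates the standard closed-form differential entropies of these four distributions, each scaled to the same variance:

```python
import math

def entropies(sigma2: float) -> dict:
    """Closed-form differential entropies (nats) at a common variance sigma2."""
    s = math.sqrt(sigma2)
    return {
        # Gaussian N(0, sigma^2): h = (1/2) ln(2 pi e sigma^2)
        "gaussian": 0.5 * math.log(2 * math.pi * math.e * sigma2),
        # Symmetric triangular on [-a, a], a = s*sqrt(6) (variance a^2/6): h = 1/2 + ln(a)
        "triangular": 0.5 + math.log(s * math.sqrt(6)),
        # Laplace with scale b = s/sqrt(2) (variance 2b^2): h = 1 + ln(2b)
        "laplace": 1.0 + math.log(2 * s / math.sqrt(2)),
        # Uniform on [-a, a], a = s*sqrt(3) (variance a^2/3): h = ln(2a)
        "uniform": math.log(2 * s * math.sqrt(3)),
    }

if __name__ == "__main__":
    for name, h in sorted(entropies(1.0).items(), key=lambda kv: -kv[1]):
        print(f"{name:10s} h = {h:.4f} nats")
    # The Gaussian always prints first (h = 1.4189 nats at unit variance),
    # exactly as the theorem predicts.
```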

Why This Matters: The AWGN Capacity Formula in Wireless Systems

The formula $C = \frac{1}{2}\log(1 + \text{SNR})$ (or $C = \log(1 + \text{SNR})$ for complex channels) is the starting point for all wireless system design. Modern systems like 5G NR and Wi-Fi 7 use adaptive modulation and coding to approach this limit. For MIMO channels with $n_T$ transmit and $n_R$ receive antennas, the capacity becomes $C = \log\det\bigl(\mathbf{I} + \frac{\text{SNR}}{n_T}\mathbf{H}\mathbf{H}^{H}\bigr)$, which decomposes into parallel AWGN channels via the SVD of $\mathbf{H}$, each with its own SNR determined by the corresponding singular value. See Book telecom, Chapters 15-16.
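
A minimal numerical sketch of that decomposition, assuming equal power allocation across transmit antennas and an illustrative random Rayleigh-fading channel draw (the function name is ours):

```python
import numpy as np

def mimo_capacity(H: np.ndarray, snr: float) -> tuple[float, float]:
    """Capacity (bits/channel use) of y = Hx + n with equal power per TX antenna.

    Returns the log-det formula and its SVD decomposition into parallel
    AWGN subchannels; the two are mathematically identical.
    """
    n_r, n_t = H.shape
    # C = log2 det(I + (SNR/n_T) H H^H)
    _, logdet = np.linalg.slogdet(np.eye(n_r) + (snr / n_t) * H @ H.conj().T)
    c_logdet = float(logdet) / np.log(2)
    # Parallel-channel view: each singular value s_i of H gives an AWGN
    # subchannel with effective SNR (SNR/n_T) * s_i^2.
    s = np.linalg.svd(H, compute_uv=False)
    c_parallel = float(np.sum(np.log2(1.0 + (snr / n_t) * s**2)))
    return c_logdet, c_parallel

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Illustrative 4x4 complex channel, unit average power per entry
    H = (rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))) / np.sqrt(2)
    c1, c2 = mimo_capacity(H, snr=10.0)  # SNR = 10 in linear scale (10 dB)
    print(f"log-det: {c1:.4f} bits, parallel AWGN sum: {c2:.4f} bits")  # identical
```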

Key Takeaway

The Gaussian maximizes differential entropy under a variance constraint. This single theorem implies: (1) Gaussian noise is worst-case for communication, (2) Gaussian inputs achieve capacity for Gaussian channels, and (3) the AWGN capacity is $C = \frac{1}{2}\log(1 + \text{SNR})$. Whenever you see $\log(1 + \text{SNR})$, this theorem is at work.

⚠️ Engineering Note

The Gap Between Theory and Practice

The AWGN capacity $\frac{1}{2}\log(1 + \text{SNR})$ is an asymptotic limit, achievable only in the limit of infinite blocklength. Practical systems operate at finite blocklength and incur a coding gap:

  • Turbo codes (1993): within ~0.5 dB of capacity at BER $10^{-5}$
  • LDPC codes (Gallager 1960, rediscovered 1990s): within ~0.1 dB
  • Polar codes (Arıkan 2009): provably achieve capacity, but convergence is slow for short blocklengths

The gap decreases approximately as $\sqrt{V/n}$, where $V$ is the channel dispersion and $n$ is the blocklength (Chapter 26, finite-blocklength theory); a numerical sketch of this scaling follows the list below.

Practical Constraints
  • Finite blocklength incurs a rate penalty proportional to $1/\sqrt{n}$

  • Practical decoders have complexity constraints limiting blocklength

  • Latency requirements upper-bound the blocklength in real-time systems
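
To make the $\sqrt{V/n}$ scaling concrete, here is a small sketch of the normal approximation $R \approx C - \sqrt{V/n}\,Q^{-1}(\epsilon)$ from finite-blocklength theory, using the standard real-AWGN dispersion formula; this is an approximation for illustration, not an exact bound:

```python
import math
from statistics import NormalDist

def awgn_normal_approx(snr: float, n: int, eps: float = 1e-5) -> tuple[float, float]:
    """Normal approximation R ~= C - sqrt(V/n) * Qinv(eps), real AWGN channel.

    C and R are in bits/channel use; V is the channel dispersion in bits^2.
    """
    C = 0.5 * math.log2(1 + snr)
    # Real-AWGN dispersion: V = snr*(snr+2) / (2*(snr+1)^2) * (log2 e)^2
    V = snr * (snr + 2) / (2 * (snr + 1) ** 2) * math.log2(math.e) ** 2
    q_inv = NormalDist().inv_cdf(1 - eps)  # Q^{-1}(eps)
    return C, C - math.sqrt(V / n) * q_inv

if __name__ == "__main__":
    for n in (100, 1000, 10000, 100000):
        C, R = awgn_normal_approx(snr=10.0, n=n)
        # The gap C - R shrinks by ~sqrt(10) per decade of blocklength
        print(f"n={n:6d}  C={C:.4f}  R~={R:.4f}  gap={C - R:.4f} bits")
```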

Quick Check

Among all additive noise distributions $Z$ with variance $\sigma^2$ (independent of the input $X$), which minimizes the channel capacity?

  • Gaussian noise
  • Uniform noise
  • Laplace noise
  • The noise distribution does not affect capacity

AWGN Channel Capacity Derivation

Step-by-step derivation of $C = \frac{1}{2}\log(1 + \text{SNR})$ from the Gaussian maximum entropy theorem. Shows each step from $Y = X + Z$ to the final capacity formula.