The Gaussian Maximizes Entropy
The Most Important Fact for Gaussian Channels
We now prove what is arguably the single most important result for Gaussian channel analysis: among all distributions with a given variance $\sigma^2$, the Gaussian uniquely maximizes differential entropy. This has far-reaching consequences:
- It implies that Gaussian noise is the worst-case additive noise for communication — it maximizes uncertainty for a given power.
- It leads directly to the AWGN capacity formula $C = \frac{1}{2}\log\left(1 + \frac{P}{N}\right)$.
- It shows that Gaussian inputs are optimal for Gaussian channels.
Every time you see $\log(1 + \mathrm{SNR})$ in a capacity expression, this theorem is working behind the scenes.
Theorem: Gaussian Maximizes Differential Entropy Under Variance Constraint
Let $X$ be any continuous random variable with mean $\mu$ and variance $\sigma^2$. Then:

$$h(X) \leq \frac{1}{2}\log(2\pi e \sigma^2),$$

with equality if and only if $X \sim \mathcal{N}(\mu, \sigma^2)$.
The Gaussian is the "most random" distribution you can have for a given power (variance). Any other distribution with the same variance is more structured and therefore has lower entropy. Intuitively, the Gaussian spreads its probability mass as smoothly as possible subject to the power constraint.
Write $h(X)$ in terms of a KL divergence
Let $f$ be the PDF of $X$ and let $\phi$ be the PDF of $\mathcal{N}(\mu, \sigma^2)$. Both have mean $\mu$ and variance $\sigma^2$. We compute:

$$0 \leq D(f \| \phi) = \int f \log \frac{f}{\phi}\, dx = -h(X) - \int f \log \phi\, dx.$$
Evaluate the cross-entropy term
Since $\log \phi(x) = -\frac{1}{2}\log(2\pi\sigma^2) - \frac{(x - \mu)^2}{2\sigma^2}$:

$$-\int f(x) \log \phi(x)\, dx = \frac{1}{2}\log(2\pi\sigma^2) + \frac{\mathbb{E}_f[(X - \mu)^2]}{2\sigma^2} = \frac{1}{2}\log(2\pi\sigma^2) + \frac{1}{2} = \frac{1}{2}\log(2\pi e \sigma^2).$$

This equals $h(\phi)$: the cross-entropy depends on $f$ only through the mean and variance, which match those of $\phi$ by assumption.
Conclude
Combining, $D(f \| \phi) = \frac{1}{2}\log(2\pi e \sigma^2) - h(X) \geq 0$, so $h(X) \leq \frac{1}{2}\log(2\pi e \sigma^2)$, with equality if and only if $D(f \| \phi) = 0$, i.e. $f = \phi$, i.e. $X \sim \mathcal{N}(\mu, \sigma^2)$.
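Both steps of the proof can be checked numerically. A minimal sketch (assuming NumPy and SciPy; the particular distributions and the variance value are illustrative choices): the Monte Carlo cross-entropy $-\mathbb{E}_f[\log \phi]$ should match $\frac{1}{2}\log(2\pi e \sigma^2)$ for every zero-mean law with variance $\sigma^2$, while $h(X)$ attains that value only for the Gaussian.

```python
import numpy as np
from scipy import stats

sigma2 = 2.0                                            # common variance
bound = 0.5 * np.log(2 * np.pi * np.e * sigma2)         # (1/2) log(2 pi e sigma^2)
phi = stats.norm(loc=0.0, scale=np.sqrt(sigma2))        # matching Gaussian density

# Non-Gaussian competitors, reparameterized to have variance sigma2:
# Laplace(scale=b) has variance 2*b^2; Uniform on [-a, a] has variance a^2/3.
dists = {
    "gaussian": phi,
    "laplace":  stats.laplace(scale=np.sqrt(sigma2 / 2)),
    "uniform":  stats.uniform(loc=-np.sqrt(3 * sigma2), scale=2 * np.sqrt(3 * sigma2)),
}

rng = np.random.default_rng(0)
for name, d in dists.items():
    x = d.rvs(size=1_000_000, random_state=rng)
    cross = -np.mean(phi.logpdf(x))                     # estimate of -E_f[log phi]
    print(f"{name:9s} h(X) = {float(d.entropy()):.4f}  "
          f"-E[log phi] = {cross:.4f}  bound = {bound:.4f}")
```

All three cross-entropy columns agree with the bound, and only the Gaussian's $h(X)$ reaches it.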
Gaussian Noise Is the Worst-Case Noise
Consider an additive noise channel $Y = X + Z$, where $Z$ has variance $N$ and is independent of $X$. The mutual information is:

$$I(X;Y) = h(Y) - h(Y \mid X) = h(Y) - h(Z).$$

For fixed noise power $N$, $h(Z) \leq \frac{1}{2}\log(2\pi e N)$, with equality for Gaussian $Z$. So Gaussian noise maximizes the noise entropy term, which, for a given output entropy $h(Y)$, minimizes the mutual information. The Gaussian is the hardest noise to communicate through.
Consequently, the AWGN capacity $\frac{1}{2}\log\left(1 + \frac{P}{N}\right)$ is a lower bound on the capacity of any additive noise channel with the same noise power: any other noise distribution yields higher capacity.
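This is easy to probe numerically in one dimension, since $h(Y)$ can be computed by convolving densities on a grid. A sketch under stated assumptions (NumPy/SciPy; the input is fixed to $X \sim \mathcal{N}(0, P)$, which need not be optimal for non-Gaussian noise, so the non-Gaussian rows are lower bounds on those channels' capacities):

```python
import numpy as np
from scipy import stats

P, N = 1.0, 1.0                         # signal and noise power
dx = 0.01
x = np.arange(-12, 12, dx)              # grid wide enough for all tails

def diff_entropy(pdf):
    """Differential entropy (nats) of a density sampled on the grid."""
    p = pdf / (pdf.sum() * dx)          # renormalize against grid truncation
    mask = p > 0
    return -np.sum(p[mask] * np.log(p[mask])) * dx

f_X = stats.norm(0, np.sqrt(P)).pdf(x)
noises = {
    "gaussian": stats.norm(0, np.sqrt(N)).pdf(x),
    "uniform":  stats.uniform(-np.sqrt(3 * N), 2 * np.sqrt(3 * N)).pdf(x),
    "laplace":  stats.laplace(0, np.sqrt(N / 2)).pdf(x),
}
for name, f_Z in noises.items():
    f_Y = np.convolve(f_X, f_Z, mode="same") * dx     # density of Y = X + Z
    I = diff_entropy(f_Y) - diff_entropy(f_Z)         # I(X;Y) = h(Y) - h(Z)
    print(f"{name:9s} I(X;Y) = {I:.4f} nats  (AWGN value: {0.5 * np.log(1 + P / N):.4f})")
```

The Gaussian row reproduces $\frac{1}{2}\log(1 + P/N)$; the uniform and Laplace rows come out strictly larger.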
Example: From Maximum Entropy to the AWGN Capacity Formula
Derive the capacity of the additive white Gaussian noise (AWGN) channel $Y = X + Z$, where $Z \sim \mathcal{N}(0, N)$ is independent of $X$ and the input satisfies the power constraint $\mathbb{E}[X^2] \leq P$.
Mutual information
$I(X;Y) = h(Y) - h(Y \mid X) = h(Y) - h(Z) = h(Y) - \frac{1}{2}\log(2\pi e N)$.
Bound on $h(Y)$
Since $X$ and $Z$ are independent, $\operatorname{Var}(Y) = \operatorname{Var}(X) + \operatorname{Var}(Z) \leq P + N$.
By the maximum entropy theorem, $h(Y) \leq \frac{1}{2}\log\big(2\pi e (P + N)\big)$, with equality when $Y$ is Gaussian, which happens when $X$ is Gaussian.
Capacity
$$C = \max_{\mathbb{E}[X^2] \leq P} I(X;Y) = \frac{1}{2}\log\big(2\pi e (P + N)\big) - \frac{1}{2}\log(2\pi e N) = \frac{1}{2}\log\left(1 + \frac{P}{N}\right),$$
achieved by $X \sim \mathcal{N}(0, P)$.
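The closed form is simple enough to tabulate directly. A minimal helper (assuming NumPy; the function name and the SNR grid are illustrative choices, not from the text):

```python
import numpy as np

def awgn_capacity(P, N, base=2):
    """Capacity of the real AWGN channel: (1/2) log(1 + P/N)."""
    return 0.5 * np.log1p(P / N) / np.log(base)

# Capacity in bits per channel use at a few SNRs (noise power normalized to 1)
for snr_db in [0, 10, 20, 30]:
    snr = 10 ** (snr_db / 10)
    print(f"SNR = {snr_db:2d} dB  ->  C = {awgn_capacity(snr, 1.0):.3f} bits/use")
```

Note the factor $\frac{1}{2}$: this is the real-channel formula; complex baseband channels drop it.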
Gaussian vs Other Distributions: Entropy Comparison
Compare the differential entropy of several distributions (Gaussian, uniform, triangular, Laplace), all with the same variance $\sigma^2$. The Gaussian always has the highest differential entropy.
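The comparison does not require simulation; each family's differential entropy has a closed form once its shape parameter is matched to the target variance. A minimal sketch (assuming NumPy; the formulas are the standard ones, e.g. a uniform on $[-a, a]$ has $h = \log 2a$ with $a^2/3 = \sigma^2$):

```python
# Closed-form differential entropies (nats) at a common variance sigma2.
import numpy as np

sigma2 = 1.0
entropies = {
    "gaussian":   0.5 * np.log(2 * np.pi * np.e * sigma2),
    "triangular": 0.5 + 0.5 * np.log(6 * sigma2),   # symmetric on [-a, a], a^2/6 = sigma2
    "laplace":    1.0 + 0.5 * np.log(2 * sigma2),   # scale b, 2*b^2 = sigma2
    "uniform":    0.5 * np.log(12 * sigma2),        # support [-a, a], a^2/3 = sigma2
}
for name, h in sorted(entropies.items(), key=lambda kv: -kv[1]):
    print(f"{name:10s} h = {h:.4f} nats")
```

The printed ordering is Gaussian > triangular > Laplace > uniform, consistent with the theorem.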
Why This Matters: The AWGN Capacity Formula in Wireless Systems
The formula $C = \frac{1}{2}\log_2(1 + \mathrm{SNR})$ (or $\log_2(1 + \mathrm{SNR})$ for complex channels) is the starting point for all wireless system design. Modern systems like 5G NR and Wi-Fi 7 use adaptive modulation and coding to approach this limit. For MIMO channels with $N_t$ transmit and $N_r$ receive antennas, the capacity (with equal power allocation and unit noise power) becomes $C = \log_2 \det\!\left(I_{N_r} + \frac{P}{N_t} H H^\dagger\right)$, which decomposes into parallel AWGN channels via the SVD of $H$, each with its own SNR determined by the corresponding singular value. See Chapters 15-16.
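The log-det/SVD equivalence is a two-line identity to verify numerically. A sketch under assumptions not fixed by the text (equal power split across the $N_t$ transmit antennas, unit noise power, one random i.i.d. complex Gaussian channel draw):

```python
import numpy as np

rng = np.random.default_rng(1)
Nt, Nr, P = 4, 4, 10.0
H = (rng.standard_normal((Nr, Nt)) + 1j * rng.standard_normal((Nr, Nt))) / np.sqrt(2)

# log-det form: C = log2 det(I + (P/Nt) H H^H)
C_logdet = np.log2(np.linalg.det(np.eye(Nr) + (P / Nt) * H @ H.conj().T)).real

# SVD form: a sum of parallel AWGN capacities, one per singular value of H
s = np.linalg.svd(H, compute_uv=False)
C_svd = np.sum(np.log2(1 + (P / Nt) * s**2))

print(f"log-det: {C_logdet:.4f}   SVD sum: {C_svd:.4f}   bits/use")
```

Both numbers agree, because $\det(I + c\,HH^\dagger) = \prod_i (1 + c\,\sigma_i^2)$, where the $\sigma_i$ here are the singular values of $H$.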
Key Takeaway
The Gaussian maximizes differential entropy under a variance constraint. This single theorem implies: (1) Gaussian noise is worst-case for communication, (2) Gaussian inputs achieve capacity for Gaussian channels, and (3) the AWGN capacity is $C = \frac{1}{2}\log\left(1 + \frac{P}{N}\right)$. Whenever you see $\log(1 + \mathrm{SNR})$, this theorem is at work.
The Gap Between Theory and Practice
The AWGN capacity is an asymptotic limit — achievable only in the limit of infinite blocklength. Practical systems operate at finite blocklength and incur a coding gap:
- Turbo codes (1993): within ~0.5 dB of capacity at BER $10^{-5}$
- LDPC codes (Gallager 1960, rediscovered 1990s): within ~0.1 dB
- Polar codes (Arıkan 2009): provably achieve capacity, but convergence is slow for short blocklengths
The gap decreases approximately as $\sqrt{V/n}$, where $V$ is the channel dispersion and $n$ is the blocklength (Chapter 26, finite-blocklength theory); a numerical sketch follows the list below.
- Finite blocklength incurs a rate penalty proportional to $1/\sqrt{n}$
- Practical decoders have complexity constraints limiting blocklength
- Latency requirements upper-bound the blocklength in real-time systems
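A rough numerical sketch of the normal approximation (assuming NumPy/SciPy; the AWGN dispersion formula $V = \frac{\mathrm{SNR}(\mathrm{SNR}+2)}{2(\mathrm{SNR}+1)^2}\log_2^2 e$ is quoted from the finite-blocklength literature, not from this section, and should be treated as an assumption here):

```python
import numpy as np
from scipy.stats import norm

snr, eps = 10.0, 1e-3                                  # SNR (linear) and block error rate
C = 0.5 * np.log2(1 + snr)                             # capacity, bits/use
V = snr * (snr + 2) / (2 * (snr + 1) ** 2) * np.log2(np.e) ** 2   # dispersion, bits^2/use

for n in [100, 1_000, 10_000, 100_000]:
    R = C - np.sqrt(V / n) * norm.isf(eps)             # Q^{-1}(eps) = norm.isf(eps)
    print(f"n = {n:6d}:  R ~ {R:.3f} bits/use  (C = {C:.3f})")
```

The penalty term shrinks like $1/\sqrt{n}$: going from $n = 100$ to $n = 10{,}000$ recovers most of the gap to capacity.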
Quick Check
Among all additive noise distributions with variance $N$ (independent of the input $X$), which minimizes the channel capacity?
Gaussian noise
Uniform noise
Laplace noise
The noise distribution does not affect capacity
Gaussian noise maximizes $h(Z)$ for a given variance, which reduces $I(X;Y)$ in the decomposition $I(X;Y) = h(Y) - h(Z)$. The story is more subtle than that single term, however: Gaussian noise minimizes capacity because it maximizes the noise entropy and because it forces the optimal input to be Gaussian as well. The full argument uses the EPI (Section 2.4).