The Entropy Power Inequality

A Deep Inequality with Powerful Consequences

The entropy power inequality (EPI) is one of the deeper results in information theory. It provides a lower bound on the entropy of the sum of independent random variables — and this bound is tight exactly when both variables are Gaussian. The EPI is the key tool for:

  • Proving that Gaussian noise is the worst-case noise (converse of the AWGN capacity theorem)
  • The converse of the Gaussian broadcast channel capacity
  • Bounds in the CEO problem and distributed compression
  • Connections to the Brunn-Minkowski inequality in geometry

The proof is non-trivial and relies on Fisher information and the de Bruijn identity. We state the result and sketch the main ideas.

Definition:

Entropy Power

The entropy power of a continuous random vector $\mathbf{X} \in \mathbb{R}^n$ is

$$N(\mathbf{X}) = \frac{1}{2\pi e}\, 2^{2h(\mathbf{X})/n}.$$

For a Gaussian vector $\mathbf{X} \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I}_n)$:

$$N(\mathbf{X}) = \frac{1}{2\pi e} \cdot 2^{2 \cdot \frac{n}{2}\log(2\pi e \sigma^2)/n} = \sigma^2.$$

The entropy power of a Gaussian is its variance. For non-Gaussian distributions, the entropy power is the variance of a Gaussian with the same entropy.
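As a quick numerical illustration (a sketch of my own, not part of the original text; the distributions and parameter values are arbitrary choices), the snippet below evaluates the entropy power from closed-form differential entropies of a few scalar distributions and compares it with the variance:

```python
import numpy as np

# Entropy power of a scalar variable: N(X) = 2^{2 h(X)} / (2*pi*e), with h(X) in bits.
# Sketch using closed-form differential entropies; parameter choices are arbitrary.

def entropy_power(h_bits: float) -> float:
    return 2.0 ** (2.0 * h_bits) / (2.0 * np.pi * np.e)

sigma2 = 2.0   # Gaussian variance
a = 1.0        # Uniform[0, a]
lam = 1.0      # Exponential rate

cases = {
    # name: (differential entropy in bits, variance)
    "Gaussian(0, 2)": (0.5 * np.log2(2 * np.pi * np.e * sigma2), sigma2),
    "Uniform[0, 1]":  (np.log2(a),                               a**2 / 12.0),
    "Exponential(1)": (np.log2(np.e / lam),                      1.0 / lam**2),
}

for name, (h, var) in cases.items():
    print(f"{name:15s}  N = {entropy_power(h):.4f}   Var = {var:.4f}")
```

Running it shows the Gaussian attaining $N = \mathrm{Var}$, while the uniform and exponential fall strictly below their variances, consistent with the Gaussian being the maximum-entropy distribution for a given variance.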

Theorem: Entropy Power Inequality (EPI)

Let $\mathbf{X}, \mathbf{Y} \in \mathbb{R}^n$ be independent continuous random vectors with well-defined differential entropies. Then:

$$N(\mathbf{X} + \mathbf{Y}) \geq N(\mathbf{X}) + N(\mathbf{Y}),$$

or equivalently:

$$2^{2h(\mathbf{X}+\mathbf{Y})/n} \geq 2^{2h(\mathbf{X})/n} + 2^{2h(\mathbf{Y})/n}.$$

Equality holds if and only if $\mathbf{X}$ and $\mathbf{Y}$ are Gaussian with proportional covariance matrices.

The EPI says that the entropy of a sum of independent variables is at least as large as what you would get by summing two Gaussians with the same individual entropies: among all independent pairs with given entropies, Gaussians minimize the entropy of the sum. Intuitively, convolution smooths away the structure of non-Gaussian distributions, so adding non-Gaussian variables creates strictly more entropy than adding the "equivalent" Gaussians.

The EPI is the information-theoretic analogue of the Brunn-Minkowski inequality in convex geometry: $\mathrm{Vol}(A + B)^{1/n} \geq \mathrm{Vol}(A)^{1/n} + \mathrm{Vol}(B)^{1/n}$, where $A + B$ denotes the Minkowski sum.
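To make the superadditivity concrete, here is a small numerical check (an illustrative sketch of mine; the grid resolution and the choice of test densities are arbitrary assumptions). It forms the density of $X+Y$ by discrete convolution, computes differential entropies as Riemann sums, and compares entropy powers:

```python
import numpy as np

# Numerical check of the scalar EPI: N(X+Y) >= N(X) + N(Y) for independent X, Y.
# Sketch only: densities are discretized on a grid, entropies are Riemann sums.

x = np.linspace(-15.0, 15.0, 12001)
dx = x[1] - x[0]

def normal_pdf(x, mean, var):
    return np.exp(-(x - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

# Two (arbitrary) non-Gaussian test densities.
f_X = 0.5 * normal_pdf(x, -2.0, 1.0) + 0.5 * normal_pdf(x, 2.0, 1.0)  # bimodal mixture
f_Y = np.where(np.abs(x) <= 1.0, 0.5, 0.0)                            # Uniform[-1, 1]

def diff_entropy_bits(p):
    mask = p > 1e-300
    return -np.sum(p[mask] * np.log2(p[mask])) * dx

def entropy_power(p):
    return 2.0 ** (2.0 * diff_entropy_bits(p)) / (2.0 * np.pi * np.e)

# Density of X + Y is the convolution of the two densities; the grid is symmetric
# about 0 and has odd length, so mode="same" keeps the result on the same grid.
f_sum = np.convolve(f_X, f_Y, mode="same") * dx

print(f"N(X+Y)      = {entropy_power(f_sum):.4f}")
print(f"N(X) + N(Y) = {entropy_power(f_X) + entropy_power(f_Y):.4f}")
```

With these non-Gaussian choices the inequality is strict, as the equality condition of the theorem predicts.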


Example: EPI in the AWGN Capacity Converse

Use the EPI to prove that the capacity of the AWGN channel $Y = X + Z$ with $Z \sim \mathcal{N}(0, \sigma^2)$ and power constraint $\mathbb{E}[X^2] \leq P$ is at most $\frac{1}{2}\log(1 + P/\sigma^2)$.
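A sketch of how the argument can be organized (single-letter form only; the reduction from $n$-letter codes is omitted). The upper bound itself uses the maximum-entropy property of the Gaussian, which is the statement $N(Y) \leq \mathrm{Var}(Y)$:

$$\begin{aligned}
C &= \max_{\mathbb{E}[X^2]\le P} I(X;Y) = \max_{\mathbb{E}[X^2]\le P} \bigl[h(Y) - h(Z)\bigr], \qquad h(Z) = \tfrac{1}{2}\log\bigl(2\pi e \sigma^2\bigr),\\
h(Y) &\le \tfrac{1}{2}\log\bigl(2\pi e\,\mathrm{Var}(Y)\bigr) \le \tfrac{1}{2}\log\bigl(2\pi e (P + \sigma^2)\bigr)
\quad\text{(equivalently } N(Y) \le \mathrm{Var}(Y) \le P + \sigma^2\text{)},\\
C &\le \tfrac{1}{2}\log\bigl(2\pi e (P+\sigma^2)\bigr) - \tfrac{1}{2}\log\bigl(2\pi e \sigma^2\bigr) = \tfrac{1}{2}\log\Bigl(1 + \frac{P}{\sigma^2}\Bigr).
\end{aligned}$$

The EPI enters when the noise is allowed to be non-Gaussian with the same variance: it gives $h(X+Z) \geq \tfrac{1}{2}\log\bigl(2^{2h(X)} + 2^{2h(Z)}\bigr)$, so choosing $X \sim \mathcal{N}(0,P)$ and using $2^{2h(Z)} \leq 2\pi e\sigma^2$ yields $I(X;X+Z) \geq \tfrac{1}{2}\log(1 + P/\sigma^2)$, i.e., Gaussian noise is the worst-case noise for a given variance.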

Historical Note: The Long Road to Proving the EPI

1948-2006

Shannon stated the EPI in his 1948 paper but gave only an incomplete proof. A rigorous proof was first provided by Stam (1959) using Fisher information, later simplified by Blachman (1965). The connection to the Brunn-Minkowski inequality was made explicit by Costa and Cover (1984), and Costa (1985) proved the "EPI along a Gaussian channel": that $N(X + \sqrt{t}\,Z)$ is concave in $t$.

The EPI remains an active research area. Extensions to non-independent random variables, discrete analogues, and connections to optimal transport are all subjects of current research. Verdú and Guo (2006) found an elegant connection between the EPI and the minimum mean-square error (MMSE) in Gaussian channels, deepening the relationship between estimation theory and information theory.

Common Mistake: EPI Is Stronger Than the Variance Inequality

Mistake:

Assuming that $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$ (for independent $X, Y$) is equivalent to the EPI.

Correction:

Variance additivity $\mathrm{Var}(X+Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$ is a statement about second moments only, and it holds with equality for every pair of independent variables. The EPI is a different and stronger kind of statement: entropy powers are superadditive, $N(X+Y) \geq N(X) + N(Y)$, with equality only in the Gaussian case. Since $N(X) \leq \mathrm{Var}(X)$, with equality exactly for Gaussians, the EPI cannot be deduced from variance additivity; it constrains the entire entropy of the sum, not just its second moment.
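A quick worked check of the distinction, using standard closed-form entropies (the specific example is mine, not from the text; entropies are in nats, so $N(X) = e^{2h(X)}/(2\pi e)$, and $\gamma \approx 0.5772$ is the Euler-Mascheroni constant). Take $X, Y$ i.i.d. $\mathrm{Exp}(1)$, so that $X+Y \sim \mathrm{Gamma}(2,1)$:

$$\begin{aligned}
h(X) &= 1 \text{ nat}, & N(X) &= \frac{e^{2}}{2\pi e} = \frac{e}{2\pi} \approx 0.433, & \mathrm{Var}(X) &= 1,\\
h(X+Y) &= 1 + \gamma \approx 1.577 \text{ nats}, & N(X+Y) &= \frac{e^{2(1+\gamma)}}{2\pi e} \approx 1.37, & \mathrm{Var}(X+Y) &= 2.
\end{aligned}$$

Here $N(X+Y) \approx 1.37$ strictly exceeds $N(X) + N(Y) \approx 0.87$, since the inputs are not Gaussian, while each entropy power sits below the corresponding variance: the EPI and variance additivity are genuinely different statements.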

Fisher information

For a continuous RV $X$ with PDF $f$: $J(X) = \mathbb{E}\bigl[(f'(X)/f(X))^2\bigr] = \int (f'(x))^2/f(x)\,dx$. Governs the rate at which differential entropy increases when Gaussian noise is added (de Bruijn's identity). Satisfies the Cramér-Rao bound: $\mathrm{Var}(\hat{\theta}) \geq 1/J(\theta)$.

Related: Entropy power, Differential entropy
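A small numerical companion (a sketch; the grid, the bimodal test density, and the step sizes are arbitrary choices of mine): it evaluates $J$ by numerical differentiation and checks de Bruijn's identity, $\frac{d}{dt}\,h\bigl(X + \sqrt{t}\,Z\bigr) = \tfrac{1}{2} J\bigl(X + \sqrt{t}\,Z\bigr)$ with $Z \sim \mathcal{N}(0,1)$ and entropy in nats, against a finite difference:

```python
import numpy as np

# Numerical sanity check of Fisher information and de Bruijn's identity.
# Sketch only: densities live on a grid, derivatives are finite differences.

x = np.linspace(-12.0, 12.0, 8001)
dx = x[1] - x[0]

def normal_pdf(x, var):
    return np.exp(-x**2 / (2 * var)) / np.sqrt(2 * np.pi * var)

# Non-Gaussian test density: equal mixture of two unit-variance Gaussians.
f = 0.5 * normal_pdf(x - 2.0, 1.0) + 0.5 * normal_pdf(x + 2.0, 1.0)

def diff_entropy_nats(p):
    mask = p > 1e-300
    return -np.sum(p[mask] * np.log(p[mask])) * dx

def fisher_info(p):
    """J = integral of (f'(x))^2 / f(x) dx, via a central-difference derivative."""
    dp = np.gradient(p, dx)
    mask = p > 1e-12
    return np.sum(dp[mask] ** 2 / p[mask]) * dx

def smoothed(p, t):
    """Density of X + sqrt(t)*Z for Z ~ N(0,1): convolve p with a N(0, t) kernel."""
    return np.convolve(p, normal_pdf(x, t), mode="same") * dx

# de Bruijn: d/dt h(X + sqrt(t) Z) = (1/2) J(X + sqrt(t) Z), entropy in nats.
t, eps = 0.5, 1e-3
lhs = (diff_entropy_nats(smoothed(f, t + eps))
       - diff_entropy_nats(smoothed(f, t - eps))) / (2 * eps)
rhs = 0.5 * fisher_info(smoothed(f, t))
print(f"d/dt h ≈ {lhs:.4f}    (1/2) J ≈ {rhs:.4f}")
```

The two printed numbers should agree to a few decimal places, illustrating the de Bruijn identity that drives Stam's Fisher-information proof of the EPI.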

🔧Engineering Note

EPI and Non-Gaussian Interference

In practical wireless systems, interference is often non-Gaussian (e.g., aggregate interference from many users, or impulsive noise). The EPI tells us that treating non-Gaussian interference as Gaussian (for the purpose of computing capacity bounds) is pessimistic — the actual capacity with non-Gaussian interference is at least as large as the Gaussian case.

This justifies the common engineering practice of modeling interference as Gaussian when computing capacity limits. The resulting bounds are conservative, which is safe for system design.

Practical Constraints
  • The Gaussian assumption is pessimistic for capacity — actual capacity may be higher

  • For error probability analysis, the Gaussian assumption may not be conservative

  • For very bursty/impulsive interference, the Gaussian model can be very loose

Entropy Power Inequality Visualization

Visualizes the superadditivity of entropy powers, $N(X+Y) \geq N(X) + N(Y)$ for independent $X, Y$, as a bar comparison, and notes that equality holds only for Gaussian distributions.