The Entropy Power Inequality
A Deep Inequality with Powerful Consequences
The entropy power inequality (EPI) is one of the deeper results in information theory. It provides a lower bound on the entropy of the sum of independent random variables — and this bound is tight exactly when both variables are Gaussian. The EPI is the key tool for:
- Proving that Gaussian noise is the worst-case noise (converse of the AWGN capacity theorem)
- The converse of the Gaussian broadcast channel capacity
- Bounds in the CEO problem and distributed compression
- Connections to the Brunn-Minkowski inequality in geometry
The proof is non-trivial and relies on Fisher information and the de Bruijn identity. We state the result and sketch the main ideas.
Definition: Entropy Power
The entropy power of a continuous random vector $X \in \mathbb{R}^n$ is
$$N(X) = \frac{1}{2\pi e}\, e^{2h(X)/n}$$
For a Gaussian vector $X \sim \mathcal{N}(0, \sigma^2 I)$:
$$N(X) = \sigma^2$$
The entropy power of a Gaussian is its variance. For non-Gaussian distributions, the entropy power is the variance of the Gaussian that has the same entropy.
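As a quick numerical sanity check of the definition (a minimal sketch; the helper names `entropy_power` and `gaussian_entropy` are illustrative, and all entropies are in nats):

```python
import math

def entropy_power(h, n=1):
    """N(X) = exp(2*h/n) / (2*pi*e) for an n-dimensional vector with
    differential entropy h measured in nats."""
    return math.exp(2.0 * h / n) / (2.0 * math.pi * math.e)

def gaussian_entropy(var):
    """Differential entropy of a 1-D Gaussian with variance var (nats)."""
    return 0.5 * math.log(2.0 * math.pi * math.e * var)

sigma2 = 3.7
N = entropy_power(gaussian_entropy(sigma2))
# For a Gaussian, the entropy power recovers the variance: N == sigma2
```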
Theorem: Entropy Power Inequality (EPI)
Let $X$ and $Y$ be independent continuous random vectors in $\mathbb{R}^n$ with well-defined differential entropies. Then:
$$N(X + Y) \ge N(X) + N(Y)$$
or equivalently:
$$e^{2h(X+Y)/n} \ge e^{2h(X)/n} + e^{2h(Y)/n}$$
Equality holds if and only if $X$ and $Y$ are Gaussian with proportional covariance matrices.
The EPI says that the entropy of a sum is "at least as large" as what you would get from summing two Gaussians with the same individual entropies. Adding independent random variables cannot produce less entropy than adding Gaussians. Intuitively, non-Gaussian distributions have more structure, and adding them creates more entropy than adding the equivalent Gaussians.
The EPI is the information-theoretic analogue of the Brunn-Minkowski inequality in convex geometry: $|A + B|^{1/n} \ge |A|^{1/n} + |B|^{1/n}$, where $|\cdot|$ denotes volume in $\mathbb{R}^n$ and $A + B$ is the Minkowski sum.
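The inequality can be checked in closed form for a simple non-Gaussian example: two i.i.d. Uniform(0,1) variables, whose sum has a triangular density on $[0,2]$ with differential entropy $1/2$ nat. A sketch (entropies in nats):

```python
import math

TWO_PI_E = 2.0 * math.pi * math.e

# X, Y i.i.d. Uniform(0,1): h(X) = h(Y) = 0 nats, so N(X) = N(Y) = 1/(2*pi*e)
N_X = math.exp(2.0 * 0.0) / TWO_PI_E
N_Y = N_X

# X + Y is triangular on [0,2] with h(X+Y) = 1/2 nat, so N(X+Y) = 1/(2*pi)
N_sum = math.exp(2.0 * 0.5) / TWO_PI_E

# EPI holds strictly here, since X and Y are not Gaussian:
# N_sum ~ 0.159 > N_X + N_Y ~ 0.117
```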
Key ingredients (proof sketch)
The proof uses three ingredients:
- Fisher information: $J(X) = \mathbb{E}\!\left[\left(\frac{\partial}{\partial x} \ln f(x)\right)^{2}\right]$.
- De Bruijn's identity: If $Z \sim \mathcal{N}(0, 1)$ is independent of $X$: $\frac{d}{dt}\, h(X + \sqrt{t}\, Z) = \frac{1}{2}\, J(X + \sqrt{t}\, Z)$.
- Fisher information inequality: For independent $X, Y$: $\frac{1}{J(X+Y)} \ge \frac{1}{J(X)} + \frac{1}{J(Y)}$ (reciprocal Fisher information is superadditive).
Combining the ingredients
Consider $X_t = X + \sqrt{t}\, Z_1$ and $Y_t = Y + \sqrt{t}\, Z_2$, where $Z_1, Z_2$ are i.i.d. standard Gaussians independent of $X, Y$. As $t$ increases, both $X_t$ and $Y_t$ become "more Gaussian" (central limit theorem heuristic).

The Fisher information inequality, combined with de Bruijn's identity, shows that the ratio $\frac{N(X_t) + N(Y_t)}{N(X_t + Y_t)}$ is non-decreasing in $t$ and tends to $1$ as $t \to \infty$, where everything is Gaussian. Hence the ratio is at most $1$ at $t = 0$, which is the EPI.
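For Gaussians both ingredients can be verified directly, since $J(\mathcal{N}(0, v)) = 1/v$. A small numerical sketch (helper names and step sizes are illustrative):

```python
import math

def h_gauss(var):
    """Differential entropy of N(0, var) in nats."""
    return 0.5 * math.log(2.0 * math.pi * math.e * var)

# De Bruijn for Gaussian X ~ N(0, sigma2): X + sqrt(t)*Z ~ N(0, sigma2 + t),
# so dh/dt should equal (1/2) * J(X + sqrt(t)*Z) = 1 / (2*(sigma2 + t))
sigma2, t, eps = 2.0, 0.5, 1e-5
dh_dt = (h_gauss(sigma2 + t + eps) - h_gauss(sigma2 + t - eps)) / (2.0 * eps)
J = 1.0 / (sigma2 + t)              # Fisher information of N(0, sigma2 + t)

# Fisher information inequality: tight for independent Gaussians
J_X, J_Y = 1.0 / 2.0, 1.0 / 3.0     # X ~ N(0, 2), Y ~ N(0, 3)
J_XY = 1.0 / (2.0 + 3.0)            # X + Y ~ N(0, 5)
# 1/J_XY == 1/J_X + 1/J_Y  (superadditivity holds with equality)
```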
Example: EPI in the AWGN Capacity Converse
Use the EPI to prove that the capacity of the AWGN channel $Y = X + Z$ with $Z \sim \mathcal{N}(0, N)$ and power constraint $\mathbb{E}[X^2] \le P$ is at most $\frac{1}{2}\log\!\left(1 + \frac{P}{N}\right)$.
Bound via EPI
$N(Y) = N(X + Z) \ge N(X) + N(Z) = N(X) + N$.

Also $N(Y) \le \mathrm{Var}(Y) \le P + N$, with equality iff $Y$ is Gaussian.
Mutual information bound
$I(X; Y) = h(Y) - h(Y \mid X) = h(Y) - h(Z) = \frac{1}{2}\log\frac{N(Y)}{N(Z)} \le \frac{1}{2}\log\frac{P + N}{N}$,

where we used $N(Y) \le \mathrm{Var}(Y) \le P + N$ (entropy power is at most the variance, with equality for Gaussians) and $N(Z) = N$.
Conclude
$C = \max_{\mathbb{E}[X^2] \le P} I(X; Y) \le \frac{1}{2}\log\!\left(1 + \frac{P}{N}\right)$.

The bound is achieved when $X \sim \mathcal{N}(0, P)$, confirming $C = \frac{1}{2}\log\!\left(1 + \frac{P}{N}\right)$.
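The chain of bounds can be traced numerically for the capacity-achieving Gaussian input (a sketch; the values of `P` and `noise_var` are arbitrary, entropies in nats):

```python
import math

P, noise_var = 10.0, 2.0

# EPI lower bound: N(Y) >= N(X) + noise_var for any input X.
# Variance upper bound: N(Y) <= Var(Y) = P + noise_var, tight for X ~ N(0, P).
N_Y = P + noise_var                      # N(Y) = Var(Y) when Y is Gaussian

# I(X;Y) = h(Y) - h(Z) = (1/2) log(N(Y) / N(Z)), with N(Z) = noise_var
I = 0.5 * math.log(N_Y / noise_var)
C = 0.5 * math.log(1.0 + P / noise_var)  # claimed capacity bound
# I == C: the bound is met with equality for the Gaussian input
```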
Historical Note: The Long Road to Proving the EPI
1948–2006. Shannon stated the EPI in his 1948 paper but gave only an incomplete proof. A rigorous proof was first provided by Stam (1959) using Fisher information, later simplified by Blachman (1965). The connection to the Brunn-Minkowski inequality was made explicit by Costa (1985), who also proved the "EPI along a Gaussian channel" — that $N(X + \sqrt{t}\, Z)$ is concave in $t$.
The EPI remains an active research area. Extensions to non-independent random variables, discrete analogues, and connections to optimal transport are all subjects of current research. Verdú and Guo (2006) found an elegant proof of the EPI via the minimum mean-square error (MMSE) — deepening the relationship between estimation theory and information theory.
Common Mistake: EPI Is Stronger Than the Variance Inequality
Mistake:
Assuming that $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$ (for independent $X, Y$) is equivalent to the EPI.
Correction:
Variance additivity is a weaker statement about second moments. The EPI says that entropy powers are superadditive: $N(X + Y) \ge N(X) + N(Y)$. Since $N(X) \le \mathrm{Var}(X)$ with equality only for Gaussians, the chain $N(X) + N(Y) \le N(X + Y) \le \mathrm{Var}(X) + \mathrm{Var}(Y)$ is consistent with variance additivity but carries strictly more information: the EPI captures the entropy structure, not just the second moment.
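A concrete illustration of the gap (a sketch, entropies in nats): the uniform distribution has strictly smaller entropy power than its variance, while a Gaussian of the same variance meets it exactly.

```python
import math

TWO_PI_E = 2.0 * math.pi * math.e

# Uniform(0,1): Var = 1/12, h = 0 nats  =>  N = 1/(2*pi*e) < 1/12
var_u = 1.0 / 12.0
N_u = math.exp(2.0 * 0.0) / TWO_PI_E

# Gaussian with the same variance: entropy power equals the variance
h_g = 0.5 * math.log(TWO_PI_E * var_u)
N_g = math.exp(2.0 * h_g) / TWO_PI_E
# N_u < var_u (strict, uniform is not Gaussian), while N_g == var_u
```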
Fisher information
For a continuous RV $X$ with PDF $f$: $J(X) = \mathbb{E}\!\left[\left(\frac{d}{dx} \ln f(X)\right)^{2}\right] = \int \frac{(f'(x))^2}{f(x)}\, dx$. Governs the rate at which differential entropy increases when Gaussian noise is added (de Bruijn's identity). Satisfies the Cramér-Rao bound: $\mathrm{Var}(X) \ge \frac{1}{J(X)}$.
Related: Entropy power, Differential entropy
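The closed form $J(\mathcal{N}(0, \sigma^2)) = 1/\sigma^2$ can be checked by integrating $(f'(x))^2 / f(x)$ directly. A sketch (the quadrature helper, its bounds, and the point count are illustrative choices):

```python
import math

def gaussian_pdf(x, var):
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def fisher_info_gaussian(var, lo=-30.0, hi=30.0, n=50000):
    """Trapezoid-rule estimate of J(X) = integral of (f'(x))^2 / f(x) dx,
    using the Gaussian score d/dx ln f(x) = -x / var."""
    dx = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        x = lo + i * dx
        w = 0.5 if i in (0, n) else 1.0
        score = -x / var
        total += w * score * score * gaussian_pdf(x, var) * dx
    return total

var = 2.5
J = fisher_info_gaussian(var)
# J is close to 1/var, and Var(X) >= 1/J(X) (Cramér-Rao, tight for Gaussians)
```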
EPI and Non-Gaussian Interference
In practical wireless systems, interference is often non-Gaussian (e.g., aggregate interference from many users, or impulsive noise). The EPI tells us that treating non-Gaussian interference as Gaussian (for the purpose of computing capacity bounds) is pessimistic — the actual capacity with non-Gaussian interference is at least as large as the Gaussian case.
This justifies the common engineering practice of modeling interference as Gaussian when computing capacity limits. The resulting bounds are conservative, which is safe for system design.
- The Gaussian assumption is pessimistic for capacity — actual capacity may be higher
- For error probability analysis, the Gaussian assumption may not be conservative
- For very bursty/impulsive interference, the Gaussian model can be very loose
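The EPI makes the first point quantitative: with a Gaussian input $X \sim \mathcal{N}(0, P)$ and additive noise $Z$, the EPI gives $I(X; X+Z) \ge \frac{1}{2}\log(1 + P/N(Z))$, and $N(Z) < \mathrm{Var}(Z)$ whenever $Z$ is non-Gaussian. A sketch comparing Gaussian noise with uniform noise of the same variance (parameter values are arbitrary, entropies in nats):

```python
import math

TWO_PI_E = 2.0 * math.pi * math.e
P, var_Z = 4.0, 1.0

# Gaussian noise: capacity is exactly (1/2) log(1 + P/var_Z)
C_gauss = 0.5 * math.log(1.0 + P / var_Z)

# Uniform noise on [-a, a] with the same variance: a^2/3 = var_Z
a = math.sqrt(3.0 * var_Z)
h_Z = math.log(2.0 * a)                  # h of Uniform(-a, a), in nats
N_Z = math.exp(2.0 * h_Z) / TWO_PI_E     # entropy power, strictly < var_Z

# EPI-based achievable rate with a Gaussian input already beats C_gauss
C_uniform_lb = 0.5 * math.log(1.0 + P / N_Z)
# C_uniform_lb > C_gauss: Gaussian noise is the worst case
```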