Multivariate Differential Entropy
From Scalars to Vectors
In MIMO systems, we transmit and receive vectors: the channel input is $\mathbf{X} \in \mathbb{R}^n$ (or $\mathbb{C}^n$), the noise is a random vector, and the capacity depends on the covariance structure. Understanding differential entropy for random vectors is essential for the Gaussian vector channel (Chapter 10), MIMO capacity (Book telecom, Ch. 15), and the entropy power inequality (Section 2.4).
Definition: Multivariate Differential Entropy
Let $\mathbf{X} = (X_1, \ldots, X_n)$ be a continuous random vector with joint PDF $f(\mathbf{x})$. The joint differential entropy is
$$h(\mathbf{X}) = -\int_{\mathbb{R}^n} f(\mathbf{x}) \log f(\mathbf{x}) \, d\mathbf{x}.$$
The conditional differential entropy is $h(\mathbf{X} \mid \mathbf{Y}) = -\mathbb{E}\left[\log f(\mathbf{X} \mid \mathbf{Y})\right]$.
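As a quick numerical companion to the definition, here is a minimal sketch (assuming NumPy and SciPy are available; the covariance matrix below is an arbitrary illustration) showing that the closed-form Gaussian entropy agrees with SciPy's built-in computation:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Closed-form joint differential entropy of N(mu, K) in R^n, in nats:
#   h = 0.5 * log((2*pi*e)^n * det(K))
def gaussian_entropy_nats(K):
    n = K.shape[0]
    return 0.5 * np.log((2 * np.pi * np.e) ** n * np.linalg.det(K))

K = np.array([[2.0, 0.6],
              [0.6, 1.0]])   # illustrative covariance matrix

print(gaussian_entropy_nats(K))
print(multivariate_normal(mean=np.zeros(2), cov=K).entropy())  # same value
```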
Theorem: Gaussian Vector Maximizes Entropy Under Covariance Constraint
Let $\mathbf{X} \in \mathbb{R}^n$ be a random vector with covariance matrix $\mathbf{K}_X$. Then:
$$h(\mathbf{X}) \le \frac{1}{2} \log\left((2\pi e)^n \det \mathbf{K}_X\right),$$
with equality if and only if $\mathbf{X} \sim \mathcal{N}(\boldsymbol{\mu}, \mathbf{K}_X)$.
The determinant of the covariance matrix measures the "volume" of the uncertainty ellipsoid. The Gaussian spreads its probability as uniformly as possible over this ellipsoid, maximizing entropy. The factor $(2\pi e)^{n/2}$ multiplying $\sqrt{\det \mathbf{K}_X}$ in $e^{h(\mathbf{X})}$ is the "volume efficiency" of the Gaussian in $n$ dimensions.
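To see the theorem in numbers, here is a small sketch (assuming NumPy; the dimension and uniform width are arbitrary choices) comparing the exact entropy of a vector of independent uniforms with the Gaussian upper bound for the same covariance:

```python
import numpy as np

# Compare h(uniform vector) with the Gaussian upper bound for the same covariance.
n = 3
a = 1.0                                  # each X_i ~ Uniform(-a, a), independent
var = a**2 / 3                           # per-component variance
K = var * np.eye(n)                      # covariance matrix of the uniform vector

h_uniform = n * np.log(2 * a)            # exact: sum of n scalar uniform entropies (nats)
h_gauss_bound = 0.5 * np.log((2 * np.pi * np.e) ** n * np.linalg.det(K))

print(h_uniform, "<=", h_gauss_bound)    # ~2.08 <= ~2.61, as the theorem requires
```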
Reduce to the scalar case
The proof follows the same KL divergence technique as the scalar case. Let $\phi(\mathbf{x})$ be the PDF of $\mathcal{N}(\boldsymbol{\mu}, \mathbf{K}_X)$:
$$0 \le D(f \,\|\, \phi) = \int f(\mathbf{x}) \log \frac{f(\mathbf{x})}{\phi(\mathbf{x})} \, d\mathbf{x} = -h(\mathbf{X}) - \mathbb{E}_f\left[\log \phi(\mathbf{X})\right].$$
Evaluate the cross-entropy
Since $\log \phi(\mathbf{x})$ is a quadratic form plus a constant, $-\mathbb{E}_f[\log \phi(\mathbf{X})]$ depends on $f$ only through its mean and covariance. The quadratic term averages to $\mathbb{E}[(\mathbf{X}-\boldsymbol{\mu})^T \mathbf{K}_X^{-1} (\mathbf{X}-\boldsymbol{\mu})] = \operatorname{tr}(\mathbf{K}_X^{-1}\mathbf{K}_X) = n$, so $-\mathbb{E}_f[\log \phi(\mathbf{X})] = \frac{1}{2}\log\left((2\pi)^n \det \mathbf{K}_X\right) + \frac{n}{2} = \frac{1}{2}\log\left((2\pi e)^n \det \mathbf{K}_X\right)$.
Conclude
$0 \le D(f \,\|\, \phi) = -h(\mathbf{X}) + \frac{1}{2}\log\left((2\pi e)^n \det \mathbf{K}_X\right)$, giving the result.
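The key identity in the middle step holds for any distribution with the stated mean and covariance, not just the Gaussian. A minimal Monte Carlo sketch (assuming NumPy; the dimension, covariance, and uniform driving noise are arbitrary choices) illustrates this:

```python
import numpy as np

# Monte Carlo check of the proof's key identity:
#   E[(X - mu)^T K^{-1} (X - mu)] = tr(K^{-1} K) = n,
# for ANY distribution with mean mu and covariance K (not just the Gaussian).
rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
K = A @ A.T + n * np.eye(n)              # an arbitrary positive definite covariance

# Non-Gaussian X with covariance K: X = mu + L U, with U iid uniform, unit variance
L = np.linalg.cholesky(K)
mu = rng.standard_normal(n)
U = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(200_000, n))   # mean 0, variance 1
X = mu + U @ L.T

Kinv = np.linalg.inv(K)
quad = np.einsum('ij,jk,ik->i', X - mu, Kinv, X - mu)         # quadratic form per sample
print(quad.mean(), "~", n)               # sample average is close to n
```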
Example: Entropy of a Bivariate Gaussian
Let $(X_1, X_2) \sim \mathcal{N}(\mathbf{0}, \mathbf{K})$ with $\mathbf{K} = \sigma^2 \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$ where $|\rho| < 1$. Compute $h(X_1, X_2)$, $h(X_1)$, $h(X_2 \mid X_1)$, and $I(X_1; X_2)$.
Joint entropy
$\det \mathbf{K} = \sigma^4 (1 - \rho^2)$, so:
$$h(X_1, X_2) = \frac{1}{2}\log\left((2\pi e)^2 \sigma^4 (1-\rho^2)\right).$$
Marginal entropy
$X_1 \sim \mathcal{N}(0, \sigma^2)$, so $h(X_1) = \frac{1}{2}\log(2\pi e \sigma^2)$.
Conditional entropy
$h(X_2 \mid X_1) = h(X_1, X_2) - h(X_1) = \frac{1}{2}\log\left(2\pi e \sigma^2 (1-\rho^2)\right)$.
This equals the differential entropy of $X_2$ given $X_1 = x_1$, which has conditional variance $\sigma^2(1-\rho^2)$.
Mutual information
$I(X_1; X_2) = h(X_2) - h(X_2 \mid X_1) = -\frac{1}{2}\log(1-\rho^2)$. When $\rho = 0$, $I = 0$; as $|\rho| \to 1$, $I \to \infty$ (nearly deterministic relationship).
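A short sketch of the example with concrete numbers (assuming NumPy; the values of $\sigma$ and $\rho$ are illustrative) evaluates all four quantities and checks that they are mutually consistent:

```python
import numpy as np

# Evaluate the bivariate-Gaussian formulas from the example for concrete sigma, rho.
sigma, rho = 1.5, 0.8
K = sigma**2 * np.array([[1.0, rho],
                         [rho, 1.0]])

h_joint = 0.5 * np.log((2 * np.pi * np.e) ** 2 * np.linalg.det(K))   # h(X1, X2)
h_X1 = 0.5 * np.log(2 * np.pi * np.e * sigma**2)                     # h(X1)
h_X2_given_X1 = h_joint - h_X1                                       # chain rule
I = -0.5 * np.log(1 - rho**2)                                        # mutual information

# Consistency check: I(X1;X2) = h(X2) - h(X2|X1); here h(X2) = h(X1) by symmetry
print(np.isclose(I, h_X1 - h_X2_given_X1))   # True
```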
Mutual Information of Correlated Gaussians
Visualize how the mutual information grows as the correlation increases. The joint density contours flatten toward a line as $|\rho| \to 1$, making $X_2$ increasingly predictable from $X_1$.
Theorem: Hadamard's Inequality via Entropy
For any random vector $\mathbf{X} = (X_1, \ldots, X_n)$ with covariance matrix $\mathbf{K}$:
$$h(X_1, \ldots, X_n) \le \sum_{i=1}^{n} h(X_i),$$
with equality iff $X_1, \ldots, X_n$ are independent. For Gaussian vectors, this implies Hadamard's inequality:
$$\det \mathbf{K} \le \prod_{i=1}^{n} K_{ii}.$$
Independence maximizes joint entropy for given marginals. For Gaussian vectors, this translates into the determinant inequality: the product of diagonal entries (variances) upper-bounds the determinant. Equality holds when the covariance matrix is diagonal.
Chain rule and conditioning
$h(X_1, \ldots, X_n) = \sum_{i=1}^{n} h(X_i \mid X_1, \ldots, X_{i-1}) \le \sum_{i=1}^{n} h(X_i)$,
where the inequality uses "conditioning reduces differential entropy."
Gaussian specialization
For $\mathbf{X} \sim \mathcal{N}(\boldsymbol{\mu}, \mathbf{K})$:
$$\frac{1}{2}\log\left((2\pi e)^n \det \mathbf{K}\right) \le \sum_{i=1}^{n} \frac{1}{2}\log\left(2\pi e\, K_{ii}\right),$$
which simplifies to $\det \mathbf{K} \le \prod_{i=1}^{n} K_{ii}$.
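A minimal numerical check of Hadamard's inequality (assuming NumPy; the random matrix below is just an easy way to generate a positive semidefinite covariance) confirms the determinant bound and the diagonal equality case:

```python
import numpy as np

# Check Hadamard's inequality det(K) <= prod_i K_ii on a random PSD matrix.
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
K = A @ A.T                               # positive semidefinite by construction

print(np.linalg.det(K), "<=", np.prod(np.diag(K)))   # inequality holds

# Equality case: diagonal covariance (independent components in the Gaussian case)
D = np.diag(rng.uniform(0.5, 2.0, size=5))
print(np.isclose(np.linalg.det(D), np.prod(np.diag(D))))   # True
```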
Covariance matrix
For a random vector $\mathbf{X}$ with mean $\boldsymbol{\mu}$: $\mathbf{K}_X = \mathbb{E}\left[(\mathbf{X}-\boldsymbol{\mu})(\mathbf{X}-\boldsymbol{\mu})^T\right]$. It is always positive semidefinite. For Gaussian vectors, the covariance matrix (together with the mean) fully determines the distribution.
Related: Differential entropy
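For a hands-on view of this definition, here is a tiny sketch (assuming NumPy; the mean, covariance, and sample size are illustrative) estimating a covariance matrix from samples and checking positive semidefiniteness:

```python
import numpy as np

# Sample covariance of a random vector dataset, and a PSD check via eigenvalues.
rng = np.random.default_rng(3)
X = rng.multivariate_normal(mean=[0.0, 1.0], cov=[[2.0, 0.5], [0.5, 1.0]], size=100_000)

K_hat = np.cov(X, rowvar=False)           # rows are samples, columns are components
print(K_hat)                              # close to the true covariance
print(np.all(np.linalg.eigvalsh(K_hat) >= -1e-12))   # positive semidefinite
```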
Entropy power
For a random vector $\mathbf{X}$ in $\mathbb{R}^n$: $N(\mathbf{X}) = \frac{1}{2\pi e}\, e^{2h(\mathbf{X})/n}$. The entropy power of a scalar Gaussian is its variance; for a Gaussian vector it is $(\det \mathbf{K})^{1/n}$. The entropy power inequality states that $N(\mathbf{X} + \mathbf{Y}) \ge N(\mathbf{X}) + N(\mathbf{Y})$ for independent $\mathbf{X}, \mathbf{Y}$.
Related: Differential entropy
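A brief sketch of the entropy power definition (assuming NumPy; the two variances are illustrative) shows that for independent scalar Gaussians the entropy power inequality holds with equality, since entropy powers reduce to variances:

```python
import numpy as np

# Entropy power N(X) = exp(2 h(X) / n) / (2 pi e); for a scalar Gaussian this is its variance.
def entropy_power(h_nats, n):
    return np.exp(2 * h_nats / n) / (2 * np.pi * np.e)

def gaussian_h(var):
    return 0.5 * np.log(2 * np.pi * np.e * var)

vx, vy = 2.0, 3.0                              # independent scalar Gaussians X, Y
Nx = entropy_power(gaussian_h(vx), 1)          # = vx
Ny = entropy_power(gaussian_h(vy), 1)          # = vy
Nsum = entropy_power(gaussian_h(vx + vy), 1)   # X + Y is Gaussian with variance vx + vy

print(Nsum >= Nx + Ny - 1e-12)                 # EPI holds, with equality for Gaussians
```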
Discrete vs Continuous Information Measures
| Property | Discrete ($H$) | Continuous ($h$) |
|---|---|---|
| Non-negativity | $H(X) \ge 0$ always | $h(X)$ can be negative |
| Maximum entropy | Uniform on $\{1, \ldots, M\}$: $\log M$ | Gaussian with variance $\sigma^2$: $\frac{1}{2}\log(2\pi e \sigma^2)$ |
| Coordinate invariance | Yes (depends only on PMF) | No ($h$ changes under coordinate transforms) |
| MI well-defined? | Yes, $I(X;Y) \ge 0$ | Yes, $I(X;Y) \ge 0$ |
| Operational meaning | Minimum avg. description length | No direct operational meaning alone |
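The sign difference in the first row is easy to see numerically. A minimal sketch (assuming NumPy; the PMF and interval width are arbitrary illustrations) contrasts a discrete entropy, which is never negative, with a differential entropy that is:

```python
import numpy as np

# Discrete entropy is never negative; differential entropy can be.
p = np.array([0.5, 0.25, 0.25])
H = -np.sum(p * np.log2(p))               # discrete entropy in bits: 1.5 >= 0

width = 0.5                               # X ~ Uniform(0, 0.5)
h = np.log2(width)                        # differential entropy in bits: -1.0

print(H, h)
```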
Quick Check
For a Gaussian vector $\mathbf{X} \sim \mathcal{N}(\boldsymbol{\mu}, \mathbf{K})$ in $\mathbb{R}^n$, what happens to $h(\mathbf{X})$ when we double the covariance (i.e., replace $\mathbf{K}$ by $2\mathbf{K}$)?
$h(\mathbf{X})$ increases by $\frac{1}{2}$ bit
$h(\mathbf{X})$ doubles
$h(\mathbf{X})$ increases by $\frac{n}{2}$ bits
$h(\mathbf{X})$ stays the same
$h(\mathbf{X})$ changes by $\frac{n}{2}\log_2 2 = \frac{n}{2}$ bits: since $\det(2\mathbf{K}) = 2^n \det \mathbf{K}$, doubling the covariance in $n$ dimensions adds $\frac{n}{2}$ bits of entropy.
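To verify the answer numerically, here is a small sketch (assuming NumPy; the dimension and covariance are arbitrary) showing that doubling $\mathbf{K}$ adds exactly $n/2$ bits:

```python
import numpy as np

# Check: h(N(0, 2K)) - h(N(0, K)) = (n/2) * log2(2) = n/2 bits.
def gaussian_h_bits(K):
    n = K.shape[0]
    return 0.5 * np.log2((2 * np.pi * np.e) ** n * np.linalg.det(K))

rng = np.random.default_rng(2)
n = 3
A = rng.standard_normal((n, n))
K = A @ A.T + np.eye(n)                   # arbitrary positive definite covariance

print(gaussian_h_bits(2 * K) - gaussian_h_bits(K), "=", n / 2)   # 1.5 bits
```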