Multivariate Differential Entropy

From Scalars to Vectors

In MIMO systems, we transmit and receive vectors: the channel input is $\mathbf{X} \in \mathbb{R}^n$ (or $\mathbb{C}^n$), the noise is a random vector, and the capacity depends on the covariance structure. Understanding differential entropy for random vectors is essential for the Gaussian vector channel (Chapter 10), MIMO capacity (Book telecom, Ch. 15), and the entropy power inequality (Section 2.4).

Definition: Multivariate Differential Entropy

Let $\mathbf{X} = (X_1, \ldots, X_n)^T$ be a continuous random vector with joint PDF $f_{\mathbf{X}}(\mathbf{x})$. The joint differential entropy is

$$h(\mathbf{X}) = -\int_{\mathbb{R}^n} f_{\mathbf{X}}(\mathbf{x}) \log f_{\mathbf{X}}(\mathbf{x}) \, d\mathbf{x}.$$

The conditional differential entropy is $h(\mathbf{X} \mid \mathbf{Y}) = -\int f_{\mathbf{X},\mathbf{Y}}(\mathbf{x},\mathbf{y}) \log f_{\mathbf{X}\mid\mathbf{Y}}(\mathbf{x}\mid\mathbf{y}) \, d\mathbf{x}\,d\mathbf{y}$.

Theorem: Gaussian Vector Maximizes Entropy Under Covariance Constraint

Let $\mathbf{X} \in \mathbb{R}^n$ be a random vector with covariance matrix $\mathbf{K}_X = \mathbb{E}[(\mathbf{X} - \boldsymbol{\mu})(\mathbf{X} - \boldsymbol{\mu})^T]$. Then:

$$h(\mathbf{X}) \leq \frac{1}{2}\log\bigl((2\pi e)^n \det(\mathbf{K}_X)\bigr),$$

with equality if and only if $\mathbf{X} \sim \mathcal{N}(\boldsymbol{\mu}, \mathbf{K}_X)$.

The determinant of the covariance matrix measures the "volume" of the uncertainty ellipsoid. The Gaussian spreads its probability as uniformly as possible over this ellipsoid, maximizing entropy. The factor $(2\pi e)^n$ is the "volume efficiency" of the Gaussian in $n$ dimensions.
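
A minimal numerical sketch of this bound at the Gaussian equality point (assuming NumPy and SciPy are available; the covariance matrix and sample size are arbitrary choices): estimate $h(\mathbf{X}) = -\mathbb{E}[\log f_{\mathbf{X}}(\mathbf{X})]$ by Monte Carlo for a Gaussian vector and compare it with the closed form above, using natural logarithms (nats).

```python
# Sketch: Monte Carlo check of h(X) = (1/2) log((2*pi*e)^n det(K)) for a
# Gaussian vector (natural log => nats). Covariance and sample size are arbitrary.
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)

# An arbitrary 3x3 positive definite covariance matrix.
A = rng.standard_normal((3, 3))
K = A @ A.T + 3 * np.eye(3)
n = K.shape[0]

dist = multivariate_normal(mean=np.zeros(n), cov=K)

# Closed form from the theorem above.
h_closed = 0.5 * np.log((2 * np.pi * np.e) ** n * np.linalg.det(K))

# Monte Carlo: h(X) = -E[log f(X)], averaged over samples drawn from f itself.
samples = dist.rvs(size=200_000, random_state=0)
h_mc = -np.mean(dist.logpdf(samples))

print(f"closed form : {h_closed:.4f} nats")
print(f"Monte Carlo : {h_mc:.4f} nats")  # should agree to ~2 decimals
```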

Example: Entropy of a Bivariate Gaussian

Let $\mathbf{X} = (X_1, X_2)^T \sim \mathcal{N}(\mathbf{0}, \mathbf{K})$ with $\mathbf{K} = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$ where $|\rho| < 1$. Compute $h(\mathbf{X})$, $h(X_1)$, $h(X_2 \mid X_1)$, and $I(X_1; X_2)$.
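
A worked solution, combining the maximum-entropy formula (attained with equality by the Gaussian) with the chain rule $h(X_1, X_2) = h(X_1) + h(X_2 \mid X_1)$ and $\det\mathbf{K} = 1 - \rho^2$:

$$h(\mathbf{X}) = \frac{1}{2}\log\bigl((2\pi e)^2 \det\mathbf{K}\bigr) = \frac{1}{2}\log\bigl((2\pi e)^2 (1 - \rho^2)\bigr), \qquad h(X_1) = h(X_2) = \frac{1}{2}\log(2\pi e),$$

$$h(X_2 \mid X_1) = h(\mathbf{X}) - h(X_1) = \frac{1}{2}\log\bigl(2\pi e\,(1 - \rho^2)\bigr), \qquad I(X_1; X_2) = h(X_2) - h(X_2 \mid X_1) = -\frac{1}{2}\log(1 - \rho^2).$$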

Mutual Information of Correlated Gaussians

Visualize how the mutual information $I(X_1; X_2) = -\frac{1}{2}\log(1 - \rho^2)$ grows as the correlation $\rho$ increases. The joint density contours flatten toward a line as $|\rho| \to 1$, making $X_2$ increasingly predictable from $X_1$.

[Interactive plot. Parameter: correlation coefficient $\rho$ between $X_1$ and $X_2$ (default 0.5).]
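
A small numerical check of the same formula (a sketch assuming NumPy and SciPy; the sample size and $\rho$ values are arbitrary): estimate $I(X_1; X_2) = \mathbb{E}\bigl[\log f_{X_1 X_2}(X_1, X_2) - \log f_{X_1}(X_1) - \log f_{X_2}(X_2)\bigr]$ by Monte Carlo and compare it with $-\frac{1}{2}\log(1 - \rho^2)$ (natural logarithms, so nats).

```python
# Sketch: Monte Carlo estimate of I(X1; X2) for the bivariate Gaussian,
# compared with the closed form -0.5 * log(1 - rho^2) (natural log => nats).
import numpy as np
from scipy.stats import multivariate_normal, norm

for rho in (0.0, 0.5, 0.9, 0.99):
    K = np.array([[1.0, rho], [rho, 1.0]])
    joint = multivariate_normal(mean=[0.0, 0.0], cov=K)

    xy = joint.rvs(size=100_000, random_state=1)
    # I(X1;X2) = E[log f(x1,x2) - log f(x1) - log f(x2)], samples from the joint.
    # The marginals are standard normal because both variances equal 1.
    mi_mc = np.mean(joint.logpdf(xy) - norm.logpdf(xy[:, 0]) - norm.logpdf(xy[:, 1]))
    mi_closed = -0.5 * np.log(1.0 - rho**2)

    print(f"rho={rho:4.2f}   closed={mi_closed:.4f}   Monte Carlo={mi_mc:.4f}")
```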

Theorem: Hadamard's Inequality via Entropy

For any random vector $\mathbf{X} = (X_1, \ldots, X_n)^T$ with covariance matrix $\mathbf{K}_X$:

$$h(\mathbf{X}) \leq \sum_{i=1}^{n} h(X_i),$$

with equality iff $X_1, \ldots, X_n$ are independent. For Gaussian vectors, this implies Hadamard's inequality:

$$\det(\mathbf{K}_X) \leq \prod_{i=1}^{n} K_{ii}.$$

Independence maximizes joint entropy for given marginals. For Gaussian vectors, this translates into the determinant inequality: the product of diagonal entries (variances) upper-bounds the determinant. Equality holds when the covariance matrix is diagonal.
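
A quick numerical illustration (a sketch assuming NumPy; the matrix is randomly generated): build a positive semidefinite matrix and check that its determinant does not exceed the product of its diagonal entries.

```python
# Sketch: numerical check of Hadamard's inequality det(K) <= prod(K_ii)
# for a randomly generated positive semidefinite matrix.
import numpy as np

rng = np.random.default_rng(2)

A = rng.standard_normal((4, 4))
K = A @ A.T                       # positive semidefinite by construction

print("det(K)       =", np.linalg.det(K))
print("prod of K_ii =", np.prod(np.diag(K)))   # always >= det(K)

# Equality case: a diagonal covariance (independent Gaussian components).
D = np.diag(rng.uniform(0.5, 2.0, size=4))
print(np.isclose(np.linalg.det(D), np.prod(np.diag(D))))  # True
```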

Covariance matrix

For a random vector $\mathbf{X}$ with mean $\boldsymbol{\mu}$: $\boldsymbol{\Sigma}_X = \mathbb{E}[(\mathbf{X} - \boldsymbol{\mu})(\mathbf{X} - \boldsymbol{\mu})^T]$. Always positive semidefinite. For Gaussian vectors, the covariance matrix (together with the mean) fully determines the distribution.

Related: Differential entropy

Entropy power

For a random vector $X$ in $\mathbb{R}^n$: $N(X) = \frac{1}{2\pi e}\, 2^{2h(X)/n}$. The entropy power of a Gaussian is its variance. The entropy power inequality states that $N(X+Y) \geq N(X) + N(Y)$ for independent $X, Y$.

Related: Differential entropy
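
A quick numerical check of this convention (a sketch assuming NumPy; the variances are arbitrary): for independent scalar Gaussians the entropy power equals the variance, and the entropy power inequality holds with equality.

```python
# Sketch: with the convention N(X) = 2^(2 h(X)/n) / (2*pi*e) and h in bits,
# the entropy power of a Gaussian equals its variance, and the EPI is tight
# for independent Gaussians: N(X + Y) = N(X) + N(Y).
import numpy as np

def gaussian_h_bits(var):
    """Differential entropy of N(0, var) in bits (n = 1)."""
    return 0.5 * np.log2(2 * np.pi * np.e * var)

def entropy_power(h_bits, n=1):
    return 2 ** (2 * h_bits / n) / (2 * np.pi * np.e)

v1, v2 = 1.5, 0.7
N1 = entropy_power(gaussian_h_bits(v1))          # -> 1.5
N2 = entropy_power(gaussian_h_bits(v2))          # -> 0.7
N_sum = entropy_power(gaussian_h_bits(v1 + v2))  # X + Y ~ N(0, v1 + v2)

print(N1, N2, N_sum)   # N_sum equals N1 + N2
```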

Discrete vs Continuous Information Measures

| Property | Discrete ($H$) | Continuous ($h$) |
|---|---|---|
| Non-negativity | $H(X) \geq 0$ always | $h(X)$ can be negative |
| Maximum entropy | Uniform on $\mathcal{X}$: $\log\lvert\mathcal{X}\rvert$ | Gaussian with variance $\sigma^2$: $\frac{1}{2}\log(2\pi e\sigma^2)$ |
| Coordinate invariance | Yes (depends only on PMF) | No (changes under coordinate transforms) |
| MI well-defined? | Yes, $I(X;Y) \geq 0$ | Yes, $I(X;Y) = h(X) - h(X\mid Y) \geq 0$ |
| Operational meaning | Minimum avg. description length | No direct operational meaning alone |
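
A tiny illustration of the first row (a sketch assuming NumPy; the interval widths and PMF are arbitrary): the differential entropy of a uniform density on $[0, a]$ is $\log a$, which is negative for $a < 1$, while discrete entropy is always nonnegative.

```python
# Sketch: differential entropy can be negative (first table row).
# Uniform on [0, a] has h = log(a) nats, which is negative for a < 1;
# a discrete distribution always has H >= 0.
import numpy as np

for a in (2.0, 1.0, 0.25):
    print(f"Uniform[0, {a}]: h = {np.log(a):+.4f} nats")

p = np.array([0.7, 0.2, 0.1])                  # an arbitrary PMF
print("H =", -(p * np.log2(p)).sum(), "bits")  # nonnegative
```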

Quick Check

For a Gaussian vector $\mathbf{X} \sim \mathcal{N}(\mathbf{0}, \mathbf{K})$, what happens to $h(\mathbf{X})$ when we double $\mathbf{K}$ (i.e., replace $\mathbf{K}$ by $2\mathbf{K}$)?

$h$ increases by $\frac{n}{2}\log 2 = \frac{n}{2}$ bits

$h$ doubles

$h$ increases by $\frac{1}{2}\log 2$ bit

$h$ stays the same