The Multivariate Gaussian Distribution

Why the Multivariate Gaussian Is Central

The multivariate Gaussian is the single most important distribution in engineering. There are at least three reasons. First, the Central Limit Theorem (Chapter 11) guarantees that the sum of many small independent effects is approximately Gaussian — and thermal noise, aggregate interference, and quantization error are all such sums. Second, the Gaussian is the maximum-entropy distribution for a given mean and covariance — so when we know only these two statistics, the Gaussian is the most "conservative" (least committal) model. Third, and most remarkably, the Gaussian family is closed under a rich set of operations: linear transformation, marginalization, and conditioning all produce Gaussian results. This closure makes the entire machinery of LMMSE estimation, Kalman filtering, and MIMO capacity analysis tractable.

Definition:

Multivariate Gaussian Distribution

A random vector $\mathbf{X} \in \mathbb{R}^n$ has the multivariate Gaussian (or normal) distribution with mean $\boldsymbol{\mu}$ and covariance $\boldsymbol{\Sigma} \succ 0$, written $\mathbf{X} \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, if its joint PDF is

$$f_{\mathbf{X}}(\mathbf{x}) = \frac{1}{(2\pi)^{n/2} |\boldsymbol{\Sigma}|^{1/2}} \exp\!\left(-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})\right),$$

where $|\boldsymbol{\Sigma}| = \det(\boldsymbol{\Sigma})$.
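As a quick numerical sanity check, the density formula can be evaluated directly and compared against SciPy's built-in implementation. The mean, covariance, and evaluation point below are illustrative choices, not values from the text:

```python
import numpy as np
from scipy.stats import multivariate_normal

def gaussian_pdf(x, mu, Sigma):
    """Evaluate the multivariate Gaussian density at x via the formula above."""
    n = len(mu)
    diff = x - mu
    # Solve Sigma @ y = diff rather than forming the explicit inverse.
    quad = diff @ np.linalg.solve(Sigma, diff)
    norm_const = (2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm_const

mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
x = np.array([0.5, -1.0])

print(gaussian_pdf(x, mu, Sigma))
print(multivariate_normal(mean=mu, cov=Sigma).pdf(x))  # should agree
```

Solving the linear system instead of inverting $\boldsymbol{\Sigma}$ is both cheaper and numerically safer, which matters once $n$ grows.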

The matrix $\boldsymbol{\Sigma}^{-1}$ is called the precision matrix or information matrix. It appears naturally in the exponent of the Gaussian PDF and plays a central role in graphical models, where a zero entry $(\boldsymbol{\Sigma}^{-1})_{ij} = 0$ indicates conditional independence of $X_i$ and $X_j$ given all other components.
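A small numerical sketch of this property, using an assumed three-variable chain $X_1 - X_2 - X_3$. The covariance values are chosen so that $\operatorname{Cov}(X_1, X_3) = \operatorname{Cov}(X_1, X_2)\operatorname{Cov}(X_2, X_3)$, which makes $X_1$ and $X_3$ conditionally independent given $X_2$:

```python
import numpy as np

# Chain X1 - X2 - X3 with unit variances: Cov(X1,X3) = 0.6 * 0.5 = 0.3,
# so X1 and X3 are conditionally independent given X2.
Sigma = np.array([[1.0, 0.6, 0.3],
                  [0.6, 1.0, 0.5],
                  [0.3, 0.5, 1.0]])

K = np.linalg.inv(Sigma)  # precision (information) matrix
print(np.round(K, 6))
# K[0, 2] is (numerically) zero, even though Cov(X1, X3) = 0.3 != 0:
# marginal correlation and conditional independence are different things.
```

The zero sits in the precision matrix, not the covariance matrix; this distinction is exactly what Gaussian graphical models exploit.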

Precision matrix

The inverse of the covariance matrix, $\boldsymbol{\Sigma}^{-1}$. For a multivariate Gaussian, a zero entry $(\boldsymbol{\Sigma}^{-1})_{ij} = 0$ means $X_i$ and $X_j$ are conditionally independent given all remaining variables.

Related: Covariance matrix

Multivariate Gaussian distribution

The distribution $\mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, fully parameterized by its mean vector and covariance matrix. Uniquely characterized among all distributions by the property that every linear combination of its components is a scalar Gaussian.

Related: Precision matrix, Covariance matrix

The Mahalanobis Distance

The exponent in the Gaussian PDF involves the quadratic form

$$d^2(\mathbf{x}) = (\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}),$$

known as the squared Mahalanobis distance from $\mathbf{x}$ to $\boldsymbol{\mu}$. Contours of constant density are ellipsoids $\{\, \mathbf{x} : d^2(\mathbf{x}) = c \,\}$. The shape and orientation of these ellipsoids are determined by the eigenvectors and eigenvalues of $\boldsymbol{\Sigma}$: the principal axes lie along the eigenvectors, and the half-axis lengths are proportional to $\sqrt{\lambda_i}$.
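This geometry is easy to verify numerically. With an assumed 2-D covariance, a point placed at distance $\sqrt{c\,\lambda_i}$ from the mean along eigenvector $i$ should land exactly on the contour $d^2(\mathbf{x}) = c$:

```python
import numpy as np

# Illustrative 2-D covariance (assumed values).
Sigma = np.array([[2.0, 1.2],
                  [1.2, 1.0]])
mu = np.zeros(2)

def mahalanobis_sq(x, mu, Sigma):
    """Squared Mahalanobis distance from x to mu."""
    diff = x - mu
    return diff @ np.linalg.solve(Sigma, diff)

# Principal axes of the contour d^2 = c: directions are eigenvectors of
# Sigma, half-axis lengths are sqrt(c * lambda_i).
c = 1.0
eigvals, eigvecs = np.linalg.eigh(Sigma)  # ascending eigenvalues
half_axes = np.sqrt(c * eigvals)

# Step along the major axis (largest eigenvalue) by its half-axis length.
x_on_contour = mu + half_axes[1] * eigvecs[:, 1]
print(mahalanobis_sq(x_on_contour, mu, Sigma))  # c = 1.0 up to roundoff
```

The check works because $\boldsymbol{\Sigma}^{-1} \mathbf{v}_i = \lambda_i^{-1} \mathbf{v}_i$, so the quadratic form evaluates to $c\,\lambda_i \cdot \lambda_i^{-1} = c$.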

2D Gaussian Contour Explorer

Explore how the correlation coefficient $\rho$ and the variances $\sigma_1^2, \sigma_2^2$ shape the density contours of a bivariate Gaussian. The ellipses tilt as $\rho$ moves away from zero.


Historical Note: Gauss, Bravais, and the Bivariate Normal

18th–20th century

The univariate Gaussian distribution was introduced by Abraham de Moivre in 1733 and later developed by Gauss in the context of astronomical error analysis (1809). The bivariate extension, with the correlation coefficient as a parameter, was studied systematically by Auguste Bravais (1846) and later by Francis Galton, who used it to model regression. The general nn-dimensional Gaussian was formalized in the early 20th century, but its full power was not appreciated until the development of multivariate statistics by Wishart, Hotelling, and Anderson in the 1930s–1950s.

Example: The Bivariate Gaussian PDF

Let $(X_1, X_2)^T \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma})$ with

$$\boldsymbol{\Sigma} = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}, \quad |\rho| < 1.$$

Write the joint PDF explicitly and identify the level curves.
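A sketch of the computation: since both variances are 1, $\det \boldsymbol{\Sigma} = 1 - \rho^2$, and the inverse follows from the $2 \times 2$ adjugate formula.

```latex
% Inverse of the 2x2 covariance via the adjugate formula:
\boldsymbol{\Sigma}^{-1} = \frac{1}{1-\rho^2}
  \begin{pmatrix} 1 & -\rho \\ -\rho & 1 \end{pmatrix}
% Substituting into the general density with n = 2:
f(x_1, x_2) = \frac{1}{2\pi\sqrt{1-\rho^2}}
  \exp\!\left( -\frac{x_1^2 - 2\rho x_1 x_2 + x_2^2}{2(1-\rho^2)} \right)
```

The level curves are the ellipses $x_1^2 - 2\rho x_1 x_2 + x_2^2 = c$, with principal axes along $(1,1)/\sqrt{2}$ and $(1,-1)/\sqrt{2}$ (the eigenvectors of $\boldsymbol{\Sigma}$, with eigenvalues $1 \pm \rho$); they degenerate to circles as $\rho \to 0$.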

Gaussian Contours as $\rho$ Varies

A Manim animation showing how the density contours of a bivariate Gaussian rotate and stretch as the correlation $\rho$ sweeps from $-0.95$ to $0.95$.
Bivariate $\mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma})$ contours for varying correlation $\rho$

Common Mistake: Singular Covariance Matrix

Mistake:

Assuming the multivariate Gaussian PDF always exists. When $\det(\boldsymbol{\Sigma}) = 0$, the distribution is supported on a lower-dimensional affine subspace and has no density with respect to Lebesgue measure on $\mathbb{R}^n$.

Correction:

If $\operatorname{rank}(\boldsymbol{\Sigma}) = m < n$, then $n - m$ components of $\mathbf{X}$ are deterministic affine functions of the remaining $m$. One can still define the Gaussian via its characteristic function (see Section 6). In practice, this occurs when measurements are linearly dependent, for instance when an antenna array has redundant elements.
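The rank-deficient case can be illustrated with a toy redundant-measurement setup (the specific construction below is assumed): let $X_3 = X_1 + X_2$, so the $3 \times 3$ covariance has rank 2 and no density exists on $\mathbb{R}^3$, yet the vector remains perfectly well defined as a linear image of a lower-dimensional Gaussian.

```python
import numpy as np

rng = np.random.default_rng(0)

# X = A Z with Z standard normal in R^2; the third row makes X3 = X1 + X2,
# a redundant "measurement".
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
Sigma = A @ A.T  # rank-2 covariance, det(Sigma) = 0

print(np.linalg.det(Sigma))          # 0 up to roundoff: no PDF on R^3
print(np.linalg.matrix_rank(Sigma))  # 2

# X is still Gaussian in the characteristic-function sense; every sample
# lies exactly on the plane x3 = x1 + x2.
Z = rng.standard_normal((2, 1000))
X = A @ Z
print(np.max(np.abs(X[2] - X[0] - X[1])))  # exact linear dependence
```

Sampling through the factor $\mathbf{A}$ sidesteps the missing density entirely, which is also how library samplers handle singular covariances internally.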

Key Takeaway

The multivariate Gaussian $\mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ is completely specified by its mean vector and covariance matrix. No distribution with the same first two moments can have more entropy. This is why, in the absence of additional information, the Gaussian is the default model throughout engineering.

Quick Check

A random vector $\mathbf{X} \in \mathbb{R}^5$ has the distribution $\mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$. How many free parameters does this distribution have?

5 (mean) + 25 (covariance) = 30

5 (mean) + 15 (covariance) = 20

5 (mean) + 10 (covariance) = 15

25 (the entire covariance matrix)