The Multivariate Gaussian Distribution

Why the Multivariate Gaussian Is Central

The multivariate Gaussian is the single most important distribution in engineering. There are at least three reasons. First, the Central Limit Theorem (Chapter 11) guarantees that the sum of many small independent effects is approximately Gaussian — and thermal noise, aggregate interference, and quantization error are all such sums. Second, the Gaussian is the maximum-entropy distribution for a given mean and covariance — so when we know only these two statistics, the Gaussian is the most "conservative" (least committal) model. Third, and most remarkably, the Gaussian family is closed under a rich set of operations: linear transformation, marginalization, and conditioning all produce Gaussian results. This closure makes the entire machinery of LMMSE estimation, Kalman filtering, and MIMO capacity analysis tractable.

Definition:

Multivariate Gaussian Distribution

A random vector $\mathbf{X} \in \mathbb{R}^n$ has the multivariate Gaussian (or normal) distribution with mean $\boldsymbol{\mu}$ and covariance $\boldsymbol{\Sigma} \succ 0$, written $\mathbf{X} \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, if its joint PDF is

$$f_{\mathbf{X}}(\mathbf{x}) = \frac{1}{(2\pi)^{n/2} |\boldsymbol{\Sigma}|^{1/2}} \exp\!\left(-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})\right),$$

where $|\boldsymbol{\Sigma}| = \det(\boldsymbol{\Sigma})$.
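As a quick numerical sanity check, the density formula can be evaluated directly and compared against SciPy's built-in implementation. The mean, covariance, and evaluation point below are illustrative choices, not values from the text:

```python
import numpy as np
from scipy.stats import multivariate_normal

def gaussian_pdf(x, mu, Sigma):
    """Evaluate the multivariate Gaussian density at x via the formula above."""
    n = len(mu)
    diff = x - mu
    # Solve Sigma @ y = diff rather than forming the explicit inverse.
    quad = diff @ np.linalg.solve(Sigma, diff)
    norm_const = (2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm_const

mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
x = np.array([0.5, -1.0])

print(gaussian_pdf(x, mu, Sigma))
print(multivariate_normal(mean=mu, cov=Sigma).pdf(x))  # should agree
```

Solving the linear system instead of inverting $\boldsymbol{\Sigma}$ is both cheaper and numerically safer, which matters once $n$ grows.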

The matrix $\boldsymbol{\Sigma}^{-1}$ is called the precision matrix or information matrix. It appears naturally in the exponent of the Gaussian PDF and plays a central role in graphical models, where a zero entry $(\boldsymbol{\Sigma}^{-1})_{ij} = 0$ indicates conditional independence of $X_i$ and $X_j$ given all other components.
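A small numerical sketch of this property, using an assumed three-variable chain $X_1 - X_2 - X_3$. The covariance values are chosen so that $\operatorname{Cov}(X_1, X_3) = \operatorname{Cov}(X_1, X_2)\operatorname{Cov}(X_2, X_3)$, which makes $X_1$ and $X_3$ conditionally independent given $X_2$:

```python
import numpy as np

# Chain X1 - X2 - X3 with unit variances: Cov(X1,X3) = 0.6 * 0.5 = 0.3,
# so X1 and X3 are conditionally independent given X2.
Sigma = np.array([[1.0, 0.6, 0.3],
                  [0.6, 1.0, 0.5],
                  [0.3, 0.5, 1.0]])

K = np.linalg.inv(Sigma)  # precision (information) matrix
print(np.round(K, 6))
# K[0, 2] is (numerically) zero, even though Cov(X1, X3) = 0.3 != 0:
# marginal correlation and conditional independence are different things.
```

The zero sits in the precision matrix, not the covariance matrix; this distinction is exactly what Gaussian graphical models exploit.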

Precision matrix

The inverse of the covariance matrix, $\boldsymbol{\Sigma}^{-1}$. For a multivariate Gaussian, a zero entry $(\boldsymbol{\Sigma}^{-1})_{ij} = 0$ means $X_i$ and $X_j$ are conditionally independent given all remaining variables.

Related: Covariance matrix

Multivariate Gaussian distribution

The distribution $\mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, fully parameterized by its mean vector and covariance matrix. Uniquely characterized among all distributions by the property that every linear combination of its components is a scalar Gaussian.

Related: Precision matrix, Covariance matrix

The Mahalanobis Distance

The exponent in the Gaussian PDF involves the quadratic form

$$d^2(\mathbf{x}) = (\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}),$$

known as the squared Mahalanobis distance from $\mathbf{x}$ to $\boldsymbol{\mu}$. Contours of constant density are ellipsoids $\{\, \mathbf{x} : d^2(\mathbf{x}) = c \,\}$. The shape and orientation of these ellipsoids are determined by the eigenvectors and eigenvalues of $\boldsymbol{\Sigma}$: the principal axes lie along the eigenvectors, and the half-axis lengths are proportional to $\sqrt{\lambda_i}$.
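This geometry is easy to verify numerically. With an assumed 2-D covariance, a point placed at distance $\sqrt{c\,\lambda_i}$ from the mean along eigenvector $i$ should land exactly on the contour $d^2(\mathbf{x}) = c$:

```python
import numpy as np

# Illustrative 2-D covariance (assumed values).
Sigma = np.array([[2.0, 1.2],
                  [1.2, 1.0]])
mu = np.zeros(2)

def mahalanobis_sq(x, mu, Sigma):
    """Squared Mahalanobis distance from x to mu."""
    diff = x - mu
    return diff @ np.linalg.solve(Sigma, diff)

# Principal axes of the contour d^2 = c: directions are eigenvectors of
# Sigma, half-axis lengths are sqrt(c * lambda_i).
c = 1.0
eigvals, eigvecs = np.linalg.eigh(Sigma)  # ascending eigenvalues
half_axes = np.sqrt(c * eigvals)

# Step along the major axis (largest eigenvalue) by its half-axis length.
x_on_contour = mu + half_axes[1] * eigvecs[:, 1]
print(mahalanobis_sq(x_on_contour, mu, Sigma))  # c = 1.0 up to roundoff
```

The check works because $\boldsymbol{\Sigma}^{-1} \mathbf{v}_i = \lambda_i^{-1} \mathbf{v}_i$, so the quadratic form evaluates to $c\,\lambda_i \cdot \lambda_i^{-1} = c$.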

2D Gaussian Contour Explorer

Explore how the correlation coefficient $\rho$ and the variances $\sigma_1^2, \sigma_2^2$ shape the density contours of a bivariate Gaussian. The ellipses tilt as $\rho$ moves away from zero.


Historical Note: Gauss, Bravais, and the Bivariate Normal

18th–20th century

The univariate Gaussian distribution was introduced by Abraham de Moivre in 1733 and later developed by Gauss in the context of astronomical error analysis (1809). The bivariate extension, with the correlation coefficient as a parameter, was studied systematically by Auguste Bravais (1846) and later by Francis Galton, who used it to model regression. The general nn-dimensional Gaussian was formalized in the early 20th century, but its full power was not appreciated until the development of multivariate statistics by Wishart, Hotelling, and Anderson in the 1930s–1950s.

Example: The Bivariate Gaussian PDF

Let $(X_1, X_2)^T \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma})$ with

$$\boldsymbol{\Sigma} = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}, \quad |\rho| < 1.$$

Write the joint PDF explicitly and identify the level curves.
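A sketch of the computation: since both variances are 1, $\det \boldsymbol{\Sigma} = 1 - \rho^2$, and the inverse follows from the $2 \times 2$ adjugate formula.

```latex
% Inverse of the 2x2 covariance via the adjugate formula:
\boldsymbol{\Sigma}^{-1} = \frac{1}{1-\rho^2}
  \begin{pmatrix} 1 & -\rho \\ -\rho & 1 \end{pmatrix}
% Substituting into the general density with n = 2:
f(x_1, x_2) = \frac{1}{2\pi\sqrt{1-\rho^2}}
  \exp\!\left( -\frac{x_1^2 - 2\rho x_1 x_2 + x_2^2}{2(1-\rho^2)} \right)
```

The level curves are the ellipses $x_1^2 - 2\rho x_1 x_2 + x_2^2 = c$, with principal axes along $(1,1)/\sqrt{2}$ and $(1,-1)/\sqrt{2}$ (the eigenvectors of $\boldsymbol{\Sigma}$, with eigenvalues $1 \pm \rho$); they degenerate to circles as $\rho \to 0$.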

Gaussian Contours as $\rho$ Varies

A Manim animation showing how the density contours of a bivariate Gaussian rotate and stretch as the correlation $\rho$ sweeps from $-0.95$ to $0.95$.
Bivariate $\mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma})$ contours for varying correlation $\rho$

Common Mistake: Singular Covariance Matrix

Mistake:

Assuming the multivariate Gaussian PDF always exists. When $\det(\boldsymbol{\Sigma}) = 0$, the distribution is supported on a lower-dimensional affine subspace and has no density with respect to Lebesgue measure on $\mathbb{R}^n$.

Correction:

If $\operatorname{rank}(\boldsymbol{\Sigma}) = m < n$, then $n - m$ components of $\mathbf{X}$ are deterministic affine functions of the remaining $m$. One can still define the Gaussian via its characteristic function (see Section 6). In practice, this occurs when measurements are linearly dependent, for instance when an antenna array has redundant elements.
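The rank-deficient case can be illustrated with a toy redundant-measurement setup (the specific construction below is assumed): let $X_3 = X_1 + X_2$, so the $3 \times 3$ covariance has rank 2 and no density exists on $\mathbb{R}^3$, yet the vector remains perfectly well defined as a linear image of a lower-dimensional Gaussian.

```python
import numpy as np

rng = np.random.default_rng(0)

# X = A Z with Z standard normal in R^2; the third row makes X3 = X1 + X2,
# a redundant "measurement".
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
Sigma = A @ A.T  # rank-2 covariance, det(Sigma) = 0

print(np.linalg.det(Sigma))          # 0 up to roundoff: no PDF on R^3
print(np.linalg.matrix_rank(Sigma))  # 2

# X is still Gaussian in the characteristic-function sense; every sample
# lies exactly on the plane x3 = x1 + x2.
Z = rng.standard_normal((2, 1000))
X = A @ Z
print(np.max(np.abs(X[2] - X[0] - X[1])))  # exact linear dependence
```

Sampling through the factor $\mathbf{A}$ sidesteps the missing density entirely, which is also how library samplers handle singular covariances internally.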

Key Takeaway

The multivariate Gaussian $\mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ is completely specified by its mean vector and covariance matrix. No distribution with the same first two moments can have more entropy. This is why, in the absence of additional information, the Gaussian is the default model throughout engineering.

Quick Check

A random vector $\mathbf{X} \in \mathbb{R}^5$ has the distribution $\mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$. How many free parameters does this distribution have?

5 (mean) + 25 (covariance) = 30

5 (mean) + 15 (covariance) = 20

5 (mean) + 10 (covariance) = 15

25 (the entire covariance matrix)