Uncorrelated Implies Independent

The Gaussian Miracle

In general, being uncorrelated ($\text{Cov}(X_i, X_j) = 0$) is strictly weaker than being independent. We saw counterexamples in Chapter 7: two random variables can have zero covariance yet remain strongly dependent. The multivariate Gaussian is the grand exception: for Gaussian vectors, uncorrelatedness and independence are equivalent. This is not just a curiosity; it is the structural property that makes Gaussian models so powerful, because it means that decorrelation (a second-order, linear operation) achieves full statistical independence.

Theorem: Uncorrelated Gaussian Components Are Independent

Let $\mathbf{X} \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ with $\boldsymbol{\Sigma} = \operatorname{diag}(\sigma_1^2, \ldots, \sigma_n^2)$ (diagonal covariance). Then $X_1, \ldots, X_n$ are mutually independent, with $X_i \sim \mathcal{N}(\mu_i, \sigma_i^2)$ for each $i$.

More generally, if $\boldsymbol{\Sigma}$ is block-diagonal with blocks corresponding to sub-vectors $\mathbf{X}_1, \ldots, \mathbf{X}_k$, then these sub-vectors are mutually independent.

When $\boldsymbol{\Sigma}$ is diagonal, the quadratic form in the exponent separates: $(\mathbf{x}-\boldsymbol{\mu})^T\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu}) = \sum_i (x_i - \mu_i)^2/\sigma_i^2$. The joint PDF factors into a product of marginal PDFs, which is precisely the definition of independence.
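
Written out, the factorization is a one-line computation: the normalizing constant factors as well, because $\det\boldsymbol{\Sigma} = \prod_i \sigma_i^2$, so

$$
f_{\mathbf{X}}(\mathbf{x})
= \frac{1}{(2\pi)^{n/2}\prod_{i}\sigma_i}
  \exp\!\left(-\frac{1}{2}\sum_{i=1}^{n}\frac{(x_i-\mu_i)^2}{\sigma_i^2}\right)
= \prod_{i=1}^{n}\frac{1}{\sqrt{2\pi}\,\sigma_i}
  \exp\!\left(-\frac{(x_i-\mu_i)^2}{2\sigma_i^2}\right)
= \prod_{i=1}^{n} f_{X_i}(x_i).
$$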

The Converse Fails for Non-Gaussian Distributions

Consider $X \sim \mathcal{N}(0, 1)$ and $Y = X^2$. Then $\text{Cov}(X, Y) = \mathbb{E}[X^3] = 0$ (by symmetry), so $X$ and $Y$ are uncorrelated. But $Y$ is a deterministic function of $X$: they are maximally dependent. The Gaussian is special precisely because its distribution is fully determined by second-order statistics.
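
A quick simulation makes the gap concrete. The following is a minimal NumPy sketch (the sample size and seed are arbitrary choices): the sample covariance of $X$ and $Y = X^2$ is essentially zero, while a higher-order statistic immediately exposes the dependence.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)     # X ~ N(0, 1)
y = x**2                               # Y = X^2: a deterministic function of X

# Second-order statistics see nothing: Cov(X, Y) = E[X^3] = 0 by symmetry.
print("Cov(X, Y)    ≈", np.cov(x, y)[0, 1])
# A higher-order probe sees everything: X^2 and Y are perfectly correlated.
print("Corr(X^2, Y) =", np.corrcoef(x**2, y)[0, 1])
```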

Example: Decorrelation via Eigenrotation

Let $\mathbf{X} \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma})$ with $\boldsymbol{\Sigma} = \begin{pmatrix} 3 & 1 \\ 1 & 3 \end{pmatrix}$. Find an orthogonal transformation $\mathbf{Y} = \mathbf{Q}^T\mathbf{X}$ such that $Y_1$ and $Y_2$ are independent.
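
A sketch of the solution, using the eigendecomposition of $\boldsymbol{\Sigma}$: the eigenvalues of $\boldsymbol{\Sigma}$ are $4$ and $2$, with orthonormal eigenvectors $(1, 1)^T/\sqrt{2}$ and $(1, -1)^T/\sqrt{2}$. Taking

$$
\mathbf{Q} = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}
\qquad\text{gives}\qquad
\operatorname{Cov}(\mathbf{Q}^T\mathbf{X}) = \mathbf{Q}^T\boldsymbol{\Sigma}\,\mathbf{Q}
= \begin{pmatrix} 4 & 0 \\ 0 & 2 \end{pmatrix}.
$$

Since $\mathbf{Y} = \mathbf{Q}^T\mathbf{X}$ is Gaussian with diagonal covariance, the theorem above gives $Y_1 \sim \mathcal{N}(0, 4)$ and $Y_2 \sim \mathcal{N}(0, 2)$, independent.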

Decorrelation = Independence for Gaussians

A Manim animation showing a correlated 2D Gaussian cloud being rotated to its principal axes, where the components become independent. Contrast with a non-Gaussian distribution where the same rotation decorrelates but does not make the components independent.
Eigenrotation decorrelates Gaussian vectors (left), but not all distributions (right)

Key Takeaway

For jointly Gaussian vectors, uncorrelated $\Longleftrightarrow$ independent. This is a uniquely Gaussian property. It means that PCA, whitening, and any other linear decorrelation technique automatically achieve full statistical independence, but only under the Gaussian assumption.
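
To see the caveat numerically, here is a small NumPy sketch (the Laplace inputs and the use of $\operatorname{Cov}(Y_1^2, Y_2^2)$ as a dependence probe are illustrative choices, not from the text). Both clouds share the covariance $\boldsymbol{\Sigma}$ from the example above; rotating each onto its principal axes removes the covariance in both cases, but only the Gaussian cloud ends up with independent components.

```python
import numpy as np

rng = np.random.default_rng(1)
Sigma = np.array([[3.0, 1.0], [1.0, 3.0]])
L = np.linalg.cholesky(Sigma)          # Sigma = L @ L.T

n = 200_000
# Gaussian cloud with covariance Sigma.
x_gauss = L @ rng.standard_normal((2, n))
# Non-Gaussian cloud with the *same* covariance: unit-variance Laplace inputs.
x_laplace = L @ (rng.laplace(size=(2, n)) / np.sqrt(2.0))

def rotate_to_principal_axes(x):
    """Decorrelate by rotating onto the eigenvectors of the sample covariance."""
    _, vecs = np.linalg.eigh(np.cov(x))
    return vecs.T @ x

for name, x in [("Gaussian    ", x_gauss), ("non-Gaussian", x_laplace)]:
    y = rotate_to_principal_axes(x)
    print(name,
          "Cov(Y1, Y2) ≈ %6.3f," % np.cov(y)[0, 1],
          "Cov(Y1^2, Y2^2) ≈ %6.3f" % np.cov(y[0]**2, y[1]**2)[0, 1])
```

The squared-component covariance is only one of many possible dependence probes; the point is that second-order decorrelation leaves higher-order dependence untouched unless the data are Gaussian.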

Uncorrelated vs. Independent

| Property | General distributions | Gaussian |
| --- | --- | --- |
| Independent $\Rightarrow$ uncorrelated | Yes (always) | Yes (always) |
| Uncorrelated $\Rightarrow$ independent | No (counterexample: $X$, $X^2$) | Yes (unique to the Gaussian) |
| Decorrelation technique | Removes linear dependence only | Removes all dependence |
| Sufficient statistics | Mean + covariance are not sufficient | Mean + covariance are sufficient |
| Practical implication | Must check higher-order moments | Second-order analysis is complete |

Common Mistake: Marginally Gaussian Does Not Imply Jointly Gaussian

Mistake:

Assuming that if $X_1$ and $X_2$ are each marginally Gaussian, then $(X_1, X_2)$ is jointly Gaussian.

Correction:

Marginal Gaussianity is necessary but not sufficient for joint Gaussianity. Counterexample: let $X_1 \sim \mathcal{N}(0,1)$, let $Z$ be uniform on $\{+1, -1\}$ and independent of $X_1$, and set $X_2 = Z X_1$. Then $X_2$ is also $\mathcal{N}(0,1)$, but $(X_1, X_2)$ is not jointly Gaussian: conditional on $X_1 = x_1$, $X_2$ takes only the values $\pm x_1$, which is not a Gaussian distribution. Joint Gaussianity requires that every linear combination of the components be Gaussian.
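
A short numerical check of this counterexample (a minimal NumPy sketch; the sample size is arbitrary): the linear combination $X_1 + X_2 = (1 + Z)X_1$ has an atom at zero, so it cannot be Gaussian, even though $X_1$ and $X_2$ are each exactly $\mathcal{N}(0,1)$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000
x1 = rng.standard_normal(n)                  # X1 ~ N(0, 1)
z = rng.choice([-1.0, 1.0], size=n)          # Z uniform on {+1, -1}, independent of X1
x2 = z * x1                                  # X2 is marginally N(0, 1) as well

s = x1 + x2                                  # equals 0 exactly whenever Z = -1
print("P(X1 + X2 = 0) ≈", np.mean(s == 0.0))    # ≈ 0.5: an atom, so the sum is not Gaussian
print("Cov(X1, X2)    ≈", np.cov(x1, x2)[0, 1]) # ≈ 0: uncorrelated, yet clearly dependent
```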

Historical Note: Darmois, Skitovich, and the Uniqueness of the Gaussian

1953

The Darmois–Skitovich theorem (1953) provides a deep converse: if $X_1, \ldots, X_n$ are independent and the linear forms $L_1 = \sum_i a_i X_i$ and $L_2 = \sum_i b_i X_i$ are also independent, then every $X_i$ for which both $a_i \neq 0$ and $b_i \neq 0$ must be Gaussian. In other words, the Gaussian is the only distribution for which independence can survive such linear mixing. This characterization theorem places the Gaussian family in a unique position among all probability distributions.