Marginals and Conditionals

Extracting Structure from the Joint

One of the most powerful features of the Gaussian family is that both marginals and conditionals remain Gaussian, with parameters given by simple formulas involving the block structure of $\boldsymbol{\Sigma}$. The conditional mean formula $\boldsymbol{\mu}_{1|2} = \boldsymbol{\mu}_1 + \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}(\mathbf{x}_2 - \boldsymbol{\mu}_2)$ is the foundation of linear minimum mean-square error (LMMSE) estimation — and for Gaussians, the LMMSE estimator is the MMSE estimator.

Theorem: Marginals of a Gaussian Are Gaussian

Let $\mathbf{X} \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ with the partition

$$\mathbf{X} = \begin{pmatrix} \mathbf{X}_1 \\ \mathbf{X}_2 \end{pmatrix}, \quad \boldsymbol{\mu} = \begin{pmatrix} \boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2 \end{pmatrix}, \quad \boldsymbol{\Sigma} = \begin{pmatrix} \boldsymbol{\Sigma}_{11} & \boldsymbol{\Sigma}_{12} \\ \boldsymbol{\Sigma}_{21} & \boldsymbol{\Sigma}_{22} \end{pmatrix}.$$

Then the marginal distribution of $\mathbf{X}_1$ is

$$\mathbf{X}_1 \sim \mathcal{N}(\boldsymbol{\mu}_1, \boldsymbol{\Sigma}_{11}).$$

Marginalizing (integrating out $\mathbf{X}_2$) simply "reads off" the relevant block of the mean and covariance. No Schur complement or matrix inversion is needed for marginals — only for conditionals.
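To make the "read off" concrete, here is a minimal NumPy sketch (the joint mean and covariance are made-up illustrative numbers): the marginal parameters of $\mathbf{X}_1$ are just slices of $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$, which we confirm against the empirical mean and covariance of sampled $\mathbf{X}_1$ coordinates.

```python
import numpy as np

# Illustrative joint parameters for X = (X1, X2) with n1 = 2, n2 = 1.
mu = np.array([1.0, -1.0, 2.0])
Sigma = np.array([[4.0, 1.0, 0.5],
                  [1.0, 3.0, 1.5],
                  [0.5, 1.5, 2.0]])

# Marginal of X1: read off the top-left block -- no inversion needed.
mu1 = mu[:2]
Sigma11 = Sigma[:2, :2]

# Sanity check by Monte Carlo: sample the joint, keep only the X1 coordinates.
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(mu, Sigma, size=200_000)
print(np.allclose(samples[:, :2].mean(axis=0), mu1, atol=0.05))   # True
print(np.allclose(np.cov(samples[:, :2].T), Sigma11, atol=0.1))   # True
```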

Theorem: Conditional Distribution of a Gaussian Vector

With the same partition as above and $\boldsymbol{\Sigma}_{22} \succ 0$, the conditional distribution of $\mathbf{X}_1$ given $\mathbf{X}_2 = \mathbf{x}_2$ is

$$\mathbf{X}_1 \mid \mathbf{X}_2 = \mathbf{x}_2 \;\sim\; \mathcal{N}\!\left(\boldsymbol{\mu}_{1|2},\; \boldsymbol{\Sigma}_{1|2}\right),$$

where

$$\boldsymbol{\mu}_{1|2} = \boldsymbol{\mu}_1 + \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}(\mathbf{x}_2 - \boldsymbol{\mu}_2),$$

$$\boldsymbol{\Sigma}_{1|2} = \boldsymbol{\Sigma}_{11} - \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21}.$$

The matrix $\boldsymbol{\Sigma}_{1|2}$ is the Schur complement of $\boldsymbol{\Sigma}_{22}$ in $\boldsymbol{\Sigma}$.

Two remarkable properties: (1) the conditional mean is an affine function of $\mathbf{x}_2$, and (2) the conditional covariance does not depend on $\mathbf{x}_2$. Property (1) means the MMSE estimator of $\mathbf{X}_1$ given $\mathbf{X}_2$ is linear — the general MMSE problem reduces to LMMSE in the Gaussian case. Property (2) means the estimation uncertainty is the same regardless of the observed value.
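The two formulas translate directly into code. Below is a minimal NumPy sketch (the helper name and example numbers are illustrative, not from the text) that computes $\boldsymbol{\mu}_{1|2}$ and $\boldsymbol{\Sigma}_{1|2}$ from the blocks, assuming $\mathbf{X}_1$ is the first $n_1$ coordinates:

```python
import numpy as np

def gaussian_conditional(mu, Sigma, n1, x2):
    """Parameters of X1 | X2 = x2 for X ~ N(mu, Sigma), X1 = first n1 coords."""
    mu1, mu2 = mu[:n1], mu[n1:]
    S11, S12 = Sigma[:n1, :n1], Sigma[:n1, n1:]
    S21, S22 = Sigma[n1:, :n1], Sigma[n1:, n1:]
    # Solve S22 @ Z = S21 instead of forming the inverse of S22 explicitly;
    # by symmetry, S12 @ inv(S22) equals Z.T.
    Z = np.linalg.solve(S22, S21)
    mu_cond = mu1 + Z.T @ (x2 - mu2)   # affine in x2
    Sigma_cond = S11 - S12 @ Z         # Schur complement; independent of x2
    return mu_cond, Sigma_cond

# Bivariate example with rho = 0.7, observing X2 = 1.5:
mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.7], [0.7, 1.0]])
m, S = gaussian_conditional(mu, Sigma, n1=1, x2=np.array([1.5]))
print(m, S)   # mean rho*x2 = 1.05, variance 1 - rho^2 = 0.51
```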

Schur complement

For a block matrix $\begin{pmatrix} \mathbf{A} & \mathbf{B} \\ \mathbf{C} & \mathbf{D} \end{pmatrix}$ with $\mathbf{D}$ invertible, the Schur complement of $\mathbf{D}$ is $\mathbf{A} - \mathbf{B}\mathbf{D}^{-1}\mathbf{C}$. It governs the conditional covariance in the Gaussian conditional distribution.


Example: Conditional Distribution for the Bivariate Gaussian

Let $(X_1, X_2)^T \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma})$ with $\boldsymbol{\Sigma} = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$. Find the conditional distribution of $X_1$ given $X_2 = x_2$.
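Solution: apply the conditional theorem with the scalar blocks $\Sigma_{11} = \Sigma_{22} = 1$ and $\Sigma_{12} = \Sigma_{21} = \rho$:

$$\mu_{1|2} = 0 + \rho \cdot 1^{-1}(x_2 - 0) = \rho x_2, \qquad \Sigma_{1|2} = 1 - \rho \cdot 1^{-1} \cdot \rho = 1 - \rho^2,$$

so $X_1 \mid X_2 = x_2 \sim \mathcal{N}(\rho x_2,\, 1 - \rho^2)$. The conditional mean moves linearly with the observation, while the conditional variance $1 - \rho^2$ is constant in $x_2$ and shrinks toward zero as $|\rho| \to 1$.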

Conditional Distribution Visualizer

Visualize the conditional PDF $f_{X_1|X_2}(x_1|x_2)$ for a bivariate Gaussian. Drag the slider to change the observed value $x_2$ and watch the conditional density shift while its width (determined by $1-\rho^2$) stays constant.


Common Mistake: The Conditional Variance Does Not Depend on the Observation

Mistake:

Expecting the conditional variance $\boldsymbol{\Sigma}_{1|2}$ to change depending on the specific observed value $\mathbf{x}_2$.

Correction:

For Gaussian vectors, the conditional covariance $\boldsymbol{\Sigma}_{1|2} = \boldsymbol{\Sigma}_{11} - \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21}$ depends only on the covariance structure, not on the realized observation. This is a special property of the Gaussian — for most other distributions, the conditional variance does depend on the conditioning value.
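This property can be checked empirically. A minimal Monte Carlo sketch (using an assumed $\rho = 0.7$ and two arbitrary conditioning values): condition approximately by keeping samples with $X_2$ in a narrow band, and compare the spread of $X_1$ across the two bands.

```python
import numpy as np

# Bivariate Gaussian with rho = 0.7; the conditional variance of X1 given
# X2 should be 1 - rho^2 = 0.51 regardless of the observed value of X2.
rng = np.random.default_rng(1)
rho = 0.7
xy = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=1_000_000)

def cond_var(x2, width=0.05):
    """Empirical variance of X1 among samples with X2 in a band around x2."""
    sel = np.abs(xy[:, 1] - x2) < width
    return xy[sel, 0].var()

# Both are close to 0.51 despite very different conditioning values.
print(cond_var(-1.0), cond_var(1.5))
```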

⚠️ Engineering Note

Computing the Schur Complement Efficiently

In practice, computing $\boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21}$ should never involve forming $\boldsymbol{\Sigma}_{22}^{-1}$ explicitly. Instead, solve the linear system $\boldsymbol{\Sigma}_{22}\mathbf{Z} = \boldsymbol{\Sigma}_{21}$ (e.g., via Cholesky factorization of $\boldsymbol{\Sigma}_{22}$) and then compute $\boldsymbol{\Sigma}_{12}\mathbf{Z}$. Both routes cost $O(n_2^3)$, but the factor-and-solve approach has a smaller constant and, more importantly, better numerical stability than multiplying by an explicitly formed inverse.
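A sketch of the factor-and-solve route using SciPy's Cholesky helpers (the random SPD covariance below is illustrative), checked against the textbook formula with an explicit inverse:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

rng = np.random.default_rng(2)
n1, n2 = 3, 4
A = rng.standard_normal((n1 + n2, n1 + n2))
Sigma = A @ A.T + (n1 + n2) * np.eye(n1 + n2)   # random SPD covariance
S11, S12 = Sigma[:n1, :n1], Sigma[:n1, n1:]
S21, S22 = Sigma[n1:, :n1], Sigma[n1:, n1:]

# Cholesky-based solve: never form the inverse of S22 explicitly.
c, low = cho_factor(S22)
Z = cho_solve((c, low), S21)      # Z solves S22 @ Z = S21
schur = S11 - S12 @ Z             # conditional covariance Sigma_{1|2}

# Agrees with the explicit-inverse formula.
print(np.allclose(schur, S11 - S12 @ np.linalg.inv(S22) @ S21))  # True
```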

Quick Check

For $(X_1, X_2)^T \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma})$ with $\boldsymbol{\Sigma} = \begin{pmatrix} 4 & 2 \\ 2 & 1 \end{pmatrix}$, what is $\mathbb{E}[X_1 \mid X_2 = 3]$?

6

3

1.5

2