Marginals and Conditionals

Extracting Structure from the Joint

One of the most powerful features of the Gaussian family is that both marginals and conditionals remain Gaussian, with parameters given by simple formulas involving the block structure of $\boldsymbol{\Sigma}$. The conditional mean formula $\boldsymbol{\mu}_{1|2} = \boldsymbol{\mu}_1 + \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}(\mathbf{x}_2 - \boldsymbol{\mu}_2)$ is the foundation of linear minimum mean-square error (LMMSE) estimation — and for Gaussians, the LMMSE estimator is the MMSE estimator.

Theorem: Marginals of a Gaussian Are Gaussian

Let $\mathbf{X} \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ with the partition

$$\mathbf{X} = \begin{pmatrix} \mathbf{X}_1 \\ \mathbf{X}_2 \end{pmatrix}, \quad \boldsymbol{\mu} = \begin{pmatrix} \boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2 \end{pmatrix}, \quad \boldsymbol{\Sigma} = \begin{pmatrix} \boldsymbol{\Sigma}_{11} & \boldsymbol{\Sigma}_{12} \\ \boldsymbol{\Sigma}_{21} & \boldsymbol{\Sigma}_{22} \end{pmatrix}.$$

Then the marginal distribution of $\mathbf{X}_1$ is

$$\mathbf{X}_1 \sim \mathcal{N}(\boldsymbol{\mu}_1, \boldsymbol{\Sigma}_{11}).$$

Marginalizing (integrating out $\mathbf{X}_2$) simply "reads off" the relevant block of the mean and covariance. No Schur complement or matrix inversion is needed for marginals — only for conditionals.
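To make the "read off" concrete, here is a minimal NumPy sketch (the joint mean and covariance are made-up illustrative numbers): the marginal parameters of $\mathbf{X}_1$ are just slices of $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$, which we confirm against the empirical mean and covariance of sampled $\mathbf{X}_1$ coordinates.

```python
import numpy as np

# Illustrative joint parameters for X = (X1, X2) with n1 = 2, n2 = 1.
mu = np.array([1.0, -1.0, 2.0])
Sigma = np.array([[4.0, 1.0, 0.5],
                  [1.0, 3.0, 1.5],
                  [0.5, 1.5, 2.0]])

# Marginal of X1: read off the top-left block -- no inversion needed.
mu1 = mu[:2]
Sigma11 = Sigma[:2, :2]

# Sanity check by Monte Carlo: sample the joint, keep only the X1 coordinates.
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(mu, Sigma, size=200_000)
print(np.allclose(samples[:, :2].mean(axis=0), mu1, atol=0.05))   # True
print(np.allclose(np.cov(samples[:, :2].T), Sigma11, atol=0.1))   # True
```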

Theorem: Conditional Distribution of a Gaussian Vector

With the same partition as above and $\boldsymbol{\Sigma}_{22} \succ 0$, the conditional distribution of $\mathbf{X}_1$ given $\mathbf{X}_2 = \mathbf{x}_2$ is

$$\mathbf{X}_1 \mid \mathbf{X}_2 = \mathbf{x}_2 \;\sim\; \mathcal{N}\!\left(\boldsymbol{\mu}_{1|2},\; \boldsymbol{\Sigma}_{1|2}\right),$$

where

$$\boldsymbol{\mu}_{1|2} = \boldsymbol{\mu}_1 + \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}(\mathbf{x}_2 - \boldsymbol{\mu}_2),$$

$$\boldsymbol{\Sigma}_{1|2} = \boldsymbol{\Sigma}_{11} - \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21}.$$

The matrix $\boldsymbol{\Sigma}_{1|2}$ is the Schur complement of $\boldsymbol{\Sigma}_{22}$ in $\boldsymbol{\Sigma}$.

Two remarkable properties: (1) the conditional mean is an affine function of $\mathbf{x}_2$, and (2) the conditional covariance does not depend on $\mathbf{x}_2$. Property (1) means the MMSE estimator of $\mathbf{X}_1$ given $\mathbf{X}_2$ is linear — the general MMSE problem reduces to LMMSE in the Gaussian case. Property (2) means the estimation uncertainty is the same regardless of the observed value.
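The two formulas translate directly into code. Below is a minimal NumPy sketch (the helper name and example numbers are illustrative, not from the text) that computes $\boldsymbol{\mu}_{1|2}$ and $\boldsymbol{\Sigma}_{1|2}$ from the blocks, assuming $\mathbf{X}_1$ is the first $n_1$ coordinates:

```python
import numpy as np

def gaussian_conditional(mu, Sigma, n1, x2):
    """Parameters of X1 | X2 = x2 for X ~ N(mu, Sigma), X1 = first n1 coords."""
    mu1, mu2 = mu[:n1], mu[n1:]
    S11, S12 = Sigma[:n1, :n1], Sigma[:n1, n1:]
    S21, S22 = Sigma[n1:, :n1], Sigma[n1:, n1:]
    # Solve S22 @ Z = S21 instead of forming the inverse of S22 explicitly;
    # by symmetry, S12 @ inv(S22) equals Z.T.
    Z = np.linalg.solve(S22, S21)
    mu_cond = mu1 + Z.T @ (x2 - mu2)   # affine in x2
    Sigma_cond = S11 - S12 @ Z         # Schur complement; independent of x2
    return mu_cond, Sigma_cond

# Bivariate example with rho = 0.7, observing X2 = 1.5:
mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.7], [0.7, 1.0]])
m, S = gaussian_conditional(mu, Sigma, n1=1, x2=np.array([1.5]))
print(m, S)   # mean rho*x2 = 1.05, variance 1 - rho^2 = 0.51
```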

Schur complement

For a block matrix $\begin{pmatrix} \mathbf{A} & \mathbf{B} \\ \mathbf{C} & \mathbf{D} \end{pmatrix}$ with $\mathbf{D}$ invertible, the Schur complement of $\mathbf{D}$ is $\mathbf{A} - \mathbf{B}\mathbf{D}^{-1}\mathbf{C}$. It governs the conditional covariance in the Gaussian conditional distribution.


Example: Conditional Distribution for the Bivariate Gaussian

Let $(X_1, X_2)^T \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma})$ with $\boldsymbol{\Sigma} = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$. Find the conditional distribution of $X_1$ given $X_2 = x_2$.
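Solution: apply the conditional theorem with the scalar blocks $\Sigma_{11} = \Sigma_{22} = 1$ and $\Sigma_{12} = \Sigma_{21} = \rho$:

$$\mu_{1|2} = 0 + \rho \cdot 1^{-1}(x_2 - 0) = \rho x_2, \qquad \Sigma_{1|2} = 1 - \rho \cdot 1^{-1} \cdot \rho = 1 - \rho^2,$$

so $X_1 \mid X_2 = x_2 \sim \mathcal{N}(\rho x_2,\, 1 - \rho^2)$. The conditional mean moves linearly with the observation, while the conditional variance $1 - \rho^2$ is constant in $x_2$ and shrinks toward zero as $|\rho| \to 1$.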

Conditional Distribution Visualizer

Visualize the conditional PDF $f_{X_1|X_2}(x_1|x_2)$ for a bivariate Gaussian. Drag the slider to change the observed value $x_2$ and watch the conditional density shift while its width (determined by $1-\rho^2$) stays constant.


Common Mistake: The Conditional Variance Does Not Depend on the Observation

Mistake:

Expecting the conditional variance $\boldsymbol{\Sigma}_{1|2}$ to change depending on the specific observed value $\mathbf{x}_2$.

Correction:

For Gaussian vectors, the conditional covariance $\boldsymbol{\Sigma}_{1|2} = \boldsymbol{\Sigma}_{11} - \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21}$ depends only on the covariance structure, not on the realized observation. This is a special property of the Gaussian — for most other distributions, the conditional variance does depend on the conditioning value.
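This property can be checked empirically. A minimal Monte Carlo sketch (using an assumed $\rho = 0.7$ and two arbitrary conditioning values): condition approximately by keeping samples with $X_2$ in a narrow band, and compare the spread of $X_1$ across the two bands.

```python
import numpy as np

# Bivariate Gaussian with rho = 0.7; the conditional variance of X1 given
# X2 should be 1 - rho^2 = 0.51 regardless of the observed value of X2.
rng = np.random.default_rng(1)
rho = 0.7
xy = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=1_000_000)

def cond_var(x2, width=0.05):
    """Empirical variance of X1 among samples with X2 in a band around x2."""
    sel = np.abs(xy[:, 1] - x2) < width
    return xy[sel, 0].var()

# Both are close to 0.51 despite very different conditioning values.
print(cond_var(-1.0), cond_var(1.5))
```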

⚠️ Engineering Note

Computing the Schur Complement Efficiently

In practice, computing $\boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21}$ should never involve forming $\boldsymbol{\Sigma}_{22}^{-1}$ explicitly. Instead, solve the linear system $\boldsymbol{\Sigma}_{22}\mathbf{Z} = \boldsymbol{\Sigma}_{21}$ (e.g., via Cholesky factorization of $\boldsymbol{\Sigma}_{22}$) and then compute $\boldsymbol{\Sigma}_{12}\mathbf{Z}$. Both routes cost $O(n_2^3)$, but the factor-and-solve approach has a smaller constant and, more importantly, better numerical stability than multiplying by an explicitly formed inverse.
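A sketch of the factor-and-solve route using SciPy's Cholesky helpers (the random SPD covariance below is illustrative), checked against the textbook formula with an explicit inverse:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

rng = np.random.default_rng(2)
n1, n2 = 3, 4
A = rng.standard_normal((n1 + n2, n1 + n2))
Sigma = A @ A.T + (n1 + n2) * np.eye(n1 + n2)   # random SPD covariance
S11, S12 = Sigma[:n1, :n1], Sigma[:n1, n1:]
S21, S22 = Sigma[n1:, :n1], Sigma[n1:, n1:]

# Cholesky-based solve: never form the inverse of S22 explicitly.
c, low = cho_factor(S22)
Z = cho_solve((c, low), S21)      # Z solves S22 @ Z = S21
schur = S11 - S12 @ Z             # conditional covariance Sigma_{1|2}

# Agrees with the explicit-inverse formula.
print(np.allclose(schur, S11 - S12 @ np.linalg.inv(S22) @ S21))  # True
```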

Quick Check

For $(X_1, X_2)^T \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma})$ with $\boldsymbol{\Sigma} = \begin{pmatrix} 4 & 2 \\ 2 & 1 \end{pmatrix}$, what is $\mathbb{E}[X_1 \mid X_2 = 3]$?

6

3

1.5

2