Affine Transformations and the Whitening Transform

Why Affine Transformations Matter

In engineering, we constantly transform random vectors: a receive filter multiplies the observation by a matrix, a decoder adds a bias, a pre-whitening step decorrelates the input. The fundamental question is: if $\mathbf{X}$ is Gaussian, what can we say about $\mathbf{Y} = \mathbf{A}\mathbf{X} + \mathbf{b}$? The answer is that $\mathbf{Y}$ is also Gaussian, and its mean and covariance follow directly from the affine map. This closure property is the reason Gaussian models are so tractable.

Theorem: Affine Transformations Preserve Gaussianity

Let $\mathbf{X} \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ and let $\mathbf{Y} = \mathbf{A}\mathbf{X} + \mathbf{b}$, where $\mathbf{A} \in \mathbb{R}^{m \times n}$ and $\mathbf{b} \in \mathbb{R}^m$. Then

$$\mathbf{Y} \sim \mathcal{N}\!\left(\mathbf{A}\boldsymbol{\mu} + \mathbf{b},\; \mathbf{A}\boldsymbol{\Sigma}\mathbf{A}^T\right).$$

A linear map rotates and stretches the Gaussian "cloud," while the translation $\mathbf{b}$ shifts the center. The covariance transforms as a quadratic form because it involves the outer product of $(\mathbf{Y} - \mathbb{E}[\mathbf{Y}]) = \mathbf{A}(\mathbf{X} - \boldsymbol{\mu})$.
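As a quick numerical sanity check, here is a minimal sketch (assuming NumPy is available; the particular $\boldsymbol{\mu}$, $\boldsymbol{\Sigma}$, $\mathbf{A}$, and $\mathbf{b}$ below are arbitrary illustrative values, not taken from the text) that samples $\mathbf{X}$, applies the affine map, and compares the empirical moments of $\mathbf{Y}$ with $\mathbf{A}\boldsymbol{\mu} + \mathbf{b}$ and $\mathbf{A}\boldsymbol{\Sigma}\mathbf{A}^T$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary illustrative parameters (not from the text)
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])
A = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, -1.0]])
b = np.array([3.0, -1.0])

# Sample X ~ N(mu, Sigma) and form Y = A X + b row by row
X = rng.multivariate_normal(mu, Sigma, size=200_000)
Y = X @ A.T + b

# Empirical moments should match the theorem's predictions
print(np.allclose(Y.mean(axis=0), A @ mu + b, atol=0.03))    # mean ~ A mu + b
print(np.allclose(np.cov(Y.T), A @ Sigma @ A.T, atol=0.1))   # cov  ~ A Sigma A^T
```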

Every Linear Combination Is a Scalar Gaussian

An immediate corollary: for any fixed $\mathbf{a} \in \mathbb{R}^n$, the scalar $Z = \mathbf{a}^T \mathbf{X}$ is distributed as $\mathcal{N}(\mathbf{a}^T \boldsymbol{\mu},\; \mathbf{a}^T \boldsymbol{\Sigma} \mathbf{a})$.
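For instance, taking $\mathbf{a} = (1, 1)^T$ in two dimensions gives $X_1 + X_2 \sim \mathcal{N}(\mu_1 + \mu_2,\; \sigma_1^2 + \sigma_2^2 + 2\sigma_{12})$, where $\sigma_{12}$ is the covariance between $X_1$ and $X_2$.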

In fact, this property characterizes the Gaussian: a random vector is Gaussian if and only if every linear combination of its components is a scalar Gaussian (or degenerate). This is the Cramér–Wold device applied to the Gaussian family.

Definition: Whitening Transform

Let $\mathbf{X} \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ with $\boldsymbol{\Sigma} \succ 0$. The whitening transform is

$$\mathbf{W} = \boldsymbol{\Sigma}^{-1/2}(\mathbf{X} - \boldsymbol{\mu}),$$

where $\boldsymbol{\Sigma}^{-1/2}$ is any matrix satisfying $(\boldsymbol{\Sigma}^{-1/2})^T\boldsymbol{\Sigma}^{-1/2} = \boldsymbol{\Sigma}^{-1}$, or equivalently $\boldsymbol{\Sigma}^{-1/2}\,\boldsymbol{\Sigma}\,(\boldsymbol{\Sigma}^{-1/2})^T = \mathbf{I}$. By the affine transformation theorem,

$$\mathbf{W} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}_n).$$

Common choices for $\boldsymbol{\Sigma}^{-1/2}$:

  • $\boldsymbol{\Lambda}^{-1/2}\mathbf{U}^T$ from the eigendecomposition $\boldsymbol{\Sigma} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^T$,
  • $\mathbf{L}^{-1}$, the inverse Cholesky factor from $\boldsymbol{\Sigma} = \mathbf{L}\mathbf{L}^T$ (yielding a causal whitening filter).

The whitening transform is the multivariate analogue of standardization $Z = (X - \mu)/\sigma$. The resulting $\mathbf{W}$ has independent standard Gaussian components.
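A minimal sketch of eigendecomposition-based whitening, assuming NumPy; the mean and covariance below are arbitrary illustrative values:

```python
import numpy as np

rng = np.random.default_rng(1)

# Arbitrary correlated covariance (illustrative only)
mu = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 1.2],
                  [1.2, 1.0]])

# Whitening matrix B = Lambda^{-1/2} U^T from Sigma = U Lambda U^T
lam, U = np.linalg.eigh(Sigma)
B = np.diag(lam ** -0.5) @ U.T

# Whiten samples of X ~ N(mu, Sigma): W = B (X - mu)
X = rng.multivariate_normal(mu, Sigma, size=100_000)
W = (X - mu) @ B.T

print(np.allclose(B @ Sigma @ B.T, np.eye(2)))   # exact: B Sigma B^T = I
print(np.round(np.cov(W.T), 2))                  # empirical covariance ~ I_2
```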


Whitening Transform Demo

Start with a correlated bivariate Gaussian and apply the whitening transform. The left panel shows the original elliptical contours; the right panel shows the whitened circular contours. Adjust $\rho$ to see how stronger correlation leads to more dramatic reshaping.

Example: Whitening a Bivariate Gaussian

Let $\mathbf{X} \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma})$ with $\boldsymbol{\Sigma} = \begin{pmatrix} 4 & 2 \\ 2 & 4 \end{pmatrix}$. Find the whitening transform using the eigendecomposition.
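One valid solution (whitening matrices are not unique, and the eigenvectors may be reordered or negated): the eigendecomposition of $\boldsymbol{\Sigma}$ has eigenvalues $\lambda_1 = 6$, $\lambda_2 = 2$ with eigenvectors $\mathbf{u}_1 = \tfrac{1}{\sqrt{2}}(1, 1)^T$ and $\mathbf{u}_2 = \tfrac{1}{\sqrt{2}}(1, -1)^T$, so

$$\boldsymbol{\Sigma}^{-1/2} = \boldsymbol{\Lambda}^{-1/2}\mathbf{U}^T = \begin{pmatrix} 1/\sqrt{6} & 0 \\ 0 & 1/\sqrt{2} \end{pmatrix} \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} = \begin{pmatrix} 1/\sqrt{12} & 1/\sqrt{12} \\ 1/2 & -1/2 \end{pmatrix}.$$

A direct check confirms $\boldsymbol{\Sigma}^{-1/2}\,\boldsymbol{\Sigma}\,(\boldsymbol{\Sigma}^{-1/2})^T = \mathbf{I}_2$, so $\mathbf{W} = \boldsymbol{\Sigma}^{-1/2}\mathbf{X} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}_2)$.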

Definition: Karhunen-Loève Expansion

Let $\boldsymbol{\Sigma} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^T$ be the eigendecomposition with $\mathbf{U} = [\mathbf{u}_1, \ldots, \mathbf{u}_n]$ and $\boldsymbol{\Lambda} = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$. For $\mathbf{X} \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, define the whitened coordinates $\mathbf{W} = \boldsymbol{\Lambda}^{-1/2}\mathbf{U}^T(\mathbf{X} - \boldsymbol{\mu})$. Then

$$\mathbf{X} = \boldsymbol{\mu} + \sum_{i=1}^n \sqrt{\lambda_i}\, W_i\, \mathbf{u}_i,$$

where $W_i \sim \mathcal{N}(0, 1)$ are independent. This is the Karhunen-Loève (KL) expansion: $\mathbf{X}$ is a sum of deterministic eigenvector "modes" $\mathbf{u}_i$ with independent random amplitudes $\sqrt{\lambda_i}\,W_i$.

The KL expansion is the probabilistic analogue of PCA (principal component analysis). The first $m$ terms (with eigenvalues sorted in decreasing order) capture the most variance, giving the best rank-$m$ approximation to $\mathbf{X}$ in the MSE sense.
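A small numerical sketch of the KL expansion (assuming NumPy; the $2 \times 2$ covariance is an arbitrary illustrative choice): it synthesizes samples from independent standard Gaussian amplitudes and checks that the rank-1 truncation error matches the discarded eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative covariance (arbitrary choice for the sketch)
mu = np.zeros(2)
Sigma = np.array([[4.0, 2.0],
                  [2.0, 4.0]])

# Eigendecomposition Sigma = U diag(lam) U^T, sorted by decreasing eigenvalue
lam, U = np.linalg.eigh(Sigma)
order = np.argsort(lam)[::-1]
lam, U = lam[order], U[:, order]

# KL synthesis: X = mu + sum_i sqrt(lam_i) W_i u_i with W_i ~ N(0,1) i.i.d.
n_samples = 100_000
W = rng.standard_normal((n_samples, 2))
X = mu + W * np.sqrt(lam) @ U.T

print(np.round(np.cov(X.T), 2))            # approximately Sigma

# Rank-1 truncation keeps only the dominant mode; its MSE equals lam[1]
X1 = np.outer(np.sqrt(lam[0]) * W[:, 0], U[:, 0]) + mu
mse = np.mean(np.sum((X - X1) ** 2, axis=1))
print(round(mse, 2), round(lam[1], 2))     # these should roughly agree
```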

Principal Axes of a 2D Gaussian

Visualize the eigendecomposition of $\boldsymbol{\Sigma}$: the principal axes (eigenvectors) and their lengths ($\sqrt{\lambda_i}$). Rotate and scale the covariance by adjusting parameters.

Common Mistake: The Whitening Transform Is Not Unique

Mistake:

Assuming there is a single whitening matrix for a given $\boldsymbol{\Sigma}$.

Correction:

Any matrix $\mathbf{B}$ satisfying $\mathbf{B}\boldsymbol{\Sigma}\mathbf{B}^T = \mathbf{I}$ is a valid whitening transform. Two common choices are the eigendecomposition-based $\boldsymbol{\Lambda}^{-1/2}\mathbf{U}^T$ and the Cholesky-based $\mathbf{L}^{-1}$. The Cholesky version is preferred in filtering because it is "causal": $W_i$ depends only on $X_1, \ldots, X_i$.
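A short sketch of this non-uniqueness, assuming NumPy; the $3 \times 3$ covariance is an arbitrary illustrative choice:

```python
import numpy as np

# Illustrative covariance (arbitrary choice for the sketch)
Sigma = np.array([[4.0, 2.0, 1.0],
                  [2.0, 3.0, 0.5],
                  [1.0, 0.5, 2.0]])

# Eigendecomposition-based whitening: B1 = Lambda^{-1/2} U^T
lam, U = np.linalg.eigh(Sigma)
B1 = np.diag(lam ** -0.5) @ U.T

# Cholesky-based whitening: Sigma = L L^T, B2 = L^{-1} (lower triangular)
L = np.linalg.cholesky(Sigma)
B2 = np.linalg.inv(L)

# Both satisfy B Sigma B^T = I, yet B1 != B2
print(np.allclose(B1 @ Sigma @ B1.T, np.eye(3)))   # True
print(np.allclose(B2 @ Sigma @ B2.T, np.eye(3)))   # True
print(np.allclose(B1, B2))                         # False

# B2 is lower triangular, so W_i = (B2 (X - mu))_i uses only X_1, ..., X_i
print(np.round(B2, 3))
```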

🔧 Engineering Note

Whitening as a Pre-Processing Step

In detection and estimation, a common first step is to whiten the observation: given $\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{w}$ with $\mathbf{w} \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}_{w})$, multiply both sides by $\boldsymbol{\Sigma}_{w}^{-1/2}$ to get $\tilde{\mathbf{y}} = \tilde{\mathbf{H}}\mathbf{x} + \tilde{\mathbf{w}}$, where $\tilde{\mathbf{H}} = \boldsymbol{\Sigma}_{w}^{-1/2}\mathbf{H}$ and $\tilde{\mathbf{w}} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$. This reduces colored-noise problems to the white-noise case, where standard matched filtering and LMMSE formulas apply directly.
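A minimal sketch of the pre-whitening step, assuming NumPy; $\mathbf{H}$, $\boldsymbol{\Sigma}_w$, and $\mathbf{x}$ below are arbitrary illustrative values, and the final least-squares step simply stands in for whatever estimator follows the whitening:

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative colored-noise observation model (all values are arbitrary)
H = np.array([[1.0, 0.5],
              [0.0, 1.0],
              [2.0, -1.0]])
Sigma_w = np.array([[2.0, 0.8, 0.0],
                    [0.8, 1.0, 0.2],
                    [0.0, 0.2, 1.5]])
x_true = np.array([1.0, -2.0])

# Pre-whitening matrix from the Cholesky factor of the noise covariance
L = np.linalg.cholesky(Sigma_w)
Wm = np.linalg.inv(L)           # Wm Sigma_w Wm^T = I

# Simulate y = H x + w, then whiten: y_t = H_t x + w_t with white w_t
w = rng.multivariate_normal(np.zeros(3), Sigma_w)
y = H @ x_true + w
y_t, H_t = Wm @ y, Wm @ H

# In the whitened model, ordinary least squares is the ML estimate of x
x_hat, *_ = np.linalg.lstsq(H_t, y_t, rcond=None)
print(np.round(x_hat, 2))       # close to x_true, up to the noise
```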

Quick Check

If $\mathbf{X} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}_3)$ and $\mathbf{Y} = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \end{pmatrix}\mathbf{X}$, what is $\boldsymbol{\Sigma}_{\mathbf{Y}}$?

$\begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$

$\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$

$\begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}$

$\mathbf{I}_3$