Affine Transformations and the Whitening Transform

Why Affine Transformations Matter

In engineering, we constantly transform random vectors: a receive filter multiplies the observation by a matrix, a decoder adds a bias, a pre-whitening step decorrelates the input. The fundamental question is: if $\mathbf{X}$ is Gaussian, what can we say about $\mathbf{Y} = \mathbf{A}\mathbf{X} + \mathbf{b}$? The answer is that $\mathbf{Y}$ is also Gaussian, and its mean and covariance follow directly from the affine map. This closure property is the reason Gaussian models are so tractable.

Theorem: Affine Transformations Preserve Gaussianity

Let $\mathbf{X} \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ and let $\mathbf{Y} = \mathbf{A}\mathbf{X} + \mathbf{b}$, where $\mathbf{A} \in \mathbb{R}^{m \times n}$ and $\mathbf{b} \in \mathbb{R}^m$. Then

$$\mathbf{Y} \sim \mathcal{N}\!\left(\mathbf{A}\boldsymbol{\mu} + \mathbf{b},\; \mathbf{A}\boldsymbol{\Sigma}\mathbf{A}^T\right).$$

A linear map rotates and stretches the Gaussian "cloud," while the translation $\mathbf{b}$ shifts the center. The covariance transforms as a quadratic form because it involves the outer product of $(\mathbf{Y} - \mathbb{E}[\mathbf{Y}]) = \mathbf{A}(\mathbf{X} - \boldsymbol{\mu})$.
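As a quick numerical sanity check, here is a minimal sketch (assuming NumPy is available; the particular $\boldsymbol{\mu}$, $\boldsymbol{\Sigma}$, $\mathbf{A}$, and $\mathbf{b}$ below are arbitrary illustrative values, not taken from the text) that samples $\mathbf{X}$, applies the affine map, and compares the empirical moments of $\mathbf{Y}$ with $\mathbf{A}\boldsymbol{\mu} + \mathbf{b}$ and $\mathbf{A}\boldsymbol{\Sigma}\mathbf{A}^T$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary illustrative parameters (not from the text)
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])
A = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, -1.0]])
b = np.array([3.0, -1.0])

# Sample X ~ N(mu, Sigma) and form Y = A X + b row by row
X = rng.multivariate_normal(mu, Sigma, size=200_000)
Y = X @ A.T + b

# Empirical moments should match the theorem's predictions
print(np.allclose(Y.mean(axis=0), A @ mu + b, atol=0.03))    # mean ~ A mu + b
print(np.allclose(np.cov(Y.T), A @ Sigma @ A.T, atol=0.1))   # cov  ~ A Sigma A^T
```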

Every Linear Combination Is a Scalar Gaussian

An immediate corollary: for any fixed $\mathbf{a} \in \mathbb{R}^n$, the scalar $Z = \mathbf{a}^T \mathbf{X}$ is distributed as $\mathcal{N}(\mathbf{a}^T \boldsymbol{\mu},\; \mathbf{a}^T \boldsymbol{\Sigma} \mathbf{a})$.
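For instance, taking $\mathbf{a} = (1, 1)^T$ in two dimensions gives $X_1 + X_2 \sim \mathcal{N}(\mu_1 + \mu_2,\; \sigma_1^2 + \sigma_2^2 + 2\sigma_{12})$, where $\sigma_{12}$ is the covariance between $X_1$ and $X_2$.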

In fact, this property characterizes the Gaussian: a random vector is Gaussian if and only if every linear combination of its components is a scalar Gaussian (or degenerate). This is the Cramér–Wold device applied to the Gaussian family.

Definition: Whitening Transform

Let $\mathbf{X} \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ with $\boldsymbol{\Sigma} \succ 0$. The whitening transform is

$$\mathbf{W} = \boldsymbol{\Sigma}^{-1/2}(\mathbf{X} - \boldsymbol{\mu}),$$

where $\boldsymbol{\Sigma}^{-1/2}$ is any matrix satisfying $(\boldsymbol{\Sigma}^{-1/2})^T\boldsymbol{\Sigma}^{-1/2} = \boldsymbol{\Sigma}^{-1}$, or equivalently $\boldsymbol{\Sigma}^{-1/2}\,\boldsymbol{\Sigma}\,(\boldsymbol{\Sigma}^{-1/2})^T = \mathbf{I}$. By the affine transformation theorem,

$$\mathbf{W} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}_n).$$

Common choices for $\boldsymbol{\Sigma}^{-1/2}$:

  • $\boldsymbol{\Lambda}^{-1/2}\mathbf{U}^T$ from the eigendecomposition $\boldsymbol{\Sigma} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^T$,
  • $\mathbf{L}^{-1}$, the inverse Cholesky factor from $\boldsymbol{\Sigma} = \mathbf{L}\mathbf{L}^T$ (yielding a causal whitening filter).

The whitening transform is the multivariate analogue of standardization $Z = (X - \mu)/\sigma$. The resulting $\mathbf{W}$ has independent standard Gaussian components.
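A minimal sketch of eigendecomposition-based whitening, assuming NumPy; the mean and covariance below are arbitrary illustrative values:

```python
import numpy as np

rng = np.random.default_rng(1)

# Arbitrary correlated covariance (illustrative only)
mu = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 1.2],
                  [1.2, 1.0]])

# Whitening matrix B = Lambda^{-1/2} U^T from Sigma = U Lambda U^T
lam, U = np.linalg.eigh(Sigma)
B = np.diag(lam ** -0.5) @ U.T

# Whiten samples of X ~ N(mu, Sigma): W = B (X - mu)
X = rng.multivariate_normal(mu, Sigma, size=100_000)
W = (X - mu) @ B.T

print(np.allclose(B @ Sigma @ B.T, np.eye(2)))   # exact: B Sigma B^T = I
print(np.round(np.cov(W.T), 2))                  # empirical covariance ~ I_2
```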


Whitening Transform Demo

Start with a correlated bivariate Gaussian and apply the whitening transform. The left panel shows the original elliptical contours; the right panel shows the whitened circular contours. Adjust $\rho$ to see how stronger correlation leads to more dramatic reshaping.

Example: Whitening a Bivariate Gaussian

Let $\mathbf{X} \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma})$ with $\boldsymbol{\Sigma} = \begin{pmatrix} 4 & 2 \\ 2 & 4 \end{pmatrix}$. Find the whitening transform using the eigendecomposition.
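One valid solution (whitening matrices are not unique, and the eigenvectors may be reordered or negated): the eigendecomposition of $\boldsymbol{\Sigma}$ has eigenvalues $\lambda_1 = 6$, $\lambda_2 = 2$ with eigenvectors $\mathbf{u}_1 = \tfrac{1}{\sqrt{2}}(1, 1)^T$ and $\mathbf{u}_2 = \tfrac{1}{\sqrt{2}}(1, -1)^T$, so

$$\boldsymbol{\Sigma}^{-1/2} = \boldsymbol{\Lambda}^{-1/2}\mathbf{U}^T = \begin{pmatrix} 1/\sqrt{6} & 0 \\ 0 & 1/\sqrt{2} \end{pmatrix} \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} = \begin{pmatrix} 1/\sqrt{12} & 1/\sqrt{12} \\ 1/2 & -1/2 \end{pmatrix}.$$

A direct check confirms $\boldsymbol{\Sigma}^{-1/2}\,\boldsymbol{\Sigma}\,(\boldsymbol{\Sigma}^{-1/2})^T = \mathbf{I}_2$, so $\mathbf{W} = \boldsymbol{\Sigma}^{-1/2}\mathbf{X} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}_2)$.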

Definition: Karhunen-Loève Expansion

Let $\boldsymbol{\Sigma} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^T$ be the eigendecomposition with $\mathbf{U} = [\mathbf{u}_1, \ldots, \mathbf{u}_n]$ and $\boldsymbol{\Lambda} = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$. For $\mathbf{X} \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, define the whitened coordinates $\mathbf{W} = \boldsymbol{\Lambda}^{-1/2}\mathbf{U}^T(\mathbf{X} - \boldsymbol{\mu})$. Then

$$\mathbf{X} = \boldsymbol{\mu} + \sum_{i=1}^n \sqrt{\lambda_i}\, W_i\, \mathbf{u}_i,$$

where $W_i \sim \mathcal{N}(0, 1)$ are independent. This is the Karhunen-Loève (KL) expansion: $\mathbf{X}$ is a sum of deterministic eigenvector "modes" $\mathbf{u}_i$ with independent random amplitudes $\sqrt{\lambda_i}\,W_i$.

The KL expansion is the probabilistic analogue of PCA (principal component analysis). The first $m$ terms (with eigenvalues sorted in decreasing order) capture the most variance, giving the best rank-$m$ approximation to $\mathbf{X}$ in the MSE sense.
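A small numerical sketch of the KL expansion (assuming NumPy; the $2 \times 2$ covariance is an arbitrary illustrative choice): it synthesizes samples from independent standard Gaussian amplitudes and checks that the rank-1 truncation error matches the discarded eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative covariance (arbitrary choice for the sketch)
mu = np.zeros(2)
Sigma = np.array([[4.0, 2.0],
                  [2.0, 4.0]])

# Eigendecomposition Sigma = U diag(lam) U^T, sorted by decreasing eigenvalue
lam, U = np.linalg.eigh(Sigma)
order = np.argsort(lam)[::-1]
lam, U = lam[order], U[:, order]

# KL synthesis: X = mu + sum_i sqrt(lam_i) W_i u_i with W_i ~ N(0,1) i.i.d.
n_samples = 100_000
W = rng.standard_normal((n_samples, 2))
X = mu + W * np.sqrt(lam) @ U.T

print(np.round(np.cov(X.T), 2))            # approximately Sigma

# Rank-1 truncation keeps only the dominant mode; its MSE equals lam[1]
X1 = np.outer(np.sqrt(lam[0]) * W[:, 0], U[:, 0]) + mu
mse = np.mean(np.sum((X - X1) ** 2, axis=1))
print(round(mse, 2), round(lam[1], 2))     # these should roughly agree
```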

Principal Axes of a 2D Gaussian

Visualize the eigendecomposition of $\boldsymbol{\Sigma}$: the principal axes (eigenvectors) and their lengths ($\sqrt{\lambda_i}$). Rotate and scale the covariance by adjusting parameters.

Common Mistake: The Whitening Transform Is Not Unique

Mistake:

Assuming there is a single whitening matrix for a given $\boldsymbol{\Sigma}$.

Correction:

Any matrix $\mathbf{B}$ satisfying $\mathbf{B}\boldsymbol{\Sigma}\mathbf{B}^T = \mathbf{I}$ is a valid whitening transform. Two common choices are the eigendecomposition-based $\boldsymbol{\Lambda}^{-1/2}\mathbf{U}^T$ and the Cholesky-based $\mathbf{L}^{-1}$. The Cholesky version is preferred in filtering because it is "causal": $W_i$ depends only on $X_1, \ldots, X_i$.
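A short sketch of this non-uniqueness, assuming NumPy; the $3 \times 3$ covariance is an arbitrary illustrative choice:

```python
import numpy as np

# Illustrative covariance (arbitrary choice for the sketch)
Sigma = np.array([[4.0, 2.0, 1.0],
                  [2.0, 3.0, 0.5],
                  [1.0, 0.5, 2.0]])

# Eigendecomposition-based whitening: B1 = Lambda^{-1/2} U^T
lam, U = np.linalg.eigh(Sigma)
B1 = np.diag(lam ** -0.5) @ U.T

# Cholesky-based whitening: Sigma = L L^T, B2 = L^{-1} (lower triangular)
L = np.linalg.cholesky(Sigma)
B2 = np.linalg.inv(L)

# Both satisfy B Sigma B^T = I, yet B1 != B2
print(np.allclose(B1 @ Sigma @ B1.T, np.eye(3)))   # True
print(np.allclose(B2 @ Sigma @ B2.T, np.eye(3)))   # True
print(np.allclose(B1, B2))                         # False

# B2 is lower triangular, so W_i = (B2 (X - mu))_i uses only X_1, ..., X_i
print(np.round(B2, 3))
```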

🔧 Engineering Note

Whitening as a Pre-Processing Step

In detection and estimation, a common first step is to whiten the observation: given $\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{w}$ with $\mathbf{w} \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}_{w})$, multiply both sides by $\boldsymbol{\Sigma}_{w}^{-1/2}$ to get $\tilde{\mathbf{y}} = \tilde{\mathbf{H}}\mathbf{x} + \tilde{\mathbf{w}}$, where $\tilde{\mathbf{H}} = \boldsymbol{\Sigma}_{w}^{-1/2}\mathbf{H}$ and $\tilde{\mathbf{w}} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$. This reduces colored-noise problems to the white-noise case, where standard matched filtering and LMMSE formulas apply directly.
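A minimal sketch of the pre-whitening step, assuming NumPy; $\mathbf{H}$, $\boldsymbol{\Sigma}_w$, and $\mathbf{x}$ below are arbitrary illustrative values, and the final least-squares step simply stands in for whatever estimator follows the whitening:

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative colored-noise observation model (all values are arbitrary)
H = np.array([[1.0, 0.5],
              [0.0, 1.0],
              [2.0, -1.0]])
Sigma_w = np.array([[2.0, 0.8, 0.0],
                    [0.8, 1.0, 0.2],
                    [0.0, 0.2, 1.5]])
x_true = np.array([1.0, -2.0])

# Pre-whitening matrix from the Cholesky factor of the noise covariance
L = np.linalg.cholesky(Sigma_w)
Wm = np.linalg.inv(L)           # Wm Sigma_w Wm^T = I

# Simulate y = H x + w, then whiten: y_t = H_t x + w_t with white w_t
w = rng.multivariate_normal(np.zeros(3), Sigma_w)
y = H @ x_true + w
y_t, H_t = Wm @ y, Wm @ H

# In the whitened model, ordinary least squares is the ML estimate of x
x_hat, *_ = np.linalg.lstsq(H_t, y_t, rcond=None)
print(np.round(x_hat, 2))       # close to x_true, up to the noise
```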

Quick Check

If $\mathbf{X} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}_3)$ and $\mathbf{Y} = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \end{pmatrix}\mathbf{X}$, what is $\boldsymbol{\Sigma}_{\mathbf{Y}}$?

$\begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$

$\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$

$\begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}$

$\mathbf{I}_3$