The Jointly Gaussian Case

When the Gaussian Shortcut Works

The LMMSE estimator is a second-order object: it uses only means and covariances, discarding everything the higher-order structure of the distribution could teach us. One would therefore expect a gap between the LMMSE and the true MMSE. For the Gaussian distribution, and essentially only for it, this gap vanishes. The reason is that the conditional mean of a jointly Gaussian pair is already an affine function of the conditioning variable, so no further nonlinearity can possibly help.

Theorem: Conditional Distribution of a Gaussian Pair

Suppose
$$\begin{bmatrix}\boldsymbol\theta \\ \mathbf{Y}\end{bmatrix} \sim \mathcal{N}\!\left( \begin{bmatrix}\mathbf{m}_\theta \\ \mathbf{m}_y\end{bmatrix}, \begin{bmatrix}\boldsymbol\Sigma_\theta & \boldsymbol\Sigma_{\theta y} \\ \boldsymbol\Sigma_{y\theta} & \boldsymbol\Sigma_y\end{bmatrix}\right)$$
with $\boldsymbol\Sigma_y \succ 0$. Then the conditional distribution of $\boldsymbol\theta$ given $\mathbf{Y} = \mathbf{y}$ is Gaussian with
$$\mathbb{E}[\boldsymbol\theta \mid \mathbf{Y}=\mathbf{y}] \;=\; \mathbf{m}_\theta + \boldsymbol\Sigma_{\theta y}\boldsymbol\Sigma_y^{-1} (\mathbf{y} - \mathbf{m}_y),$$
$$\operatorname{Cov}(\boldsymbol\theta \mid \mathbf{Y}=\mathbf{y}) \;=\; \boldsymbol\Sigma_\theta - \boldsymbol\Sigma_{\theta y}\boldsymbol\Sigma_y^{-1} \boldsymbol\Sigma_{y\theta} \;=\; \boldsymbol\Sigma_{\theta|y}.$$
Notably, the posterior covariance does not depend on the observed value $\mathbf{y}$.

The Gaussian family is closed under affine transformations and under conditioning. The joint density is the exponential of a quadratic form in $(\boldsymbol\theta, \mathbf{y})$; completing the square in $\boldsymbol\theta$ at fixed $\mathbf{y}$ yields another Gaussian quadratic form, hence a Gaussian conditional density with affine mean.
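The theorem's formulas are easy to check numerically. The sketch below uses an illustrative $3\times 3$ joint covariance of my own choosing (a scalar $\theta$ stacked on a 2-dimensional $\mathbf{Y}$) and verifies, by Monte Carlo, the two hallmarks of the Gaussian conditional mean: the residual $\theta - \mathbb{E}[\theta \mid \mathbf{Y}]$ is uncorrelated with $\mathbf{Y}$, and its covariance matches $\boldsymbol\Sigma_{\theta|y}$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative numbers (not from the text): scalar theta stacked on 2-D Y.
m = np.array([1.0, 0.0, 2.0])            # mean of [theta; Y]
Sigma = np.array([[2.0, 0.8, 0.3],
                  [0.8, 1.5, 0.4],
                  [0.3, 0.4, 1.0]])      # joint covariance, theta block first

S_t  = Sigma[:1, :1]     # Sigma_theta
S_ty = Sigma[:1, 1:]     # Sigma_{theta y}
S_y  = Sigma[1:, 1:]     # Sigma_y

K = S_ty @ np.linalg.inv(S_y)            # gain in the affine conditional mean
S_cond = S_t - K @ S_ty.T                # posterior covariance (y-independent)

# Monte Carlo: residual should be uncorrelated with Y, with covariance S_cond.
X = rng.multivariate_normal(m, Sigma, size=200_000)
theta, Y = X[:, :1], X[:, 1:]
est = m[:1] + (Y - m[1:]) @ K.T          # conditional mean, affine in Y
resid = theta - est

print(float(np.cov(resid.T)))            # ~ S_cond[0, 0]
print(resid.T @ (Y - m[1:]) / len(Y))    # ~ [0, 0] (orthogonality)
```

The fact that `S_cond` is computed once, before any data is seen, mirrors the theorem's remark that the posterior covariance does not depend on the observed value.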

Theorem: MMSE = LMMSE for Jointly Gaussian Pairs

Let $(\boldsymbol\theta, \mathbf{Y})$ be jointly Gaussian with $\boldsymbol\Sigma_y \succ 0$. Then
$$\hat{\boldsymbol\theta}_{\text{MMSE}}(\mathbf{Y}) \;=\; \hat{\boldsymbol\theta}_{\text{LMMSE}}(\mathbf{Y}) \;=\; \hat{\boldsymbol\theta}_{\text{MAP}}(\mathbf{Y}).$$
All three coincide with $\mathbf{m}_\theta + \boldsymbol\Sigma_{\theta y} \boldsymbol\Sigma_y^{-1}(\mathbf{Y} - \mathbf{m}_y)$.

The conditional mean is affine, so the MMSE estimator already lies inside the class of affine estimators and must therefore equal the LMMSE estimator; and the conditional density is Gaussian, hence unimodal with mode equal to mean, so MAP = MMSE. An affine conditional mean is a Gaussian signature.
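One way to see the "no nonlinearity can help" claim empirically: on jointly Gaussian data, a much richer regression class buys essentially nothing over a straight line. A small sketch under my own assumptions (scalar pair with correlation $\rho = 0.7$, degree-7 polynomial as the richer class):

```python
import numpy as np

rng = np.random.default_rng(4)
N = 200_000
rho = 0.7                                 # illustrative correlation (my choice)

# Jointly Gaussian (theta, Y) with unit variances and correlation rho
Y = rng.standard_normal(N)
theta = rho * Y + np.sqrt(1 - rho ** 2) * rng.standard_normal(N)

# Least-squares fits: a line vs. a degree-7 polynomial
lin  = np.polynomial.polynomial.polyfit(Y, theta, 1)
poly = np.polynomial.polynomial.polyfit(Y, theta, 7)

mse_lin  = np.mean((theta - np.polynomial.polynomial.polyval(Y, lin)) ** 2)
mse_poly = np.mean((theta - np.polynomial.polynomial.polyval(Y, poly)) ** 2)
print(mse_lin, mse_poly)                  # both ~ 1 - rho^2 = 0.51
```

Both MSEs sit at the theoretical floor $1 - \rho^2$; the extra polynomial terms fit only sampling noise, exactly as the affine-conditional-mean argument predicts.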

Key Takeaway

"Linear estimators are optimal" is a Gaussian statement, not a universal one. Whenever you see a linear receiver claimed as "optimal", look for an explicit or implicit Gaussian assumption. Outside the Gaussian world, the MMSE estimator is nonlinear and the linear version is strictly suboptimal.

Example: Complex Gaussian Signal in Gaussian Noise

Let $\mathbf{X} \sim \mathcal{CN}(\mathbf{0}, \boldsymbol\Sigma_x)$ and $\mathbf{Y} = \mathbf{H}\mathbf{X} + \mathbf{Z}$ with $\mathbf{Z} \sim \mathcal{CN}(\mathbf{0}, \boldsymbol\Sigma_z)$ independent of $\mathbf{X}$. Compute $\hat{\mathbf{X}}_{\text{MMSE}}$ and the error covariance.
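One way to carry out this computation: since both means are zero, $\boldsymbol\Sigma_{xy} = \boldsymbol\Sigma_x \mathbf{H}^H$ and $\boldsymbol\Sigma_y = \mathbf{H}\boldsymbol\Sigma_x\mathbf{H}^H + \boldsymbol\Sigma_z$, so the theorem gives $\hat{\mathbf{X}} = \boldsymbol\Sigma_x \mathbf{H}^H (\mathbf{H}\boldsymbol\Sigma_x\mathbf{H}^H + \boldsymbol\Sigma_z)^{-1}\mathbf{Y}$ with error covariance $\boldsymbol\Sigma_x - \boldsymbol\Sigma_x \mathbf{H}^H \boldsymbol\Sigma_y^{-1} \mathbf{H}\boldsymbol\Sigma_x$. A numerical sketch; the dimensions, channel, and noise power are illustrative assumptions of mine:

```python
import numpy as np

rng = np.random.default_rng(1)
n_tx, n_rx, noise_var = 2, 3, 0.1        # illustrative sizes and noise power

# Random complex channel (assumed known), X ~ CN(0, I), Z ~ CN(0, noise_var * I)
H = (rng.standard_normal((n_rx, n_tx))
     + 1j * rng.standard_normal((n_rx, n_tx))) / np.sqrt(2)
Sigma_x = np.eye(n_tx, dtype=complex)
Sigma_z = noise_var * np.eye(n_rx, dtype=complex)

# Zero means: Sigma_{xy} = Sigma_x H^H, Sigma_y = H Sigma_x H^H + Sigma_z
Sigma_y = H @ Sigma_x @ H.conj().T + Sigma_z
W = Sigma_x @ H.conj().T @ np.linalg.inv(Sigma_y)   # X_hat = W @ Y

# Error covariance from the conditional-covariance formula
Sigma_err = Sigma_x - W @ H @ Sigma_x

# Monte Carlo sanity check of Sigma_err
N = 100_000
X = (rng.standard_normal((N, n_tx)) + 1j * rng.standard_normal((N, n_tx))) / np.sqrt(2)
Z = np.sqrt(noise_var / 2) * (rng.standard_normal((N, n_rx))
                              + 1j * rng.standard_normal((N, n_rx)))
Y = X @ H.T + Z
E = X - Y @ W.T                           # estimation errors, row per sample
emp = E.T @ E.conj() / N                  # empirical error covariance
print(np.round(emp, 3))                   # ~ Sigma_err
```

The Hermitian-transpose `H.conj().T` plays the role of $\mathbf{H}^H$ throughout; the real-Gaussian formulas of the theorem carry over to the circularly symmetric complex case with transposes replaced by Hermitian transposes.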

Joint and Conditional Gaussian Densities

The joint density of $(\theta, Y)$ as a contour plot, with the conditional density $f_{\theta|Y}(\cdot \mid y)$ overlaid on the vertical slice at $Y = y$. Vary the correlation coefficient $\rho$ between $\theta$ and $Y$ to see how the conditional distribution narrows.

Why This Matters: LMMSE Receivers in Wireless Systems

LMMSE receivers are the backbone of every modern wireless standard. In MIMO detection, the LMMSE detector $\mathbf{G}_{\text{LMMSE}} = (\mathbf{H}^H\mathbf{H} + \sigma_w^2 \mathbf{I})^{-1}\mathbf{H}^H$ trades a small amount of bias for a large reduction in noise enhancement compared with zero-forcing. In OFDM channel estimation, MMSE interpolation across pilot subcarriers using the channel's frequency correlation yields substantial gains over simple least squares. In the massive MIMO uplink, the LMMSE combiner is the capacity-achieving linear receiver when the user symbols are Gaussian. In every case the theoretical justification comes from the theorem MMSE = LMMSE for Jointly Gaussian Pairs: under the Gaussian signaling assumptions, the LMMSE estimator is the true MMSE estimator.
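The zero-forcing comparison above can be sketched in a few lines. This is a minimal Monte Carlo illustration, assuming unit-power Gaussian symbols and a random $4\times 4$ channel of my own choosing; it applies both detectors to the same received samples and compares their empirical MSEs:

```python
import numpy as np

rng = np.random.default_rng(2)
n_tx, n_rx, sigma2 = 4, 4, 0.5            # illustrative sizes and noise power

# Random complex channel, assumed known at the receiver
H = (rng.standard_normal((n_rx, n_tx))
     + 1j * rng.standard_normal((n_rx, n_tx))) / np.sqrt(2)

# LMMSE and zero-forcing detectors (unit-power symbols assumed)
G_lmmse = np.linalg.inv(H.conj().T @ H + sigma2 * np.eye(n_tx)) @ H.conj().T
G_zf = np.linalg.pinv(H)

# Monte Carlo MSE comparison on the same received data
N = 50_000
X = (rng.standard_normal((N, n_tx)) + 1j * rng.standard_normal((N, n_tx))) / np.sqrt(2)
Z = np.sqrt(sigma2 / 2) * (rng.standard_normal((N, n_rx))
                           + 1j * rng.standard_normal((N, n_rx)))
Y = X @ H.T + Z

mse_lmmse = np.mean(np.abs(X - Y @ G_lmmse.T) ** 2)
mse_zf = np.mean(np.abs(X - Y @ G_zf.T) ** 2)
print(mse_lmmse, mse_zf)                  # LMMSE MSE below the ZF MSE
```

Zero-forcing inverts the channel exactly and pays for it with amplified noise on ill-conditioned directions; the regularizing $\sigma_w^2 \mathbf{I}$ term inside the LMMSE inverse is what damps that enhancement.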

See full treatment in Chapter 11

Historical Note: Wiener, Kolmogorov, and the Birth of LMMSE

1941–1950

Norbert Wiener's classified MIT report Extrapolation, Interpolation, and Smoothing of Stationary Time Series (1942, declassified 1949) derived what we now call the LMMSE estimator in the infinite-dimensional setting of wide-sense stationary processes, motivated by the problem of predicting the trajectory of enemy aircraft for anti-aircraft fire control. Andrey Kolmogorov had obtained essentially the same result independently in 1941. The finite-dimensional matrix version presented here is the natural sampled-time specialization of the Wiener–Kolmogorov filter, and it underpins the Kalman filter (Chapter 10) as a recursive computation along the time axis.

Common Mistake: Marginal Gaussian ≠ Jointly Gaussian

Mistake:

"Both $\theta$ and $Y$ are Gaussian, so the pair $(\theta, Y)$ is jointly Gaussian and I can use $\hat\theta_{\text{MMSE}} = \hat\theta_{\text{LMMSE}}$."

Correction:

It is perfectly possible to have $\theta \sim \mathcal{N}(0,1)$ and $Y \sim \mathcal{N}(0,1)$ without the pair being jointly Gaussian. A classical counterexample: let $\theta \sim \mathcal{N}(0,1)$ and $Y = S\theta$ with $S \in \{+1,-1\}$ equiprobable and independent of $\theta$. Both marginals are standard Gaussian, but the pair is not jointly Gaussian: its distribution is supported on the two lines $Y = \pm\theta$. In this particular example the MMSE estimator $\mathbb{E}[\theta \mid Y] = 0$ happens to coincide with the LMMSE estimator (both are zero by symmetry), but for a general non-jointly-Gaussian pair the MMSE estimator is nonlinear and strictly beats the LMMSE. Always check joint Gaussianity before identifying MMSE with LMMSE.
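To see a strict gap rather than a coincidental tie, here is a variant counterexample of my own (not the $S\theta$ construction above): $Y \sim \mathcal{N}(0,1)$ and $\theta = Y^2$. Since $\operatorname{Cov}(\theta, Y) = \mathbb{E}[Y^3] = 0$, the best affine estimator is the constant $\mathbb{E}[\theta] = 1$, while the true MMSE estimator $\mathbb{E}[\theta \mid Y] = Y^2$ achieves zero error:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 200_000

# Y standard Gaussian; theta = Y^2 is determined by Y but not jointly Gaussian with it
Y = rng.standard_normal(N)
theta = Y ** 2

# LMMSE: Cov(theta, Y) = E[Y^3] = 0, so the best affine estimator is the constant mean
lmmse = np.full(N, theta.mean())

# MMSE: E[theta | Y] = Y^2 exactly, a nonlinear function of Y
mmse = Y ** 2

print(np.mean((theta - lmmse) ** 2))      # ~ Var(Y^2) = 2
print(np.mean((theta - mmse) ** 2))       # 0
```

The LMMSE pays the full prior variance $\operatorname{Var}(Y^2) = 2$ while the nonlinear MMSE estimator is exact, which is as large a gap as it gets.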

Quick Check

Under which of the following conditions is the LMMSE estimator necessarily equal to the MMSE estimator?

θ\theta and YY are each Gaussian

(θ,Y)(\theta,Y) is jointly Gaussian

θ\theta is Gaussian and the channel is linear

YY is a linear function of θ\theta plus noise of any distribution