The Jointly Gaussian Case

When the Gaussian Shortcut Works

The LMMSE estimator is a second-order object: it uses only means and covariances, discarding everything the higher-order structure of the distribution could teach us. One would therefore expect a gap between the LMMSE and the true MMSE. For the Gaussian distribution, and essentially only for it, this gap vanishes. The reason is that the conditional mean of a jointly Gaussian pair is already an affine function of the conditioning variable, so no further nonlinearity can possibly help.

Theorem: Conditional Distribution of a Gaussian Pair

Suppose
$$\begin{bmatrix}\boldsymbol\theta \\ \mathbf{Y}\end{bmatrix} \sim \mathcal{N}\!\left( \begin{bmatrix}\mathbf{m}_\theta \\ \mathbf{m}_y\end{bmatrix}, \begin{bmatrix}\boldsymbol\Sigma_\theta & \boldsymbol\Sigma_{\theta y} \\ \boldsymbol\Sigma_{y\theta} & \boldsymbol\Sigma_y\end{bmatrix}\right)$$
with $\boldsymbol\Sigma_y \succ 0$. Then the conditional distribution of $\boldsymbol\theta$ given $\mathbf{Y} = \mathbf{y}$ is Gaussian with
$$\mathbb{E}[\boldsymbol\theta \mid \mathbf{Y}=\mathbf{y}] \;=\; \mathbf{m}_\theta + \boldsymbol\Sigma_{\theta y}\boldsymbol\Sigma_y^{-1} (\mathbf{y} - \mathbf{m}_y),$$
$$\operatorname{Cov}(\boldsymbol\theta \mid \mathbf{Y}=\mathbf{y}) \;=\; \boldsymbol\Sigma_\theta - \boldsymbol\Sigma_{\theta y}\boldsymbol\Sigma_y^{-1} \boldsymbol\Sigma_{y\theta} \;=\; \boldsymbol\Sigma_{\theta|y}.$$
Notably, the posterior covariance does not depend on the observed value $\mathbf{y}$.

The Gaussian family is closed under affine transformations and under conditioning. The joint density is the exponential of a quadratic form in $(\boldsymbol\theta, \mathbf{y})$; completing the square in $\boldsymbol\theta$ at fixed $\mathbf{y}$ yields another Gaussian quadratic form, hence a Gaussian conditional density with affine mean.
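The theorem's formulas are easy to check numerically. The sketch below uses an illustrative $3\times 3$ joint covariance of my own choosing (a scalar $\theta$ stacked on a 2-dimensional $\mathbf{Y}$) and verifies, by Monte Carlo, the two hallmarks of the Gaussian conditional mean: the residual $\theta - \mathbb{E}[\theta \mid \mathbf{Y}]$ is uncorrelated with $\mathbf{Y}$, and its covariance matches $\boldsymbol\Sigma_{\theta|y}$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative numbers (not from the text): scalar theta stacked on 2-D Y.
m = np.array([1.0, 0.0, 2.0])            # mean of [theta; Y]
Sigma = np.array([[2.0, 0.8, 0.3],
                  [0.8, 1.5, 0.4],
                  [0.3, 0.4, 1.0]])      # joint covariance, theta block first

S_t  = Sigma[:1, :1]     # Sigma_theta
S_ty = Sigma[:1, 1:]     # Sigma_{theta y}
S_y  = Sigma[1:, 1:]     # Sigma_y

K = S_ty @ np.linalg.inv(S_y)            # gain in the affine conditional mean
S_cond = S_t - K @ S_ty.T                # posterior covariance (y-independent)

# Monte Carlo: residual should be uncorrelated with Y, with covariance S_cond.
X = rng.multivariate_normal(m, Sigma, size=200_000)
theta, Y = X[:, :1], X[:, 1:]
est = m[:1] + (Y - m[1:]) @ K.T          # conditional mean, affine in Y
resid = theta - est

print(float(np.cov(resid.T)))            # ~ S_cond[0, 0]
print(resid.T @ (Y - m[1:]) / len(Y))    # ~ [0, 0] (orthogonality)
```

The fact that `S_cond` is computed once, before any data is seen, mirrors the theorem's remark that the posterior covariance does not depend on the observed value.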

Theorem: MMSE = LMMSE for Jointly Gaussian Pairs

Let $(\boldsymbol\theta, \mathbf{Y})$ be jointly Gaussian with $\boldsymbol\Sigma_y \succ 0$. Then
$$\hat{\boldsymbol\theta}_{\text{MMSE}}(\mathbf{Y}) \;=\; \hat{\boldsymbol\theta}_{\text{LMMSE}}(\mathbf{Y}) \;=\; \hat{\boldsymbol\theta}_{\text{MAP}}(\mathbf{Y}).$$
All three coincide with $\mathbf{m}_\theta + \boldsymbol\Sigma_{\theta y} \boldsymbol\Sigma_y^{-1}(\mathbf{Y} - \mathbf{m}_y)$.

The conditional mean is affine, so the MMSE estimator already lies inside the class of affine estimators and must therefore equal the LMMSE estimator; and the conditional density is Gaussian, hence unimodal with mode equal to mean, so MAP = MMSE. An affine conditional mean is a Gaussian signature.
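One way to see the "no nonlinearity can help" claim empirically: on jointly Gaussian data, a much richer regression class buys essentially nothing over a straight line. A small sketch under my own assumptions (scalar pair with correlation $\rho = 0.7$, degree-7 polynomial as the richer class):

```python
import numpy as np

rng = np.random.default_rng(4)
N = 200_000
rho = 0.7                                 # illustrative correlation (my choice)

# Jointly Gaussian (theta, Y) with unit variances and correlation rho
Y = rng.standard_normal(N)
theta = rho * Y + np.sqrt(1 - rho ** 2) * rng.standard_normal(N)

# Least-squares fits: a line vs. a degree-7 polynomial
lin  = np.polynomial.polynomial.polyfit(Y, theta, 1)
poly = np.polynomial.polynomial.polyfit(Y, theta, 7)

mse_lin  = np.mean((theta - np.polynomial.polynomial.polyval(Y, lin)) ** 2)
mse_poly = np.mean((theta - np.polynomial.polynomial.polyval(Y, poly)) ** 2)
print(mse_lin, mse_poly)                  # both ~ 1 - rho^2 = 0.51
```

Both MSEs sit at the theoretical floor $1 - \rho^2$; the extra polynomial terms fit only sampling noise, exactly as the affine-conditional-mean argument predicts.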

Key Takeaway

"Linear estimators are optimal" is a Gaussian statement, not a universal one. Whenever you see a linear receiver claimed as "optimal", look for an explicit or implicit Gaussian assumption. Outside the Gaussian world, the MMSE estimator is nonlinear and the linear version is strictly suboptimal.

Example: Complex Gaussian Signal in Gaussian Noise

Let $\mathbf{X} \sim \mathcal{CN}(\mathbf{0}, \boldsymbol\Sigma_x)$ and $\mathbf{Y} = \mathbf{H}\mathbf{X} + \mathbf{Z}$ with $\mathbf{Z} \sim \mathcal{CN}(\mathbf{0}, \boldsymbol\Sigma_z)$ independent of $\mathbf{X}$. Compute $\hat{\mathbf{X}}_{\text{MMSE}}$ and the error covariance.
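One way to carry out this computation: since both means are zero, $\boldsymbol\Sigma_{xy} = \boldsymbol\Sigma_x \mathbf{H}^H$ and $\boldsymbol\Sigma_y = \mathbf{H}\boldsymbol\Sigma_x\mathbf{H}^H + \boldsymbol\Sigma_z$, so the theorem gives $\hat{\mathbf{X}} = \boldsymbol\Sigma_x \mathbf{H}^H (\mathbf{H}\boldsymbol\Sigma_x\mathbf{H}^H + \boldsymbol\Sigma_z)^{-1}\mathbf{Y}$ with error covariance $\boldsymbol\Sigma_x - \boldsymbol\Sigma_x \mathbf{H}^H \boldsymbol\Sigma_y^{-1} \mathbf{H}\boldsymbol\Sigma_x$. A numerical sketch; the dimensions, channel, and noise power are illustrative assumptions of mine:

```python
import numpy as np

rng = np.random.default_rng(1)
n_tx, n_rx, noise_var = 2, 3, 0.1        # illustrative sizes and noise power

# Random complex channel (assumed known), X ~ CN(0, I), Z ~ CN(0, noise_var * I)
H = (rng.standard_normal((n_rx, n_tx))
     + 1j * rng.standard_normal((n_rx, n_tx))) / np.sqrt(2)
Sigma_x = np.eye(n_tx, dtype=complex)
Sigma_z = noise_var * np.eye(n_rx, dtype=complex)

# Zero means: Sigma_{xy} = Sigma_x H^H, Sigma_y = H Sigma_x H^H + Sigma_z
Sigma_y = H @ Sigma_x @ H.conj().T + Sigma_z
W = Sigma_x @ H.conj().T @ np.linalg.inv(Sigma_y)   # X_hat = W @ Y

# Error covariance from the conditional-covariance formula
Sigma_err = Sigma_x - W @ H @ Sigma_x

# Monte Carlo sanity check of Sigma_err
N = 100_000
X = (rng.standard_normal((N, n_tx)) + 1j * rng.standard_normal((N, n_tx))) / np.sqrt(2)
Z = np.sqrt(noise_var / 2) * (rng.standard_normal((N, n_rx))
                              + 1j * rng.standard_normal((N, n_rx)))
Y = X @ H.T + Z
E = X - Y @ W.T                           # estimation errors, row per sample
emp = E.T @ E.conj() / N                  # empirical error covariance
print(np.round(emp, 3))                   # ~ Sigma_err
```

The Hermitian-transpose `H.conj().T` plays the role of $\mathbf{H}^H$ throughout; the real-Gaussian formulas of the theorem carry over to the circularly symmetric complex case with transposes replaced by Hermitian transposes.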

Joint and Conditional Gaussian Densities

The joint density of $(\theta, Y)$ as a contour plot, with the conditional density $f_{\theta|Y}(\cdot \mid y)$ overlaid on the vertical slice at $Y = y$. Vary the correlation coefficient $\rho$ between $\theta$ and $Y$ to see how the conditional distribution narrows.

Why This Matters: LMMSE Receivers in Wireless Systems

LMMSE receivers are the backbone of every modern wireless standard. In MIMO detection, the LMMSE detector $\mathbf{G}_{\text{LMMSE}} = (\mathbf{H}^H\mathbf{H} + \sigma_w^2 \mathbf{I})^{-1}\mathbf{H}^H$ trades a small amount of bias for a large reduction in noise enhancement compared with zero-forcing. In OFDM channel estimation, MMSE interpolation across pilot subcarriers using the channel's frequency correlation yields substantial gains over simple least squares. In the massive MIMO uplink, the LMMSE combiner is the capacity-achieving linear receiver when the user symbols are Gaussian. In every case the theoretical justification comes from the theorem MMSE = LMMSE for Jointly Gaussian Pairs: under the Gaussian signaling assumptions, the LMMSE estimator is the true MMSE estimator.
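The zero-forcing comparison above can be sketched in a few lines. This is a minimal Monte Carlo illustration, assuming unit-power Gaussian symbols and a random $4\times 4$ channel of my own choosing; it applies both detectors to the same received samples and compares their empirical MSEs:

```python
import numpy as np

rng = np.random.default_rng(2)
n_tx, n_rx, sigma2 = 4, 4, 0.5            # illustrative sizes and noise power

# Random complex channel, assumed known at the receiver
H = (rng.standard_normal((n_rx, n_tx))
     + 1j * rng.standard_normal((n_rx, n_tx))) / np.sqrt(2)

# LMMSE and zero-forcing detectors (unit-power symbols assumed)
G_lmmse = np.linalg.inv(H.conj().T @ H + sigma2 * np.eye(n_tx)) @ H.conj().T
G_zf = np.linalg.pinv(H)

# Monte Carlo MSE comparison on the same received data
N = 50_000
X = (rng.standard_normal((N, n_tx)) + 1j * rng.standard_normal((N, n_tx))) / np.sqrt(2)
Z = np.sqrt(sigma2 / 2) * (rng.standard_normal((N, n_rx))
                           + 1j * rng.standard_normal((N, n_rx)))
Y = X @ H.T + Z

mse_lmmse = np.mean(np.abs(X - Y @ G_lmmse.T) ** 2)
mse_zf = np.mean(np.abs(X - Y @ G_zf.T) ** 2)
print(mse_lmmse, mse_zf)                  # LMMSE MSE below the ZF MSE
```

Zero-forcing inverts the channel exactly and pays for it with amplified noise on ill-conditioned directions; the regularizing $\sigma_w^2 \mathbf{I}$ term inside the LMMSE inverse is what damps that enhancement.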

See full treatment in Chapter 11

Historical Note: Wiener, Kolmogorov, and the Birth of LMMSE

1941–1950

Norbert Wiener's classified MIT report Extrapolation, Interpolation, and Smoothing of Stationary Time Series (1942, declassified 1949) derived what we now call the LMMSE estimator in the infinite-dimensional setting of wide-sense stationary processes, motivated by the problem of predicting the trajectory of enemy aircraft for anti-aircraft fire control. Andrey Kolmogorov had obtained essentially the same result independently in 1941. The finite-dimensional matrix version presented here is the natural sampled-time specialization of the Wiener–Kolmogorov filter, and it underpins the Kalman filter (Chapter 10) as a recursive computation along the time axis.

Common Mistake: Marginal Gaussian ≠ Jointly Gaussian

Mistake:

"Both $\theta$ and $Y$ are Gaussian, so the pair $(\theta, Y)$ is jointly Gaussian and I can use $\hat\theta_{\text{MMSE}} = \hat\theta_{\text{LMMSE}}$."

Correction:

It is perfectly possible to have $\theta \sim \mathcal{N}(0,1)$ and $Y \sim \mathcal{N}(0,1)$ without the pair being jointly Gaussian. A classical counterexample: let $\theta \sim \mathcal{N}(0,1)$ and $Y = S\theta$ with $S \in \{+1,-1\}$ equiprobable and independent of $\theta$. Both marginals are standard Gaussian, but the pair is not jointly Gaussian: its distribution is supported on the two lines $Y = \pm\theta$. In this particular example the MMSE estimator $\mathbb{E}[\theta \mid Y] = 0$ happens to coincide with the LMMSE estimator (both are zero by symmetry), but for a general non-jointly-Gaussian pair the MMSE estimator is nonlinear and strictly beats the LMMSE. Always check joint Gaussianity before identifying MMSE with LMMSE.
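To see a strict gap rather than a coincidental tie, here is a variant counterexample of my own (not the $S\theta$ construction above): $Y \sim \mathcal{N}(0,1)$ and $\theta = Y^2$. Since $\operatorname{Cov}(\theta, Y) = \mathbb{E}[Y^3] = 0$, the best affine estimator is the constant $\mathbb{E}[\theta] = 1$, while the true MMSE estimator $\mathbb{E}[\theta \mid Y] = Y^2$ achieves zero error:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 200_000

# Y standard Gaussian; theta = Y^2 is determined by Y but not jointly Gaussian with it
Y = rng.standard_normal(N)
theta = Y ** 2

# LMMSE: Cov(theta, Y) = E[Y^3] = 0, so the best affine estimator is the constant mean
lmmse = np.full(N, theta.mean())

# MMSE: E[theta | Y] = Y^2 exactly, a nonlinear function of Y
mmse = Y ** 2

print(np.mean((theta - lmmse) ** 2))      # ~ Var(Y^2) = 2
print(np.mean((theta - mmse) ** 2))       # 0
```

The LMMSE pays the full prior variance $\operatorname{Var}(Y^2) = 2$ while the nonlinear MMSE estimator is exact, which is as large a gap as it gets.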

Quick Check

Under which of the following conditions is the LMMSE estimator necessarily equal to the MMSE estimator?

θ\theta and YY are each Gaussian

(θ,Y)(\theta,Y) is jointly Gaussian

θ\theta is Gaussian and the channel is linear

YY is a linear function of θ\theta plus noise of any distribution