The Linear MMSE Estimator

Why Restrict to Linear Estimators?

The MMSE estimator $\mathbb{E}[X|Y]$ is optimal, but computing it requires knowing the full conditional density $f(x|y)$, which is often unavailable or intractable. A pragmatic alternative: restrict the estimator to be a linear (really, affine) function of the observations.

The linear MMSE (LMMSE) estimator only requires the first and second moments: means, variances, and covariances. These are typically easier to estimate from data. The price is sub-optimality, but for jointly Gaussian data there is no price at all: LMMSE = MMSE.

Definition: Linear MMSE Estimator (Scalar Case)

The linear MMSE (LMMSE) estimator of $X$ given $Y$ is

$$\hat{X}_{\text{LMMSE}} = a^* Y + b^*$$

where $(a^*, b^*)$ minimize $\mathbb{E}[(X - aY - b)^2]$ over all $a, b \in \mathbb{R}$.

The solution is

$$\hat{X}_{\text{LMMSE}} = \mu_X + \frac{\text{Cov}(X,Y)}{\text{Var}(Y)}(Y - \mu_Y).$$

The resulting MSE is

$$\text{MSE}_{\text{LMMSE}} = \text{Var}(X)\left(1 - \rho_{XY}^2\right)$$

where $\rho_{XY} = \text{Cov}(X,Y)/(\sigma_X \sigma_Y)$ is the correlation coefficient.

The LMMSE estimator depends on the joint distribution of $X$ and $Y$ only through first and second moments; no knowledge of the full distribution is needed.
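A minimal numerical sketch of this (the toy model, sample size, and parameters are all hypothetical): estimate the moments from samples, then form the scalar LMMSE estimator.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy model (not from the text): X is a signal, Y = X + noise.
n = 100_000
x = rng.normal(loc=2.0, scale=1.5, size=n)
y = x + rng.normal(scale=1.0, size=n)

# The LMMSE estimator needs only first and second moments, estimated from data.
mu_x, mu_y = x.mean(), y.mean()
cov_xy = np.cov(x, y)[0, 1]
var_y = y.var(ddof=1)

a = cov_xy / var_y    # a* = Cov(X,Y) / Var(Y)  (~0.69 for these parameters)
b = mu_x - a * mu_y   # b* = mu_X - a* mu_Y     (~0.62 for these parameters)
x_hat = a * y + b

# Empirical MSE should approach Var(X)(1 - rho^2), ~0.69 here.
print(f"a* = {a:.3f}, b* = {b:.3f}, MSE = {np.mean((x - x_hat) ** 2):.3f}")
```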


Theorem: Derivation of the LMMSE Estimator

The optimal affine estimator $\hat{X} = aY + b$ minimizing $\mathbb{E}[(X - aY - b)^2]$ is given by

$$a^* = \frac{\text{Cov}(X,Y)}{\text{Var}(Y)}, \qquad b^* = \mu_X - a^* \mu_Y.$$

Setting the gradient of the MSE to zero gives two linear equations (the normal equations). The slope $a^*$ is the regression coefficient: exactly the slope of the least-squares regression line.
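Writing that step out: setting the two partial derivatives to zero yields

$$\frac{\partial}{\partial b}\,\mathbb{E}[(X - aY - b)^2] = -2\,\mathbb{E}[X - aY - b] = 0 \quad\Rightarrow\quad b = \mu_X - a\,\mu_Y,$$

$$\frac{\partial}{\partial a}\,\mathbb{E}[(X - aY - b)^2] = -2\,\mathbb{E}[(X - aY - b)\,Y] = 0 \quad\Rightarrow\quad a\,\text{Var}(Y) = \text{Cov}(X,Y),$$

where the second implication uses the first to eliminate $b$. Dividing by $\text{Var}(Y)$ gives $a^*$.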

Definition: LMMSE Estimator (Vector Case: Wiener-Hopf Equation)

For a random variable $X$ and observation vector $\mathbf{Y} \in \mathbb{R}^m$, the LMMSE estimator is

$$\hat{X}_{\text{LMMSE}} = \mu_X + \mathbf{C}_{X\mathbf{Y}} \mathbf{C}_{\mathbf{Y}\mathbf{Y}}^{-1}(\mathbf{Y} - \boldsymbol{\mu}_{\mathbf{Y}})$$

where $\mathbf{C}_{X\mathbf{Y}} = \mathbb{E}[(X - \mu_X)(\mathbf{Y} - \boldsymbol{\mu}_{\mathbf{Y}})^\mathsf{T}] \in \mathbb{R}^{1 \times m}$ is the cross-covariance (row) vector and $\mathbf{C}_{\mathbf{Y}\mathbf{Y}}$ is the covariance matrix of $\mathbf{Y}$.

More generally, for a random vector $\mathbf{X} \in \mathbb{R}^n$:

$$\hat{\mathbf{X}}_{\text{LMMSE}} = \boldsymbol{\mu}_{\mathbf{X}} + \mathbf{C}_{\mathbf{X}\mathbf{Y}} \mathbf{C}_{\mathbf{Y}\mathbf{Y}}^{-1}(\mathbf{Y} - \boldsymbol{\mu}_{\mathbf{Y}}).$$

This is the Wiener-Hopf equation.

The matrix $\mathbf{C}_{\mathbf{X}\mathbf{Y}} \mathbf{C}_{\mathbf{Y}\mathbf{Y}}^{-1}$ plays the role of the regression coefficient $a^*$ from the scalar case.
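A minimal numerical sketch of the vector formula (all toy moments below are hypothetical), computing the gain with a linear solve rather than an explicit inverse:

```python
import numpy as np

# Hypothetical moments for a scalar X observed through Y in R^2 (toy numbers).
mu_x = 1.0
mu_y = np.array([0.5, -0.2])
c_xy = np.array([[0.8, 0.3]])        # C_XY, shape (1, m)
c_yy = np.array([[2.0, 0.5],
                 [0.5, 1.0]])        # C_YY, shape (m, m)
y = np.array([1.2, 0.1])             # one observed vector

# Solve C_YY K^T = C_YX instead of forming the inverse explicitly.
k = np.linalg.solve(c_yy, c_xy.T).T  # gain K = C_XY C_YY^{-1}
x_hat = mu_x + k @ (y - mu_y)
print(x_hat)                         # LMMSE estimate of X
```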


Theorem: LMMSE Equals MMSE for Jointly Gaussian Random Variables

If $(\mathbf{X}, \mathbf{Y})$ is jointly Gaussian, then

$$\hat{\mathbf{X}}_{\text{LMMSE}} = \mathbb{E}[\mathbf{X}|\mathbf{Y}] = \hat{\mathbf{X}}_{\text{MMSE}}.$$

The LMMSE estimator is the MMSE estimator: restricting to linear functions costs nothing.

For jointly Gaussian variables, the conditional expectation $\mathbb{E}[\mathbf{X}|\mathbf{Y}]$ is already a linear function of $\mathbf{Y}$ (see the example *Conditional Expectation for Jointly Gaussian $(X,Y)$*). So the linear estimator class already contains the optimal estimator.
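To see that Gaussianity matters, here is a quick numerical contrast on a hypothetical non-Gaussian pair ($X = Y^3$ with Gaussian $Y$; not from the text), where the optimal estimator is nonlinear and the LMMSE pays a real price:

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(size=1_000_000)
x = y**3                                  # nonlinear in Y, so (X, Y) is not jointly Gaussian

a = np.cov(x, y)[0, 1] / y.var()          # LMMSE slope, ~ E[Y^4] = 3
x_lmmse = x.mean() + a * (y - y.mean())   # best linear estimate
x_mmse = y**3                             # E[X|Y] = Y^3: the (nonlinear) MMSE estimate

print(f"LMMSE MSE ~ {np.mean((x - x_lmmse) ** 2):.2f}")  # ~6 = Var(X) - Cov^2/Var(Y)
print(f"MMSE  MSE = {np.mean((x - x_mmse) ** 2):.2f}")   # 0
```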


Example: LMMSE Channel Estimation

A channel coefficient $H \sim \mathcal{CN}(0, \sigma_H^2)$ is observed through $Y = H \cdot s + W$, where $s$ is a known pilot symbol and $W \sim \mathcal{CN}(0, \sigma^2)$ is independent noise. Find the LMMSE estimate of $H$.
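A sketch of the computation, using the scalar formula with complex covariances $\text{Cov}(H,Y) = \mathbb{E}[HY^*]$ and zero means:

$$\mathbb{E}[HY^*] = \mathbb{E}[H(Hs + W)^*] = s^* \sigma_H^2, \qquad \mathbb{E}[|Y|^2] = |s|^2 \sigma_H^2 + \sigma^2,$$

so

$$\hat{H}_{\text{LMMSE}} = \frac{s^* \sigma_H^2}{|s|^2 \sigma_H^2 + \sigma^2}\, Y, \qquad \text{MSE} = \sigma_H^2 - \frac{|s|^2 \sigma_H^4}{|s|^2 \sigma_H^2 + \sigma^2} = \frac{\sigma_H^2 \sigma^2}{|s|^2 \sigma_H^2 + \sigma^2}.$$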


LMMSE MSE vs. SNR for Channel Estimation

[Interactive plot: the LMMSE mean square error as a function of SNR for channel estimation, compared with the LS estimator MSE and the prior variance (no observation). Adjustable parameters: channel variance (default 1) and upper end of the SNR range (default 20).]
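A sketch that reproduces this comparison, assuming the channel-estimation model above with unit pilot power (so $\text{SNR} = \sigma_H^2 |s|^2 / \sigma^2$), the widget's default parameters, and an SNR axis in dB:

```python
import numpy as np
import matplotlib.pyplot as plt

# Default widget parameters: channel variance 1, SNR from 0 to 20 (assumed dB).
var_h = 1.0
snr_db = np.linspace(0, 20, 200)
snr = 10 ** (snr_db / 10)

mse_lmmse = var_h / (1 + snr)         # sigma_H^2 sigma^2 / (|s|^2 sigma_H^2 + sigma^2)
mse_ls = var_h / snr                  # LS: h_hat = Y/s, MSE = sigma^2 / |s|^2
mse_prior = np.full_like(snr, var_h)  # no observation: MSE = prior variance

plt.semilogy(snr_db, mse_lmmse, label="LMMSE")
plt.semilogy(snr_db, mse_ls, "--", label="LS")
plt.semilogy(snr_db, mse_prior, ":", label="prior variance")
plt.xlabel("SNR (dB)")
plt.ylabel("MSE")
plt.legend()
plt.show()
```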

MMSE vs. LMMSE Estimator

| Property | MMSE | LMMSE |
| --- | --- | --- |
| Formula | $\mathbb{E}[X \mid Y]$ | $\mu_X + \mathbf{C}_{XY}\mathbf{C}_{YY}^{-1}(\mathbf{Y} - \boldsymbol{\mu}_Y)$ |
| Requires | Full conditional density | First and second moments only |
| Optimality | Best among *all* estimators | Best among *linear* estimators |
| MSE | $\mathbb{E}[\text{Var}(X \mid Y)]$ | $\text{Var}(X)(1 - \rho^2)$ (scalar) |
| Gaussian case | Linear (coincides with LMMSE) | Equals MMSE |
| Non-Gaussian case | Generally nonlinear | Sub-optimal but tractable |

Common Mistake: LMMSE Requires Inversion of $\mathbf{C}_{YY}$

Mistake:

Blindly applying the LMMSE formula $\mathbf{C}_{XY}\mathbf{C}_{YY}^{-1}$ when $\mathbf{C}_{YY}$ is singular or ill-conditioned.

Correction:

If $\mathbf{C}_{YY}$ is singular, some observations are linearly dependent and can be removed. In practice, use the pseudo-inverse or regularize: $\mathbf{C}_{XY}(\mathbf{C}_{YY} + \epsilon \mathbf{I})^{-1}$. For high-dimensional problems, Tikhonov regularization is standard.
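A minimal sketch of the regularized gain (the toy covariances and the helper name are illustrative):

```python
import numpy as np

def lmmse_gain(c_xy, c_yy, eps=1e-6):
    """Regularized LMMSE gain: solves (C_YY + eps*I) K^T = C_YX."""
    m = c_yy.shape[0]
    return np.linalg.solve(c_yy + eps * np.eye(m), c_xy.T).T

# Singular C_YY: the second observation exactly duplicates the first.
c_yy = np.array([[1.0, 1.0],
                 [1.0, 1.0]])
c_xy = np.array([[0.5, 0.5]])

print(lmmse_gain(c_xy, c_yy))  # finite gain (~[0.25, 0.25]) despite singularity
# Alternative: np.linalg.pinv(c_yy) @ c_xy.T when tuning eps is undesirable.
```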

Why This Matters: From LMMSE to Wiener Filtering and MIMO Detection

The LMMSE estimator is the finite-dimensional version of the Wiener filter. In MIMO detection, the received signal is $\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{w}$, where $\mathbf{x}$ is the transmitted signal. The LMMSE detector is

$$\hat{\mathbf{x}}_{\text{LMMSE}} = \mathbf{C}_{\mathbf{x}\mathbf{x}}\mathbf{H}^H(\mathbf{H}\mathbf{C}_{\mathbf{x}\mathbf{x}}\mathbf{H}^H + \sigma^2\mathbf{I})^{-1}\mathbf{y}.$$

This is used in every modern wireless receiver, from 4G LTE to 5G NR massive MIMO. It balances noise suppression against interference mitigation.
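A toy numerical sketch of this detector (a hypothetical 4x4 channel with QPSK symbols, taking $\mathbf{C}_{\mathbf{x}\mathbf{x}} = \mathbf{I}$ so the formula simplifies):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical 4x4 MIMO link, unit-power inputs (C_xx = I), QPSK symbols.
n_tx, n_rx, sigma2 = 4, 4, 0.1
h = (rng.normal(size=(n_rx, n_tx)) + 1j * rng.normal(size=(n_rx, n_tx))) / np.sqrt(2)
x = (rng.choice([-1, 1], size=n_tx) + 1j * rng.choice([-1, 1], size=n_tx)) / np.sqrt(2)
w = np.sqrt(sigma2 / 2) * (rng.normal(size=n_rx) + 1j * rng.normal(size=n_rx))
y = h @ x + w

# LMMSE detector with C_xx = I: x_hat = H^H (H H^H + sigma^2 I)^{-1} y
gram = h @ h.conj().T + sigma2 * np.eye(n_rx)
x_hat = h.conj().T @ np.linalg.solve(gram, y)
print(np.round(x_hat, 2))  # soft symbol estimates near the transmitted QPSK points
```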


Historical Note: Norbert Wiener and the Birth of Optimal Filtering

1941-1949

The LMMSE framework traces back to Norbert Wiener's wartime work on anti-aircraft fire control (classified report, 1942; published as *Extrapolation, Interpolation, and Smoothing of Stationary Time Series* in 1949). Wiener posed the problem of predicting a signal corrupted by noise, and his solution, the Wiener filter, is the continuous-time version of the LMMSE estimator. Independently, Andrey Kolmogorov solved a similar problem in 1941. The discrete-time formulation as a matrix equation (the Wiener-Hopf equation) became a cornerstone of signal processing after the work of Levinson and Durbin in the 1940s-50s.

Quick Check

If $X$ and $Y$ are uncorrelated (but not necessarily independent), what is the LMMSE estimate $\hat{X}_{\text{LMMSE}}$?

- $\hat{X}_{\text{LMMSE}} = 0$
- $\hat{X}_{\text{LMMSE}} = \mu_X$
- $\hat{X}_{\text{LMMSE}} = \mathbb{E}[X|Y]$
- $\hat{X}_{\text{LMMSE}} = Y$

LMMSE Estimation Algorithm

Complexity: $O(m^3)$ where $m = \dim(\mathbf{Y})$, dominated by the matrix inversion
Input: Observations $\mathbf{Y}$, prior moments $\boldsymbol{\mu}_{\mathbf{X}}, \boldsymbol{\mu}_{\mathbf{Y}}, \mathbf{C}_{\mathbf{XX}}, \mathbf{C}_{\mathbf{XY}}, \mathbf{C}_{\mathbf{YY}}$
Output: LMMSE estimate $\hat{\mathbf{X}}_{\text{LMMSE}}$ and error covariance $\mathbf{C}_{\tilde{\mathbf{X}}}$
1. Compute the LMMSE gain: $\mathbf{K} \leftarrow \mathbf{C}_{\mathbf{XY}} \mathbf{C}_{\mathbf{YY}}^{-1}$
2. Compute the estimate: $\hat{\mathbf{X}} \leftarrow \boldsymbol{\mu}_{\mathbf{X}} + \mathbf{K}(\mathbf{Y} - \boldsymbol{\mu}_{\mathbf{Y}})$
3. Compute the error covariance: $\mathbf{C}_{\tilde{\mathbf{X}}} \leftarrow \mathbf{C}_{\mathbf{XX}} - \mathbf{C}_{\mathbf{XY}}\mathbf{C}_{\mathbf{YY}}^{-1}\mathbf{C}_{\mathbf{YX}}$
4. Return $\hat{\mathbf{X}}, \mathbf{C}_{\tilde{\mathbf{X}}}$

In practice, solve $\mathbf{C}_{\mathbf{YY}} \mathbf{K}^\mathsf{T} = \mathbf{C}_{\mathbf{YX}}$ via Cholesky factorization rather than explicit inversion.
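A compact sketch of the full algorithm in that spirit (assuming real-valued moments and a positive-definite $\mathbf{C}_{\mathbf{YY}}$; the function name and toy numbers are illustrative):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def lmmse(y, mu_x, mu_y, c_xx, c_xy, c_yy):
    """LMMSE estimate and error covariance via Cholesky (no explicit inverse)."""
    chol = cho_factor(c_yy)            # factor C_YY once
    k = cho_solve(chol, c_xy.T).T      # step 1: gain K from C_YY K^T = C_YX
    x_hat = mu_x + k @ (y - mu_y)      # step 2: estimate
    c_err = c_xx - k @ c_xy.T          # step 3: C_XX - K C_YX
    return x_hat, c_err

# Same toy moments as in the vector-case sketch above, with C_XX = Var(X) = 1.
x_hat, c_err = lmmse(
    y=np.array([1.2, 0.1]),
    mu_x=np.array([1.0]),
    mu_y=np.array([0.5, -0.2]),
    c_xx=np.array([[1.0]]),
    c_xy=np.array([[0.8, 0.3]]),
    c_yy=np.array([[2.0, 0.5], [0.5, 1.0]]),
)
print(x_hat, c_err)
```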

🎓 CommIT Contribution (2021)

LMMSE Channel Estimation for Massive MIMO-OFDM

K. Ito and G. Caire, IEEE Trans. Wireless Communications, vol. 20, no. 12

Ito and Caire developed a structured LMMSE channel estimator for massive MIMO-OFDM that exploits the Kronecker structure of the time-frequency channel covariance. Their approach reduces the complexity from $O(N_t^2 N_{\text{sub}}^2)$ to $O(N_t^2 + N_{\text{sub}}^2)$ by decomposing the problem into separate spatial and frequency-domain estimation steps. The key insight is that the LMMSE formula $\mathbf{C}_{H\mathbf{Y}}\mathbf{C}_{\mathbf{Y}\mathbf{Y}}^{-1}$ admits a Kronecker factorization when the channel has separable spatial-frequency statistics.

🔧 Engineering Note

Computational Complexity of LMMSE in Practice

The LMMSE estimator requires inverting the $m \times m$ observation covariance matrix $\mathbf{C}_{\mathbf{YY}}$, costing $O(m^3)$. For massive MIMO with hundreds of antennas, this can be prohibitive. Common strategies: (1) exploit Toeplitz/block-Toeplitz structure (Wiener filtering in the frequency domain via FFT); (2) diagonal approximation: ignore off-diagonal covariance terms; (3) reduced-rank methods: project onto the dominant eigenvectors of $\mathbf{C}_{\mathbf{YY}}$; (4) iterative methods (conjugate gradient) that avoid explicit inversion, as sketched below.
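A sketch of strategy (4) for a scalar $X$, on a synthetic covariance (all sizes and values hypothetical): conjugate gradient solves $\mathbf{C}_{\mathbf{YY}}\mathbf{k} = \mathbf{c}_{\mathbf{Y}X}$ while touching $\mathbf{C}_{\mathbf{YY}}$ only through matrix-vector products.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

rng = np.random.default_rng(3)

# Synthetic SPD covariance; in practice the matvec would exploit structure (e.g. FFT).
m = 500
a = rng.normal(size=(m, m))
c_yy = a @ a.T / m + np.eye(m)
c_yx = rng.normal(size=m)

op = LinearOperator((m, m), matvec=lambda v: c_yy @ v, dtype=np.float64)
k, info = cg(op, c_yx)  # no O(m^3) factorization or inversion

print(info == 0, np.linalg.norm(c_yy @ k - c_yx))  # convergence flag, small residual
```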

LMMSE (Linear Minimum Mean Square Error)

The affine estimator $\hat{X} = \mathbf{a}^\mathsf{T}\mathbf{Y} + b$ that minimizes $\mathbb{E}[(X - \hat{X})^2]$. Requires only first and second moments. Equals the MMSE estimator when the joint distribution is Gaussian.

Related: Minimum Mean Square Error (MMSE) Estimator, Wiener Filter

Wiener-Hopf Equation

The matrix equation $\hat{\mathbf{X}} = \boldsymbol{\mu}_{\mathbf{X}} + \mathbf{C}_{\mathbf{XY}}\mathbf{C}_{\mathbf{YY}}^{-1}(\mathbf{Y} - \boldsymbol{\mu}_{\mathbf{Y}})$ that defines the LMMSE estimator. Named after Norbert Wiener and Eberhard Hopf.

Related: LMMSE Estimator (Vector Case: Wiener-Hopf Equation), Wiener Filter