The Linear MMSE Estimator
Why Restrict to Linear Estimators?
The MMSE estimator is optimal, but computing it requires knowing the full conditional density $f_{X|Y}(x \mid y)$, which is often unavailable or intractable. A pragmatic alternative: restrict the estimator to be a linear (really, affine) function of the observations.
The linear MMSE (LMMSE) estimator only requires the first and second moments: means, variances, and covariances. These are typically easier to estimate from data. The price is sub-optimality, but for jointly Gaussian data there is no price at all: LMMSE = MMSE.
Definition: Linear MMSE Estimator (Scalar Case)
The linear MMSE (LMMSE) estimator of $X$ given $Y$ is
$$\hat{X}_{\text{LMMSE}} = a^* Y + b^*$$
where $a^*, b^*$ minimize $\mathbb{E}[(X - aY - b)^2]$ over all $a, b \in \mathbb{R}$.
The solution is
$$a^* = \frac{\operatorname{Cov}(X, Y)}{\operatorname{Var}(Y)}, \qquad b^* = \mathbb{E}[X] - a^* \mathbb{E}[Y],$$
so that
$$\hat{X}_{\text{LMMSE}} = \mathbb{E}[X] + \frac{\operatorname{Cov}(X, Y)}{\operatorname{Var}(Y)}\,(Y - \mathbb{E}[Y]).$$
The LMMSE MSE is
$$\text{MSE}_{\text{LMMSE}} = \sigma_X^2\,(1 - \rho_{XY}^2),$$
where $\rho_{XY} = \operatorname{Cov}(X, Y)/(\sigma_X \sigma_Y)$ is the correlation coefficient.
The LMMSE estimator depends on the joint distribution of $(X, Y)$ only through its first and second moments. No knowledge of the full distribution is needed.
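The scalar formulas can be checked numerically. This is a minimal sketch under an assumed joint distribution ($X \sim \mathcal{N}(1, 4)$, $Y$ a noisy linear function of $X$); note that only sample moments are used, exactly as the LMMSE estimator requires.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed joint distribution: X ~ N(1, 4), Y = 0.5 X + unit-variance noise.
x = rng.normal(1.0, 2.0, 100_000)
y = 0.5 * x + rng.normal(0.0, 1.0, 100_000)

# LMMSE coefficients from first and second moments only:
#   a = Cov(X, Y) / Var(Y),  b = E[X] - a E[Y]
a = np.cov(x, y)[0, 1] / np.var(y)
b = x.mean() - a * y.mean()
x_hat = a * y + b

# Empirical MSE should match sigma_X^2 (1 - rho^2)
rho = np.corrcoef(x, y)[0, 1]
mse_empirical = np.mean((x - x_hat) ** 2)
mse_theory = np.var(x) * (1 - rho**2)
print(mse_empirical, mse_theory)  # both close to 2
```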
Theorem: Derivation of the LMMSE Estimator
The optimal affine estimator minimizing $\mathbb{E}[(X - aY - b)^2]$ is given by
$$\hat{X}_{\text{LMMSE}} = \mathbb{E}[X] + \frac{\operatorname{Cov}(X, Y)}{\operatorname{Var}(Y)}\,(Y - \mathbb{E}[Y]).$$
Setting the gradient of the MSE to zero gives two linear equations (the normal equations). The slope $a^* = \operatorname{Cov}(X, Y)/\operatorname{Var}(Y)$ is the regression coefficient, exactly the slope of the least-squares regression line.
Write the MSE
$$J(a, b) = \mathbb{E}[(X - aY - b)^2].$$
Expanding:
$$J(a, b) = \mathbb{E}[X^2] - 2a\,\mathbb{E}[XY] - 2b\,\mathbb{E}[X] + a^2\,\mathbb{E}[Y^2] + 2ab\,\mathbb{E}[Y] + b^2.$$
Differentiate with respect to $b$
$$\frac{\partial J}{\partial b} = -2\,\mathbb{E}[X] + 2a\,\mathbb{E}[Y] + 2b = 0 \quad\Longrightarrow\quad b = \mathbb{E}[X] - a\,\mathbb{E}[Y].$$
Substitute and differentiate with respect to $a$
Substituting $b = \mathbb{E}[X] - a\,\mathbb{E}[Y]$, the MSE becomes a function of the centered variables $\tilde{X} = X - \mathbb{E}[X]$ and $\tilde{Y} = Y - \mathbb{E}[Y]$:
$$J(a) = \mathbb{E}[(\tilde{X} - a\tilde{Y})^2] = \operatorname{Var}(X) - 2a\,\operatorname{Cov}(X, Y) + a^2\,\operatorname{Var}(Y).$$
Differentiating:
$$\frac{dJ}{da} = -2\,\operatorname{Cov}(X, Y) + 2a\,\operatorname{Var}(Y) = 0 \quad\Longrightarrow\quad a^* = \frac{\operatorname{Cov}(X, Y)}{\operatorname{Var}(Y)}.$$
Definition: LMMSE Estimator (Vector Case: Wiener-Hopf Equation)
For a random variable $X$ and observation vector $\mathbf{Y} = (Y_1, \dots, Y_m)^T$, the LMMSE estimator is
$$\hat{X}_{\text{LMMSE}} = \mathbb{E}[X] + \Sigma_{XY} \Sigma_Y^{-1} (\mathbf{Y} - \mathbb{E}[\mathbf{Y}]),$$
where $\Sigma_{XY} = \operatorname{Cov}(X, \mathbf{Y})$ is the cross-covariance (row) vector and $\Sigma_Y$ is the covariance matrix of $\mathbf{Y}$.
More generally, for a random vector $\mathbf{X}$:
$$\hat{\mathbf{X}}_{\text{LMMSE}} = \boldsymbol{\mu}_X + \Sigma_{XY} \Sigma_Y^{-1} (\mathbf{Y} - \boldsymbol{\mu}_Y).$$
The linear system $\mathbf{W} \Sigma_Y = \Sigma_{XY}$ defining the optimal weight matrix $\mathbf{W}$ is the Wiener-Hopf equation.
The matrix $\Sigma_{XY} \Sigma_Y^{-1}$ plays the role of the regression coefficient $\operatorname{Cov}(X, Y)/\operatorname{Var}(Y)$ from the scalar case.
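A small numerical sketch of the vector formula, under an assumed linear-Gaussian model (the dimensions and the mixing matrix `A` are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed linear-Gaussian model: estimate a 2-vector X from a 5-vector Y.
n, m, N = 2, 5, 200_000
A = rng.normal(size=(m, n))                        # Y = A X + noise
X = rng.normal(size=(N, n)) + np.array([1.0, -1.0])
Y = X @ A.T + 0.5 * rng.normal(size=(N, m))

mu_X, mu_Y = X.mean(axis=0), Y.mean(axis=0)
Sigma_Y = np.cov(Y, rowvar=False)                  # m x m covariance of Y
Sigma_XY = np.cov(X.T, Y.T)[:n, n:]                # n x m cross-covariance

# Wiener-Hopf solution: X_hat = mu_X + Sigma_XY Sigma_Y^{-1} (Y - mu_Y)
W = Sigma_XY @ np.linalg.inv(Sigma_Y)
X_hat = mu_X + (Y - mu_Y) @ W.T
mse = np.mean((X - X_hat) ** 2)
print(mse)  # well below the prior variance of 1.0
```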
Theorem: LMMSE Equals MMSE for Jointly Gaussian Random Variables
If $(X, Y)$ is jointly Gaussian, then
$$\hat{X}_{\text{MMSE}} = \mathbb{E}[X \mid Y] = \mathbb{E}[X] + \frac{\operatorname{Cov}(X, Y)}{\operatorname{Var}(Y)}\,(Y - \mathbb{E}[Y]) = \hat{X}_{\text{LMMSE}}.$$
The LMMSE estimator is the MMSE estimator: restricting to linear functions costs nothing.
For jointly Gaussian variables, the conditional expectation $\mathbb{E}[X \mid Y]$ is already a linear function of $Y$ (see the example on conditional expectation for jointly Gaussian random variables). So the linear estimator class already contains the optimal estimator.
Recall the Gaussian conditional distribution
For $(X, Y)$ jointly Gaussian with means $\mu_X, \mu_Y$, variances $\sigma_X^2, \sigma_Y^2$, and correlation coefficient $\rho$,
the conditional distribution is
$$X \mid Y = y \;\sim\; \mathcal{N}\!\left(\mu_X + \rho\,\frac{\sigma_X}{\sigma_Y}(y - \mu_Y),\; \sigma_X^2 (1 - \rho^2)\right).$$
Read off the conditional mean
$$\mathbb{E}[X \mid Y] = \mu_X + \rho\,\frac{\sigma_X}{\sigma_Y}(Y - \mu_Y) = \mu_X + \frac{\operatorname{Cov}(X, Y)}{\operatorname{Var}(Y)}(Y - \mu_Y),$$
which is exactly the LMMSE formula.
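The theorem can also be checked empirically: for jointly Gaussian data, a nonparametric estimate of the conditional mean (averaging $X$ over a thin slice of $Y$) lands on the LMMSE line. A sketch with assumed moments:

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed moments for the jointly Gaussian pair.
mu_X, mu_Y, s_X, s_Y, rho = 0.0, 0.0, 1.0, 1.0, 0.8
cov = [[s_X**2, rho * s_X * s_Y], [rho * s_X * s_Y, s_Y**2]]
x, y = rng.multivariate_normal([mu_X, mu_Y], cov, 500_000).T

# Nonparametric estimate of E[X | Y ~ y0]: average x over a thin slice of y.
y0 = 1.0
slice_mean = x[np.abs(y - y0) < 0.05].mean()

# LMMSE prediction at y0: mu_X + rho (s_X / s_Y) (y0 - mu_Y) = 0.8 here.
lmmse_at_y0 = mu_X + rho * s_X / s_Y * (y0 - mu_Y)
print(slice_mean, lmmse_at_y0)  # the conditional mean lies on the LMMSE line
```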
Example: LMMSE Channel Estimation
A channel coefficient $H$ with mean zero and variance $\sigma_H^2$ is observed through $Y = sH + N$, where $s$ is a known pilot symbol and $N$ is independent zero-mean noise with variance $\sigma^2$. Find the LMMSE estimate of $H$.
Identify the moments
$$\mathbb{E}[H] = 0, \quad \mathbb{E}[Y] = 0, \quad \operatorname{Var}(H) = \sigma_H^2, \quad \operatorname{Cov}(H, Y) = s^* \sigma_H^2, \quad \operatorname{Var}(Y) = |s|^2 \sigma_H^2 + \sigma^2.$$
Apply the LMMSE formula
$$\hat{H}_{\text{LMMSE}} = \frac{s^* \sigma_H^2}{|s|^2 \sigma_H^2 + \sigma^2}\, Y = \frac{1}{s} \cdot \frac{\text{SNR}}{1 + \text{SNR}}\, Y, \qquad \text{SNR} = |s|^2 \sigma_H^2 / \sigma^2.$$
Interpret
At high SNR, $\hat{H}_{\text{LMMSE}} \to Y/s$ (the LS estimate). At low SNR, $\hat{H}_{\text{LMMSE}} \to 0$ (shrink toward the prior mean). The LMMSE estimator interpolates smoothly between these extremes. The LMMSE MSE is $\sigma_H^2 / (1 + \text{SNR})$.
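A simulation of this example, with assumed values for $\sigma_H^2$, $\sigma^2$, and the pilot $s$ (complex Gaussian channel and noise):

```python
import numpy as np

rng = np.random.default_rng(3)

# Assumed values: H ~ CN(0, sigma_H2), known pilot s, noise ~ CN(0, sigma2).
sigma_H2, sigma2, s, n = 1.0, 0.1, 1.0 + 1.0j, 100_000
H = np.sqrt(sigma_H2 / 2) * (rng.normal(size=n) + 1j * rng.normal(size=n))
noise = np.sqrt(sigma2 / 2) * (rng.normal(size=n) + 1j * rng.normal(size=n))
Y = s * H + noise

# LMMSE: shrink the LS estimate toward the prior mean (zero).
H_lmmse = np.conj(s) * sigma_H2 / (abs(s) ** 2 * sigma_H2 + sigma2) * Y
H_ls = Y / s                                   # least-squares estimate

snr = abs(s) ** 2 * sigma_H2 / sigma2          # here SNR = 20
mse_theory = sigma_H2 / (1 + snr)              # predicted LMMSE MSE = 1/21
mse_lmmse_emp = np.mean(np.abs(H - H_lmmse) ** 2)
mse_ls_emp = np.mean(np.abs(H - H_ls) ** 2)
print(mse_lmmse_emp, mse_theory, mse_ls_emp)
```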
LMMSE MSE vs. SNR for Channel Estimation
Plot the LMMSE mean square error $\sigma_H^2/(1 + \text{SNR})$ as a function of SNR for channel estimation. Compare with the LS estimator MSE $\sigma_H^2/\text{SNR}$ and the prior variance $\sigma_H^2$ (no observation). Parameters: the channel variance $\sigma_H^2$ and the upper end of the SNR range.
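A sketch of this comparison using the closed-form MSE expressions; the channel variance and SNR range are assumed values:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless-safe backend
import matplotlib.pyplot as plt

sigma_H2 = 1.0                           # channel variance (assumed value)
snr_db = np.linspace(-10, 30, 200)       # SNR range in dB (assumed)
snr = 10 ** (snr_db / 10)

mse_lmmse = sigma_H2 / (1 + snr)         # LMMSE MSE
mse_ls = sigma_H2 / snr                  # LS MSE = sigma^2 / |s|^2
mse_prior = np.full_like(snr, sigma_H2)  # no observation: prior variance

plt.semilogy(snr_db, mse_lmmse, label="LMMSE")
plt.semilogy(snr_db, mse_ls, label="LS")
plt.semilogy(snr_db, mse_prior, "--", label="prior variance")
plt.xlabel("SNR (dB)")
plt.ylabel("MSE")
plt.legend()
plt.grid(True)
plt.savefig("lmmse_mse_vs_snr.png")
```

The LMMSE curve hugs the prior variance at low SNR and merges with the LS curve at high SNR, never exceeding either.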
MMSE vs. LMMSE Estimator
| Property | MMSE | LMMSE |
|---|---|---|
| Formula | $\mathbb{E}[X \mid Y]$ | $\boldsymbol{\mu}_X + \Sigma_{XY}\Sigma_Y^{-1}(\mathbf{Y} - \boldsymbol{\mu}_Y)$ |
| Requires | Full conditional density | First and second moments only |
| Optimality | Best among ALL estimators | Best among LINEAR estimators |
| MSE | $\mathbb{E}[\operatorname{Var}(X \mid Y)]$ | $\sigma_X^2(1 - \rho^2)$ (scalar) |
| Gaussian case | Linear (coincides with LMMSE) | Equals MMSE |
| Non-Gaussian case | Generally nonlinear | Sub-optimal but tractable |
Common Mistake: LMMSE Requires Inversion of $\Sigma_Y$
Mistake:
Blindly applying the LMMSE formula when $\Sigma_Y$ is singular or ill-conditioned.
Correction:
If $\Sigma_Y$ is singular, some observations are linearly dependent and can be removed. In practice, use the pseudo-inverse or regularize: replace $\Sigma_Y$ with $\Sigma_Y + \epsilon I$. For high-dimensional problems, Tikhonov regularization is standard.
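A small illustration of both fixes, with a deliberately singular $\Sigma_Y$ (the second observation duplicates the first) and an assumed cross-covariance:

```python
import numpy as np

# Deliberately singular covariance: rows 1 and 2 are identical because
# the second observation duplicates the first.
Sigma_Y = np.array([[1.0, 1.0, 0.2],
                    [1.0, 1.0, 0.2],
                    [0.2, 0.2, 1.0]])
Sigma_XY = np.array([[0.5, 0.5, 0.1]])   # assumed cross-covariance

# Fix 1: Moore-Penrose pseudo-inverse (minimum-norm weights).
W_pinv = Sigma_XY @ np.linalg.pinv(Sigma_Y)

# Fix 2: Tikhonov regularization, Sigma_Y + eps I.
eps = 1e-6
W_reg = Sigma_XY @ np.linalg.inv(Sigma_Y + eps * np.eye(3))

print(W_pinv, W_reg)  # nearly identical weight vectors
```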
Why This Matters: From LMMSE to Wiener Filtering and MIMO Detection
The LMMSE estimator is the finite-dimensional version of the Wiener filter. In MIMO detection, the received signal is $\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n}$, where $\mathbf{x}$ is the transmitted signal. For unit-power symbols and noise variance $\sigma^2$, the LMMSE detector is
$$\hat{\mathbf{x}} = (\mathbf{H}^H \mathbf{H} + \sigma^2 \mathbf{I})^{-1} \mathbf{H}^H \mathbf{y}.$$
This is used in every modern wireless receiver, from 4G LTE to 5G NR massive MIMO. It balances noise suppression against interference mitigation.
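A minimal sketch of an LMMSE MIMO detector, assuming unit-power QPSK symbols and an i.i.d. complex Gaussian channel (the antenna counts and noise variance are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(5)

# Assumed setup: 4 transmit / 8 receive antennas, unit-power QPSK symbols.
Nt, Nr, sigma2 = 4, 8, 0.1
H = (rng.normal(size=(Nr, Nt)) + 1j * rng.normal(size=(Nr, Nt))) / np.sqrt(2)
x = (rng.choice([-1.0, 1.0], Nt) + 1j * rng.choice([-1.0, 1.0], Nt)) / np.sqrt(2)
noise = np.sqrt(sigma2 / 2) * (rng.normal(size=Nr) + 1j * rng.normal(size=Nr))
y = H @ x + noise

# LMMSE detector: x_hat = (H^H H + sigma2 I)^{-1} H^H y
x_hat = np.linalg.solve(H.conj().T @ H + sigma2 * np.eye(Nt), H.conj().T @ y)
print(np.round(x_hat, 2))
```

Using `np.linalg.solve` instead of forming the inverse is the standard numerically sound choice.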
Historical Note: Norbert Wiener and the Birth of Optimal Filtering
1941-1949. The LMMSE framework traces back to Norbert Wiener's wartime work on anti-aircraft fire control (classified report, 1942; published as Extrapolation, Interpolation, and Smoothing of Stationary Time Series in 1949). Wiener posed the problem of predicting a signal corrupted by noise, and his solution, the Wiener filter, is the continuous-time version of the LMMSE estimator. Independently, Andrey Kolmogorov solved a similar problem in 1941. The discrete-time formulation as a matrix equation (the Wiener-Hopf equation) became a cornerstone of signal processing after the work of Levinson and Durbin in the 1940s-50s.
Quick Check
If $X$ and $Y$ are uncorrelated (but not necessarily independent), what is the LMMSE estimate $\hat{X}_{\text{LMMSE}}$?
Since $\operatorname{Cov}(X, Y) = 0$, the LMMSE coefficient $a^* = 0$, so $\hat{X}_{\text{LMMSE}} = \mathbb{E}[X]$. The observation is useless for linear prediction.
LMMSE Estimation Algorithm
Complexity: $O(m^3)$, where $m = \dim(\mathbf{Y})$, dominated by the matrix inversion. In practice, solve $\Sigma_Y \mathbf{w} = \Sigma_{YX}$ via Cholesky factorization rather than explicit inversion.
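A sketch of the Cholesky-based solve with SciPy's `cho_factor`/`cho_solve`, on an assumed random symmetric positive-definite covariance:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

rng = np.random.default_rng(6)

# Assumed data: a random symmetric positive-definite covariance.
m = 50
A = rng.normal(size=(m, m))
Sigma_Y = A @ A.T + m * np.eye(m)      # SPD by construction
Sigma_YX = rng.normal(size=m)          # cross-covariance with a scalar X

# Solve Sigma_Y w = Sigma_YX by Cholesky instead of forming the inverse.
c, low = cho_factor(Sigma_Y)
w = cho_solve((c, low), Sigma_YX)
print(np.allclose(Sigma_Y @ w, Sigma_YX))  # True
```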
LMMSE Channel Estimation for Massive MIMO-OFDM
Ito and Caire developed a structured LMMSE channel estimator for massive MIMO-OFDM that exploits the Kronecker structure of the time-frequency channel covariance. Their approach reduces the complexity of the joint estimator by decomposing the problem into separate spatial and frequency-domain estimation steps. The key insight is that the LMMSE formula admits a Kronecker factorization when the channel has separable spatial-frequency statistics.
Computational Complexity of LMMSE in Practice
The LMMSE estimator requires inverting the observation covariance matrix $\Sigma_Y$, costing $O(m^3)$ for $m$ observations. For massive MIMO with hundreds of antennas, this can be prohibitive. Common strategies: (1) Exploit Toeplitz/block-Toeplitz structure (Wiener filtering in the frequency domain via the FFT). (2) Diagonal approximation: ignore off-diagonal covariance terms. (3) Reduced-rank methods: project onto the dominant eigenvectors of $\Sigma_Y$. (4) Iterative methods (conjugate gradient) that avoid explicit inversion.
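Strategy (4) can be sketched with SciPy's conjugate gradient, which touches $\Sigma_Y$ only through matrix-vector products (the SPD covariance here is an assumed random example):

```python
import numpy as np
from scipy.sparse.linalg import cg

rng = np.random.default_rng(7)

# Assumed: a large, well-conditioned SPD covariance matrix.
m = 500
B = rng.normal(size=(m, m)) / np.sqrt(m)
Sigma_Y = B @ B.T + np.eye(m)
Sigma_YX = rng.normal(size=m)

# Conjugate gradient needs only matrix-vector products with Sigma_Y,
# avoiding the O(m^3) factorization or explicit inverse entirely.
w, info = cg(Sigma_Y, Sigma_YX)
print(info, np.linalg.norm(Sigma_Y @ w - Sigma_YX))
```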
LMMSE (Linear Minimum Mean Square Error)
The affine estimator $aY + b$ that minimizes the mean square error $\mathbb{E}[(X - aY - b)^2]$. Requires only first and second moments. Equals the MMSE estimator when the joint distribution is Gaussian.
Related: Minimum Mean Square Error (MMSE) Estimator, Wiener Filter
Wiener-Hopf Equation
The matrix equation $\mathbf{W} \Sigma_Y = \Sigma_{XY}$ that defines the LMMSE weights. Named after Norbert Wiener and Eberhard Hopf.
Related: LMMSE Estimator (Vector Case: Wiener-Hopf Equation), Wiener Filter