LMMSE Estimation
Why Restrict to Linear Estimators?
The MMSE estimator $\mathbb{E}[\mathbf{X} \mid \mathbf{Y}]$ is typically nonlinear in $\mathbf{Y}$ and requires knowledge of the full joint distribution of $(\mathbf{X}, \mathbf{Y})$. Both are obstacles in practice: full joint distributions are hard to specify, and nonlinear conditional expectations rarely admit closed forms. If we settle for the best estimator among affine functions $\hat{\mathbf{X}} = \mathbf{A}\mathbf{Y} + \mathbf{b}$, the optimization requires only second-order statistics — means and covariances — and yields a closed-form answer. The price is suboptimality whenever the true posterior mean is nonlinear. When $(\mathbf{X}, \mathbf{Y})$ is jointly Gaussian, there is no price at all.
Definition: Linear MMSE (LMMSE) Estimator
Given a joint distribution of $(\mathbf{X}, \mathbf{Y})$ with known first and second moments $(\boldsymbol{\mu}_X, \boldsymbol{\mu}_Y, \mathbf{\Sigma}_X, \mathbf{\Sigma}_Y, \mathbf{\Sigma}_{XY})$, the LMMSE estimator is the affine function $\hat{\mathbf{X}} = \mathbf{A}\mathbf{Y} + \mathbf{b}$ of $\mathbf{Y}$ minimizing the mean-square error:
$$\hat{\mathbf{X}}_{\mathrm{L}} = \arg\min_{\mathbf{A},\,\mathbf{b}} \; \mathbb{E}\big[\|\mathbf{X} - (\mathbf{A}\mathbf{Y} + \mathbf{b})\|^2\big].$$
Theorem: The LMMSE Formula
Assume $\mathbf{\Sigma}_Y$ is positive definite. The LMMSE estimator is
$$\hat{\mathbf{X}}_{\mathrm{L}} = \boldsymbol{\mu}_X + \mathbf{\Sigma}_{XY}\mathbf{\Sigma}_Y^{-1}(\mathbf{Y} - \boldsymbol{\mu}_Y).$$
The LMMSE error covariance and total MSE are
$$\mathbf{\Sigma}_e = \mathbf{\Sigma}_X - \mathbf{\Sigma}_{XY}\mathbf{\Sigma}_Y^{-1}\mathbf{\Sigma}_{YX}, \qquad \mathrm{MSE} = \mathrm{tr}(\mathbf{\Sigma}_e).$$
The estimator has three parts: start from the prior mean $\boldsymbol{\mu}_X$, form the innovation $\mathbf{Y} - \boldsymbol{\mu}_Y$ (the observation minus what we'd predict without seeing it), and update by the gain matrix $\mathbf{K} = \mathbf{\Sigma}_{XY}\mathbf{\Sigma}_Y^{-1}$, which is the correlation between $\mathbf{X}$ and $\mathbf{Y}$ normalized by the spread of $\mathbf{Y}$.
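For instance, in the scalar case the formula reduces to
$$\hat{X}_{\mathrm{L}} = \mu_X + \frac{\sigma_{XY}}{\sigma_Y^2}(Y - \mu_Y), \qquad \mathrm{MSE} = \sigma_X^2 - \frac{\sigma_{XY}^2}{\sigma_Y^2} = \sigma_X^2(1 - \rho^2),$$
where $\rho = \sigma_{XY}/(\sigma_X \sigma_Y)$ is the correlation coefficient: the stronger the correlation, the larger the MSE reduction.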
Apply the orthogonality principle, restricted to the subspace of affine functions of $\mathbf{Y}$.
Setting $\mathbf{b}$: choose it so the residual has zero mean.
Setting $\mathbf{A}$: require that the residual is orthogonal to each component of $\mathbf{Y}$.
Fix the offset $\mathbf{b}$
Write $\hat{\mathbf{X}} = \mathbf{A}\mathbf{Y} + \mathbf{b}$. The residual $\mathbf{X} - \hat{\mathbf{X}}$ has mean $\boldsymbol{\mu}_X - \mathbf{A}\boldsymbol{\mu}_Y - \mathbf{b}$; this must be zero for the MSE not to contain a squared-bias term $\|\mathbb{E}[\mathbf{X} - \hat{\mathbf{X}}]\|^2$. Hence $\mathbf{b} = \boldsymbol{\mu}_X - \mathbf{A}\boldsymbol{\mu}_Y$, giving the affine form $\hat{\mathbf{X}} = \boldsymbol{\mu}_X + \mathbf{A}(\mathbf{Y} - \boldsymbol{\mu}_Y)$.
Orthogonality on the affine subspace
Substitute $\tilde{\mathbf{X}} = \mathbf{X} - \boldsymbol{\mu}_X$ and $\tilde{\mathbf{Y}} = \mathbf{Y} - \boldsymbol{\mu}_Y$ (zero-mean versions). The orthogonality principle on the affine subspace requires the residual to be uncorrelated with every linear function of $\tilde{\mathbf{Y}}$:
$$\mathbb{E}\big[(\tilde{\mathbf{X}} - \mathbf{A}\tilde{\mathbf{Y}})\,\tilde{\mathbf{Y}}^{\mathsf{T}}\big] = \mathbf{0}.$$
Solve the normal equations
Expanding yields $\mathbb{E}[\tilde{\mathbf{X}}\tilde{\mathbf{Y}}^{\mathsf{T}}] - \mathbf{A}\,\mathbb{E}[\tilde{\mathbf{Y}}\tilde{\mathbf{Y}}^{\mathsf{T}}] = \mathbf{0}$, i.e. the normal equations $\mathbf{A}\mathbf{\Sigma}_Y = \mathbf{\Sigma}_{XY}$. Since $\mathbf{\Sigma}_Y \succ \mathbf{0}$, the unique solution is $\mathbf{A} = \mathbf{\Sigma}_{XY}\mathbf{\Sigma}_Y^{-1}$.
Compute the error covariance
The residual $\mathbf{e} = \tilde{\mathbf{X}} - \mathbf{A}\tilde{\mathbf{Y}}$ has
$$\mathbf{\Sigma}_e = \mathbb{E}[\mathbf{e}\mathbf{e}^{\mathsf{T}}] = \mathbf{\Sigma}_X - \mathbf{A}\mathbf{\Sigma}_{YX} - \mathbf{\Sigma}_{XY}\mathbf{A}^{\mathsf{T}} + \mathbf{A}\mathbf{\Sigma}_Y\mathbf{A}^{\mathsf{T}} = \mathbf{\Sigma}_X - \mathbf{\Sigma}_{XY}\mathbf{\Sigma}_Y^{-1}\mathbf{\Sigma}_{YX},$$
using $\mathbf{A}\mathbf{\Sigma}_Y = \mathbf{\Sigma}_{XY}$ to simplify. The MSE is the trace $\mathrm{tr}(\mathbf{\Sigma}_e)$.
Vector LMMSE Computation
Complexity: $O(m^3)$ for the Cholesky factorization of $\mathbf{\Sigma}_Y$ (with $\mathbf{Y} \in \mathbb{R}^m$). Never form $\mathbf{\Sigma}_Y^{-1}$ explicitly in code — solve linear systems via Cholesky or LU. The posterior covariance $\mathbf{\Sigma}_e$ does not depend on the observation $\mathbf{Y}$, so it can be precomputed once and reused for all subsequent observations.
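A minimal NumPy/SciPy sketch of this computation (the function and variable names here are illustrative, not from the text):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def lmmse(mu_x, mu_y, S_x, S_xy, S_y, y):
    """LMMSE estimate and error covariance without forming inv(S_y)."""
    chol = cho_factor(S_y)                 # O(m^3) Cholesky factorization, done once
    gain_T = cho_solve(chol, S_xy.T)       # K^T = S_y^{-1} S_yx via triangular solves
    x_hat = mu_x + gain_T.T @ (y - mu_y)   # prior mean + gain times innovation
    S_e = S_x - S_xy @ gain_T              # posterior covariance; independent of y
    return x_hat, S_e
```

Because `gain_T` and `S_e` do not depend on `y`, a streaming implementation can factor once and apply only the `x_hat` line per new observation.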
Theorem: Orthogonality for LMMSE
The LMMSE residual $\mathbf{e} = \mathbf{X} - \hat{\mathbf{X}}_{\mathrm{L}}$ is zero-mean and uncorrelated with every affine function of $\mathbf{Y}$:
$$\mathbb{E}[\mathbf{e}] = \mathbf{0}, \qquad \mathbb{E}\big[\mathbf{e}\,(\mathbf{C}\mathbf{Y} + \mathbf{d})^{\mathsf{T}}\big] = \mathbf{0} \quad \text{for all constant } \mathbf{C}, \mathbf{d}.$$
LMMSE is the projection of $\mathbf{X}$ onto the span of the components of $\mathbf{Y}$ (plus the constant function). The residual is orthogonal to that subspace — hence uncorrelated with linear functions of $\mathbf{Y}$. Unlike the full MMSE, there is no guarantee that the residual is uncorrelated with nonlinear functions of $\mathbf{Y}$.
Zero mean
$\mathbb{E}[\mathbf{e}] = \mathbb{E}[\mathbf{X}] - \mathbb{E}[\hat{\mathbf{X}}_{\mathrm{L}}] = \boldsymbol{\mu}_X - \big(\boldsymbol{\mu}_X + \mathbf{A}(\mathbb{E}[\mathbf{Y}] - \boldsymbol{\mu}_Y)\big) = \mathbf{0}$.
Uncorrelated with $\mathbf{Y}$
$\mathbb{E}[\mathbf{e}\,\mathbf{Y}^{\mathsf{T}}] = \mathbb{E}[\mathbf{e}\,\tilde{\mathbf{Y}}^{\mathsf{T}}] = \mathbf{\Sigma}_{XY} - \mathbf{A}\mathbf{\Sigma}_Y = \mathbf{0}$ by the normal equations. Since $\mathbf{d}$ is deterministic and $\mathbf{e}$ has zero mean, $\mathbb{E}\big[\mathbf{e}\,(\mathbf{C}\mathbf{Y} + \mathbf{d})^{\mathsf{T}}\big] = \mathbb{E}[\mathbf{e}\,\mathbf{Y}^{\mathsf{T}}]\mathbf{C}^{\mathsf{T}} + \mathbb{E}[\mathbf{e}]\,\mathbf{d}^{\mathsf{T}} = \mathbf{0}$ as well.
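The caveat about nonlinear functions is easy to see numerically. A minimal sketch, assuming the toy setup $Y \sim \mathcal{N}(0, 1)$ and $X = Y^2$ (chosen so that $\sigma_{XY} = \mathbb{E}[Y^3] = 0$ and the LMMSE collapses to the prior mean):

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.standard_normal(1_000_000)
x = y**2                        # sigma_XY = E[Y^3] = 0, so the LMMSE gain is zero
e = x - x.mean()                # residual after the (prior-mean) LMMSE estimate

print(np.corrcoef(e, y)[0, 1])     # ~ 0: uncorrelated with Y, as the theorem says
print(np.corrcoef(e, y**2)[0, 1])  # ~ 1: strongly correlated with the nonlinear Y^2
```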
Example: Vector Gaussian Observation Model
Let $\mathbf{Y} = \mathbf{H}\mathbf{X} + \mathbf{W}$ and $\mathbf{X} \sim \mathcal{N}(\boldsymbol{\mu}_X, \mathbf{\Sigma}_X)$, with $\mathbf{W} \sim \mathcal{N}(\mathbf{0}, \mathbf{\Sigma}_W)$ independent of $\mathbf{X}$. Find the LMMSE estimator and its error covariance.
Second-order statistics
$\boldsymbol{\mu}_Y = \mathbf{H}\boldsymbol{\mu}_X$, $\mathbf{\Sigma}_Y = \mathbf{H}\mathbf{\Sigma}_X\mathbf{H}^{\mathsf{T}} + \mathbf{\Sigma}_W$, $\mathbf{\Sigma}_{XY} = \mathbf{\Sigma}_X\mathbf{H}^{\mathsf{T}}$, $\mathbf{\Sigma}_{YX} = \mathbf{H}\mathbf{\Sigma}_X$.
Apply the LMMSE formula
$\hat{\mathbf{X}}_{\mathrm{L}} = \boldsymbol{\mu}_X + \mathbf{\Sigma}_X\mathbf{H}^{\mathsf{T}}\big(\mathbf{H}\mathbf{\Sigma}_X\mathbf{H}^{\mathsf{T}} + \mathbf{\Sigma}_W\big)^{-1}(\mathbf{Y} - \mathbf{H}\boldsymbol{\mu}_X)$.
Use the matrix inversion lemma
By the Woodbury identity, an equivalent form is
$$\hat{\mathbf{X}}_{\mathrm{L}} = \boldsymbol{\mu}_X + \big(\mathbf{\Sigma}_X^{-1} + \mathbf{H}^{\mathsf{T}}\mathbf{\Sigma}_W^{-1}\mathbf{H}\big)^{-1}\mathbf{H}^{\mathsf{T}}\mathbf{\Sigma}_W^{-1}(\mathbf{Y} - \mathbf{H}\boldsymbol{\mu}_X).$$
This is the regularized least-squares form: the Tikhonov regularizer $\mathbf{\Sigma}_X^{-1}$ is precisely the inverse prior covariance.
Error covariance
$\mathbf{\Sigma}_e = \mathbf{\Sigma}_X - \mathbf{\Sigma}_X\mathbf{H}^{\mathsf{T}}\big(\mathbf{H}\mathbf{\Sigma}_X\mathbf{H}^{\mathsf{T}} + \mathbf{\Sigma}_W\big)^{-1}\mathbf{H}\mathbf{\Sigma}_X = \big(\mathbf{\Sigma}_X^{-1} + \mathbf{H}^{\mathsf{T}}\mathbf{\Sigma}_W^{-1}\mathbf{H}\big)^{-1}$, again by Woodbury.
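The algebra is easy to sanity-check numerically. A minimal sketch, with random matrices standing in for $\mathbf{H}$, $\mathbf{\Sigma}_X$, $\mathbf{\Sigma}_W$ (all names illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 3, 5
H = rng.standard_normal((m, n))
A = rng.standard_normal((n, n))
S_x = A @ A.T + np.eye(n)               # SPD prior covariance
S_w = 0.1 * np.eye(m)                   # noise covariance
mu_x = rng.standard_normal(n)
y = rng.standard_normal(m)

# Gain (covariance) form
S_y = H @ S_x @ H.T + S_w
x1 = mu_x + S_x @ H.T @ np.linalg.solve(S_y, y - H @ mu_x)

# Regularized least-squares form via Woodbury; inv() is fine for a small check
J = np.linalg.inv(S_x) + H.T @ np.linalg.solve(S_w, H)
x2 = mu_x + np.linalg.solve(J, H.T @ np.linalg.solve(S_w, y - H @ mu_x))

print(np.allclose(x1, x2))              # True: the two forms agree
```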
LMMSE MSE vs. SNR
[Interactive figure: for the scalar model $Y = X + W$ with $X \sim (\mu_X, \sigma_X^2)$ and $W \sim (0, \sigma_W^2)$, the LMMSE MSE $\sigma_X^2\sigma_W^2/(\sigma_X^2 + \sigma_W^2)$ is compared with the prior-only MSE ($\sigma_X^2$) and the noise floor ($\sigma_W^2$) as the SNR $\sigma_X^2/\sigma_W^2$ varies; the parameters are adjustable.]
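A minimal sketch that reproduces the three curves for the scalar model (the parameter values are placeholders):

```python
import numpy as np
import matplotlib.pyplot as plt

snr_db = np.linspace(-20, 20, 201)
snr = 10.0 ** (snr_db / 10.0)
sigma_x2 = 1.0                            # fixed prior variance (placeholder)
sigma_w2 = sigma_x2 / snr                 # noise variance set by the SNR

mse_lmmse = sigma_x2 * sigma_w2 / (sigma_x2 + sigma_w2)
mse_prior = np.full_like(snr, sigma_x2)   # ignore Y, estimate by the prior mean
mse_floor = sigma_w2                      # what the LMMSE approaches at high SNR

plt.semilogy(snr_db, mse_lmmse, label="LMMSE")
plt.semilogy(snr_db, mse_prior, "--", label=r"prior only ($\sigma_X^2$)")
plt.semilogy(snr_db, mse_floor, ":", label=r"noise floor ($\sigma_W^2$)")
plt.xlabel("SNR (dB)")
plt.ylabel("MSE")
plt.legend()
plt.show()
```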
Numerical Conditioning of $\mathbf{\Sigma}_Y$
In practice the LMMSE requires solving $\mathbf{\Sigma}_Y \mathbf{z} = \mathbf{Y} - \boldsymbol{\mu}_Y$. When pilots are correlated or the noise is small, $\mathbf{\Sigma}_Y$ can be ill-conditioned, and a naive solve amplifies numerical noise. The standard remedy is diagonal loading $\mathbf{\Sigma}_Y + \epsilon\mathbf{I}$, with $\epsilon$ chosen so that the condition number $\kappa(\mathbf{\Sigma}_Y + \epsilon\mathbf{I})$ stays within a safe range for double precision. This is equivalent to adding $\epsilon\mathbf{I}$ to the assumed noise covariance, which trades a small increase in MSE for large gains in numerical stability; a loading rule is sketched after the list below.
- Double precision loses accuracy when $\kappa(\mathbf{\Sigma}_Y)$ approaches $1/\varepsilon_{\text{mach}} \approx 10^{16}$.
- Real-time systems use Cholesky updates instead of refactoring from scratch.
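A minimal loading rule in code (the threshold `kappa_max` and the eigenvalue-based choice of $\epsilon$ are assumptions for illustration, not a prescription from the text):

```python
import numpy as np

def load_diagonal(S_y, kappa_max=1e8):
    """Add eps*I so that cond(S_y + eps*I) <= kappa_max (illustrative rule)."""
    lam = np.linalg.eigvalsh(S_y)            # ascending eigenvalues of symmetric S_y
    lam_min, lam_max = lam[0], lam[-1]
    if lam_min > 0 and lam_max / lam_min <= kappa_max:
        return S_y                           # already well conditioned
    # Choose eps so that (lam_max + eps) / (lam_min + eps) == kappa_max
    eps = (lam_max - kappa_max * lam_min) / (kappa_max - 1.0)
    return S_y + eps * np.eye(S_y.shape[0])
```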
Innovation
The zero-mean quantity $\mathbf{Y} - \boldsymbol{\mu}_Y$ (or, more generally, $\mathbf{Y} - \hat{\mathbf{Y}}$ where $\hat{\mathbf{Y}}$ is a linear predictor of $\mathbf{Y}$ from past information). The innovation carries the "new" information in the observation that could not be predicted from what was already known.
LMMSE ≠ BLUE (in general)
In classical (non-Bayesian) estimation, the best linear unbiased estimator (BLUE) is the minimum-variance estimator among unbiased linear functions of the observation. In the Bayesian setting we do not constrain unbiasedness; the LMMSE can be biased (and usually is, via the shrinkage toward $\boldsymbol{\mu}_X$), trading a little bias for a lot of variance reduction. When the prior is non-informative ($\mathbf{\Sigma}_X^{-1} \to \mathbf{0}$), the two coincide.
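A quick numerical illustration of the last claim (the setup is hypothetical): as the prior covariance grows, the LMMSE estimate for $\mathbf{Y} = \mathbf{H}\mathbf{X} + \mathbf{W}$ with white noise approaches the ordinary least-squares solution, which is the BLUE here.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 2, 6
H = rng.standard_normal((m, n))
y = rng.standard_normal(m)
x_blue = np.linalg.lstsq(H, y, rcond=None)[0]    # BLUE = ordinary LS for white noise

for scale in [1.0, 1e2, 1e4, 1e6]:
    S_x = scale * np.eye(n)                      # flatter and flatter prior, mu_x = 0
    S_y = H @ S_x @ H.T + np.eye(m)
    x_lmmse = S_x @ H.T @ np.linalg.solve(S_y, y)
    print(scale, np.linalg.norm(x_lmmse - x_blue))   # gap shrinks toward zero
```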
Quick Check
Which of the following quantities does the LMMSE estimator not need: the means, the covariance of $\mathbf{Y}$, the cross-covariance, or the third moment of $\mathbf{Y}$?
The third moment
The LMMSE formula uses only first and second moments — means, covariances, and cross-covariances. It does not depend on any higher-order statistics.