The Linear MMSE Estimator

Why Restrict to Linear Estimators?

The MMSE estimator $\mathbb{E}[X|Y]$ is optimal, but computing it requires knowing the full conditional density $f(x|y)$, which is often unavailable or intractable. A pragmatic alternative: restrict the estimator to be a linear (really, affine) function of the observations.

The linear MMSE (LMMSE) estimator only requires the first and second moments: means, variances, and covariances. These are typically easier to estimate from data. The price is sub-optimality, but for jointly Gaussian data there is no price at all: LMMSE = MMSE.

Definition: Linear MMSE Estimator (Scalar Case)

The linear MMSE (LMMSE) estimator of $X$ given $Y$ is

$$\hat{X}_{\text{LMMSE}} = a^* Y + b^*$$

where $(a^*, b^*)$ minimize $\mathbb{E}[(X - aY - b)^2]$ over all $a, b \in \mathbb{R}$.

The solution is

$$\hat{X}_{\text{LMMSE}} = \mu_X + \frac{\text{Cov}(X,Y)}{\text{Var}(Y)}(Y - \mu_Y).$$

The resulting MSE is

$$\text{MSE}_{\text{LMMSE}} = \text{Var}(X)\left(1 - \rho_{XY}^2\right)$$

where $\rho_{XY} = \text{Cov}(X,Y)/(\sigma_X \sigma_Y)$ is the correlation coefficient.

The LMMSE estimator depends on the joint distribution of $X$ and $Y$ only through first and second moments; no knowledge of the full distribution is needed.
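A minimal numerical sketch of this (the toy model, sample size, and parameters are all hypothetical): estimate the moments from samples, then form the scalar LMMSE estimator.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy model (not from the text): X is a signal, Y = X + noise.
n = 100_000
x = rng.normal(loc=2.0, scale=1.5, size=n)
y = x + rng.normal(scale=1.0, size=n)

# The LMMSE estimator needs only first and second moments, estimated from data.
mu_x, mu_y = x.mean(), y.mean()
cov_xy = np.cov(x, y)[0, 1]
var_y = y.var(ddof=1)

a = cov_xy / var_y    # a* = Cov(X,Y) / Var(Y)  (~0.69 for these parameters)
b = mu_x - a * mu_y   # b* = mu_X - a* mu_Y     (~0.62 for these parameters)
x_hat = a * y + b

# Empirical MSE should approach Var(X)(1 - rho^2), ~0.69 here.
print(f"a* = {a:.3f}, b* = {b:.3f}, MSE = {np.mean((x - x_hat) ** 2):.3f}")
```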


Theorem: Derivation of the LMMSE Estimator

The optimal affine estimator $\hat{X} = aY + b$ minimizing $\mathbb{E}[(X - aY - b)^2]$ is given by

$$a^* = \frac{\text{Cov}(X,Y)}{\text{Var}(Y)}, \qquad b^* = \mu_X - a^* \mu_Y.$$

Setting the gradient of the MSE to zero gives two linear equations (the normal equations). The slope $a^*$ is the regression coefficient: exactly the slope of the least-squares regression line.
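Writing that step out: setting the two partial derivatives to zero yields

$$\frac{\partial}{\partial b}\,\mathbb{E}[(X - aY - b)^2] = -2\,\mathbb{E}[X - aY - b] = 0 \quad\Rightarrow\quad b = \mu_X - a\,\mu_Y,$$

$$\frac{\partial}{\partial a}\,\mathbb{E}[(X - aY - b)^2] = -2\,\mathbb{E}[(X - aY - b)\,Y] = 0 \quad\Rightarrow\quad a\,\text{Var}(Y) = \text{Cov}(X,Y),$$

where the second implication uses the first to eliminate $b$. Dividing by $\text{Var}(Y)$ gives $a^*$.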

Definition: LMMSE Estimator (Vector Case: Wiener-Hopf Equation)

For a random variable $X$ and observation vector $\mathbf{Y} \in \mathbb{R}^m$, the LMMSE estimator is

$$\hat{X}_{\text{LMMSE}} = \mu_X + \mathbf{C}_{X\mathbf{Y}} \mathbf{C}_{\mathbf{Y}\mathbf{Y}}^{-1}(\mathbf{Y} - \boldsymbol{\mu}_{\mathbf{Y}})$$

where $\mathbf{C}_{X\mathbf{Y}} = \mathbb{E}[(X - \mu_X)(\mathbf{Y} - \boldsymbol{\mu}_{\mathbf{Y}})^\mathsf{T}] \in \mathbb{R}^{1 \times m}$ is the cross-covariance (row) vector and $\mathbf{C}_{\mathbf{Y}\mathbf{Y}}$ is the covariance matrix of $\mathbf{Y}$.

More generally, for a random vector $\mathbf{X} \in \mathbb{R}^n$:

$$\hat{\mathbf{X}}_{\text{LMMSE}} = \boldsymbol{\mu}_{\mathbf{X}} + \mathbf{C}_{\mathbf{X}\mathbf{Y}} \mathbf{C}_{\mathbf{Y}\mathbf{Y}}^{-1}(\mathbf{Y} - \boldsymbol{\mu}_{\mathbf{Y}}).$$

This is the Wiener-Hopf equation.

The matrix $\mathbf{C}_{\mathbf{X}\mathbf{Y}} \mathbf{C}_{\mathbf{Y}\mathbf{Y}}^{-1}$ plays the role of the regression coefficient $a^*$ from the scalar case.
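A minimal numerical sketch of the vector formula (all toy moments below are hypothetical), computing the gain with a linear solve rather than an explicit inverse:

```python
import numpy as np

# Hypothetical moments for a scalar X observed through Y in R^2 (toy numbers).
mu_x = 1.0
mu_y = np.array([0.5, -0.2])
c_xy = np.array([[0.8, 0.3]])        # C_XY, shape (1, m)
c_yy = np.array([[2.0, 0.5],
                 [0.5, 1.0]])        # C_YY, shape (m, m)
y = np.array([1.2, 0.1])             # one observed vector

# Solve C_YY K^T = C_YX instead of forming the inverse explicitly.
k = np.linalg.solve(c_yy, c_xy.T).T  # gain K = C_XY C_YY^{-1}
x_hat = mu_x + k @ (y - mu_y)
print(x_hat)                         # LMMSE estimate of X
```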


Theorem: LMMSE Equals MMSE for Jointly Gaussian Random Variables

If $(\mathbf{X}, \mathbf{Y})$ is jointly Gaussian, then

$$\hat{\mathbf{X}}_{\text{LMMSE}} = \mathbb{E}[\mathbf{X}|\mathbf{Y}] = \hat{\mathbf{X}}_{\text{MMSE}}.$$

The LMMSE estimator is the MMSE estimator: restricting to linear functions costs nothing.

For jointly Gaussian variables, the conditional expectation $\mathbb{E}[\mathbf{X}|\mathbf{Y}]$ is already a linear function of $\mathbf{Y}$ (see the example *Conditional Expectation for Jointly Gaussian $(X,Y)$*). So the linear estimator class already contains the optimal estimator.
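To see that Gaussianity matters, here is a quick numerical contrast on a hypothetical non-Gaussian pair ($X = Y^3$ with Gaussian $Y$; not from the text), where the optimal estimator is nonlinear and the LMMSE pays a real price:

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(size=1_000_000)
x = y**3                                  # nonlinear in Y, so (X, Y) is not jointly Gaussian

a = np.cov(x, y)[0, 1] / y.var()          # LMMSE slope, ~ E[Y^4] = 3
x_lmmse = x.mean() + a * (y - y.mean())   # best linear estimate
x_mmse = y**3                             # E[X|Y] = Y^3: the (nonlinear) MMSE estimate

print(f"LMMSE MSE ~ {np.mean((x - x_lmmse) ** 2):.2f}")  # ~6 = Var(X) - Cov^2/Var(Y)
print(f"MMSE  MSE = {np.mean((x - x_mmse) ** 2):.2f}")   # 0
```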


Example: LMMSE Channel Estimation

A channel coefficient $H \sim \mathcal{CN}(0, \sigma_H^2)$ is observed through $Y = H \cdot s + W$, where $s$ is a known pilot symbol and $W \sim \mathcal{CN}(0, \sigma^2)$ is independent noise. Find the LMMSE estimate of $H$.
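A sketch of the computation, using the scalar formula with complex covariances $\text{Cov}(H,Y) = \mathbb{E}[HY^*]$ and zero means:

$$\mathbb{E}[HY^*] = \mathbb{E}[H(Hs + W)^*] = s^* \sigma_H^2, \qquad \mathbb{E}[|Y|^2] = |s|^2 \sigma_H^2 + \sigma^2,$$

so

$$\hat{H}_{\text{LMMSE}} = \frac{s^* \sigma_H^2}{|s|^2 \sigma_H^2 + \sigma^2}\, Y, \qquad \text{MSE} = \sigma_H^2 - \frac{|s|^2 \sigma_H^4}{|s|^2 \sigma_H^2 + \sigma^2} = \frac{\sigma_H^2 \sigma^2}{|s|^2 \sigma_H^2 + \sigma^2}.$$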


LMMSE MSE vs. SNR for Channel Estimation

[Interactive plot: the LMMSE mean square error as a function of SNR for channel estimation, compared with the LS estimator MSE and the prior variance (no observation). Adjustable parameters: channel variance (default 1) and upper end of the SNR range (default 20).]
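A sketch that reproduces this comparison, assuming the channel-estimation model above with unit pilot power (so $\text{SNR} = \sigma_H^2 |s|^2 / \sigma^2$), the widget's default parameters, and an SNR axis in dB:

```python
import numpy as np
import matplotlib.pyplot as plt

# Default widget parameters: channel variance 1, SNR from 0 to 20 (assumed dB).
var_h = 1.0
snr_db = np.linspace(0, 20, 200)
snr = 10 ** (snr_db / 10)

mse_lmmse = var_h / (1 + snr)         # sigma_H^2 sigma^2 / (|s|^2 sigma_H^2 + sigma^2)
mse_ls = var_h / snr                  # LS: h_hat = Y/s, MSE = sigma^2 / |s|^2
mse_prior = np.full_like(snr, var_h)  # no observation: MSE = prior variance

plt.semilogy(snr_db, mse_lmmse, label="LMMSE")
plt.semilogy(snr_db, mse_ls, "--", label="LS")
plt.semilogy(snr_db, mse_prior, ":", label="prior variance")
plt.xlabel("SNR (dB)")
plt.ylabel("MSE")
plt.legend()
plt.show()
```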

MMSE vs. LMMSE Estimator

| Property | MMSE | LMMSE |
| --- | --- | --- |
| Formula | $\mathbb{E}[X \mid Y]$ | $\mu_X + \mathbf{C}_{XY}\mathbf{C}_{YY}^{-1}(\mathbf{Y} - \boldsymbol{\mu}_Y)$ |
| Requires | Full conditional density | First and second moments only |
| Optimality | Best among *all* estimators | Best among *linear* estimators |
| MSE | $\mathbb{E}[\text{Var}(X \mid Y)]$ | $\text{Var}(X)(1 - \rho^2)$ (scalar) |
| Gaussian case | Linear (coincides with LMMSE) | Equals MMSE |
| Non-Gaussian case | Generally nonlinear | Sub-optimal but tractable |

Common Mistake: LMMSE Requires Inversion of $\mathbf{C}_{YY}$

Mistake:

Blindly applying the LMMSE formula $\mathbf{C}_{XY}\mathbf{C}_{YY}^{-1}$ when $\mathbf{C}_{YY}$ is singular or ill-conditioned.

Correction:

If $\mathbf{C}_{YY}$ is singular, some observations are linearly dependent and can be removed. In practice, use the pseudo-inverse or regularize: $\mathbf{C}_{XY}(\mathbf{C}_{YY} + \epsilon \mathbf{I})^{-1}$. For high-dimensional problems, Tikhonov regularization is standard.
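A minimal sketch of the regularized gain (the toy covariances and the helper name are illustrative):

```python
import numpy as np

def lmmse_gain(c_xy, c_yy, eps=1e-6):
    """Regularized LMMSE gain: solves (C_YY + eps*I) K^T = C_YX."""
    m = c_yy.shape[0]
    return np.linalg.solve(c_yy + eps * np.eye(m), c_xy.T).T

# Singular C_YY: the second observation exactly duplicates the first.
c_yy = np.array([[1.0, 1.0],
                 [1.0, 1.0]])
c_xy = np.array([[0.5, 0.5]])

print(lmmse_gain(c_xy, c_yy))  # finite gain (~[0.25, 0.25]) despite singularity
# Alternative: np.linalg.pinv(c_yy) @ c_xy.T when tuning eps is undesirable.
```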

Why This Matters: From LMMSE to Wiener Filtering and MIMO Detection

The LMMSE estimator is the finite-dimensional version of the Wiener filter. In MIMO detection, the received signal is $\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{w}$, where $\mathbf{x}$ is the transmitted signal. The LMMSE detector is

$$\hat{\mathbf{x}}_{\text{LMMSE}} = \mathbf{C}_{\mathbf{x}\mathbf{x}}\mathbf{H}^H(\mathbf{H}\mathbf{C}_{\mathbf{x}\mathbf{x}}\mathbf{H}^H + \sigma^2\mathbf{I})^{-1}\mathbf{y}.$$

This is used in every modern wireless receiver, from 4G LTE to 5G NR massive MIMO. It balances noise suppression against interference mitigation.
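A toy numerical sketch of this detector (a hypothetical 4x4 channel with QPSK symbols, taking $\mathbf{C}_{\mathbf{x}\mathbf{x}} = \mathbf{I}$ so the formula simplifies):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical 4x4 MIMO link, unit-power inputs (C_xx = I), QPSK symbols.
n_tx, n_rx, sigma2 = 4, 4, 0.1
h = (rng.normal(size=(n_rx, n_tx)) + 1j * rng.normal(size=(n_rx, n_tx))) / np.sqrt(2)
x = (rng.choice([-1, 1], size=n_tx) + 1j * rng.choice([-1, 1], size=n_tx)) / np.sqrt(2)
w = np.sqrt(sigma2 / 2) * (rng.normal(size=n_rx) + 1j * rng.normal(size=n_rx))
y = h @ x + w

# LMMSE detector with C_xx = I: x_hat = H^H (H H^H + sigma^2 I)^{-1} y
gram = h @ h.conj().T + sigma2 * np.eye(n_rx)
x_hat = h.conj().T @ np.linalg.solve(gram, y)
print(np.round(x_hat, 2))  # soft symbol estimates near the transmitted QPSK points
```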


Historical Note: Norbert Wiener and the Birth of Optimal Filtering

1941-1949

The LMMSE framework traces back to Norbert Wiener's wartime work on anti-aircraft fire control (classified report, 1942; published as *Extrapolation, Interpolation, and Smoothing of Stationary Time Series* in 1949). Wiener posed the problem of predicting a signal corrupted by noise, and his solution, the Wiener filter, is the continuous-time version of the LMMSE estimator. Independently, Andrey Kolmogorov solved a similar problem in 1941. The discrete-time formulation as a matrix equation (the Wiener-Hopf equation) became a cornerstone of signal processing after the work of Levinson and Durbin in the 1940s-50s.

Quick Check

If $X$ and $Y$ are uncorrelated (but not necessarily independent), what is the LMMSE estimate $\hat{X}_{\text{LMMSE}}$?

- $\hat{X}_{\text{LMMSE}} = 0$
- $\hat{X}_{\text{LMMSE}} = \mu_X$
- $\hat{X}_{\text{LMMSE}} = \mathbb{E}[X|Y]$
- $\hat{X}_{\text{LMMSE}} = Y$

LMMSE Estimation Algorithm

Complexity: $O(m^3)$ where $m = \dim(\mathbf{Y})$, dominated by the matrix inversion
Input: Observations $\mathbf{Y}$, prior moments $\boldsymbol{\mu}_{\mathbf{X}}, \boldsymbol{\mu}_{\mathbf{Y}}, \mathbf{C}_{\mathbf{XX}}, \mathbf{C}_{\mathbf{XY}}, \mathbf{C}_{\mathbf{YY}}$
Output: LMMSE estimate $\hat{\mathbf{X}}_{\text{LMMSE}}$ and error covariance $\mathbf{C}_{\tilde{\mathbf{X}}}$
1. Compute the LMMSE gain: $\mathbf{K} \leftarrow \mathbf{C}_{\mathbf{XY}} \mathbf{C}_{\mathbf{YY}}^{-1}$
2. Compute the estimate: $\hat{\mathbf{X}} \leftarrow \boldsymbol{\mu}_{\mathbf{X}} + \mathbf{K}(\mathbf{Y} - \boldsymbol{\mu}_{\mathbf{Y}})$
3. Compute the error covariance: $\mathbf{C}_{\tilde{\mathbf{X}}} \leftarrow \mathbf{C}_{\mathbf{XX}} - \mathbf{C}_{\mathbf{XY}}\mathbf{C}_{\mathbf{YY}}^{-1}\mathbf{C}_{\mathbf{YX}}$
4. Return $\hat{\mathbf{X}}, \mathbf{C}_{\tilde{\mathbf{X}}}$

In practice, solve $\mathbf{C}_{\mathbf{YY}} \mathbf{K}^\mathsf{T} = \mathbf{C}_{\mathbf{YX}}$ via Cholesky factorization rather than explicit inversion.
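A compact sketch of the full algorithm in that spirit (assuming real-valued moments and a positive-definite $\mathbf{C}_{\mathbf{YY}}$; the function name and toy numbers are illustrative):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def lmmse(y, mu_x, mu_y, c_xx, c_xy, c_yy):
    """LMMSE estimate and error covariance via Cholesky (no explicit inverse)."""
    chol = cho_factor(c_yy)            # factor C_YY once
    k = cho_solve(chol, c_xy.T).T      # step 1: gain K from C_YY K^T = C_YX
    x_hat = mu_x + k @ (y - mu_y)      # step 2: estimate
    c_err = c_xx - k @ c_xy.T          # step 3: C_XX - K C_YX
    return x_hat, c_err

# Same toy moments as in the vector-case sketch above, with C_XX = Var(X) = 1.
x_hat, c_err = lmmse(
    y=np.array([1.2, 0.1]),
    mu_x=np.array([1.0]),
    mu_y=np.array([0.5, -0.2]),
    c_xx=np.array([[1.0]]),
    c_xy=np.array([[0.8, 0.3]]),
    c_yy=np.array([[2.0, 0.5], [0.5, 1.0]]),
)
print(x_hat, c_err)
```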

🎓 CommIT Contribution (2021)

LMMSE Channel Estimation for Massive MIMO-OFDM

K. Ito and G. Caire, IEEE Trans. Wireless Communications, vol. 20, no. 12

Ito and Caire developed a structured LMMSE channel estimator for massive MIMO-OFDM that exploits the Kronecker structure of the time-frequency channel covariance. Their approach reduces the complexity from $O(N_t^2 N_{\text{sub}}^2)$ to $O(N_t^2 + N_{\text{sub}}^2)$ by decomposing the problem into separate spatial and frequency-domain estimation steps. The key insight is that the LMMSE formula $\mathbf{C}_{H\mathbf{Y}}\mathbf{C}_{\mathbf{Y}\mathbf{Y}}^{-1}$ admits a Kronecker factorization when the channel has separable spatial-frequency statistics.

🔧 Engineering Note

Computational Complexity of LMMSE in Practice

The LMMSE estimator requires inverting the $m \times m$ observation covariance matrix $\mathbf{C}_{\mathbf{YY}}$, costing $O(m^3)$. For massive MIMO with hundreds of antennas, this can be prohibitive. Common strategies: (1) exploit Toeplitz/block-Toeplitz structure (Wiener filtering in the frequency domain via FFT); (2) diagonal approximation: ignore off-diagonal covariance terms; (3) reduced-rank methods: project onto the dominant eigenvectors of $\mathbf{C}_{\mathbf{YY}}$; (4) iterative methods (conjugate gradient) that avoid explicit inversion, as sketched below.
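A sketch of strategy (4) for a scalar $X$, on a synthetic covariance (all sizes and values hypothetical): conjugate gradient solves $\mathbf{C}_{\mathbf{YY}}\mathbf{k} = \mathbf{c}_{\mathbf{Y}X}$ while touching $\mathbf{C}_{\mathbf{YY}}$ only through matrix-vector products.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

rng = np.random.default_rng(3)

# Synthetic SPD covariance; in practice the matvec would exploit structure (e.g. FFT).
m = 500
a = rng.normal(size=(m, m))
c_yy = a @ a.T / m + np.eye(m)
c_yx = rng.normal(size=m)

op = LinearOperator((m, m), matvec=lambda v: c_yy @ v, dtype=np.float64)
k, info = cg(op, c_yx)  # no O(m^3) factorization or inversion

print(info == 0, np.linalg.norm(c_yy @ k - c_yx))  # convergence flag, small residual
```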

LMMSE (Linear Minimum Mean Square Error)

The affine estimator $\hat{X} = \mathbf{a}^\mathsf{T}\mathbf{Y} + b$ that minimizes $\mathbb{E}[(X - \hat{X})^2]$. Requires only first and second moments. Equals the MMSE estimator when the joint distribution is Gaussian.

Related: Minimum Mean Square Error (MMSE) Estimator, Wiener Filter

Wiener-Hopf Equation

The matrix equation $\hat{\mathbf{X}} = \boldsymbol{\mu}_{\mathbf{X}} + \mathbf{C}_{\mathbf{XY}}\mathbf{C}_{\mathbf{YY}}^{-1}(\mathbf{Y} - \boldsymbol{\mu}_{\mathbf{Y}})$ that defines the LMMSE estimator. Named after Norbert Wiener and Eberhard Hopf.

Related: LMMSE Estimator (Vector Case: Wiener-Hopf Equation), Wiener Filter