LS and MMSE Channel Estimation

The Estimation Problem

After receiving $\mathbf{Y}_p$, the base station performs channel estimation. We work with the sufficient statistic obtained by correlating with user $k$'s pilot and normalizing by $\sqrt{\tau_p}$:

$$\mathbf{y}_k = \frac{1}{\sqrt{\tau_p}}\,\mathbf{Y}_p \boldsymbol{\phi}_k^* = \sqrt{p_u \tau_p}\, \mathbf{H}_{k} + \frac{1}{\sqrt{\tau_p}}\,\mathbf{N}_p \boldsymbol{\phi}_k^*$$

Defining $\mathbf{n}_k = \frac{1}{\sqrt{\tau_p}}\mathbf{N}_p \boldsymbol{\phi}_k^* \sim \mathcal{CN}(\mathbf{0}, \sigma^2 \mathbf{I})$:

$$\mathbf{y}_k = \sqrt{p_u \tau_p}\, \mathbf{H}_{k} + \mathbf{n}_k$$

We estimate $\mathbf{H}_{k}$ from this observation. Two classical estimators exist: LS (requires no statistical knowledge) and MMSE (requires knowledge of $\mathbf{R}_k$).
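As a numerical sanity check on this model, the sketch below uses assumed toy sizes and unit-modulus DFT pilot sequences, and normalizes the pilot correlation by $\sqrt{\tau_p}$ so the effective noise has covariance $\sigma^2\mathbf{I}$. It verifies that correlating with user $k$'s pilot isolates that user's channel plus filtered noise:

```python
import numpy as np

rng = np.random.default_rng(0)
Nt, K, tau_p, p_u, sigma2 = 8, 4, 4, 1.0, 0.5  # assumed toy dimensions

# Orthogonal unit-modulus pilots: columns of a DFT matrix, phi_j^H phi_k = tau_p delta_jk
F = np.fft.fft(np.eye(tau_p))
Phi = F[:, :K]                                  # pilot of user k in column k

H = (rng.standard_normal((Nt, K)) + 1j * rng.standard_normal((Nt, K))) / np.sqrt(2)
Np = np.sqrt(sigma2 / 2) * (rng.standard_normal((Nt, tau_p))
                            + 1j * rng.standard_normal((Nt, tau_p)))

Yp = np.sqrt(p_u) * H @ Phi.T + Np              # received pilot block
k = 0
y_k = Yp @ np.conj(Phi[:, k]) / np.sqrt(tau_p)  # normalized pilot correlation

# y_k = sqrt(p_u tau_p) H_k + n_k: the residual is exactly the filtered noise
assert np.allclose(y_k - np.sqrt(p_u * tau_p) * H[:, k],
                   Np @ np.conj(Phi[:, k]) / np.sqrt(tau_p))
```

Interfering users' pilots drop out because of the orthogonality $\boldsymbol{\phi}_j^H\boldsymbol{\phi}_k = \tau_p\delta_{jk}$.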

Definition:

Least-Squares (LS) Channel Estimator

The least-squares estimator minimizes $\|\mathbf{y}_k - \sqrt{p_u \tau_p}\, \mathbf{H}\|^2$ with respect to $\mathbf{H}$. Since the observation model is linear, the LS solution is:

$$\hat{\mathbf{H}}_k^{\text{LS}} = \frac{1}{\sqrt{p_u \tau_p}}\, \mathbf{y}_k = \mathbf{H}_{k} + \frac{1}{\sqrt{p_u \tau_p}}\, \mathbf{n}_k$$

Properties:

  • Unbiased: E[H^kLS]=Hk\mathbb{E}[\hat{\mathbf{H}}_k^{\text{LS}}] = \mathbf{H}_{k}
  • Error covariance: CkLS=Οƒ2puINt\mathbf{C}_k^{\text{LS}} = \frac{\sigma^2}{p_u} \mathbf{I}_{N_t}
  • MSE: MSEkLS=tr(CkLS)=NtΟƒ2puΟ„p\text{MSE}_k^{\text{LS}} = \text{tr}(\mathbf{C}_k^{\text{LS}}) = \frac{N_t \sigma^2}{p_u \tau_p}
  • Requires no statistical knowledge of Hk\mathbf{H}_{k}
  • Does NOT exploit spatial correlation structure

The LS estimator treats each antenna independently, ignoring the correlation structure encoded in $\mathbf{R}_k$. Its MSE scales as $N_t\sigma^2/(p_u\tau_p)$, proportional to the number of antennas, reflecting that each of the $N_t$ channel coefficients is estimated independently.
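A quick Monte Carlo check of the LS MSE expression, with assumed toy parameters ($N_t = 16$, $\tau_p = 8$, $p_u = \sigma^2 = 1$); the empirical error should match $N_t\sigma^2/(p_u\tau_p)$:

```python
import numpy as np

rng = np.random.default_rng(1)
Nt, p_u, tau_p, sigma2, trials = 16, 1.0, 8, 1.0, 20000  # assumed toy values

# Draw many realizations of y = sqrt(p_u tau_p) H + n, n ~ CN(0, sigma2 I)
H = (rng.standard_normal((trials, Nt)) + 1j * rng.standard_normal((trials, Nt))) / np.sqrt(2)
n = np.sqrt(sigma2 / 2) * (rng.standard_normal((trials, Nt))
                           + 1j * rng.standard_normal((trials, Nt)))
y = np.sqrt(p_u * tau_p) * H + n

H_ls = y / np.sqrt(p_u * tau_p)                  # LS estimate per realization
mse_emp = np.mean(np.abs(H_ls - H) ** 2) * Nt    # empirical E||H_ls - H||^2
mse_theory = Nt * sigma2 / (p_u * tau_p)
assert abs(mse_emp - mse_theory) / mse_theory < 0.05
```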

Definition:

MMSE Channel Estimator

Assume $\mathbf{H}_{k} \sim \mathcal{CN}(\mathbf{0}, \mathbf{R}_k)$ with known covariance $\mathbf{R}_k$. The MMSE estimator (equal to the posterior mean for jointly Gaussian signals) is:

$$\hat{\mathbf{H}}_k^{\text{MMSE}} = \sqrt{p_u \tau_p}\, \mathbf{R}_k \left(p_u \tau_p \mathbf{R}_k + \sigma^2 \mathbf{I}\right)^{-1} \mathbf{y}_k$$

Properties:

  • Error covariance: Ck=Rkβˆ’puΟ„pRk(puΟ„pRk+Οƒ2I)βˆ’1Rk\mathbf{C}_k = \mathbf{R}_k - p_u \tau_p \mathbf{R}_k (p_u \tau_p \mathbf{R}_k + \sigma^2 \mathbf{I})^{-1} \mathbf{R}_k
  • MSE: MSEkMMSE=tr(Ck)≀MSEkLS\text{MSE}_k^{\text{MMSE}} = \text{tr}(\mathbf{C}_k) \leq \text{MSE}_k^{\text{LS}} always
  • The estimated channel H^kMMSE∼CN(0,Rkβˆ’Ck)\hat{\mathbf{H}}_k^{\text{MMSE}} \sim \mathcal{CN}(\mathbf{0}, \mathbf{R}_k - \mathbf{C}_k) (the estimate is distributed as a zero-mean Gaussian with covariance equal to the reduction in uncertainty)
  • The estimate and error are uncorrelated: E[H^kH~kH]=0\mathbb{E}[\hat{\mathbf{H}}_k \tilde{\mathbf{H}}_k^H] = \mathbf{0} (orthogonality principle)

Theorem: MMSE Estimator MSE via Eigendecomposition

Let $\mathbf{R}_k = \mathbf{U}_k \boldsymbol{\Lambda}_k \mathbf{U}_k^H$ be the eigendecomposition of the spatial covariance matrix, where $\boldsymbol{\Lambda}_k = \mathrm{diag}(\lambda_1, \ldots, \lambda_{N_t})$ with $\lambda_1 \geq \cdots \geq \lambda_{N_t} \geq 0$.

Then the MMSE estimation MSE is:

$$\text{MSE}_k^{\text{MMSE}} = \sum_{i=1}^{N_t} \frac{\lambda_i \sigma^2}{p_u \tau_p \lambda_i + \sigma^2}$$

As $p_u \tau_p \to \infty$, $\text{MSE}_k^{\text{MMSE}} \to 0$. As $p_u \tau_p \to 0$, $\text{MSE}_k^{\text{MMSE}} \to \mathrm{tr}(\mathbf{R}_k) = \mathbb{E}[\|\mathbf{H}_{k}\|^2]$.

Each eigendirection is estimated independently with a per-mode SNR $p_u \tau_p \lambda_i / \sigma^2$. Low-energy eigenmodes (small $\lambda_i$) contribute little signal and much noise; their estimation is dominated by the prior, which shrinks the estimate toward zero. High-energy eigenmodes (large $\lambda_i$) are estimated with high SNR and nearly perfect accuracy.
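The eigen-mode formula can be checked against the matrix form of the error covariance. The sketch below builds a synthetic $\mathbf{R}_k$ with an exponentially decaying eigenvalue profile (all sizes and the decay rate are assumptions) and confirms that $\mathrm{tr}(\mathbf{C}_k)$ equals the per-mode sum:

```python
import numpy as np

rng = np.random.default_rng(2)
Nt, p_u, tau_p, sigma2 = 32, 1.0, 8, 1.0        # assumed toy values

# Synthetic spatial covariance with decaying eigenvalues, random eigenbasis
lam = np.exp(-0.4 * np.arange(Nt))               # lambda_1 >= ... >= lambda_Nt > 0
Q, _ = np.linalg.qr(rng.standard_normal((Nt, Nt)) + 1j * rng.standard_normal((Nt, Nt)))
R = Q @ np.diag(lam) @ Q.conj().T

# Matrix form: C_k = R - p_u tau_p R (p_u tau_p R + sigma2 I)^{-1} R
A = p_u * tau_p * R + sigma2 * np.eye(Nt)
C = R - p_u * tau_p * R @ np.linalg.solve(A, R)

mse_matrix = np.real(np.trace(C))
mse_modes = np.sum(lam * sigma2 / (p_u * tau_p * lam + sigma2))  # eigen-mode sum
assert np.isclose(mse_matrix, mse_modes)
```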

LS vs. MMSE Channel Estimation MSE

Compare the normalized MSE (MSE divided by $\mathrm{tr}(\mathbf{R}_k) = \sum_i \lambda_i$) of the LS estimator, $N_t\sigma^2/(p_u\tau_p)$, and the MMSE estimator, $\sum_i \lambda_i\sigma^2/(p_u\tau_p\lambda_i+\sigma^2)$, as functions of $N_t$. Use sliders to adjust SNR, pilot length, and channel rank.


Theorem: MMSE vs. LS MSE Gap

Let $r_k = \mathrm{rank}(\mathbf{R}_k) \leq N_t$ be the effective channel rank. Then:

$$\frac{\text{MSE}_k^{\text{LS}}}{\text{MSE}_k^{\text{MMSE}}} \geq \frac{N_t}{r_k}\left(1 + \frac{\sigma^2}{p_u \tau_p \lambda_1}\right)$$

where $\lambda_1$ is the largest eigenvalue of $\mathbf{R}_k$ (the bound follows from upper-bounding each of the $r_k$ nonzero MMSE terms by $\lambda_1\sigma^2/(p_u\tau_p\lambda_1+\sigma^2)$). The MSE gap grows with the ratio $N_t/r_k$; at high pilot SNR $p_u\tau_p\lambda_1/\sigma^2$ the bound approaches $N_t/r_k$, so asymptotically LS is at least $N_t/r_k$ times worse than MMSE.

MMSE knows the channel lives in the $r_k$-dimensional subspace spanned by the dominant eigenvectors of $\mathbf{R}_k$. It suppresses the $N_t - r_k$ 'empty' dimensions, which carry only noise. LS, being unaware of this structure, spreads its resources uniformly across all $N_t$ dimensions, wasting effort on directions where there is only noise.

Key Takeaway

MMSE exploits correlation; LS does not. When the channel has low effective rank $r_k \ll N_t$ (as occurs with spatial correlation and limited angular spread), MMSE outperforms LS by the factor $N_t/r_k$ at high SNR. In a massive MIMO array with 128 antennas and a typical urban rank of 10–20, this is roughly an 8–11 dB improvement in estimation quality.

Example: MMSE Estimation with the One-Ring Covariance Model

A ULA with $N_t = 64$ antennas and half-wavelength spacing serves a user at angle $\theta = 15°$ with angular spread $\Delta\theta = 5°$. The one-ring model gives $[\mathbf{R}_k]_{mn} = \beta_k\, \mathrm{sinc}\!\big(2\Delta\theta\, d(m-n)/\lambda\big)\, e^{j2\pi d(m-n)\sin\theta/\lambda}$ (using the approximation from Ch. 2). At pilot SNR $p_u\tau_p/\sigma^2 = 10$ dB, compare the LS and MMSE MSE.
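A sketch of this example using the stated one-ring approximation with $d = \lambda/2$, assuming $\beta_k = 1$ and $\sigma^2 = 1$ (so $p_u\tau_p = 10$ at 10 dB pilot SNR); `np.sinc` is the normalized sinc, and angles are converted to radians:

```python
import numpy as np

Nt = 64
theta, dtheta = np.deg2rad(15), np.deg2rad(5)    # user angle and angular spread
snr = 10.0                                       # p_u tau_p / sigma2 = 10 dB

# One-ring covariance for a half-wavelength ULA (beta_k = 1 assumed):
# [R]_{mn} = sinc(dtheta (m-n)) exp(j pi (m-n) sin(theta)) with d = lambda/2
m = np.arange(Nt)
diff = m[:, None] - m[None, :]
R = np.sinc(dtheta * diff) * np.exp(1j * np.pi * diff * np.sin(theta))

lam = np.clip(np.linalg.eigvalsh(R), 0, None)    # eigenvalues (clip tiny negatives)
mse_ls = Nt / snr                                # N_t sigma2 / (p_u tau_p), sigma2 = 1
mse_mmse = np.sum(lam / (snr * lam + 1.0))       # eigen-mode MMSE formula, sigma2 = 1
eff_rank = np.sum(lam > 1e-3 * lam.max())        # effective rank of R
assert mse_mmse < mse_ls
```

The limited angular spread makes `eff_rank` far smaller than $N_t$, which is exactly why the MMSE MSE comes out well below the LS value of $N_t/10 = 6.4$.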

Common Mistake: MMSE Requires Accurate $\mathbf{R}_k$ β€” LS Does Not

Mistake:

MMSE always beats LS, so just use MMSE.

Correction:

MMSE requires accurate knowledge of the spatial covariance matrix $\mathbf{R}_k$. In practice, $\mathbf{R}_k$ must be estimated from data (typically by long-term sample averaging over many coherence intervals). If $\mathbf{R}_k$ is estimated incorrectly, e.g. due to limited averaging samples, user mobility, or angular-spread mismatch, the "MMSE" estimator can actually perform worse than LS in some scenarios. Furthermore, the matrix inversion $(p_u\tau_p\mathbf{R}_k + \sigma^2\mathbf{I})^{-1}$ has complexity $\mathcal{O}(N_t^3)$, which may be prohibitive for very large arrays.

Rule of thumb: use MMSE when $\mathbf{R}_k$ is reliably estimated and hardware complexity allows; use LS or regularized LS when only rough statistical knowledge is available.

LS vs. MMSE Channel Estimator Comparison

| Property | LS Estimator | MMSE Estimator |
|---|---|---|
| Required prior knowledge | None | $\mathbf{R}_k$, $\sigma^2$ |
| MSE expression | $N_t\sigma^2/(p_u\tau_p)$ | $\sum_i \lambda_i\sigma^2/(p_u\tau_p\lambda_i+\sigma^2)$ |
| Estimation bias | Unbiased | Biased (shrinks toward prior mean $\mathbf{0}$) |
| Exploits spatial correlation | No | Yes (via $\mathbf{R}_k$ eigenbasis) |
| Complexity per user | $\mathcal{O}(N_t)$ | $\mathcal{O}(N_t^3)$ (matrix inversion) |
| MSE at high pilot SNR | $N_t\sigma^2/(p_u\tau_p)$ | $r_k\sigma^2/(p_u\tau_p)$ |
| MSE at low pilot SNR | $N_t\sigma^2/(p_u\tau_p)$ | $\mathrm{tr}(\mathbf{R}_k)$ (prior dominates) |

MMSE via Matrix Inversion Lemma

By the Woodbury matrix identity, the MMSE estimator has an equivalent form:

$$\hat{\mathbf{H}}_k^{\text{MMSE}} = \left(\mathbf{R}_k^{-1} + \frac{p_u\tau_p}{\sigma^2}\mathbf{I}\right)^{-1} \frac{\sqrt{p_u\tau_p}}{\sigma^2}\, \mathbf{y}_k$$

(valid only when $\mathbf{R}_k$ is invertible). This form is sometimes preferred numerically when $\mathbf{R}_k$ is well conditioned. The original form with $(p_u\tau_p\mathbf{R}_k + \sigma^2\mathbf{I})^{-1}$ is required when $\mathbf{R}_k$ is rank-deficient (as is typical when the effective rank $r_k \ll N_t$).
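The algebraic equivalence of the two forms can be checked directly on a synthetic full-rank covariance (toy sizes and the regularization of $\mathbf{R}_k$ are assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
Nt, p_u, tau_p, sigma2 = 16, 1.0, 4, 0.5         # assumed toy values

# Full-rank, well-conditioned covariance (Woodbury form needs R invertible)
A = rng.standard_normal((Nt, Nt)) + 1j * rng.standard_normal((Nt, Nt))
R = A @ A.conj().T / Nt + 0.1 * np.eye(Nt)
y = rng.standard_normal(Nt) + 1j * rng.standard_normal(Nt)  # arbitrary observation

g = np.sqrt(p_u * tau_p)
# Covariance form: sqrt(p tau) R (p tau R + sigma2 I)^{-1} y
h1 = g * R @ np.linalg.solve(p_u * tau_p * R + sigma2 * np.eye(Nt), y)
# Woodbury form: (R^{-1} + (p tau / sigma2) I)^{-1} (sqrt(p tau) / sigma2) y
h2 = np.linalg.solve(np.linalg.inv(R) + (p_u * tau_p / sigma2) * np.eye(Nt),
                     (g / sigma2) * y)
assert np.allclose(h1, h2)
```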

⚠️Engineering Note

Estimating $\mathbf{R}_k$ in Practice

The covariance matrix $\mathbf{R}_k$ evolves on a timescale much longer than the channel itself (seconds to minutes, vs. milliseconds for $\mathbf{H}_{k}$). It can be estimated from a long-term sample average of the pilot observations:

$$\hat{\mathbf{R}}_y = \frac{1}{N_s}\sum_{n=1}^{N_s} \mathbf{y}_k[n]\,\mathbf{y}_k[n]^H, \qquad \hat{\mathbf{R}}_k = \frac{1}{p_u\tau_p}\left(\hat{\mathbf{R}}_y - \sigma^2\mathbf{I}\right)$$

where $\mathbf{y}_k[n]$ is the pilot observation in coherence interval $n$; subtracting $\sigma^2\mathbf{I}$ and rescaling removes the noise contribution, since $\mathbb{E}[\mathbf{y}_k\mathbf{y}_k^H] = p_u\tau_p\mathbf{R}_k + \sigma^2\mathbf{I}$. Statistical accuracy requires $N_s \gg N_t$ observations (typically $N_s \geq 10N_t$ for reasonable estimation quality). For $N_t = 128$, this means averaging over $\sim 1280$ coherence intervals, feasible for stationary or slow-moving users but challenging for vehicular UEs.

Practical Constraints
  β€’ For $N_t = 64$: at least 640 coherence intervals needed for 10% Frobenius-norm error in the $\mathbf{R}_k$ estimate
  β€’ Covariance estimation overhead is not counted in the pilot overhead but represents a long-term cost
  β€’ Structured covariance models (one-ring, Kronecker) reduce estimation to a few parameters
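A sketch of the de-biased sample-average estimator under an assumed rank-4 covariance (all sizes, the seed, and the covariance construction are illustrative assumptions), showing that the estimate improves as the number of coherence intervals $N_s$ grows:

```python
import numpy as np

rng = np.random.default_rng(4)
Nt, p_u, tau_p, sigma2 = 32, 1.0, 8, 1.0          # assumed toy values

# Low-rank "true" spatial covariance (rank 4) and a matrix square root of it
B = rng.standard_normal((Nt, 4)) + 1j * rng.standard_normal((Nt, 4))
R = B @ B.conj().T / 4
lam, U = np.linalg.eigh(R)
Rhalf = (U * np.sqrt(np.clip(lam, 0, None))) @ U.conj().T

def estimate_R(Ns):
    """Sample-average Ns pilot observations, then de-bias and rescale."""
    W = (rng.standard_normal((Nt, Ns)) + 1j * rng.standard_normal((Nt, Ns))) / np.sqrt(2)
    H = Rhalf @ W                                  # channels with covariance R
    N = np.sqrt(sigma2 / 2) * (rng.standard_normal((Nt, Ns))
                               + 1j * rng.standard_normal((Nt, Ns)))
    Y = np.sqrt(p_u * tau_p) * H + N               # one pilot observation per interval
    Ry = Y @ Y.conj().T / Ns                       # sample covariance of y_k
    return (Ry - sigma2 * np.eye(Nt)) / (p_u * tau_p)

err = lambda Rhat: np.linalg.norm(Rhat - R) / np.linalg.norm(R)
e_small, e_large = err(estimate_R(10 * Nt)), err(estimate_R(100 * Nt))
assert e_large < e_small   # more coherence intervals -> better R_k estimate
```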

Quick Check

A user's channel covariance has effective rank $r_k = 8$ with equal eigenvalues, and $N_t = 64$. At high pilot SNR, the MMSE MSE is approximately what fraction of the LS MSE?

$r_k/N_t = 1/8$

$N_t/r_k = 8$

1 (they are equal)

$r_k = 8$
