LS and MMSE Channel Estimation

The Estimation Problem

After receiving $\mathbf{Y}_p$, the base station performs channel estimation. We work with the sufficient statistic obtained by correlating with user $k$'s pilot and normalizing by $\sqrt{\tau_p}$:

$$\mathbf{y}_k = \frac{1}{\sqrt{\tau_p}}\,\mathbf{Y}_p \boldsymbol{\phi}_k^* = \sqrt{p_u \tau_p}\, \mathbf{H}_{k} + \frac{1}{\sqrt{\tau_p}}\,\mathbf{N}_p \boldsymbol{\phi}_k^*$$

Defining $\mathbf{n}_k = \frac{1}{\sqrt{\tau_p}}\mathbf{N}_p \boldsymbol{\phi}_k^* \sim \mathcal{CN}(\mathbf{0}, \sigma^2 \mathbf{I})$:

$$\mathbf{y}_k = \sqrt{p_u \tau_p}\, \mathbf{H}_{k} + \mathbf{n}_k$$

We estimate $\mathbf{H}_{k}$ from this observation. Two classical estimators exist: LS (requires no statistical knowledge) and MMSE (requires knowledge of $\mathbf{R}_k$).
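As a numerical sanity check on this model, the sketch below uses assumed toy sizes and unit-modulus DFT pilot sequences, and normalizes the pilot correlation by $\sqrt{\tau_p}$ so the effective noise has covariance $\sigma^2\mathbf{I}$. It verifies that correlating with user $k$'s pilot isolates that user's channel plus filtered noise:

```python
import numpy as np

rng = np.random.default_rng(0)
Nt, K, tau_p, p_u, sigma2 = 8, 4, 4, 1.0, 0.5  # assumed toy dimensions

# Orthogonal unit-modulus pilots: columns of a DFT matrix, phi_j^H phi_k = tau_p delta_jk
F = np.fft.fft(np.eye(tau_p))
Phi = F[:, :K]                                  # pilot of user k in column k

H = (rng.standard_normal((Nt, K)) + 1j * rng.standard_normal((Nt, K))) / np.sqrt(2)
Np = np.sqrt(sigma2 / 2) * (rng.standard_normal((Nt, tau_p))
                            + 1j * rng.standard_normal((Nt, tau_p)))

Yp = np.sqrt(p_u) * H @ Phi.T + Np              # received pilot block
k = 0
y_k = Yp @ np.conj(Phi[:, k]) / np.sqrt(tau_p)  # normalized pilot correlation

# y_k = sqrt(p_u tau_p) H_k + n_k: the residual is exactly the filtered noise
assert np.allclose(y_k - np.sqrt(p_u * tau_p) * H[:, k],
                   Np @ np.conj(Phi[:, k]) / np.sqrt(tau_p))
```

Interfering users' pilots drop out because of the orthogonality $\boldsymbol{\phi}_j^H\boldsymbol{\phi}_k = \tau_p\delta_{jk}$.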

Definition:

Least-Squares (LS) Channel Estimator

The least-squares estimator minimizes $\|\mathbf{y}_k - \sqrt{p_u \tau_p}\, \mathbf{H}\|^2$ with respect to $\mathbf{H}$. Since the observation model is linear, the LS solution is:

$$\hat{\mathbf{H}}_k^{\text{LS}} = \frac{1}{\sqrt{p_u \tau_p}}\, \mathbf{y}_k = \mathbf{H}_{k} + \frac{1}{\sqrt{p_u \tau_p}}\, \mathbf{n}_k$$

Properties:

  • Unbiased: E[H^kLS]=Hk\mathbb{E}[\hat{\mathbf{H}}_k^{\text{LS}}] = \mathbf{H}_{k}
  • Error covariance: CkLS=Οƒ2puINt\mathbf{C}_k^{\text{LS}} = \frac{\sigma^2}{p_u} \mathbf{I}_{N_t}
  • MSE: MSEkLS=tr(CkLS)=NtΟƒ2puΟ„p\text{MSE}_k^{\text{LS}} = \text{tr}(\mathbf{C}_k^{\text{LS}}) = \frac{N_t \sigma^2}{p_u \tau_p}
  • Requires no statistical knowledge of Hk\mathbf{H}_{k}
  • Does NOT exploit spatial correlation structure

The LS estimator treats each antenna independently, ignoring the correlation structure encoded in $\mathbf{R}_k$. Its MSE scales as $N_t\sigma^2/(p_u\tau_p)$, proportional to the number of antennas, reflecting that each of the $N_t$ channel coefficients is estimated independently.
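A quick Monte Carlo check of the LS MSE expression, with assumed toy parameters ($N_t = 16$, $\tau_p = 8$, $p_u = \sigma^2 = 1$); the empirical error should match $N_t\sigma^2/(p_u\tau_p)$:

```python
import numpy as np

rng = np.random.default_rng(1)
Nt, p_u, tau_p, sigma2, trials = 16, 1.0, 8, 1.0, 20000  # assumed toy values

# Draw many realizations of y = sqrt(p_u tau_p) H + n, n ~ CN(0, sigma2 I)
H = (rng.standard_normal((trials, Nt)) + 1j * rng.standard_normal((trials, Nt))) / np.sqrt(2)
n = np.sqrt(sigma2 / 2) * (rng.standard_normal((trials, Nt))
                           + 1j * rng.standard_normal((trials, Nt)))
y = np.sqrt(p_u * tau_p) * H + n

H_ls = y / np.sqrt(p_u * tau_p)                  # LS estimate per realization
mse_emp = np.mean(np.abs(H_ls - H) ** 2) * Nt    # empirical E||H_ls - H||^2
mse_theory = Nt * sigma2 / (p_u * tau_p)
assert abs(mse_emp - mse_theory) / mse_theory < 0.05
```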

Definition:

MMSE Channel Estimator

Assume $\mathbf{H}_{k} \sim \mathcal{CN}(\mathbf{0}, \mathbf{R}_k)$ with known covariance $\mathbf{R}_k$. The MMSE estimator (equal to the posterior mean for jointly Gaussian signals) is:

$$\hat{\mathbf{H}}_k^{\text{MMSE}} = \sqrt{p_u \tau_p}\, \mathbf{R}_k \left(p_u \tau_p \mathbf{R}_k + \sigma^2 \mathbf{I}\right)^{-1} \mathbf{y}_k$$

Properties:

  • Error covariance: Ck=Rkβˆ’puΟ„pRk(puΟ„pRk+Οƒ2I)βˆ’1Rk\mathbf{C}_k = \mathbf{R}_k - p_u \tau_p \mathbf{R}_k (p_u \tau_p \mathbf{R}_k + \sigma^2 \mathbf{I})^{-1} \mathbf{R}_k
  • MSE: MSEkMMSE=tr(Ck)≀MSEkLS\text{MSE}_k^{\text{MMSE}} = \text{tr}(\mathbf{C}_k) \leq \text{MSE}_k^{\text{LS}} always
  • The estimated channel H^kMMSE∼CN(0,Rkβˆ’Ck)\hat{\mathbf{H}}_k^{\text{MMSE}} \sim \mathcal{CN}(\mathbf{0}, \mathbf{R}_k - \mathbf{C}_k) (the estimate is distributed as a zero-mean Gaussian with covariance equal to the reduction in uncertainty)
  • The estimate and error are uncorrelated: E[H^kH~kH]=0\mathbb{E}[\hat{\mathbf{H}}_k \tilde{\mathbf{H}}_k^H] = \mathbf{0} (orthogonality principle)

Theorem: MMSE Estimator MSE via Eigendecomposition

Let $\mathbf{R}_k = \mathbf{U}_k \boldsymbol{\Lambda}_k \mathbf{U}_k^H$ be the eigendecomposition of the spatial covariance matrix, where $\boldsymbol{\Lambda}_k = \mathrm{diag}(\lambda_1, \ldots, \lambda_{N_t})$ with $\lambda_1 \geq \cdots \geq \lambda_{N_t} \geq 0$.

Then the MMSE estimation MSE is:

$$\text{MSE}_k^{\text{MMSE}} = \sum_{i=1}^{N_t} \frac{\lambda_i \sigma^2}{p_u \tau_p \lambda_i + \sigma^2}$$

As $p_u \tau_p \to \infty$, $\text{MSE}_k^{\text{MMSE}} \to 0$. As $p_u \tau_p \to 0$, $\text{MSE}_k^{\text{MMSE}} \to \mathrm{tr}(\mathbf{R}_k) = \mathbb{E}[\|\mathbf{H}_{k}\|^2]$.

Each eigendirection is estimated independently with a per-mode SNR $p_u \tau_p \lambda_i / \sigma^2$. Low-energy eigenmodes (small $\lambda_i$) contribute little signal and much noise; their estimation is dominated by the prior, which shrinks the estimate toward zero. High-energy eigenmodes (large $\lambda_i$) are estimated with high SNR and nearly perfect accuracy.
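The eigen-mode formula can be checked against the matrix form of the error covariance. The sketch below builds a synthetic $\mathbf{R}_k$ with an exponentially decaying eigenvalue profile (all sizes and the decay rate are assumptions) and confirms that $\mathrm{tr}(\mathbf{C}_k)$ equals the per-mode sum:

```python
import numpy as np

rng = np.random.default_rng(2)
Nt, p_u, tau_p, sigma2 = 32, 1.0, 8, 1.0        # assumed toy values

# Synthetic spatial covariance with decaying eigenvalues, random eigenbasis
lam = np.exp(-0.4 * np.arange(Nt))               # lambda_1 >= ... >= lambda_Nt > 0
Q, _ = np.linalg.qr(rng.standard_normal((Nt, Nt)) + 1j * rng.standard_normal((Nt, Nt)))
R = Q @ np.diag(lam) @ Q.conj().T

# Matrix form: C_k = R - p_u tau_p R (p_u tau_p R + sigma2 I)^{-1} R
A = p_u * tau_p * R + sigma2 * np.eye(Nt)
C = R - p_u * tau_p * R @ np.linalg.solve(A, R)

mse_matrix = np.real(np.trace(C))
mse_modes = np.sum(lam * sigma2 / (p_u * tau_p * lam + sigma2))  # eigen-mode sum
assert np.isclose(mse_matrix, mse_modes)
```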

LS vs. MMSE Channel Estimation MSE

Compare the normalized MSE (MSE divided by $\mathrm{tr}(\mathbf{R}_k) = \sum_i \lambda_i$) of the LS estimator, $N_t\sigma^2/(p_u\tau_p)$, and the MMSE estimator, $\sum_i \lambda_i\sigma^2/(p_u\tau_p\lambda_i+\sigma^2)$, as functions of $N_t$. Use sliders to adjust SNR, pilot length, and channel rank.


Theorem: MMSE vs. LS MSE Gap

Let $r_k = \mathrm{rank}(\mathbf{R}_k) \leq N_t$ be the effective channel rank. Then:

$$\frac{\text{MSE}_k^{\text{LS}}}{\text{MSE}_k^{\text{MMSE}}} \geq \frac{N_t}{r_k}\left(1 + \frac{\sigma^2}{p_u \tau_p \lambda_1}\right)$$

where $\lambda_1$ is the largest eigenvalue of $\mathbf{R}_k$ (the bound follows from upper-bounding each of the $r_k$ nonzero MMSE terms by $\lambda_1\sigma^2/(p_u\tau_p\lambda_1+\sigma^2)$). The MSE gap grows with the ratio $N_t/r_k$; at high pilot SNR $p_u\tau_p\lambda_1/\sigma^2$ the bound approaches $N_t/r_k$, so asymptotically LS is at least $N_t/r_k$ times worse than MMSE.

MMSE knows the channel lives in the $r_k$-dimensional subspace spanned by the dominant eigenvectors of $\mathbf{R}_k$. It suppresses the $N_t - r_k$ 'empty' dimensions, which carry only noise. LS, being unaware of this structure, spreads its resources uniformly across all $N_t$ dimensions, wasting effort on directions where there is only noise.

Key Takeaway

MMSE exploits correlation; LS does not. When the channel has low effective rank $r_k \ll N_t$ (as occurs with spatial correlation and limited angular spread), MMSE outperforms LS by the factor $N_t/r_k$ at high SNR. In a massive MIMO array with 128 antennas and a typical urban rank of 10–20, this is roughly an 8–11 dB improvement in estimation quality.

Example: MMSE Estimation with the One-Ring Covariance Model

A ULA with $N_t = 64$ antennas and half-wavelength spacing serves a user at angle $\theta = 15°$ with angular spread $\Delta\theta = 5°$. The one-ring model gives $[\mathbf{R}_k]_{mn} = \beta_k\, \mathrm{sinc}\!\big(2\Delta\theta\, d(m-n)/\lambda\big)\, e^{j2\pi d(m-n)\sin\theta/\lambda}$ (using the approximation from Ch. 2). At pilot SNR $p_u\tau_p/\sigma^2 = 10$ dB, compare the LS and MMSE MSE.
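A sketch of this example using the stated one-ring approximation with $d = \lambda/2$, assuming $\beta_k = 1$ and $\sigma^2 = 1$ (so $p_u\tau_p = 10$ at 10 dB pilot SNR); `np.sinc` is the normalized sinc, and angles are converted to radians:

```python
import numpy as np

Nt = 64
theta, dtheta = np.deg2rad(15), np.deg2rad(5)    # user angle and angular spread
snr = 10.0                                       # p_u tau_p / sigma2 = 10 dB

# One-ring covariance for a half-wavelength ULA (beta_k = 1 assumed):
# [R]_{mn} = sinc(dtheta (m-n)) exp(j pi (m-n) sin(theta)) with d = lambda/2
m = np.arange(Nt)
diff = m[:, None] - m[None, :]
R = np.sinc(dtheta * diff) * np.exp(1j * np.pi * diff * np.sin(theta))

lam = np.clip(np.linalg.eigvalsh(R), 0, None)    # eigenvalues (clip tiny negatives)
mse_ls = Nt / snr                                # N_t sigma2 / (p_u tau_p), sigma2 = 1
mse_mmse = np.sum(lam / (snr * lam + 1.0))       # eigen-mode MMSE formula, sigma2 = 1
eff_rank = np.sum(lam > 1e-3 * lam.max())        # effective rank of R
assert mse_mmse < mse_ls
```

The limited angular spread makes `eff_rank` far smaller than $N_t$, which is exactly why the MMSE MSE comes out well below the LS value of $N_t/10 = 6.4$.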

Common Mistake: MMSE Requires Accurate $\mathbf{R}_k$ β€” LS Does Not

Mistake:

MMSE always beats LS, so just use MMSE.

Correction:

MMSE requires accurate knowledge of the spatial covariance matrix $\mathbf{R}_k$. In practice, $\mathbf{R}_k$ must be estimated from data (typically by long-term sample averaging over many coherence intervals). If $\mathbf{R}_k$ is estimated incorrectly, e.g. due to limited averaging samples, user mobility, or angular-spread mismatch, the "MMSE" estimator can actually perform worse than LS in some scenarios. Furthermore, the matrix inversion $(p_u\tau_p\mathbf{R}_k + \sigma^2\mathbf{I})^{-1}$ has complexity $\mathcal{O}(N_t^3)$, which may be prohibitive for very large arrays.

Rule of thumb: use MMSE when $\mathbf{R}_k$ is reliably estimated and hardware complexity allows; use LS or regularized LS when only rough statistical knowledge is available.

LS vs. MMSE Channel Estimator Comparison

| Property | LS Estimator | MMSE Estimator |
|---|---|---|
| Required prior knowledge | None | $\mathbf{R}_k$, $\sigma^2$ |
| MSE expression | $N_t\sigma^2/(p_u\tau_p)$ | $\sum_i \lambda_i\sigma^2/(p_u\tau_p\lambda_i+\sigma^2)$ |
| Estimation bias | Unbiased | Biased (shrinks toward prior mean $\mathbf{0}$) |
| Exploits spatial correlation | No | Yes (via $\mathbf{R}_k$ eigenbasis) |
| Complexity per user | $\mathcal{O}(N_t)$ | $\mathcal{O}(N_t^3)$ (matrix inversion) |
| MSE at high pilot SNR | $N_t\sigma^2/(p_u\tau_p)$ | $r_k\sigma^2/(p_u\tau_p)$ |
| MSE at low pilot SNR | $N_t\sigma^2/(p_u\tau_p)$ | $\mathrm{tr}(\mathbf{R}_k)$ (prior dominates) |

MMSE via Matrix Inversion Lemma

By the Woodbury matrix identity, the MMSE estimator has an equivalent form:

$$\hat{\mathbf{H}}_k^{\text{MMSE}} = \left(\mathbf{R}_k^{-1} + \frac{p_u\tau_p}{\sigma^2}\mathbf{I}\right)^{-1} \frac{\sqrt{p_u\tau_p}}{\sigma^2}\, \mathbf{y}_k$$

(valid only when $\mathbf{R}_k$ is invertible). This form is sometimes preferred numerically when $\mathbf{R}_k$ is well conditioned. The original form with $(p_u\tau_p\mathbf{R}_k + \sigma^2\mathbf{I})^{-1}$ is required when $\mathbf{R}_k$ is rank-deficient (as is typical when the effective rank $r_k \ll N_t$).
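The algebraic equivalence of the two forms can be checked directly on a synthetic full-rank covariance (toy sizes and the regularization of $\mathbf{R}_k$ are assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
Nt, p_u, tau_p, sigma2 = 16, 1.0, 4, 0.5         # assumed toy values

# Full-rank, well-conditioned covariance (Woodbury form needs R invertible)
A = rng.standard_normal((Nt, Nt)) + 1j * rng.standard_normal((Nt, Nt))
R = A @ A.conj().T / Nt + 0.1 * np.eye(Nt)
y = rng.standard_normal(Nt) + 1j * rng.standard_normal(Nt)  # arbitrary observation

g = np.sqrt(p_u * tau_p)
# Covariance form: sqrt(p tau) R (p tau R + sigma2 I)^{-1} y
h1 = g * R @ np.linalg.solve(p_u * tau_p * R + sigma2 * np.eye(Nt), y)
# Woodbury form: (R^{-1} + (p tau / sigma2) I)^{-1} (sqrt(p tau) / sigma2) y
h2 = np.linalg.solve(np.linalg.inv(R) + (p_u * tau_p / sigma2) * np.eye(Nt),
                     (g / sigma2) * y)
assert np.allclose(h1, h2)
```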

⚠️Engineering Note

Estimating $\mathbf{R}_k$ in Practice

The covariance matrix $\mathbf{R}_k$ evolves on a timescale much longer than the channel itself (seconds to minutes, vs. milliseconds for $\mathbf{H}_{k}$). It can be estimated from a long-term sample average of the pilot observations:

$$\hat{\mathbf{R}}_y = \frac{1}{N_s}\sum_{n=1}^{N_s} \mathbf{y}_k[n]\,\mathbf{y}_k[n]^H, \qquad \hat{\mathbf{R}}_k = \frac{1}{p_u\tau_p}\left(\hat{\mathbf{R}}_y - \sigma^2\mathbf{I}\right)$$

where $\mathbf{y}_k[n]$ is the pilot observation in coherence interval $n$; subtracting $\sigma^2\mathbf{I}$ and rescaling removes the noise contribution, since $\mathbb{E}[\mathbf{y}_k\mathbf{y}_k^H] = p_u\tau_p\mathbf{R}_k + \sigma^2\mathbf{I}$. Statistical accuracy requires $N_s \gg N_t$ observations (typically $N_s \geq 10N_t$ for reasonable estimation quality). For $N_t = 128$, this means averaging over $\sim 1280$ coherence intervals, feasible for stationary or slow-moving users but challenging for vehicular UEs.

Practical Constraints
  β€’ For $N_t = 64$: at least 640 coherence intervals needed for 10% Frobenius-norm error in the $\mathbf{R}_k$ estimate
  β€’ Covariance estimation overhead is not counted in the pilot overhead but represents a long-term cost
  β€’ Structured covariance models (one-ring, Kronecker) reduce estimation to a few parameters
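A sketch of the de-biased sample-average estimator under an assumed rank-4 covariance (all sizes, the seed, and the covariance construction are illustrative assumptions), showing that the estimate improves as the number of coherence intervals $N_s$ grows:

```python
import numpy as np

rng = np.random.default_rng(4)
Nt, p_u, tau_p, sigma2 = 32, 1.0, 8, 1.0          # assumed toy values

# Low-rank "true" spatial covariance (rank 4) and a matrix square root of it
B = rng.standard_normal((Nt, 4)) + 1j * rng.standard_normal((Nt, 4))
R = B @ B.conj().T / 4
lam, U = np.linalg.eigh(R)
Rhalf = (U * np.sqrt(np.clip(lam, 0, None))) @ U.conj().T

def estimate_R(Ns):
    """Sample-average Ns pilot observations, then de-bias and rescale."""
    W = (rng.standard_normal((Nt, Ns)) + 1j * rng.standard_normal((Nt, Ns))) / np.sqrt(2)
    H = Rhalf @ W                                  # channels with covariance R
    N = np.sqrt(sigma2 / 2) * (rng.standard_normal((Nt, Ns))
                               + 1j * rng.standard_normal((Nt, Ns)))
    Y = np.sqrt(p_u * tau_p) * H + N               # one pilot observation per interval
    Ry = Y @ Y.conj().T / Ns                       # sample covariance of y_k
    return (Ry - sigma2 * np.eye(Nt)) / (p_u * tau_p)

err = lambda Rhat: np.linalg.norm(Rhat - R) / np.linalg.norm(R)
e_small, e_large = err(estimate_R(10 * Nt)), err(estimate_R(100 * Nt))
assert e_large < e_small   # more coherence intervals -> better R_k estimate
```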

Quick Check

A user's channel covariance has effective rank $r_k = 8$ with equal eigenvalues, and $N_t = 64$. At high pilot SNR, the MMSE MSE is approximately what fraction of the LS MSE?

$r_k/N_t = 1/8$

$N_t/r_k = 8$

1 (they are equal)

$r_k = 8$
