Application: Channel Estimation (LS vs. MMSE)

Pilots, Channels, and Priors

A receiver cannot demodulate unless it knows the channel. The standard mechanism is pilot-aided estimation: the transmitter inserts known symbols into the stream, the receiver compares its observation of the pilots with what it expected, and it estimates the channel response from the mismatch. Classical (frequentist) receivers use least squares (LS), which requires no prior knowledge. Bayesian receivers exploit the channel covariance — typically derived from a scattering or Doppler model — to outperform LS, especially at low SNR and with few pilots.

Definition:

Pilot-Based Channel Estimation Model

The transmitter sends $N_p$ pilot symbols, collected in a matrix $\mathbf{X}_p \in \mathbb{C}^{N_p \times L}$, where $L$ is the channel length. The receiver observes
$$\mathbf{y} = \mathbf{X}_p\, \mathbf{h} + \mathbf{w}, \qquad \mathbf{h} \in \mathbb{C}^L, \quad \mathbf{w} \sim \mathcal{CN}(\mathbf{0}, \sigma^2\mathbf{I}),$$
where $\mathbf{h}$ is the channel vector. In the Bayesian model, the receiver also knows the channel prior $\mathbf{h} \sim \mathcal{CN}(\mathbf{0}, \boldsymbol{\Sigma}_h)$ from a fading model.

Assuming $N_p \geq L$ and $\mathbf{X}_p$ has full column rank, the estimation problem is well posed. If $N_p < L$, the channel is under-determined and the prior becomes essential for a unique estimate.

Definition:

Least-Squares (LS) Channel Estimator

Treating $\mathbf{h}$ as a deterministic unknown, the LS estimator minimizes $\|\mathbf{y} - \mathbf{X}_p\mathbf{h}\|^2$:
$$\hat{\mathbf{h}}_{\text{LS}} = (\mathbf{X}_p^H\mathbf{X}_p)^{-1}\mathbf{X}_p^H \,\mathbf{y}.$$
It is unbiased ($\mathbb{E}[\hat{\mathbf{h}}_{\text{LS}} \mid \mathbf{h}] = \mathbf{h}$) and its error covariance is $\mathrm{Cov}(\hat{\mathbf{h}}_{\text{LS}} - \mathbf{h}) = \sigma^2 (\mathbf{X}_p^H\mathbf{X}_p)^{-1}$.

The LS estimator is the BLUE (best linear unbiased estimator) by the Gauss–Markov theorem. It requires no prior; in particular, it makes no use of the channel's covariance structure.
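A minimal numpy sketch of the LS estimator on the model above (pilot count, channel length, and noise level are illustrative values, not prescriptions):

```python
import numpy as np

rng = np.random.default_rng(0)
Np, L = 8, 4                                   # pilots, channel taps (illustrative)
Xp = (rng.standard_normal((Np, L)) + 1j * rng.standard_normal((Np, L))) / np.sqrt(2)
h = (rng.standard_normal(L) + 1j * rng.standard_normal(L)) / np.sqrt(2)
sigma2 = 0.1
w = np.sqrt(sigma2 / 2) * (rng.standard_normal(Np) + 1j * rng.standard_normal(Np))
y = Xp @ h + w                                 # pilot observation

# LS estimate (Xp^H Xp)^{-1} Xp^H y; lstsq is the numerically stable route
h_ls, *_ = np.linalg.lstsq(Xp, y, rcond=None)

# Error covariance of the LS estimator: sigma^2 (Xp^H Xp)^{-1}
err_cov = sigma2 * np.linalg.inv(Xp.conj().T @ Xp)
```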

Theorem: MMSE Channel Estimator

Under the Bayesian model, the LMMSE estimator of $\mathbf{h}$ from $\mathbf{y}$ is
$$\hat{\mathbf{h}}_{\text{MMSE}} = \boldsymbol{\Sigma}_h \mathbf{X}_p^H \big(\mathbf{X}_p \boldsymbol{\Sigma}_h \mathbf{X}_p^H + \sigma^2 \mathbf{I}\big)^{-1} \mathbf{y}.$$
Equivalently,
$$\hat{\mathbf{h}}_{\text{MMSE}} = \big(\boldsymbol{\Sigma}_h^{-1} + \sigma^{-2} \mathbf{X}_p^H\mathbf{X}_p\big)^{-1} \sigma^{-2} \mathbf{X}_p^H \mathbf{y}.$$
Since $(\mathbf{h}, \mathbf{y})$ is jointly Gaussian, this LMMSE is also the MMSE.
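The equivalence of the two forms is the matrix inversion lemma. A quick numerical check with an arbitrary positive-definite prior (dimensions, noise level, and the prior construction are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
Np, L, sigma2 = 6, 3, 0.5                       # illustrative sizes
Xp = rng.standard_normal((Np, L)) + 1j * rng.standard_normal((Np, L))
A = rng.standard_normal((L, L)) + 1j * rng.standard_normal((L, L))
Sigma_h = A @ A.conj().T + np.eye(L)            # any positive-definite prior
y = rng.standard_normal(Np) + 1j * rng.standard_normal(Np)

# Form 1: Sigma_h Xp^H (Xp Sigma_h Xp^H + sigma^2 I)^{-1}
G1 = Sigma_h @ Xp.conj().T @ np.linalg.inv(
    Xp @ Sigma_h @ Xp.conj().T + sigma2 * np.eye(Np))
# Form 2 (information form): (Sigma_h^{-1} + sigma^{-2} Xp^H Xp)^{-1} sigma^{-2} Xp^H
G2 = np.linalg.inv(np.linalg.inv(Sigma_h) + Xp.conj().T @ Xp / sigma2) \
    @ Xp.conj().T / sigma2

h_mmse_1, h_mmse_2 = G1 @ y, G2 @ y             # identical by Woodbury identity
```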

Theorem: MMSE Dominates LS in Trace MSE

For every pilot matrix $\mathbf{X}_p$ and every prior $\boldsymbol{\Sigma}_h \succ 0$,
$$\mathrm{tr}\big(\boldsymbol{\Sigma}_{h\mid y}^{\text{MMSE}}\big) \leq \mathrm{tr}\big(\boldsymbol{\Sigma}_{h-\hat{h}}^{\text{LS}}\big),$$
with equality in the limit of an uninformative prior $\boldsymbol{\Sigma}_h \to \infty$.

The LS estimator ignores the prior; the MMSE estimator uses it. Since the MMSE minimizes Bayes MSE by construction, it cannot do worse than any other estimator — including LS — averaged over the prior.

LS vs. MMSE Channel Estimation

MSE per channel tap for LS and MMSE channel estimation, as SNR varies. The channel prior is modeled as an exponential power delay profile with correlation between taps. At low SNR, MMSE exploits the prior to drastically beat LS; at high SNR, the gap closes.


Example: Orthogonal Pilots Simplify Everything

Suppose the pilot matrix is chosen so that $\mathbf{X}_p^H\mathbf{X}_p = N_p \mathbf{I}$ (orthogonal columns with equal power) and the channel prior is diagonal: $\boldsymbol{\Sigma}_h = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_L^2)$. Compute the per-tap MMSE.
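In this case the information form decouples tap by tap: the posterior variance of tap $\ell$ is $\sigma_\ell^2 \sigma^2 / (N_p \sigma_\ell^2 + \sigma^2)$. A numpy check of that closed form against the matrix expression, using a DFT submatrix as one convenient orthogonal pilot choice (parameter values illustrative):

```python
import numpy as np

Np, L, sigma2 = 8, 4, 0.25                # pilots, taps, noise variance (illustrative)
# DFT submatrix gives orthogonal equal-power pilots: Xp^H Xp = Np I
Xp = np.exp(-2j * np.pi * np.outer(np.arange(Np), np.arange(L)) / Np)
sig2_taps = np.exp(-0.7 * np.arange(L))   # diagonal prior Sigma_h = diag(sigma_l^2)
Sigma_h = np.diag(sig2_taps)

# Posterior covariance in information form: (Sigma_h^{-1} + sigma^{-2} Xp^H Xp)^{-1}
post = np.linalg.inv(np.linalg.inv(Sigma_h) + Xp.conj().T @ Xp / sigma2)

# Per-tap closed form: sigma_l^2 sigma^2 / (Np sigma_l^2 + sigma^2)
per_tap = sig2_taps * sigma2 / (Np * sig2_taps + sigma2)
```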

LMMSE Shrinkage: From Data to Estimate

Visualization of how the LMMSE estimator combines the prior mean and the observation, with the shrinkage factor $\alpha$ varying with SNR.
As SNR grows, the LMMSE estimator transitions from returning the prior mean (low SNR, $\alpha \approx 0$) to returning the raw observation (high SNR, $\alpha \approx 1$).
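For a scalar channel with a unit-power pilot, the shrinkage factor reduces to $\alpha = \sigma_h^2 / (\sigma_h^2 + \sigma^2)$. A small sketch tabulating $\alpha$ across SNR (prior variance and SNR grid are illustrative):

```python
sigma_h2 = 1.0                                  # scalar prior variance (illustrative)
alphas = {}
for snr_db in (-10, 0, 10, 30):
    sigma2 = sigma_h2 / 10 ** (snr_db / 10)     # noise variance at this SNR
    # LMMSE shrinkage toward the prior mean (zero): alpha * h_LS
    alphas[snr_db] = sigma_h2 / (sigma_h2 + sigma2)
    print(f"{snr_db:>4} dB: alpha = {alphas[snr_db]:.4f}")
```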
⚠️ Engineering Note

Covariance Mismatch in Practice

The MMSE channel estimator assumes perfect knowledge of $\boldsymbol{\Sigma}_h$ and $\sigma^2$. In deployed systems these are estimated from data — typically from long-term channel statistics and noise-only samples. A mismatched $\hat{\boldsymbol{\Sigma}}_h$ can make the "MMSE" estimator perform worse than LS, especially at high SNR, where the prior matters least but mismatch still hurts. The robust engineering practice is a hybrid scheme: use LS at high SNR and MMSE at low SNR, or use a shrinkage estimator $\hat{\boldsymbol{\Sigma}}_h^{\text{shrink}} = (1-\gamma)\,\hat{\boldsymbol{\Sigma}}_h^{\text{sample}} + \gamma \mathbf{I}$ with $\gamma$ tuned to trade bias against variance.
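A sketch of that shrinkage covariance estimate (`shrink_covariance` is a hypothetical helper name, and the zero-mean snapshot format is an assumption):

```python
import numpy as np

def shrink_covariance(H_samples: np.ndarray, gamma: float) -> np.ndarray:
    """Shrinkage covariance (1 - gamma) * sample_cov + gamma * I.

    H_samples: (T, L) array of past channel estimates, one snapshot per row,
               assumed zero mean. gamma in [0, 1]; gamma = 1 falls back to I.
    """
    T, L = H_samples.shape
    S = H_samples.conj().T @ H_samples / T      # sample covariance
    return (1 - gamma) * S + gamma * np.eye(L)
```

Choosing $\gamma$ near 0 trusts the sample covariance (many snapshots, stationary channel); $\gamma$ near 1 defends against mismatch at the cost of discarding the tap structure.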

Practical Constraints
  • 3GPP NR uses LS with DFT-based smoothing as the baseline; MMSE is left to implementations

  • Typical covariance-matrix update period: tens of ms for pedestrian, a few ms for vehicular

🎓 CommIT Contribution (2018)

Bayesian Channel Estimation for Massive MIMO

D. Neumann, T. Wiese, W. Utschick, G. Caire. IEEE Trans. Signal Processing, vol. 66, no. 11.

In massive MIMO systems the channel covariance Σh\boldsymbol{\Sigma}_{h} is structured (Toeplitz/block-Toeplitz under one-ring scattering models) but rarely known exactly. This line of work — including the Caire group's contribution cited above — replaces the unknown covariance with a convolutional neural network that learns the MMSE mapping directly from pilot observations to channel estimates. The resulting estimator interpolates between LS (unstructured) and genuine Bayesian MMSE (perfectly known covariance), and is essentially tight when the training set matches the deployment scenario. This is a modern incarnation of the LS-vs-MMSE tradeoff discussed here.

Tags: channel-estimation, massive-mimo, learned-estimators

Quick Check

At what SNR does the LS channel estimator coincide with the MMSE estimator up to second order?

Low SNR ($\sigma^2 \gg \|\mathbf{X}_p\|^2 \sigma_h^2$)

High SNR ($\sigma^2 \ll \|\mathbf{X}_p\|^2 \sigma_h^2$)

When the pilot matrix is orthogonal

Never — they always differ

Shrinkage Estimator

An estimator that combines a data-driven estimate with a prior mean, pulling the answer toward the prior. The LMMSE is the canonical shrinkage estimator, with the shrinkage strength determined by the relative magnitudes of the prior covariance and the observation-noise covariance.

Related: Linear MMSE (LMMSE) Estimator, Bayesian Framework