Ferkans — Interactive Telecom Tutor

Why Channel Estimation Matters

Coherent detection (Section 9.4) assumes perfect CSI, but in practice the channel must be estimated from the received signal. Modern systems embed known pilot symbols (reference signals) into the transmitted frame, and the receiver uses these to estimate the channel. The quality of the channel estimate directly limits detection performance: a residual estimation error caps the effective SNR, producing an error floor that no amount of transmit power can overcome. This section develops the two main pilot-based estimation methods (LS and MMSE) and analyses their impact on system performance.

Definition:
Pilot-Based Channel Estimation

In pilot-based channel estimation, the transmitter inserts known symbols $\{x_p[n]\}_{n=1}^{N_p}$ (pilots) at predetermined time-frequency positions. The received pilot observations are

$y_p[n] = h[n]\, x_p[n] + w[n], \qquad n = 1, \ldots, N_p$

In matrix form:

$\mathbf{y}_p = \mathbf{X}_p \mathbf{h} + \mathbf{w}$

where $\mathbf{X}_p = \operatorname{diag}(x_p[1], \ldots, x_p[N_p])$ and $\mathbf{h} = [h[1], \ldots, h[N_p]]^T$ is the channel vector at pilot positions.

The channel at data symbol positions is obtained by interpolation (in time, frequency, or both) from the pilot estimates.

Design considerations:

Pilot density must satisfy the Nyquist criterion in both time (pilot spacing $\Delta t_p \leq 1/(2f_D)$, the coherence time) and frequency (pilot spacing $\Delta f_p \leq 1/\tau_{\max}$, the coherence bandwidth)
Pilot power vs data power trade-off: more pilot power improves estimation but reduces power available for data
Pilot overhead reduces spectral efficiency by a factor $(1 - N_p/N_{\text{total}})$

Theorem: LS Channel Estimator and Its MSE

The least-squares (LS) channel estimator minimises $\|\mathbf{y}_p - \mathbf{X}_p \mathbf{h}\|^2$ and is given by

$\hat{\mathbf{h}}_{\text{LS}} = (\mathbf{X}_p^H \mathbf{X}_p)^{-1} \mathbf{X}_p^H \mathbf{y}_p = \mathbf{X}_p^{-1} \mathbf{y}_p$

For the diagonal pilot matrix, this simplifies to element-wise division:

$\hat{h}_{\text{LS}}[n] = \frac{y_p[n]}{x_p[n]}, \qquad n = 1, \ldots, N_p$

The MSE per channel coefficient is

$\text{MSE}_{\text{LS}} = E\!\left[|\hat{h}[n] - h[n]|^2\right] = \frac{\sigma_w^2}{|x_p[n]|^2} = \frac{\sigma_w^2}{E_p}$

where $E_p = |x_p[n]|^2$ is the pilot energy.

The LS estimator is simple and unbiased, but it amplifies noise equally at all pilot positions regardless of the channel's statistical properties. It uses no prior knowledge about the channel.

Proof

Derivation

The cost function is $J(\mathbf{h}) = \|\mathbf{y}_p - \mathbf{X}_p\mathbf{h}\|^2$ .

Setting $\nabla_{\mathbf{h}} J = -2\mathbf{X}_p^H(\mathbf{y}_p - \mathbf{X}_p\mathbf{h}) = \mathbf{0}$ :

$\hat{\mathbf{h}}_{\text{LS}} = (\mathbf{X}_p^H\mathbf{X}_p)^{-1}\mathbf{X}_p^H\mathbf{y}_p$

MSE

$\hat{\mathbf{h}}_{\text{LS}} - \mathbf{h} = \mathbf{X}_p^{-1}\mathbf{w}$

$\text{MSE} = E[\|\mathbf{X}_p^{-1}\mathbf{w}\|^2/N_p] = \sigma_w^2 \operatorname{tr}((\mathbf{X}_p^H\mathbf{X}_p)^{-1})/N_p = \sigma_w^2/E_p$

when all pilots have equal power $E_p$ . $\blacksquare$
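
The LS result can be checked numerically. The following is a minimal Monte Carlo sketch, not part of the original derivation: the pilot count, noise variance, QPSK-like pilot alphabet, i.i.d. Rayleigh channel and the helper `crandn` are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for repeatability

N_p = 64          # number of pilots (illustrative)
E_p = 1.0         # pilot energy |x_p[n]|^2
sigma_w2 = 0.1    # noise variance sigma_w^2 (illustrative)
n_trials = 2000   # Monte Carlo runs

def crandn(*shape):
    """Unit-variance circularly symmetric complex Gaussian samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

# Constant-modulus (QPSK-like) pilots with energy E_p: an assumed pilot alphabet
x_p = np.sqrt(E_p) * np.exp(1j * np.pi / 2 * rng.integers(0, 4, N_p))

sq_err = 0.0
for _ in range(n_trials):
    h = crandn(N_p)                                   # assumed i.i.d. channel, sigma_h^2 = 1
    y_p = h * x_p + np.sqrt(sigma_w2) * crandn(N_p)   # y_p[n] = h[n] x_p[n] + w[n]
    h_ls = y_p / x_p                                  # element-wise LS estimate
    sq_err += np.mean(np.abs(h_ls - h) ** 2)

mse_ls_empirical = sq_err / n_trials
mse_ls_theory = sigma_w2 / E_p
print(f"empirical MSE {mse_ls_empirical:.4f}, theory {mse_ls_theory:.4f}")
```

The empirical value should sit close to $\sigma_w^2/E_p$ whatever channel distribution is used, since the LS error $\mathbf{X}_p^{-1}\mathbf{w}$ does not depend on $\mathbf{h}$.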

Theorem: MMSE Channel Estimator and Its MSE

The MMSE channel estimator exploits the channel's second-order statistics $\mathbf{R}_{hh} = E[\mathbf{h}\mathbf{h}^H]$ :

$\hat{\mathbf{h}}_{\text{MMSE}} = \mathbf{R}_{hh} \left(\mathbf{R}_{hh} + \frac{\sigma_w^2}{E_p}\mathbf{I}\right)^{-1} \hat{\mathbf{h}}_{\text{LS}}$

Equivalently:

$\hat{\mathbf{h}}_{\text{MMSE}} = \mathbf{R}_{hh}\mathbf{X}_p^H (\mathbf{X}_p\mathbf{R}_{hh}\mathbf{X}_p^H + \sigma_w^2\mathbf{I})^{-1} \mathbf{y}_p$

The MSE matrix is

$\mathbf{C}_e = \mathbf{R}_{hh} - \mathbf{R}_{hh}(\mathbf{R}_{hh} + \sigma_w^2 E_p^{-1}\mathbf{I})^{-1} \mathbf{R}_{hh}$

At high SNR: $\text{MSE}_{\text{MMSE}} \to 0$ (same as LS).

At low SNR: $\text{MSE}_{\text{MMSE}} \to \sigma_h^2$ (reverts to prior), while $\text{MSE}_{\text{LS}} \to \infty$ .

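For uncorrelated taps ( $\mathbf{R}_{hh} = \sigma_h^2\mathbf{I}$ ), the MSE matrix gives a per-coefficient error $\text{MSE}_{\text{MMSE}} = \sigma_h^2\sigma_w^2/(E_p\sigma_h^2 + \sigma_w^2)$, so the MSE gain of MMSE over LS is

$\frac{\text{MSE}_{\text{LS}}}{\text{MSE}_{\text{MMSE}}} = 1 + \frac{\sigma_w^2}{E_p \sigma_h^2} = 1 + \frac{1}{\text{SNR} \cdot \sigma_h^2}, \qquad \text{SNR} = \frac{E_p}{\sigma_w^2}$

which is significant at low-to-moderate SNR and tends to 1 at high SNR. Channel correlation increases the gain further.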

The MMSE estimator applies a Wiener filter to the LS estimate, suppressing noise in directions where the channel has low energy (the eigenvectors of $\mathbf{R}_{hh}$ with small eigenvalues). It is a regularised version of LS that shrinks the estimate toward zero when the data are noisy.

Proof

Bayesian framework

With $\mathbf{h} \sim \mathcal{CN}(\mathbf{0}, \mathbf{R}_{hh})$ and $\mathbf{y}_p \mid \mathbf{h} \sim \mathcal{CN}(\mathbf{X}_p\mathbf{h}, \sigma_w^2\mathbf{I})$ , the posterior $p(\mathbf{h} \mid \mathbf{y}_p)$ is Gaussian.

The MMSE estimate is the posterior mean: $\hat{\mathbf{h}}_{\text{MMSE}} = E[\mathbf{h} \mid \mathbf{y}_p]$ .

Matrix Wiener filter

Using the standard result for jointly Gaussian vectors:

$\hat{\mathbf{h}}_{\text{MMSE}} = \mathbf{C}_{h,y_p}\mathbf{C}_{y_p}^{-1}\mathbf{y}_p$

$= \mathbf{R}_{hh}\mathbf{X}_p^H(\mathbf{X}_p\mathbf{R}_{hh}\mathbf{X}_p^H + \sigma_w^2\mathbf{I})^{-1}\mathbf{y}_p$

For equal-power pilots with $\mathbf{X}_p^H\mathbf{X}_p = E_p\mathbf{I}$ , this simplifies to the form given above. $\blacksquare$
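
As a numerical illustration of the theorem, the sketch below builds the Wiener matrix for an assumed exponential correlation model $[\mathbf{R}_{hh}]_{ij} = \rho^{|i-j|}$ and compares the simulated MSE with $\operatorname{tr}(\mathbf{C}_e)/N_p$. The correlation model, the value of `rho`, the 0 dB pilot SNR and the helper `crandn` are assumptions made for this example only.

```python
import numpy as np

rng = np.random.default_rng(1)

N_p, E_p, sigma_w2 = 32, 1.0, 1.0   # 0 dB pilot SNR (illustrative)
rho = 0.95                          # assumed correlation between adjacent pilots
n_trials = 2000

def crandn(*shape):
    """Unit-variance circularly symmetric complex Gaussian samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

idx = np.arange(N_p)
R_hh = rho ** np.abs(idx[:, None] - idx[None, :])   # assumed model, sigma_h^2 = 1

# W = R_hh (R_hh + sigma_w^2/E_p I)^{-1}; the two factors commute, so solve(A, R_hh) is the same matrix
A = R_hh + (sigma_w2 / E_p) * np.eye(N_p)
W = np.linalg.solve(A, R_hh)
C_e = R_hh - W @ R_hh                               # MSE matrix from the theorem
mse_mmse_theory = np.trace(C_e) / N_p

# Simulate correlated channels (one realisation per column) and both estimators
x_p = np.sqrt(E_p) * np.exp(1j * np.pi / 2 * rng.integers(0, 4, N_p))
H = np.linalg.cholesky(R_hh) @ crandn(N_p, n_trials)
Y = x_p[:, None] * H + np.sqrt(sigma_w2) * crandn(N_p, n_trials)
H_ls = Y / x_p[:, None]                             # LS estimate
H_mmse = W @ H_ls                                   # Wiener filter applied to the LS estimate

mse_ls = np.mean(np.abs(H_ls - H) ** 2)
mse_mmse = np.mean(np.abs(H_mmse - H) ** 2)
print(f"LS {mse_ls:.3f}  MMSE {mse_mmse:.3f}  tr(C_e)/N_p {mse_mmse_theory:.3f}")
```

`np.linalg.solve` is used instead of an explicit inverse; for a fixed $\mathbf{R}_{hh}$ and noise level, `W` can be precomputed once and reused for every frame.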

Example: Pilot Design for OFDM

An OFDM system has $N = 1024$ subcarriers, subcarrier spacing $\Delta f = 15$ kHz, and operates over a channel with maximum delay spread $\tau_{\max} = 5\,\mu$ s and maximum Doppler spread $f_D = 300$ Hz.

(a) What is the maximum pilot spacing in frequency?

(b) What is the maximum pilot spacing in time?

(c) What fraction of resources must be devoted to pilots?

(d) What is the pilot overhead penalty on spectral efficiency?

Solution

Frequency pilot spacing

Coherence bandwidth: $B_c \approx 1/\tau_{\max} = 200$ kHz.

Pilot spacing in frequency must satisfy the Nyquist criterion: $\Delta f_p \leq B_c = 200$ kHz.

In subcarriers: $\Delta f_p / \Delta f = 200/15 \approx 13$ subcarriers.

Use every 12th subcarrier for pilots (conservative).

Time pilot spacing

Coherence time: $T_c \approx 1/(2f_D) = 1/600 \approx 1.67$ ms.

OFDM symbol duration: $T_{\text{sym}} = 1/\Delta f + T_{\text{CP}} \approx 71.4\,\mu$ s.

Pilot spacing in time: $\Delta t_p \leq T_c = 1.67$ ms.

In OFDM symbols: $1670/71.4 \approx 23$ symbols.

Use pilots every 14 symbols (matching one slot in 5G NR).

Pilot fraction

Pilots per resource block: $1/12$ in frequency $\times$ $1/14$ in time.

Pilot overhead: $1/(12 \times 14) \approx 0.6$ %.

In practice, 5G NR DMRS occupies roughly 4-8% of resources due to multiple antenna ports and front-loaded patterns.

Spectral efficiency penalty

Assuming 5% pilot overhead: $\eta_{\text{eff}} = (1 - 0.05) \eta_{\text{ideal}} = 0.95\, \eta_{\text{ideal}}$

This is a 0.22 dB penalty in spectral efficiency: a small price for enabling coherent detection. $\blacksquare$
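
The arithmetic of this example can be reproduced in a few lines. This is only a check of the numbers above; the variable names are illustrative, and the symbol duration of 71.4 µs is taken from the solution rather than derived.

```python
import math

delta_f = 15e3     # subcarrier spacing [Hz]
tau_max = 5e-6     # maximum delay spread [s]
f_D = 300.0        # maximum Doppler spread [Hz]
T_sym = 71.4e-6    # OFDM symbol duration including cyclic prefix [s]

B_c = 1.0 / tau_max            # coherence bandwidth, about 200 kHz
T_c = 1.0 / (2.0 * f_D)        # coherence time, about 1.67 ms

max_spacing_sc = math.floor(B_c / delta_f)    # largest spacing in subcarriers
max_spacing_sym = math.floor(T_c / T_sym)     # largest spacing in OFDM symbols

overhead = 1.0 / (12 * 14)                    # chosen grid: every 12th subcarrier, every 14th symbol
penalty_db = -10.0 * math.log10(1.0 - 0.05)   # loss for the assumed 5% overhead

print(max_spacing_sc, max_spacing_sym)        # 13 23
print(f"overhead {100 * overhead:.2f} %, penalty {penalty_db:.2f} dB")
```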

Channel Estimation MSE: LS vs MMSE

Compare the MSE of LS and MMSE channel estimators as a function of SNR. The MMSE estimator exploits channel correlation and significantly outperforms LS at low SNR. The gap decreases at high SNR as both estimators converge. Increase the number of pilots to see both MSE curves shift down.

LMMSE Shrinkage vs LS Estimation

Watch how the LMMSE estimator behaves as SNR sweeps from -5 dB to 30 dB. At low SNR, the LMMSE aggressively shrinks the estimate toward zero (the prior mean), dramatically reducing noise. At high SNR, the LMMSE converges to the LS estimate as the data become informative enough to override the prior.

Red dots: LS estimates (noisy at low SNR). Green dots: LMMSE estimates (shrunk toward zero). White bars: true channel taps.
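
For a single uncorrelated tap the shrinkage can be written in closed form: the MMSE estimate equals the LS estimate scaled by $\alpha = \sigma_h^2/(\sigma_h^2 + \sigma_w^2/E_p)$. The sketch below tabulates this factor over the same SNR sweep, assuming $\sigma_h^2 = 1$ and $\text{SNR} = E_p/\sigma_w^2$; it is a scalar simplification and not the code behind the animation.

```python
import numpy as np

sigma_h2 = 1.0                                # assumed tap variance
snr_db = np.arange(-5, 31, 5)                 # -5 dB to 30 dB in 5 dB steps
snr = 10.0 ** (snr_db / 10.0)                 # E_p / sigma_w^2 on a linear scale
alpha = sigma_h2 / (sigma_h2 + 1.0 / snr)     # scalar Wiener (shrinkage) factor

for s, a in zip(snr_db, alpha):
    print(f"{int(s):>3d} dB  alpha = {a:.3f}")
```

At 0 dB the factor is exactly 0.5, and it approaches 1 (no shrinkage, MMSE equal to LS) as the SNR grows.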

Quick Check

At low SNR (0 dB), the MMSE channel estimator significantly outperforms the LS estimator. What is the primary reason?

MMSE exploits prior knowledge of channel statistics to suppress noise

MMSE uses more pilot symbols

MMSE has lower computational complexity

LS requires knowledge of noise variance, which is harder to obtain

Correction:

MMSE exploits prior knowledge of channel statistics to suppress noise

The MMSE estimator uses the channel correlation matrix $\mathbf{R}_{hh}$ to distinguish between signal and noise components. At low SNR, where noise dominates, this prior knowledge is extremely valuable: the MMSE estimator shrinks the estimate toward the prior, greatly reducing MSE.

Common Mistake: Insufficient Pilot Density

Mistake:

Using a pilot spacing wider than the channel's coherence bandwidth (in frequency) or coherence time (in time), causing aliasing in the channel estimate.

Correction:

The channel must be sampled at the Nyquist rate in both time and frequency:

Frequency: $\Delta f_p \leq 1/\tau_{\max}$ (coherence bandwidth)
Time: $\Delta t_p \leq 1/(2f_D)$ (coherence time)

Violating these conditions causes the estimated channel to alias (wrap around), producing a completely wrong estimate. This is equivalent to undersampling a bandlimited signal.

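In high-mobility scenarios (high-speed trains: $f_D > 1$ kHz), the coherence time $1/(2f_D)$ falls below 0.5 ms, which is only about 7 OFDM symbols at 15 kHz subcarrier spacing, and at still higher Doppler it can approach a single OFDM symbol. Such channels require special (denser) pilot patterns or non-pilot-based methods.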
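
The frequency-domain aliasing effect is easy to reproduce with noiseless pilots. In the sketch below, a channel whose longest delay is 10 samples is sampled on every 4th and every 8th subcarrier and reconstructed by DFT interpolation. The tap values, the grid size and the function `interpolate_from_pilots` are hypothetical choices for illustration.

```python
import numpy as np

N = 64                                   # subcarriers (illustrative)
h = np.zeros(N, dtype=complex)
h[[0, 3, 10]] = [1.0, 0.5j, -0.3]        # assumed taps; the longest delay is 10 samples
H = np.fft.fft(h)                        # true frequency response

def interpolate_from_pilots(H, spacing):
    """DFT interpolation from noiseless pilots on every `spacing`-th subcarrier."""
    n_sub = len(H)
    H_p = H[::spacing]                   # M = n_sub / spacing pilot observations
    g = np.fft.ifft(H_p)                 # M-point delay-domain response (taps fold modulo M)
    return np.fft.fft(g, n=n_sub)        # zero-pad to n_sub taps, return to frequency

def rel_error(spacing):
    H_hat = interpolate_from_pilots(H, spacing)
    return np.linalg.norm(H_hat - H) / np.linalg.norm(H)

err_ok = rel_error(4)      # 16 pilots cover the 10-sample delay spread: exact recovery
err_alias = rel_error(8)   # 8 pilots do not: the tap at delay 10 wraps around to delay 2

print(f"spacing 4: {err_ok:.2e}   spacing 8: {err_alias:.2f}")
```

With a spacing of 8 the error does not come from noise at all: the pilots simply cannot distinguish a tap at delay 10 from one at delay 2, which is the wrap-around described above.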

Decision-Directed Estimation

Decision-directed (DD) estimation uses detected data symbols as additional "pilots" to refine the channel estimate. After initial pilot-based estimation and detection, the detected symbols $\hat{x}[n]$ replace the unknown $x[n]$ in the estimation problem:

$\hat{h}_{\text{DD}}[n] = y[n] / \hat{x}[n]$

Advantages:

Uses all received symbols (not just pilots) for estimation
Can track slow channel variations without additional pilots
Reduces pilot overhead

Risks:

Error propagation: incorrect decisions produce incorrect channel estimates, which cause more detection errors
Works well only when the initial BER is low (below $\sim 1$ %)
Not suitable for initial acquisition (needs a bootstrap phase with pilots)

DD estimation is widely used in practice as a refinement step after pilot-based initial estimation.
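
A minimal sketch of the bootstrap-then-refine procedure is given below, assuming a single flat block-fading coefficient, QPSK symbols and 20 dB SNR. The channel value, frame sizes and helper names (`qpsk`, `qpsk_decide`, `crandn`) are assumptions for illustration, and the DD estimate is averaged over the frame, which is only valid while the channel stays constant.

```python
import numpy as np

rng = np.random.default_rng(2)

def crandn(n):
    """Unit-variance circularly symmetric complex Gaussian samples."""
    return (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)

def qpsk(n):
    """Random unit-energy QPSK symbols."""
    bits = rng.integers(0, 2, (n, 2)) * 2 - 1
    return (bits[:, 0] + 1j * bits[:, 1]) / np.sqrt(2)

def qpsk_decide(z):
    """Hard decision to the nearest QPSK point."""
    return (np.sign(z.real) + 1j * np.sign(z.imag)) / np.sqrt(2)

h = 0.8 * np.exp(1j * 0.6)     # assumed channel, constant over the frame
sigma_w2 = 0.01                # 20 dB SNR for unit-energy symbols
n_pilot, n_data = 4, 200

x_p, x_d = qpsk(n_pilot), qpsk(n_data)
y_p = h * x_p + np.sqrt(sigma_w2) * crandn(n_pilot)
y_d = h * x_d + np.sqrt(sigma_w2) * crandn(n_data)

h_pilot = np.mean(y_p / x_p)           # bootstrap: pilot-only LS estimate
x_hat = qpsk_decide(y_d / h_pilot)     # detect data using the pilot estimate
h_dd = np.mean(y_d / x_hat)            # refinement: decisions act as extra pilots

print(f"pilot-only error {abs(h_pilot - h):.4f}, DD error {abs(h_dd - h):.4f}")
```

Lowering the SNR until decisions start to fail shows the error-propagation risk listed above: wrong `x_hat` values bias `h_dd`, which in turn degrades the next round of decisions.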

LS vs MMSE Channel Estimation

Aspect	LS Estimator	MMSE Estimator
Formula	$\hat{\mathbf{h}} = \mathbf{X}_p^{-1}\mathbf{y}_p$	$\hat{\mathbf{h}} = \mathbf{R}_{hh}(\mathbf{R}_{hh} + \frac{\sigma_w^2}{E_p}\mathbf{I})^{-1}\hat{\mathbf{h}}_{\text{LS}}$
Prior knowledge	None	Channel correlation $\mathbf{R}_{hh}$ , noise $\sigma_w^2$
Bias	Unbiased	Biased (toward zero)
MSE	$\sigma_w^2/E_p$	$< \sigma_w^2/E_p$ (always lower)
Complexity	$O(N_p)$	$O(N_p^3)$ (matrix inversion)
Low-SNR behaviour	MSE $\to \infty$	MSE $\to \sigma_h^2$ (bounded)
High-SNR behaviour	Converges to MMSE	Approaches LS

Why This Matters: Channel Estimation in 5G NR (DMRS)

5G NR uses Demodulation Reference Signals (DMRS) for channel estimation. DMRS patterns are defined in 3GPP TS 38.211 and have several key design features:

Front-loaded: DMRS is placed early in the slot to minimise decoding latency (the receiver can start channel estimation immediately)
Configurable density: 1 or 2 DMRS symbols per slot, with additional DMRS for high-mobility scenarios
Comb-type in frequency: DMRS occupies every 2nd or 3rd subcarrier, interleaved across antenna ports
Orthogonal across ports: different antenna ports use different DMRS sequences (CDM, FDM, or TDM) for MIMO estimation

The receiver typically applies an LMMSE interpolation filter to the LS pilot estimates, using channel correlation models derived from the power delay profile and Doppler spectrum. This is a direct application of the MMSE estimator of Theorem 9.5 to the OFDM frequency domain.
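
The LMMSE interpolation step can be sketched as follows. The frequency-domain correlation is derived here from an assumed four-tap power delay profile `p`; the pilot comb, the error variance and the function name `interpolate` are likewise illustrative and do not reproduce any specific 3GPP DMRS configuration.

```python
import numpy as np

N, D = 64, 4                               # subcarriers and pilot spacing (illustrative)
pilots = np.arange(0, N, D)                # comb-type pilot positions
p = np.array([0.5, 0.25, 0.15, 0.10])      # assumed power delay profile (sums to 1)
sigma_e2 = 0.01                            # LS error variance sigma_w^2 / E_p at the pilots

# Frequency-domain correlation implied by the PDP: R = F diag(p) F^H
k = np.arange(N)[:, None]
l = np.arange(len(p))[None, :]
F = np.exp(-2j * np.pi * k * l / N)        # N x L partial DFT matrix
R = (F * p) @ F.conj().T

R_dp = R[:, pilots]                        # all subcarriers vs pilot subcarriers
R_pp = R[np.ix_(pilots, pilots)]           # pilot subcarriers vs pilot subcarriers
A = R_pp + sigma_e2 * np.eye(len(pilots))
W = np.linalg.solve(A, R_dp.conj().T).conj().T   # W = R_dp A^{-1} (A is Hermitian)

# Theoretical per-subcarrier MSE of the interpolated estimate
C_e = R - W @ R_dp.conj().T
mse_interp = np.trace(C_e).real / N

def interpolate(h_ls_pilots):
    """Map LS estimates at the pilot subcarriers to all N subcarriers."""
    return W @ h_ls_pilots

print(f"per-subcarrier MSE {mse_interp:.5f} vs LS error at pilots {sigma_e2}")
```

Because 16 pilots observe only 4 unknown taps, the interpolated estimate is more accurate than the raw LS values even at the pilot positions themselves.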

⚠️ Engineering Note

MMSE Estimator Complexity and Practical Approximations

The MMSE channel estimator requires inverting an $N_p \times N_p$ matrix $(\mathbf{R}_{hh} + \frac{\sigma_w^2}{E_p}\mathbf{I})$ , costing $O(N_p^3)$ operations. For 5G NR with large bandwidth parts ( $N_p > 100$ pilot subcarriers), this becomes a bottleneck in real-time baseband processing.

Practical approximations used in deployed systems:

Reduced-rank MMSE: Project $\mathbf{R}_{hh}$ onto its dominant $r \ll N_p$ eigenvectors (from the channel's power delay profile). Complexity drops to $O(rN_p)$ . In 5G NR, $r \leq L$ (channel taps), typically 4-16.
Banded approximation: Approximate $\mathbf{R}_{hh}$ as a banded matrix when the coherence bandwidth is much smaller than the total bandwidth. Enables $O(N_p)$ Cholesky-based inversion.
DFT-based MMSE: Transform to the delay domain where $\mathbf{R}_{hh}$ is diagonal, apply scalar MMSE per tap, transform back. Complexity: $O(N_p \log N_p)$ . This is the most common approach in practical OFDM receivers.

Hardware implementations typically operate at the DFT-based MMSE level, achieving within 0.2-0.5 dB of the full MMSE at a fraction of the complexity.
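
The DFT-based variant can be sketched in a few lines. The example assumes that every subcarrier carries a pilot, that the power delay profile `p` (an exponential profile with 8 taps here) is known, and that the LS error variance is 0.1; all of these are illustrative assumptions rather than values from a deployed receiver.

```python
import numpy as np

rng = np.random.default_rng(3)

N = 64                                    # pilot subcarriers (all subcarriers assumed pilots)
p = np.zeros(N)
p[:8] = np.exp(-np.arange(8) / 2.0)       # assumed exponential power delay profile, 8 taps
p /= p.sum()                              # normalise total channel power to 1
sigma_e2 = 0.1                            # LS error variance sigma_w^2 / E_p
n_trials = 500

def crandn(*shape):
    """Unit-variance circularly symmetric complex Gaussian samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

# Scalar Wiener weight per tap; numpy's ifft scales the per-tap noise to sigma_e2 / N
w_tap = p / (p + sigma_e2 / N)

h = np.sqrt(p) * crandn(n_trials, N)                  # delay-domain taps, one row per trial
H = np.fft.fft(h, axis=1)                             # true frequency response
H_ls = H + np.sqrt(sigma_e2) * crandn(n_trials, N)    # LS estimates at the pilots

g_ls = np.fft.ifft(H_ls, axis=1)                      # transform to the delay domain
H_dft = np.fft.fft(w_tap * g_ls, axis=1)              # weight each tap, transform back

mse_ls = np.mean(np.abs(H_ls - H) ** 2)
mse_dft = np.mean(np.abs(H_dft - H) ** 2)
print(f"LS {mse_ls:.4f}   DFT-based MMSE {mse_dft:.4f}")
```

Most of the gain comes from zeroing the delay bins beyond the last tap, where the LS estimate contains only noise; the cost is two FFTs and one element-wise product per estimate.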

Practical Constraints

• Full MMSE: $O(N_p^3)$, impractical for $N_p > 100$ in real time
• DFT-based MMSE: $O(N_p \log N_p)$, standard in 4G/5G baseband chips
• Requires knowledge of the power delay profile (updated every ~100 ms)

📋 Ref: 3GPP TS 38.211 (DMRS configuration), 3GPP TS 38.214 (CSI framework)

Key Takeaway

The core message of this section in three points:

MMSE uses prior knowledge for better accuracy: by exploiting channel statistics ( $\mathbf{R}_{hh}$ ), the MMSE estimator achieves lower MSE than LS at every SNR, with the largest gain at low SNR where prior knowledge is most valuable.
Pilot density is governed by channel coherence: pilots must sample the channel at the Nyquist rate in both time and frequency. Under-sampling causes aliasing, while over-sampling wastes spectral efficiency.
Estimation error creates an effective SNR ceiling: with imperfect CSI, the effective SNR is approximately $\text{SNR}_{\text{eff}} = \text{SNR}/(1 + \text{SNR} \cdot \sigma_e^2)$ , which saturates at $1/\sigma_e^2$ regardless of transmit power.
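
The saturation in the third point can be made concrete with a short calculation. The sketch below evaluates the effective-SNR expression for an assumed estimation-error variance $\sigma_e^2 = 0.01$, held fixed as the SNR increases; the function name `snr_eff` and the chosen SNR grid are illustrative.

```python
import numpy as np

def snr_eff(snr, sigma_e2):
    """Effective SNR (linear scale) for estimation-error variance sigma_e2."""
    return snr / (1.0 + snr * sigma_e2)

sigma_e2 = 0.01                                    # assumed estimation-error variance
snr_db = np.array([0, 10, 20, 30, 40, 60])
eff_db = 10 * np.log10(snr_eff(10.0 ** (snr_db / 10), sigma_e2))
ceiling_db = 10 * np.log10(1.0 / sigma_e2)         # 20 dB for sigma_e2 = 0.01

for s, e in zip(snr_db, eff_db):
    print(f"SNR {int(s):>2d} dB -> effective {e:5.2f} dB (ceiling {ceiling_db:.0f} dB)")
```

At a nominal SNR of 20 dB the effective SNR is already 3 dB below nominal, and beyond 40 dB further transmit power buys almost nothing.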

Pilot Symbol

A known transmitted symbol inserted into the data stream at predetermined time-frequency positions to enable channel estimation at the receiver. Also called reference signal, training symbol, or preamble (depending on context).

Channel Estimation