The Wiener-Hopf Equation and the Orthogonality Principle

Why the Wiener Filter Still Matters

The Wiener filter was designed in the early 1940s as a solution to the anti-aircraft fire-control problem: given a noisy radar track of an incoming aircraft, produce the best linear estimate of its future position so that the shell and the aircraft arrive at the same point at the same time. Eighty years later, every channel equalizer in a modern wireless modem, every beamformer in a microphone array, every noise-cancelling headphone, and the steady-state limit of every Kalman filter are, at their core, Wiener filters. The filter is a working engineer's most faithful companion.

The operational question is always the same. We observe a signal $Y_n$ that carries information about a desired quantity $X_n$, corrupted by noise and by the channel. We want to produce the best linear estimate of $X_n$ from the observation sequence. "Best" means minimum mean-square error (MMSE), and "linear" means we restrict ourselves to convolution with a filter. Two questions then arise: what filter coefficients should we use, and do we allow the filter to look into the future of the observation (non-causal) or only into its past (causal)?

This chapter answers both questions. The answers are classical, but the geometry behind them (orthogonality of the error to the observations, spectral factorization as a Cholesky decomposition in the frequency domain, innovations as a whitened version of the observation) is what you will carry into every later chapter of this book.

Definition: Jointly WSS Processes

Two zero-mean discrete-time random processes $\{X_n\}_{n \in \mathbb{Z}}$ and $\{Y_n\}_{n \in \mathbb{Z}}$ are jointly wide-sense stationary (jointly WSS) if (i) each is individually WSS, with autocorrelations $r_{xx}[k] = \mathbb{E}[X_{n+k} X_n^*]$ and $r_{yy}[k] = \mathbb{E}[Y_{n+k} Y_n^*]$ depending only on the lag $k$, and (ii) the cross-correlation $r_{xy}[k] = \mathbb{E}[X_{n+k} Y_n^*]$ depends only on $k$ and not on $n$.

The power spectral densities and cross-PSD are the discrete-time Fourier transforms of the correlation sequences:

$$P_{xx}(f) = \sum_{k \in \mathbb{Z}} r_{xx}[k]\, e^{-j 2\pi f k}, \qquad P_{yy}(f) = \sum_{k \in \mathbb{Z}} r_{yy}[k]\, e^{-j 2\pi f k}, \qquad P_{xy}(f) = \sum_{k \in \mathbb{Z}} r_{xy}[k]\, e^{-j 2\pi f k},$$

for $f \in [-1/2, 1/2]$.

By the Wiener-Khinchin theorem, $P_{xx}(f)$ and $P_{yy}(f)$ are real and non-negative. The cross-PSD $P_{xy}(f)$ is in general complex; it satisfies the Hermitian relation $P_{xy}(f) = P_{yx}^*(f)$.
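
As a quick numerical sanity check (a sketch, not part of the formal development): for the autocorrelation $r_{xx}[k] = a^{|k|}$ of a unit-variance AR(1)-type process, the truncated DTFT sum matches the closed-form PSD $(1-a^2)/(1 - 2a\cos 2\pi f + a^2)$ and comes out real and strictly positive, as the theorem requires. The value $a = 0.8$ and the truncation length are illustrative assumptions.

```python
import numpy as np

# Numerical check of the DTFT definition of the PSD for r_xx[k] = a^|k|.
a = 0.8
f = np.linspace(-0.5, 0.5, 1001)
K = 200                                   # truncation length of the lag sum
k = np.arange(-K, K + 1)
r = a ** np.abs(k)                        # autocorrelation sequence r_xx[k]
P_sum = (r[:, None] * np.exp(-2j * np.pi * np.outer(k, f))).sum(axis=0)
P_closed = (1 - a**2) / (1 - 2*a*np.cos(2*np.pi*f) + a**2)

assert np.allclose(P_sum.imag, 0, atol=1e-8)     # PSD is real
assert (P_sum.real > 0).all()                    # ... and non-negative
assert np.allclose(P_sum.real, P_closed, atol=1e-6)
```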

Definition: The Wiener Filtering Problem

Let $\{X_n\}, \{Y_n\}$ be jointly WSS with known second-order statistics $r_{xx}, r_{yy}, r_{xy}$. Produce a linear estimate $\hat{X}_n = \sum_{k \in \mathcal{K}} h[k]\, Y_{n-k}$ that minimizes the mean-square error $\sigma_h^2 = \mathbb{E}[|X_n - \hat{X}_n|^2]$ over all choices of the filter coefficients $\{h[k]\}_{k \in \mathcal{K}}$. Three canonical choices for the index set $\mathcal{K}$ are:

| $\mathcal{K}$ | Name | Uses |
|---|---|---|
| $\mathbb{Z}$ (all lags) | Non-causal Wiener filter | Smoothing, off-line processing |
| $\mathbb{Z}_{\geq 0}$ (non-negative lags) | Causal Wiener filter | Real-time filtering |
| $\mathbb{Z}_{\geq 1}$ (positive lags) | Wiener predictor | Forecasting the future |

The three problems share a single geometric structure: project $X_n$ onto the closed linear span of $\{Y_m : n - m \in \mathcal{K}\}$ in the Hilbert space of zero-mean, finite-variance random variables with inner product $\langle U, V \rangle = \mathbb{E}[U V^*]$.

Theorem: Orthogonality Principle (Wiener-Hopf Equations)

Let $\hat{X}_n = \sum_{k \in \mathcal{K}} h[k]\, Y_{n-k}$ be a linear estimator of $X_n$ from $\{Y_m\}$. Then $\hat{X}_n$ is the MMSE estimator over the class of filters with support $\mathcal{K}$ if and only if the error $E_n = X_n - \hat{X}_n$ is orthogonal to every observation used in the estimate:

$$\mathbb{E}\big[E_n\, Y_{n-\ell}^*\big] = 0 \quad \text{for every } \ell \in \mathcal{K}.$$

Equivalently, the filter coefficients $\{h[k]\}$ satisfy the Wiener-Hopf normal equations:

$$\sum_{k \in \mathcal{K}} h[k]\, r_{yy}[\ell - k] = r_{xy}[\ell], \qquad \ell \in \mathcal{K}.$$

Think of $\hat{X}_n$ as the orthogonal projection of $X_n$ onto the subspace spanned by the observations $\{Y_{n-k} : k \in \mathcal{K}\}$. The projection is the point in the subspace closest to $X_n$, and the residual (the error) is orthogonal to the subspace. This is the same picture as the least-squares normal equation $\mathbf{A}^H \mathbf{A} \mathbf{h} = \mathbf{A}^H \mathbf{x}$ in linear algebra, translated to the infinite-dimensional setting of random processes.
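
The finite-dimensional analogy can be checked in a few lines; a minimal sketch (the matrix $\mathbf{A}$ and target $\mathbf{x}$ below are arbitrary illustrative data): the least-squares residual is orthogonal to every column of $\mathbf{A}$, exactly as the Wiener error is orthogonal to every observation used in the estimate.

```python
import numpy as np

# Finite-dimensional orthogonality principle: the least-squares residual
# x - A h is orthogonal to every column of A (illustrative random data).
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 3))           # 100 "observations", 3 "taps"
x = rng.standard_normal(100)                # target vector
h, *_ = np.linalg.lstsq(A, x, rcond=None)   # solves A^T A h = A^T x
residual = x - A @ h
print(A.T @ residual)                       # ~ [0, 0, 0] to machine precision
```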

The Wiener-Hopf Equation as a Toeplitz System

When $\mathcal{K} = \{0, 1, \ldots, N-1\}$ (a length-$N$ FIR Wiener filter), the Wiener-Hopf equations become a finite linear system $\mathbf{R}_{yy}\, \mathbf{h} = \mathbf{r}_{xy}$, where $\mathbf{R}_{yy}$ is a Hermitian Toeplitz matrix with entries $r_{yy}[\ell - k]$. Solving this system in the finite case is a standard linear algebra exercise. The point is that as $N \to \infty$, Toeplitz matrices are asymptotically diagonalized by the Fourier basis (this is Szegő's theorem), and the infinite Wiener-Hopf equation admits a closed-form frequency-domain solution. The whole architecture of Section 9.2 rests on this passage from finite Toeplitz systems to their frequency-domain limit.
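
In code, the Toeplitz structure is worth exploiting: the Levinson-style recursion behind `scipy.linalg.solve_toeplitz` runs in $O(N^2)$ rather than the $O(N^3)$ of a generic dense solve. A minimal sketch (the function name `fir_wiener` is ours, not a standard API):

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def fir_wiener(r_yy, r_xy):
    """Solve the N-tap Wiener-Hopf system R_yy h = r_xy.

    r_yy : observation autocorrelation r_yy[0..N-1], the first column of R_yy
    r_xy : cross-correlation r_xy[0..N-1], the right-hand side
    """
    return solve_toeplitz(np.asarray(r_yy), np.asarray(r_xy))
```

Passing just the first column is enough: SciPy takes the first row to be the conjugate of the first column, which is exactly the Hermitian Toeplitz structure of $\mathbf{R}_{yy}$.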

Theorem: MMSE of the Wiener Estimator

If $\hat{X}_n = \sum_{k} h[k]\, Y_{n-k}$ satisfies the Wiener-Hopf equations, then the MMSE is

$$\sigma^2 = \mathbb{E}[|X_n|^2] - \sum_{k} h[k]\, r_{xy}^*[k] = r_{xx}[0] - \sum_{k} h[k]\, r_{xy}^*[k].$$

The MMSE is the variance of $X_n$ minus the variance of the estimate. The cross term $\sum_k h[k]\, r_{xy}^*[k]$ is the inner product of the filter with the cross-correlation, a measure of how much information the observations carry about the signal.
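
A two-line proof sketch, using only the orthogonality principle: since $\hat{X}_n$ is a linear combination of the observations, $\mathbb{E}[E_n \hat{X}_n^*] = 0$, so

$$\sigma^2 = \mathbb{E}[E_n E_n^*] = \mathbb{E}\big[E_n (X_n - \hat{X}_n)^*\big] = \mathbb{E}[E_n X_n^*] = r_{xx}[0] - \sum_k h[k]\, \mathbb{E}[Y_{n-k} X_n^*],$$

and $\mathbb{E}[Y_{n-k} X_n^*] = r_{xy}^*[k]$ by the definition of the cross-correlation. The same computation gives $\mathbb{E}[|\hat{X}_n|^2] = \sum_k h[k]\, r_{xy}^*[k]$, which justifies reading the cross term as the variance of the estimate.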

Example: A 2-Tap FIR Wiener Filter

Let $X_n$ be a zero-mean WSS process with $r_{xx}[0] = 1$, $r_{xx}[1] = 0.5$, and let $Y_n = X_n + Z_n$ where $Z_n$ is white noise with variance $0.25$, independent of $X$. Design the optimal 2-tap causal FIR Wiener filter $\hat{X}_n = h[0] Y_n + h[1] Y_{n-1}$.
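
Carrying the computation through (a worked sketch): independence of $X$ and $Z$ gives $r_{yy}[k] = r_{xx}[k] + 0.25\,\delta[k]$ and $r_{xy}[k] = r_{xx}[k]$, so the normal equations read

$$\begin{pmatrix} 1.25 & 0.5 \\ 0.5 & 1.25 \end{pmatrix} \begin{pmatrix} h[0] \\ h[1] \end{pmatrix} = \begin{pmatrix} 1 \\ 0.5 \end{pmatrix},$$

with solution $h = (16/21,\ 2/21) \approx (0.762,\ 0.095)$ and MMSE $\sigma^2 = 1 - 17/21 = 4/21 \approx 0.190$. The same numbers drop out of a few lines of NumPy:

```python
import numpy as np

# 2-tap FIR Wiener filter for Y_n = X_n + Z_n with the statistics above:
# r_yy[0] = 1 + 0.25, r_yy[1] = 0.5 (independent noise adds only at lag 0),
# r_xy[k] = r_xx[k]  (noise uncorrelated with the signal).
R_yy = np.array([[1.25, 0.50],
                 [0.50, 1.25]])
r_xy = np.array([1.0, 0.5])

h = np.linalg.solve(R_yy, r_xy)    # Wiener-Hopf normal equations
mmse = 1.0 - h @ r_xy              # r_xx[0] - sum_k h[k] r_xy[k] (real case)

print(h)       # [0.76190476 0.0952381 ]  = [16/21, 2/21]
print(mmse)    # 0.19047619...            = 4/21
```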

Common Mistake: Orthogonality Is to the Observations, Not to the Signal

Mistake:

Students sometimes write $\mathbb{E}[E_n X_n^*] = 0$, confusing the orthogonality principle with a statement about the signal.

Correction:

The error $E_n$ is orthogonal to every observation $Y_{n-\ell}$ used in the estimate, never to the target $X_n$. In fact $\mathbb{E}[E_n X_n^*] = \sigma^2 \neq 0$ in general; this is precisely the MMSE (see the proof of the theorem MMSE of the Wiener Estimator).

Wiener-Hopf Equation

The linear system $\sum_k h[k]\, r_{yy}[\ell-k] = r_{xy}[\ell]$, $\ell \in \mathcal{K}$, whose solution is the MMSE Wiener filter. When $\mathcal{K} = \mathbb{Z}$ it becomes a convolution equation solvable in the frequency domain; when $\mathcal{K} = \mathbb{Z}_{\geq 0}$ it requires spectral factorization.

Related: Orthogonality Principle, Innovations