The Wiener-Hopf Equation and the Orthogonality Principle

Why the Wiener Filter Still Matters

The Wiener filter was designed in the early 1940s as a solution to the anti-aircraft fire-control problem: given a noisy radar track of an incoming aircraft, produce the best linear estimate of its future position so that the shell and the aircraft arrive at the same point at the same time. Eighty years later, every channel equalizer in a modern wireless modem, every beamformer in a microphone array, every noise-cancelling headphone, and the steady-state limit of every Kalman filter are, at their core, Wiener filters. The filter is a working engineer's most faithful companion.

The operational question is always the same. We observe a signal $Y_n$ that carries information about a desired quantity $X_n$, corrupted by noise and by the channel. We want to produce the best linear estimate of $X_n$ from the observation sequence. "Best" means minimum mean-square error (MMSE), and "linear" means we restrict ourselves to convolution with a filter. Two questions then arise: what filter coefficients should we use, and do we allow the filter to look into the future of the observation (non-causal) or only into its past (causal)?

This chapter answers both questions. The answers are classical, but the geometry behind them (orthogonality of the error to the observations, spectral factorization as a Cholesky decomposition in the frequency domain, innovations as a whitened version of the observation) is what you will carry into every later chapter of this book.

Definition: Jointly WSS Processes

Two zero-mean discrete-time random processes $\{X_n\}_{n \in \mathbb{Z}}$ and $\{Y_n\}_{n \in \mathbb{Z}}$ are jointly wide-sense stationary (jointly WSS) if (i) each is individually WSS, with autocorrelations $r_{xx}[k] = \mathbb{E}[X_{n+k} X_n^*]$ and $r_{yy}[k] = \mathbb{E}[Y_{n+k} Y_n^*]$ depending only on the lag $k$, and (ii) the cross-correlation $r_{xy}[k] = \mathbb{E}[X_{n+k} Y_n^*]$ depends only on $k$ and not on $n$.

The power spectral densities and cross-PSD are the discrete-time Fourier transforms of the correlation sequences:

$$P_{xx}(f) = \sum_{k \in \mathbb{Z}} r_{xx}[k]\, e^{-j 2\pi f k}, \qquad P_{yy}(f) = \sum_{k \in \mathbb{Z}} r_{yy}[k]\, e^{-j 2\pi f k}, \qquad P_{xy}(f) = \sum_{k \in \mathbb{Z}} r_{xy}[k]\, e^{-j 2\pi f k},$$

for $f \in [-1/2, 1/2]$.

By the Wiener-Khinchin theorem, $P_{xx}(f)$ and $P_{yy}(f)$ are real and non-negative. The cross-PSD $P_{xy}(f)$ is in general complex; it satisfies the Hermitian relation $P_{xy}(f) = P_{yx}^*(f)$.
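
As a quick numerical sanity check (a sketch, not part of the formal development): for the autocorrelation $r_{xx}[k] = a^{|k|}$ of a unit-variance AR(1)-type process, the truncated DTFT sum matches the closed-form PSD $(1-a^2)/(1 - 2a\cos 2\pi f + a^2)$ and comes out real and strictly positive, as the theorem requires. The value $a = 0.8$ and the truncation length are illustrative assumptions.

```python
import numpy as np

# Numerical check of the DTFT definition of the PSD for r_xx[k] = a^|k|.
a = 0.8
f = np.linspace(-0.5, 0.5, 1001)
K = 200                                   # truncation length of the lag sum
k = np.arange(-K, K + 1)
r = a ** np.abs(k)                        # autocorrelation sequence r_xx[k]
P_sum = (r[:, None] * np.exp(-2j * np.pi * np.outer(k, f))).sum(axis=0)
P_closed = (1 - a**2) / (1 - 2*a*np.cos(2*np.pi*f) + a**2)

assert np.allclose(P_sum.imag, 0, atol=1e-8)     # PSD is real
assert (P_sum.real > 0).all()                    # ... and non-negative
assert np.allclose(P_sum.real, P_closed, atol=1e-6)
```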

Definition: The Wiener Filtering Problem

Let $\{X_n\}, \{Y_n\}$ be jointly WSS with known second-order statistics $r_{xx}, r_{yy}, r_{xy}$. Produce a linear estimate $\hat{X}_n = \sum_{k \in \mathcal{K}} h[k]\, Y_{n-k}$ that minimizes the mean-square error $\sigma_h^2 = \mathbb{E}[|X_n - \hat{X}_n|^2]$ over all choices of the filter coefficients $\{h[k]\}_{k \in \mathcal{K}}$. Three canonical choices for the index set $\mathcal{K}$ are:

| $\mathcal{K}$ | Name | Uses |
|---|---|---|
| $\mathbb{Z}$ (all lags) | Non-causal Wiener filter | Smoothing, off-line processing |
| $\mathbb{Z}_{\geq 0}$ (non-negative lags) | Causal Wiener filter | Real-time filtering |
| $\mathbb{Z}_{\geq 1}$ (positive lags) | Wiener predictor | Forecasting the future |

The three problems share a single geometric structure: project $X_n$ onto the closed linear span of $\{Y_m : n - m \in \mathcal{K}\}$ in the Hilbert space of zero-mean, finite-variance random variables with inner product $\langle U, V \rangle = \mathbb{E}[U V^*]$.

Theorem: Orthogonality Principle (Wiener-Hopf Equations)

Let $\hat{X}_n = \sum_{k \in \mathcal{K}} h[k]\, Y_{n-k}$ be a linear estimator of $X_n$ from $\{Y_m\}$. Then $\hat{X}_n$ is the MMSE estimator over the class of filters with support $\mathcal{K}$ if and only if the error $E_n = X_n - \hat{X}_n$ is orthogonal to every observation used in the estimate:

$$\mathbb{E}\big[E_n\, Y_{n-\ell}^*\big] = 0 \quad \text{for every } \ell \in \mathcal{K}.$$

Equivalently, the filter coefficients $\{h[k]\}$ satisfy the Wiener-Hopf normal equations:

$$\sum_{k \in \mathcal{K}} h[k]\, r_{yy}[\ell - k] = r_{xy}[\ell], \qquad \ell \in \mathcal{K}.$$

Think of $\hat{X}_n$ as the orthogonal projection of $X_n$ onto the subspace spanned by the observations $\{Y_{n-k} : k \in \mathcal{K}\}$. The projection is the point in the subspace closest to $X_n$, and the residual (the error) is orthogonal to the subspace. This is the same picture as the least-squares normal equation $\mathbf{A}^H \mathbf{A} \mathbf{h} = \mathbf{A}^H \mathbf{x}$ in linear algebra, translated to the infinite-dimensional setting of random processes.
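
The finite-dimensional analogy can be checked in a few lines; a minimal sketch (the matrix $\mathbf{A}$ and target $\mathbf{x}$ below are arbitrary illustrative data): the least-squares residual is orthogonal to every column of $\mathbf{A}$, exactly as the Wiener error is orthogonal to every observation used in the estimate.

```python
import numpy as np

# Finite-dimensional orthogonality principle: the least-squares residual
# x - A h is orthogonal to every column of A (illustrative random data).
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 3))           # 100 "observations", 3 "taps"
x = rng.standard_normal(100)                # target vector
h, *_ = np.linalg.lstsq(A, x, rcond=None)   # solves A^T A h = A^T x
residual = x - A @ h
print(A.T @ residual)                       # ~ [0, 0, 0] to machine precision
```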

The Wiener-Hopf Equation as a Toeplitz System

When $\mathcal{K} = \{0, 1, \ldots, N-1\}$ (a length-$N$ FIR Wiener filter), the Wiener-Hopf equations become a finite linear system $\mathbf{R}_{yy}\, \mathbf{h} = \mathbf{r}_{xy}$, where $\mathbf{R}_{yy}$ is a Hermitian Toeplitz matrix with entries $r_{yy}[\ell - k]$. Solving this system in the finite case is a standard linear algebra exercise. The point is that as $N \to \infty$, Toeplitz matrices are asymptotically diagonalized by the Fourier basis (this is Szegő's theorem), and the infinite Wiener-Hopf equation admits a closed-form frequency-domain solution. The whole architecture of Section 9.2 rests on this passage from finite Toeplitz systems to their frequency-domain limit.
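
In code, the Toeplitz structure is worth exploiting: the Levinson-style recursion behind `scipy.linalg.solve_toeplitz` runs in $O(N^2)$ rather than the $O(N^3)$ of a generic dense solve. A minimal sketch (the function name `fir_wiener` is ours, not a standard API):

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def fir_wiener(r_yy, r_xy):
    """Solve the N-tap Wiener-Hopf system R_yy h = r_xy.

    r_yy : observation autocorrelation r_yy[0..N-1], the first column of R_yy
    r_xy : cross-correlation r_xy[0..N-1], the right-hand side
    """
    return solve_toeplitz(np.asarray(r_yy), np.asarray(r_xy))
```

Passing just the first column is enough: SciPy takes the first row to be the conjugate of the first column, which is exactly the Hermitian Toeplitz structure of $\mathbf{R}_{yy}$.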

Theorem: MMSE of the Wiener Estimator

If $\hat{X}_n = \sum_{k} h[k]\, Y_{n-k}$ satisfies the Wiener-Hopf equations, then the MMSE is

$$\sigma^2 = \mathbb{E}[|X_n|^2] - \sum_{k} h[k]\, r_{xy}^*[k] = r_{xx}[0] - \sum_{k} h[k]\, r_{xy}^*[k].$$

The MMSE is the variance of $X_n$ minus the variance of the estimate. The cross term $\sum_k h[k]\, r_{xy}^*[k]$ is the inner product of the filter with the cross-correlation, a measure of how much information the observations carry about the signal.
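
A two-line proof sketch, using only the orthogonality principle: since $\hat{X}_n$ is a linear combination of the observations, $\mathbb{E}[E_n \hat{X}_n^*] = 0$, so

$$\sigma^2 = \mathbb{E}[E_n E_n^*] = \mathbb{E}\big[E_n (X_n - \hat{X}_n)^*\big] = \mathbb{E}[E_n X_n^*] = r_{xx}[0] - \sum_k h[k]\, \mathbb{E}[Y_{n-k} X_n^*],$$

and $\mathbb{E}[Y_{n-k} X_n^*] = r_{xy}^*[k]$ by the definition of the cross-correlation. The same computation gives $\mathbb{E}[|\hat{X}_n|^2] = \sum_k h[k]\, r_{xy}^*[k]$, which justifies reading the cross term as the variance of the estimate.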

Example: A 2-Tap FIR Wiener Filter

Let $X_n$ be a zero-mean WSS process with $r_{xx}[0] = 1$, $r_{xx}[1] = 0.5$, and let $Y_n = X_n + Z_n$ where $Z_n$ is white noise with variance $0.25$, independent of $X$. Design the optimal 2-tap causal FIR Wiener filter $\hat{X}_n = h[0] Y_n + h[1] Y_{n-1}$.
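
Carrying the computation through (a worked sketch): independence of $X$ and $Z$ gives $r_{yy}[k] = r_{xx}[k] + 0.25\,\delta[k]$ and $r_{xy}[k] = r_{xx}[k]$, so the normal equations read

$$\begin{pmatrix} 1.25 & 0.5 \\ 0.5 & 1.25 \end{pmatrix} \begin{pmatrix} h[0] \\ h[1] \end{pmatrix} = \begin{pmatrix} 1 \\ 0.5 \end{pmatrix},$$

with solution $h = (16/21,\ 2/21) \approx (0.762,\ 0.095)$ and MMSE $\sigma^2 = 1 - 17/21 = 4/21 \approx 0.190$. The same numbers drop out of a few lines of NumPy:

```python
import numpy as np

# 2-tap FIR Wiener filter for Y_n = X_n + Z_n with the statistics above:
# r_yy[0] = 1 + 0.25, r_yy[1] = 0.5 (independent noise adds only at lag 0),
# r_xy[k] = r_xx[k]  (noise uncorrelated with the signal).
R_yy = np.array([[1.25, 0.50],
                 [0.50, 1.25]])
r_xy = np.array([1.0, 0.5])

h = np.linalg.solve(R_yy, r_xy)    # Wiener-Hopf normal equations
mmse = 1.0 - h @ r_xy              # r_xx[0] - sum_k h[k] r_xy[k] (real case)

print(h)       # [0.76190476 0.0952381 ]  = [16/21, 2/21]
print(mmse)    # 0.19047619...            = 4/21
```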

Common Mistake: Orthogonality Is to the Observations, Not to the Signal

Mistake:

Students sometimes write $\mathbb{E}[E_n X_n^*] = 0$, confusing the orthogonality principle with a statement about the signal.

Correction:

The error $E_n$ is orthogonal to every observation $Y_{n-\ell}$ used in the estimate, never to the target $X_n$. In fact $\mathbb{E}[E_n X_n^*] = \sigma^2 \neq 0$ in general; this is precisely the MMSE (see the proof of the theorem MMSE of the Wiener Estimator).

Wiener-Hopf Equation

The linear system $\sum_k h[k]\, r_{yy}[\ell-k] = r_{xy}[\ell]$, $\ell \in \mathcal{K}$, whose solution is the MMSE Wiener filter. When $\mathcal{K} = \mathbb{Z}$ it becomes a convolution equation solvable in the frequency domain; when $\mathcal{K} = \mathbb{Z}_{\geq 0}$ it requires spectral factorization.

Related: Orthogonality Principle, Innovations