Linear Prediction and Kolmogorov-Szego
Predicting the Future From the Past
Prediction is the special case of the causal Wiener problem in which the target is the observation itself, shifted forward: $X_n = Y_{n+d}$ for some prediction horizon $d \geq 1$. The one-step predictor ($d = 1$) is foundational: it is the engine inside the innovations representation, and its MSE is a fundamental invariant of the observation process. That invariant has a closed-form expression due to Kolmogorov and Szego, and it is one of the most beautiful formulas in linear estimation theory.
Definition: One-Step Linear Predictor
Let $\{Y_n\}$ be zero-mean WSS with PSD $S_Y(e^{j\omega})$ satisfying the Paley-Wiener condition $\frac{1}{2\pi}\int_{-\pi}^{\pi} |\log S_Y(e^{j\omega})|\,d\omega < \infty$. The one-step linear predictor is the MMSE estimator $\hat{Y}_n$ of $Y_n$ based on the strict past $\{Y_k : k < n\}$:
$$\hat{Y}_n = \arg\min_{\tilde{Y} \in \overline{\mathrm{span}}\{Y_k,\, k < n\}} \mathbb{E}\big[(Y_n - \tilde{Y})^2\big], \qquad \sigma_\infty^2 = \mathbb{E}\big[(Y_n - \hat{Y}_n)^2\big].$$
The prediction error $e_n = Y_n - \hat{Y}_n$ is exactly the innovations of Section 9.3, up to a scalar normalization.
Theorem: Kolmogorov-Szego Formula
Under the Paley-Wiener condition, the one-step prediction MMSE is
$$\sigma_\infty^2 = \exp\left(\frac{1}{2\pi}\int_{-\pi}^{\pi} \log S_Y(e^{j\omega})\,d\omega\right).$$
This is the geometric mean of the PSD over one period.
The formula is striking: an integral over the whole spectrum collapses to a single number via the logarithm. It says the unpredictability of a process is captured by a geometric (not arithmetic) average of its spectral content. If $S_Y$ is flat (white noise), the geometric mean equals the arithmetic mean, which equals the variance, and prediction is impossible: each sample is truly fresh. If $S_Y$ is concentrated on a narrow band, the geometric mean is much smaller than the arithmetic mean, meaning most of the signal's power is predictable and only a small residual is unpredictable.
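To make the geometric-versus-arithmetic contrast concrete, here is a minimal numerical sketch (NumPy assumed; the grid size and the pole $a = 0.95$ are illustrative choices of mine, not values from the text) comparing the two means for a flat PSD and for a narrowband AR(1)-shaped PSD:

```python
import numpy as np

# Uniform grid over one period: sample means approximate (1/2pi) * integral.
w = np.linspace(-np.pi, np.pi, 200001)

def means(S):
    arith = S.mean()                 # arithmetic mean = r_Y(0), the total variance
    geo = np.exp(np.log(S).mean())   # geometric mean = Kolmogorov-Szego MMSE
    return arith, geo

S_flat = np.ones_like(w)                             # white noise, unit variance
a = 0.95                                             # pole close to the unit circle
S_narrow = 1.0 / np.abs(1 - a * np.exp(-1j * w))**2  # AR(1) PSD: power piles up near w = 0

print(means(S_flat))    # ~ (1.00, 1.00): geometric = arithmetic, nothing predictable
print(means(S_narrow))  # ~ (10.26, 1.00): variance 1/(1 - a**2), yet MMSE stays at 1
```

The flat spectrum leaves no gap between the two means; the peaky spectrum has a variance roughly ten times its prediction MMSE, so most of its power is predictable.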
Write the predictor in the innovations basis
By the innovations representation, $Y_n = \sum_{k=0}^{\infty} g_k \varepsilon_{n-k}$, where the sequence $\{g_k\}$ is the inverse DTFT of the minimum-phase spectral factor $G(e^{j\omega})$ and $\{\varepsilon_n\}$ is unit-variance white noise. The strict past of $Y$ equals the strict past of $\varepsilon$, so the best predictor of $Y_n$ from $\{Y_k, k < n\}$ equals the best predictor from $\{\varepsilon_k, k < n\}$.
Predict $Y_n$ from past innovations
Write $Y_n = g_0 \varepsilon_n + \sum_{k=1}^{\infty} g_k \varepsilon_{n-k}$. Since $\{\varepsilon_n\}$ is white, the MMSE predictor is $\hat{Y}_n = \sum_{k=1}^{\infty} g_k \varepsilon_{n-k}$, and the residual is $e_n = g_0 \varepsilon_n$.
Compute the prediction variance
Since $\varepsilon_n$ has unit variance, $\sigma_\infty^2 = \mathbb{E}[e_n^2] = g_0^2$. But $g_0$ is the value of the impulse response of $G$ at zero lag, and
$$g_0^2 = \exp\left(\frac{1}{2\pi}\int_{-\pi}^{\pi} \log |G(e^{j\omega})|^2\,d\omega\right) = \exp\left(\frac{1}{2\pi}\int_{-\pi}^{\pi} \log S_Y(e^{j\omega})\,d\omega\right).$$
The first equality follows from applying Jensen's formula to the minimum-phase factor $G$: for an outer function (in the Hardy-space sense), $\log|g_0| = \frac{1}{2\pi}\int_{-\pi}^{\pi} \log|G(e^{j\omega})|\,d\omega$. The second equality uses $S_Y(e^{j\omega}) = |G(e^{j\omega})|^2$.
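As a hedged numerical check of this step (the FFT grid size and all variable names are mine), sample the minimum-phase factor of an AR(1) spectrum, recover the coefficients $g_k$ by inverse FFT, and compare $g_0^2$ with the geometric mean of $S_Y = |G|^2$:

```python
import numpy as np

a, sigma, N = 0.8, 0.5, 4096
w = 2 * np.pi * np.arange(N) / N
G = sigma / (1 - a * np.exp(-1j * w))   # minimum-phase factor of the AR(1) PSD
g = np.fft.ifft(G).real                 # causal coefficients: g_k ~ sigma * a**k

geo = np.exp(np.log(np.abs(G)**2).mean())   # exp((1/2pi) * int log S_Y dw)
print(g[0]**2, geo)                         # both ~ sigma**2 = 0.25
```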
Example: Kolmogorov-Szego for AR(1)
Let $Y_n = a Y_{n-1} + \varepsilon_n$, with $\{\varepsilon_n\}$ white with variance $\sigma^2$ and $|a| < 1$. Verify the Kolmogorov-Szego formula by computing both sides.
Compute the PSD
$$S_Y(e^{j\omega}) = \frac{\sigma^2}{|1 - a e^{-j\omega}|^2}.$$
Compute the log-integral
$$\frac{1}{2\pi}\int_{-\pi}^{\pi} \log S_Y(e^{j\omega})\,d\omega = \log \sigma^2 - \frac{1}{2\pi}\int_{-\pi}^{\pi} \log|1 - a e^{-j\omega}|^2\,d\omega.$$
The second integral is classical: $\frac{1}{2\pi}\int_{-\pi}^{\pi} \log|1 - a e^{-j\omega}|^2\,d\omega = 0$ (Jensen's formula applied to $1 - a z$, whose single zero $z = 1/a$ lies outside the unit disk for $|a| < 1$). So $\frac{1}{2\pi}\int_{-\pi}^{\pi} \log S_Y(e^{j\omega})\,d\omega = \log \sigma^2$.
Exponentiate
$\sigma_\infty^2 = \exp(\log \sigma^2) = \sigma^2$. This matches first principles: the one-step prediction error of an AR(1) driven by innovation $\varepsilon_n$ is exactly $e_n = \varepsilon_n$ (the predictor is $\hat{Y}_n = a Y_{n-1}$), with variance $\sigma^2$. The formula holds.
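A quick Monte-Carlo corroboration, offered as a sketch (sample size, seed, and parameter values are arbitrary choices of mine): simulate the AR(1), apply the optimal predictor $\hat{Y}_n = a Y_{n-1}$, and compare the empirical error variance with $\sigma^2$:

```python
import numpy as np

rng = np.random.default_rng(0)
a, sigma, N = 0.9, 1.0, 200_000
eps = sigma * rng.standard_normal(N)

y = np.zeros(N)
for n in range(1, N):          # Y_n = a * Y_{n-1} + eps_n
    y[n] = a * y[n - 1] + eps[n]

resid = y[1:] - a * y[:-1]     # one-step prediction error e_n
print(resid.var(), sigma**2)   # empirical ~ 1.00 vs the K-S value 1.0
```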
Theorem: $d$-Step Prediction MMSE
The MMSE of the $d$-step predictor of $Y_{n+d}$ using $\{Y_k, k \leq n\}$ is
$$\sigma_d^2 = \sum_{k=0}^{d-1} |g_k|^2,$$
where $\{g_k\}$ are the inverse DTFT coefficients of the minimum-phase factor $G(e^{j\omega})$. In particular, $d = 1$ recovers the Kolmogorov-Szego formula: $\sigma_1^2 = g_0^2 = \sigma_\infty^2$.
The $d$-step prediction error is the sum of the first $d$ innovations that will occur after time $n$, weighted by the MA coefficients of the minimum-phase representation. As $d \to \infty$, $\sigma_d^2 \to \sum_{k=0}^{\infty} |g_k|^2 = r_Y(0)$ (the signal variance): predicting the far future becomes the same as estimating the marginal mean, which for zero-mean processes leaves an MSE equal to the variance.
Express $Y_{n+d}$ in innovations
$$Y_{n+d} = \underbrace{\sum_{k=0}^{d-1} g_k \varepsilon_{n+d-k}}_{\text{future innovations}} + \underbrace{\sum_{k=d}^{\infty} g_k \varepsilon_{n+d-k}}_{\text{innovations up to time } n}.$$
Project onto causal innovations
The second sum involves only $\{\varepsilon_k, k \leq n\}$ and is the MMSE predictor $\hat{Y}_{n+d|n}$; the first is the prediction error. Its variance is $\sum_{k=0}^{d-1} |g_k|^2$ because $\{\varepsilon_n\}$ is white with unit variance.
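The climb from $\sigma_1^2$ up to the full variance is easy to see numerically; here is a short sketch for the AR(1) case (my notation, using $g_k = \sigma a^k$ from the example above):

```python
import numpy as np

a, sigma = 0.9, 1.0
g = sigma * a ** np.arange(200)          # MA coefficients of G(z) = sigma / (1 - a z^{-1})
mmse = np.cumsum(g**2)                   # sigma_d^2 = sum_{k=0}^{d-1} g_k**2

print(mmse[0])                           # d = 1: sigma**2 = 1.0 (Kolmogorov-Szego)
print(mmse[9])                           # d = 10: ~4.62, most predictability is gone
print(mmse[-1], sigma**2 / (1 - a**2))   # d large: approaches r_Y(0) ~ 5.2632
```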
Prediction Is a Special Case of Causal Wiener Filtering
In the causal Wiener framework, the $d$-step prediction problem corresponds to taking $X_n = Y_{n+d}$. The cross-PSD is $S_{XY}(e^{j\omega}) = e^{j\omega d} S_Y(e^{j\omega})$ (a frequency-shifted version of the PSD). Substituting into the causal Wiener formula of the Causal Wiener Filter theorem gives
$$H(e^{j\omega}) = \frac{1}{G(e^{j\omega})}\left[e^{j\omega d}\, G(e^{j\omega})\right]_+,$$
which is the formula on page 126 of Kailath's Linear Estimation. So every result in this section is a corollary of Section 9.4, a comforting sanity check that the machinery is consistent.
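To see the specialization in action, here is a hedged FFT sketch (the grid, the blunt causal-part truncation, and all names are my own): compute $[e^{j\omega d} G]_+ / G$ numerically for the AR(1) factor and observe that the $d$-step predictor collapses to the constant $a^d$, i.e. $\hat{Y}_{n+d|n} = a^d Y_n$:

```python
import numpy as np

a, sigma, N, d = 0.9, 1.0, 4096, 3
w = 2 * np.pi * np.arange(N) / N
G = sigma / (1 - a * np.exp(-1j * w))        # minimum-phase factor

shifted = np.fft.ifft(np.exp(1j * w * d) * G)   # coefficients of z^d G(z)
shifted[N // 2:] = 0                            # crude [.]_+: drop the anticausal half
H = np.fft.fft(shifted) / G                     # H = [z^d G]_+ / G

print(np.allclose(H, a**d))                     # True: the d-step predictor is a^d
```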
Why This Matters: Channel Prediction in Wireless Systems
Wireless channel coefficients are themselves WSS processes (approximately, over the coherence time), typically modeled as Rayleigh or Ricean fading with a Jakes-like Doppler spectrum. Predicting the channel one or more symbol periods ahead is essential for closed-loop beamforming, adaptive modulation, and link adaptation in 5G/6G systems. The Kolmogorov-Szego formula gives the fundamental limit on how well the channel can be predicted: for a heavily Doppler-spread channel, the geometric mean of the PSD is close to the arithmetic mean and prediction is difficult; for a narrowband Doppler channel, the geometric mean is much smaller and several steps of prediction are feasible. This prediction gap drives the choice of feedback rate in multiuser MIMO systems.
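As an illustrative sketch of that trade-off (the spectral shape, noise floor, and Doppler widths below are assumptions of mine, not system values from the text), compare the predictability gap, geometric mean over arithmetic mean, for a Jakes-like PSD at two Doppler widths:

```python
import numpy as np

w = np.linspace(-np.pi, np.pi, 400001)
floor = 1e-4                   # small noise floor keeps log S integrable (Paley-Wiener)

def gap(wd):
    """sigma_inf^2 / r_Y(0) for a Jakes-like PSD of normalized Doppler width wd."""
    jakes = np.where(np.abs(w) < wd,
                     1.0 / np.sqrt(np.maximum(1.0 - (w / wd) ** 2, 1e-6)),
                     0.0)
    S = jakes + floor
    return np.exp(np.log(S).mean()) / S.mean()

print(gap(0.3))   # narrow Doppler: gap ~ 2e-3, the channel is highly predictable
print(gap(3.0))   # heavy Doppler spread: gap ~ 0.6, little room for prediction
```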
Historical Note: Szego's Formula Pre-Dates the Filter
The formula appeared in pure mathematics long before it entered signal processing. Gabor Szego (1895-1985) published it in 1920 as a limit theorem for determinants of Toeplitz matrices: as $n \to \infty$,
$$\frac{\det T_n(S)}{\det T_{n-1}(S)} \to \exp\left(\frac{1}{2\pi}\int_{-\pi}^{\pi} \log S(e^{j\omega})\,d\omega\right),$$
where $T_n(S)$ is the $n \times n$ Toeplitz matrix with symbol $S$. Kolmogorov recognized in 1941 that this same expression is the asymptotic one-step prediction variance. The bridge is Szego's theorem on the asymptotic distribution of Toeplitz eigenvalues, which says the eigenvalues of $T_n(S)$ are asymptotically distributed according to the symbol $S$. The determinant, being the product of the eigenvalues, then satisfies $(\det T_n(S))^{1/n} \to \exp\left(\frac{1}{2\pi}\int_{-\pi}^{\pi} \log S(e^{j\omega})\,d\omega\right)$ in the limit.
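A short numerical illustration of the limit (the AR(1) covariance and the matrix sizes are my choices): the $n$-th root of $\det T_n$ creeps toward $\exp\big(\frac{1}{2\pi}\int \log S\,d\omega\big)$, which for this process is $\sigma^2 = 1$:

```python
import numpy as np

a, sigma = 0.9, 1.0
r = (sigma**2 / (1 - a**2)) * a ** np.arange(80)   # AR(1) autocovariance r_k

for n in (5, 20, 80):
    idx = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])
    T = r[idx]                                     # n x n Toeplitz matrix with symbol S_Y
    sign, logdet = np.linalg.slogdet(T)
    print(n, np.exp(logdet / n))                   # 1.394, 1.087, 1.021 -> sigma**2 = 1
```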
Common Mistake: K-S for White Noise Gives the Variance, Not Zero
Mistake:
Assuming that because white noise is unpredictable, the Kolmogorov-Szego formula returns zero.
Correction:
If $S_Y(e^{j\omega}) = \sigma^2$ is constant, then $\frac{1}{2\pi}\int_{-\pi}^{\pi} \log S_Y(e^{j\omega})\,d\omega = \log \sigma^2$, so $\sigma_\infty^2 = \sigma^2$: the full variance, not zero. The interpretation: white noise is maximally unpredictable, so its prediction MSE equals its total variance. This is consistent: the best you can do is predict zero, and the MSE is the variance.
Key Takeaway
The one-step prediction MMSE equals the geometric mean of the PSD: $\sigma_\infty^2 = \exp\left(\frac{1}{2\pi}\int_{-\pi}^{\pi} \log S_Y(e^{j\omega})\,d\omega\right)$. Predictability is captured by how far the geometric mean falls below the arithmetic mean (the variance). A process is predictable to exactly the extent that its spectrum is peaky.
Quick Check
For a WSS process with a non-constant PSD $S_Y(e^{j\omega})$ satisfying Paley-Wiener, what can you say about the one-step prediction MMSE $\sigma_\infty^2$ compared with the variance $r_Y(0)$?
$\sigma_\infty^2 < r_Y(0)$ strictly.
$\sigma_\infty^2 = r_Y(0)$ because the process is WSS.
$\sigma_\infty^2 > r_Y(0)$ because $S_Y$ is a smooth function.
By the AM-GM inequality, the geometric mean of $S_Y$ is strictly less than the arithmetic mean whenever $S_Y$ is non-constant. The arithmetic mean equals $r_Y(0)$ (integrating the PSD recovers the zero-lag autocorrelation). So $\sigma_\infty^2 < r_Y(0)$ strictly. The process is at least partly predictable.