The Orthogonality Principle
Estimation as Geometric Projection
The previous section proved by direct calculation that the MMSE estimator is the conditional mean. A more powerful viewpoint recasts estimation as a projection in an inner-product space of random variables. In this geometry, "distance" is root-mean-square error and "perpendicular" means "uncorrelated". The MMSE estimator becomes the orthogonal projection of $X$ onto the subspace of functions of the observation $Y$, and the optimality condition becomes the statement that the residual is perpendicular to that subspace: this is the orthogonality principle.
Definition: Inner Product of Random Variables
Let $L^2$ denote the space of (real or complex) random variables with finite second moment. For $U, V \in L^2$ define
$$\langle U, V \rangle = \mathbb{E}[U V^*].$$
With this inner product, $L^2$ is a Hilbert space; two random variables are orthogonal when $\langle U, V \rangle = 0$.
For zero-mean random variables, orthogonality coincides with uncorrelatedness. The squared norm $\|U\|^2 = \langle U, U \rangle = \mathbb{E}[|U|^2]$ of the estimation error is precisely the mean-square error, so the MMSE is a squared distance in this space.
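The inner product and norm can be estimated empirically by averaging over i.i.d. samples. A minimal numerical sketch (the sample size and distributions are illustrative assumptions, not part of the theory):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

def inner(u, v):
    """Empirical inner product <u, v> ~ E[u v] (real-valued case)."""
    return np.mean(u * v)

def norm(u):
    """Empirical L2 norm ||u|| = sqrt(<u, u>)."""
    return np.sqrt(inner(u, u))

U = rng.standard_normal(n)   # zero-mean, unit-variance samples
Z = rng.standard_normal(n)   # independent of U

print(inner(U, Z))   # near 0: independent zero-mean variables are orthogonal
print(norm(U))       # near 1: sqrt(E[U^2]) = 1
```

For zero-mean variables, `inner` is the empirical covariance, which is exactly the sense in which "perpendicular" means "uncorrelated" here.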
Theorem: Orthogonality Principle (Unrestricted MMSE)
Let $X, Y$ be jointly distributed with $X \in L^2$. Then the MMSE estimator $\hat{X} = \mathbb{E}[X \mid Y]$ is characterized by the following orthogonality condition: for every (measurable, $L^2$) function $g$,
$$\mathbb{E}\big[(X - \hat{X})\, g(Y)^*\big] = 0.$$
That is, the estimation error is uncorrelated with every function of the observation.
Think of the subspace $\mathcal{S} \subset L^2$ of all square-integrable functions of the observation $Y$. The MMSE estimator is the orthogonal projection of $X$ onto $\mathcal{S}$, so the residual $X - \hat{X}$ lies in the orthogonal complement $\mathcal{S}^{\perp}$.
Sufficiency via the tower property
Set $\hat{X} = \mathbb{E}[X \mid Y]$. For any $g$,
$$\mathbb{E}\big[(X - \hat{X})\, g(Y)^*\big] = \mathbb{E}\Big[\mathbb{E}\big[(X - \hat{X})\, g(Y)^* \mid Y\big]\Big].$$
Inside the conditional expectation, $g(Y)^*$ is a known constant, so the expression equals $\mathbb{E}\big[g(Y)^*\,(\mathbb{E}[X \mid Y] - \hat{X})\big] = 0$.
Necessity by a perturbation argument
Suppose $\hat{X} \in \mathcal{S}$ satisfies the orthogonality condition. For any other candidate estimator $\tilde{X} = h(Y)$, write
$$X - \tilde{X} = (X - \hat{X}) + (\hat{X} - \tilde{X})$$
with $\hat{X} - \tilde{X} \in \mathcal{S}$. Then
$$\mathbb{E}\big[|X - \tilde{X}|^2\big] = \mathbb{E}\big[|X - \hat{X}|^2\big] + \mathbb{E}\big[|\hat{X} - \tilde{X}|^2\big] + 2\,\mathrm{Re}\,\mathbb{E}\big[(X - \hat{X})(\hat{X} - \tilde{X})^*\big].$$
The cross term vanishes by hypothesis, leaving $\mathbb{E}[|X - \tilde{X}|^2] \ge \mathbb{E}[|X - \hat{X}|^2]$, with equality iff $\tilde{X} = \hat{X}$. So any estimator satisfying the orthogonality condition is MMSE-optimal, and the optimum is unique (almost surely).
Identify the projection
Combining sufficiency and necessity, the unique MMSE estimator is $\hat{X} = \mathbb{E}[X \mid Y]$.
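The perturbation step can be illustrated numerically. In a scalar Gaussian model $Y = X + N$ (the variances below are assumed purely for illustration), adding any perturbation $0.1\,g(Y)$ to $\hat{X} = \mathbb{E}[X \mid Y]$ can only increase the mean-square error, because the cross term is (empirically near) zero while the perturbation's own energy is strictly positive:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500_000
sx2, sn2 = 1.0, 0.5                 # prior/noise variances (assumed)

X = rng.normal(0.0, np.sqrt(sx2), n)
Y = X + rng.normal(0.0, np.sqrt(sn2), n)

hatX = (sx2 / (sx2 + sn2)) * Y      # E[X | Y] for jointly Gaussian (X, Y)
mse_opt = np.mean((X - hatX) ** 2)  # near sx2*sn2/(sx2+sn2) = 1/3 here

# Perturbing hatX by any function of Y strictly increases the MSE.
for g in (lambda y: y, lambda y: y ** 3, np.sin):
    tildeX = hatX + 0.1 * g(Y)
    assert np.mean((X - tildeX) ** 2) > mse_opt
print(mse_opt)
```

The three perturbations (linear, cubic, bounded nonlinear) all lose; no cleverness in choosing $g$ helps once orthogonality holds.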
Key Takeaway
The orthogonality principle is a characterization: an estimator is MMSE-optimal if and only if its error is uncorrelated with every function of the observation. This dispenses with guessing the optimal form: once you verify orthogonality, you have proved optimality.
Pythagoras for Estimation
Because the error $X - \hat{X}$ is orthogonal to the estimator $\hat{X}$ (both are in $L^2$, and $\hat{X} \in \mathcal{S}$), the Pythagorean identity gives
$$\mathbb{E}\big[|X|^2\big] = \mathbb{E}\big[|\hat{X}|^2\big] + \mathbb{E}\big[|X - \hat{X}|^2\big].$$
In words, the energy of the parameter decomposes into the energy captured by the estimator plus the residual MMSE. This is exactly the bias–variance decomposition specialized to Bayesian estimation with the mean removed.
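The Pythagorean identity is easy to check by simulation. A sketch in an assumed scalar Gaussian setting $Y = X + N$ (variances chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000
sx2, sn2 = 1.0, 0.5                 # assumed variances

X = rng.normal(0.0, np.sqrt(sx2), n)
Y = X + rng.normal(0.0, np.sqrt(sn2), n)
hatX = (sx2 / (sx2 + sn2)) * Y      # E[X | Y]

lhs = np.mean(X ** 2)                                # E[|X|^2]
rhs = np.mean(hatX ** 2) + np.mean((X - hatX) ** 2)  # E[|hatX|^2] + MMSE
print(lhs, rhs)   # the two sides agree up to Monte Carlo error
```

The agreement is not an artifact of Gaussianity: the identity holds whenever $\hat{X} = \mathbb{E}[X \mid Y]$, for any joint distribution with finite second moments.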
Example: Verifying Orthogonality
In the scalar Gaussian example (EMMSE for the Scalar Gaussian Model), where $Y = X + N$ with independent $X \sim \mathcal{N}(0, \sigma_X^2)$ and $N \sim \mathcal{N}(0, \sigma_N^2)$, verify the orthogonality condition directly.
Compute the error
With $\hat{X} = \dfrac{\sigma_X^2}{\sigma_X^2 + \sigma_N^2}\, Y$, the error is
$$e = X - \hat{X} = \frac{\sigma_N^2}{\sigma_X^2 + \sigma_N^2}\, X - \frac{\sigma_X^2}{\sigma_X^2 + \sigma_N^2}\, N.$$
Compute the cross-moment
$$\mathbb{E}[e\, Y] = \mathbb{E}\!\left[\left(\frac{\sigma_N^2}{\sigma_X^2 + \sigma_N^2}\, X - \frac{\sigma_X^2}{\sigma_X^2 + \sigma_N^2}\, N\right)(X + N)\right] = \frac{\sigma_N^2\, \sigma_X^2}{\sigma_X^2 + \sigma_N^2} - \frac{\sigma_X^2\, \sigma_N^2}{\sigma_X^2 + \sigma_N^2},$$
using independence of $X$ and $N$ and their zero means.
Check it vanishes
$$\mathbb{E}[e\, Y] = \frac{\sigma_N^2 \sigma_X^2 - \sigma_X^2 \sigma_N^2}{\sigma_X^2 + \sigma_N^2} = 0.$$
The error is uncorrelated with $Y$, confirming the orthogonality principle for this model. (Since $(X, Y)$ are jointly Gaussian, the zero-mean error is in fact independent of $Y$, hence uncorrelated with every function of $Y$, as the theorem requires.)
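A quick Monte Carlo check of this computation (the variance values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1_000_000
sx2, sn2 = 2.0, 1.0                 # illustrative variances

X = rng.normal(0.0, np.sqrt(sx2), n)
N = rng.normal(0.0, np.sqrt(sn2), n)
Y = X + N
e = X - (sx2 / (sx2 + sn2)) * Y     # MMSE residual

print(np.mean(e * Y))        # near 0: E[e Y] = 0, matching the calculation
print(np.mean(e * Y ** 2))   # near 0: orthogonal to nonlinear functions too
```

The second print checks the stronger conclusion available in the Gaussian case: the residual is orthogonal not just to $Y$ but to nonlinear functions of it.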
Orthogonality of the MMSE Residual
Monte Carlo samples of $(Y, X - \hat{X})$ where $\hat{X} = \mathbb{E}[X \mid Y]$. Compare the empirical correlation coefficient to zero. The cloud is uncorrelated even though $X - \hat{X}$ and $Y$ are not, in general, independent when $X$ is non-Gaussian.
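The demo can be sketched with a concrete non-Gaussian prior. Assuming $X$ uniform on $\{-1, +1\}$ and $Y = X + N$ with Gaussian noise of variance $s^2$ (an assumed setup, not the document's specific model), the conditional mean is the standard result $\mathbb{E}[X \mid Y] = \tanh(Y/s^2)$:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1_000_000
s2 = 0.8                                   # noise variance (assumed)

X = rng.choice([-1.0, 1.0], size=n)        # non-Gaussian prior on {-1, +1}
Y = X + rng.normal(0.0, np.sqrt(s2), n)
hatX = np.tanh(Y / s2)                     # E[X | Y] for this prior
e = X - hatX                               # MMSE residual

# Orthogonality: e is uncorrelated with Y (and with every function of Y).
print(np.corrcoef(e, Y)[0, 1])             # near 0

# But e is NOT independent of Y: the conditional variance
# E[e^2 | Y] = 1 - tanh(Y/s2)^2 depends on Y, so e^2 and Y^2 correlate.
print(np.corrcoef(e ** 2, Y ** 2)[0, 1])   # clearly negative
```

The contrast in the two printed correlations is the point of the figure: orthogonality kills all cross-moments $\mathbb{E}[e\,g(Y)]$, but dependence survives in higher moments such as $e^2$.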
Orthogonality Principle
The characterization of MMSE estimators: an estimator is MMSE-optimal if and only if the residual is uncorrelated with every (measurable, $L^2$) function of the observation.
Related: Minimum Mean-Square Error (MMSE) Estimator, Conditional Expectation