The Orthogonality Principle

Estimation as Geometric Projection

The previous section proved that the MMSE estimator is the conditional mean by a direct calculation. A more powerful viewpoint recasts estimation as a projection in an inner-product space of random variables. In this geometry, "distance" is root-mean-square error and "perpendicular" means "uncorrelated". The MMSE estimator becomes the orthogonal projection of $\boldsymbol\theta$ onto the subspace of functions of $\mathbf{Y}$, and the optimality condition becomes the statement that the residual is perpendicular to that subspace: the orthogonality principle.

Definition:

Inner Product of Random Variables

Let $\mathcal{L}^2$ denote the space of (real or complex) random variables with finite second moment. For $X, Y \in \mathcal{L}^2$ define
$$\langle X, Y \rangle \;=\; \mathbb{E}[X\, \overline{Y}\,], \qquad \|X\| \;=\; \sqrt{\mathbb{E}[|X|^2]\,}.$$
With this inner product, $\mathcal{L}^2$ is a Hilbert space; two random variables are orthogonal when $\mathbb{E}[X\overline{Y}] = 0$.

For zero-mean random variables, orthogonality coincides with uncorrelatedness. The squared norm $\|X - \hat{X}\|^2 = \mathbb{E}[|X - \hat{X}|^2]$ is precisely the mean-square error, so MMSE = squared distance in this space.
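
As a quick numeric illustration (a minimal sketch, assuming a toy zero-mean pair $X \sim \mathcal{N}(0,1)$ and $Y = X + $ independent standard Gaussian noise), the inner product and norm can be estimated by sample averages: for zero-mean variables the empirical inner product matches the covariance, and the squared distance to an estimator is its mean-square error.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Toy zero-mean pair (an assumption for illustration): X ~ N(0, 1), Y = X + independent N(0, 1) noise.
X = rng.standard_normal(n)
Y = X + rng.standard_normal(n)

inner_xy = np.mean(X * Y)                # empirical inner product <X, Y> = E[X Y]
cov_xy = np.cov(X, Y, bias=True)[0, 1]   # empirical covariance Cov(X, Y)
print(inner_xy, cov_xy)                  # nearly identical: with zero means the two coincide

X_hat = 0.5 * Y                          # posterior mean E[X | Y] for this particular toy model
print(np.mean((X - X_hat) ** 2))         # squared distance ||X - X_hat||^2 = mean-square error (about 0.5 here)
```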

Theorem: Orthogonality Principle (Unrestricted MMSE)

Let $(\boldsymbol\theta, \mathbf{Y})$ be jointly distributed with $\mathbb{E}[\|\boldsymbol\theta\|^2] < \infty$. Then the MMSE estimator $\hat\theta^\star(\mathbf{Y})$ is characterized by the following orthogonality condition: for every (measurable, $\mathcal{L}^2$) function $\phi: \mathcal{Y} \to \mathbb{R}^n$,
$$\boxed{\; \mathbb{E}\!\left[\, \big(\boldsymbol\theta - \hat\theta^\star(\mathbf{Y})\big)^\top \phi(\mathbf{Y})\,\right] \;=\; 0 .\; }$$
That is, the estimation error is uncorrelated with every function of the observation.
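
A one-line proof sketch of the forward direction, using the previous section's result that $\hat\theta^\star(\mathbf{Y}) = \mathbb{E}[\boldsymbol\theta \mid \mathbf{Y}]$ together with the tower property of conditional expectation (conditioning on $\mathbf{Y}$ lets $\phi(\mathbf{Y})$ factor out of the inner expectation):
$$
\mathbb{E}\big[(\boldsymbol\theta - \hat\theta^\star(\mathbf{Y}))^\top \phi(\mathbf{Y})\big]
= \mathbb{E}\Big[\mathbb{E}\big[(\boldsymbol\theta - \hat\theta^\star(\mathbf{Y}))^\top \phi(\mathbf{Y}) \,\big|\, \mathbf{Y}\big]\Big]
= \mathbb{E}\Big[\big(\mathbb{E}[\boldsymbol\theta \mid \mathbf{Y}] - \hat\theta^\star(\mathbf{Y})\big)^\top \phi(\mathbf{Y})\Big] = 0 .
$$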

Think of the subspace $\mathcal{H}_Y = \{\phi(\mathbf{Y}) : \phi \in \mathcal{L}^2\}$ of all square-integrable functions of the observation. The MMSE estimator is the orthogonal projection of $\boldsymbol\theta$ onto $\mathcal{H}_Y$, so the residual lies in the orthogonal complement.

Key Takeaway

The orthogonality principle is a characterization: an estimator is MMSE-optimal if and only if its error is uncorrelated with every function of the observation. This dispenses with having to guess the optimal form — once you verify orthogonality, you have proved optimality.
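
To see why verifying orthogonality is enough, here is a sketch of the converse direction: for any competing estimator $g(\mathbf{Y})$, expand the squared error around $\hat\theta^\star$; the cross term vanishes because $\hat\theta^\star(\mathbf{Y}) - g(\mathbf{Y})$ is itself a (square-integrable) function of the observation:
$$
\mathbb{E}\big[\|\boldsymbol\theta - g(\mathbf{Y})\|^2\big]
= \mathbb{E}\big[\|\boldsymbol\theta - \hat\theta^\star(\mathbf{Y})\|^2\big]
+ 2\,\mathbb{E}\big[(\boldsymbol\theta - \hat\theta^\star(\mathbf{Y}))^\top(\hat\theta^\star(\mathbf{Y}) - g(\mathbf{Y}))\big]
+ \mathbb{E}\big[\|\hat\theta^\star(\mathbf{Y}) - g(\mathbf{Y})\|^2\big]
\;\ge\; \mathbb{E}\big[\|\boldsymbol\theta - \hat\theta^\star(\mathbf{Y})\|^2\big] .
$$
Equality holds only when $g(\mathbf{Y}) = \hat\theta^\star(\mathbf{Y})$ almost surely, so the projection is unique.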

Pythagoras for Estimation

Because the error is orthogonal to the estimator (both are in $\mathcal{L}^2$), the Pythagorean identity gives
$$\mathbb{E}[\|\boldsymbol\theta\|^2] \;=\; \mathbb{E}[\|\hat\theta^\star(\mathbf{Y})\|^2] \;+\; \mathbb{E}[\|\boldsymbol\theta - \hat\theta^\star(\mathbf{Y})\|^2].$$
In words, the energy of the parameter decomposes into the energy captured by the estimator plus the residual MMSE. With the means removed, this is the law of total variance: the variance of the parameter splits into the variance of the estimator plus the expected posterior variance (the residual MMSE).
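
A minimal Monte Carlo sketch of this decomposition, assuming the scalar Gaussian toy model $\theta \sim \mathcal{N}(0, \sigma_\theta^2)$, $Y = \theta + W$ with independent $W \sim \mathcal{N}(0, \sigma^2)$, for which the posterior mean is the linear shrinkage $\hat\theta^\star(Y) = \tfrac{\sigma_\theta^2}{\sigma_\theta^2 + \sigma^2}\,Y$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500_000
sigma_theta, sigma = 1.0, 0.5   # prior and noise standard deviations (illustrative values)

# Scalar Gaussian model (an assumption for this sketch): theta ~ N(0, sigma_theta^2), Y = theta + noise.
theta = sigma_theta * rng.standard_normal(n)
Y = theta + sigma * rng.standard_normal(n)

# Posterior mean for this model: linear shrinkage of the observation.
theta_hat = (sigma_theta**2 / (sigma_theta**2 + sigma**2)) * Y

energy_theta = np.mean(theta**2)                                 # E[theta^2]
energy_split = np.mean(theta_hat**2) + np.mean((theta - theta_hat)**2)
print(energy_theta, energy_split)                                # agree up to Monte Carlo error
```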

Example: Verifying Orthogonality

In the scalar Gaussian example (MMSE for the Scalar Gaussian Model), verify the orthogonality condition $\mathbb{E}[(\theta - \hat\theta_{\text{MMSE}}(Y)) \cdot Y] = 0$ directly.
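
Assuming the standard form of that model, $Y = \theta + W$ with independent $\theta \sim \mathcal{N}(0, \sigma_\theta^2)$ and $W \sim \mathcal{N}(0, \sigma^2)$, so that $\hat\theta_{\text{MMSE}}(Y) = \tfrac{\sigma_\theta^2}{\sigma_\theta^2 + \sigma^2}\,Y$, the check takes two lines:
$$
\mathbb{E}\big[(\theta - \hat\theta_{\text{MMSE}}(Y))\,Y\big]
= \mathbb{E}[\theta Y] - \frac{\sigma_\theta^2}{\sigma_\theta^2 + \sigma^2}\,\mathbb{E}[Y^2]
= \sigma_\theta^2 - \frac{\sigma_\theta^2}{\sigma_\theta^2 + \sigma^2}\,(\sigma_\theta^2 + \sigma^2) = 0 ,
$$
using $\mathbb{E}[\theta Y] = \sigma_\theta^2$ and $\mathbb{E}[Y^2] = \sigma_\theta^2 + \sigma^2$.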

Orthogonality of the MMSE Residual

Monte Carlo samples of $(Y, e)$ where $e = \theta - \hat\theta_{\text{MMSE}}(Y)$. Compare the empirical correlation coefficient to zero. The cloud is uncorrelated even though $e$ and $Y$ are not, in general, independent when $\theta$ is non-Gaussian.
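
A sketch of the kind of Monte Carlo check the figure describes, assuming a binary signal $\theta \in \{\pm 1\}$ (equiprobable) observed in Gaussian noise of variance $\sigma^2$, for which the posterior mean is $\mathbb{E}[\theta \mid Y = y] = \tanh(y/\sigma^2)$:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
sigma = 0.8   # noise standard deviation (illustrative value)

# Binary-signal model (an assumption for this sketch): theta = +/-1 equiprobable, Y = theta + N(0, sigma^2).
theta = rng.choice([-1.0, 1.0], size=n)
Y = theta + sigma * rng.standard_normal(n)

theta_hat = np.tanh(Y / sigma**2)        # posterior mean for this model
e = theta - theta_hat                    # MMSE residual

print("corr(e, Y):", np.corrcoef(e, Y)[0, 1])   # approximately zero: residual uncorrelated with Y

# ...yet not independent: the conditional spread of the residual depends on where Y falls.
for lo, hi in [(-0.25, 0.25), (1.5, 2.5)]:
    mask = (Y > lo) & (Y < hi)
    print(f"Var(e | {lo} < Y < {hi}):", np.var(e[mask]))
```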


Orthogonality Principle

The characterization of MMSE estimators: an estimator $\hat\theta(\mathbf{Y})$ is MMSE-optimal if and only if the residual $\boldsymbol\theta - \hat\theta(\mathbf{Y})$ is uncorrelated with every (measurable, $\mathcal{L}^2$) function of the observation.

Related: Minimum Mean-Square Error (MMSE) Estimator, Conditional Expectation

Common Mistake: Uncorrelated ≠ Independent

Mistake:

"If the MMSE residual is uncorrelated with mathbfY\\mathbf{Y}, it must be independent of mathbfY\\mathbf{Y} — so I can treat it as noise."

Correction:

Uncorrelated means $\mathbb{E}[(\boldsymbol\theta - \hat\theta)^\top \phi(\mathbf{Y})] = 0$ for every function $\phi$. Higher-order statistics (e.g., the conditional variance $\text{Var}(e \mid \mathbf{Y} = \mathbf{y})$) can still depend on $\mathbf{Y}$. Only in the jointly Gaussian case is the residual actually independent of $\mathbf{Y}$. For the binary-signal example, $\text{Var}(e \mid Y = y)$ varies dramatically with $y$ (it is large near $y = 0$, where the posterior is ambiguous, and small for $|y| \gg 1$, where the posterior is nearly concentrated).
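
To make the binary-signal claim concrete, assume (as in the sketch above) $\theta \in \{\pm 1\}$ equiprobable and $Y = \theta + W$ with $W \sim \mathcal{N}(0, \sigma^2)$, so that $\mathbb{E}[\theta \mid Y = y] = \tanh(y/\sigma^2)$. Since $\theta^2 \equiv 1$,
$$
\text{Var}(e \mid Y = y) = \text{Var}(\theta \mid Y = y) = 1 - \tanh^2\!\big(y/\sigma^2\big) ,
$$
which is maximal at $y = 0$ and decays rapidly for $|y| \gg 1$, exactly the behavior described above.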