Exercises
ex-ch07-01
Easy: Let $Y = \theta$ for some prior $p(\theta)$ (noiseless observation). Compute $\hat\theta_{\mathrm{MMSE}}(y)$, $\hat\theta_{\mathrm{MAP}}(y)$, and the MMSE.
A noiseless observation gives a degenerate posterior.
The posterior is a point mass at $y$.
Posterior
Given $Y = y$, we know $\theta$ exactly, so $p(\theta \mid y) = \delta(\theta - y)$.
Estimators
Both estimators equal $y$: the mean of a point mass at $y$ is $y$, and its mode is $y$. The MMSE is zero.
ex-ch07-02
Easy: Show that if a $\sigma(Y)$-measurable random variable $Z$ satisfies $\mathbb{E}[Z\,g(Y)] = \mathbb{E}[\theta\,g(Y)]$ for every bounded measurable $g$, then $Z = \mathbb{E}[\theta \mid Y]$ almost surely.
Consider $g(Y) = \mathbf{1}_A(Y)$ for an arbitrary measurable set $A$.
Relate this to the definition of conditional expectation.
Set up an indicator test
Choosing $g = \mathbf{1}_A$, the hypothesis becomes $\mathbb{E}[Z\,\mathbf{1}_A(Y)] = \mathbb{E}[\theta\,\mathbf{1}_A(Y)]$ for every measurable $A$.
Compare with the definition
Equivalently, $\int_{\{Y \in A\}} Z\,d\mathbb{P} = \int_{\{Y \in A\}} \theta\,d\mathbb{P}$ for all $A$. This is exactly the defining property of the conditional expectation: a $\sigma(Y)$-measurable random variable whose integral over any $\sigma(Y)$-set equals that of $\theta$. Hence $Z = \mathbb{E}[\theta \mid Y]$ a.s.
ex-ch07-03
Easy: Let $\lambda \sim \mathrm{Gamma}(\alpha, \beta)$ with density $p(\lambda) = \frac{\beta^\alpha}{\Gamma(\alpha)}\lambda^{\alpha-1}e^{-\beta\lambda}$ for $\lambda > 0$. Given iid observations $X_1, \dots, X_n \sim \mathrm{Poisson}(\lambda)$, compute the posterior density and the MMSE estimator.
Write out the joint density and normalize.
The posterior is a Gamma distribution.
Likelihood
$p(x_{1:n} \mid \lambda) = \prod_{i=1}^n \frac{\lambda^{x_i} e^{-\lambda}}{x_i!} \propto \lambda^{\sum_i x_i} e^{-n\lambda}$.
Posterior
The posterior is proportional to $\lambda^{\alpha + \sum_i x_i - 1} e^{-(\beta + n)\lambda}$, i.e. a $\mathrm{Gamma}\!\left(\alpha + \sum_i x_i,\; \beta + n\right)$ density.
MMSE estimator
$\hat\lambda_{\mathrm{MMSE}} = \mathbb{E}[\lambda \mid x_{1:n}] = \frac{\alpha + \sum_i x_i}{\beta + n}$, a shrinkage of the MLE $\bar{x}$ toward the prior mean $\alpha/\beta$.
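A quick numerical sanity check of the conjugate update, as a minimal NumPy sketch; the prior parameters, sample size, and seed below are illustrative choices, not values from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta, n = 2.0, 1.0, 50            # illustrative prior and sample size
lam_true = rng.gamma(alpha, 1.0 / beta)  # draw a rate from the prior
x = rng.poisson(lam_true, size=n)        # iid Poisson observations

# Conjugate update: Gamma(alpha, beta) prior + Poisson likelihood
alpha_post = alpha + x.sum()
beta_post = beta + n
mmse = alpha_post / beta_post            # posterior mean

print(f"MLE  = {x.mean():.3f}")
print(f"MMSE = {mmse:.3f} (prior mean {alpha / beta:.3f})")
```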
ex-ch07-04
Medium: Derive the LMMSE estimator directly from the orthogonality principle, without using the completing-the-square argument of Theorem (The LMMSE Formula).
Write $\hat{\boldsymbol\theta} = \mathbf{A}\mathbf{y} + \mathbf{b}$.
Require orthogonality against the constant function $1$ and against $\mathbf{y}$.
Two orthogonality conditions
The residual $\boldsymbol\theta - \mathbf{A}\mathbf{y} - \mathbf{b}$ must be orthogonal to $1$ and to every component of $\mathbf{y}$.
Zero-mean condition
$\mathbb{E}[\boldsymbol\theta - \mathbf{A}\mathbf{y} - \mathbf{b}] = \mathbf{0}$, so $\mathbf{b} = \boldsymbol\mu_\theta - \mathbf{A}\boldsymbol\mu_y$.
Covariance condition
$\mathbb{E}[(\boldsymbol\theta - \mathbf{A}\mathbf{y} - \mathbf{b})\,\mathbf{y}^{\mathsf T}] = \mathbf{0}$. Using $\mathbf{b} = \boldsymbol\mu_\theta - \mathbf{A}\boldsymbol\mu_y$ from the previous step and simplifying, $\boldsymbol\Sigma_{\theta y} = \mathbf{A}\boldsymbol\Sigma_y$, so $\mathbf{A} = \boldsymbol\Sigma_{\theta y}\boldsymbol\Sigma_y^{-1}$ and $\hat{\boldsymbol\theta} = \boldsymbol\mu_\theta + \boldsymbol\Sigma_{\theta y}\boldsymbol\Sigma_y^{-1}(\mathbf{y} - \boldsymbol\mu_y)$.
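A Monte Carlo check of the two orthogonality conditions; the linear-Gaussian model, dimensions, and moments below are illustrative assumptions, not taken from the text:

```python
import numpy as np

rng = np.random.default_rng(1)
# Illustrative linear-Gaussian model y = H theta + n with known moments
H = rng.normal(size=(3, 2))
mu_t = np.array([1.0, -1.0])
Sig_t = np.array([[2.0, 0.3], [0.3, 1.0]])
Sig_n = 0.5 * np.eye(3)

mu_y = H @ mu_t
Sig_y = H @ Sig_t @ H.T + Sig_n
Sig_ty = Sig_t @ H.T                      # Cov(theta, y)

A = Sig_ty @ np.linalg.inv(Sig_y)         # from Sig_ty = A Sig_y
b = mu_t - A @ mu_y

# Monte Carlo check of both orthogonality conditions
th = rng.multivariate_normal(mu_t, Sig_t, size=200_000)
nn = rng.multivariate_normal(np.zeros(3), Sig_n, size=200_000)
y = th @ H.T + nn
r = th - y @ A.T - b                      # residual theta - A y - b
print(np.abs(r.mean(axis=0)).max())               # ~ 0 (orthogonal to 1)
print(np.abs(r.T @ (y - mu_y) / len(y)).max())    # ~ 0 (orthogonal to y)
```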
ex-ch07-05
Medium: Let $(\theta, Y)$ be a zero-mean jointly Gaussian pair with variances $\sigma_\theta^2, \sigma_Y^2$ and correlation coefficient $\rho$. Compute $\mathbb{E}[\theta \mid Y]$, the MMSE, and the conditional variance $\operatorname{Var}(\theta \mid Y)$.
Use $\mathbb{E}[\theta \mid Y] = \frac{\operatorname{Cov}(\theta, Y)}{\operatorname{Var}(Y)}\,Y$ for zero-mean jointly Gaussian pairs.
Estimator
$\mathbb{E}[\theta \mid Y] = \frac{\operatorname{Cov}(\theta, Y)}{\sigma_Y^2}\,Y = \rho\,\frac{\sigma_\theta}{\sigma_Y}\,Y$.
Conditional variance
$\operatorname{Var}(\theta \mid Y) = \sigma_\theta^2(1 - \rho^2)$, a constant independent of $Y$.
MMSE
The MMSE equals the (constant) conditional variance: $\mathrm{MMSE} = \sigma_\theta^2(1 - \rho^2)$. As $|\rho| \to 1$, the MMSE tends to zero (perfect prediction); as $\rho \to 0$, it returns to the prior variance $\sigma_\theta^2$.
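A quick simulation check, as a minimal NumPy sketch (the variances, correlation, and sample size are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
s_t, s_y, rho = 2.0, 1.5, 0.8             # illustrative values
cov = [[s_t**2, rho * s_t * s_y], [rho * s_t * s_y, s_y**2]]
th, y = rng.multivariate_normal([0.0, 0.0], cov, size=500_000).T

est = rho * (s_t / s_y) * y               # E[theta | Y]
print(np.mean((th - est) ** 2))           # empirical MSE
print(s_t**2 * (1 - rho**2))              # theory: sigma_t^2 (1 - rho^2)
```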
ex-ch07-06
Medium: Consider $Y = \theta + N$ with $N \sim \mathcal{N}(0, \sigma^2)$ and $\theta \sim \mathrm{Laplace}(\lambda)$ (density $p(\theta) = \frac{\lambda}{2}e^{-\lambda|\theta|}$) independent of $N$. Compute $\hat\theta_{\mathrm{MAP}}(y)$ in closed form. Is $\hat\theta_{\mathrm{MAP}}$ still affine in $y$?
Maximize $\log p(\theta \mid y)$ in $\theta$.
The objective is piecewise quadratic.
Posterior log-density
$\log p(\theta \mid y) = -\frac{(y - \theta)^2}{2\sigma^2} - \lambda|\theta| + \text{const}$.
Optimality condition
Differentiating in $\theta$ (on either side of $\theta = 0$) and setting the derivative to zero: $\frac{y - \theta}{\sigma^2} - \lambda\,\operatorname{sign}(\theta) = 0$.
Two cases
If $\theta > 0$: $\theta = y - \lambda\sigma^2$; if $\theta < 0$: $\theta = y + \lambda\sigma^2$. Each candidate is valid only when it has the required sign; otherwise the maximizer sits at $\theta = 0$. Combining the cases, $\hat\theta_{\mathrm{MAP}}(y) = \operatorname{sign}(y)\,\max(|y| - \lambda\sigma^2,\, 0)$. More cleanly, the MAP is a soft-thresholding of $y$ at level $\lambda\sigma^2$. Because the Laplace density is not Gaussian, the posterior is not Gaussian and $\hat\theta_{\mathrm{MAP}}$ is not affine in $y$.
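A sketch comparing the soft-thresholding formula against brute-force maximization of the posterior log-density; the function name `map_laplace` and the parameter values are mine, chosen for illustration:

```python
import numpy as np

def map_laplace(y, lam, sigma2):
    """Soft-thresholding MAP for Y = theta + N with a Laplace(lam) prior."""
    t = lam * sigma2
    return np.sign(y) * np.maximum(np.abs(y) - t, 0.0)

# Cross-check against brute-force maximization of the posterior log-density
lam, sigma2, y = 1.5, 0.4, 0.9            # illustrative values
grid = np.linspace(-5, 5, 200_001)
logpost = -(y - grid) ** 2 / (2 * sigma2) - lam * np.abs(grid)
print(grid[np.argmax(logpost)], map_laplace(y, lam, sigma2))  # both ~ 0.3
```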
ex-ch07-07
Medium: Let $\boldsymbol\theta \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$ and $\mathbf{y} = \mathbf{H}\boldsymbol\theta + \mathbf{n}$ with $\mathbf{n} \sim \mathcal{N}(\mathbf{0}, \sigma^2\mathbf{I})$ independent of $\boldsymbol\theta$. Show that the LMMSE estimator can be written as $\hat{\boldsymbol\theta} = (\mathbf{H}^{\mathsf T}\mathbf{H} + \sigma^2\mathbf{I})^{-1}\mathbf{H}^{\mathsf T}\mathbf{y}$.
Start from $\hat{\boldsymbol\theta} = \boldsymbol\Sigma_\theta\mathbf{H}^{\mathsf T}(\mathbf{H}\boldsymbol\Sigma_\theta\mathbf{H}^{\mathsf T} + \sigma^2\mathbf{I})^{-1}\mathbf{y}$.
Apply the push-through identity $\mathbf{H}^{\mathsf T}(\mathbf{H}\mathbf{H}^{\mathsf T} + \sigma^2\mathbf{I})^{-1} = (\mathbf{H}^{\mathsf T}\mathbf{H} + \sigma^2\mathbf{I})^{-1}\mathbf{H}^{\mathsf T}$.
Substitute $\boldsymbol\Sigma_\theta = \mathbf{I}$
The LMMSE formula becomes $\hat{\boldsymbol\theta} = \mathbf{H}^{\mathsf T}(\mathbf{H}\mathbf{H}^{\mathsf T} + \sigma^2\mathbf{I})^{-1}\mathbf{y}$.
Push-through
Multiplying out, $(\mathbf{H}^{\mathsf T}\mathbf{H} + \sigma^2\mathbf{I})\mathbf{H}^{\mathsf T} = \mathbf{H}^{\mathsf T}(\mathbf{H}\mathbf{H}^{\mathsf T} + \sigma^2\mathbf{I})$, so $\mathbf{H}^{\mathsf T}(\mathbf{H}\mathbf{H}^{\mathsf T} + \sigma^2\mathbf{I})^{-1} = (\mathbf{H}^{\mathsf T}\mathbf{H} + \sigma^2\mathbf{I})^{-1}\mathbf{H}^{\mathsf T}$.
Conclude
Hence $\hat{\boldsymbol\theta} = (\mathbf{H}^{\mathsf T}\mathbf{H} + \sigma^2\mathbf{I})^{-1}\mathbf{H}^{\mathsf T}\mathbf{y}$, the ridge regression form.
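The push-through identity is easy to check numerically; the dimensions and noise level below are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(3)
H = rng.normal(size=(6, 4))               # illustrative dimensions
sigma2 = 0.5

lhs = H.T @ np.linalg.inv(H @ H.T + sigma2 * np.eye(6))
rhs = np.linalg.inv(H.T @ H + sigma2 * np.eye(4)) @ H.T
print(np.abs(lhs - rhs).max())            # ~ 1e-16: push-through holds
```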
ex-ch07-08
Medium: Show that for any Bayesian model, the posterior mean is unconditionally unbiased: $\mathbb{E}[\hat\theta_{\mathrm{MMSE}}(Y)] = \mathbb{E}[\theta]$. Give an example where it is not conditionally unbiased, i.e. $\mathbb{E}[\hat\theta_{\mathrm{MMSE}}(Y) \mid \theta] \neq \theta$.
Use the tower property for the first part.
Look at the scalar Gaussian model and fix $\theta$.
Unconditional unbiasedness
$\mathbb{E}[\hat\theta_{\mathrm{MMSE}}(Y)] = \mathbb{E}\big[\mathbb{E}[\theta \mid Y]\big] = \mathbb{E}[\theta]$ by the tower property.
Counterexample to conditional unbiasedness
In the scalar Gaussian model ($\theta \sim \mathcal{N}(0, \sigma_\theta^2)$, $Y = \theta + N$), $\hat\theta_{\mathrm{MMSE}}(Y) = cY$ with $c = \frac{\sigma_\theta^2}{\sigma_\theta^2 + \sigma^2} < 1$. Given $\theta$, $\mathbb{E}[\hat\theta_{\mathrm{MMSE}}(Y) \mid \theta] = c\theta \neq \theta$ whenever $\theta \neq 0$. The MMSE estimator is conditionally biased toward the prior mean: a feature, not a bug.
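A simulation illustrating both claims in the scalar Gaussian model; the variances and the conditioning value of $\theta$ are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
s_t2, s2 = 1.0, 1.0                       # illustrative variances
c = s_t2 / (s_t2 + s2)                    # MMSE gain, here 1/2

theta_fixed = 2.0                         # condition on a nonzero theta
y = theta_fixed + rng.normal(0.0, np.sqrt(s2), size=500_000)
print(np.mean(c * y))                     # ~ c * theta = 1.0, not 2.0

theta = rng.normal(0.0, np.sqrt(s_t2), size=500_000)
y = theta + rng.normal(0.0, np.sqrt(s2), size=500_000)
print(np.mean(c * y), np.mean(theta))     # both ~ 0: unconditionally unbiased
```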
ex-ch07-09
Medium: Verify that in the Gaussian model of Example (Complex Gaussian Signal in Gaussian Noise), the posterior mean $\hat{\boldsymbol\theta}$ and the posterior error $\mathbf{e} = \boldsymbol\theta - \hat{\boldsymbol\theta}$ are independent (not just uncorrelated).
Both $\hat{\boldsymbol\theta}$ and $\mathbf{e}$ are affine functions of the Gaussian vector $(\boldsymbol\theta, \mathbf{y})$.
Uncorrelated jointly Gaussian variables are independent.
Joint Gaussianity
$\hat{\boldsymbol\theta} = \mathbf{W}\mathbf{y}$ (with $\mathbf{W}$ the MMSE filter) and $\mathbf{e} = \boldsymbol\theta - \mathbf{W}\mathbf{y}$ are both affine in the Gaussian pair $(\boldsymbol\theta, \mathbf{y})$, hence jointly Gaussian.
Uncorrelated
$\mathbb{E}[\mathbf{e}\,\hat{\boldsymbol\theta}^{\mathsf H}] = \mathbf{0}$ by the orthogonality principle (the residual is uncorrelated with any linear function of $\mathbf{y}$, and $\hat{\boldsymbol\theta}$ is one).
Conclude independence
For jointly Gaussian vectors, zero cross-covariance implies independence.
ex-ch07-10
Medium: In the pilot-based channel estimation model of Definition (Pilot-Based Channel Estimation Model), compute the Bayesian CRLB, i.e. the lower bound on $\mathbb{E}\|\mathbf{h} - \hat{\mathbf{h}}(\mathbf{y})\|^2$ over all estimators $\hat{\mathbf{h}}$. Verify that the MMSE estimator achieves it.
For Gaussian priors and Gaussian likelihoods, the Bayesian CRLB is tight.
The Bayesian Fisher information is $\mathbf{J} = \boldsymbol\Sigma_h^{-1} + \frac{1}{\sigma^2}\mathbf{X}^{\mathsf H}\mathbf{X}$.
Bayesian information
The Bayesian Fisher information matrix (prior + data) is $\mathbf{J} = \boldsymbol\Sigma_h^{-1} + \frac{1}{\sigma^2}\mathbf{X}^{\mathsf H}\mathbf{X}$.
Lower bound
The Bayesian CRLB gives $\mathbb{E}\big[(\mathbf{h} - \hat{\mathbf{h}})(\mathbf{h} - \hat{\mathbf{h}})^{\mathsf H}\big] \succeq \mathbf{J}^{-1}$, so $\mathbb{E}\|\mathbf{h} - \hat{\mathbf{h}}\|^2 \geq \operatorname{tr}(\mathbf{J}^{-1})$.
MMSE attains the bound
From Theorem (MMSE Channel Estimator), the MMSE posterior covariance is exactly $\big(\boldsymbol\Sigma_h^{-1} + \frac{1}{\sigma^2}\mathbf{X}^{\mathsf H}\mathbf{X}\big)^{-1} = \mathbf{J}^{-1}$. Hence the MMSE estimator attains the Bayesian CRLB, a special property of Gaussian models.
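A numeric sanity check that the posterior covariance equals the inverse Bayesian information matrix; the pilot matrix, power-delay profile, and noise level are invented for illustration, not the text's values:

```python
import numpy as np

rng = np.random.default_rng(5)
Np, L = 8, 3                              # illustrative pilots and taps
X = (rng.normal(size=(Np, L)) + 1j * rng.normal(size=(Np, L))) / np.sqrt(2)
Sig_h = np.diag([1.0, 0.5, 0.25])         # illustrative power-delay profile
sigma2 = 0.1

J = np.linalg.inv(Sig_h) + (X.conj().T @ X) / sigma2  # Bayesian FIM
Sig_post = np.linalg.inv(J)               # MMSE posterior covariance = J^{-1}
print(np.trace(Sig_post).real)            # Bayesian CRLB = attained MSE
```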
ex-ch07-11
Medium: Let $\theta \sim \mathrm{Beta}(a, b)$ and $X_1, \dots, X_n \mid \theta \overset{\text{iid}}{\sim} \mathrm{Bernoulli}(\theta)$. Compute the posterior, the MAP, and the MMSE of $\theta$ given $X_{1:n}$. What happens as $n \to \infty$?
Beta is conjugate to Bernoulli.
The posterior is Beta with updated parameters.
Posterior
With $s = \sum_{i=1}^n x_i$, $p(\theta \mid x_{1:n}) \propto \theta^{a+s-1}(1-\theta)^{b+n-s-1}$, i.e. $\theta \mid x_{1:n} \sim \mathrm{Beta}(a + s,\; b + n - s)$.
MAP and MMSE
MMSE = posterior mean = $\frac{a+s}{a+b+n}$. MAP = posterior mode = $\frac{a+s-1}{a+b+n-2}$ (assuming $a + s > 1$ and $b + n - s > 1$).
Limit
As $n \to \infty$, both tend to the MLE $\bar{x} = s/n$. The $\mathrm{Beta}(0, 0)$ prior is an improper prior (Haldane's prior); under it the posterior mean equals the MLE exactly, so it serves as the non-informative limit.
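A minimal conjugate-update check; the prior parameters, $n$, and the true $\theta$ are illustrative:

```python
import numpy as np

rng = np.random.default_rng(6)
a, b, n = 2.0, 2.0, 1_000                 # illustrative prior and sample size
theta_true = 0.3
s = int((rng.random(n) < theta_true).sum())

mmse = (a + s) / (a + b + n)              # posterior mean
map_ = (a + s - 1) / (a + b + n - 2)      # posterior mode
print(mmse, map_, s / n)                  # all close for large n
```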
ex-ch07-12
Hard: Show that for any estimator $g(Y)$ with finite MSE, $\mathbb{E}\|\theta - g(Y)\|^2 = \mathbb{E}\|\theta - \mathbb{E}[\theta \mid Y]\|^2 + \mathbb{E}\big\|\mathbb{E}[\theta \mid Y] - g(Y)\big\|^2$. In words, the excess MSE is the average squared deviation of the estimator from the conditional mean.
Add and subtract $\mathbb{E}[\theta \mid Y]$.
Use the orthogonality principle for the cross term.
Decompose
Writing $\hat\theta = \mathbb{E}[\theta \mid Y]$, $\theta - g(Y) = (\theta - \hat\theta) + (\hat\theta - g(Y))$.
Expand the squared norm
$\|\theta - g(Y)\|^2 = \|\theta - \hat\theta\|^2 + 2\,\langle \theta - \hat\theta,\; \hat\theta - g(Y)\rangle + \|\hat\theta - g(Y)\|^2$.
Take expectations
Since $\hat\theta$ and $g(Y)$ are functions of $Y$, the cross term has expectation zero by the orthogonality principle. Hence $\mathbb{E}\|\theta - g(Y)\|^2 = \mathrm{MMSE} + \mathbb{E}\|\hat\theta - g(Y)\|^2$.
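A Monte Carlo verification of the decomposition in the unit-SNR scalar Gaussian model, where $\mathbb{E}[\theta \mid Y] = Y/2$; the competing estimator $g$ is an arbitrary illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(7)
theta = rng.normal(size=1_000_000)
y = theta + rng.normal(size=1_000_000)    # unit-SNR scalar Gaussian model

post_mean = 0.5 * y                       # E[theta | y] in this model
g = 0.8 * y + 0.1                         # an arbitrary competing estimator

lhs = np.mean((theta - g) ** 2)
rhs = np.mean((theta - post_mean) ** 2) + np.mean((post_mean - g) ** 2)
print(lhs, rhs)                           # agree up to Monte Carlo error
```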
ex-ch07-13
Hard: A transmitter sends $X \in \{-1, +1\}$ equiprobably through a fading channel: $Y = HX + N$, where $H \sim \mathcal{CN}(0, \sigma_h^2)$ and $N \sim \mathcal{CN}(0, \sigma^2)$, with $X, H, N$ independent. Compute $\mathbb{E}[X \mid Y]$, i.e. the non-coherent MMSE estimator.
Marginalize over $H$ to get the likelihood $p(y \mid x)$.
The symmetry $p(y \mid x = +1) = p(y \mid x = -1)$ implies $\mathbb{E}[X \mid Y] = 0$.
Marginal likelihood
Given $X = x$, $Y = Hx + N \sim \mathcal{CN}(0, \sigma_h^2 x^2 + \sigma^2) = \mathcal{CN}(0, \sigma_h^2 + \sigma^2)$, independent of the sign of $x$.
Posterior
By Bayes, $\mathbb{P}(X = +1 \mid Y = y) = \mathbb{P}(X = -1 \mid Y = y) = \tfrac{1}{2}$ for every $y$, because the likelihoods are identical.
MMSE
$\mathbb{E}[X \mid Y = y] = 0$ for all $y$. Without a phase reference, the receiver cannot distinguish $+1$ from $-1$ and the MMSE estimate collapses to the prior mean. This motivates differential encoding or pilot-aided coherent detection.
ex-ch07-14
Hard: Let $\theta$ have a Gaussian mixture prior: $p(\theta) = \sum_{k=1}^K w_k\,\mathcal{N}(\theta;\, \mu_k, \tau_k^2)$ with weights $w_k \geq 0$ and $\sum_k w_k = 1$. The observation is $Y = \theta + N$, $N \sim \mathcal{N}(0, \sigma^2)$ independent. Compute $\mathbb{E}[\theta \mid Y]$ in closed form.
Each component's posterior is Gaussian.
The full posterior is a mixture of Gaussians with updated weights.
Component posteriors
Conditional on component $k$, the posterior is $\mathcal{N}(m_k(y),\, v_k)$ with $m_k(y) = \mu_k + \frac{\tau_k^2}{\tau_k^2 + \sigma^2}(y - \mu_k)$ and $v_k = \frac{\tau_k^2\sigma^2}{\tau_k^2 + \sigma^2}$.
Updated weights
Posterior weights $\tilde w_k(y) \propto w_k\,\mathcal{N}(y;\, \mu_k,\, \tau_k^2 + \sigma^2)$, normalized to sum to one.
MMSE
$\mathbb{E}[\theta \mid Y = y] = \sum_{k=1}^K \tilde w_k(y)\, m_k(y)$. This is a smooth soft-max between the per-component shrinkage estimates, with weights determined by how well $y$ matches each mixture component.
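A compact implementation of the closed form; the helper name `gmm_posterior_mean` and the two-component spike-and-slab style prior below are my own illustrative choices:

```python
import numpy as np

def gmm_posterior_mean(y, w, mu, tau2, sigma2):
    """E[theta | y] for a Gaussian-mixture prior and Y = theta + N."""
    var = tau2 + sigma2                            # evidence variances
    logw = np.log(w) - 0.5 * np.log(2 * np.pi * var) - (y - mu) ** 2 / (2 * var)
    w_post = np.exp(logw - logw.max())
    w_post /= w_post.sum()                         # updated mixture weights
    m = mu + tau2 / (tau2 + sigma2) * (y - mu)     # per-component shrinkage
    return np.dot(w_post, m)

w = np.array([0.7, 0.3])                  # illustrative two-component prior
mu = np.array([0.0, 0.0])
tau2 = np.array([0.01, 4.0])              # narrow spike + wide slab
print(gmm_posterior_mean(0.2, w, mu, tau2, sigma2=0.25))  # shrunk hard to 0
print(gmm_posterior_mean(3.0, w, mu, tau2, sigma2=0.25))  # barely shrunk
```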
ex-ch07-15
Hard: Suppose the assumed channel covariance $\tilde{\boldsymbol\Sigma}_h$ differs from the true $\boldsymbol\Sigma_h$. Compute the mean-square error of the mismatched MMSE estimator under the true statistics. Show that the correctly matched MMSE estimator is always at least as good.
Write the estimator as $\hat{\mathbf{h}} = \tilde{\mathbf{W}}\mathbf{y}$ with $\tilde{\mathbf{W}} = \tilde{\boldsymbol\Sigma}_h\mathbf{X}^{\mathsf H}\big(\mathbf{X}\tilde{\boldsymbol\Sigma}_h\mathbf{X}^{\mathsf H} + \sigma^2\mathbf{I}\big)^{-1}$.
Compute $\mathrm{MSE}(\tilde{\mathbf{W}}) = \mathbb{E}\|\mathbf{h} - \tilde{\mathbf{W}}\mathbf{y}\|^2$ with $\mathbf{y} = \mathbf{X}\mathbf{h} + \mathbf{n}$.
Bias and variance
$\mathbf{h} - \tilde{\mathbf{W}}\mathbf{y} = (\mathbf{I} - \tilde{\mathbf{W}}\mathbf{X})\mathbf{h} - \tilde{\mathbf{W}}\mathbf{n}$. Taking expectations over $\mathbf{h}$ and $\mathbf{n}$ independent: $\mathrm{MSE}(\tilde{\mathbf{W}}) = \operatorname{tr}\big((\mathbf{I} - \tilde{\mathbf{W}}\mathbf{X})\boldsymbol\Sigma_h(\mathbf{I} - \tilde{\mathbf{W}}\mathbf{X})^{\mathsf H}\big) + \sigma^2\operatorname{tr}\big(\tilde{\mathbf{W}}\tilde{\mathbf{W}}^{\mathsf H}\big)$.
Minimizer
This quadratic in $\tilde{\mathbf{W}}$ is minimized by setting $\tilde{\mathbf{W}} = \boldsymbol\Sigma_h\mathbf{X}^{\mathsf H}\big(\mathbf{X}\boldsymbol\Sigma_h\mathbf{X}^{\mathsf H} + \sigma^2\mathbf{I}\big)^{-1}$, the true-covariance MMSE estimator.
Conclude
Any other $\tilde{\mathbf{W}}$, in particular one built from $\tilde{\boldsymbol\Sigma}_h \neq \boldsymbol\Sigma_h$, gives an MSE at least as large. Mismatching the prior costs performance.
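A numerical comparison of matched and mismatched filters under the true statistics, using a real-valued model for simplicity; the pilot matrix, covariances, and helper names `mse_of` and `wiener` are illustrative assumptions:

```python
import numpy as np

def mse_of(W, X, Sig_h, sigma2):
    """MSE of h_hat = W y under the true statistics, y = X h + n."""
    E = np.eye(Sig_h.shape[0]) - W @ X
    return np.trace(E @ Sig_h @ E.T + sigma2 * W @ W.T)

def wiener(Sig, X, sigma2):
    return Sig @ X.T @ np.linalg.inv(X @ Sig @ X.T + sigma2 * np.eye(X.shape[0]))

rng = np.random.default_rng(8)
X = rng.normal(size=(8, 3))               # illustrative pilot matrix
Sig_true = np.diag([1.0, 0.5, 0.25])      # true covariance
Sig_wrong = np.eye(3)                     # mismatched covariance
sigma2 = 0.1

print(mse_of(wiener(Sig_true, X, sigma2), X, Sig_true, sigma2))   # smaller
print(mse_of(wiener(Sig_wrong, X, sigma2), X, Sig_true, sigma2))  # larger
```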
ex-ch07-16
Medium: Show that the LMMSE error covariance satisfies $\boldsymbol\Sigma_e = \boldsymbol\Sigma_\theta - \boldsymbol\Sigma_{\theta y}\boldsymbol\Sigma_y^{-1}\boldsymbol\Sigma_{y\theta} \preceq \boldsymbol\Sigma_\theta$ in the positive-semidefinite ordering, with equality iff $\boldsymbol\Sigma_{\theta y} = \mathbf{0}$.
$\boldsymbol\Sigma_e = \boldsymbol\Sigma_\theta - \boldsymbol\Sigma_{\theta y}\boldsymbol\Sigma_y^{-1}\boldsymbol\Sigma_{y\theta}$.
The correction term is PSD.
PSD correction
$\boldsymbol\Sigma_{\theta y}\boldsymbol\Sigma_y^{-1}\boldsymbol\Sigma_{y\theta} = \big(\boldsymbol\Sigma_{\theta y}\boldsymbol\Sigma_y^{-1/2}\big)\big(\boldsymbol\Sigma_{\theta y}\boldsymbol\Sigma_y^{-1/2}\big)^{\mathsf H} \succeq \mathbf{0}$.
Subtract
Hence $\boldsymbol\Sigma_e \preceq \boldsymbol\Sigma_\theta$. Equality holds iff the correction term is zero, i.e. $\boldsymbol\Sigma_{\theta y} = \mathbf{0}$: the observation is uncorrelated with the parameter.
ex-ch07-17
Hard: Prove the "orthogonality implies optimality" half of the orthogonality principle as an inequality, without the perturbation argument: for any $\hat\theta(Y)$ with $\mathbb{E}\big[(\theta - \hat\theta(Y))\,g(Y)\big] = 0$ for every $g$, and any other estimator $\tilde\theta(Y)$, $\mathbb{E}\|\theta - \tilde\theta(Y)\|^2 \geq \mathbb{E}\|\theta - \hat\theta(Y)\|^2$.
Write $\theta - \tilde\theta = (\theta - \hat\theta) + d$ where $d = \hat\theta - \tilde\theta$.
Expand $\mathbb{E}\|\theta - \tilde\theta\|^2$.
Add and subtract
$\theta - \tilde\theta = (\theta - \hat\theta) + (\hat\theta - \tilde\theta)$.
Expand
$\mathbb{E}\|\theta - \tilde\theta\|^2 = \mathbb{E}\|\theta - \hat\theta\|^2 + 2\,\mathbb{E}\big[\langle \theta - \hat\theta,\; \hat\theta - \tilde\theta\rangle\big] + \mathbb{E}\|\hat\theta - \tilde\theta\|^2$.
Take expectation
The middle term vanishes by hypothesis ($\hat\theta - \tilde\theta$ is a function of $Y$), leaving $\mathbb{E}\|\theta - \tilde\theta\|^2 = \mathbb{E}\|\theta - \hat\theta\|^2 + \mathbb{E}\|\hat\theta - \tilde\theta\|^2 \geq \mathbb{E}\|\theta - \hat\theta\|^2$.
ex-ch07-18
Medium: Let $Y = \sqrt{\mathrm{snr}}\,\theta + N$ with $\theta \sim \mathcal{N}(0, 1)$ and $N \sim \mathcal{N}(0, 1)$ independent. Express the MMSE as a function of $\mathrm{snr}$ and verify the I-MMSE identity $\frac{d}{d\,\mathrm{snr}} I(\theta; Y) = \tfrac{1}{2}\,\mathrm{MMSE}(\mathrm{snr})$ for this Gaussian case.
Compute $\mathbb{E}[\theta \mid Y]$ and the resulting MMSE.
$I(\theta; Y) = \tfrac{1}{2}\log(1 + \mathrm{snr})$.
MMSE
$\mathbb{E}[\theta \mid Y] = \frac{\sqrt{\mathrm{snr}}}{1 + \mathrm{snr}}\,Y$ and the posterior variance is $\frac{1}{1 + \mathrm{snr}}$. So $\mathrm{MMSE}(\mathrm{snr}) = \frac{1}{1 + \mathrm{snr}}$.
Mutual information
$I(\theta; Y) = \tfrac{1}{2}\log(1 + \mathrm{snr})$ nats (scalar Gaussian capacity).
Verify the identity
$\frac{d}{d\,\mathrm{snr}}\,\tfrac{1}{2}\log(1 + \mathrm{snr}) = \frac{1}{2(1 + \mathrm{snr})} = \tfrac{1}{2}\,\mathrm{MMSE}(\mathrm{snr})$. The Guo–Shamai–Verdú I-MMSE identity holds in general; for Gaussian inputs it can be verified in closed form, as here.
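A numerical check of the identity, differentiating $I(\mathrm{snr})$ on a grid:

```python
import numpy as np

snr = np.linspace(0.01, 10.0, 1000)
I = 0.5 * np.log1p(snr)                   # mutual information in nats
mmse = 1.0 / (1.0 + snr)

dI = np.gradient(I, snr)                  # numerical derivative dI/dsnr
print(np.abs(dI - 0.5 * mmse).max())      # small: the identity holds
```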
ex-ch07-19
Challenge: Let $\theta$ be uniform on $[0, 1]$ and $Y \mid \theta \sim \mathrm{Bernoulli}(\theta)$. Compute $\mathbb{E}[\theta \mid Y = y]$ for $y \in \{0, 1\}$ and the resulting MMSE.
Posterior: $p(\theta \mid y) \propto \theta^y (1 - \theta)^{1 - y}$ on $[0, 1]$.
Mean of a $\mathrm{Beta}(a, b)$ distribution: $\frac{a}{a + b}$.
Posterior
For $y = 1$: $p(\theta \mid 1) \propto \theta$ on $[0, 1]$, i.e. $\mathrm{Beta}(2, 1)$. For $y = 0$: $p(\theta \mid 0) \propto 1 - \theta$, i.e. $\mathrm{Beta}(1, 2)$.
MMSE estimator
$\mathbb{E}[\theta \mid Y = 1] = \tfrac{2}{3}$, $\mathbb{E}[\theta \mid Y = 0] = \tfrac{1}{3}$.
MMSE value
$\mathrm{MMSE} = \mathbb{E}\big[\operatorname{Var}(\theta \mid Y)\big]$. For each $y$ this equals $\tfrac{1}{18}$. So $\mathrm{MMSE} = \tfrac{1}{18}$ (versus the prior variance $\tfrac{1}{12}$).
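A Monte Carlo confirmation of $\mathrm{MMSE} = \tfrac{1}{18}$:

```python
import numpy as np

rng = np.random.default_rng(9)
theta = rng.random(1_000_000)             # uniform prior on [0, 1]
y = rng.random(1_000_000) < theta         # one Bernoulli(theta) observation

est = np.where(y, 2 / 3, 1 / 3)           # posterior means derived above
print(np.mean((theta - est) ** 2))        # ~ 1/18 = 0.0556
print(np.var(theta))                      # ~ 1/12 = 0.0833 (prior variance)
```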
ex-ch07-20
Challenge: Derive the Bayesian CRLB (Van Trees inequality) for scalar $\theta$ with prior $p(\theta)$ and likelihood $p(y \mid \theta)$: $\mathbb{E}\big[(\theta - \hat\theta(Y))^2\big] \geq \frac{1}{\mathbb{E}[I(\theta)] + I_p}$, where $\mathbb{E}[I(\theta)]$ is the expected Fisher information and $I_p = \mathbb{E}\big[\big(\tfrac{d}{d\theta}\log p(\theta)\big)^2\big]$ is the prior information.
Apply Cauchy–Schwarz to $\mathbb{E}\big[(\theta - \hat\theta(Y))\,s(\theta, Y)\big]$ with $s(\theta, y) = \frac{\partial}{\partial\theta}\log p(\theta, y)$.
Show that $\mathbb{E}\big[(\theta - \hat\theta(Y))\,s(\theta, Y)\big] = 1$.
Joint score
Define the joint score $s(\theta, y) = \frac{\partial}{\partial\theta}\log p(\theta, y)$, so $s = \frac{d}{d\theta}\log p(\theta) + \frac{\partial}{\partial\theta}\log p(y \mid \theta)$ (prior and data scores).
Cross-term vanishes
Under regularity, $\mathbb{E}[s^2] = \mathbb{E}[I(\theta)] + I_p$, because the cross term vanishes: the expected data score given $\theta$ is zero.
Covariance with residual
Using integration by parts (boundary terms vanishing under regularity), $\mathbb{E}\big[(\theta - \hat\theta(Y))\,s(\theta, Y)\big] = 1$.
Cauchy–Schwarz
$1 = \big(\mathbb{E}[(\theta - \hat\theta)\,s]\big)^2 \leq \mathbb{E}\big[(\theta - \hat\theta)^2\big]\,\mathbb{E}[s^2]$.
Van Trees bound
Rearranging, $\mathbb{E}\big[(\theta - \hat\theta)^2\big] \geq \frac{1}{\mathbb{E}[I(\theta)] + I_p}$. This is the Bayesian CRLB, tight (with equality) in the Gaussian-Gaussian model.
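A closed-form check of tightness in the Gaussian-Gaussian model; the variance values are illustrative:

```python
import numpy as np

sig_p2, sig2 = 2.0, 0.5                   # illustrative prior/noise variances
I_data = 1.0 / sig2                       # Fisher information, constant in theta
I_prior = 1.0 / sig_p2                    # prior information of a Gaussian
bound = 1.0 / (I_data + I_prior)          # Van Trees lower bound

mmse = sig_p2 * sig2 / (sig_p2 + sig2)    # posterior variance of the model
print(bound, mmse)                        # equal: the bound is tight here
```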