Exercises
ex-ch07-01
Easy: Let $Y = \theta$ for some prior $p(\theta)$ (noiseless observation). Compute $\hat\theta_{\mathrm{MMSE}}(y)$, $\hat\theta_{\mathrm{MAP}}(y)$, and the MMSE.
A noiseless observation gives a degenerate posterior.
The posterior is a point mass at $y$.
Posterior
Given $Y = y$, we know $\theta$ exactly, so $p(\theta \mid y) = \delta(\theta - y)$.
Estimators
Both estimators equal $y$: the mean of a point mass at $y$ is $y$, and its mode is $y$. The MMSE is zero.
ex-ch07-02
Easy: Show that if a $\sigma(Y)$-measurable random variable $Z$ satisfies $\mathbb{E}[Z\,g(Y)] = \mathbb{E}[\theta\,g(Y)]$ for every bounded measurable $g$, then $Z = \mathbb{E}[\theta \mid Y]$ almost surely.
Consider $g(Y) = \mathbf{1}_A(Y)$ for an arbitrary measurable set $A$.
Relate this to the definition of conditional expectation.
Set up an indicator test
Choosing $g = \mathbf{1}_A$, the hypothesis becomes $\mathbb{E}[Z\,\mathbf{1}_A(Y)] = \mathbb{E}[\theta\,\mathbf{1}_A(Y)]$ for every measurable $A$.
Compare with the definition
Equivalently, $\int_{\{Y \in A\}} Z\,d\mathbb{P} = \int_{\{Y \in A\}} \theta\,d\mathbb{P}$ for all $A$. This is exactly the defining property of the conditional expectation: a $\sigma(Y)$-measurable random variable whose integral over any $\sigma(Y)$-set equals that of $\theta$. Hence $Z = \mathbb{E}[\theta \mid Y]$ a.s.
ex-ch07-03
Easy: Let $\lambda \sim \mathrm{Gamma}(\alpha, \beta)$ with density $p(\lambda) = \frac{\beta^\alpha}{\Gamma(\alpha)}\lambda^{\alpha-1}e^{-\beta\lambda}$ for $\lambda > 0$. Given iid observations $X_1, \dots, X_n \sim \mathrm{Poisson}(\lambda)$, compute the posterior density and the MMSE estimator.
Write out the joint density and normalize.
The posterior is a Gamma distribution.
Likelihood
$p(x_{1:n} \mid \lambda) = \prod_{i=1}^n \frac{\lambda^{x_i} e^{-\lambda}}{x_i!} \propto \lambda^{\sum_i x_i} e^{-n\lambda}$.
Posterior
The posterior is proportional to $\lambda^{\alpha + \sum_i x_i - 1} e^{-(\beta + n)\lambda}$, i.e. a $\mathrm{Gamma}\!\left(\alpha + \sum_i x_i,\; \beta + n\right)$ density.
MMSE estimator
$\hat\lambda_{\mathrm{MMSE}} = \mathbb{E}[\lambda \mid x_{1:n}] = \frac{\alpha + \sum_i x_i}{\beta + n}$, a shrinkage of the MLE $\bar{x}$ toward the prior mean $\alpha/\beta$.
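A quick numerical sanity check of the conjugate update, as a minimal NumPy sketch; the prior parameters, sample size, and seed below are illustrative choices, not values from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta, n = 2.0, 1.0, 50            # illustrative prior and sample size
lam_true = rng.gamma(alpha, 1.0 / beta)  # draw a rate from the prior
x = rng.poisson(lam_true, size=n)        # iid Poisson observations

# Conjugate update: Gamma(alpha, beta) prior + Poisson likelihood
alpha_post = alpha + x.sum()
beta_post = beta + n
mmse = alpha_post / beta_post            # posterior mean

print(f"MLE  = {x.mean():.3f}")
print(f"MMSE = {mmse:.3f} (prior mean {alpha / beta:.3f})")
```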
ex-ch07-04
Medium: Derive the LMMSE estimator directly from the orthogonality principle, without using the completing-the-square argument of Theorem (The LMMSE Formula).
Write $\hat{\boldsymbol\theta} = \mathbf{A}\mathbf{y} + \mathbf{b}$.
Require orthogonality against the constant function $1$ and against $\mathbf{y}$.
Two orthogonality conditions
The residual $\boldsymbol\theta - \mathbf{A}\mathbf{y} - \mathbf{b}$ must be orthogonal to $1$ and to every component of $\mathbf{y}$.
Zero-mean condition
$\mathbb{E}[\boldsymbol\theta - \mathbf{A}\mathbf{y} - \mathbf{b}] = \mathbf{0}$, so $\mathbf{b} = \boldsymbol\mu_\theta - \mathbf{A}\boldsymbol\mu_y$.
Covariance condition
$\mathbb{E}[(\boldsymbol\theta - \mathbf{A}\mathbf{y} - \mathbf{b})\,\mathbf{y}^{\mathsf T}] = \mathbf{0}$. Using $\mathbf{b} = \boldsymbol\mu_\theta - \mathbf{A}\boldsymbol\mu_y$ from the previous step and simplifying, $\boldsymbol\Sigma_{\theta y} = \mathbf{A}\boldsymbol\Sigma_y$, so $\mathbf{A} = \boldsymbol\Sigma_{\theta y}\boldsymbol\Sigma_y^{-1}$ and $\hat{\boldsymbol\theta} = \boldsymbol\mu_\theta + \boldsymbol\Sigma_{\theta y}\boldsymbol\Sigma_y^{-1}(\mathbf{y} - \boldsymbol\mu_y)$.
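A Monte Carlo check of the two orthogonality conditions; the linear-Gaussian model, dimensions, and moments below are illustrative assumptions, not taken from the text:

```python
import numpy as np

rng = np.random.default_rng(1)
# Illustrative linear-Gaussian model y = H theta + n with known moments
H = rng.normal(size=(3, 2))
mu_t = np.array([1.0, -1.0])
Sig_t = np.array([[2.0, 0.3], [0.3, 1.0]])
Sig_n = 0.5 * np.eye(3)

mu_y = H @ mu_t
Sig_y = H @ Sig_t @ H.T + Sig_n
Sig_ty = Sig_t @ H.T                      # Cov(theta, y)

A = Sig_ty @ np.linalg.inv(Sig_y)         # from Sig_ty = A Sig_y
b = mu_t - A @ mu_y

# Monte Carlo check of both orthogonality conditions
th = rng.multivariate_normal(mu_t, Sig_t, size=200_000)
nn = rng.multivariate_normal(np.zeros(3), Sig_n, size=200_000)
y = th @ H.T + nn
r = th - y @ A.T - b                      # residual theta - A y - b
print(np.abs(r.mean(axis=0)).max())               # ~ 0 (orthogonal to 1)
print(np.abs(r.T @ (y - mu_y) / len(y)).max())    # ~ 0 (orthogonal to y)
```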
ex-ch07-05
Medium: Let $(\theta, Y)$ be a zero-mean jointly Gaussian pair with variances $\sigma_\theta^2, \sigma_Y^2$ and correlation coefficient $\rho$. Compute $\mathbb{E}[\theta \mid Y]$, the MMSE, and the conditional variance $\operatorname{Var}(\theta \mid Y)$.
Use $\mathbb{E}[\theta \mid Y] = \frac{\operatorname{Cov}(\theta, Y)}{\operatorname{Var}(Y)}\,Y$ for zero-mean jointly Gaussian pairs.
Estimator
$\mathbb{E}[\theta \mid Y] = \frac{\operatorname{Cov}(\theta, Y)}{\sigma_Y^2}\,Y = \rho\,\frac{\sigma_\theta}{\sigma_Y}\,Y$.
Conditional variance
$\operatorname{Var}(\theta \mid Y) = \sigma_\theta^2(1 - \rho^2)$, a constant independent of $Y$.
MMSE
The MMSE equals the (constant) conditional variance: $\mathrm{MMSE} = \sigma_\theta^2(1 - \rho^2)$. As $|\rho| \to 1$, the MMSE tends to zero (perfect prediction); as $\rho \to 0$, it returns to the prior variance $\sigma_\theta^2$.
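A quick simulation check, as a minimal NumPy sketch (the variances, correlation, and sample size are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
s_t, s_y, rho = 2.0, 1.5, 0.8             # illustrative values
cov = [[s_t**2, rho * s_t * s_y], [rho * s_t * s_y, s_y**2]]
th, y = rng.multivariate_normal([0.0, 0.0], cov, size=500_000).T

est = rho * (s_t / s_y) * y               # E[theta | Y]
print(np.mean((th - est) ** 2))           # empirical MSE
print(s_t**2 * (1 - rho**2))              # theory: sigma_t^2 (1 - rho^2)
```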
ex-ch07-06
Medium: Consider $Y = \theta + N$ with $N \sim \mathcal{N}(0, \sigma^2)$ and $\theta \sim \mathrm{Laplace}(\lambda)$ (density $p(\theta) = \frac{\lambda}{2}e^{-\lambda|\theta|}$) independent of $N$. Compute $\hat\theta_{\mathrm{MAP}}(y)$ in closed form. Is $\hat\theta_{\mathrm{MAP}}$ still affine in $y$?
Maximize $\log p(\theta \mid y)$ in $\theta$.
The objective is piecewise quadratic.
Posterior log-density
$\log p(\theta \mid y) = -\frac{(y - \theta)^2}{2\sigma^2} - \lambda|\theta| + \text{const}$.
Optimality condition
Differentiating in $\theta$ (on either side of $\theta = 0$) and setting the derivative to zero: $\frac{y - \theta}{\sigma^2} - \lambda\,\operatorname{sign}(\theta) = 0$.
Two cases
If $\theta > 0$: $\theta = y - \lambda\sigma^2$; if $\theta < 0$: $\theta = y + \lambda\sigma^2$. Each candidate is valid only when it has the required sign; otherwise the maximizer sits at $\theta = 0$. Combining the cases, $\hat\theta_{\mathrm{MAP}}(y) = \operatorname{sign}(y)\,\max(|y| - \lambda\sigma^2,\, 0)$. More cleanly, the MAP is a soft-thresholding of $y$ at level $\lambda\sigma^2$. Because the Laplace density is not Gaussian, the posterior is not Gaussian and $\hat\theta_{\mathrm{MAP}}$ is not affine in $y$.
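A sketch comparing the soft-thresholding formula against brute-force maximization of the posterior log-density; the function name `map_laplace` and the parameter values are mine, chosen for illustration:

```python
import numpy as np

def map_laplace(y, lam, sigma2):
    """Soft-thresholding MAP for Y = theta + N with a Laplace(lam) prior."""
    t = lam * sigma2
    return np.sign(y) * np.maximum(np.abs(y) - t, 0.0)

# Cross-check against brute-force maximization of the posterior log-density
lam, sigma2, y = 1.5, 0.4, 0.9            # illustrative values
grid = np.linspace(-5, 5, 200_001)
logpost = -(y - grid) ** 2 / (2 * sigma2) - lam * np.abs(grid)
print(grid[np.argmax(logpost)], map_laplace(y, lam, sigma2))  # both ~ 0.3
```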
ex-ch07-07
Medium: Let $\boldsymbol\theta \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$ and $\mathbf{y} = \mathbf{H}\boldsymbol\theta + \mathbf{n}$ with $\mathbf{n} \sim \mathcal{N}(\mathbf{0}, \sigma^2\mathbf{I})$ independent of $\boldsymbol\theta$. Show that the LMMSE estimator can be written as $\hat{\boldsymbol\theta} = (\mathbf{H}^{\mathsf T}\mathbf{H} + \sigma^2\mathbf{I})^{-1}\mathbf{H}^{\mathsf T}\mathbf{y}$.
Start from $\hat{\boldsymbol\theta} = \boldsymbol\Sigma_\theta\mathbf{H}^{\mathsf T}(\mathbf{H}\boldsymbol\Sigma_\theta\mathbf{H}^{\mathsf T} + \sigma^2\mathbf{I})^{-1}\mathbf{y}$.
Apply the push-through identity $\mathbf{H}^{\mathsf T}(\mathbf{H}\mathbf{H}^{\mathsf T} + \sigma^2\mathbf{I})^{-1} = (\mathbf{H}^{\mathsf T}\mathbf{H} + \sigma^2\mathbf{I})^{-1}\mathbf{H}^{\mathsf T}$.
Substitute $\boldsymbol\Sigma_\theta = \mathbf{I}$
The LMMSE formula becomes $\hat{\boldsymbol\theta} = \mathbf{H}^{\mathsf T}(\mathbf{H}\mathbf{H}^{\mathsf T} + \sigma^2\mathbf{I})^{-1}\mathbf{y}$.
Push-through
Multiplying out, $(\mathbf{H}^{\mathsf T}\mathbf{H} + \sigma^2\mathbf{I})\mathbf{H}^{\mathsf T} = \mathbf{H}^{\mathsf T}(\mathbf{H}\mathbf{H}^{\mathsf T} + \sigma^2\mathbf{I})$, so $\mathbf{H}^{\mathsf T}(\mathbf{H}\mathbf{H}^{\mathsf T} + \sigma^2\mathbf{I})^{-1} = (\mathbf{H}^{\mathsf T}\mathbf{H} + \sigma^2\mathbf{I})^{-1}\mathbf{H}^{\mathsf T}$.
Conclude
Hence $\hat{\boldsymbol\theta} = (\mathbf{H}^{\mathsf T}\mathbf{H} + \sigma^2\mathbf{I})^{-1}\mathbf{H}^{\mathsf T}\mathbf{y}$, the ridge regression form.
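The push-through identity is easy to check numerically; the dimensions and noise level below are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(3)
H = rng.normal(size=(6, 4))               # illustrative dimensions
sigma2 = 0.5

lhs = H.T @ np.linalg.inv(H @ H.T + sigma2 * np.eye(6))
rhs = np.linalg.inv(H.T @ H + sigma2 * np.eye(4)) @ H.T
print(np.abs(lhs - rhs).max())            # ~ 1e-16: push-through holds
```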
ex-ch07-08
Medium: Show that for any Bayesian model, the posterior mean is unconditionally unbiased: $\mathbb{E}[\hat\theta_{\mathrm{MMSE}}(Y)] = \mathbb{E}[\theta]$. Give an example where it is not conditionally unbiased, i.e. $\mathbb{E}[\hat\theta_{\mathrm{MMSE}}(Y) \mid \theta] \neq \theta$.
Use the tower property for the first part.
Look at the scalar Gaussian model and fix $\theta$.
Unconditional unbiasedness
$\mathbb{E}[\hat\theta_{\mathrm{MMSE}}(Y)] = \mathbb{E}\big[\mathbb{E}[\theta \mid Y]\big] = \mathbb{E}[\theta]$ by the tower property.
Counterexample to conditional unbiasedness
In the scalar Gaussian model ($\theta \sim \mathcal{N}(0, \sigma_\theta^2)$, $Y = \theta + N$), $\hat\theta_{\mathrm{MMSE}}(Y) = cY$ with $c = \frac{\sigma_\theta^2}{\sigma_\theta^2 + \sigma^2} < 1$. Given $\theta$, $\mathbb{E}[\hat\theta_{\mathrm{MMSE}}(Y) \mid \theta] = c\theta \neq \theta$ whenever $\theta \neq 0$. The MMSE estimator is conditionally biased toward the prior mean: a feature, not a bug.
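A simulation illustrating both claims in the scalar Gaussian model; the variances and the conditioning value of $\theta$ are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
s_t2, s2 = 1.0, 1.0                       # illustrative variances
c = s_t2 / (s_t2 + s2)                    # MMSE gain, here 1/2

theta_fixed = 2.0                         # condition on a nonzero theta
y = theta_fixed + rng.normal(0.0, np.sqrt(s2), size=500_000)
print(np.mean(c * y))                     # ~ c * theta = 1.0, not 2.0

theta = rng.normal(0.0, np.sqrt(s_t2), size=500_000)
y = theta + rng.normal(0.0, np.sqrt(s2), size=500_000)
print(np.mean(c * y), np.mean(theta))     # both ~ 0: unconditionally unbiased
```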
ex-ch07-09
Medium: Verify that in the Gaussian model of Example (Complex Gaussian Signal in Gaussian Noise), the posterior mean $\hat{\boldsymbol\theta}$ and the posterior error $\mathbf{e} = \boldsymbol\theta - \hat{\boldsymbol\theta}$ are independent (not just uncorrelated).
Both $\hat{\boldsymbol\theta}$ and $\mathbf{e}$ are affine functions of the Gaussian vector $(\boldsymbol\theta, \mathbf{y})$.
Uncorrelated jointly Gaussian variables are independent.
Joint Gaussianity
$\hat{\boldsymbol\theta} = \mathbf{W}\mathbf{y}$ (with $\mathbf{W}$ the MMSE filter) and $\mathbf{e} = \boldsymbol\theta - \mathbf{W}\mathbf{y}$ are both affine in the Gaussian pair $(\boldsymbol\theta, \mathbf{y})$, hence jointly Gaussian.
Uncorrelated
$\mathbb{E}[\mathbf{e}\,\hat{\boldsymbol\theta}^{\mathsf H}] = \mathbf{0}$ by the orthogonality principle (the residual is uncorrelated with any linear function of $\mathbf{y}$, and $\hat{\boldsymbol\theta}$ is one).
Conclude independence
For jointly Gaussian vectors, zero cross-covariance implies independence.
ex-ch07-10
Medium: In the pilot-based channel estimation model of Definition (Pilot-Based Channel Estimation Model), compute the Bayesian CRLB, i.e. the lower bound on $\mathbb{E}\|\mathbf{h} - \hat{\mathbf{h}}(\mathbf{y})\|^2$ over all estimators $\hat{\mathbf{h}}$. Verify that the MMSE estimator achieves it.
For Gaussian priors and Gaussian likelihoods, the Bayesian CRLB is tight.
The Bayesian Fisher information is $\mathbf{J} = \boldsymbol\Sigma_h^{-1} + \frac{1}{\sigma^2}\mathbf{X}^{\mathsf H}\mathbf{X}$.
Bayesian information
The Bayesian Fisher information matrix (prior + data) is $\mathbf{J} = \boldsymbol\Sigma_h^{-1} + \frac{1}{\sigma^2}\mathbf{X}^{\mathsf H}\mathbf{X}$.
Lower bound
The Bayesian CRLB gives $\mathbb{E}\big[(\mathbf{h} - \hat{\mathbf{h}})(\mathbf{h} - \hat{\mathbf{h}})^{\mathsf H}\big] \succeq \mathbf{J}^{-1}$, so $\mathbb{E}\|\mathbf{h} - \hat{\mathbf{h}}\|^2 \geq \operatorname{tr}(\mathbf{J}^{-1})$.
MMSE attains the bound
From Theorem (MMSE Channel Estimator), the MMSE posterior covariance is exactly $\big(\boldsymbol\Sigma_h^{-1} + \frac{1}{\sigma^2}\mathbf{X}^{\mathsf H}\mathbf{X}\big)^{-1} = \mathbf{J}^{-1}$. Hence the MMSE estimator attains the Bayesian CRLB, a special property of Gaussian models.
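A numeric sanity check that the posterior covariance equals the inverse Bayesian information matrix; the pilot matrix, power-delay profile, and noise level are invented for illustration, not the text's values:

```python
import numpy as np

rng = np.random.default_rng(5)
Np, L = 8, 3                              # illustrative pilots and taps
X = (rng.normal(size=(Np, L)) + 1j * rng.normal(size=(Np, L))) / np.sqrt(2)
Sig_h = np.diag([1.0, 0.5, 0.25])         # illustrative power-delay profile
sigma2 = 0.1

J = np.linalg.inv(Sig_h) + (X.conj().T @ X) / sigma2  # Bayesian FIM
Sig_post = np.linalg.inv(J)               # MMSE posterior covariance = J^{-1}
print(np.trace(Sig_post).real)            # Bayesian CRLB = attained MSE
```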
ex-ch07-11
Medium: Let $\theta \sim \mathrm{Beta}(a, b)$ and $X_1, \dots, X_n \mid \theta \overset{\text{iid}}{\sim} \mathrm{Bernoulli}(\theta)$. Compute the posterior, the MAP, and the MMSE of $\theta$ given $X_{1:n}$. What happens as $n \to \infty$?
Beta is conjugate to Bernoulli.
The posterior is Beta with updated parameters.
Posterior
With $s = \sum_{i=1}^n x_i$, $p(\theta \mid x_{1:n}) \propto \theta^{a+s-1}(1-\theta)^{b+n-s-1}$, i.e. $\theta \mid x_{1:n} \sim \mathrm{Beta}(a + s,\; b + n - s)$.
MAP and MMSE
MMSE = posterior mean = $\frac{a+s}{a+b+n}$. MAP = posterior mode = $\frac{a+s-1}{a+b+n-2}$ (assuming $a + s > 1$ and $b + n - s > 1$).
Limit
As $n \to \infty$, both tend to the MLE $\bar{x} = s/n$. The $\mathrm{Beta}(0, 0)$ prior is an improper prior (Haldane's prior); under it the posterior mean equals the MLE exactly, so it serves as the non-informative limit.
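A minimal conjugate-update check; the prior parameters, $n$, and the true $\theta$ are illustrative:

```python
import numpy as np

rng = np.random.default_rng(6)
a, b, n = 2.0, 2.0, 1_000                 # illustrative prior and sample size
theta_true = 0.3
s = int((rng.random(n) < theta_true).sum())

mmse = (a + s) / (a + b + n)              # posterior mean
map_ = (a + s - 1) / (a + b + n - 2)      # posterior mode
print(mmse, map_, s / n)                  # all close for large n
```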
ex-ch07-12
Hard: Show that for any estimator $g(Y)$ with finite MSE, $\mathbb{E}\|\theta - g(Y)\|^2 = \mathbb{E}\|\theta - \mathbb{E}[\theta \mid Y]\|^2 + \mathbb{E}\big\|\mathbb{E}[\theta \mid Y] - g(Y)\big\|^2$. In words, the excess MSE is the average squared deviation of the estimator from the conditional mean.
Add and subtract $\mathbb{E}[\theta \mid Y]$.
Use the orthogonality principle for the cross term.
Decompose
Writing $\hat\theta = \mathbb{E}[\theta \mid Y]$, $\theta - g(Y) = (\theta - \hat\theta) + (\hat\theta - g(Y))$.
Expand the squared norm
$\|\theta - g(Y)\|^2 = \|\theta - \hat\theta\|^2 + 2\,\langle \theta - \hat\theta,\; \hat\theta - g(Y)\rangle + \|\hat\theta - g(Y)\|^2$.
Take expectations
Since $\hat\theta$ and $g(Y)$ are functions of $Y$, the cross term has expectation zero by the orthogonality principle. Hence $\mathbb{E}\|\theta - g(Y)\|^2 = \mathrm{MMSE} + \mathbb{E}\|\hat\theta - g(Y)\|^2$.
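A Monte Carlo verification of the decomposition in the unit-SNR scalar Gaussian model, where $\mathbb{E}[\theta \mid Y] = Y/2$; the competing estimator $g$ is an arbitrary illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(7)
theta = rng.normal(size=1_000_000)
y = theta + rng.normal(size=1_000_000)    # unit-SNR scalar Gaussian model

post_mean = 0.5 * y                       # E[theta | y] in this model
g = 0.8 * y + 0.1                         # an arbitrary competing estimator

lhs = np.mean((theta - g) ** 2)
rhs = np.mean((theta - post_mean) ** 2) + np.mean((post_mean - g) ** 2)
print(lhs, rhs)                           # agree up to Monte Carlo error
```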
ex-ch07-13
Hard: A transmitter sends $X \in \{-1, +1\}$ equiprobably through a fading channel: $Y = HX + N$, where $H \sim \mathcal{CN}(0, \sigma_h^2)$ and $N \sim \mathcal{CN}(0, \sigma^2)$, with $X, H, N$ independent. Compute $\mathbb{E}[X \mid Y]$, i.e. the non-coherent MMSE estimator.
Marginalize over $H$ to get the likelihood $p(y \mid x)$.
The symmetry $p(y \mid x = +1) = p(y \mid x = -1)$ implies $\mathbb{E}[X \mid Y] = 0$.
Marginal likelihood
Given $X = x$, $Y = Hx + N \sim \mathcal{CN}(0, \sigma_h^2 x^2 + \sigma^2) = \mathcal{CN}(0, \sigma_h^2 + \sigma^2)$, independent of the sign of $x$.
Posterior
By Bayes, $\mathbb{P}(X = +1 \mid Y = y) = \mathbb{P}(X = -1 \mid Y = y) = \tfrac{1}{2}$ for every $y$, because the likelihoods are identical.
MMSE
$\mathbb{E}[X \mid Y = y] = 0$ for all $y$. Without a phase reference, the receiver cannot distinguish $+1$ from $-1$ and the MMSE estimate collapses to the prior mean. This motivates differential encoding or pilot-aided coherent detection.
ex-ch07-14
Hard: Let $\theta$ have a Gaussian mixture prior: $p(\theta) = \sum_{k=1}^K w_k\,\mathcal{N}(\theta;\, \mu_k, \tau_k^2)$ with weights $w_k \geq 0$ and $\sum_k w_k = 1$. The observation is $Y = \theta + N$, $N \sim \mathcal{N}(0, \sigma^2)$ independent. Compute $\mathbb{E}[\theta \mid Y]$ in closed form.
Each component's posterior is Gaussian.
The full posterior is a mixture of Gaussians with updated weights.
Component posteriors
Conditional on component $k$, the posterior is $\mathcal{N}(m_k(y),\, v_k)$ with $m_k(y) = \mu_k + \frac{\tau_k^2}{\tau_k^2 + \sigma^2}(y - \mu_k)$ and $v_k = \frac{\tau_k^2\sigma^2}{\tau_k^2 + \sigma^2}$.
Updated weights
Posterior weights $\tilde w_k(y) \propto w_k\,\mathcal{N}(y;\, \mu_k,\, \tau_k^2 + \sigma^2)$, normalized to sum to one.
MMSE
$\mathbb{E}[\theta \mid Y = y] = \sum_{k=1}^K \tilde w_k(y)\, m_k(y)$. This is a smooth soft-max between the per-component shrinkage estimates, with weights determined by how well $y$ matches each mixture component.
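A compact implementation of the closed form; the helper name `gmm_posterior_mean` and the two-component spike-and-slab style prior below are my own illustrative choices:

```python
import numpy as np

def gmm_posterior_mean(y, w, mu, tau2, sigma2):
    """E[theta | y] for a Gaussian-mixture prior and Y = theta + N."""
    var = tau2 + sigma2                            # evidence variances
    logw = np.log(w) - 0.5 * np.log(2 * np.pi * var) - (y - mu) ** 2 / (2 * var)
    w_post = np.exp(logw - logw.max())
    w_post /= w_post.sum()                         # updated mixture weights
    m = mu + tau2 / (tau2 + sigma2) * (y - mu)     # per-component shrinkage
    return np.dot(w_post, m)

w = np.array([0.7, 0.3])                  # illustrative two-component prior
mu = np.array([0.0, 0.0])
tau2 = np.array([0.01, 4.0])              # narrow spike + wide slab
print(gmm_posterior_mean(0.2, w, mu, tau2, sigma2=0.25))  # shrunk hard to 0
print(gmm_posterior_mean(3.0, w, mu, tau2, sigma2=0.25))  # barely shrunk
```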
ex-ch07-15
Hard: Suppose the assumed channel covariance $\tilde{\boldsymbol\Sigma}_h$ differs from the true $\boldsymbol\Sigma_h$. Compute the mean-square error of the mismatched MMSE estimator under the true statistics. Show that the correctly matched MMSE estimator is always at least as good.
Write the estimator as $\hat{\mathbf{h}} = \tilde{\mathbf{W}}\mathbf{y}$ with $\tilde{\mathbf{W}} = \tilde{\boldsymbol\Sigma}_h\mathbf{X}^{\mathsf H}\big(\mathbf{X}\tilde{\boldsymbol\Sigma}_h\mathbf{X}^{\mathsf H} + \sigma^2\mathbf{I}\big)^{-1}$.
Compute $\mathrm{MSE}(\tilde{\mathbf{W}}) = \mathbb{E}\|\mathbf{h} - \tilde{\mathbf{W}}\mathbf{y}\|^2$ with $\mathbf{y} = \mathbf{X}\mathbf{h} + \mathbf{n}$.
Bias and variance
$\mathbf{h} - \tilde{\mathbf{W}}\mathbf{y} = (\mathbf{I} - \tilde{\mathbf{W}}\mathbf{X})\mathbf{h} - \tilde{\mathbf{W}}\mathbf{n}$. Taking expectations over $\mathbf{h}$ and $\mathbf{n}$ independent: $\mathrm{MSE}(\tilde{\mathbf{W}}) = \operatorname{tr}\big((\mathbf{I} - \tilde{\mathbf{W}}\mathbf{X})\boldsymbol\Sigma_h(\mathbf{I} - \tilde{\mathbf{W}}\mathbf{X})^{\mathsf H}\big) + \sigma^2\operatorname{tr}\big(\tilde{\mathbf{W}}\tilde{\mathbf{W}}^{\mathsf H}\big)$.
Minimizer
This quadratic in $\tilde{\mathbf{W}}$ is minimized by setting $\tilde{\mathbf{W}} = \boldsymbol\Sigma_h\mathbf{X}^{\mathsf H}\big(\mathbf{X}\boldsymbol\Sigma_h\mathbf{X}^{\mathsf H} + \sigma^2\mathbf{I}\big)^{-1}$, the true-covariance MMSE estimator.
Conclude
Any other $\tilde{\mathbf{W}}$, in particular one built from $\tilde{\boldsymbol\Sigma}_h \neq \boldsymbol\Sigma_h$, gives an MSE at least as large. Mismatching the prior costs performance.
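A numerical comparison of matched and mismatched filters under the true statistics, using a real-valued model for simplicity; the pilot matrix, covariances, and helper names `mse_of` and `wiener` are illustrative assumptions:

```python
import numpy as np

def mse_of(W, X, Sig_h, sigma2):
    """MSE of h_hat = W y under the true statistics, y = X h + n."""
    E = np.eye(Sig_h.shape[0]) - W @ X
    return np.trace(E @ Sig_h @ E.T + sigma2 * W @ W.T)

def wiener(Sig, X, sigma2):
    return Sig @ X.T @ np.linalg.inv(X @ Sig @ X.T + sigma2 * np.eye(X.shape[0]))

rng = np.random.default_rng(8)
X = rng.normal(size=(8, 3))               # illustrative pilot matrix
Sig_true = np.diag([1.0, 0.5, 0.25])      # true covariance
Sig_wrong = np.eye(3)                     # mismatched covariance
sigma2 = 0.1

print(mse_of(wiener(Sig_true, X, sigma2), X, Sig_true, sigma2))   # smaller
print(mse_of(wiener(Sig_wrong, X, sigma2), X, Sig_true, sigma2))  # larger
```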
ex-ch07-16
Medium: Show that the LMMSE error covariance satisfies $\boldsymbol\Sigma_e = \boldsymbol\Sigma_\theta - \boldsymbol\Sigma_{\theta y}\boldsymbol\Sigma_y^{-1}\boldsymbol\Sigma_{y\theta} \preceq \boldsymbol\Sigma_\theta$ in the positive-semidefinite ordering, with equality iff $\boldsymbol\Sigma_{\theta y} = \mathbf{0}$.
$\boldsymbol\Sigma_e = \boldsymbol\Sigma_\theta - \boldsymbol\Sigma_{\theta y}\boldsymbol\Sigma_y^{-1}\boldsymbol\Sigma_{y\theta}$.
The correction term is PSD.
PSD correction
$\boldsymbol\Sigma_{\theta y}\boldsymbol\Sigma_y^{-1}\boldsymbol\Sigma_{y\theta} = \big(\boldsymbol\Sigma_{\theta y}\boldsymbol\Sigma_y^{-1/2}\big)\big(\boldsymbol\Sigma_{\theta y}\boldsymbol\Sigma_y^{-1/2}\big)^{\mathsf H} \succeq \mathbf{0}$.
Subtract
Hence $\boldsymbol\Sigma_e \preceq \boldsymbol\Sigma_\theta$. Equality holds iff the correction term is zero, i.e. $\boldsymbol\Sigma_{\theta y} = \mathbf{0}$: the observation is uncorrelated with the parameter.
ex-ch07-17
Hard: Prove the "orthogonality implies optimality" half of the orthogonality principle as an inequality, without the perturbation argument: for any $\hat\theta(Y)$ with $\mathbb{E}\big[(\theta - \hat\theta(Y))\,g(Y)\big] = 0$ for every $g$, and any other estimator $\tilde\theta(Y)$, $\mathbb{E}\|\theta - \tilde\theta(Y)\|^2 \geq \mathbb{E}\|\theta - \hat\theta(Y)\|^2$.
Write $\theta - \tilde\theta = (\theta - \hat\theta) + d$ where $d = \hat\theta - \tilde\theta$.
Expand $\mathbb{E}\|\theta - \tilde\theta\|^2$.
Add and subtract
$\theta - \tilde\theta = (\theta - \hat\theta) + (\hat\theta - \tilde\theta)$.
Expand
$\mathbb{E}\|\theta - \tilde\theta\|^2 = \mathbb{E}\|\theta - \hat\theta\|^2 + 2\,\mathbb{E}\big[\langle \theta - \hat\theta,\; \hat\theta - \tilde\theta\rangle\big] + \mathbb{E}\|\hat\theta - \tilde\theta\|^2$.
Take expectation
The middle term vanishes by hypothesis ($\hat\theta - \tilde\theta$ is a function of $Y$), leaving $\mathbb{E}\|\theta - \tilde\theta\|^2 = \mathbb{E}\|\theta - \hat\theta\|^2 + \mathbb{E}\|\hat\theta - \tilde\theta\|^2 \geq \mathbb{E}\|\theta - \hat\theta\|^2$.
ex-ch07-18
Medium: Let $Y = \sqrt{\mathrm{snr}}\,\theta + N$ with $\theta \sim \mathcal{N}(0, 1)$ and $N \sim \mathcal{N}(0, 1)$ independent. Express the MMSE as a function of $\mathrm{snr}$ and verify the I-MMSE identity $\frac{d}{d\,\mathrm{snr}} I(\theta; Y) = \tfrac{1}{2}\,\mathrm{MMSE}(\mathrm{snr})$ for this Gaussian case.
Compute $\mathbb{E}[\theta \mid Y]$ and the resulting MMSE.
$I(\theta; Y) = \tfrac{1}{2}\log(1 + \mathrm{snr})$.
MMSE
$\mathbb{E}[\theta \mid Y] = \frac{\sqrt{\mathrm{snr}}}{1 + \mathrm{snr}}\,Y$ and the posterior variance is $\frac{1}{1 + \mathrm{snr}}$. So $\mathrm{MMSE}(\mathrm{snr}) = \frac{1}{1 + \mathrm{snr}}$.
Mutual information
$I(\theta; Y) = \tfrac{1}{2}\log(1 + \mathrm{snr})$ nats (scalar Gaussian capacity).
Verify the identity
$\frac{d}{d\,\mathrm{snr}}\,\tfrac{1}{2}\log(1 + \mathrm{snr}) = \frac{1}{2(1 + \mathrm{snr})} = \tfrac{1}{2}\,\mathrm{MMSE}(\mathrm{snr})$. The Guo–Shamai–Verdú I-MMSE identity holds in general; for Gaussian inputs it can be verified in closed form, as here.
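A numerical check of the identity, differentiating $I(\mathrm{snr})$ on a grid:

```python
import numpy as np

snr = np.linspace(0.01, 10.0, 1000)
I = 0.5 * np.log1p(snr)                   # mutual information in nats
mmse = 1.0 / (1.0 + snr)

dI = np.gradient(I, snr)                  # numerical derivative dI/dsnr
print(np.abs(dI - 0.5 * mmse).max())      # small: the identity holds
```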
ex-ch07-19
Challenge: Let $\theta$ be uniform on $[0, 1]$ and $Y \mid \theta \sim \mathrm{Bernoulli}(\theta)$. Compute $\mathbb{E}[\theta \mid Y = y]$ for $y \in \{0, 1\}$ and the resulting MMSE.
Posterior: $p(\theta \mid y) \propto \theta^y (1 - \theta)^{1 - y}$ on $[0, 1]$.
Mean of a $\mathrm{Beta}(a, b)$ distribution: $\frac{a}{a + b}$.
Posterior
For $y = 1$: $p(\theta \mid 1) \propto \theta$ on $[0, 1]$, i.e. $\mathrm{Beta}(2, 1)$. For $y = 0$: $p(\theta \mid 0) \propto 1 - \theta$, i.e. $\mathrm{Beta}(1, 2)$.
MMSE estimator
$\mathbb{E}[\theta \mid Y = 1] = \tfrac{2}{3}$, $\mathbb{E}[\theta \mid Y = 0] = \tfrac{1}{3}$.
MMSE value
$\mathrm{MMSE} = \mathbb{E}\big[\operatorname{Var}(\theta \mid Y)\big]$. For each $y$ this equals $\tfrac{1}{18}$. So $\mathrm{MMSE} = \tfrac{1}{18}$ (versus the prior variance $\tfrac{1}{12}$).
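A Monte Carlo confirmation of $\mathrm{MMSE} = \tfrac{1}{18}$:

```python
import numpy as np

rng = np.random.default_rng(9)
theta = rng.random(1_000_000)             # uniform prior on [0, 1]
y = rng.random(1_000_000) < theta         # one Bernoulli(theta) observation

est = np.where(y, 2 / 3, 1 / 3)           # posterior means derived above
print(np.mean((theta - est) ** 2))        # ~ 1/18 = 0.0556
print(np.var(theta))                      # ~ 1/12 = 0.0833 (prior variance)
```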
ex-ch07-20
Challenge: Derive the Bayesian CRLB (Van Trees inequality) for scalar $\theta$ with prior $p(\theta)$ and likelihood $p(y \mid \theta)$: $\mathbb{E}\big[(\theta - \hat\theta(Y))^2\big] \geq \frac{1}{\mathbb{E}[I(\theta)] + I_p}$, where $\mathbb{E}[I(\theta)]$ is the expected Fisher information and $I_p = \mathbb{E}\big[\big(\tfrac{d}{d\theta}\log p(\theta)\big)^2\big]$ is the prior information.
Apply Cauchy–Schwarz to $\mathbb{E}\big[(\theta - \hat\theta(Y))\,s(\theta, Y)\big]$ with $s(\theta, y) = \frac{\partial}{\partial\theta}\log p(\theta, y)$.
Show that $\mathbb{E}\big[(\theta - \hat\theta(Y))\,s(\theta, Y)\big] = 1$.
Joint score
Define the joint score $s(\theta, y) = \frac{\partial}{\partial\theta}\log p(\theta, y)$, so $s = \frac{d}{d\theta}\log p(\theta) + \frac{\partial}{\partial\theta}\log p(y \mid \theta)$ (prior and data scores).
Cross-term vanishes
Under regularity, $\mathbb{E}[s^2] = \mathbb{E}[I(\theta)] + I_p$, because the cross term vanishes: the expected data score given $\theta$ is zero.
Covariance with residual
Using integration by parts (boundary terms vanishing under regularity), $\mathbb{E}\big[(\theta - \hat\theta(Y))\,s(\theta, Y)\big] = 1$.
Cauchy–Schwarz
$1 = \big(\mathbb{E}[(\theta - \hat\theta)\,s]\big)^2 \leq \mathbb{E}\big[(\theta - \hat\theta)^2\big]\,\mathbb{E}[s^2]$.
Van Trees bound
Rearranging, $\mathbb{E}\big[(\theta - \hat\theta)^2\big] \geq \frac{1}{\mathbb{E}[I(\theta)] + I_p}$. This is the Bayesian CRLB, tight (with equality) in the Gaussian-Gaussian model.
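A closed-form check of tightness in the Gaussian-Gaussian model; the variance values are illustrative:

```python
import numpy as np

sig_p2, sig2 = 2.0, 0.5                   # illustrative prior/noise variances
I_data = 1.0 / sig2                       # Fisher information, constant in theta
I_prior = 1.0 / sig_p2                    # prior information of a Gaussian
bound = 1.0 / (I_data + I_prior)          # Van Trees lower bound

mmse = sig_p2 * sig2 / (sig_p2 + sig2)    # posterior variance of the model
print(bound, mmse)                        # equal: the bound is tight here
```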