Exercises
ex-ch22-01
Easy: Compute the support of the Marchenko–Pastur distribution for the given values of the aspect ratio $c = p/n$.
Just substitute into the support formula $\big[\sigma^2(1-\sqrt{c})^2,\ \sigma^2(1+\sqrt{c})^2\big]$.
Compute
With $\sigma^2 = 1$, the edges are $\lambda_{\pm} = (1 \pm \sqrt{c})^2$, so for each given value of $c$ the support is $\big[(1-\sqrt{c})^2,\ (1+\sqrt{c})^2\big]$; substituting the specific values of $c$ gives the numerical edges (a numerical check appears below).
Observation
As $c \to 1$, the left edge $(1-\sqrt{c})^2$ approaches zero, signalling the onset of rank deficiency.
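As a quick numerical sanity check (not part of the original exercise), the sketch below compares the theoretical edges $(1\mp\sqrt{c})^2$ with the extreme eigenvalues of a simulated sample covariance matrix; the aspect ratios and matrix size are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
for c in [0.1, 0.25, 0.5, 1.0]:                        # illustrative aspect ratios
    p = int(round(c * n))
    left, right = (1 - np.sqrt(c)) ** 2, (1 + np.sqrt(c)) ** 2
    X = rng.standard_normal((n, p))
    eig = np.linalg.eigvalsh(X.T @ X / n)              # eigenvalues of the sample covariance
    print(f"c={c:4.2f}  MP support=[{left:.3f}, {right:.3f}]  "
          f"empirical=[{eig.min():.3f}, {eig.max():.3f}]")
```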
ex-ch22-02
Easy: Derive the ridge estimator by differentiating the ridge objective and setting the gradient to zero.
The objective is $J(\boldsymbol{\beta}) = \|\mathbf{y}-\mathbf{X}\boldsymbol{\beta}\|^2 + \lambda\|\boldsymbol{\beta}\|^2$.
Use $\nabla_{\boldsymbol{\beta}}\|\mathbf{y}-\mathbf{X}\boldsymbol{\beta}\|^2 = -2\mathbf{X}^\top(\mathbf{y}-\mathbf{X}\boldsymbol{\beta})$ and $\nabla_{\boldsymbol{\beta}}\|\boldsymbol{\beta}\|^2 = 2\boldsymbol{\beta}$.
Differentiate
Let $J(\boldsymbol{\beta}) = \|\mathbf{y}-\mathbf{X}\boldsymbol{\beta}\|^2 + \lambda\|\boldsymbol{\beta}\|^2$. Then $\nabla J(\boldsymbol{\beta}) = -2\mathbf{X}^\top(\mathbf{y}-\mathbf{X}\boldsymbol{\beta}) + 2\lambda\boldsymbol{\beta}$.
Set to zero
$\mathbf{X}^\top\mathbf{X}\boldsymbol{\beta} + \lambda\boldsymbol{\beta} = \mathbf{X}^\top\mathbf{y}$, hence $\hat{\boldsymbol{\beta}}_{\text{ridge}} = (\mathbf{X}^\top\mathbf{X} + \lambda\mathbf{I})^{-1}\mathbf{X}^\top\mathbf{y}$. The Hessian $2(\mathbf{X}^\top\mathbf{X} + \lambda\mathbf{I}) \succ \mathbf{0}$ confirms strict convexity.
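A minimal numerical sketch of the result, with illustrative sizes and penalty: it solves the ridge normal equations and checks that the gradient of the objective vanishes at the solution.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, lam = 50, 10, 2.0                                # illustrative sizes and penalty
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

beta = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)   # ridge estimator
grad = -2 * X.T @ (y - X @ beta) + 2 * lam * beta            # gradient of the ridge objective
print(np.linalg.norm(grad))                                  # ~1e-13: stationarity holds
```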
ex-ch22-03
Easy: Compute the soft-thresholding operator $S_\lambda(x)$ for the given values of $x$ and $\lambda$.
$S_\lambda(x) = \operatorname{sign}(x)\,\max(|x| - \lambda,\ 0)$.
Compute
For each given pair $(x, \lambda)$: if $|x| \le \lambda$ then $S_\lambda(x) = 0$; otherwise the magnitude is reduced by $\lambda$ and the sign of $x$ is kept, i.e. $S_\lambda(x) = x - \lambda\,\operatorname{sign}(x)$. (A short implementation follows.)
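A short implementation of the operator, with illustrative $(x, \lambda)$ pairs (the specific values used in the exercise may differ):

```python
import numpy as np

def soft_threshold(x, lam):
    """S_lambda(x) = sign(x) * max(|x| - lambda, 0), applied elementwise."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

for x, lam in [(3.0, 1.0), (0.5, 1.0), (-2.0, 1.0), (-0.3, 0.5), (4.0, 2.5)]:
    print(f"S_{lam}({x:+.1f}) = {soft_threshold(x, lam):+.2f}")
```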
ex-ch22-04
Easy: For the given dimension $d$ and noise variance $\sigma^2$, compute the shrinkage factor $1 - \frac{(d-2)\sigma^2}{\|\mathbf{y}\|^2}$ of the James–Stein estimator when $\|\mathbf{y}\|^2 = 10$. What happens when $\|\mathbf{y}\|^2 = 2$?
Apply the formula directly; note that when $\|\mathbf{y}\|^2 < (d-2)\sigma^2$ the factor goes negative.
Case $\|\mathbf{y}\|^2=10$
Factor $1 - \frac{(d-2)\sigma^2}{10}$, which is positive for the given $d$ and $\sigma^2$. Each coordinate of $\mathbf{y}$ is shrunk by this factor.
Case $\|\mathbf{y}\|^2=2$
Factor $1 - \frac{(d-2)\sigma^2}{2}$, which is negative for the given $d$ and $\sigma^2$. The classical JS estimator would flip the sign of $\mathbf{y}$. The positive-part variant clamps at zero instead, yielding $\hat{\boldsymbol{\theta}} = \mathbf{0}$.
ex-ch22-05
Easy: State what "inadmissible" means in two sentences, without using formulas.
Focus on the comparison to another estimator.
Answer
An estimator is inadmissible if another estimator has risk no larger for every parameter value and strictly smaller for at least one. Inadmissibility says there is a better alternative, but does not construct it.
ex-ch22-06
Medium: Show that the LMMSE estimator for the model $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{n}$ with $\boldsymbol{\beta} \sim \mathcal{N}(\mathbf{0}, \sigma_\beta^2\mathbf{I})$ and $\mathbf{n} \sim \mathcal{N}(\mathbf{0}, \sigma_n^2\mathbf{I})$ coincides with the ridge estimator at $\lambda = \sigma_n^2/\sigma_\beta^2$.
Write the LMMSE estimator as $\hat{\boldsymbol{\beta}} = \mathbf{C}_{\beta y}\mathbf{C}_{yy}^{-1}\mathbf{y}$.
Use the push-through identity $\mathbf{X}^\top(\mathbf{X}\mathbf{X}^\top + \lambda\mathbf{I})^{-1} = (\mathbf{X}^\top\mathbf{X} + \lambda\mathbf{I})^{-1}\mathbf{X}^\top$.
Compute covariances
$\mathbf{C}_{\beta y} = \sigma_\beta^2\mathbf{X}^\top$, $\quad \mathbf{C}_{yy} = \sigma_\beta^2\mathbf{X}\mathbf{X}^\top + \sigma_n^2\mathbf{I}$.
LMMSE
$\hat{\boldsymbol{\beta}} = \sigma_\beta^2\mathbf{X}^\top\big(\sigma_\beta^2\mathbf{X}\mathbf{X}^\top + \sigma_n^2\mathbf{I}\big)^{-1}\mathbf{y} = \mathbf{X}^\top\big(\mathbf{X}\mathbf{X}^\top + \tfrac{\sigma_n^2}{\sigma_\beta^2}\mathbf{I}\big)^{-1}\mathbf{y}$.
Push-through
Apply the push-through identity $\mathbf{X}^\top(\mathbf{X}\mathbf{X}^\top + \lambda\mathbf{I})^{-1} = (\mathbf{X}^\top\mathbf{X} + \lambda\mathbf{I})^{-1}\mathbf{X}^\top$ with $\lambda = \sigma_n^2/\sigma_\beta^2$. Hence $\hat{\boldsymbol{\beta}}_{\text{LMMSE}} = (\mathbf{X}^\top\mathbf{X} + \lambda\mathbf{I})^{-1}\mathbf{X}^\top\mathbf{y} = \hat{\boldsymbol{\beta}}_{\text{ridge}}$.
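The identity is easy to confirm numerically. The sketch below uses illustrative dimensions and variances and checks that the LMMSE and ridge formulas agree to machine precision.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 30, 8
sigma_b2, sigma_n2 = 2.0, 0.5                          # illustrative prior and noise variances
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

lmmse = sigma_b2 * X.T @ np.linalg.solve(sigma_b2 * X @ X.T + sigma_n2 * np.eye(n), y)
lam = sigma_n2 / sigma_b2
ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
print(np.max(np.abs(lmmse - ridge)))                   # agreement to machine precision
```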
ex-ch22-07
Medium: Derive the KKT conditions for the LASSO and interpret them coordinate by coordinate.
The LASSO objective $\tfrac12\|\mathbf{y}-\mathbf{X}\boldsymbol{\beta}\|^2 + \lambda\|\boldsymbol{\beta}\|_1$ is convex but non-smooth.
The sub-differential of $|\beta_j|$ is $\{\operatorname{sign}(\beta_j)\}$ if $\beta_j \neq 0$ and $[-1, 1]$ if $\beta_j = 0$.
Stationarity
$\mathbf{X}^\top(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}) = \lambda\,\mathbf{s}$ for some $\mathbf{s} \in \partial\|\hat{\boldsymbol{\beta}}\|_1$. Componentwise: $\mathbf{x}_j^\top(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}) = \lambda\operatorname{sign}(\hat{\beta}_j)$ when $\hat{\beta}_j \neq 0$, and $|\mathbf{x}_j^\top(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})| \le \lambda$ when $\hat{\beta}_j = 0$.
Interpretation
A coordinate is active ($\hat{\beta}_j \neq 0$) iff the residual correlates with the $j$-th column at level exactly $\lambda$; inactive coordinates have residual correlation at most $\lambda$ in absolute value. This is the "equicorrelation" condition that underlies the LARS algorithm.
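The KKT conditions can be checked numerically on a LASSO solution. The sketch below uses a simple proximal-gradient (ISTA) solver written for this purpose (not taken from the chapter) and illustrative problem sizes; after convergence the active-set correlations sit at $\pm\lambda$ and the inactive ones stay at or below $\lambda$ in magnitude.

```python
import numpy as np

def ista(X, y, lam, n_iter=5000):
    """Proximal-gradient solver for 0.5*||y - X b||^2 + lam*||b||_1."""
    L = np.linalg.norm(X, 2) ** 2                      # Lipschitz constant of the smooth part
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = b - X.T @ (X @ b - y) / L                  # gradient step
        b = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)   # soft-threshold (prox) step
    return b

rng = np.random.default_rng(0)
n, p, lam = 100, 20, 2.0                               # illustrative problem size and penalty
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)
b = ista(X, y, lam)

corr = X.T @ (y - X @ b)                               # residual correlations
active = np.abs(b) > 1e-8
print(np.max(np.abs(corr[active] - lam * np.sign(b[active]))))   # ~0 on the active set
print(np.max(np.abs(corr[~active])) <= lam + 1e-6)                # True off the active set
```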
ex-ch22-08
Medium: Compute the divergence $\nabla \cdot \mathbf{g}$ of $\mathbf{g}(\mathbf{y}) = \mathbf{y}/\|\mathbf{y}\|^2$ and verify that it equals $(d-2)/\|\mathbf{y}\|^2$ for $\mathbf{y} \in \mathbb{R}^d$.
Compute $\partial g_i/\partial y_i$ using the quotient rule.
Single partial
$\dfrac{\partial}{\partial y_i}\,\dfrac{y_i}{\|\mathbf{y}\|^2} = \dfrac{1}{\|\mathbf{y}\|^2} - \dfrac{2y_i^2}{\|\mathbf{y}\|^4}$.
Sum
Summing over $i$: $\nabla \cdot \mathbf{g} = \dfrac{d}{\|\mathbf{y}\|^2} - \dfrac{2}{\|\mathbf{y}\|^2} = \dfrac{d-2}{\|\mathbf{y}\|^2}$.
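A finite-difference check of the same identity, with an illustrative dimension:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 7
y = rng.standard_normal(d)
g = lambda v: v / np.dot(v, v)                         # g(y) = y / ||y||^2

eps, div = 1e-6, 0.0
for i in range(d):
    e = np.zeros(d); e[i] = eps
    div += (g(y + e)[i] - g(y - e)[i]) / (2 * eps)     # central difference of dg_i/dy_i

print(div, (d - 2) / np.dot(y, y))                     # the two values agree
```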
ex-ch22-09
Medium: Given the ordinary James–Stein estimator $\hat{\boldsymbol{\theta}}_{\mathrm{JS}} = \big(1 - \frac{(d-2)\sigma^2}{\|\mathbf{y}\|^2}\big)\mathbf{y}$ and the positive-part variant $\hat{\boldsymbol{\theta}}_{\mathrm{JS}+} = \big(1 - \frac{(d-2)\sigma^2}{\|\mathbf{y}\|^2}\big)_{+}\mathbf{y}$, argue (informally) why the latter should dominate the former.
When $\|\mathbf{y}\|^2 < (d-2)\sigma^2$ the JS shrinkage factor is negative, and the estimator flips the sign of $\mathbf{y}$.
Sign flip is wasteful
When $\|\mathbf{y}\|^2 < (d-2)\sigma^2$, the JS shrinkage factor is negative, producing an estimator that points opposite to the observation. Setting this factor to zero (positive-part) always improves risk, because the zero estimate has squared error $\|\boldsymbol{\theta}\|^2$, which is typically smaller than the squared error of a sign-flipped estimate $-a\,\mathbf{y}$ for positive $a$.
Dominance
A careful Stein-identity calculation (see Baranchik 1964) formalises this intuition. $\hat{\boldsymbol{\theta}}_{\mathrm{JS}+}$ dominates $\hat{\boldsymbol{\theta}}_{\mathrm{JS}}$, confirming that JS itself is inadmissible (though minimax).
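A Monte Carlo comparison makes the dominance visible. The dimension, true mean, and replication count below are illustrative choices; $\sigma^2 = 1$.

```python
import numpy as np

rng = np.random.default_rng(0)
d, reps = 10, 200_000
theta = np.full(d, 0.5)                                # illustrative true mean
y = theta + rng.standard_normal((reps, d))             # sigma^2 = 1

s = 1 - (d - 2) / np.sum(y ** 2, axis=1, keepdims=True)   # JS shrinkage factor
for name, est in [("MLE", y), ("JS", s * y), ("JS+", np.maximum(s, 0.0) * y)]:
    print(name, np.mean(np.sum((est - theta) ** 2, axis=1)))
# Empirical risks typically order as JS+ < JS < MLE = d.
```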
ex-ch22-10
Medium: Prove that the Bayes estimator under squared-error loss is the posterior mean $\hat{\theta}(\mathbf{y}) = \mathbb{E}[\theta \mid \mathbf{y}]$.
Minimise $\mathbb{E}\big[(\theta - c)^2 \mid \mathbf{y}\big]$ over the constant $c$.
Pointwise minimisation
The Bayes risk is $\mathbb{E}\big[\,\mathbb{E}[(\theta - \hat{\theta}(\mathbf{y}))^2 \mid \mathbf{y}]\,\big]$. It suffices to minimise the inner conditional expectation pointwise in $\mathbf{y}$.
Quadratic in $c$
Differentiating in $c$: $\frac{d}{dc}\,\mathbb{E}[(\theta - c)^2 \mid \mathbf{y}] = 2c - 2\,\mathbb{E}[\theta \mid \mathbf{y}]$. Setting to zero yields $c^* = \mathbb{E}[\theta \mid \mathbf{y}]$.
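For a concrete check, the sketch below samples a Gaussian posterior (an illustrative choice) and confirms that, over a grid of constants $c$, the conditional squared error is minimised at the posterior mean.

```python
import numpy as np

rng = np.random.default_rng(0)
tau2, sigma2, y_obs = 4.0, 1.0, 2.5                    # illustrative prior variance, noise, datum
post_mean = tau2 / (tau2 + sigma2) * y_obs
post_var = tau2 * sigma2 / (tau2 + sigma2)
theta = post_mean + np.sqrt(post_var) * rng.standard_normal(1_000_000)   # posterior samples

cs = np.linspace(0.0, 3.0, 61)
losses = [np.mean((theta - c) ** 2) for c in cs]
print("argmin over the grid:", cs[int(np.argmin(losses))], "  posterior mean:", post_mean)
```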
ex-ch22-11
Medium: Verify the minimax = maximin duality for the scalar Gaussian mean problem $y \sim \mathcal{N}(\theta, \sigma^2)$ with $\theta \in \mathbb{R}$. Specifically, compute both sides and check equality.
The MLE $\hat{\theta} = y$ has constant risk $\sigma^2$.
A Gaussian prior $\theta \sim \mathcal{N}(0, \tau^2)$ gives Bayes risk $\frac{\tau^2\sigma^2}{\tau^2 + \sigma^2}$; let $\tau^2 \to \infty$.
Minimax side
$\inf_{\hat{\theta}}\sup_{\theta} R(\theta, \hat{\theta}) \le \sup_{\theta} R(\theta, y) = \sigma^2$. A lower bound of $\sigma^2$ follows from taking the prior to be a Gaussian of growing variance (next step).
Maximin side
Under the prior $\theta \sim \mathcal{N}(0, \tau^2)$, the Bayes estimator is $\hat{\theta}_\tau = \frac{\tau^2}{\tau^2 + \sigma^2}\,y$ with Bayes risk $\frac{\tau^2\sigma^2}{\tau^2 + \sigma^2} \to \sigma^2$ as $\tau^2 \to \infty$. Hence $\sup_{\pi}\inf_{\hat{\theta}} r(\pi, \hat{\theta}) \ge \sigma^2$.
Conclude
Both sides equal $\sigma^2$, so duality holds and the MLE is minimax.
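A few lines confirm the maximin side numerically: the Bayes risk of the conjugate-prior estimator approaches $\sigma^2$ as the prior variance grows.

```python
sigma2 = 1.0
for tau2 in [1.0, 10.0, 100.0, 1e4]:
    print(tau2, tau2 * sigma2 / (tau2 + sigma2))       # Bayes risk -> sigma^2 = 1
```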
ex-ch22-12
Medium: Derive the optimal linear shrinkage for the sample covariance matrix, $\hat{\boldsymbol{\Sigma}}_\rho = (1-\rho)\mathbf{S} + \rho\,\mathbf{I}$. Find the $\rho \in [0, 1]$ that minimises $\mathbb{E}\big\|\hat{\boldsymbol{\Sigma}}_\rho - \boldsymbol{\Sigma}\big\|_F^2$.
Expand the squared-Frobenius error and minimise over .
Expand
$\mathbb{E}\|\hat{\boldsymbol{\Sigma}}_\rho - \boldsymbol{\Sigma}\|_F^2 = (1-\rho)^2\,\mathbb{E}\|\mathbf{S} - \boldsymbol{\Sigma}\|_F^2 + \rho^2\,\|\mathbf{I} - \boldsymbol{\Sigma}\|_F^2$. Since $\mathbf{S}$ is unbiased, the cross term vanishes in expectation.
Optimise
$\rho^* = \dfrac{\mathbb{E}\|\mathbf{S} - \boldsymbol{\Sigma}\|_F^2}{\mathbb{E}\|\mathbf{S} - \boldsymbol{\Sigma}\|_F^2 + \|\mathbf{I} - \boldsymbol{\Sigma}\|_F^2}$. Interpret: shrink more toward $\mathbf{I}$ when the sample covariance is noisy and the true covariance is close to the identity.
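A Monte Carlo sketch of the trade-off, with an illustrative true covariance and sample size; $\rho^*$ is computed from the oracle quantities above, so this is a demonstration rather than a practical estimator.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, reps = 20, 40, 500                               # illustrative dimensions
Sigma = np.diag(np.linspace(0.5, 2.0, p))              # true covariance (oracle knowledge)

Ss = []
for _ in range(reps):
    Z = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    Ss.append(Z.T @ Z / n)                             # sample covariance (known zero mean)

a = np.mean([np.linalg.norm(S - Sigma, "fro") ** 2 for S in Ss])   # E||S - Sigma||_F^2
b = np.linalg.norm(np.eye(p) - Sigma, "fro") ** 2                  # ||I - Sigma||_F^2
rho = a / (a + b)

risk_shrunk = np.mean([np.linalg.norm((1 - rho) * S + rho * np.eye(p) - Sigma, "fro") ** 2
                       for S in Ss])
print(f"rho* = {rho:.2f}   risk(S) = {a:.2f}   risk(shrunk) = {risk_shrunk:.2f}")
```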
ex-ch22-13
Hard: Using the Marchenko–Pastur law, show that $\frac{1}{p}\operatorname{tr}\big[(\tfrac{1}{n}\mathbf{X}^\top\mathbf{X} + \lambda\mathbf{I})^{-1}\big]$ converges to the Stieltjes transform $m(-\lambda)$, and derive a closed-form expression for $m(-\lambda)$ as a function of $\lambda$ and $c = p/n$.
The Stieltjes transform of the MP law satisfies a quadratic equation.
Specifically, $m(z)$ solves $c\,z\,m^2(z) - (1 - c - z)\,m(z) + 1 = 0$.
Quadratic equation
From the MP density one derives the Stieltjes-transform identity $c\,z\,m^2(z) - (1 - c - z)\,m(z) + 1 = 0$. Substituting $z = -\lambda$: $c\lambda\,m^2(-\lambda) + (1 - c + \lambda)\,m(-\lambda) - 1 = 0$.
Solve
Solving the quadratic and taking the branch that is positive on the negative real axis: $m(-\lambda) = \dfrac{-(1 - c + \lambda) + \sqrt{(1 - c + \lambda)^2 + 4c\lambda}}{2c\lambda}$.
Consistency check
At $\lambda \to 0$ with $c < 1$, $m(-\lambda) \to \frac{1}{1-c}$, matching the OLS risk. For $c \ge 1$ the limit is finite and positive for every $\lambda > 0$; ridge remains well-defined.
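A quick random-matrix check of the closed form, with illustrative $n$, $p$, and $\lambda$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, lam = 2000, 1000, 0.3                            # illustrative sizes; c = p/n = 0.5
c = p / n
X = rng.standard_normal((n, p))
eig = np.linalg.eigvalsh(X.T @ X / n)
empirical = np.mean(1.0 / (eig + lam))                 # (1/p) tr[(X^T X / n + lam I)^{-1}]

m = (-(1 - c + lam) + np.sqrt((1 - c + lam) ** 2 + 4 * c * lam)) / (2 * c * lam)
print(empirical, m)                                    # agreement to a few decimal places
```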
ex-ch22-14
Hard: Prove the upper bound of the theorem "Minimax Rate for Sparse Estimation": best-subset selection achieves MSE of order $\sigma^2 k \log(ep/k)/n$.
Use a union bound over supports.
For each fixed support, OLS risk is $\sigma^2 k/n$; the union bound contributes a multiplicative factor of order $\log(ep/k)$.
Oracle risk per support
For the true support $S^*$, restricted OLS has risk $\sigma^2 k/n$. The estimator selects the support minimising the residual sum of squares.
Union bound
For each candidate support, Gaussian concentration bounds the probability that the noise alone makes it outperform the true support. Summing over the $\binom{p}{k}$ supports inflates the risk by a multiplicative factor of order $\log(ep/k)$.
Combine
$\mathbb{E}\|\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}^*\|^2 \lesssim \dfrac{\sigma^2 k \log(ep/k)}{n}$. The constant can be sharpened via Slepian-type Gaussian comparison arguments.
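A tiny brute-force demonstration (illustrative sizes, a single random draw, and hypothetical parameter choices) comparing best-subset selection against full OLS:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
n, p, k, sigma = 50, 12, 2, 1.0                        # illustrative problem sizes
X = rng.standard_normal((n, p))
beta = np.zeros(p); beta[:k] = 3.0                     # k-sparse ground truth
y = X @ beta + sigma * rng.standard_normal(n)

best_rss, best_beta = np.inf, None
for S in combinations(range(p), k):                    # all C(p, k) candidate supports
    idx = list(S)
    bS, *_ = np.linalg.lstsq(X[:, idx], y, rcond=None)
    rss = np.sum((y - X[:, idx] @ bS) ** 2)
    if rss < best_rss:
        best_rss = rss
        best_beta = np.zeros(p); best_beta[idx] = bS

ols, *_ = np.linalg.lstsq(X, y, rcond=None)
print("best-subset squared error:", np.sum((best_beta - beta) ** 2))
print("full-OLS squared error:   ", np.sum((ols - beta) ** 2))
```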
ex-ch22-15
Hard: Derive the risk of the James–Stein estimator when shrinking toward an arbitrary fixed vector $\mathbf{v}$ rather than zero. Conclude that the JS phenomenon is independent of the anchor.
Apply the Stein-lemma argument to the translated estimator $\hat{\boldsymbol{\theta}} = \mathbf{v} + \Big(1 - \frac{(d-2)\sigma^2}{\|\mathbf{y} - \mathbf{v}\|^2}\Big)(\mathbf{y} - \mathbf{v})$.
Translation
Let $\mathbf{z} = \mathbf{y} - \mathbf{v}$ and $\boldsymbol{\eta} = \boldsymbol{\theta} - \mathbf{v}$. Then $\mathbf{z} \sim \mathcal{N}(\boldsymbol{\eta}, \sigma^2\mathbf{I})$ and applying classical JS to $\mathbf{z}$ gives an estimator of $\boldsymbol{\eta}$ that dominates $\mathbf{z}$ for $d \ge 3$.
Translate back
Adding $\mathbf{v}$ back, the anchored-JS estimator has risk $d\sigma^2 - (d-2)^2\sigma^4\,\mathbb{E}\big[\|\mathbf{y} - \mathbf{v}\|^{-2}\big]$. Dominance over the MLE holds for every $\boldsymbol{\theta}$.
Consequence
The anchor is a free parameter. Shrinking toward the grand mean (as Efron–Morris did) or toward a physics-informed prior (e.g., zero steering in beamforming) both improve upon the MLE.
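The translated risk formula can be verified by simulation. The dimension, true mean, and anchor below are illustrative; $\sigma^2 = 1$.

```python
import numpy as np

rng = np.random.default_rng(0)
d, reps = 8, 400_000
theta = np.linspace(-1.0, 1.0, d)                      # illustrative true mean
v = np.full(d, 0.5)                                    # arbitrary fixed anchor
y = theta + rng.standard_normal((reps, d))             # sigma^2 = 1

r2 = np.sum((y - v) ** 2, axis=1, keepdims=True)
est = v + (1 - (d - 2) / r2) * (y - v)                 # JS shrunk toward the anchor v
emp_risk = np.mean(np.sum((est - theta) ** 2, axis=1))
formula = d - (d - 2) ** 2 * np.mean(1.0 / r2)         # d - (d-2)^2 E[1/||y - v||^2]
print(emp_risk, formula, "MLE risk:", d)
```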
ex-ch22-16
Hard: Show that the LASSO with penalty $\lambda\|\boldsymbol{\beta}\|_1$ satisfies the minimax rate for sparse recovery. (Upper bound only.)
Use the 'basic inequality' $\tfrac12\|\mathbf{X}(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}^*)\|^2 \le \boldsymbol{\varepsilon}^\top\mathbf{X}(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}^*) + \lambda\big(\|\boldsymbol{\beta}^*\|_1 - \|\hat{\boldsymbol{\beta}}\|_1\big)$.
Then bound the linear term via Hölder and use $\|\mathbf{X}^\top\boldsymbol{\varepsilon}\|_\infty \lesssim \sigma\sqrt{n\log p}$ w.h.p.
Basic inequality
From optimality of $\hat{\boldsymbol{\beta}}$ and a convex-analysis expansion, $\tfrac12\|\mathbf{X}(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}^*)\|^2 \le \boldsymbol{\varepsilon}^\top\mathbf{X}(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}^*) + \lambda\big(\|\boldsymbol{\beta}^*\|_1 - \|\hat{\boldsymbol{\beta}}\|_1\big)$.
Dual-norm bound
Bound the noise term: $|\boldsymbol{\varepsilon}^\top\mathbf{X}(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}^*)| \le \|\mathbf{X}^\top\boldsymbol{\varepsilon}\|_\infty\,\|\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}^*\|_1$. For i.i.d. Gaussian noise and columns normalised so that $\|\mathbf{x}_j\|^2 = n$, with high probability $\|\mathbf{X}^\top\boldsymbol{\varepsilon}\|_\infty \lesssim \sigma\sqrt{n\log p}$; choose $\lambda$ of this order.
Restricted eigenvalue + rate
Under a restricted-eigenvalue condition on $\mathbf{X}$ (satisfied w.h.p. for i.i.d. Gaussian matrices with $n \gtrsim k\log p$), one derives $\|\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}^*\|^2 \lesssim \frac{\sigma^2 k\log p}{n}$, matching the minimax rate up to constants.
ex-ch22-17
Challenge: Derive the fixed-point equation for the asymptotic MSE of the LASSO in the proportional regime $n/p \to \delta \in (0, \infty)$ (Bayati–Montanari state evolution). Reproduce the statement: there exist $\tau_*$ and $\alpha_*$ such that the LASSO MSE equals $\delta(\tau_*^2 - \sigma^2)$ at the fixed point.
State evolution iterates $\tau_{t+1}^2 = \sigma^2 + \tfrac{1}{\delta}\,\mathbb{E}\big[\big(\eta(X_0 + \tau_t Z;\ \alpha\tau_t) - X_0\big)^2\big]$, where $X_0$ is drawn from the empirical distribution of the true signal and $Z \sim \mathcal{N}(0, 1)$ independently.
At the fixed point, $\lambda$ and $\alpha$ are linked by the LASSO threshold calibration.
State evolution recursion
AMP with the soft-threshold denoiser $\eta(\,\cdot\,;\theta)$ has scalar state evolution $\tau_{t+1}^2 = \sigma^2 + \tfrac{1}{\delta}\,\mathbb{E}\big[\big(\eta(X_0 + \tau_t Z;\ \alpha\tau_t) - X_0\big)^2\big]$, with $X_0$ drawn from the empirical distribution of the true signal and $Z \sim \mathcal{N}(0, 1)$.
Calibration to LASSO
The connection to LASSO is via the "Onsager" correction: the LASSO solution coincides with the AMP fixed point when $\lambda$ and $\alpha$ satisfy the scalar calibration $\lambda = \alpha\tau_*\big(1 - \tfrac{1}{\delta}\,\mathbb{E}\big[\eta'(X_0 + \tau_* Z;\ \alpha\tau_*)\big]\big)$.
Fixed point MSE
At the joint fixed point $(\tau_*, \alpha_*)$, the LASSO per-coordinate MSE is $\mathbb{E}\big[\big(\eta(X_0 + \tau_* Z;\ \alpha_*\tau_*) - X_0\big)^2\big]$, which equals $\delta(\tau_*^2 - \sigma^2)$ (for $\sigma^2 > 0$) or $\delta\tau_*^2$ (for $\sigma^2 = 0$). The formal proof is Bayati–Montanari (2012).
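The sketch below iterates the state-evolution recursion by Monte Carlo for a Bernoulli-Gaussian signal; the sparsity, undersampling ratio $\delta$, noise level, and threshold parameter $\alpha$ are illustrative choices, and the $\lambda$-$\alpha$ calibration step is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
delta, sigma, alpha, eps = 0.5, 0.2, 1.5, 0.1          # n/p, noise std, threshold, sparsity
N = 500_000
X0 = (rng.random(N) < eps) * rng.standard_normal(N)    # Bernoulli-Gaussian signal samples
Z = rng.standard_normal(N)

def eta(x, t):                                         # soft-threshold denoiser
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

tau2 = 1.0
for _ in range(50):                                    # tau_{t+1}^2 = sigma^2 + E[(eta - X0)^2]/delta
    tau = np.sqrt(tau2)
    mse = np.mean((eta(X0 + tau * Z, alpha * tau) - X0) ** 2)
    tau2 = sigma ** 2 + mse / delta

print("tau_*^2 =", tau2, "  per-coordinate MSE =", delta * (tau2 - sigma ** 2))
```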
ex-ch22-18
Challenge: For a massive-MIMO uplink with $M$ antennas, $K$ single-antenna users, and $\tau_p$ pilot symbols, design a shrinkage-based channel estimator that minimises the worst-case MSE over users with bounded channel norm $\|\mathbf{h}_k\|^2 \le P$. Compare with LMMSE assuming a mismatched prior.
Minimax over a ball: the problem has known structure (Pinsker-type).
Derive the optimal shrinkage factor as a function of $M$, the pilot length $\tau_p$, the effective noise variance, and the norm bound $P$.
Problem setup
For each user $k$, the pilot estimate after matched filtering is $\tilde{\mathbf{y}}_k = \mathbf{h}_k + \mathbf{e}_k$ with $\mathbf{e}_k \sim \mathcal{CN}(\mathbf{0}, \sigma_e^2\mathbf{I}_M)$, where $\sigma_e^2$ is the effective noise variance after pilot averaging. Estimate $\mathbf{h}_k$ under the constraint $\|\mathbf{h}_k\|^2 \le P$.
Minimax shrinkage
The minimax estimator on a ball is a linear shrinker $\hat{\mathbf{h}}_k = c^*\,\tilde{\mathbf{y}}_k$ with $c^* = \dfrac{P}{P + M\sigma_e^2}$. The worst-case MSE is $\dfrac{P\,M\sigma_e^2}{P + M\sigma_e^2}$.
Comparison
LMMSE with the true channel power $P$ (isotropic prior) uses $\hat{\mathbf{h}}_k = \frac{P}{P + M\sigma_e^2}\,\tilde{\mathbf{y}}_k$, which is optimal only for that specific power. If the assumed norm is mismatched (e.g., the prior was trained on a different environment), LMMSE risk can exceed the minimax risk by a large multiplicative factor. The minimax estimator is safer; LMMSE is better when the prior is accurate. In production, one often uses empirical Bayes: estimate the channel power online from recent pilots.
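A Monte Carlo sketch of the comparison, using real-valued noise for simplicity; $M$, $P$, $\sigma_e^2$, and the degree of prior mismatch are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
M, P, sigma_e2, reps = 64, 1.0, 0.05, 20_000           # antennas, norm bound, noise, trials
h = rng.standard_normal(M)
h *= np.sqrt(P) / np.linalg.norm(h)                    # channel placed on the worst-case sphere
Y = h + np.sqrt(sigma_e2) * rng.standard_normal((reps, M))   # noisy pilot estimates

c_minimax = P / (P + M * sigma_e2)
P_assumed = 10.0 * P                                   # hypothetical mismatched prior power
c_lmmse = P_assumed / (P_assumed + M * sigma_e2)

for name, c in [("minimax", c_minimax), ("mismatched LMMSE", c_lmmse)]:
    mse = np.mean(np.sum((c * Y - h) ** 2, axis=1))
    print(f"{name:17s} shrinkage={c:.3f}  MSE={mse:.3f}")
# Worst-case bound for the minimax shrinker: P*M*sigma_e2 / (P + M*sigma_e2).
```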