Chapter Summary

Key Points

1. Proportional asymptotics is the correct regime. When $N$ and $M$ are both large but comparable ($\gamma = N/M = \Theta(1)$), classical consistency results break down. The Marchenko–Pastur law describes the limiting eigenvalue distribution of $\tfrac{1}{M}\mathbf{A}^T\mathbf{A}$ and drives every risk calculation in the chapter (a simulation of the law appears after this list).

2. OLS risk blows up. In the proportional regime, the per-coordinate OLS risk is $\gamma\sigma^2/(1-\gamma)$, which diverges as $\gamma \to 1$. The MLE fails not because it is a bad estimator but because the problem becomes ill-conditioned (see the second sketch below).

3. Ridge has a closed-form optimal regularization. Under a Gaussian prior, the optimal ridge penalty is $\lambda^* = 1/\mathrm{SNR}$, a universally applicable heuristic that coincides with the LMMSE estimator. Ridge risk stays finite even for $\gamma \geq 1$, where OLS is undefined (third sketch below).

4. LASSO promotes sparsity and is convex. The $\ell_1$ penalty produces sparse solutions by virtue of the geometry of its diamond-shaped sub-level sets. ISTA, FISTA, and AMP solve it efficiently (a minimal ISTA loop follows the list).

5. James–Stein dominates the MLE for $N \geq 3$. Shrinkage toward any fixed anchor reduces risk uniformly, a purely frequentist guarantee that requires no prior. The empirical-Bayes interpretation makes the shrinkage rule concrete: learn the prior variance from the data and shrink accordingly (fifth sketch below).

6. Minimax rates characterise sample complexity. For $s$-sparse signals the minimax rate is $\Theta(\sigma^2 s \log(N/s)/M)$, the information-theoretic floor for sparse estimation. The $\log(N/s)$ factor is the price of not knowing the support (a numerical comparison follows the list).

7. All of this is convex. Ridge, LASSO, elastic net, and the Bayes estimators under log-concave priors are convex programmes. The convexity reflex (flag convex problems immediately) applies throughout; the last sketch below writes ridge and LASSO as disciplined convex programmes.
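
The sketches that follow are minimal NumPy illustrations of the points above; dimensions, SNRs, grids, and seeds are illustrative choices, not values from the chapter. First, a check of the Marchenko–Pastur law: the empirical eigenvalue histogram of $\tfrac{1}{M}\mathbf{A}^T\mathbf{A}$ against the MP density for $\gamma = 1/2$.

```python
# Minimal check of the Marchenko-Pastur law for eigenvalues of (1/M) A^T A.
# Dimensions, seed, and the bin grid are illustrative, not chapter values.
import numpy as np

rng = np.random.default_rng(0)
M, N = 2000, 1000                       # gamma = N/M = 0.5
gamma = N / M

A = rng.standard_normal((M, N))
evals = np.linalg.eigvalsh(A.T @ A / M)

lam_minus = (1 - np.sqrt(gamma)) ** 2   # left edge of the MP bulk
lam_plus = (1 + np.sqrt(gamma)) ** 2    # right edge of the MP bulk

def mp_density(x):
    """Marchenko-Pastur density for aspect ratio gamma (unit variance)."""
    inside = (x > lam_minus) & (x < lam_plus)
    out = np.zeros_like(x)
    out[inside] = np.sqrt(
        (lam_plus - x[inside]) * (x[inside] - lam_minus)
    ) / (2 * np.pi * gamma * x[inside])
    return out

# Compare the empirical histogram to the MP density on a few bins.
bins = np.linspace(lam_minus, lam_plus, 11)
hist, _ = np.histogram(evals, bins=bins, density=True)
centers = 0.5 * (bins[:-1] + bins[1:])
for c, h in zip(centers, mp_density(centers)):
    print(f"x={c:5.2f}  MP={h:5.3f}  empirical={hist[np.searchsorted(bins, c) - 1]:5.3f}")
```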
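
Second, a Monte Carlo check of the OLS risk formula $\gamma\sigma^2/(1-\gamma)$ from Key Point 2. The sketch assumes $\mathbf{A}$ has i.i.d. $\mathcal{N}(0, 1/N)$ entries (unit expected row energy), a normalization under which the stated formula holds; if the chapter uses a different scaling, the constant changes accordingly.

```python
# Monte Carlo check of the per-coordinate OLS risk gamma*sigma^2/(1-gamma).
# Assumes A has i.i.d. N(0, 1/N) entries; this normalization is an assumption.
import numpy as np

rng = np.random.default_rng(1)
sigma2 = 1.0
N = 200

for gamma in (0.25, 0.5, 0.8):
    M = int(N / gamma)
    risks = []
    for _ in range(50):
        A = rng.standard_normal((M, N)) / np.sqrt(N)
        x = rng.standard_normal(N)
        y = A @ x + np.sqrt(sigma2) * rng.standard_normal(M)
        x_ols, *_ = np.linalg.lstsq(A, y, rcond=None)
        risks.append(np.sum((x_ols - x) ** 2) / N)
    print(f"gamma={gamma:4.2f}  empirical={np.mean(risks):6.3f}  "
          f"theory={gamma * sigma2 / (1 - gamma):6.3f}")
```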
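
Third, Key Point 3: ridge with $\lambda = \sigma^2/\sigma_x^2 = 1/\mathrm{SNR}$ coincides with the LMMSE estimator for any fixed $\mathbf{A}$, so the empirical risk curve should bottom out near $\lambda^*$. This sketch runs in the $\gamma = 2$ regime, where OLS does not even exist; the grid of $\lambda$ values is an arbitrary choice.

```python
# Ridge risk versus lambda for x ~ N(0, sigma_x^2 I): the minimum should
# sit near lambda* = sigma^2 / sigma_x^2 = 1/SNR. Sizes are illustrative.
import numpy as np

rng = np.random.default_rng(2)
M, N = 150, 300                          # gamma = 2: OLS undefined, ridge fine
sigma_x2, sigma2 = 1.0, 0.25             # SNR = 4, so lambda* = 0.25
A = rng.standard_normal((M, N)) / np.sqrt(N)

def ridge_risk(lam, trials=200):
    X = np.sqrt(sigma_x2) * rng.standard_normal((N, trials))
    Y = A @ X + np.sqrt(sigma2) * rng.standard_normal((M, trials))
    X_hat = np.linalg.solve(A.T @ A + lam * np.eye(N), A.T @ Y)
    return np.mean(np.sum((X_hat - X) ** 2, axis=0)) / N

for lam in (0.01, 0.05, 0.25, 1.0, 5.0):
    print(f"lambda={lam:5.2f}  per-coordinate risk={ridge_risk(lam):6.4f}")
```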
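
Fourth, Key Point 4 as code: a bare-bones ISTA loop for the LASSO objective $\tfrac12\lVert\mathbf{y}-\mathbf{A}\mathbf{x}\rVert^2 + \lambda\lVert\mathbf{x}\rVert_1$. The step size $1/L$ uses the gradient's Lipschitz constant $L = \lVert\mathbf{A}\rVert_2^2$; the value of $\lambda$ and the iteration budget here are rough, untuned guesses.

```python
# A minimal ISTA loop for the LASSO (1/2)||y - A x||^2 + lam * ||x||_1.
# Problem sizes, lam, and the iteration count are illustrative assumptions.
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (coordinate-wise soft threshold)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(A, y, lam, n_iter=500):
    L = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = soft_threshold(x + A.T @ (y - A @ x) / L, lam / L)
    return x

rng = np.random.default_rng(3)
M, N, s = 100, 400, 10
A = rng.standard_normal((M, N)) / np.sqrt(M)   # roughly unit-norm columns
x_true = np.zeros(N)
x_true[rng.choice(N, s, replace=False)] = rng.standard_normal(s)
y = A @ x_true + 0.05 * rng.standard_normal(M)

x_hat = ista(A, y, lam=0.1)
print("nonzeros:", np.count_nonzero(x_hat),
      " error:", np.linalg.norm(x_hat - x_true))
```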
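
Fifth, Key Point 5: the James–Stein estimator against the MLE for direct observations $\mathbf{y} \sim \mathcal{N}(\mathbf{x}, \sigma^2\mathbf{I})$. The positive-part variant (clipping the shrinkage factor at zero) is used here because it dominates plain James–Stein; everything else follows the classical setup.

```python
# Monte Carlo comparison of (positive-part) James-Stein against the MLE
# for y ~ N(x, sigma^2 I) in R^N. N and the signal scale are illustrative.
import numpy as np

rng = np.random.default_rng(4)
N, sigma2, trials = 10, 1.0, 20000
x = rng.standard_normal(N)               # any fixed mean vector works

mle_risk = js_risk = 0.0
for _ in range(trials):
    y = x + np.sqrt(sigma2) * rng.standard_normal(N)
    shrink = max(0.0, 1.0 - (N - 2) * sigma2 / np.sum(y ** 2))
    mle_risk += np.sum((y - x) ** 2)
    js_risk += np.sum((shrink * y - x) ** 2)

print(f"MLE risk: {mle_risk / trials:.3f} (theory N*sigma^2 = {N * sigma2:.1f})")
print(f"JS  risk: {js_risk / trials:.3f} (uniformly smaller for N >= 3)")
```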
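
Sixth, a numerical look at the $\log(N/s)$ price from Key Point 6. The baseline is the oracle rate $\sigma^2 s/M$ one would obtain with the support known in advance; treating that as the comparison point is an assumption of this sketch, not a statement from the chapter.

```python
# The log(N/s) support-uncertainty price, numerically: minimax rate
# sigma^2 * s * log(N/s) / M versus the (assumed) oracle rate sigma^2 * s / M.
# All parameter values are illustrative.
import numpy as np

sigma2 = 1.0
for N, s, M in [(1024, 10, 128), (4096, 10, 128), (4096, 64, 512)]:
    oracle = sigma2 * s / M
    minimax = sigma2 * s * np.log(N / s) / M
    print(f"N={N:5d} s={s:3d} M={M:4d}  oracle={oracle:.3f}  "
          f"minimax={minimax:.3f}  price log(N/s)={np.log(N / s):.2f}x")
```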
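
Finally, Key Point 7 in practice: ridge and LASSO written as disciplined convex programmes. CVXPY is an assumed dependency used purely for illustration; the chapter's algorithms (ISTA/FISTA/AMP) do not require a general-purpose solver.

```python
# Ridge and LASSO stated as convex programmes in CVXPY, making the
# "convexity reflex" concrete. Sizes and lam are illustrative choices.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(5)
M, N = 50, 100
A = rng.standard_normal((M, N)) / np.sqrt(M)
y = A @ rng.standard_normal(N) + 0.1 * rng.standard_normal(M)

x = cp.Variable(N)
lam = 0.1

ridge = cp.Problem(cp.Minimize(cp.sum_squares(y - A @ x) + lam * cp.sum_squares(x)))
lasso = cp.Problem(cp.Minimize(cp.sum_squares(y - A @ x) + lam * cp.norm1(x)))

for name, prob in (("ridge", ridge), ("lasso", lasso)):
    prob.solve()
    print(name, "is DCP-convex:", prob.is_dcp(), " optimal value:", round(prob.value, 3))
```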

Looking Ahead

Chapter 23 replaces the Gaussian noise assumption with robust and non-parametric alternatives (Huber, RKHS, Gaussian processes, deep learning). Chapter 24 completes the estimation-theoretic picture with the Van Trees and Ziv–Zakai bounds, and the MMSE–mutual-information identity that connects estimation to information theory. The high-dimensional machinery built here underpins every realistic estimation problem in modern wireless systems, from massive-MIMO channel estimation to compressed-sensing radar to grant-free mMTC.