References & Further Reading
- W. James and C. Stein, Estimation with Quadratic Loss, 1961
The original paper exhibiting the explicit shrinkage estimator that dominates the MLE in dimension N >= 3 under squared-error loss. Turns Stein's 1956 inadmissibility result into a constructive estimator.
- C. Stein, Inadmissibility of the Usual Estimator for the Mean of a Multivariate Normal Distribution, 1956
The original announcement of the Stein phenomenon, proving that the MLE is inadmissible in dimension 3 and above by a non-constructive argument.
- B. Efron and C. Morris, Stein's Estimation Rule and Its Competitors — An Empirical Bayes Approach, 1973
Re-derives the James-Stein estimator as an empirical-Bayes shrinkage rule and analyses competing shrinkage rules; the groundwork for the applied 1975 study below, where the famous batting-average comparison of JS to MLE appears.
- B. Efron and C. Morris, Data Analysis Using Stein's Estimator and Its Generalizations, 1975
The canonical applied paper on James-Stein shrinkage, including the 18-player baseball study that became the standard illustration of the Stein phenomenon.
- D. L. Donoho and I. M. Johnstone, Minimax Risk over l_p-Balls for l_q-Error, 1994
Establishes the minimax rates for sparse estimation over l_p-balls, showing that soft-thresholding achieves the minimax rate up to constants.
- D. L. Donoho and I. M. Johnstone, Minimax Estimation via Wavelet Shrinkage, 1998
Proves that wavelet-domain soft-thresholding is asymptotically minimax over broad Besov function classes. The statistical ancestor of LASSO and compressed sensing.
- R. Tibshirani, Regression Shrinkage and Selection via the Lasso, 1996
Introduces the LASSO estimator, demonstrating that the l_1-penalty produces sparse solutions and selects variables simultaneously with estimation.
- H. Zou and T. Hastie, Regularization and Variable Selection via the Elastic Net, 2005
Proposes the elastic-net penalty combining l_1 and l_2 terms, which retains LASSO's sparsity while handling correlated predictors through ridge-like grouping.
- A. E. Hoerl and R. W. Kennard, Ridge Regression: Biased Estimation for Nonorthogonal Problems, 1970
The original ridge-regression paper. Introduces l_2-regularised least squares as a remedy for ill-conditioned design matrices.
- V. A. Marchenko and L. A. Pastur, Distribution of Eigenvalues for Some Sets of Random Matrices, 1967
The foundational random-matrix-theory paper establishing the limiting spectral density of Wishart matrices. The edge singularity at gamma=1 is the origin of the OLS blow-up.
- Z. Bai and J. W. Silverstein, Spectral Analysis of Large Dimensional Random Matrices, 2010
Standard reference on large-dimensional random matrix theory, including Marchenko-Pastur, Stieltjes transforms, and proportional-asymptotic analysis of sample covariance matrices.
- M. J. Wainwright, High-Dimensional Statistics: A Non-Asymptotic Viewpoint, 2019
Comprehensive modern textbook covering concentration, regularised M-estimators, minimax bounds, and sparse recovery guarantees. Primary reference for the non-asymptotic theory used throughout this chapter.
- A. B. Tsybakov, Introduction to Nonparametric Estimation, 2009
Standard graduate-level text on minimax estimation, Le Cam's lemma, Fano's inequality, and two-point bounds. Core reference for the minimax-game perspective in s04.
- E. Candes and T. Tao, The Dantzig Selector: Statistical Estimation when p is Much Larger than n, 2007
Establishes oracle inequalities for sparse estimation via l_1-minimisation, matching the minimax rate s log(N/s) / M up to constants under restricted isometry conditions.
- P. J. Bickel, Y. Ritov, and A. B. Tsybakov, Simultaneous Analysis of Lasso and Dantzig Selector, 2009
Unified oracle-inequality analysis of LASSO and Dantzig selector, proving both achieve the minimax rate for sparse recovery under the restricted eigenvalue condition.
- G. Caire and W. Zhang, Covariance Shrinkage for Massive-MIMO ISAC with Few Pilot Snapshots, 2021
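
The Stein phenomenon in the James-Stein entries above is easy to verify numerically. A minimal Monte Carlo sketch (the dimension, true mean, and trial count below are illustrative choices, not taken from the papers):

```python
import numpy as np

rng = np.random.default_rng(0)
N, trials = 10, 2000          # dimension (N >= 3 required) and Monte Carlo repetitions
theta = np.ones(N)            # arbitrary true mean vector

# Draw X ~ N(theta, I_N); the MLE of theta is X itself.
X = rng.normal(theta, 1.0, size=(trials, N))

# James-Stein estimator: shrink X toward the origin by the factor 1 - (N-2)/||X||^2.
norms2 = np.sum(X ** 2, axis=1, keepdims=True)
js = (1.0 - (N - 2) / norms2) * X

# Average squared-error loss over the trials.
mse_mle = np.mean(np.sum((X - theta) ** 2, axis=1))   # close to N
mse_js = np.mean(np.sum((js - theta) ** 2, axis=1))   # strictly smaller for N >= 3
print(mse_mle, mse_js)
```

The JS risk stays below the MLE risk of N at every theta, which is exactly the domination statement of the 1961 paper.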
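
Similarly, the soft-thresholding rule behind the Donoho-Johnstone and LASSO entries can be sketched in a few lines. The signal size, sparsity, and amplitude below are illustrative; the sqrt(2 log N) universal threshold is the one proposed by Donoho and Johnstone:

```python
import numpy as np

def soft_threshold(x, lam):
    # Proximal operator of lam * ||.||_1: shrink toward 0, zero out small entries.
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

rng = np.random.default_rng(1)
N, s, sigma = 1000, 10, 1.0
theta = np.zeros(N)
theta[:s] = 8.0                          # s-sparse mean with well-separated active entries
y = theta + sigma * rng.normal(size=N)   # one noisy observation per coordinate

# Universal threshold: with high probability no pure-noise coordinate survives.
lam = sigma * np.sqrt(2.0 * np.log(N))
theta_hat = soft_threshold(y, lam)

err_soft = np.sum((theta_hat - theta) ** 2)
err_mle = np.sum((y - theta) ** 2)       # roughly N * sigma^2
print(err_soft, err_mle)
```

The thresholded estimate pays only for the s active coordinates (plus a log factor), while the MLE pays for all N, which is the rate gap the 1994 minimax analysis quantifies.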