References & Further Reading
- W. James and C. Stein, Estimation with Quadratic Loss, 1961
The original paper exhibiting the explicit shrinkage estimator that dominates the MLE in dimension N >= 3 under squared-error loss. Turns Stein's 1956 inadmissibility result into a constructive estimator.
- C. Stein, Inadmissibility of the Usual Estimator for the Mean of a Multivariate Normal Distribution, 1956
The original announcement of the Stein phenomenon, proving that the MLE is inadmissible in dimension 3 and above by a non-constructive argument.
- B. Efron and C. Morris, Stein's Estimation Rule and Its Competitors — An Empirical Bayes Approach, 1973
Re-derives the James-Stein estimator as an empirical-Bayes shrinkage rule and analyses competing shrinkage rules; the groundwork for the applied 1975 study below, where the famous batting-average comparison of JS to MLE appears.
- B. Efron and C. Morris, Data Analysis Using Stein's Estimator and Its Generalizations, 1975
The canonical applied paper on James-Stein shrinkage, including the 18-player baseball study that became the standard illustration of the Stein phenomenon.
- D. L. Donoho and I. M. Johnstone, Minimax Risk over l_p-Balls for l_q-Error, 1994
Establishes the minimax rates for sparse estimation over l_p-balls, showing that soft-thresholding achieves the minimax rate up to constants.
- D. L. Donoho and I. M. Johnstone, Minimax Estimation via Wavelet Shrinkage, 1998
Proves that wavelet-domain soft-thresholding is asymptotically minimax over broad Besov function classes. The statistical ancestor of LASSO and compressed sensing.
- R. Tibshirani, Regression Shrinkage and Selection via the Lasso, 1996
Introduces the LASSO estimator, demonstrating that the l_1-penalty produces sparse solutions and selects variables simultaneously with estimation.
- H. Zou and T. Hastie, Regularization and Variable Selection via the Elastic Net, 2005
Proposes the elastic-net penalty combining l_1 and l_2 terms, which retains LASSO's sparsity while handling correlated predictors through ridge-like grouping.
- A. E. Hoerl and R. W. Kennard, Ridge Regression: Biased Estimation for Nonorthogonal Problems, 1970
The original ridge-regression paper. Introduces l_2-regularised least squares as a remedy for ill-conditioned design matrices.
- V. A. Marchenko and L. A. Pastur, Distribution of Eigenvalues for Some Sets of Random Matrices, 1967
The foundational random-matrix-theory paper establishing the limiting spectral density of Wishart matrices. The edge singularity at gamma=1 is the origin of the OLS blow-up.
- Z. Bai and J. W. Silverstein, Spectral Analysis of Large Dimensional Random Matrices, 2010
Standard reference on large-dimensional random matrix theory, including Marchenko-Pastur, Stieltjes transforms, and proportional-asymptotic analysis of sample covariance matrices.
- M. J. Wainwright, High-Dimensional Statistics: A Non-Asymptotic Viewpoint, 2019
Comprehensive modern textbook covering concentration, regularised M-estimators, minimax bounds, and sparse recovery guarantees. Primary reference for the non-asymptotic theory used throughout this chapter.
- A. B. Tsybakov, Introduction to Nonparametric Estimation, 2009
Standard graduate-level text on minimax estimation, Le Cam's lemma, Fano's inequality, and two-point bounds. Core reference for the minimax-game perspective in s04.
- E. Candes and T. Tao, The Dantzig Selector: Statistical Estimation when p is Much Larger than n, 2007
Establishes oracle inequalities for sparse estimation via l_1-minimisation, matching the minimax rate s log(N/s) / M up to constants under restricted isometry conditions.
- P. J. Bickel, Y. Ritov, and A. B. Tsybakov, Simultaneous Analysis of Lasso and Dantzig Selector, 2009
Unified oracle-inequality analysis of LASSO and Dantzig selector, proving both achieve the minimax rate for sparse recovery under the restricted eigenvalue condition.
- G. Caire and W. Zhang, Covariance Shrinkage for Massive-MIMO ISAC with Few Pilot Snapshots, 2021
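
The Stein phenomenon in the James-Stein entries above is easy to verify numerically. A minimal Monte Carlo sketch (the dimension, true mean, and trial count below are illustrative choices, not taken from the papers):

```python
import numpy as np

rng = np.random.default_rng(0)
N, trials = 10, 2000          # dimension (N >= 3 required) and Monte Carlo repetitions
theta = np.ones(N)            # arbitrary true mean vector

# Draw X ~ N(theta, I_N); the MLE of theta is X itself.
X = rng.normal(theta, 1.0, size=(trials, N))

# James-Stein estimator: shrink X toward the origin by the factor 1 - (N-2)/||X||^2.
norms2 = np.sum(X ** 2, axis=1, keepdims=True)
js = (1.0 - (N - 2) / norms2) * X

# Average squared-error loss over the trials.
mse_mle = np.mean(np.sum((X - theta) ** 2, axis=1))   # close to N
mse_js = np.mean(np.sum((js - theta) ** 2, axis=1))   # strictly smaller for N >= 3
print(mse_mle, mse_js)
```

The JS risk stays below the MLE risk of N at every theta, which is exactly the domination statement of the 1961 paper.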
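
Similarly, the soft-thresholding rule behind the Donoho-Johnstone and LASSO entries can be sketched in a few lines. The signal size, sparsity, and amplitude below are illustrative; the sqrt(2 log N) universal threshold is the one proposed by Donoho and Johnstone:

```python
import numpy as np

def soft_threshold(x, lam):
    # Proximal operator of lam * ||.||_1: shrink toward 0, zero out small entries.
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

rng = np.random.default_rng(1)
N, s, sigma = 1000, 10, 1.0
theta = np.zeros(N)
theta[:s] = 8.0                          # s-sparse mean with well-separated active entries
y = theta + sigma * rng.normal(size=N)   # one noisy observation per coordinate

# Universal threshold: with high probability no pure-noise coordinate survives.
lam = sigma * np.sqrt(2.0 * np.log(N))
theta_hat = soft_threshold(y, lam)

err_soft = np.sum((theta_hat - theta) ** 2)
err_mle = np.sum((y - theta) ** 2)       # roughly N * sigma^2
print(err_soft, err_mle)
```

The thresholded estimate pays only for the s active coordinates (plus a log factor), while the MLE pays for all N, which is the rate gap the 1994 minimax analysis quantifies.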