References & Further Reading

References

  1. W. James and C. Stein, Estimation with Quadratic Loss, 1961

    The original paper exhibiting the explicit shrinkage estimator that dominates the MLE in dimension N >= 3 under squared-error loss. Turns Stein's 1956 inadmissibility result into a constructive estimator.
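
    The estimator itself is a one-liner. A minimal sketch in plain Python (the positive-part variant and the known noise variance sigma2 are assumptions of this sketch, not details taken from the citation):

    ```python
    import math

    def james_stein(y, sigma2=1.0):
        """Positive-part James-Stein estimate of theta from y ~ N(theta, sigma2 * I).

        Shrinks the raw observation y toward the origin by a data-driven
        factor; dominates the MLE (y itself) whenever len(y) >= 3.
        """
        n = len(y)
        if n < 3:
            raise ValueError("James-Stein shrinkage requires dimension >= 3")
        ss = sum(v * v for v in y)                       # ||y||^2
        shrink = max(0.0, 1.0 - (n - 2) * sigma2 / ss)  # positive-part rule
        return [shrink * v for v in y]
    ```

    A Monte-Carlo comparison against the MLE at theta = 0 shows the risk gap at its largest; for theta far from the shrinkage target the two estimators nearly coincide.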

  2. C. Stein, Inadmissibility of the Usual Estimator for the Mean of a Multivariate Normal Distribution, 1956

    The original announcement of the Stein phenomenon, proving by a non-constructive argument that the MLE is inadmissible in dimension N >= 3.

  3. B. Efron and C. Morris, Stein's Estimation Rule and Its Competitors — An Empirical Bayes Approach, 1973

    Re-derives the James-Stein estimator as an empirical-Bayes shrinkage rule, interpreting shrinkage as borrowing strength across coordinates; the famous batting-average comparison of JS to MLE on real data appears in the 1975 companion paper.

  4. B. Efron and C. Morris, Data Analysis Using Stein's Estimator and Its Generalizations, 1975

    The canonical applied paper on James-Stein shrinkage, including the 18-player baseball study that became the standard illustration of the Stein phenomenon.

  5. D. L. Donoho and I. M. Johnstone, Minimax Risk over l_p-Balls for l_q-Error, 1994

    Establishes the minimax rates for sparse estimation over l_p-balls, showing that soft-thresholding achieves the minimax rate up to constants.
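
    The soft-thresholding rule analysed in this line of work is simple to state. A minimal sketch in plain Python (the function name is illustrative):

    ```python
    import math

    def soft_threshold(y, lam):
        """Coordinatewise soft-thresholding at level lam >= 0.

        Each coordinate v solves min_x (x - v)^2 / 2 + lam * |x|:
        coordinates smaller than lam in magnitude are set exactly to
        zero, larger ones are pulled toward zero by lam.
        """
        return [math.copysign(max(abs(v) - lam, 0.0), v) for v in y]
    ```

    For example, with lam = 1.0 the vector [3.0, -0.5, 1.0] has its two small coordinates zeroed out and its large coordinate shifted to 2.0, which is the sparsity mechanism the minimax analysis exploits.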

  6. D. L. Donoho and I. M. Johnstone, Minimax Estimation via Wavelet Shrinkage, 1998

    Proves that wavelet-domain soft-thresholding is asymptotically minimax over broad Besov function classes. The statistical ancestor of LASSO and compressed sensing.

  7. R. Tibshirani, Regression Shrinkage and Selection via the Lasso, 1996

    Introduces the LASSO estimator, demonstrating that the l_1-penalty produces sparse solutions and thus performs variable selection and estimation simultaneously.

  8. H. Zou and T. Hastie, Regularization and Variable Selection via the Elastic Net, 2005

    Proposes the elastic-net penalty combining l_1 and l_2 terms, which retains LASSO's sparsity while handling correlated predictors through ridge-like grouping.

  9. A. E. Hoerl and R. W. Kennard, Ridge Regression: Biased Estimation for Nonorthogonal Problems, 1970

    The original ridge-regression paper. Introduces l_2-regularised least squares as a remedy for ill-conditioned design matrices.

  10. V. A. Marchenko and L. A. Pastur, Distribution of Eigenvalues for Some Sets of Random Matrices, 1967

    The foundational random-matrix-theory paper establishing the limiting spectral density of Wishart matrices. The edge singularity at gamma=1 is the origin of the OLS blow-up.
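
    The limiting density is explicit. A minimal sketch in plain Python (parameterising by the aspect ratio gamma = N/M and noise variance sigma2 follows the chapter's notation, which is an assumption here):

    ```python
    import math

    def mp_density(x, gamma, sigma2=1.0):
        """Marchenko-Pastur density at x for aspect ratio gamma = N/M in (0, 1].

        Limiting spectral density of the sample covariance of M i.i.d.
        N(0, sigma2) vectors in dimension N. The lower support edge
        collides with 0 as gamma -> 1, producing the ill-conditioning
        behind the OLS blow-up.
        """
        lam_minus = sigma2 * (1.0 - math.sqrt(gamma)) ** 2
        lam_plus = sigma2 * (1.0 + math.sqrt(gamma)) ** 2
        if not lam_minus < x < lam_plus:
            return 0.0
        return math.sqrt((lam_plus - x) * (x - lam_minus)) / (
            2.0 * math.pi * sigma2 * gamma * x
        )
    ```

    For gamma < 1 this density integrates to one over [lam_minus, lam_plus]; for gamma > 1 the spectrum also carries a point mass at zero, which this sketch omits.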

  11. Z. Bai and J. W. Silverstein, Spectral Analysis of Large Dimensional Random Matrices, 2010

    Standard reference on large-dimensional random matrix theory, including Marchenko-Pastur, Stieltjes transforms, and proportional-asymptotic analysis of sample covariance matrices.

  12. M. J. Wainwright, High-Dimensional Statistics: A Non-Asymptotic Viewpoint, 2019

    Comprehensive modern textbook covering concentration, regularised M-estimators, minimax bounds, and sparse recovery guarantees. Primary reference for the non-asymptotic theory used throughout this chapter.

  13. A. B. Tsybakov, Introduction to Nonparametric Estimation, 2009

    Standard graduate-level text on minimax estimation, Le Cam's lemma, Fano's inequality, and two-point bounds. Core reference for the minimax-game perspective in s04.

  14. E. Candes and T. Tao, The Dantzig Selector: Statistical Estimation when p is Much Larger than n, 2007

    Establishes oracle inequalities for sparse estimation via l_1-minimisation, matching the minimax rate s log(N/s) / M up to constants under restricted isometry conditions.

  15. P. J. Bickel, Y. Ritov, and A. B. Tsybakov, Simultaneous Analysis of Lasso and Dantzig Selector, 2009

    Unified oracle-inequality analysis of LASSO and Dantzig selector, proving both achieve the minimax rate for sparse recovery under the restricted eigenvalue condition.

  16. G. Caire and W. Zhang, Covariance Shrinkage for Massive-MIMO ISAC with Few Pilot Snapshots, 2021