References & Further Reading
References
- P. J. Huber, Robust Estimation of a Location Parameter, Annals of Mathematical Statistics, 1964
Founding paper of robust statistics; introduces the Huber loss and proves its minimax property over epsilon-contamination neighborhoods.
- P. J. Huber and E. M. Ronchetti, Robust Statistics, Wiley, 2nd ed., 2009
Comprehensive modern reference on M-estimators, influence functions, and breakdown points.
- F. R. Hampel, E. M. Ronchetti, P. J. Rousseeuw, and W. A. Stahel, Robust Statistics: The Approach Based on Influence Functions, Wiley, 1986
Definitive treatment of the influence-function approach to robustness.
- P. J. Rousseeuw and A. M. Leroy, Robust Regression and Outlier Detection, Wiley, 1987
High-breakdown estimators (LMS, LTS, S-estimators) and the breakdown-point concept for regression.
- E. Parzen, On Estimation of a Probability Density Function and Mode, Annals of Mathematical Statistics, 1962
Formalizes kernel density estimation and establishes consistency.
- M. Rosenblatt, Remarks on Some Nonparametric Estimates of a Density Function, Annals of Mathematical Statistics, 1956
Introduces the earliest kernel density estimator and proves that no unbiased density estimator exists.
- B. W. Silverman, Density Estimation for Statistics and Data Analysis, Chapman and Hall, 1986
Standard textbook on KDE, bandwidth selection, and the rule of thumb.
- E. A. Nadaraya, On Estimating Regression, Theory of Probability and Its Applications, 1964
Kernel regression estimator as a weighted local average.
- G. S. Watson, Smooth Regression Analysis, Sankhya: The Indian Journal of Statistics, Series A, 1964
Independently proposes the Nadaraya-Watson estimator and establishes its asymptotics.
- J. Fan and I. Gijbels, Local Polynomial Modelling and Its Applications, Chapman and Hall, 1996
Treatment of local polynomial regression; corrects the boundary bias of Nadaraya-Watson.
- N. Aronszajn, Theory of Reproducing Kernels, Transactions of the American Mathematical Society, 1950
Founding paper of RKHS theory.
- B. Schölkopf, R. Herbrich, and A. J. Smola, A Generalized Representer Theorem, COLT, 2001
Proves the representer theorem for any monotone regularizer on the RKHS norm.
- B. Schölkopf and A. J. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, MIT Press, 2002
Textbook on kernel methods spanning SVMs, kernel PCA, and Gaussian processes.
- V. N. Vapnik, Statistical Learning Theory, Wiley, 1998
SVMs, VC dimension, and structural risk minimization.
- C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning, MIT Press, 2006
Definitive text on GP regression, hyperparameter learning, and sparse approximations.
- I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, 2016
Standard reference on neural networks, regularization, and training.
- J. R. Hershey, J. Le Roux, and F. Weninger, Deep Unfolding: Model-Based Inspiration of Novel Deep Architectures, arXiv preprint, 2014
Coins the term 'deep unfolding' and formalizes the model-to-network mapping.
- K. Gregor and Y. LeCun, Learning Fast Approximations of Sparse Coding, ICML, 2010
Introduces LISTA: unrolled ISTA with learned per-layer weights.
- V. Monga, Y. Li, and Y. C. Eldar, Algorithm Unrolling: Interpretable, Efficient Deep Learning for Signal and Image Processing, IEEE Signal Processing Magazine, 2021
Comprehensive tutorial on deep unfolding in signal processing.
- S. Haghighatshoar and G. Caire, Low-Complexity Massive MIMO Subspace Estimation and Tracking From Low-Dimensional Projections, IEEE Transactions on Signal Processing, 2018
Subspace estimation and tracking for massive MIMO channels from low-dimensional projections of the received signal.
- H. Sarieddeen and G. Caire, Data-Driven Recovery in RF Imaging via Unfolded Orthogonal AMP, 2021
Applies an unfolded orthogonal AMP network to signal recovery in RF imaging.