References & Further Reading
References
- P. J. Huber, Robust Estimation of a Location Parameter, Annals of Mathematical Statistics, 1964
Founding paper of robust statistics; introduces the Huber loss and proves its minimax property over epsilon-contamination neighborhoods.
- P. J. Huber and E. M. Ronchetti, Robust Statistics, Wiley, 2nd ed., 2009
Comprehensive modern reference on M-estimators, influence functions, and breakdown points.
- F. R. Hampel, E. M. Ronchetti, P. J. Rousseeuw, and W. A. Stahel, Robust Statistics: The Approach Based on Influence Functions, Wiley, 1986
Definitive treatment of the influence-function approach to robustness.
- P. J. Rousseeuw and A. M. Leroy, Robust Regression and Outlier Detection, Wiley, 1987
High-breakdown estimators (LMS, LTS, S-estimators) and the breakdown-point concept for regression.
- E. Parzen, On Estimation of a Probability Density Function and Mode, Annals of Mathematical Statistics, 1962
Formalizes kernel density estimation and establishes consistency.
- M. Rosenblatt, Remarks on Some Nonparametric Estimates of a Density Function, Annals of Mathematical Statistics, 1956
Introduces the earliest kernel density estimator and proves that no unbiased density estimator exists.
- B. W. Silverman, Density Estimation for Statistics and Data Analysis, Chapman and Hall, 1986
Standard textbook on KDE, bandwidth selection, and the rule of thumb.
- E. A. Nadaraya, On Estimating Regression, Theory of Probability and Its Applications, 1964
Kernel regression estimator as a weighted local average.
- G. S. Watson, Smooth Regression Analysis, Sankhya: The Indian Journal of Statistics, Series A, 1964
Independently proposes the Nadaraya-Watson estimator and establishes its asymptotics.
- J. Fan and I. Gijbels, Local Polynomial Modelling and Its Applications, Chapman and Hall, 1996
Treatment of local polynomial regression; corrects the boundary bias of Nadaraya-Watson.
- N. Aronszajn, Theory of Reproducing Kernels, Transactions of the American Mathematical Society, 1950
Founding paper of RKHS theory.
- B. Schölkopf, R. Herbrich, and A. J. Smola, A Generalized Representer Theorem, COLT, 2001
Proves the representer theorem for any monotone regularizer on the RKHS norm.
- B. Schölkopf and A. J. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, MIT Press, 2002
Textbook on kernel methods spanning SVMs, kernel PCA, and Gaussian processes.
- V. N. Vapnik, Statistical Learning Theory, Wiley, 1998
SVMs, VC dimension, and structural risk minimization.
- C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning, MIT Press, 2006
Definitive text on GP regression, hyperparameter learning, and sparse approximations.
- I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, 2016
Standard reference on neural networks, regularization, and training.
- J. R. Hershey, J. Le Roux, and F. Weninger, Deep Unfolding: Model-Based Inspiration of Novel Deep Architectures, arXiv preprint, 2014
Coins the term 'deep unfolding' and formalizes the model-to-network mapping.
- K. Gregor and Y. LeCun, Learning Fast Approximations of Sparse Coding, ICML, 2010
Introduces LISTA: unrolled ISTA with learned per-layer weights.
- V. Monga, Y. Li, and Y. C. Eldar, Algorithm Unrolling: Interpretable, Efficient Deep Learning for Signal and Image Processing, IEEE Signal Processing Magazine, 2021
Comprehensive tutorial on deep unfolding in signal processing.
- S. Haghighatshoar and G. Caire, Low-Complexity Massive MIMO Subspace Estimation and Tracking From Low-Dimensional Projections, IEEE Transactions on Signal Processing, 2018
Subspace estimation and tracking for massive MIMO channels from low-dimensional projections of the received signal.
- H. Sarieddeen and G. Caire, Data-Driven Recovery in RF Imaging via Unfolded Orthogonal AMP, 2021
Applies an unfolded orthogonal AMP network to signal recovery in RF imaging.