References & Further Reading
References
- A. P. Dempster, N. M. Laird, and D. B. Rubin, Maximum Likelihood from Incomplete Data via the EM Algorithm, Journal of the Royal Statistical Society, Series B, 1977
The foundational paper that crystallized EM as a general algorithm and proved monotonicity.
- C. F. J. Wu, On the Convergence Properties of the EM Algorithm, Annals of Statistics, 1983
The definitive convergence analysis of EM, establishing convergence to stationary points under regularity conditions.
- G. J. McLachlan and T. Krishnan, The EM Algorithm and Extensions, Wiley, 2nd ed., 2008
Comprehensive monograph covering variants, acceleration, and applications.
- G. J. McLachlan and D. Peel, Finite Mixture Models, Wiley, 2000
Standard reference for GMMs and related mixture-model fitting.
- C. M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006
Chapter 9 gives a clear ELBO-based derivation of EM; Chapter 10 covers variational EM.
- K. P. Murphy, Machine Learning: A Probabilistic Perspective, MIT Press, 2012
Chapter 11 treats EM for mixtures, HMMs, and factor analyzers.
- R. M. Neal and G. E. Hinton, A View of the EM Algorithm that Justifies Incremental, Sparse, and Other Variants, in M. I. Jordan (ed.), Learning in Graphical Models, 1998
The free-energy / coordinate-ascent reformulation that made variational EM possible; the underlying identity is sketched after this list.
- L. E. Baum, T. Petrie, G. Soules, and N. Weiss, A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains, Annals of Mathematical Statistics, 1970
The Baum-Welch algorithm: EM for HMMs, predating Dempster, Laird, and Rubin.
- L. R. Rabiner, A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition, Proceedings of the IEEE, 1989
The classic engineering tutorial on HMMs and Baum-Welch training.
- M. E. Tipping, Sparse Bayesian Learning and the Relevance Vector Machine, Journal of Machine Learning Research, 2001
The original SBL paper: EM on a hierarchical Gaussian prior with automatic relevance determination.
- D. P. Wipf and B. D. Rao, Sparse Bayesian Learning for Basis Selection, IEEE Transactions on Signal Processing, 2004
Connects SBL to $\ell_0$-penalized problems and establishes global-optimum conditions.
- M. Ke, Z. Gao, Y. Wu, X. Gao, and R. Schober, Compressive Sensing-Based Adaptive Active User Detection and Channel Estimation: Massive Access Meets Massive MIMO, IEEE Transactions on Signal Processing, 2020
SBL/EM applied to massive random access with massive MIMO receivers.
- S. M. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory, Prentice Hall, 1993
Chapter 7 covers EM for frequency estimation and other signal-processing problems.
- K. Pearson, Contributions to the Mathematical Theory of Evolution, Philosophical Transactions of the Royal Society of London A, 1894
The historically first Gaussian-mixture fit, via the method of moments, long before EM existed.
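A one-line sketch of the identity behind the Bishop and Neal-Hinton entries above, in the usual notation ($\mathbf{x}$ observed data, $\mathbf{z}$ latent variables, $q$ any distribution over $\mathbf{z}$); the decomposition is standard, though the symbols here are our own:

$$
\log p(\mathbf{x} \mid \theta)
= \underbrace{\mathbb{E}_{q}\!\left[\log \frac{p(\mathbf{x}, \mathbf{z} \mid \theta)}{q(\mathbf{z})}\right]}_{\mathcal{F}(q,\,\theta)\;\text{(free energy / ELBO)}}
+ \mathrm{KL}\!\left(q(\mathbf{z}) \,\big\|\, p(\mathbf{z} \mid \mathbf{x}, \theta)\right).
$$

The E-step maximizes $\mathcal{F}$ over $q$ with $\theta$ fixed (optimum $q = p(\mathbf{z} \mid \mathbf{x}, \theta)$, which zeroes the KL term); the M-step maximizes $\mathcal{F}$ over $\theta$ with $q$ fixed. Restricting $q$ to a tractable family turns the same coordinate ascent into variational EM.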
Further Reading
Directions for readers who want to go deeper into convergence theory, variational extensions, and signal-processing applications.
Convergence rates and acceleration of EM
Meng & Rubin, 'On the Global and Componentwise Rates of Convergence of the EM Algorithm,' Linear Algebra Appl., 1994; Varadhan & Roland, 'Simple and Globally Convergent Methods for Accelerating the Convergence of Any EM Algorithm,' Scand. J. Stat., 2008
EM often converges linearly, with rate governed by the missing information ratio; simple fix-ups can restore superlinear behavior.
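To make both claims concrete: near a fixed point $\theta^{\ast}$, the EM map behaves like a linear iteration whose rate matrix is the fraction of missing information (writing $\mathcal{I}_c$ and $\mathcal{I}_m$ for the complete-data and missing information matrices, notation borrowed from Meng & Rubin):

$$
\theta^{(t+1)} - \theta^{\ast} \approx \mathbf{J}\,\big(\theta^{(t)} - \theta^{\ast}\big),
\qquad \mathbf{J} = \mathcal{I}_c(\theta^{\ast})^{-1}\,\mathcal{I}_m(\theta^{\ast}),
$$

so the linear rate is the largest eigenvalue of $\mathbf{J}$: the more information the latent variables carry, the slower EM crawls. SQUAREM treats EM as a fixed-point map $F$ and extrapolates; one of Varadhan & Roland's step-length schemes (their S3) is

$$
r = F(\theta) - \theta, \quad v = F(F(\theta)) - F(\theta) - r, \quad
\alpha = -\frac{\lVert r \rVert}{\lVert v \rVert}, \quad
\theta' = \theta - 2\alpha\, r + \alpha^{2} v,
$$

with a fall-back to a plain EM step whenever the extrapolated point lowers the likelihood.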
Variational inference and VAEs
Blei, Kucukelbir & McAuliffe, 'Variational Inference: A Review for Statisticians,' JASA, 2017; Kingma & Welling, 'Auto-Encoding Variational Bayes,' ICLR, 2014
The modern descendants of EM: amortized, gradient-based, and scalable to deep models.
EM for signal-processing problems in communications
Feder & Weinstein, 'Parameter Estimation of Superimposed Signals Using the EM Algorithm,' IEEE Trans. ASSP, 1988
A classic application of EM to source separation and parameter estimation in radar and wireless.
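The decoupling trick at the heart of Feder & Weinstein, sketched here from memory (the noise-splitting weights $\beta_k$ are theirs): model the observation as $\mathbf{y} = \sum_{k=1}^{K} \mathbf{s}_k(\theta_k) + \mathbf{n}$ and take as complete data the per-signal observations $\mathbf{x}_k = \mathbf{s}_k(\theta_k) + \mathbf{n}_k$, with the noise split as $\mathbf{n} = \sum_k \mathbf{n}_k$. The E-step then reassigns the current residual:

$$
\hat{\mathbf{x}}_k = \mathbf{s}_k\big(\hat{\theta}_k\big)
+ \beta_k \Big(\mathbf{y} - \sum_{j=1}^{K} \mathbf{s}_j\big(\hat{\theta}_j\big)\Big),
\qquad \beta_k \ge 0, \quad \sum_{k=1}^{K} \beta_k = 1,
$$

and the M-step reduces to $K$ independent single-signal estimation problems, one per $\hat{\mathbf{x}}_k$. That decoupling is what makes the method attractive for multi-source radar and wireless problems.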
Information-geometric view of EM
Amari, 'Information Geometry of the EM and em Algorithms for Neural Networks,' Neural Networks, 1995
Interprets EM as alternating projection between two manifolds of distributions, an illuminating geometric picture.
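In symbols, Amari's picture (with $D$ the data manifold of distributions consistent with the observations and $M = \{ p_\theta \}$ the model manifold) is a pair of alternating Kullback-Leibler projections:

$$
q^{(t+1)} = \arg\min_{q \in D} \mathrm{KL}\big(q \,\big\|\, p_{\theta^{(t)}}\big) \;\;\text{(e-projection)},
\qquad
\theta^{(t+1)} = \arg\min_{\theta} \mathrm{KL}\big(q^{(t+1)} \,\big\|\, p_{\theta}\big) \;\;\text{(m-projection)}.
$$

This is the geometric reading of the same coordinate ascent as in the free-energy view above; Amari also discusses when the em algorithm and standard EM differ.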