Chapter Summary
Key Points
1. Latent-variable models make direct MLE hard. When the observed-data likelihood involves a marginalization over the latent variables, $p(x \mid \theta) = \sum_z p(x, z \mid \theta)$, its logarithm becomes a log-sum-of-exponentials with no closed-form maximizer. EM turns this into a sequence of tractable optimizations.
2. Jensen's inequality yields the ELBO. For any distribution $q(z)$ over the latent variables, $\log p(x \mid \theta) \ge \mathcal{L}(q, \theta) = \mathbb{E}_{q}[\log p(x, z \mid \theta)] - \mathbb{E}_{q}[\log q(z)]$, with equality exactly when $q(z) = p(z \mid x, \theta)$ (see the derivation sketch after this list). The ELBO is the single expression that unifies EM, variational inference, and free-energy minimization.
3. EM is coordinate ascent on the ELBO. The E-step sets $q(z) = p(z \mid x, \theta^{(t)})$ (closing the Jensen gap); the M-step maximizes $Q(\theta, \theta^{(t)}) = \mathbb{E}_{q}[\log p(x, z \mid \theta)]$ over $\theta$. The incomplete-data log-likelihood is monotonically non-decreasing, and fixed points of EM are stationary points of the likelihood.
4. The GMM is the prototypical EM application. The E-step computes responsibilities $\gamma_{nk} = \pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \big/ \sum_j \pi_j \, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)$; the M-step performs a responsibility-weighted Gaussian MLE (a minimal implementation sketch follows this list). K-means is the hard-assignment limit, Baum-Welch is the HMM counterpart, and SBL is the linear-Gaussian latent-coefficient version with automatic relevance determination.
5. Practical EM requires care. Local maxima and saddle points are the rule rather than the exception; always run multiple restarts. Unconstrained GMM covariances can collapse onto individual data points (the likelihood is unbounded), so regularization is mandatory. Label switching is an artifact of permutation symmetry, not a convergence failure.
6. EM is the ancestor of modern variational inference. Variational EM restricts $q$ to a tractable family; gradient-based variational EM (VAEs) replaces the per-sample E-step with an amortized encoder. All of these methods are instances of lower-bound maximization on the same ELBO.
Looking Ahead
Chapter 9 introduces the discrete-time Wiener filter and its time-domain realization via the Yule-Walker equations. We will see the Levinson-Durbin recursion turn a Toeplitz linear system into an $O(p^2)$ recursive procedure: a counterpart, in linear estimation, of the computational parsimony that EM brings to non-linear estimation. Chapter 10 then takes the state-space leap: the Kalman filter, which combines Wiener-style linear MMSE with the recursive structure of a Markov model. Readers interested in how EM itself extends to dynamical models should note that Baum-Welch (from this chapter) and Kalman smoothing (next chapter) combine to give the EM algorithm for state-space parameter estimation.