Prerequisites & Notation
Before You Begin
The EM algorithm sits at the intersection of maximum-likelihood estimation, convex analysis, and Bayesian inference. Familiarity with the following items is assumed; if any of them feels unfamiliar, revisit the linked chapter before proceeding. Brief answer sketches for the self-checks follow the list.
- Maximum likelihood estimation: score function, Fisher information, asymptotic normality (Review ch07)
Self-check: Can you write down the MLE for the mean and variance of a univariate Gaussian from i.i.d. samples?
- Multivariate Gaussian distribution: joint/marginal/conditional, precision matrix, log-density (Review ch02)
Self-check: Can you compute the conditional density $p(\mathbf{x}_1 \mid \mathbf{x}_2)$ of a jointly Gaussian vector and identify the quadratic term in the log-density?
- Jensen's inequality and convex functions; concavity of the logarithm
Self-check: Can you state Jensen's inequality and identify which direction applies to the concave function $\log$?
- Kullback-Leibler divergence and its non-negativity (Review ch02)
Self-check: Can you prove $\mathrm{KL}(q \,\|\, p) \ge 0$, with equality iff $q = p$ almost everywhere?
- Basic posterior computation: Bayes' rule, conjugate updates (Review ch06)
Self-check: Given a likelihood and prior, can you compute the unnormalized posterior and identify its normalizing constant?
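For reference, brief sketches of the self-check answers. For the Gaussian MLE: with i.i.d. samples $x_1, \dots, x_N$, setting the gradient of the log-likelihood to zero gives

$$
\hat{\mu} = \frac{1}{N}\sum_{n=1}^{N} x_n, \qquad
\hat{\sigma}^2 = \frac{1}{N}\sum_{n=1}^{N} (x_n - \hat{\mu})^2 .
$$

Note the $1/N$ rather than $1/(N-1)$: the MLE of the variance is biased.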
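For the Gaussian conditional: partition a jointly Gaussian $\mathbf{x} = (\mathbf{x}_1, \mathbf{x}_2)$ with mean blocks $\boldsymbol{\mu}_1, \boldsymbol{\mu}_2$ and covariance blocks $\boldsymbol{\Sigma}_{ij}$. Then

$$
\mathbf{x}_1 \mid \mathbf{x}_2 \sim \mathcal{N}\!\bigl(\boldsymbol{\mu}_1 + \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}(\mathbf{x}_2 - \boldsymbol{\mu}_2),\ \boldsymbol{\Sigma}_{11} - \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21}\bigr),
$$

and the quadratic term of the log-density, $-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{\top}\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})$, is where the precision matrix $\boldsymbol{\Sigma}^{-1}$ appears.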
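For Jensen's inequality: if $f$ is concave, then $f(\mathbb{E}[X]) \ge \mathbb{E}[f(X)]$, and the inequality reverses for convex $f$. Because $\log$ is concave,

$$
\log \mathbb{E}[X] \;\ge\; \mathbb{E}[\log X],
$$

which is exactly the direction EM uses to lower-bound the log-likelihood.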
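For KL non-negativity: applying Jensen's inequality to the concave $\log$,

$$
-\mathrm{KL}(q \,\|\, p) \;=\; \mathbb{E}_{q}\!\left[\log \frac{p(Z)}{q(Z)}\right] \;\le\; \log \mathbb{E}_{q}\!\left[\frac{p(Z)}{q(Z)}\right] \;=\; \log 1 \;=\; 0,
$$

with equality iff $p/q$ is constant $q$-almost surely, i.e. $q = p$ almost everywhere.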
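For the posterior computation, one standard conjugate worked example is the Beta–Bernoulli pair: with likelihood $\prod_n \theta^{x_n}(1-\theta)^{1-x_n}$, $s = \sum_n x_n$, and prior $\mathrm{Beta}(\alpha, \beta)$,

$$
p(\theta \mid x_{1:N}) \;\propto\; \theta^{s+\alpha-1}(1-\theta)^{N-s+\beta-1},
$$

so the posterior is $\mathrm{Beta}(\alpha+s,\ \beta+N-s)$ and the normalizing constant is the Beta function $B(\alpha+s,\ \beta+N-s)$.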
Notation for This Chapter
Symbols used throughout Chapter 8. We write $\mathbf{X}$ for the observed (incomplete) data, $\mathbf{Z}$ for the latent (hidden) variables, and $\boldsymbol{\theta}$ for the parameters to be estimated.
| Symbol | Meaning | Introduced |
|---|---|---|
| $\mathbf{X}$ | Observed (incomplete) data; the sample we actually see | s01 |
| $\mathbf{Z}$ | Latent / hidden / missing variables (never observed) | s01 |
| $\boldsymbol{\theta}$ | Parameter vector to be estimated | s01 |
| $p(\mathbf{X}, \mathbf{Z} \mid \boldsymbol{\theta})$ | Complete-data joint density | s01 |
| $p(\mathbf{X} \mid \boldsymbol{\theta})$ | Incomplete-data (marginal) density: $p(\mathbf{X} \mid \boldsymbol{\theta}) = \sum_{\mathbf{Z}} p(\mathbf{X}, \mathbf{Z} \mid \boldsymbol{\theta})$ (an integral for continuous $\mathbf{Z}$) | s01 |
| $\ell(\boldsymbol{\theta}) = \log p(\mathbf{X} \mid \boldsymbol{\theta})$ | Incomplete-data log-likelihood (the objective we want to maximize) | s01 |
| $\boldsymbol{\theta}^{(t)}$ | Parameter estimate at iteration $t$ | s03 |
| $Q(\boldsymbol{\theta}, \boldsymbol{\theta}^{(t)})$ | EM auxiliary function: $Q(\boldsymbol{\theta}, \boldsymbol{\theta}^{(t)}) = \mathbb{E}_{\mathbf{Z} \sim p(\mathbf{Z} \mid \mathbf{X}, \boldsymbol{\theta}^{(t)})}\!\left[\log p(\mathbf{X}, \mathbf{Z} \mid \boldsymbol{\theta})\right]$ | s03 |
| $\mathcal{F}(q, \boldsymbol{\theta})$ | Free energy / evidence lower bound (ELBO) | s02 |
| $\gamma_{nk}$ | Responsibility: posterior probability that sample $n$ belongs to component $k$ | s04 |
| $\pi_k$ | Mixing weight of component $k$ in a mixture model | s04 |
| $K$ | Number of mixture components (or HMM states) | s04 |
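To make the mixture notation concrete, here is a minimal sketch (illustrative names, not the chapter's reference code) of how the responsibilities $\gamma_{nk}$ from the table would be computed in the E-step of a one-dimensional Gaussian mixture, assuming NumPy and SciPy are available:

```python
import numpy as np
from scipy.stats import norm

def responsibilities(x, pi, mu, sigma):
    """E-step of a 1-D Gaussian mixture: gamma[n, k] is the posterior
    probability that sample x[n] came from component k.

    x     : (N,) observed samples (X in the notation table)
    pi    : (K,) mixing weights pi_k, summing to 1
    mu    : (K,) component means
    sigma : (K,) component standard deviations
    """
    # pi_k * N(x_n | mu_k, sigma_k^2); broadcasting gives shape (N, K)
    weighted = pi * norm.pdf(x[:, None], loc=mu, scale=sigma)
    # Normalize each row so that sum_k gamma[n, k] = 1
    return weighted / weighted.sum(axis=1, keepdims=True)

# Illustrative usage with made-up parameters
x = np.array([-2.1, -1.8, 0.2, 1.9, 2.3])
gamma = responsibilities(
    x,
    pi=np.array([0.5, 0.5]),
    mu=np.array([-2.0, 2.0]),
    sigma=np.array([1.0, 1.0]),
)
print(gamma.round(3))  # each row sums to 1
```

A production implementation would evaluate the densities in log space and normalize with a log-sum-exp to avoid underflow; the direct form above keeps the correspondence with the table transparent.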