Prerequisites & Notation

Before You Begin

The EM algorithm sits at the intersection of maximum-likelihood estimation, convex analysis, and Bayesian inference. The following items are assumed. If any of them feels unfamiliar, revisit the linked chapter before proceeding.

  • Maximum likelihood estimation: score function, Fisher information, asymptotic normality (Review ch07)

    Self-check: Can you write down the MLE for the mean and variance of a univariate Gaussian from $n$ i.i.d. samples?

  • Multivariate Gaussian distribution: joint/marginal/conditional, precision matrix, log-density (Review ch02)

    Self-check: Can you compute $\log \mathcal{N}(\mathbf{y};\boldsymbol{\mu},\boldsymbol{\Sigma})$ and identify the quadratic term?

  • Jensen's inequality and convex functions; concavity of the logarithm

    Self-check: Can you state Jensen's inequality and identify which direction applies to $\log$?

  • Kullback-Leibler divergence and its non-negativity (Review ch02)

    Self-check: Can you prove $D(p \Vert q) \geq 0$, with equality iff $p = q$ almost everywhere?

  • Basic posterior computation: Bayes' rule, conjugate updates (Review ch06)

    Self-check: Given a likelihood and prior, can you compute the unnormalized posterior and identify its normalizing constant?
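The first self-check (the Gaussian MLE) is easy to verify numerically. A minimal sketch, with illustrative data and seed: the MLEs are the sample mean and the biased sample variance (divide by $n$, not $n-1$).

```python
import numpy as np

# Illustrative data: 10,000 draws from N(2, 3^2); seed chosen arbitrarily.
rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, scale=3.0, size=10_000)

mu_hat = y.mean()                         # MLE of the mean: sample average
sigma2_hat = ((y - mu_hat) ** 2).mean()   # MLE of the variance: 1/n (biased), not 1/(n-1)
```

With this sample size, both estimates land close to the true values (2 and 9).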
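For the multivariate-Gaussian self-check, the log-density can be evaluated directly from the formula; the function and variable names below are illustrative. The quadratic term the question asks about is the Mahalanobis distance $(\mathbf{y}-\boldsymbol{\mu})^\top \boldsymbol{\Sigma}^{-1}(\mathbf{y}-\boldsymbol{\mu})$.

```python
import numpy as np

def log_mvn_density(y, mu, Sigma):
    """log N(y; mu, Sigma) for a d-dimensional Gaussian (illustrative sketch)."""
    d = len(mu)
    diff = y - mu
    # Quadratic term: (y - mu)^T Sigma^{-1} (y - mu), via a linear solve
    quad = diff @ np.linalg.solve(Sigma, diff)
    # slogdet avoids overflow/underflow in the determinant
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + quad)
```

At $\mathbf{y} = \boldsymbol{\mu}$ with $\boldsymbol{\Sigma} = \mathbf{I}_2$, the quadratic term vanishes and the value reduces to $-\log(2\pi)$.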
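The Jensen self-check can also be sanity-checked numerically (the values below are illustrative): since $\log$ is concave, Jensen's inequality runs as $\mathbb{E}[\log X] \leq \log \mathbb{E}[X]$ for a positive random variable $X$.

```python
import numpy as np

# A four-point distribution chosen arbitrarily for illustration.
x = np.array([0.5, 1.0, 2.0, 4.0])
w = np.array([0.25, 0.25, 0.25, 0.25])   # probabilities, sum to 1

lhs = np.sum(w * np.log(x))   # E[log X]
rhs = np.log(np.sum(w * x))   # log E[X]
```

Equality holds only when $X$ is degenerate; for any non-constant $X$, as here, the inequality is strict.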
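The KL self-check has a direct discrete counterpart (the two distributions below are illustrative): $D(p \Vert q) = \sum_i p_i \log(p_i / q_i)$ is non-negative and vanishes iff $p = q$.

```python
import numpy as np

def kl(p, q):
    """D(p || q) for discrete distributions, with the 0 log 0 := 0 convention."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p = [0.1, 0.4, 0.5]
q = [0.3, 0.3, 0.4]
```

Note also that KL is not symmetric: `kl(p, q)` and `kl(q, p)` generally differ.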
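For the conjugacy self-check, the Beta-Bernoulli pair gives the cleanest closed form; the hyperparameters below are illustrative. A Beta$(a, b)$ prior on a Bernoulli rate, after $k$ successes in $n$ trials, updates to Beta$(a + k,\, b + n - k)$, and the normalizing constant of the unnormalized posterior is the Beta function $B(a + k, b + n - k)$.

```python
# Conjugate-update sketch (hyperparameters a, b are illustrative).
# Prior: rate ~ Beta(a, b); likelihood: k successes in n Bernoulli trials.
# Unnormalized posterior: rate^(a+k-1) * (1-rate)^(b+n-k-1),
# whose normalizing constant is the Beta function B(a+k, b+n-k).
def beta_bernoulli_posterior(a, b, k, n):
    return a + k, b + (n - k)

a_post, b_post = beta_bernoulli_posterior(1.0, 1.0, 7, 10)  # uniform prior
posterior_mean = a_post / (a_post + b_post)                 # (a+k)/(a+b+n)
```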

Notation for This Chapter

Symbols used throughout Chapter 8. We write $\mathbf{Y}$ for the observed (incomplete) data, $\mathbf{Z}$ for the latent (hidden) variables, and $\boldsymbol{\theta}$ for the parameters to be estimated.

| Symbol | Meaning | Introduced |
| --- | --- | --- |
| $\mathbf{Y}$ | Observed (incomplete) data; the sample we actually see | §01 |
| $\mathbf{Z}$ | Latent / hidden / missing variables (never observed) | §01 |
| $\boldsymbol{\theta}$ | Parameter vector to be estimated | §01 |
| $p(\mathbf{y},\mathbf{z};\boldsymbol{\theta})$ | Complete-data joint density | §01 |
| $p(\mathbf{y};\boldsymbol{\theta})$ | Incomplete-data (marginal) density: $\int p(\mathbf{y},\mathbf{z};\boldsymbol{\theta})\,d\mathbf{z}$ | §01 |
| $\ell(\boldsymbol{\theta}) = \log p(\mathbf{y};\boldsymbol{\theta})$ | Incomplete-data log-likelihood (the objective we want to maximize) | §01 |
| $\boldsymbol{\theta}^{(t)}$ | Parameter estimate at iteration $t$ | §03 |
| $Q(\boldsymbol{\theta}\mid\boldsymbol{\theta}^{(t)})$ | EM auxiliary function: $\mathbb{E}_{\mathbf{Z}\mid\mathbf{Y},\boldsymbol{\theta}^{(t)}}[\log p(\mathbf{Y},\mathbf{Z};\boldsymbol{\theta})]$ | §03 |
| $\mathcal{F}(q,\boldsymbol{\theta})$ | Free energy / evidence lower bound (ELBO) | §02 |
| $\gamma_{ik}$ | Responsibility: posterior probability that sample $i$ belongs to component $k$ | §04 |
| $\pi_k$ | Mixing weight of component $k$ in a mixture model | §04 |
| $K$ | Number of mixture components (or HMM states) | §04 |
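To preview how this notation combines, here is an illustrative sketch (a one-dimensional mixture; all names and values are assumptions, not the chapter's own code) that computes the responsibilities $\gamma_{ik}$ from the mixing weights $\pi_k$ and the current component parameters $\boldsymbol{\theta}^{(t)}$:

```python
import numpy as np

def responsibilities(y, pi, mu, sigma2):
    """gamma[i, k]: posterior probability that sample i came from component k.

    y: samples, shape (n,); pi, mu, sigma2: per-component parameters, shape (K,).
    """
    # log of pi_k * N(y_i; mu_k, sigma2_k), shape (n, K)
    log_w = (np.log(pi)
             - 0.5 * np.log(2 * np.pi * sigma2)
             - 0.5 * (y[:, None] - mu) ** 2 / sigma2)
    log_w -= log_w.max(axis=1, keepdims=True)   # subtract row max for stability
    w = np.exp(log_w)
    return w / w.sum(axis=1, keepdims=True)     # normalize: each row sums to 1
```

With two well-separated components, each sample's responsibility concentrates almost entirely on the nearer component.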