Prerequisites & Notation

Before You Begin

The EM algorithm sits at the intersection of maximum-likelihood estimation, convex analysis, and Bayesian inference. The following items are assumed. If any of them feels unfamiliar, revisit the linked chapter before proceeding.

  • Maximum likelihood estimation: score function, Fisher information, asymptotic normality (Review ch07)

    Self-check: Can you write down the MLE for the mean and variance of a univariate Gaussian from $n$ i.i.d. samples?

  • Multivariate Gaussian distribution: joint/marginal/conditional, precision matrix, log-density (Review ch02)

    Self-check: Can you compute $\log \mathcal{N}(\mathbf{y};\boldsymbol{\mu},\boldsymbol{\Sigma})$ and identify the quadratic term?

  • Jensen's inequality and convex functions; concavity of the logarithm

    Self-check: Can you state Jensen's inequality and identify which direction applies to $\log$?

  • Kullback-Leibler divergence and its non-negativity (Review ch02)

    Self-check: Can you prove $D(p \Vert q) \geq 0$, with equality iff $p = q$ almost everywhere?

  • Basic posterior computation: Bayes' rule, conjugate updates (Review ch06)

    Self-check: Given a likelihood and prior, can you compute the unnormalized posterior and identify its normalizing constant?
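The first self-check (the Gaussian MLE) is easy to verify numerically. A minimal sketch, with illustrative data and seed: the MLEs are the sample mean and the biased sample variance (divide by $n$, not $n-1$).

```python
import numpy as np

# Illustrative data: 10,000 draws from N(2, 3^2); seed chosen arbitrarily.
rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, scale=3.0, size=10_000)

mu_hat = y.mean()                         # MLE of the mean: sample average
sigma2_hat = ((y - mu_hat) ** 2).mean()   # MLE of the variance: 1/n (biased), not 1/(n-1)
```

With this sample size, both estimates land close to the true values (2 and 9).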
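For the multivariate-Gaussian self-check, the log-density can be evaluated directly from the formula; the function and variable names below are illustrative. The quadratic term the question asks about is the Mahalanobis distance $(\mathbf{y}-\boldsymbol{\mu})^\top \boldsymbol{\Sigma}^{-1}(\mathbf{y}-\boldsymbol{\mu})$.

```python
import numpy as np

def log_mvn_density(y, mu, Sigma):
    """log N(y; mu, Sigma) for a d-dimensional Gaussian (illustrative sketch)."""
    d = len(mu)
    diff = y - mu
    # Quadratic term: (y - mu)^T Sigma^{-1} (y - mu), via a linear solve
    quad = diff @ np.linalg.solve(Sigma, diff)
    # slogdet avoids overflow/underflow in the determinant
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + quad)
```

At $\mathbf{y} = \boldsymbol{\mu}$ with $\boldsymbol{\Sigma} = \mathbf{I}_2$, the quadratic term vanishes and the value reduces to $-\log(2\pi)$.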
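The Jensen self-check can also be sanity-checked numerically (the values below are illustrative): since $\log$ is concave, Jensen's inequality runs as $\mathbb{E}[\log X] \leq \log \mathbb{E}[X]$ for a positive random variable $X$.

```python
import numpy as np

# A four-point distribution chosen arbitrarily for illustration.
x = np.array([0.5, 1.0, 2.0, 4.0])
w = np.array([0.25, 0.25, 0.25, 0.25])   # probabilities, sum to 1

lhs = np.sum(w * np.log(x))   # E[log X]
rhs = np.log(np.sum(w * x))   # log E[X]
```

Equality holds only when $X$ is degenerate; for any non-constant $X$, as here, the inequality is strict.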
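The KL self-check has a direct discrete counterpart (the two distributions below are illustrative): $D(p \Vert q) = \sum_i p_i \log(p_i / q_i)$ is non-negative and vanishes iff $p = q$.

```python
import numpy as np

def kl(p, q):
    """D(p || q) for discrete distributions, with the 0 log 0 := 0 convention."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p = [0.1, 0.4, 0.5]
q = [0.3, 0.3, 0.4]
```

Note also that KL is not symmetric: `kl(p, q)` and `kl(q, p)` generally differ.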
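For the conjugacy self-check, the Beta-Bernoulli pair gives the cleanest closed form; the hyperparameters below are illustrative. A Beta$(a, b)$ prior on a Bernoulli rate, after $k$ successes in $n$ trials, updates to Beta$(a + k,\, b + n - k)$, and the normalizing constant of the unnormalized posterior is the Beta function $B(a + k, b + n - k)$.

```python
# Conjugate-update sketch (hyperparameters a, b are illustrative).
# Prior: rate ~ Beta(a, b); likelihood: k successes in n Bernoulli trials.
# Unnormalized posterior: rate^(a+k-1) * (1-rate)^(b+n-k-1),
# whose normalizing constant is the Beta function B(a+k, b+n-k).
def beta_bernoulli_posterior(a, b, k, n):
    return a + k, b + (n - k)

a_post, b_post = beta_bernoulli_posterior(1.0, 1.0, 7, 10)  # uniform prior
posterior_mean = a_post / (a_post + b_post)                 # (a+k)/(a+b+n)
```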

Notation for This Chapter

Symbols used throughout Chapter 8. We write $\mathbf{Y}$ for the observed (incomplete) data, $\mathbf{Z}$ for the latent (hidden) variables, and $\boldsymbol{\theta}$ for the parameters to be estimated.

| Symbol | Meaning | Introduced |
| --- | --- | --- |
| $\mathbf{Y}$ | Observed (incomplete) data; the sample we actually see | §01 |
| $\mathbf{Z}$ | Latent / hidden / missing variables (never observed) | §01 |
| $\boldsymbol{\theta}$ | Parameter vector to be estimated | §01 |
| $p(\mathbf{y},\mathbf{z};\boldsymbol{\theta})$ | Complete-data joint density | §01 |
| $p(\mathbf{y};\boldsymbol{\theta})$ | Incomplete-data (marginal) density: $\int p(\mathbf{y},\mathbf{z};\boldsymbol{\theta})\,d\mathbf{z}$ | §01 |
| $\ell(\boldsymbol{\theta}) = \log p(\mathbf{y};\boldsymbol{\theta})$ | Incomplete-data log-likelihood (the objective we want to maximize) | §01 |
| $\boldsymbol{\theta}^{(t)}$ | Parameter estimate at iteration $t$ | §03 |
| $Q(\boldsymbol{\theta}\mid\boldsymbol{\theta}^{(t)})$ | EM auxiliary function: $\mathbb{E}_{\mathbf{Z}\mid\mathbf{Y},\boldsymbol{\theta}^{(t)}}[\log p(\mathbf{Y},\mathbf{Z};\boldsymbol{\theta})]$ | §03 |
| $\mathcal{F}(q,\boldsymbol{\theta})$ | Free energy / evidence lower bound (ELBO) | §02 |
| $\gamma_{ik}$ | Responsibility: posterior probability that sample $i$ belongs to component $k$ | §04 |
| $\pi_k$ | Mixing weight of component $k$ in a mixture model | §04 |
| $K$ | Number of mixture components (or HMM states) | §04 |
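To preview how this notation combines, here is an illustrative sketch (a one-dimensional mixture; all names and values are assumptions, not the chapter's own code) that computes the responsibilities $\gamma_{ik}$ from the mixing weights $\pi_k$ and the current component parameters $\boldsymbol{\theta}^{(t)}$:

```python
import numpy as np

def responsibilities(y, pi, mu, sigma2):
    """gamma[i, k]: posterior probability that sample i came from component k.

    y: samples, shape (n,); pi, mu, sigma2: per-component parameters, shape (K,).
    """
    # log of pi_k * N(y_i; mu_k, sigma2_k), shape (n, K)
    log_w = (np.log(pi)
             - 0.5 * np.log(2 * np.pi * sigma2)
             - 0.5 * (y[:, None] - mu) ** 2 / sigma2)
    log_w -= log_w.max(axis=1, keepdims=True)   # subtract row max for stability
    w = np.exp(log_w)
    return w / w.sum(axis=1, keepdims=True)     # normalize: each row sums to 1
```

With two well-separated components, each sample's responsibility concentrates almost entirely on the nearer component.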