Exercises
ex-ch08-01
Easy: State Jensen's inequality for a concave function $f$ and a random variable $X$. Apply it to $f = \log$ to derive the bound $\mathbb{E}[\log X] \le \log \mathbb{E}[X]$ for a positive random variable $X$.
A concave $f$ is one for which $-f$ is convex.
Remember that for convex $g$, $\mathbb{E}[g(X)] \ge g(\mathbb{E}[X])$.
Statement
If $f$ is concave and $X$ is a random variable with $\mathbb{E}|X| < \infty$, then $\mathbb{E}[f(X)] \le f(\mathbb{E}[X])$.
Apply to $\log$
The logarithm is concave on $(0, \infty)$. Hence $\mathbb{E}[\log X] \le \log \mathbb{E}[X]$ for positive $X$. Equality holds iff $X$ is almost surely constant.
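A quick numerical sanity check of the bound (a minimal sketch; the lognormal distribution, seed, and sample size are arbitrary choices for illustration):

```python
# Monte Carlo check of E[log X] <= log E[X] for a positive random variable.
import numpy as np

rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=100_000)  # positive X

lhs = np.mean(np.log(x))   # E[log X]
rhs = np.log(np.mean(x))   # log E[X]
print(f"E[log X] = {lhs:.4f} <= log E[X] = {rhs:.4f}: {lhs <= rhs}")
```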
ex-ch08-02
Medium: Show directly (without invoking the ELBO identity) that for any valid $q(\mathbf{z})$, $\mathcal{F}(q, \boldsymbol{\theta}) = \log p(\mathbf{x} \mid \boldsymbol{\theta}) - \mathrm{KL}\big(q(\mathbf{z}) \,\|\, p(\mathbf{z} \mid \mathbf{x}, \boldsymbol{\theta})\big)$.
Write $p(\mathbf{x}, \mathbf{z} \mid \boldsymbol{\theta}) = p(\mathbf{z} \mid \mathbf{x}, \boldsymbol{\theta})\, p(\mathbf{x} \mid \boldsymbol{\theta})$ and insert into $\mathcal{F}(q, \boldsymbol{\theta}) = \mathbb{E}_{q(\mathbf{z})}\!\big[\log p(\mathbf{x}, \mathbf{z} \mid \boldsymbol{\theta}) - \log q(\mathbf{z})\big]$.
Expand $\mathcal{F}$
$\mathcal{F}(q, \boldsymbol{\theta}) = \mathbb{E}_{q(\mathbf{z})}\!\big[\log p(\mathbf{z} \mid \mathbf{x}, \boldsymbol{\theta}) + \log p(\mathbf{x} \mid \boldsymbol{\theta}) - \log q(\mathbf{z})\big]$.
Separate terms
$\mathcal{F}(q, \boldsymbol{\theta}) = \log p(\mathbf{x} \mid \boldsymbol{\theta}) - \mathbb{E}_{q(\mathbf{z})}\!\Big[\log \tfrac{q(\mathbf{z})}{p(\mathbf{z} \mid \mathbf{x}, \boldsymbol{\theta})}\Big] = \log p(\mathbf{x} \mid \boldsymbol{\theta}) - \mathrm{KL}\big(q(\mathbf{z}) \,\|\, p(\mathbf{z} \mid \mathbf{x}, \boldsymbol{\theta})\big)$. Rearranging gives the stated identity.
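The identity can also be verified numerically for a toy discrete latent variable; the joint table `p_xz` and the distribution `q` below are made-up values, not from the text:

```python
# Check F(q, theta) = log p(x | theta) - KL(q || p(z | x, theta)) on a toy
# discrete latent variable z in {1, 2}.
import numpy as np

p_xz = np.array([0.12, 0.28])        # p(x, z = k) for one fixed observed x
q = np.array([0.6, 0.4])             # any valid distribution over z

log_px = np.log(p_xz.sum())          # log marginal likelihood
post = p_xz / p_xz.sum()             # posterior p(z | x)
elbo = np.sum(q * np.log(p_xz / q))  # F(q, theta)
kl = np.sum(q * np.log(q / post))    # KL(q || posterior)
print(np.isclose(elbo, log_px - kl)) # -> True
```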
ex-ch08-03
Easy: In GMM-EM, show that $\sum_{k=1}^{K} r_{nk} = 1$ for each $n$.
Look at the definition of the responsibility: the denominator is the same for all $k$.
Sum the numerators
$\sum_{k=1}^{K} r_{nk} = \frac{\sum_{k} \pi_k\, \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)}{\sum_{j} \pi_j\, \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j)} = 1$.
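A small numerical illustration (placeholder parameter values; relies on `scipy.stats.norm`):

```python
# Responsibilities share one denominator per sample, so each row sums to 1.
import numpy as np
from scipy.stats import norm

x = np.array([-1.0, 0.3, 2.5])                 # three 1-D samples
pis = np.array([0.2, 0.5, 0.3])                # mixing weights
mus = np.array([-2.0, 0.0, 3.0])
sigmas = np.array([1.0, 0.5, 1.5])

num = pis * norm.pdf(x[:, None], loc=mus, scale=sigmas)  # pi_k * N(x_n | mu_k, sigma_k^2)
resp = num / num.sum(axis=1, keepdims=True)              # normalize per sample
print(resp.sum(axis=1))                                  # -> [1. 1. 1.]
```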
ex-ch08-04
Medium: Derive the M-step update for the mixing weights $\pi_k$ in a GMM by maximizing $Q(\boldsymbol{\theta}, \boldsymbol{\theta}^{\text{old}}) = \sum_{n}\sum_{k} r_{nk} \log\big[\pi_k\, \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)\big]$ subject to $\sum_k \pi_k = 1$, $\pi_k \ge 0$. Identify the Lagrange multiplier.
Form the Lagrangian and differentiate.
The multiplier will be determined by the sum constraint.
Lagrangian
$\mathcal{L} = \sum_{n}\sum_{k} r_{nk} \log \pi_k + \lambda\Big(\sum_{k} \pi_k - 1\Big) + \text{terms not involving } \boldsymbol{\pi}$.
Stationarity
$\frac{\partial \mathcal{L}}{\partial \pi_k} = \frac{\sum_n r_{nk}}{\pi_k} + \lambda = 0 \;\Rightarrow\; \pi_k = -\frac{N_k}{\lambda}$, where $N_k = \sum_n r_{nk}$.
Constraint
$\sum_k \pi_k = -\frac{1}{\lambda}\sum_k N_k = -\frac{N}{\lambda} = 1$, so $\lambda = -N$. Hence $\pi_k = \frac{N_k}{N}$. Non-negativity holds automatically since $r_{nk} \ge 0$.
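A minimal sketch of this update on a toy responsibility matrix (the numbers are arbitrary):

```python
# M-step for the mixing weights: pi_k = N_k / N with soft counts N_k.
import numpy as np

resp = np.array([[0.9, 0.1],     # toy responsibilities; each row sums to 1
                 [0.2, 0.8],
                 [0.5, 0.5],
                 [0.7, 0.3]])
N_k = resp.sum(axis=0)           # soft count per component
pi_new = N_k / resp.shape[0]     # divide by N; sums to 1 by construction
print(pi_new, pi_new.sum())
```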
ex-ch08-05
Medium: Consider a two-component 1-D GMM with known equal variances $\sigma^2$ and known mixing weights $\pi_1 = \pi_2 = \tfrac{1}{2}$. Write out the E-step and M-step updates for $(\mu_1, \mu_2)$ explicitly. Show that the mapping is symmetric under swapping $\mu_1 \leftrightarrow \mu_2$.
The responsibility is a logistic sigmoid of the difference of the two components' log-densities.
E-step
$r_{n1} = \mathrm{sigm}\!\Big(\frac{(\mu_1 - \mu_2)\big(x_n - \frac{\mu_1 + \mu_2}{2}\big)}{\sigma^2}\Big)$ (logistic sigmoid); $r_{n2} = 1 - r_{n1}$.
M-step
$\mu_k^{\text{new}} = \frac{\sum_n r_{nk}\, x_n}{\sum_n r_{nk}}$ for $k = 1, 2$.
Symmetry
Swapping $\mu_1 \leftrightarrow \mu_2$ swaps $r_{n1} \leftrightarrow r_{n2}$ for every $n$, hence swaps the M-step outputs. The symmetric fixed point $\mu_1 = \mu_2$ is stationary; breaking symmetry in initialization is essential.
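A minimal sketch of the resulting E/M map, checking the swap symmetry numerically; the data, variance, and starting points are illustrative assumptions:

```python
# One EM iteration (E-step + M-step) for a two-component 1-D GMM with known
# shared variance and equal weights; swapping the means swaps the outputs.
import numpy as np

def em_map(mu1, mu2, x, sigma2=1.0):
    # E-step: r_n1 = sigm((mu1 - mu2) * (x_n - (mu1 + mu2) / 2) / sigma2)
    a = (mu1 - mu2) * (x - 0.5 * (mu1 + mu2)) / sigma2
    r1 = 1.0 / (1.0 + np.exp(-a))
    r2 = 1.0 - r1
    # M-step: responsibility-weighted means
    return (r1 @ x) / r1.sum(), (r2 @ x) / r2.sum()

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-2, 1, 200), rng.normal(2, 1, 200)])

print(em_map(-1.0, 1.0, x))   # (new mu1, new mu2)
print(em_map(1.0, -1.0, x))   # swapped inputs give swapped outputs
```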
ex-ch08-06
Hard: Prove Fisher's identity: for a latent-variable model with smooth densities, $\nabla_{\boldsymbol{\theta}} \log p(\mathbf{x} \mid \boldsymbol{\theta}) = \mathbb{E}_{p(\mathbf{z} \mid \mathbf{x}, \boldsymbol{\theta})}\big[\nabla_{\boldsymbol{\theta}} \log p(\mathbf{x}, \mathbf{z} \mid \boldsymbol{\theta})\big]$. State the regularity conditions needed to interchange differentiation and integration.
Differentiate under the integral sign.
Differentiate the marginal
Assume dominated convergence applies (an integrable bound on $\|\nabla_{\boldsymbol{\theta}}\, p(\mathbf{x}, \mathbf{z} \mid \boldsymbol{\theta})\|$, uniform in $\boldsymbol{\theta}$ locally). Then $\nabla_{\boldsymbol{\theta}}\, p(\mathbf{x} \mid \boldsymbol{\theta}) = \int \nabla_{\boldsymbol{\theta}}\, p(\mathbf{x}, \mathbf{z} \mid \boldsymbol{\theta})\, d\mathbf{z}$.
Log-derivative identity
$\nabla_{\boldsymbol{\theta}} \log p(\mathbf{x} \mid \boldsymbol{\theta}) = \frac{\nabla_{\boldsymbol{\theta}}\, p(\mathbf{x} \mid \boldsymbol{\theta})}{p(\mathbf{x} \mid \boldsymbol{\theta})} = \int \frac{p(\mathbf{x}, \mathbf{z} \mid \boldsymbol{\theta})}{p(\mathbf{x} \mid \boldsymbol{\theta})}\, \nabla_{\boldsymbol{\theta}} \log p(\mathbf{x}, \mathbf{z} \mid \boldsymbol{\theta})\, d\mathbf{z}$, using $\nabla_{\boldsymbol{\theta}}\, p(\mathbf{x}, \mathbf{z} \mid \boldsymbol{\theta}) = p(\mathbf{x}, \mathbf{z} \mid \boldsymbol{\theta})\, \nabla_{\boldsymbol{\theta}} \log p(\mathbf{x}, \mathbf{z} \mid \boldsymbol{\theta})$.
Recognize the posterior
The fraction equals $p(\mathbf{z} \mid \mathbf{x}, \boldsymbol{\theta})$, so the right-hand side is the posterior expectation of the complete-data score, as claimed.
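Fisher's identity can be spot-checked numerically for a simple two-component Gaussian mixture, differentiating with respect to $\mu_1$; all values below are illustrative:

```python
# Compare d/dmu1 log p(x) (finite difference) with the posterior expectation
# of the complete-data score, E_{p(z|x)}[d/dmu1 log p(x, z)] = r1 * (x - mu1).
import numpy as np
from scipy.stats import norm

def log_marginal(mu1, x=0.7, mu2=2.0):
    return np.log(0.5 * norm.pdf(x, mu1, 1) + 0.5 * norm.pdf(x, mu2, 1))

x, mu1, eps = 0.7, -1.0, 1e-6
lhs = (log_marginal(mu1 + eps) - log_marginal(mu1 - eps)) / (2 * eps)
r1 = 0.5 * norm.pdf(x, mu1, 1) / np.exp(log_marginal(mu1))
rhs = r1 * (x - mu1)   # complete-data score is x - mu1 for z = 1, 0 for z = 2
print(lhs, rhs)        # the two values should match closely
```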
ex-ch08-07
Hard: (Louis's formula.) Show that the observed Fisher information can be written as $-\nabla^2_{\boldsymbol{\theta}} \log p(\mathbf{x} \mid \boldsymbol{\theta}) = -\mathbb{E}_{p(\mathbf{z} \mid \mathbf{x}, \boldsymbol{\theta})}\big[\nabla^2_{\boldsymbol{\theta}} \log p(\mathbf{x}, \mathbf{z} \mid \boldsymbol{\theta})\big] - \mathrm{Cov}_{p(\mathbf{z} \mid \mathbf{x}, \boldsymbol{\theta})}\big[\nabla_{\boldsymbol{\theta}} \log p(\mathbf{x}, \mathbf{z} \mid \boldsymbol{\theta})\big]$.
Differentiate Fisher's identity once more.
Use the fact that the score has zero posterior mean at the MLE.
Differentiate Fisher's identity
Let $\mathbf{s}(\boldsymbol{\theta}; \mathbf{z}) = \nabla_{\boldsymbol{\theta}} \log p(\mathbf{x}, \mathbf{z} \mid \boldsymbol{\theta})$. Then Fisher's identity reads $\nabla_{\boldsymbol{\theta}} \log p(\mathbf{x} \mid \boldsymbol{\theta}) = \int p(\mathbf{z} \mid \mathbf{x}, \boldsymbol{\theta})\, \mathbf{s}(\boldsymbol{\theta}; \mathbf{z})\, d\mathbf{z}$. Differentiating once more using the product rule: $\nabla^2_{\boldsymbol{\theta}} \log p(\mathbf{x} \mid \boldsymbol{\theta}) = \int p(\mathbf{z} \mid \mathbf{x}, \boldsymbol{\theta})\, \nabla_{\boldsymbol{\theta}}\, \mathbf{s}(\boldsymbol{\theta}; \mathbf{z})\, d\mathbf{z} + \int \nabla_{\boldsymbol{\theta}}\, p(\mathbf{z} \mid \mathbf{x}, \boldsymbol{\theta})\, \mathbf{s}(\boldsymbol{\theta}; \mathbf{z})^{\top} d\mathbf{z}$.
Score identity for the posterior
$\nabla_{\boldsymbol{\theta}}\, p(\mathbf{z} \mid \mathbf{x}, \boldsymbol{\theta}) = p(\mathbf{z} \mid \mathbf{x}, \boldsymbol{\theta})\big[\mathbf{s}(\boldsymbol{\theta}; \mathbf{z}) - \nabla_{\boldsymbol{\theta}} \log p(\mathbf{x} \mid \boldsymbol{\theta})\big]$, since $\log p(\mathbf{z} \mid \mathbf{x}, \boldsymbol{\theta}) = \log p(\mathbf{x}, \mathbf{z} \mid \boldsymbol{\theta}) - \log p(\mathbf{x} \mid \boldsymbol{\theta})$.
Combine
Plugging in and simplifying yields Louis's formula. The first term is the expected complete-data Hessian; the second is (minus) the posterior covariance of the complete-data score.
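A similar spot check of Louis's formula for a two-component mixture with parameter $\mu_1$; the observed information is approximated by a finite difference, and all numbers are illustrative:

```python
# Observed information (finite-difference second derivative) vs. Louis's
# decomposition for a two-component mixture with parameter mu1.
import numpy as np
from scipy.stats import norm

def log_marginal(mu1, x=0.7, mu2=2.0):
    return np.log(0.5 * norm.pdf(x, mu1, 1) + 0.5 * norm.pdf(x, mu2, 1))

x, mu1, eps = 0.7, -1.0, 1e-4
obs_info = -(log_marginal(mu1 + eps) - 2 * log_marginal(mu1)
             + log_marginal(mu1 - eps)) / eps**2
# complete-data Hessian: -1 if z = 1, 0 if z = 2; complete-data score: (x - mu1) or 0
r1 = 0.5 * norm.pdf(x, mu1, 1) / np.exp(log_marginal(mu1))
louis = r1 - r1 * (1 - r1) * (x - mu1) ** 2   # E[-Hessian] - Var[score] under the posterior
print(obs_info, louis)                        # the two values should agree closely
```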
ex-ch08-08
Medium: Consider $N$ i.i.d. samples with $z_n \sim \mathcal{N}(\mu, \sigma^2)$, where $\sigma^2$ is known. Suppose, however, that only $x_n = \max(z_n, 0)$ is observed (the data are censored at zero). Set up the EM algorithm for estimating $\mu$. Identify what the latent variable is and write out the E-step.
The latent variable is $z_n$ (the uncensored value) for those $n$ with $x_n = 0$.
The E-step requires $\mathbb{E}[z_n \mid z_n \le 0, \mu^{\text{old}}]$, a truncated-Gaussian mean.
Complete data
For each censored $n$ (i.e. $x_n = 0$), the latent variable $z_n$ satisfies $z_n \le 0$ and $z_n \sim \mathcal{N}(\mu, \sigma^2)$ truncated to $(-\infty, 0]$.
E-step
For censored $n$: $\hat{z}_n = \mathbb{E}[z_n \mid z_n \le 0, \mu^{\text{old}}] = \mu^{\text{old}} - \sigma\, \frac{\phi(\mu^{\text{old}}/\sigma)}{\Phi(-\mu^{\text{old}}/\sigma)}$ (the mean of a Gaussian truncated to $(-\infty, 0]$), where $\phi, \Phi$ are the standard normal pdf/cdf. For uncensored $n$: $\hat{z}_n = x_n$.
M-step
$\mu^{\text{new}} = \frac{1}{N}\sum_{n=1}^{N} \hat{z}_n$, a simple sample mean over the imputed values.
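A minimal sketch of the full EM loop for this censored model, assuming synthetic data and a fixed iteration count:

```python
# EM for the zero-censored Gaussian mean with known sigma, as derived above.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
sigma, mu_true = 1.0, 0.7
z = rng.normal(mu_true, sigma, 5_000)
x = np.maximum(z, 0.0)            # observed data, censored at zero
censored = (x == 0.0)

mu = 0.0                          # initial guess
for _ in range(50):
    # E-step: impute each censored value with the truncated-Gaussian mean
    z_hat = x.copy()
    z_hat[censored] = mu - sigma * norm.pdf(mu / sigma) / norm.cdf(-mu / sigma)
    # M-step: plain sample mean of imputed/observed values
    mu = z_hat.mean()

print(f"EM estimate of mu: {mu:.3f} (true value {mu_true})")
```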
ex-ch08-09
Easy: True or false: if the EM iterates $\boldsymbol{\theta}^{(t)}$ converge to some $\boldsymbol{\theta}^\star$, then $\boldsymbol{\theta}^\star$ is a global maximum of the log-likelihood. Justify.
Think about non-convex optimization in general.
Answer
False. EM converges to a stationary point (a local maximum, a saddle point, or sometimes a point on the boundary of the parameter space). The log-likelihood for mixtures and most latent-variable models is non-concave, so local maxima are generic. Multi-start is the standard remedy.
ex-ch08-10
Hard: Show that for a GMM with full covariances and $N$ samples, the log-likelihood is unbounded above. Construct an explicit sequence of parameters along which it diverges.
Let one component's mean equal $\mathbf{x}_1$ and send its covariance to zero.
Construction
Fix $\boldsymbol{\mu}_1 = \mathbf{x}_1$ (say), $\boldsymbol{\Sigma}_1 = \epsilon \mathbf{I}$, $\pi_1 = \tfrac{1}{2}$, and keep the other component with any finite parameters covering all data.
Divergence
$\mathcal{N}(\mathbf{x}_1 \mid \boldsymbol{\mu}_1, \epsilon \mathbf{I}) = (2\pi\epsilon)^{-D/2} \to \infty$ as $\epsilon \to 0$. Hence the term for $\mathbf{x}_1$ diverges, while every other sample retains a finite log-density through the second component, and the total log-likelihood (a sum over samples) also diverges. Remedy: regularize the covariances.
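The divergence is easy to see numerically; the sketch below uses arbitrary toy data and an $\epsilon$ schedule chosen for illustration:

```python
# Log-likelihood of a 2-component GMM as one component collapses onto x[0].
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
x = rng.normal(0.0, 1.0, 50)

def loglik(eps):
    # component 1: mean x[0], variance eps; component 2: fixed N(0, 1)
    dens = 0.5 * norm.pdf(x, loc=x[0], scale=np.sqrt(eps)) \
         + 0.5 * norm.pdf(x, loc=0.0, scale=1.0)
    return np.log(dens).sum()

for eps in [1e-1, 1e-3, 1e-6, 1e-9]:
    print(f"variance eps = {eps:.0e}: log-likelihood = {loglik(eps):.1f}")
```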
ex-ch08-11
Medium: For K-means, show that Lloyd's algorithm monotonically decreases the distortion $J = \sum_{n=1}^{N} \big\|\mathbf{x}_n - \boldsymbol{\mu}_{c_n}\big\|^2$, where $c_n$ is the cluster to which $\mathbf{x}_n$ is assigned.
Both the assignment step and the update step separately decrease $J$ (more precisely, never increase it).
Assignment step
Holding the means $\{\boldsymbol{\mu}_k\}$ fixed, reassigning each $\mathbf{x}_n$ to its nearest mean minimizes its contribution to $J$, so $J$ does not increase.
Update step
Holding the assignments fixed, $J$ becomes a separable sum of quadratics in the $\boldsymbol{\mu}_k$; each is minimized by the mean of its cluster. So the update does not increase $J$ either. Combining, $J$ is monotonically non-increasing. Since $J \ge 0$ and there are finitely many possible assignments, Lloyd's algorithm converges in finitely many iterations.
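A minimal sketch of Lloyd's algorithm that records $J$ at every iteration and checks monotonicity; the data and initialization are arbitrary:

```python
# Lloyd's algorithm with the distortion J recorded once per iteration.
import numpy as np

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(m, 0.5, (100, 2)) for m in (-2.0, 0.0, 2.0)])
K = 3
mu = X[rng.choice(len(X), K, replace=False)]        # random initial means

history = []
for _ in range(20):
    # assignment step: nearest mean
    d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
    c = d2.argmin(axis=1)
    history.append(d2[np.arange(len(X)), c].sum())   # current distortion J
    # update step: cluster means (keep the old mean if a cluster empties)
    mu = np.array([X[c == k].mean(axis=0) if np.any(c == k) else mu[k]
                   for k in range(K)])

print(np.all(np.diff(history) <= 1e-9))              # J never increases
```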
ex-ch08-12
Medium: In the SBL model from Section 8.5, derive the M-step update $\alpha_j^{\text{new}} = \frac{1}{m_j^2 + \Sigma_{jj}}$, where $m_j$ and $\Sigma_{jj}$ are the posterior mean and variance of the weight $w_j$.
The complete-data log-likelihood (viewing $\mathbf{w}$ as latent) contains $\sum_j \big(\tfrac{1}{2}\log \alpha_j - \tfrac{1}{2}\alpha_j w_j^2\big)$.
Take the posterior expectation, then maximize over .
Expected log-prior
$\mathbb{E}\big[\log p(\mathbf{w} \mid \boldsymbol{\alpha})\big] = \sum_j \Big(\tfrac{1}{2}\log \alpha_j - \tfrac{1}{2}\alpha_j\, \mathbb{E}[w_j^2]\Big) + \text{const}$, with $\mathbb{E}[w_j^2] = m_j^2 + \Sigma_{jj}$.
Maximize
Differentiating with respect to $\alpha_j$: $\frac{1}{2\alpha_j} - \frac{1}{2}\mathbb{E}[w_j^2] = 0 \;\Rightarrow\; \alpha_j^{\text{new}} = \frac{1}{\mathbb{E}[w_j^2]} = \frac{1}{m_j^2 + \Sigma_{jj}}$.
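A tiny illustration of the update, assuming hypothetical posterior moments `m` and `Sigma_diag` (names invented for this sketch):

```python
# alpha_j <- 1 / (m_j^2 + Sigma_jj): large alpha prunes near-zero weights.
import numpy as np

m = np.array([0.8, -0.05, 0.0])              # hypothetical posterior means of w
Sigma_diag = np.array([0.04, 0.01, 0.002])   # hypothetical posterior variances of w
alpha_new = 1.0 / (m ** 2 + Sigma_diag)
print(alpha_new)
```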
ex-ch08-13
Easy: Why must responsibilities be strictly positive during EM iterations? What happens if $r_{nk} = 0$ for all $n$ and some $k$?
Look at $N_k = \sum_n r_{nk}$ in the M-step.
Answer
If $N_k = 0$, the M-step divisions become $0/0$ and the component is dead: it cannot recover, because zero responsibility gives $\pi_k = 0$, i.e. zero prior weight in the next E-step. In practice, maintain $r_{nk} > 0$ by regularizing (e.g. a small floor on the responsibilities or a prior on $\boldsymbol{\pi}$), or drop the collapsed component and restart with $K - 1$ components.
ex-ch08-14
Challenge: (Wu 1983.) Let $\ell(\boldsymbol{\theta}) = \log p(\mathbf{x} \mid \boldsymbol{\theta})$ be the incomplete-data log-likelihood, continuous on a closed bounded set $\Theta$. Show that if $\{\boldsymbol{\theta}^{(t)}\}$ is produced by EM, then every limit point is a stationary point of $\ell$, provided the map $Q(\boldsymbol{\theta}, \boldsymbol{\theta}')$ is continuous in both arguments.
Use monotonicity + boundedness to get convergence of $\ell(\boldsymbol{\theta}^{(t)})$.
Convert the fixed-point property of a limit point into a stationarity condition via Fisher's identity.
Convergence of likelihood
$\ell(\boldsymbol{\theta}^{(t)})$ is non-decreasing (Theorem 8.2) and bounded on $\Theta$, hence converges to some $\ell^\star$.
Fixed-point of a limit point
Let $\bar{\boldsymbol{\theta}}$ be a limit point, $\boldsymbol{\theta}^{(t_j)} \to \bar{\boldsymbol{\theta}}$. By continuity of the EM map $M$ and the fact that every accumulation point attains the same log-likelihood value $\ell^\star$, $\bar{\boldsymbol{\theta}}$ is a fixed point: $M(\bar{\boldsymbol{\theta}}) = \bar{\boldsymbol{\theta}}$.
Stationarity
Interior fixed points satisfy $\nabla_{\boldsymbol{\theta}} Q(\boldsymbol{\theta}, \bar{\boldsymbol{\theta}})\big|_{\boldsymbol{\theta} = \bar{\boldsymbol{\theta}}} = \mathbf{0}$. Fisher's identity gives $\nabla_{\boldsymbol{\theta}}\, \ell(\bar{\boldsymbol{\theta}}) = \nabla_{\boldsymbol{\theta}} Q(\boldsymbol{\theta}, \bar{\boldsymbol{\theta}})\big|_{\boldsymbol{\theta} = \bar{\boldsymbol{\theta}}} = \mathbf{0}$, so $\bar{\boldsymbol{\theta}}$ is a stationary point of $\ell$.
ex-ch08-15
Medium: For a mixture of two known-variance Gaussians with known means and unknown mixing weight $\pi$, write the EM update for $\pi$. Interpret the update as a weighted count.
Responsibility is $r_{n1} = \frac{\pi\, \mathcal{N}(x_n \mid \mu_1, \sigma_1^2)}{\pi\, \mathcal{N}(x_n \mid \mu_1, \sigma_1^2) + (1 - \pi)\, \mathcal{N}(x_n \mid \mu_2, \sigma_2^2)}$.
E-step
$r_{n1}$ as above; $r_{n2} = 1 - r_{n1}$.
M-step
$\pi^{\text{new}} = \frac{1}{N}\sum_{n=1}^{N} r_{n1}$, the average soft count of samples assigned to component 1.
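A minimal sketch of this EM loop on synthetic data (the component means, variances, and true weight are assumptions for illustration):

```python
# EM for the unknown mixing weight of two fixed Gaussians N(-2, 1) and N(2, 1).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
pi_true = 0.3
from_first = rng.random(2_000) < pi_true
x = np.where(from_first, rng.normal(-2, 1, 2_000), rng.normal(2, 1, 2_000))

pi = 0.5
for _ in range(100):
    # E-step: responsibility of component 1
    p1 = pi * norm.pdf(x, -2, 1)
    p2 = (1 - pi) * norm.pdf(x, 2, 1)
    r1 = p1 / (p1 + p2)
    # M-step: average soft count
    pi = r1.mean()

print(f"EM estimate of pi: {pi:.3f} (true value {pi_true})")
```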
ex-ch08-16
Hard: Consider EM for estimating the mean $\boldsymbol{\mu}$ of a Gaussian with known diagonal covariance, from $N$ i.i.d. observations, where each component of each observation is censored independently: we observe $x_{nd}$ if $x_{nd} > c$, else only the fact that $x_{nd} \le c$. Derive the EM update for $\boldsymbol{\mu}$.
Treat uncensored components as observed; censored ones as latent.
For a censored component: $\mathbb{E}[x_{nd} \mid x_{nd} \le c]$ is a truncated-Gaussian integral.
Imputation
For each censored component $d$ of observation $n$, let $\hat{x}_{nd} = \mathbb{E}\big[x_{nd} \mid x_{nd} \le c, \boldsymbol{\mu}^{\text{old}}\big] = \mu_d^{\text{old}} - \sigma_d\, \frac{\phi(\beta_d)}{\Phi(\beta_d)}$ with $\beta_d = (c - \mu_d^{\text{old}})/\sigma_d$; for uncensored components, $\hat{x}_{nd} = x_{nd}$.
M-step
$\boldsymbol{\mu}^{\text{new}} = \frac{1}{N}\sum_{n=1}^{N} \hat{\mathbf{x}}_n$ (component-wise, using imputed or observed values).
ex-ch08-17
Medium: Show that the EM updates for a GMM can equivalently be derived by maximizing the ELBO $\mathcal{F}(q, \boldsymbol{\theta})$ by alternating coordinate ascent, where $q(\mathbf{z})$ factorizes as $\prod_n q_n(z_n)$ (which it does exactly for a GMM).
Optimizing over $q$ at fixed $\boldsymbol{\theta}$ yields $q_n(z_n = k) = r_{nk}$.
q-step
$\max_q \mathcal{F}(q, \boldsymbol{\theta})$ at fixed $\boldsymbol{\theta}$ has the unconstrained solution $q_n(z_n) = p(z_n \mid \mathbf{x}_n, \boldsymbol{\theta})$, i.e. $q_n(z_n = k) = r_{nk}$.
$\boldsymbol{\theta}$-step
$\max_{\boldsymbol{\theta}} \mathcal{F}(q, \boldsymbol{\theta})$ with $q$ fixed reduces to maximizing $\mathbb{E}_q\big[\log p(\mathbf{X}, \mathbf{Z} \mid \boldsymbol{\theta})\big] = Q(\boldsymbol{\theta}, \boldsymbol{\theta}^{\text{old}})$, which recovers the Theorem 8.4 updates.
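A compact sketch of the alternating scheme for a 1-D GMM, tracking the log-likelihood to confirm monotone ascent; data and initialization are arbitrary:

```python
# Alternating q-step / theta-step for a 1-D GMM; the log-likelihood is monotone.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)
x = np.concatenate([rng.normal(-2, 0.7, 200), rng.normal(1.5, 0.7, 300)])
K, N = 2, len(x)
pi, mu, sig = np.full(K, 1 / K), np.array([-1.0, 1.0]), np.ones(K)

loglik = []
for _ in range(50):
    # q-step (E-step): q_n(k) = r_nk, the posterior over the latent component
    dens = pi * norm.pdf(x[:, None], mu, sig)
    r = dens / dens.sum(axis=1, keepdims=True)
    loglik.append(np.log(dens.sum(axis=1)).sum())
    # theta-step (M-step): maximize E_q[log p(X, Z | theta)]
    Nk = r.sum(axis=0)
    pi = Nk / N
    mu = (r * x[:, None]).sum(axis=0) / Nk
    sig = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / Nk)

print(np.all(np.diff(loglik) >= -1e-9))   # monotone non-decreasing
```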
ex-ch08-18
Easy: A student claims: "EM always decreases the KL divergence between $q(\mathbf{z})$ and the posterior." Is this correct? If yes, justify; if no, correct it.
The E-step sends the KL to zero, but the M-step changes the posterior.
Answer
Partially correct. The E-step sets $q(\mathbf{z}) = p(\mathbf{z} \mid \mathbf{x}, \boldsymbol{\theta}^{\text{old}})$, so the KL drops to zero with respect to the current parameters. The M-step then moves $\boldsymbol{\theta}$, which may increase the KL to the new posterior. The quantity that monotonically increases is the log-likelihood $\log p(\mathbf{x} \mid \boldsymbol{\theta}^{(t)})$, not the negative KL.
ex-ch08-19
Hard: (Speed of convergence.) Near a local maximum $\boldsymbol{\theta}^\star$, show that EM has linear convergence rate equal to the largest eigenvalue of $\mathbf{I} - \mathbf{I}_c(\boldsymbol{\theta}^\star)^{-1}\, \mathbf{I}_{\text{obs}}(\boldsymbol{\theta}^\star)$, where $\mathbf{I}_c$ is the expected complete-data Fisher information and $\mathbf{I}_{\text{obs}}$ is the observed Fisher information. Interpret the ratio $\mathbf{I}_c^{-1}\mathbf{I}_{\text{obs}}$ as the "fraction of information not missing."
Linearize the EM map around $\boldsymbol{\theta}^\star$; use Louis's formula for the observed Fisher information.
Linearize
Write the EM map as $\boldsymbol{\theta}^{(t+1)} = M(\boldsymbol{\theta}^{(t)})$. At the fixed point $\boldsymbol{\theta}^\star = M(\boldsymbol{\theta}^\star)$, the Jacobian is $\nabla M(\boldsymbol{\theta}^\star) = \mathbf{I} - \mathbf{I}_c(\boldsymbol{\theta}^\star)^{-1}\, \mathbf{I}_{\text{obs}}(\boldsymbol{\theta}^\star)$ (Dempster-Laird-Rubin 1977, §3.2).
Rate
The convergence rate is the spectral radius of $\nabla M(\boldsymbol{\theta}^\star)$. When little information is missing ($\mathbf{I}_{\text{obs}} \approx \mathbf{I}_c$) the rate is small (fast convergence). When much is missing ($\mathbf{I}_{\text{obs}} \ll \mathbf{I}_c$) the rate approaches 1 (slow convergence), which motivates acceleration methods.
ex-ch08-20
Medium: You are given 1-D data from a 3-component GMM. You fit it with several values of $K$ using EM and pick the one with the highest log-likelihood on the fitting data. Why is this procedure flawed? Which criterion should you use instead?
More components = more parameters = higher training likelihood, always.
Flaw
Log-likelihood on the training data increases monotonically with $K$ (more flexibility, more overfitting). In the extreme case $K = N$, each component can collapse to a Dirac at a data point and the likelihood diverges.
Remedies
Use a criterion that penalizes complexity: BIC $= \log \hat{L} - \tfrac{d}{2}\log N$ (where $d$ is the number of free parameters), AIC $= \log \hat{L} - d$, or cross-validated likelihood. For GMMs, BIC is standard.
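A short sketch of BIC-based selection, assuming scikit-learn's `GaussianMixture` is available (note its BIC convention is lower-is-better) and using toy data:

```python
# Fit GMMs with increasing K; training log-likelihood keeps rising, but BIC
# penalizes the extra parameters and favors a parsimonious model.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(6)
x = np.concatenate([rng.normal(m, 0.4, 150) for m in (-3.0, 0.0, 3.0)])[:, None]

for K in range(1, 7):
    gm = GaussianMixture(n_components=K, random_state=0).fit(x)
    print(f"K={K}: train loglik={gm.score(x) * len(x):9.1f}  BIC={gm.bic(x):9.1f}")
```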