Chapter 6 Summary: Maximum Likelihood Estimation

Key Points

1. The MLE maximizes the likelihood (or log-likelihood) of the observed data: $g_{\text{ml}}(\mathbf{y}) = \arg\max_\theta \ell_n(\theta)$. At interior maxima it solves the score equation $\nabla_\theta \ell_n(\theta) = \mathbf{0}$ (sketch 1 after this list).

2. Under regularity conditions, the MLE is consistent ($\hat\theta_n \xrightarrow{p} \theta_0$), asymptotically normal ($\sqrt{n}(\hat\theta_n - \theta_0) \xrightarrow{d} \mathcal{N}(0, J_1(\theta_0)^{-1})$, where $J_1$ is the per-sample Fisher information), and asymptotically efficient, achieving the CRLB (sketch 2 below).

3. The invariance property: for any function $u$, $\widehat{u(\theta)}_{\text{ml}} = u(\hat\theta_{\text{ml}})$. This lets us transport MLEs across parameterizations without re-deriving them (sketch 3 below).

4. Support-dependent models (uniform on $[0,\theta]$ and similar) break regularity: the MLE converges at rate $n$ instead of $\sqrt{n}$, the limiting distribution is non-Gaussian, and the CRLB does not apply (sketch 4 below).

5. For Gaussian linear models $\mathbf{y} = \mathbf{A}\boldsymbol{\theta} + \mathbf{w}$ with $\mathbf{w} \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma})$, the MLE is the closed-form weighted least-squares estimate $(\mathbf{A}^\mathsf{T}\boldsymbol{\Sigma}^{-1}\mathbf{A})^{-1}\mathbf{A}^\mathsf{T}\boldsymbol{\Sigma}^{-1}\mathbf{y}$ and achieves the CRLB exactly (sketch 5 below).

6. Newton-Raphson uses the observed Hessian and converges quadratically near the solution but can diverge from poor starting points; Fisher scoring replaces the Hessian with the FIM, which is positive definite under regularity, so every step is an ascent direction. The two methods coincide for exponential families in canonical parameterization (sketch 6 below).

7. The periodogram peak is the ML frequency estimate for a single complex sinusoid in AWGN; the matched-filter peak is the ML delay estimate; and the direction-of-arrival (DOA) MLE reduces to MUSIC/ESPRIT in the multi-source, high-SNR regime (sketch 7 below).

8. In practice, iterative ML needs multi-start initialization for non-concave likelihoods, Armijo damping for Newton steps, and log-domain arithmetic for numerical stability (sketch 8 below).
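
Sketch 1 (point 1). A minimal numerical check of the score equation, assuming a toy exponential-rate model that is not from the chapter; the closed-form MLE $\hat\theta = 1/\bar{y}$ zeroes the score.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.exponential(scale=2.0, size=1000)  # toy data; true rate theta0 = 0.5

# For y_i ~ Exp(rate theta): l_n(theta) = n*log(theta) - theta*sum(y).
# The score n/theta - sum(y) vanishes at theta_hat = 1/mean(y).
theta_hat = 1.0 / y.mean()
score_at_mle = len(y) / theta_hat - y.sum()

print(theta_hat)      # close to the true rate 0.5
print(score_at_mle)   # zero up to floating-point error
```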
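
Sketch 2 (point 2). A Monte Carlo illustration of asymptotic normality and efficiency under the same assumed exponential model: the rate parameter has $J_1(\theta) = 1/\theta^2$, so $\sqrt{n}(\hat\theta_n - \theta_0)$ should have standard deviation close to $\theta_0$.

```python
import numpy as np

rng = np.random.default_rng(1)
theta0, n, trials = 0.5, 500, 2000

# MLE of the exponential rate is 1/ybar; per-sample Fisher information is
# J1(theta) = 1/theta^2, so the CRLB predicts an asymptotic standard
# deviation of sqrt(J1^{-1}) = theta0 for sqrt(n)*(theta_hat - theta0).
y = rng.exponential(scale=1.0 / theta0, size=(trials, n))
theta_hat = 1.0 / y.mean(axis=1)
z = np.sqrt(n) * (theta_hat - theta0)

print(z.std())  # empirical spread, close to theta0 = 0.5
```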
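
Sketch 3 (point 3). Invariance in one line, assuming a toy zero-mean Gaussian sample: once $\hat\sigma^2_{\text{ml}}$ is known, the MLE of $\sigma$ is its square root, with no new optimization.

```python
import numpy as np

rng = np.random.default_rng(2)
y = rng.normal(loc=0.0, scale=3.0, size=10_000)

# MLE of sigma^2 for a zero-mean Gaussian is the average of y^2.
sigma2_hat = np.mean(y**2)

# Invariance: the MLE of u(sigma^2) = sqrt(sigma^2) is u applied to the MLE.
sigma_hat = np.sqrt(sigma2_hat)
print(sigma2_hat, sigma_hat)  # close to 9.0 and 3.0
```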
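
Sketch 4 (point 4). The non-regular uniform case: the MLE of $\theta$ for $\mathcal{U}[0,\theta]$ is the sample maximum, and the scaled error $n(\theta_0 - \hat\theta_n)$ converges to an exponential (not Gaussian) limit with mean $\theta_0$. Numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
theta0, n, trials = 2.0, 1000, 3000

# MLE for Uniform[0, theta] is max(y); theta0 - max(y) shrinks like 1/n,
# and n*(theta0 - max(y)) has an Exponential limit with mean theta0.
y = rng.uniform(0.0, theta0, size=(trials, n))
err = n * (theta0 - y.max(axis=1))

print(err.mean())  # close to theta0 = 2.0, confirming the 1/n rate
```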
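
Sketch 5 (point 5). The closed-form Gaussian linear-model MLE on a small synthetic design with known heteroscedastic noise (all dimensions and values assumed for illustration); solving the normal equations avoids forming an explicit inverse.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 50, 3
A = rng.normal(size=(n, p))
theta_true = np.array([1.0, -2.0, 0.5])
sig2 = rng.uniform(0.5, 2.0, size=n)       # known noise variances
Sigma = np.diag(sig2)
y = A @ theta_true + rng.normal(size=n) * np.sqrt(sig2)

# Weighted least squares: (A^T Sigma^{-1} A)^{-1} A^T Sigma^{-1} y,
# computed via linear solves rather than matrix inversion.
SiA = np.linalg.solve(Sigma, A)            # Sigma^{-1} A
theta_hat = np.linalg.solve(A.T @ SiA, SiA.T @ y)
print(theta_hat)                           # close to theta_true
```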
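
Sketch 6 (point 6). Newton-Raphson versus Fisher scoring on an assumed Cauchy location model, chosen because the two differ there: the observed Hessian varies with the data (and can be indefinite far from the mode), while the Fisher information is the constant $n/2$.

```python
import numpy as np

rng = np.random.default_rng(5)
y = rng.standard_cauchy(1000) + 3.0  # Cauchy location model, theta0 = 3

def score(t):    # d/dtheta of the Cauchy log-likelihood
    r = y - t
    return np.sum(2 * r / (1 + r**2))

def hessian(t):  # observed Hessian; indefinite far from the mode
    r = y - t
    return np.sum(2 * (r**2 - 1) / (1 + r**2)**2)

fim = len(y) / 2                 # Fisher information: n/2 for Cauchy location

t_nr = t_fs = np.median(y)       # a good start; Newton can diverge from poor ones
for _ in range(20):
    t_nr -= score(t_nr) / hessian(t_nr)  # Newton-Raphson
    t_fs += score(t_fs) / fim            # Fisher scoring: always an ascent step
print(t_nr, t_fs)                        # both close to 3.0
```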
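
Sketch 7 (point 7). The periodogram as the ML frequency estimator for a single complex sinusoid in AWGN, on synthetic data; zero-padding the FFT refines the grid search for the peak.

```python
import numpy as np

rng = np.random.default_rng(6)
n, f0 = 256, 0.23
t = np.arange(n)
y = np.exp(2j * np.pi * f0 * t) \
    + (rng.normal(size=n) + 1j * rng.normal(size=n)) / np.sqrt(2)

# ML frequency estimate = peak of the periodogram |FFT|^2; an 8x
# zero-padded FFT gives a grid fine enough for a good estimate.
nfft = 8 * n
P = np.abs(np.fft.fft(y, nfft))**2
f_hat = np.argmax(P) / nfft
print(f_hat)  # close to 0.23
```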
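
Sketch 8 (point 8). Log-domain arithmetic via the log-sum-exp trick, the standard remedy when per-sample likelihoods underflow; the function name and values here are illustrative.

```python
import numpy as np

def logsumexp(a):
    """Stable log(sum(exp(a))): factor out the max before exponentiating."""
    m = np.max(a)
    return m + np.log(np.sum(np.exp(a - m)))

loglik = np.array([-1000.0, -1001.0, -1002.0])  # per-component log-likelihoods
print(np.log(np.sum(np.exp(loglik))))           # -inf: naive version underflows
print(logsumexp(loglik))                        # about -999.59: stable
```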

Looking Ahead

Chapter 7 develops Bayesian estimation: when a prior $p(\theta)$ is available, the MAP estimator replaces the MLE as the natural point estimate, and the MMSE/LMMSE estimators minimize expected squared error. Chapter 8 develops the EM algorithm for ML with latent variables, turning the intractable marginal likelihood of hidden-variable models into a sequence of tractable complete-data M-steps. The signal processing MLEs introduced here reappear throughout Part III (linear estimation, Kalman filtering, channel estimation) and Part IV (sparse recovery, massive random access).