Chapter 6 Summary: Maximum Likelihood Estimation

Key Points

1. The MLE maximizes the likelihood (or log-likelihood) of the observed data: $g_{\text{ml}}(\mathbf{y}) = \arg\max_\theta \ell_n(\theta)$. At interior maxima it solves the score equation $\nabla_\theta \ell_n(\theta) = \mathbf{0}$ (sketch 1 after this list).

2. Under regularity conditions, the MLE is consistent ($\hat\theta_n \xrightarrow{p} \theta_0$), asymptotically normal ($\sqrt{n}(\hat\theta_n - \theta_0) \xrightarrow{d} \mathcal{N}(0, J_1(\theta_0)^{-1})$, where $J_1$ is the per-sample Fisher information), and asymptotically efficient, achieving the CRLB (sketch 2 below).

3. The invariance property: for any function $u$, $\widehat{u(\theta)}_{\text{ml}} = u(\hat\theta_{\text{ml}})$. This lets us transport MLEs across parameterizations without re-deriving them (sketch 3 below).

4. Support-dependent models (uniform on $[0,\theta]$ and similar) break regularity: the MLE converges at rate $n$ instead of $\sqrt{n}$, the limiting distribution is non-Gaussian, and the CRLB does not apply (sketch 4 below).

5. For Gaussian linear models $\mathbf{y} = \mathbf{A}\boldsymbol{\theta} + \mathbf{w}$ with $\mathbf{w} \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma})$, the MLE is the closed-form weighted least-squares estimate $(\mathbf{A}^\mathsf{T}\boldsymbol{\Sigma}^{-1}\mathbf{A})^{-1}\mathbf{A}^\mathsf{T}\boldsymbol{\Sigma}^{-1}\mathbf{y}$ and achieves the CRLB exactly (sketch 5 below).

6. Newton-Raphson uses the observed Hessian and converges quadratically near the solution but can diverge from poor starting points; Fisher scoring replaces the Hessian with the FIM, which is positive definite under regularity, so every step is an ascent direction. The two methods coincide for exponential families in canonical parameterization (sketch 6 below).

7. The periodogram peak is the ML frequency estimate for a single complex sinusoid in AWGN; the matched-filter peak is the ML delay estimate; and the direction-of-arrival (DOA) MLE reduces to MUSIC/ESPRIT in the multi-source, high-SNR regime (sketch 7 below).

8. In practice, iterative ML needs multi-start initialization for non-concave likelihoods, Armijo damping for Newton steps, and log-domain arithmetic for numerical stability (sketch 8 below).
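
Sketch 1 (point 1). A minimal numerical check of the score equation, assuming a toy exponential-rate model that is not from the chapter; the closed-form MLE $\hat\theta = 1/\bar{y}$ zeroes the score.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.exponential(scale=2.0, size=1000)  # toy data; true rate theta0 = 0.5

# For y_i ~ Exp(rate theta): l_n(theta) = n*log(theta) - theta*sum(y).
# The score n/theta - sum(y) vanishes at theta_hat = 1/mean(y).
theta_hat = 1.0 / y.mean()
score_at_mle = len(y) / theta_hat - y.sum()

print(theta_hat)      # close to the true rate 0.5
print(score_at_mle)   # zero up to floating-point error
```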
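
Sketch 2 (point 2). A Monte Carlo illustration of asymptotic normality and efficiency under the same assumed exponential model: the rate parameter has $J_1(\theta) = 1/\theta^2$, so $\sqrt{n}(\hat\theta_n - \theta_0)$ should have standard deviation close to $\theta_0$.

```python
import numpy as np

rng = np.random.default_rng(1)
theta0, n, trials = 0.5, 500, 2000

# MLE of the exponential rate is 1/ybar; per-sample Fisher information is
# J1(theta) = 1/theta^2, so the CRLB predicts an asymptotic standard
# deviation of sqrt(J1^{-1}) = theta0 for sqrt(n)*(theta_hat - theta0).
y = rng.exponential(scale=1.0 / theta0, size=(trials, n))
theta_hat = 1.0 / y.mean(axis=1)
z = np.sqrt(n) * (theta_hat - theta0)

print(z.std())  # empirical spread, close to theta0 = 0.5
```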
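
Sketch 3 (point 3). Invariance in one line, assuming a toy zero-mean Gaussian sample: once $\hat\sigma^2_{\text{ml}}$ is known, the MLE of $\sigma$ is its square root, with no new optimization.

```python
import numpy as np

rng = np.random.default_rng(2)
y = rng.normal(loc=0.0, scale=3.0, size=10_000)

# MLE of sigma^2 for a zero-mean Gaussian is the average of y^2.
sigma2_hat = np.mean(y**2)

# Invariance: the MLE of u(sigma^2) = sqrt(sigma^2) is u applied to the MLE.
sigma_hat = np.sqrt(sigma2_hat)
print(sigma2_hat, sigma_hat)  # close to 9.0 and 3.0
```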
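
Sketch 4 (point 4). The non-regular uniform case: the MLE of $\theta$ for $\mathcal{U}[0,\theta]$ is the sample maximum, and the scaled error $n(\theta_0 - \hat\theta_n)$ converges to an exponential (not Gaussian) limit with mean $\theta_0$. Numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
theta0, n, trials = 2.0, 1000, 3000

# MLE for Uniform[0, theta] is max(y); theta0 - max(y) shrinks like 1/n,
# and n*(theta0 - max(y)) has an Exponential limit with mean theta0.
y = rng.uniform(0.0, theta0, size=(trials, n))
err = n * (theta0 - y.max(axis=1))

print(err.mean())  # close to theta0 = 2.0, confirming the 1/n rate
```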
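
Sketch 5 (point 5). The closed-form Gaussian linear-model MLE on a small synthetic design with known heteroscedastic noise (all dimensions and values assumed for illustration); solving the normal equations avoids forming an explicit inverse.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 50, 3
A = rng.normal(size=(n, p))
theta_true = np.array([1.0, -2.0, 0.5])
sig2 = rng.uniform(0.5, 2.0, size=n)       # known noise variances
Sigma = np.diag(sig2)
y = A @ theta_true + rng.normal(size=n) * np.sqrt(sig2)

# Weighted least squares: (A^T Sigma^{-1} A)^{-1} A^T Sigma^{-1} y,
# computed via linear solves rather than matrix inversion.
SiA = np.linalg.solve(Sigma, A)            # Sigma^{-1} A
theta_hat = np.linalg.solve(A.T @ SiA, SiA.T @ y)
print(theta_hat)                           # close to theta_true
```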
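
Sketch 6 (point 6). Newton-Raphson versus Fisher scoring on an assumed Cauchy location model, chosen because the two differ there: the observed Hessian varies with the data (and can be indefinite far from the mode), while the Fisher information is the constant $n/2$.

```python
import numpy as np

rng = np.random.default_rng(5)
y = rng.standard_cauchy(1000) + 3.0  # Cauchy location model, theta0 = 3

def score(t):    # d/dtheta of the Cauchy log-likelihood
    r = y - t
    return np.sum(2 * r / (1 + r**2))

def hessian(t):  # observed Hessian; indefinite far from the mode
    r = y - t
    return np.sum(2 * (r**2 - 1) / (1 + r**2)**2)

fim = len(y) / 2                 # Fisher information: n/2 for Cauchy location

t_nr = t_fs = np.median(y)       # a good start; Newton can diverge from poor ones
for _ in range(20):
    t_nr -= score(t_nr) / hessian(t_nr)  # Newton-Raphson
    t_fs += score(t_fs) / fim            # Fisher scoring: always an ascent step
print(t_nr, t_fs)                        # both close to 3.0
```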
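
Sketch 7 (point 7). The periodogram as the ML frequency estimator for a single complex sinusoid in AWGN, on synthetic data; zero-padding the FFT refines the grid search for the peak.

```python
import numpy as np

rng = np.random.default_rng(6)
n, f0 = 256, 0.23
t = np.arange(n)
y = np.exp(2j * np.pi * f0 * t) \
    + (rng.normal(size=n) + 1j * rng.normal(size=n)) / np.sqrt(2)

# ML frequency estimate = peak of the periodogram |FFT|^2; an 8x
# zero-padded FFT gives a grid fine enough for a good estimate.
nfft = 8 * n
P = np.abs(np.fft.fft(y, nfft))**2
f_hat = np.argmax(P) / nfft
print(f_hat)  # close to 0.23
```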
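
Sketch 8 (point 8). Log-domain arithmetic via the log-sum-exp trick, the standard remedy when per-sample likelihoods underflow; the function name and values here are illustrative.

```python
import numpy as np

def logsumexp(a):
    """Stable log(sum(exp(a))): factor out the max before exponentiating."""
    m = np.max(a)
    return m + np.log(np.sum(np.exp(a - m)))

loglik = np.array([-1000.0, -1001.0, -1002.0])  # per-component log-likelihoods
print(np.log(np.sum(np.exp(loglik))))           # -inf: naive version underflows
print(logsumexp(loglik))                        # about -999.59: stable
```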

Looking Ahead

Chapter 7 develops Bayesian estimation: when a prior $p(\theta)$ is available, the MAP estimator replaces the MLE as the natural point estimate, and the MMSE/LMMSE estimators minimize expected squared error. Chapter 8 develops the EM algorithm for ML with latent variables, turning the intractable marginal likelihood of hidden-variable models into a sequence of tractable complete-data M-steps. The signal processing MLEs introduced here reappear throughout Part III (linear estimation, Kalman filtering, channel estimation) and Part IV (sparse recovery, massive random access).