Prerequisites & Notation

Before You Begin

Maximum likelihood is the most widely used estimation principle in engineering statistics. Before studying its properties and computation, make sure you are comfortable with the following material.

  • Fisher information and the Cramér-Rao bound (Review ch05)

    Self-check: State the Cramér-Rao inequality for a scalar unbiased estimator and define the Fisher information.

  • Score function and regularity conditions (Review ch05)

    Self-check: Show that the expected score is zero under regularity.

  • Exponential family and sufficient statistics (Review ch05)

    Self-check: Identify the natural parameter and sufficient statistic of the Gaussian family with unknown mean.

  • Convergence in probability and in distribution, WLLN, CLT (Review ch04)

    Self-check: State the CLT for i.i.d. random variables with finite variance.

  • Multivariate Gaussian density, quadratic forms (Review ch02)

    Self-check: Write the log-density of a multivariate Gaussian as a function of the inverse covariance.

  • Unconstrained optimization, gradient, Hessian

    Self-check: Write the Newton update for a twice-differentiable scalar objective.

Notation Used in This Chapter

This chapter uses the estimation notation from Chapter 5 and adds iteration-related symbols. Function arguments are shown in parentheses after each symbol.

| Symbol | Meaning | Introduced |
|---|---|---|
| $\theta, \theta_0 \in \Lambda$ | Unknown parameter (generic, true value); parameter domain. | s01 |
| $\mathbf{y} \in \mathcal{Y}$ | Observation vector $\mathbf{y} = (y_1, \ldots, y_n)$. | s01 |
| $f_\theta(\mathbf{y})$ | Likelihood function, the density of the observation at parameter $\theta$. | s01 |
| $\ell_n(\theta)$ | Log-likelihood $\ell_n(\theta) = \log f_\theta(\mathbf{y})$. | s01 |
| $s(\theta; y)$ | Score $s(\theta; y) = \partial \log f_\theta(y)/\partial \theta$. | s01 |
| $g_{\text{ml}}(\mathbf{y})$ | Maximum likelihood estimator. | s01 |
| $J(\theta)$ | Fisher information (scalar parameter). | s02 |
| $\mathbf{J}(\boldsymbol{\theta})$ | Fisher information matrix (vector parameter). | s02 |
| $J_1(\theta)$ | Per-sample Fisher information, with $J(\theta) = n J_1(\theta)$ in the i.i.d. case. | s02 |
| $\xrightarrow{p}, \xrightarrow{d}$ | Convergence in probability, convergence in distribution. | s02 |
| $\theta^{(k)}$ | $k$-th iterate of Newton-Raphson or Fisher scoring. | s03 |
| $D(f_{\theta_0} \| f_\theta)$ | KL divergence between the true density and the candidate density. | s02 |
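As a quick illustration of the notation in the table (a hypothetical sketch, not code from the chapter), the following evaluates $\ell_n(\theta)$ and $s(\theta; \mathbf{y})$ for i.i.d. Gaussian observations with unknown mean $\theta$ and known variance, and checks that the score vanishes at the MLE $g_{\text{ml}}(\mathbf{y}) = \bar{y}$:

```python
import math

# Illustrative only: Gaussian with unknown mean theta, known variance sigma2.
def log_likelihood(theta, y, sigma2=1.0):
    """l_n(theta) = log f_theta(y) for i.i.d. Gaussian observations."""
    n = len(y)
    return (-0.5 * n * math.log(2 * math.pi * sigma2)
            - sum((yi - theta) ** 2 for yi in y) / (2 * sigma2))

def score(theta, y, sigma2=1.0):
    """s(theta; y) = d/dtheta log f_theta(y) = sum(y_i - theta) / sigma2."""
    return sum(yi - theta for yi in y) / sigma2

y = [0.3, -1.2, 0.8, 2.0]
theta_ml = sum(y) / len(y)   # MLE g_ml(y) is the sample mean
score(theta_ml, y)           # vanishes (up to rounding) at the MLE
```

Setting the score to zero and solving recovers the sample mean, which previews the estimating-equation view of maximum likelihood developed in this chapter.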