Prerequisites & Notation

Before You Begin

Maximum likelihood is the most widely used estimation principle in engineering statistics. Before studying its properties and computation, make sure you are comfortable with the following material.

  • Fisher information and the Cramér-Rao bound (Review ch05)

    Self-check: State the Cramér-Rao inequality for a scalar unbiased estimator and define the Fisher information.

  • Score function and regularity conditions (Review ch05)

    Self-check: Show that the expected score is zero under regularity.

  • Exponential family and sufficient statistics (Review ch05)

    Self-check: Identify the natural parameter and sufficient statistic of the Gaussian family with unknown mean.

  • Convergence in probability and in distribution, WLLN, CLT (Review ch04)

    Self-check: State the CLT for i.i.d. random variables with finite variance.

  • Multivariate Gaussian density, quadratic forms (Review ch02)

    Self-check: Write the log-density of a multivariate Gaussian as a function of the inverse covariance.

  • Unconstrained optimization, gradient, Hessian

    Self-check: Write the Newton update for a twice-differentiable scalar objective.

Notation Used in This Chapter

This chapter uses the estimation notation from Chapter 5 and adds iteration-related symbols. Function arguments are shown in parentheses after each symbol.

| Symbol | Meaning | Introduced |
|---|---|---|
| $\theta, \theta_0 \in \Lambda$ | Unknown parameter (generic, true value); parameter domain. | s01 |
| $\mathbf{y} \in \mathcal{Y}$ | Observation vector $\mathbf{y} = (y_1, \ldots, y_n)$. | s01 |
| $f_\theta(\mathbf{y})$ | Likelihood function, the density of the observation at parameter $\theta$. | s01 |
| $\ell_n(\theta)$ | Log-likelihood $\ell_n(\theta) = \log f_\theta(\mathbf{y})$. | s01 |
| $s(\theta; y)$ | Score $s(\theta; y) = \partial \log f_\theta(y)/\partial \theta$. | s01 |
| $g_{\text{ml}}(\mathbf{y})$ | Maximum likelihood estimator. | s01 |
| $J(\theta)$ | Fisher information (scalar parameter). | s02 |
| $\mathbf{J}(\boldsymbol{\theta})$ | Fisher information matrix (vector parameter). | s02 |
| $J_1(\theta)$ | Per-sample Fisher information, with $J(\theta) = n J_1(\theta)$ in the i.i.d. case. | s02 |
| $\xrightarrow{p}, \xrightarrow{d}$ | Convergence in probability, convergence in distribution. | s02 |
| $\theta^{(k)}$ | $k$-th iterate of Newton-Raphson or Fisher scoring. | s03 |
| $D(f_{\theta_0} \| f_\theta)$ | KL divergence between the true density and the candidate density. | s02 |
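As a quick illustration of the notation in the table (a hypothetical sketch, not code from the chapter), the following evaluates $\ell_n(\theta)$ and $s(\theta; \mathbf{y})$ for i.i.d. Gaussian observations with unknown mean $\theta$ and known variance, and checks that the score vanishes at the MLE $g_{\text{ml}}(\mathbf{y}) = \bar{y}$:

```python
import math

# Illustrative only: Gaussian with unknown mean theta, known variance sigma2.
def log_likelihood(theta, y, sigma2=1.0):
    """l_n(theta) = log f_theta(y) for i.i.d. Gaussian observations."""
    n = len(y)
    return (-0.5 * n * math.log(2 * math.pi * sigma2)
            - sum((yi - theta) ** 2 for yi in y) / (2 * sigma2))

def score(theta, y, sigma2=1.0):
    """s(theta; y) = d/dtheta log f_theta(y) = sum(y_i - theta) / sigma2."""
    return sum(yi - theta for yi in y) / sigma2

y = [0.3, -1.2, 0.8, 2.0]
theta_ml = sum(y) / len(y)   # MLE g_ml(y) is the sample mean
score(theta_ml, y)           # vanishes (up to rounding) at the MLE
```

Setting the score to zero and solving recovers the sample mean, which previews the estimating-equation view of maximum likelihood developed in this chapter.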