Prerequisites & Notation

What You Should Know Before Reading This Chapter

The Kalman filter lives at the intersection of three ideas we have already developed: linear MMSE estimation in jointly Gaussian models (Chapter 8), Wiener filtering and innovations (Chapter 9), and the Markov structure that arises from a discrete-time dynamical system. If any of those three feels hazy, this chapter will feel like magic rather than mathematics. Revisit the marked topics before proceeding.

  • Conditional Gaussian distribution and the block-matrix inverse (Review ch08)

    Self-check: For jointly Gaussian $(\mathbf{X},\mathbf{Y})$, write $\mathbb{E}[\mathbf{X}\mid\mathbf{Y}]$ and $\mathrm{Cov}(\mathbf{X}\mid\mathbf{Y})$ in closed form; the formula is reproduced after this list. The Kalman update is this formula, iterated.

  • Orthogonality principle for LMMSE (Review ch08)

    Self-check: State why the LMMSE estimation error is orthogonal (uncorrelated) to every linear function of the observations. The Kalman gain is derived from exactly this condition.

  • Innovations sequence and causal Wiener filtering (Review ch09)

    Self-check: The innovations $\mathbf{j}_n = \mathbf{y}_n - \widehat{\mathbf{y}}_{n|n-1}$ are white. Explain why recursive filtering is equivalent to projecting onto the innovations basis.

  • Linear time-invariant systems and stability

    Self-check: A matrix $\mathbf{F}$ is (Schur) stable iff its eigenvalues lie strictly inside the unit disc. You should recognize controllability and observability Gramians.

  • Matrix inversion lemma (Sherman–Morrison–Woodbury) (Review ch08)

    Self-check: Apply $(\mathbf{A}+\mathbf{U}\mathbf{C}\mathbf{V})^{-1} = \mathbf{A}^{-1} - \mathbf{A}^{-1}\mathbf{U}(\mathbf{C}^{-1}+\mathbf{V}\mathbf{A}^{-1}\mathbf{U})^{-1}\mathbf{V}\mathbf{A}^{-1}$ to convert between the covariance-form and information-form Kalman updates; a numeric check follows this list.
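
For the first self-check, the closed form is the jointly Gaussian conditioning rule from Chapter 8:

$$
\mathbb{E}[\mathbf{X}\mid\mathbf{Y}] = \boldsymbol{\mu}_X + \boldsymbol{\Sigma}_{XY}\boldsymbol{\Sigma}_{YY}^{-1}(\mathbf{Y}-\boldsymbol{\mu}_Y),
\qquad
\mathrm{Cov}(\mathbf{X}\mid\mathbf{Y}) = \boldsymbol{\Sigma}_{XX} - \boldsymbol{\Sigma}_{XY}\boldsymbol{\Sigma}_{YY}^{-1}\boldsymbol{\Sigma}_{YX}.
$$

In the filtering context the factor $\boldsymbol{\Sigma}_{XY}\boldsymbol{\Sigma}_{YY}^{-1}$ becomes the Kalman gain $\mathbf{K}_n = \mathbf{P}_{n|n-1}\mathbf{H}^T\mathbf{S}_n^{-1}$, which is precisely the sense in which the update is "this formula, iterated."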

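For the last self-check, here is a minimal NumPy sketch (the matrix sizes, seed, and values are arbitrary illustrative choices, not from the text) that verifies the identity numerically and then uses it to show the covariance-form and information-form measurement updates agree:

```python
import numpy as np

rng = np.random.default_rng(1)
d, k = 4, 2

# Arbitrary symmetric positive-definite A plus a low-rank update UCV.
A = np.eye(d) + 0.1 * rng.standard_normal((d, d))
A = A @ A.T
U = rng.standard_normal((d, k))
C = np.eye(k)
V = U.T

Ai = np.linalg.inv(A)
lhs = np.linalg.inv(A + U @ C @ V)
rhs = Ai - Ai @ U @ np.linalg.inv(np.linalg.inv(C) + V @ Ai @ U) @ V @ Ai
assert np.allclose(lhs, rhs)  # Sherman–Morrison–Woodbury holds

# The same identity converts the information-form update
#   P_{n|n}^{-1} = P_{n|n-1}^{-1} + H^T R^{-1} H
# into the covariance-form update P_{n|n} = P_{n|n-1} - K_n S_n K_n^T.
P_pred = A                                  # stand-in for P_{n|n-1}
H = rng.standard_normal((k, d))             # stand-in observation matrix
R = np.eye(k)                               # observation-noise covariance
S = H @ P_pred @ H.T + R                    # innovation covariance S_n
K = P_pred @ H.T @ np.linalg.inv(S)         # Kalman gain K_n
cov_form = P_pred - K @ S @ K.T
info_form = np.linalg.inv(np.linalg.inv(P_pred) + H.T @ np.linalg.inv(R) @ H)
assert np.allclose(cov_form, info_form)
```
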
Chapter-Specific Notation

The state-space literature uses two nearly equivalent symbol sets: the controls/aerospace tradition writes $(\mathbf{F},\mathbf{G},\mathbf{H})$, while Caire's course notes use $(\mathbf{A}_n,\mathbf{B}_n,\mathbf{C}_n)$. We adopt $(\mathbf{F},\mathbf{G},\mathbf{H})$ because it matches the vast majority of signal-processing references, and we flag the mapping in the first example. The observation matrix $\mathbf{H}$ in this chapter has nothing to do with the MIMO channel matrix.

| Symbol | Meaning | Introduced |
| --- | --- | --- |
| $\mathbf{x}_n \in \mathbb{R}^d$ | Hidden state at discrete time $n$ | s01 |
| $\mathbf{y}_n \in \mathbb{R}^p$ | Observation at time $n$ | s01 |
| $\mathbf{F} \in \mathbb{R}^{d \times d}$ | State transition matrix | s01 |
| $\mathbf{G} \in \mathbb{R}^{d \times q}$ | Process-noise input matrix | s01 |
| $\mathbf{H} \in \mathbb{R}^{p \times d}$ | Observation matrix (NOT a channel matrix in this chapter) | s01 |
| $\mathbf{w}_n \sim \mathcal{N}(\mathbf{0},\mathbf{Q})$ | Process noise (driving noise), white | s01 |
| $\mathbf{v}_n \sim \mathcal{N}(\mathbf{0},\mathbf{R})$ | Observation noise, white, independent of $\mathbf{w}$ | s01 |
| $\widehat{\mathbf{x}}_{n\vert m}$ | Conditional mean $\mathbb{E}[\mathbf{x}_n \mid \mathbf{y}_1,\dots,\mathbf{y}_m]$ | s02 |
| $\mathbf{P}_{n\vert m}$ | Conditional covariance $\mathrm{Cov}(\mathbf{x}_n \mid \mathbf{y}_1,\dots,\mathbf{y}_m)$ | s02 |
| $\mathbf{K}_n$ | Kalman gain at step $n$ | s02 |
| $\mathbf{j}_n = \mathbf{y}_n - \mathbf{H}\widehat{\mathbf{x}}_{n\vert n-1}$ | Innovation (prediction residual) at step $n$ | s02 |
| $\mathbf{S}_n = \mathbf{H}\mathbf{P}_{n\vert n-1}\mathbf{H}^T + \mathbf{R}$ | Innovation covariance | s02 |
| $\overline{\mathbf{P}}$ | Steady-state prediction covariance; solves the DARE | s03 |
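
To fix the notation before the derivations, here is a minimal NumPy sketch of one filter cycle on a toy constant-velocity model (the model, the numbers, and the helper name `kalman_step` are illustrative assumptions, not from the text). The final loop iterates the covariance recursion, which uses no data, to exhibit convergence to the steady-state prediction covariance $\overline{\mathbf{P}}$:

```python
import numpy as np

# Toy constant-velocity model in the chapter's (F, G, H) notation.
F = np.array([[1.0, 1.0],
              [0.0, 1.0]])      # state transition, d = 2
G = np.array([[0.5],
              [1.0]])           # process-noise input, q = 1
H = np.array([[1.0, 0.0]])      # observe position only, p = 1
Q = np.array([[0.1]])           # process-noise covariance
R = np.array([[1.0]])           # observation-noise covariance


def kalman_step(x_pred, P_pred, y):
    """One measurement update followed by one time update."""
    S = P_pred @ H.T; S = H @ S + R           # innovation covariance S_n
    K = P_pred @ H.T @ np.linalg.inv(S)       # Kalman gain K_n
    j = y - H @ x_pred                        # innovation j_n
    x_filt = x_pred + K @ j                   # xhat_{n|n}
    P_filt = P_pred - K @ S @ K.T             # P_{n|n}
    x_next = F @ x_filt                       # xhat_{n+1|n}
    P_next = F @ P_filt @ F.T + G @ Q @ G.T   # P_{n+1|n}
    return x_next, P_next

# One cycle on a dummy observation.
x_pred, P_pred = np.zeros((2, 1)), np.eye(2)
x_pred, P_pred = kalman_step(x_pred, P_pred, np.array([[1.0]]))

# The covariance recursion is data-independent, so it can be iterated
# offline; it converges to P-bar (the DARE solution) since (F, H) is
# observable here.
P = np.eye(2)
for _ in range(200):
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    P = F @ (P - K @ S @ K.T) @ F.T + G @ Q @ G.T
print("steady-state prediction covariance:\n", P)
```

Because $\mathbf{P}_{n|n-1}$ and $\mathbf{K}_n$ do not depend on the observations, they can be precomputed, which is why the steady-state gain built from $\overline{\mathbf{P}}$ is often used in practice.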

Key Takeaway

The Kalman filter is not a new idea; it is the LMMSE estimator of Chapter 8, evaluated recursively along the innovations basis of Chapter 9, applied to a linear Gaussian Markov chain. Everything in this chapter follows from those three facts. Keep them in mind and the algebra will feel inevitable rather than ad hoc.