The Kalman Filter Equations

What We Are About to Derive

We now derive the five equations that make up the Kalman filter. The derivation is a controlled application of two tools you already own: the conditional Gaussian formula (for turning prior + likelihood into posterior) and the Markov property of the state (for decoupling future from past given the present). The proof proceeds by induction on the observation index, with the inductive hypothesis being simply: the conditional distribution of $\mathbf{x}_n$ given $\mathcal{Y}_{n-1}$ is Gaussian with mean $\widehat{\mathbf{x}}_{n|n-1}$ and covariance $\mathbf{P}_{n|n-1}$. Once we prove this stays true under both prediction and update steps, the recursion falls out.

Theorem: The Discrete-Time Kalman Filter

Consider the LGSS model of Definition 10.1. The MMSE estimates $\widehat{\mathbf{x}}_{n|n} = \mathbb{E}[\mathbf{x}_n \mid \mathcal{Y}_n]$ and $\widehat{\mathbf{x}}_{n|n-1} = \mathbb{E}[\mathbf{x}_n \mid \mathcal{Y}_{n-1}]$ and their error covariances $\mathbf{P}_{n|n}$, $\mathbf{P}_{n|n-1}$ are computed by the following recursion, initialised with $\widehat{\mathbf{x}}_{0|-1} = \mathbf{m}_0$ and $\mathbf{P}_{0|-1} = \mathbf{P}_0$.

Prediction step (time update): propagate the state estimate and covariance through the dynamics,

$$\widehat{\mathbf{x}}_{n|n-1} = \mathbf{F}\,\widehat{\mathbf{x}}_{n-1|n-1}, \qquad \mathbf{P}_{n|n-1} = \mathbf{F}\,\mathbf{P}_{n-1|n-1}\mathbf{F}^T + \mathbf{G}\mathbf{Q}\mathbf{G}^T.$$

Update step (measurement update): correct the prediction using the new observation,

$$\mathbf{S}_n = \mathbf{H}\mathbf{P}_{n|n-1}\mathbf{H}^T + \mathbf{R}, \qquad \mathbf{K}_n = \mathbf{P}_{n|n-1}\mathbf{H}^T \mathbf{S}_n^{-1},$$

$$\widehat{\mathbf{x}}_{n|n} = \widehat{\mathbf{x}}_{n|n-1} + \mathbf{K}_n \left(\mathbf{y}_n - \mathbf{H}\,\widehat{\mathbf{x}}_{n|n-1}\right), \qquad \mathbf{P}_{n|n} = (\mathbf{I} - \mathbf{K}_n \mathbf{H})\,\mathbf{P}_{n|n-1}.$$

The conditional distribution of $\mathbf{x}_n$ given $\mathcal{Y}_n$ is Gaussian with mean $\widehat{\mathbf{x}}_{n|n}$ and covariance $\mathbf{P}_{n|n}$.

Prediction pushes the belief forward through the dynamics: the state drifts under $\mathbf{F}$ and the covariance inflates by the process noise. Update is a Bayesian correction: the innovation $\mathbf{j}_n = \mathbf{y}_n - \mathbf{H}\widehat{\mathbf{x}}_{n|n-1}$ tells us how surprising the new observation was, and the Kalman gain $\mathbf{K}_n$ translates that surprise into a state correction. The gain is large when our state estimate is more uncertain than the measurement ($\mathbf{P}_{n|n-1}$ dominates), and small when the measurement is more uncertain than our belief ($\mathbf{R}$ dominates). This is exactly the weight of Chapter 8's LMMSE, now running recursively.
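To make the trade-off concrete in the scalar case with $H = 1$, where the gain reduces to a variance ratio (the numbers below are purely illustrative):

$$k_n = \frac{P_{n|n-1}}{P_{n|n-1} + r}, \qquad P_{n|n-1} = 1:\quad r = 0.1 \;\Rightarrow\; k_n \approx 0.91, \qquad r = 10 \;\Rightarrow\; k_n \approx 0.09.$$

A precise sensor pulls the estimate almost all the way to the measurement; a noisy one barely moves it.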


Theorem: Orthogonality Characterisation of the Kalman Gain

The Kalman gain $\mathbf{K}_n$ is the unique matrix such that the filtered error $\widetilde{\mathbf{x}}_{n|n} = \mathbf{x}_n - \widehat{\mathbf{x}}_{n|n}$ satisfies

$$\mathbb{E}[\widetilde{\mathbf{x}}_{n|n}\,\mathbf{j}_n^T] = \mathbf{0},$$

i.e., the posterior error is orthogonal (uncorrelated) to the current innovation.

This is the orthogonality principle of Chapter 8, read one observation at a time. If the filtered error were correlated with the innovation, we could shrink the mean-square error further by adding a bit more of the innovation back into the estimate, so optimality forces orthogonality.
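In symbols, the argument is one line: write the filtered error as $\widetilde{\mathbf{x}}_{n|n} = \widetilde{\mathbf{x}}_{n|n-1} - \mathbf{K}_n\mathbf{j}_n$ and use $\mathbb{E}[\widetilde{\mathbf{x}}_{n|n-1}\mathbf{j}_n^T] = \mathbf{P}_{n|n-1}\mathbf{H}^T$ together with $\mathbb{E}[\mathbf{j}_n\mathbf{j}_n^T] = \mathbf{S}_n$:

$$\mathbf{0} = \mathbb{E}[\widetilde{\mathbf{x}}_{n|n}\,\mathbf{j}_n^T] = \mathbf{P}_{n|n-1}\mathbf{H}^T - \mathbf{K}_n\mathbf{S}_n \quad\Longrightarrow\quad \mathbf{K}_n = \mathbf{P}_{n|n-1}\mathbf{H}^T\mathbf{S}_n^{-1}.$$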

Theorem: The Innovations Sequence Is White

For the LGSS model with the Kalman filter running, the innovations $\{\mathbf{j}_n\}$ form a zero-mean uncorrelated Gaussian sequence:

$$\mathbb{E}[\mathbf{j}_n] = \mathbf{0}, \qquad \mathbb{E}[\mathbf{j}_n\mathbf{j}_m^T] = \mathbf{S}_n\,\delta_{nm}.$$

Moreover, $\mathrm{span}(\mathbf{j}_1,\dots,\mathbf{j}_n)$ equals $\mathrm{span}(\mathbf{y}_1,\dots,\mathbf{y}_n)$ in the Hilbert space of zero-mean $L^2$ random vectors.

This is the same whitening that the Wiener filter of Chapter 9 achieves via spectral factorization, now obtained recursively. The innovations are the truly new part of each observation: the part that could not have been predicted from the past.
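As a quick empirical check, here is a minimal NumPy sketch that runs the scalar random-walk filter (introduced as an example below) and verifies that the innovations look white; the parameter values are illustrative, not from the text:

import numpy as np

rng = np.random.default_rng(0)
q, r, N = 0.25, 1.0, 20000           # illustrative noise variances, run length

# Simulate x_{n+1} = x_n + w_n, y_n = x_n + v_n.
x = np.cumsum(rng.normal(0.0, np.sqrt(q), N))
y = x + rng.normal(0.0, np.sqrt(r), N)

xhat, p = 0.0, 10.0                  # prior mean and prediction variance
innov = np.empty(N)
for n in range(N):
    innov[n] = y[n] - xhat           # innovation j_n  (H = 1)
    k = p / (p + r)                  # Kalman gain
    xhat = xhat + k * innov[n]       # measurement update (F = 1, so this is also the next prediction)
    p = (1 - k) * p + q              # covariance update, then time update

# Whiteness diagnostics: both should be near zero.  (S_n is essentially
# constant after a short transient, so no per-step normalisation is needed.)
print("sample mean:          ", innov.mean())
print("lag-1 autocorrelation:", np.corrcoef(innov[:-1], innov[1:])[0, 1])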

Discrete-Time Kalman Filter (Pseudocode)

Complexity: per step, $\mathcal{O}(d^3)$ for the covariance propagation and $\mathcal{O}(p^3)$ for the innovation covariance inversion, with $d = \dim\mathbf{x}$ and $p = \dim\mathbf{y}$. Memory is $\mathcal{O}(d^2)$; observations are consumed one at a time, so no batch storage is needed.
Input:  observations y[1], y[2], ..., y[N]
        model matrices F, G, H, Q, R
        initial mean m0, initial covariance P0
Output: filtered estimates xhat[n|n], covariances P[n|n] for n = 1..N

# Initialisation
xhat_pred <- m0
P_pred    <- P0
for n = 1 to N do
    # ----- Prediction (time update) -----
    # (At n = 1 we use the initial prior; otherwise push forward.)
    if n > 1 then
        xhat_pred <- F @ xhat_filt
        P_pred    <- F @ P_filt @ F.T + G @ Q @ G.T
    end if
    # ----- Update (measurement update) -----
    S <- H @ P_pred @ H.T + R         # innovation covariance
    K <- P_pred @ H.T @ inv(S)        # Kalman gain
    innov <- y[n] - H @ xhat_pred     # innovation
    xhat_filt <- xhat_pred + K @ innov
    P_filt <- (I - K @ H) @ P_pred    # covariance update
    store xhat_filt, P_filt
end for
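The pseudocode translates almost line-for-line into NumPy. A minimal sketch follows; the function name and the use of np.linalg.solve in place of an explicit inverse are our choices, not mandated by the text:

import numpy as np

def kalman_filter(y, F, G, H, Q, R, m0, P0):
    """Discrete-time Kalman filter over observations y[0..N-1].

    y: (N, p) array; F: (d, d); G: (d, m); H: (p, d); Q: (m, m); R: (p, p).
    Returns filtered means (N, d) and covariances (N, d, d).
    """
    N, d = len(y), len(m0)
    xhat_filt = np.empty((N, d))
    P_filt = np.empty((N, d, d))
    x_pred, P_pred = m0, P0                       # initial prior
    I = np.eye(d)
    for n in range(N):
        if n > 0:                                 # time update
            x_pred = F @ xhat_filt[n - 1]
            P_pred = F @ P_filt[n - 1] @ F.T + G @ Q @ G.T
        S = H @ P_pred @ H.T + R                  # innovation covariance
        K = np.linalg.solve(S.T, H @ P_pred.T).T  # gain P H^T S^{-1}, no inv()
        innov = y[n] - H @ x_pred                 # innovation
        xhat_filt[n] = x_pred + K @ innov         # measurement update
        P_filt[n] = (I - K @ H) @ P_pred          # covariance (standard form)
    return xhat_filt, P_filt

Solving against S rather than inverting it is the usual numerically safer choice; the standard-form covariance line can be swapped for the Joseph form discussed next.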

For better numerical conditioning, the Joseph form

$$\mathbf{P}_{n|n} = (\mathbf{I} - \mathbf{K}_n\mathbf{H})\,\mathbf{P}_{n|n-1}\,(\mathbf{I} - \mathbf{K}_n\mathbf{H})^T + \mathbf{K}_n\mathbf{R}\mathbf{K}_n^T$$

is preferred: it preserves symmetry and positive definiteness of $\mathbf{P}_{n|n}$ even under finite-precision arithmetic.

Information Form of the Update

Applying the Sherman-Morrison-Woodbury identity to the covariance update gives the information-form recursion

$$\mathbf{P}_{n|n}^{-1} = \mathbf{P}_{n|n-1}^{-1} + \mathbf{H}^T\mathbf{R}^{-1}\mathbf{H}, \qquad \mathbf{P}_{n|n}^{-1}\widehat{\mathbf{x}}_{n|n} = \mathbf{P}_{n|n-1}^{-1}\widehat{\mathbf{x}}_{n|n-1} + \mathbf{H}^T\mathbf{R}^{-1}\mathbf{y}_n.$$

The matrix $\mathbf{P}^{-1}$ is called the information matrix. The information form is advantageous when the number of sensors $p$ is large and the state dimension $d$ is small, or when the prior is uninformative ($\mathbf{P}_0 \to \infty$, i.e., $\mathbf{P}_0^{-1} \to \mathbf{0}$).
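The two updates are algebraically identical, which is easy to confirm numerically. A sketch with randomly generated SPD matrices (all names and dimensions illustrative):

import numpy as np

rng = np.random.default_rng(1)
d, p = 3, 2
A = rng.normal(size=(d, d)); P_pred = A @ A.T + d * np.eye(d)   # SPD prior covariance
B = rng.normal(size=(p, p)); R = B @ B.T + p * np.eye(p)        # SPD noise covariance
H = rng.normal(size=(p, d))
x_pred = rng.normal(size=d)
y = rng.normal(size=p)

# Standard (covariance-form) update.
S = H @ P_pred @ H.T + R
K = P_pred @ H.T @ np.linalg.inv(S)
x_cov = x_pred + K @ (y - H @ x_pred)
P_cov = (np.eye(d) - K @ H) @ P_pred

# Information-form update: inverse covariances (informations) add.
J = np.linalg.inv(P_pred) + H.T @ np.linalg.inv(R) @ H          # posterior information
x_inf = np.linalg.solve(J, np.linalg.inv(P_pred) @ x_pred
                           + H.T @ np.linalg.inv(R) @ y)

print(np.allclose(np.linalg.inv(J), P_cov))   # True
print(np.allclose(x_inf, x_cov))              # True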

Example: Scalar Random-Walk Estimation

Consider the scalar LGSS

$$x_{n+1} = x_n + w_n, \qquad y_n = x_n + v_n,$$

with $w_n \sim \mathcal{N}(0, q)$, $v_n \sim \mathcal{N}(0, r)$, and prior $x_0 \sim \mathcal{N}(0, p_0)$. Derive closed-form Kalman recursions for the scalar prediction variance $P_{n|n-1}$ and the scalar gain $k_n$.
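As a check on your derivation, the recursions can be iterated numerically. A minimal sketch specialising the theorem's equations to $F = G = H = 1$ (parameter values are illustrative):

import numpy as np

def scalar_random_walk_filter(q, r, p0, n_steps):
    """Iterate the scalar Kalman recursions for x_{n+1} = x_n + w_n, y_n = x_n + v_n.

    With F = G = H = 1 the theorem's equations reduce, per step, to:
        gain                 k      = P_pred / (P_pred + r)
        posterior variance   P_filt = (1 - k) * P_pred
        prediction variance  P_pred = P_filt + q    (for the next step)
    Returns the sequences of prediction variances and gains.
    """
    p_pred, preds, gains = p0, [], []
    for _ in range(n_steps):
        k = p_pred / (p_pred + r)
        preds.append(p_pred)
        gains.append(k)
        p_filt = (1 - k) * p_pred
        p_pred = p_filt + q            # time update for the next step
    return np.array(preds), np.array(gains)

# Example: the gain settles to its steady-state value within a few steps.
preds, gains = scalar_random_walk_filter(q=0.25, r=1.0, p0=1.0, n_steps=10)
print(gains.round(4))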

Kalman Filter Tracking a 2-D Target

A constant-velocity target is tracked from noisy position measurements in 2-D. The filter combines the dynamics with observations to produce estimates that are far more accurate than the raw measurements. The $\pm 2\sigma$ error ellipse shows the filter's calibrated uncertainty.


Kalman Gain and Posterior Variance over Time

For the scalar random-walk model, the Kalman gain and the posterior variance converge monotonically to their steady-state values. The convergence rate depends on the noise ratio $q/r$: higher process noise means the filter relies more on fresh measurements and the steady-state gain is larger.
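Setting $P_{n+1|n} = P_{n|n-1} = p$ in the scalar recursion gives the steady state directly; a sketch of the calculation:

$$p = \frac{p\,r}{p + r} + q \;\Longleftrightarrow\; p^2 - qp - qr = 0 \;\Longrightarrow\; p = \frac{q + \sqrt{q^2 + 4qr}}{2}, \qquad k_\infty = \frac{p}{p + r}.$$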


The Kalman Prediction-Update Cycle

An animated walkthrough of one Kalman step: the prior belief (Gaussian) is pushed forward by $\mathbf{F}$, inflates by $\mathbf{G}\mathbf{Q}\mathbf{G}^T$, collides with a new measurement, and contracts into the posterior. The innovation is highlighted as the distance between the measurement and the predicted measurement.
Prediction widens the belief; measurement narrows it. The Kalman gain $\mathbf{K}_n$ is the optimal trade-off between the two.

Common Mistake: The Covariance Update Can Lose Symmetry

Mistake:

The "standard" form Pn∣n=(Iβˆ’KnnH)Pn∣nβˆ’1\mathbf{P}_{n|n} = (\mathbf{I}-{\mathbf{K}_n}_{n}\mathbf{H})\mathbf{P}_{n|n-1} is theoretically correct but numerically treacherous: it is not obviously symmetric, and in finite precision Pn∣n\mathbf{P}_{n|n} can develop negative eigenvalues.

Correction:

Use the Joseph form in production code:

$$\mathbf{P}_{n|n} = (\mathbf{I} - \mathbf{K}_n\mathbf{H})\,\mathbf{P}_{n|n-1}\,(\mathbf{I} - \mathbf{K}_n\mathbf{H})^T + \mathbf{K}_n\mathbf{R}\mathbf{K}_n^T.$$

It is manifestly symmetric and positive semidefinite regardless of the gain used (it even stays PSD if $\mathbf{K}_n$ is sub-optimal, which is useful when the gain is computed in reduced precision).
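As a drop-in replacement for the covariance line in the earlier NumPy sketch (the final symmetrisation is a common belt-and-braces step, our addition rather than part of the Joseph form itself):

import numpy as np

def joseph_update(P_pred, K, H, R):
    """Joseph-form covariance update: symmetric and PSD for ANY gain K."""
    d = P_pred.shape[0]
    A = np.eye(d) - K @ H
    P_filt = A @ P_pred @ A.T + K @ R @ K.T
    return 0.5 * (P_filt + P_filt.T)   # enforce exact symmetry in floating point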

Common Mistake: Filter Divergence from Model Mismatch

Mistake:

A Kalman filter with under-specified process noise (too-small $\mathbf{Q}$) produces confident estimates that drift arbitrarily far from the truth. This is called filter divergence: the covariance shrinks so fast that the filter stops paying attention to new data.

Correction:

Verify that the innovations $\{\mathbf{j}_n\}$ are white and consistent with $\mathbf{S}_n$ (the normalized innovation squared $\mathbf{j}_n^T\mathbf{S}_n^{-1}\mathbf{j}_n$ should be $\chi^2_p$-distributed). Persistent violations indicate model mismatch; the remedy is usually to inflate $\mathbf{Q}$ to cover unmodeled dynamics.
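A minimal sketch of this NIS (normalized innovation squared) consistency test, using scipy.stats.chi2 for the bounds; the 95% band and the pass criterion are conventional choices, not from the text:

import numpy as np
from scipy.stats import chi2

def nis_test(innovs, S_list, alpha=0.05):
    """Check that NIS values are consistent with a chi^2_p distribution.

    innovs: sequence of innovation vectors j_n; S_list: matching covariances S_n.
    Returns the fraction of steps whose NIS falls inside the central
    (1 - alpha) chi^2_p interval; for a well-tuned filter this should be
    close to 1 - alpha.
    """
    p = len(innovs[0])
    lo, hi = chi2.ppf(alpha / 2, df=p), chi2.ppf(1 - alpha / 2, df=p)
    nis = np.array([j @ np.linalg.solve(S, j) for j, S in zip(innovs, S_list)])
    return np.mean((nis >= lo) & (nis <= hi))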

Quick Check

In the steady state of the scalar random-walk model, as $q/r \to \infty$ (large process noise, small measurement noise), the Kalman gain $k_n$ approaches:

• 0
• 1
• 1/2
• $\sqrt{q/r}$

Quick Check

Why are the Kalman innovations $\{\mathbf{j}_n\}$ uncorrelated across time?

• Because $\mathbf{v}_n$ is white
• Because the prediction $\widehat{\mathbf{x}}_{n|n-1}$ is the projection of $\mathbf{x}_n$ onto $\mathrm{span}(\mathcal{Y}_{n-1})$, so the residual is orthogonal to everything already in that span
• Because $\mathbf{F}$ is stable
• Because Gaussian random variables are independent iff they are uncorrelated

Quick Check

In the information form, the posterior information matrix is $\mathbf{P}_{n|n}^{-1} = \mathbf{P}_{n|n-1}^{-1} + \mathbf{H}^T\mathbf{R}^{-1}\mathbf{H}$. What does this say about combining prior and measurement?

• Inverse variances (information) add
• Variances add
• The Kalman gain is $\mathbf{R}^{-1}$
• It only works when $\mathbf{P}_0 = \mathbf{0}$

⚠️ Engineering Note

Square-Root Kalman Filters

In safety-critical systems (aerospace navigation, autonomous vehicles, GNSS receivers), the standard Kalman filter is rarely used directly; instead, one propagates a Cholesky factor $\mathbf{P} = \mathbf{L}\mathbf{L}^T$ and performs the update using QR decomposition of an augmented matrix. This square-root Kalman filter preserves positive definiteness exactly (to machine precision), effectively doubles the working precision of the floating-point arithmetic, and is the form typically used in software qualified to standards such as DO-178C for avionics.
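A sketch of the QR-based measurement update, following the standard pre-array/post-array construction (function name and interface are illustrative, not from the text):

import numpy as np

def sqrt_measurement_update(L_pred, H, R_chol):
    """One square-root measurement update via QR.

    L_pred: lower Cholesky factor of P_pred (d x d).
    R_chol: lower Cholesky factor of R (p x p).
    Returns (L_filt, K): a square-root factor of P_filt and the Kalman gain.
    """
    d, p = L_pred.shape[0], R_chol.shape[0]
    # Pre-array:  [ R^{1/2}   H L ]
    #             [    0       L  ]
    pre = np.block([[R_chol, H @ L_pred],
                    [np.zeros((d, p)), L_pred]])
    # An orthogonal transform triangularises the pre-array into the post-array
    #             [ S^{1/2}              0          ]
    #             [ P H^T S^{-T/2}   P_filt^{1/2}   ]
    post = np.linalg.qr(pre.T, mode="r").T            # lower triangular
    S_half = post[:p, :p]
    K = np.linalg.solve(S_half.T, post[p:, :p].T).T   # K = (P H^T S^{-T/2}) S^{-1/2}
    L_filt = post[p:, p:]
    return L_filt, K

The covariance itself is never formed: only its factor is propagated, which is what preserves positive definiteness exactly.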

Practical Constraints

  • State dimension up to a few hundred per step
  • Double-precision arithmetic required
  • Cholesky/QR updates add ~2x computation over the naive filter

📋 Ref: Bierman, 'Factorization Methods for Discrete Sequential Estimation' (1977)

Historical Note: Kalman's 1960 Paper


Rudolf E. Kalman's 1960 paper "A New Approach to Linear Filtering and Prediction Problems" appeared in the Transactions of the ASME, Journal of Basic Engineering. Remarkably, it had been rejected by the mainstream electrical-engineering journals of the time, which still favoured the frequency-domain Wiener framework. Kalman's reframing of filtering as a state-space recursion, solvable on a digital computer rather than through spectral factorization, was simply too novel for reviewers accustomed to transfer functions.

The filter's first deployment was by Stanley Schmidt and Leonard McGee at NASA Ames, where they extended it to the nonlinear trajectory-estimation problem for Apollo. The Apollo Guidance Computer, with its roughly 4 KB of erasable magnetic-core memory and 36K words of read-only rope memory, ran what we now call the Extended Kalman Filter at roughly 1 Hz. Every minute of cislunar flight owed its navigation accuracy to this recursion.

Why This Matters: From State-Space Estimation to GNSS Receivers

Every modern GNSS (GPS, Galileo, BeiDou) receiver runs a Kalman filter. The state vector $\mathbf{x}_n$ typically contains receiver position, velocity, clock bias, and clock drift, often augmented by a model of ionospheric delay. Observations $\mathbf{y}_n$ are the pseudoranges (code-phase) and carrier phases from visible satellites. The LGSS model is tight enough in clean conditions that positioning accuracy is dominated by the observation noise $\mathbf{R}$, which itself depends on satellite geometry (GDOP). When multipath or jamming contaminates the measurements, the same innovations whiteness tests described here are used to detect the anomaly and de-weight or eject the offending satellite. The state-space toolbox of this chapter is literally the engine that makes meter-level positioning possible.

Key Takeaway

The Kalman filter is the conditional Gaussian formula, iterated. Prediction propagates beliefs forward through the dynamics; update corrects them by a Kalman gain-weighted innovation. The gain $\mathbf{K}_n = \mathbf{P}_{n|n-1}\mathbf{H}^T\mathbf{S}_n^{-1}$ is precisely the ratio of prior uncertainty to measurement uncertainty, and it is derived by a one-line orthogonality argument. Everything else is bookkeeping.