The Kalman Filter Equations
What We Are About to Derive
We now derive the five equations that make up the Kalman filter. The derivation is a controlled application of two tools you already own: the conditional Gaussian formula (for turning prior + likelihood into posterior) and the Markov property of the state (for decoupling future from past given the present). The proof proceeds by induction on the observation index, with the inductive hypothesis being simply: the conditional distribution of $x_n$ given $y_{1:n}$ is Gaussian with mean $\hat{x}_{n|n}$ and covariance $P_{n|n}$. Once we prove this stays true under both prediction and update steps, the recursion falls out.
Theorem: The Discrete-Time Kalman Filter
Consider the LGSS model of Definition 10.1, written $x_n = A\,x_{n-1} + w_n$, $y_n = C\,x_n + v_n$ with independent white noises $w_n \sim \mathcal{N}(0, Q)$ and $v_n \sim \mathcal{N}(0, R)$. The MMSE estimates $\hat{x}_{n|n-1} = \mathbb{E}[x_n \mid y_{1:n-1}]$ and $\hat{x}_{n|n} = \mathbb{E}[x_n \mid y_{1:n}]$ and their error covariances $P_{n|n-1}$, $P_{n|n}$ are computed by the following recursion, initialised with the prior mean $\hat{x}_{0|0}$ and covariance $P_{0|0}$.
Prediction step (time update): propagate the state estimate and covariance through the dynamics,
$$\hat{x}_{n|n-1} = A\,\hat{x}_{n-1|n-1}, \qquad P_{n|n-1} = A\,P_{n-1|n-1}\,A^\top + Q.$$
Update step (measurement update): correct the prediction using the new observation,
$$K_n = P_{n|n-1}\,C^\top \big(C\,P_{n|n-1}\,C^\top + R\big)^{-1},$$
$$\hat{x}_{n|n} = \hat{x}_{n|n-1} + K_n\big(y_n - C\,\hat{x}_{n|n-1}\big),$$
$$P_{n|n} = \big(I - K_n C\big)\,P_{n|n-1}.$$
The conditional distribution of $x_n$ given $y_{1:n}$ is Gaussian with mean $\hat{x}_{n|n}$ and covariance $P_{n|n}$.
Prediction pushes the belief forward through the dynamics: the state drifts under $A$ and the covariance inflates by the process noise $Q$. Update is a Bayesian correction: the innovation $e_n = y_n - C\,\hat{x}_{n|n-1}$ tells us how surprising the new observation was, and the Kalman gain $K_n$ translates that surprise into a state correction. The gain is large when our state estimate is more uncertain than the measurement ($P_{n|n-1}$ dominates), and small when the measurement is more uncertain than our belief ($R$ dominates). This is exactly the weight of Chapter 8's LMMSE, now running recursively.
Inductive hypothesis
Assume that, conditional on $y_{1:n-1}$, the state $x_{n-1}$ is Gaussian with mean $\hat{x}_{n-1|n-1}$ and covariance $P_{n-1|n-1}$. The base case ($n = 1$) is obtained from the prior: $x_0$ is Gaussian unconditionally, which we take as the "prior given empty data" statement.
Prediction: pushing forward through the dynamics
From $x_n = A\,x_{n-1} + w_n$ and the fact that $w_n$ is independent of $y_{1:n-1}$ (the process noise at time $n$ has not yet influenced any observation before time $n$, by causality), conditional on $y_{1:n-1}$:
$$\hat{x}_{n|n-1} = \mathbb{E}[x_n \mid y_{1:n-1}] = A\,\hat{x}_{n-1|n-1}.$$
The conditional covariance is
$$P_{n|n-1} = \operatorname{Cov}\big(A\,x_{n-1} + w_n \mid y_{1:n-1}\big) = A\,P_{n-1|n-1}\,A^\top + Q,$$
where the cross term vanishes by conditional independence. Because the conditional distribution of $x_{n-1}$ is Gaussian (inductive hypothesis) and $w_n$ is independent Gaussian, their affine combination is also Gaussian. So $x_n \mid y_{1:n-1} \sim \mathcal{N}(\hat{x}_{n|n-1}, P_{n|n-1})$.
Conditional joint distribution at time n
Working conditionally on $y_{1:n-1}$, both $x_n$ and $v_n$ are Gaussian (the latter is unconditionally Gaussian and independent of $y_{1:n-1}$). Therefore the pair $(x_n, y_n)$ is conditionally Gaussian with means $\hat{x}_{n|n-1}$ and $C\,\hat{x}_{n|n-1}$ and conditional covariance (a block matrix; the reader should verify each block)
$$\begin{bmatrix} P_{n|n-1} & P_{n|n-1}\,C^\top \\ C\,P_{n|n-1} & C\,P_{n|n-1}\,C^\top + R \end{bmatrix}.$$
Update: applying the conditional Gaussian formula
Conditioning further on $y_n$ is now just the conditional-Gaussian formula from Chapter 8. With $S_n = C\,P_{n|n-1}\,C^\top + R$ and $K_n = P_{n|n-1}\,C^\top S_n^{-1}$,
$$\hat{x}_{n|n} = \hat{x}_{n|n-1} + K_n\big(y_n - C\,\hat{x}_{n|n-1}\big), \qquad P_{n|n} = P_{n|n-1} - K_n\,C\,P_{n|n-1}.$$
This closes the inductive step: the conditional distribution of $x_n$ given $y_{1:n}$ is Gaussian with exactly these moments, so the inductive hypothesis holds at time $n$ as well. The filter is, by construction, MMSE-optimal.
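To make the update step concrete, here is a small numerical check; the state, covariance, and model matrices below are hypothetical examples, not values from the text. It applies the generic conditional-Gaussian formula to the joint distribution above and confirms that it reproduces the Kalman update.

```python
import numpy as np

# Hypothetical prediction-step outputs and model matrices (2-D state, 1-D observation).
x_pred = np.array([1.0, -0.5])                  # x_hat_{n|n-1}
P_pred = np.array([[2.0, 0.3], [0.3, 1.0]])     # P_{n|n-1}
C = np.array([[1.0, 0.0]])                      # observation matrix
R = np.array([[0.5]])                           # measurement noise covariance
y = np.array([1.4])                             # observed y_n

# Blocks of the conditional joint distribution of (x_n, y_n) given y_{1:n-1}.
mu_x, mu_y = x_pred, C @ x_pred
Sig_xy = P_pred @ C.T
Sig_yy = C @ P_pred @ C.T + R                   # this is S_n

# Generic conditional-Gaussian formula: condition the joint on y_n = y.
x_cond = mu_x + Sig_xy @ np.linalg.solve(Sig_yy, y - mu_y)
P_cond = P_pred - Sig_xy @ np.linalg.solve(Sig_yy, Sig_xy.T)

# Kalman update written as in the theorem.
K = P_pred @ C.T @ np.linalg.inv(Sig_yy)
x_kf = x_pred + K @ (y - C @ x_pred)
P_kf = P_pred - K @ C @ P_pred

print(np.allclose(x_cond, x_kf), np.allclose(P_cond, P_kf))    # True True
```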
Theorem: Orthogonality Characterisation of the Kalman Gain
The Kalman gain $K_n$ is the unique matrix $K$ in the update $\hat{x}_{n|n} = \hat{x}_{n|n-1} + K\big(y_n - C\,\hat{x}_{n|n-1}\big)$ such that the filtered error satisfies
$$\mathbb{E}\big[(x_n - \hat{x}_{n|n})\,(y_n - C\,\hat{x}_{n|n-1})^\top\big] = 0,$$
i.e., the posterior error is orthogonal (uncorrelated) to the current innovation.
This is the orthogonality principle of Chapter 8, read one observation at a time. If the filtered error were correlated with the innovation, we could shrink the mean-square error further by adding a bit more of the innovation back into the estimate, so optimality forces orthogonality.
Expand the filtered error
Substitute $\hat{x}_{n|n} = \hat{x}_{n|n-1} + K_n\big(y_n - C\,\hat{x}_{n|n-1}\big)$:
$$x_n - \hat{x}_{n|n} = \tilde{x}_n - K_n\big(C\,\tilde{x}_n + v_n\big),$$
where $\tilde{x}_n = x_n - \hat{x}_{n|n-1}$ is the prediction error.
Enforce orthogonality
Since $y_n - C\,\hat{x}_{n|n-1} = C\,\tilde{x}_n + v_n$,
$$\mathbb{E}\big[(x_n - \hat{x}_{n|n})(C\,\tilde{x}_n + v_n)^\top\big] = \mathbb{E}\big[\tilde{x}_n (C\,\tilde{x}_n + v_n)^\top\big] - K_n\,\mathbb{E}\big[(C\,\tilde{x}_n + v_n)(C\,\tilde{x}_n + v_n)^\top\big].$$
The first expectation is $P_{n|n-1}\,C^\top$ (using $\mathbb{E}[\tilde{x}_n v_n^\top] = 0$ because $v_n$ is white and independent of the past), and the second is $C\,P_{n|n-1}\,C^\top + R$. Setting the whole thing to zero and solving:
$$K_n = P_{n|n-1}\,C^\top \big(C\,P_{n|n-1}\,C^\top + R\big)^{-1}.$$
This reproduces the gain formula, derived without ever invoking the conditional Gaussian distribution; the orthogonality principle alone is enough.
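A quick numerical sanity check of this gain formula, using arbitrary illustrative matrices, verifies that it zeroes the cross-correlation term above:

```python
import numpy as np

# Illustrative values for P_{n|n-1}, C, R (2-D state, 1-D observation).
P = np.array([[2.0, 0.3],
              [0.3, 1.0]])          # prediction covariance P_{n|n-1}
C = np.array([[1.0, 0.0]])          # observation matrix
R = np.array([[0.5]])               # measurement noise covariance

S = C @ P @ C.T + R                 # innovation covariance C P C^T + R
K = P @ C.T @ np.linalg.inv(S)      # gain from the orthogonality condition

# P C^T - K (C P C^T + R) is the cross-correlation term; it should vanish.
print(P @ C.T - K @ S)              # ~[[0.], [0.]] up to rounding
```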
Theorem: The Innovations Sequence Is White
For the LGSS model with the Kalman filter running, the innovations $e_n = y_n - C\,\hat{x}_{n|n-1}$ form a zero-mean uncorrelated Gaussian sequence:
$$\mathbb{E}[e_n] = 0, \qquad \mathbb{E}[e_n e_m^\top] = S_n\,\delta_{nm}, \qquad S_n = C\,P_{n|n-1}\,C^\top + R.$$
Moreover, $\operatorname{span}\{e_1, \dots, e_n\}$ equals $\operatorname{span}\{y_1, \dots, y_n\}$ in the Hilbert space of zero-mean random vectors.
This is the same whitening that the Wiener filter of Chapter 9 achieves via spectral factorization, now obtained recursively. The innovations are the truly new part of each observation: the part that could not have been predicted from the past.
Zero mean
$\mathbb{E}[e_n] = \mathbb{E}\big[C(x_n - \hat{x}_{n|n-1}) + v_n\big]$. Since $\hat{x}_{n|n-1}$ is unbiased ($\mathbb{E}[x_n - \hat{x}_{n|n-1}] = 0$) and $\mathbb{E}[v_n] = 0$, the mean is zero.
Uncorrelated across time
Without loss of generality take $m < n$. Then $e_m \in \operatorname{span}\{y_1, \dots, y_m\} \subseteq \operatorname{span}\{y_1, \dots, y_{n-1}\}$, so $e_m$ is a function of past observations. But $\hat{x}_{n|n-1}$ is the projection of $x_n$ onto that same span, so the prediction error $x_n - \hat{x}_{n|n-1}$ is orthogonal to it. Adding $v_n$ (independent of the past) does not change orthogonality, so $\mathbb{E}[e_n e_m^\top] = 0$.
Spanning equivalence
By definition $e_n = y_n - C\,\hat{x}_{n|n-1}$ with $\hat{x}_{n|n-1} \in \operatorname{span}\{y_1, \dots, y_{n-1}\}$, so each $e_n$ lies in $\operatorname{span}\{y_1, \dots, y_n\}$. Conversely, the filter recursion expresses each $y_n = e_n + C\,\hat{x}_{n|n-1}$ as a deterministic function of $e_1, \dots, e_n$. The two spans agree.
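The whiteness claim is easy to see empirically. The sketch below assumes the scalar random-walk model used later in this section, with illustrative noise variances; it runs the filter and reports the lag-1 sample correlation of the innovations, which should be near zero.

```python
import numpy as np

rng = np.random.default_rng(0)
q, r, N = 0.1, 1.0, 5000             # illustrative process / measurement noise variances

# Simulate the scalar random walk x_n = x_{n-1} + w_n, observed as y_n = x_n + v_n.
x = np.cumsum(rng.normal(0.0, np.sqrt(q), N))
y = x + rng.normal(0.0, np.sqrt(r), N)

# Scalar Kalman filter (A = C = 1); collect the innovations e_n = y_n - x_hat_{n|n-1}.
x_hat, p = 0.0, 0.0                  # exact initial state, matching the simulation
innovations = []
for yn in y:
    p_pred = p + q                   # prediction variance
    e = yn - x_hat                   # innovation (prediction of y_n is x_hat_{n|n-1})
    k = p_pred / (p_pred + r)        # Kalman gain
    x_hat += k * e                   # measurement update
    p = (1 - k) * p_pred
    innovations.append(e)

e = np.array(innovations)
print(f"lag-1 sample correlation of innovations: {np.corrcoef(e[:-1], e[1:])[0, 1]:+.3f}")
```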
Discrete-Time Kalman Filter (Pseudocode)
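The listing below is a minimal NumPy sketch of one predict-update cycle, standing in for the pseudocode. The matrix names A, C, Q, R follow the notation above, and the function names `kalman_step` and `kalman_filter` are ours, not part of any library.

```python
import numpy as np

def kalman_step(x_hat, P, y, A, C, Q, R):
    """One predict-update cycle of the discrete-time Kalman filter."""
    # --- Prediction (time update) ---
    x_pred = A @ x_hat                       # x_hat_{n|n-1} = A x_hat_{n-1|n-1}
    P_pred = A @ P @ A.T + Q                 # P_{n|n-1} = A P_{n-1|n-1} A^T + Q

    # --- Update (measurement update) ---
    e = y - C @ x_pred                       # innovation
    S = C @ P_pred @ C.T + R                 # innovation covariance
    K = P_pred @ C.T @ np.linalg.inv(S)      # Kalman gain
    x_new = x_pred + K @ e                   # x_hat_{n|n}
    P_new = (np.eye(len(x_hat)) - K @ C) @ P_pred   # standard form; see the Joseph form below
    return x_new, P_new, e, S

def kalman_filter(ys, x0, P0, A, C, Q, R):
    """Run the recursion over a sequence of observations, one at a time."""
    x_hat, P, estimates = x0, P0, []
    for y in ys:
        x_hat, P, _, _ = kalman_step(x_hat, P, y, A, C, Q, R)
        estimates.append(x_hat)
    return np.array(estimates)
```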
Complexity: per step, $O(d^3)$ for the covariance propagation and $O(m^3)$ for the innovation-covariance inversion, with $d = \dim x_n$ and $m = \dim y_n$. Memory is $O(d^2)$; observations are consumed one at a time, so no batch storage is needed. For better numerical conditioning, the Joseph form of the covariance update is preferred: it preserves symmetry and positive definiteness of $P_{n|n}$ even under finite-precision arithmetic.
Information Form of the Update
Applying the Sherman-Morrison-Woodbury identity to the covariance update gives the information-form recursion
$$P_{n|n}^{-1} = P_{n|n-1}^{-1} + C^\top R^{-1} C.$$
The matrix $P_{n|n}^{-1}$ is called the information matrix. The information form is advantageous when the number of sensors is large and the state dimension is small, or when the prior is uninformative ($P_{0|0} \to \infty$, i.e., $P_{0|0}^{-1} \to 0$).
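A small numerical check with hypothetical matrices confirms that the standard covariance update and the information-form update agree:

```python
import numpy as np

P_pred = np.array([[2.0, 0.4],
                   [0.4, 1.5]])              # P_{n|n-1} (hypothetical)
C = np.array([[1.0, 0.0],
              [1.0, 1.0]])                   # two sensors observing a 2-D state
R = np.diag([0.5, 0.8])                      # measurement noise covariance

# Covariance form of the update.
S = C @ P_pred @ C.T + R
K = P_pred @ C.T @ np.linalg.inv(S)
P_post = (np.eye(2) - K @ C) @ P_pred

# Information form: posterior precision = prior precision + C^T R^{-1} C.
info_post = np.linalg.inv(P_pred) + C.T @ np.linalg.inv(R) @ C

print(np.allclose(np.linalg.inv(P_post), info_post))   # True
```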
Example: Scalar Random-Walk Estimation
Consider the scalar LGSS with $A = C = 1$, noise variances $Q = q$ and $R = r$, and prior $x_0 \sim \mathcal{N}(0, P_{0|0})$. Derive closed-form Kalman recursions for the scalar prediction variance $P_{n|n-1}$ and the scalar gain $K_n$.
Specialize the Riccati recursion
With $A = C = 1$, $Q = q$, and $R = r$, the recursion becomes
$$P_{n|n-1} = P_{n-1|n-1} + q, \qquad K_n = \frac{P_{n|n-1}}{P_{n|n-1} + r}.$$
Posterior variance
The posterior variance is
$$P_{n|n} = (1 - K_n)\,P_{n|n-1} = \frac{P_{n|n-1}\,r}{P_{n|n-1} + r},$$
the harmonic mean of $P_{n|n-1}$ and $r$ (up to a factor), which is the scalar Bayesian-update rule: inverse variances add, $P_{n|n}^{-1} = P_{n|n-1}^{-1} + r^{-1}$.
Substitute to get a one-line recursion
Combining the two, the prediction variance follows
$$P_{n+1|n} = \frac{P_{n|n-1}\,r}{P_{n|n-1} + r} + q.$$
This is a nonlinear scalar map; we will find its fixed point in Section 10.3.
Gain interpretation
When $r \to 0$ (perfect measurement), $K_n \to 1$: the filter just copies the observation. When $r \to \infty$ (useless measurement), $K_n \to 0$: the filter ignores it. The transition is smooth and data-driven; the filter weighs measurement quality against model confidence at every step.
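A few lines of NumPy make this behaviour visible; the noise variances and the helper name below are illustrative choices, not values from the example.

```python
import numpy as np

def scalar_gain_sequence(q, r, p0, n_steps=20):
    """Iterate the scalar recursion and record the Kalman gain at each step."""
    p_pred = p0 + q                          # first prediction variance
    gains = []
    for _ in range(n_steps):
        k = p_pred / (p_pred + r)            # Kalman gain
        gains.append(k)
        p_post = p_pred * r / (p_pred + r)   # posterior variance (harmonic-mean form)
        p_pred = p_post + q                  # next prediction variance
    return np.array(gains)

print(scalar_gain_sequence(q=0.1, r=1.0,  p0=1.0))   # gains settle to a steady-state value
print(scalar_gain_sequence(q=0.1, r=1e-6, p0=1.0))   # r -> 0: gain ~ 1, copy the observation
print(scalar_gain_sequence(q=0.1, r=1e6,  p0=1.0))   # r -> infinity: gain ~ 0, ignore it
```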
Kalman Filter Tracking a 2-D Target
A constant-velocity target is tracked from noisy position measurements in 2-D. The filter combines the dynamics with observations to produce estimates that are far more accurate than the raw measurements. The error ellipse shows the filter's calibrated uncertainty.
Kalman Gain and Posterior Variance over Time
For the scalar random-walk model, the Kalman gain $K_n$ and the posterior variance $P_{n|n}$ converge monotonically to their steady-state values. The convergence rate depends on the noise ratio $q/r$: higher process noise means the filter relies more on fresh measurements and the steady-state gain is larger.
The Kalman Prediction-Update Cycle
Common Mistake: The Covariance Update Can Lose Symmetry
Mistake:
The "standard" form is theoretically correct but numerically treacherous: it is not obviously symmetric, and in finite precision can develop negative eigenvalues.
Correction:
Use the Joseph form in production code: $P_{n|n} = (I - K_n C)\,P_{n|n-1}\,(I - K_n C)^\top + K_n R K_n^\top$. It is manifestly symmetric and positive semidefinite regardless of the gain used (it even stays PSD if $K_n$ is sub-optimal, which is useful when the gain is computed in reduced precision).
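The sketch below compares the two updates on placeholder matrices; in exact arithmetic they coincide for the optimal gain, but only the Joseph form is symmetric by construction.

```python
import numpy as np

def update_standard(P_pred, C, K):
    # Standard form: correct in exact arithmetic, not manifestly symmetric.
    return (np.eye(P_pred.shape[0]) - K @ C) @ P_pred

def update_joseph(P_pred, C, R, K):
    # Joseph form: symmetric and PSD for ANY gain K, optimal or not.
    A = np.eye(P_pred.shape[0]) - K @ C
    return A @ P_pred @ A.T + K @ R @ K.T

# Placeholder matrices for a 2-D state with a scalar observation.
P_pred = np.array([[4.0, 1.0], [1.0, 2.0]])
C = np.array([[1.0, 0.0]])
R = np.array([[0.25]])
K = P_pred @ C.T @ np.linalg.inv(C @ P_pred @ C.T + R)   # optimal gain

P_std = update_standard(P_pred, C, K)
P_jos = update_joseph(P_pred, C, R, K)
print(np.allclose(P_std, P_jos))       # True: the forms agree for the optimal gain
print(np.allclose(P_jos, P_jos.T))     # True: Joseph form is symmetric by construction
```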
Common Mistake: Filter Divergence from Model Mismatch
Mistake:
A Kalman filter with under-specified process noise (too-small $Q$) produces confident estimates that drift arbitrarily far from the truth. This is called filter divergence: the covariance shrinks so fast that the filter stops paying attention to new data.
Correction:
Verify that the innovations are white and consistent with $S_n$ (the normalized innovation squared $e_n^\top S_n^{-1} e_n$ should be $\chi^2$-distributed with $\dim y_n$ degrees of freedom). Persistent violations indicate model mismatch; the remedy is usually to inflate $Q$ to cover unmodeled dynamics.
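The consistency check can be scripted directly. The sketch below uses the scalar random-walk model with illustrative noise values, deliberately under-specifies the filter's process noise, and shows the normalized innovation squared (NIS) blowing up past its chi-square expectation of 1.

```python
import numpy as np

rng = np.random.default_rng(1)
q_true, r, N = 0.5, 1.0, 2000
q_filter = 0.005                       # deliberately under-specified process noise

# Simulate the scalar random walk with the TRUE process noise.
x = np.cumsum(rng.normal(0.0, np.sqrt(q_true), N))
y = x + rng.normal(0.0, np.sqrt(r), N)

# Run the filter with the WRONG q and record the normalized innovation squared (NIS).
x_hat, p, nis = 0.0, 1.0, []
for yn in y:
    p_pred = p + q_filter
    e = yn - x_hat                     # innovation
    s = p_pred + r                     # innovation variance S_n
    nis.append(e**2 / s)               # ~ chi^2 with 1 dof if the model were right
    k = p_pred / s
    x_hat += k * e
    p = (1 - k) * p_pred

print(f"mean NIS: {np.mean(nis):.1f}  (a consistent filter would give about 1.0)")
```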
Quick Check
In the steady state of the scalar random-walk model, as $q/r \to \infty$ (large process noise, small measurement noise), the Kalman gain approaches:
0
1
1/2
Correct. When process noise dominates, the prior is very uncertain, so the filter weights the fresh measurement with a gain of nearly 1. The filter essentially outputs $\hat{x}_{n|n} \approx y_n$.
Quick Check
Why are the Kalman innovations uncorrelated across time?
Because the measurement noise $v_n$ is white
Because the prediction $C\,\hat{x}_{n|n-1}$ is the projection of $y_n$ onto $\operatorname{span}\{y_1, \dots, y_{n-1}\}$, so the residual is orthogonal to everything already in that span
Because $A$ is stable
Because Gaussian random variables are independent iff they are uncorrelated
Correct. Each past innovation $e_m$ (with $m < n$) lies in $\operatorname{span}\{y_1, \dots, y_{n-1}\}$, so orthogonality of the projection residual gives the whiteness.
Quick Check
In the information form, the posterior information matrix is $P_{n|n}^{-1} = P_{n|n-1}^{-1} + C^\top R^{-1} C$. What does this say about combining prior and measurement?
Inverse variances (information) add
Variances add
The Kalman gain is
It only works when
Correct. In the Gaussian-linear setting, the posterior precision is the sum of the prior precision and the measurement precision; this is the state-space version of 'inverse variances add' for independent observations.
Square-Root Kalman Filters
In safety-critical systems (aerospace navigation, autonomous vehicles, GNSS receivers), the standard Kalman filter is rarely used directly; instead, one propagates a Cholesky factor and performs the update using QR decomposition of an augmented matrix. This square-root Kalman filter preserves positive definiteness exactly (to machine precision), doubles the effective word length of floating-point arithmetic, and is demanded by qualification standards such as DO-178C for avionics. A sketch of the array-form update appears after the list below.
- State dimension up to a few hundred per step
- Double-precision arithmetic required
- Cholesky/QR updates add ~2x computation over the naive filter
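For a flavour of the idea, here is a minimal sketch of the QR-based (array-form) measurement update, propagating a Cholesky factor of the covariance rather than the covariance itself. The matrices and the function name are placeholders; this follows the standard pre-array/post-array construction and is not any particular certified implementation.

```python
import numpy as np

def sqrt_measurement_update(x_pred, S_pred, y, C, S_R):
    """Array-form measurement update on Cholesky factors.

    S_pred and S_R are lower-triangular factors with P_pred = S_pred @ S_pred.T
    and R = S_R @ S_R.T. Returns the updated state and a triangular factor of P_post.
    """
    m, d = C.shape
    # Pre-array M satisfies M @ M.T = [[S, C P], [P C^T, P]] with S = C P C^T + R.
    M = np.block([[S_R,              C @ S_pred],
                  [np.zeros((d, m)), S_pred    ]])
    # Triangularise: QR of M.T gives M = L Q^T with L lower triangular (up to signs).
    _, Rt = np.linalg.qr(M.T)
    L = Rt.T
    S_S    = L[:m, :m]                 # factor of the innovation covariance
    G      = L[m:, :m]                 # equals P C^T S_S^{-T}
    S_post = L[m:, m:]                 # factor of the posterior covariance
    K = G @ np.linalg.inv(S_S)         # Kalman gain
    x_post = x_pred + K @ (y - C @ x_pred)
    return x_post, S_post

# Tiny check against the conventional update (hypothetical values).
P = np.array([[2.0, 0.3], [0.3, 1.0]])
Rm = np.array([[0.4]])
C = np.array([[1.0, 0.0]])
xp, y = np.array([0.5, -0.2]), np.array([0.9])
xs, Sp = sqrt_measurement_update(xp, np.linalg.cholesky(P), y, C, np.linalg.cholesky(Rm))
K = P @ C.T @ np.linalg.inv(C @ P @ C.T + Rm)
print(np.allclose(Sp @ Sp.T, (np.eye(2) - K @ C) @ P))   # True: same posterior covariance
print(np.allclose(xs, xp + K @ (y - C @ xp)))            # True: same posterior mean
```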
Historical Note: Kalman's 1960 Paper
Rudolf E. Kalman's 1960 paper "A New Approach to Linear Filtering and Prediction Problems" appeared in the Journal of Basic Engineering (Transactions of the ASME). Remarkably, it had been rejected by the electrical-engineering journals of the time, which still favoured the frequency-domain Wiener framework. Kalman's reframing of filtering as a state-space recursion, solvable on a digital computer rather than through spectral factorization, was simply too novel for reviewers accustomed to transfer functions.
The filter's first deployment was by Stanley Schmidt and Leonard McGee at NASA Ames, where they extended it to the nonlinear trajectory-estimation problem for Apollo. The Apollo Guidance Computer, with its roughly 4 KB of erasable magnetic-core memory and 36K words of read-only rope memory, ran what we now call the Extended Kalman Filter at roughly 1 Hz. Every minute of cislunar flight owed its navigation accuracy to this recursion.
Why This Matters: From State-Space Estimation to GNSS Receivers
Every modern GNSS (GPS, Galileo, BeiDou) receiver runs a Kalman filter. The state vector typically contains receiver position, velocity, clock bias, and clock drift, often augmented by a model of ionospheric delay. Observations are the pseudoranges (code phase) and carrier phases from visible satellites. The LGSS model is tight enough in clean conditions that positioning accuracy is dominated by the observation noise $R$, which itself depends on satellite geometry (GDOP). When multipath or jamming contaminates the measurements, the same innovations-whiteness tests described here are used to detect the anomaly and de-weight or eject the offending satellite. The state-space toolbox of this chapter is literally the engine that makes meter-level positioning possible.
Key Takeaway
The Kalman filter is the conditional Gaussian formula, iterated. Prediction propagates beliefs forward through the dynamics; update corrects them by a Kalman gain-weighted innovation. The gain is precisely the ratio of prior uncertainty to measurement uncertainty, and it is derived by a one-line orthogonality argument. Everything else is bookkeeping.