The Kalman Filter Equations

What We Are About to Derive

We now derive the five equations that make up the Kalman filter. The derivation is a controlled application of two tools you already own: the conditional Gaussian formula (for turning prior + likelihood into posterior) and the Markov property of the state (for decoupling future from past given the present). The proof proceeds by induction on the observation index, with the inductive hypothesis being simply: the conditional distribution of $\mathbf{x}_n$ given $\mathcal{Y}_{n-1}$ is Gaussian with mean $\widehat{\mathbf{x}}_{n|n-1}$ and covariance $\mathbf{P}_{n|n-1}$. Once we prove this stays true under both prediction and update steps, the recursion falls out.

Theorem: The Discrete-Time Kalman Filter

Consider the LGSS model of Definition 10.1. The MMSE estimates $\widehat{\mathbf{x}}_{n|n} = \mathbb{E}[\mathbf{x}_n \mid \mathcal{Y}_n]$ and $\widehat{\mathbf{x}}_{n|n-1} = \mathbb{E}[\mathbf{x}_n \mid \mathcal{Y}_{n-1}]$ and their error covariances $\mathbf{P}_{n|n}$, $\mathbf{P}_{n|n-1}$ are computed by the following recursion, initialised with $\widehat{\mathbf{x}}_{0|-1} = \mathbf{m}_0$ and $\mathbf{P}_{0|-1} = \mathbf{P}_0$.

Prediction step (time update): propagate the state estimate and covariance through the dynamics,

$$\widehat{\mathbf{x}}_{n|n-1} = \mathbf{F}\,\widehat{\mathbf{x}}_{n-1|n-1}, \qquad \mathbf{P}_{n|n-1} = \mathbf{F}\,\mathbf{P}_{n-1|n-1}\mathbf{F}^T + \mathbf{G}\mathbf{Q}\mathbf{G}^T.$$

Update step (measurement update): correct the prediction using the new observation,

$$\mathbf{S}_n = \mathbf{H}\mathbf{P}_{n|n-1}\mathbf{H}^T + \mathbf{R}, \qquad \mathbf{K}_n = \mathbf{P}_{n|n-1}\mathbf{H}^T \mathbf{S}_n^{-1},$$

$$\widehat{\mathbf{x}}_{n|n} = \widehat{\mathbf{x}}_{n|n-1} + \mathbf{K}_n \left(\mathbf{y}_n - \mathbf{H}\,\widehat{\mathbf{x}}_{n|n-1}\right), \qquad \mathbf{P}_{n|n} = (\mathbf{I} - \mathbf{K}_n \mathbf{H})\,\mathbf{P}_{n|n-1}.$$

The conditional distribution of $\mathbf{x}_n$ given $\mathcal{Y}_n$ is Gaussian with mean $\widehat{\mathbf{x}}_{n|n}$ and covariance $\mathbf{P}_{n|n}$.

Prediction pushes the belief forward through the dynamics: the state drifts under $\mathbf{F}$ and the covariance inflates by the process noise. Update is a Bayesian correction: the innovation $\mathbf{j}_n = \mathbf{y}_n - \mathbf{H}\widehat{\mathbf{x}}_{n|n-1}$ tells us how surprising the new observation was, and the Kalman gain $\mathbf{K}_n$ translates that surprise into a state correction. The gain is large when our state estimate is more uncertain than the measurement ($\mathbf{P}_{n|n-1}$ dominates), and small when the measurement is more uncertain than our belief ($\mathbf{R}$ dominates). This is exactly the weight of Chapter 8's LMMSE, now running recursively.
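To make the trade-off concrete in the scalar case with $H = 1$, where the gain reduces to a variance ratio (the numbers below are purely illustrative):

$$k_n = \frac{P_{n|n-1}}{P_{n|n-1} + r}, \qquad P_{n|n-1} = 1:\quad r = 0.1 \;\Rightarrow\; k_n \approx 0.91, \qquad r = 10 \;\Rightarrow\; k_n \approx 0.09.$$

A precise sensor pulls the estimate almost all the way to the measurement; a noisy one barely moves it.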


Theorem: Orthogonality Characterisation of the Kalman Gain

The Kalman gain $\mathbf{K}_n$ is the unique matrix such that the filtered error $\widetilde{\mathbf{x}}_{n|n} = \mathbf{x}_n - \widehat{\mathbf{x}}_{n|n}$ satisfies

$$\mathbb{E}[\widetilde{\mathbf{x}}_{n|n}\,\mathbf{j}_n^T] = \mathbf{0},$$

i.e., the posterior error is orthogonal (uncorrelated) to the current innovation.

This is the orthogonality principle of Chapter 8, read one observation at a time. If the filtered error were correlated with the innovation, we could shrink the mean-square error further by adding a bit more of the innovation back into the estimate, so optimality forces orthogonality.
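In symbols, the argument is one line: write the filtered error as $\widetilde{\mathbf{x}}_{n|n} = \widetilde{\mathbf{x}}_{n|n-1} - \mathbf{K}_n\mathbf{j}_n$ and use $\mathbb{E}[\widetilde{\mathbf{x}}_{n|n-1}\mathbf{j}_n^T] = \mathbf{P}_{n|n-1}\mathbf{H}^T$ together with $\mathbb{E}[\mathbf{j}_n\mathbf{j}_n^T] = \mathbf{S}_n$:

$$\mathbf{0} = \mathbb{E}[\widetilde{\mathbf{x}}_{n|n}\,\mathbf{j}_n^T] = \mathbf{P}_{n|n-1}\mathbf{H}^T - \mathbf{K}_n\mathbf{S}_n \quad\Longrightarrow\quad \mathbf{K}_n = \mathbf{P}_{n|n-1}\mathbf{H}^T\mathbf{S}_n^{-1}.$$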

Theorem: The Innovations Sequence Is White

For the LGSS model with the Kalman filter running, the innovations $\{\mathbf{j}_n\}$ form a zero-mean uncorrelated Gaussian sequence:

$$\mathbb{E}[\mathbf{j}_n] = \mathbf{0}, \qquad \mathbb{E}[\mathbf{j}_n\mathbf{j}_m^T] = \mathbf{S}_n\,\delta_{nm}.$$

Moreover, $\mathrm{span}(\mathbf{j}_1,\dots,\mathbf{j}_n)$ equals $\mathrm{span}(\mathbf{y}_1,\dots,\mathbf{y}_n)$ in the Hilbert space of zero-mean $L^2$ random vectors.

This is the same whitening that the Wiener filter of Chapter 9 achieves via spectral factorization, now obtained recursively. The innovations are the truly new part of each observation: the part that could not have been predicted from the past.
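As a quick empirical check, here is a minimal NumPy sketch that runs the scalar random-walk filter (introduced as an example below) and verifies that the innovations look white; the parameter values are illustrative, not from the text:

import numpy as np

rng = np.random.default_rng(0)
q, r, N = 0.25, 1.0, 20000           # illustrative noise variances, run length

# Simulate x_{n+1} = x_n + w_n, y_n = x_n + v_n.
x = np.cumsum(rng.normal(0.0, np.sqrt(q), N))
y = x + rng.normal(0.0, np.sqrt(r), N)

xhat, p = 0.0, 10.0                  # prior mean and prediction variance
innov = np.empty(N)
for n in range(N):
    innov[n] = y[n] - xhat           # innovation j_n  (H = 1)
    k = p / (p + r)                  # Kalman gain
    xhat = xhat + k * innov[n]       # measurement update (F = 1, so this is also the next prediction)
    p = (1 - k) * p + q              # covariance update, then time update

# Whiteness diagnostics: both should be near zero.  (S_n is essentially
# constant after a short transient, so no per-step normalisation is needed.)
print("sample mean:          ", innov.mean())
print("lag-1 autocorrelation:", np.corrcoef(innov[:-1], innov[1:])[0, 1])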

Discrete-Time Kalman Filter (Pseudocode)

Complexity: per step, $\mathcal{O}(d^3)$ for the covariance propagation and $\mathcal{O}(p^3)$ for the innovation covariance inversion, with $d = \dim\mathbf{x}$ and $p = \dim\mathbf{y}$. Memory is $\mathcal{O}(d^2)$; observations are consumed one at a time, so no batch storage is needed.
Input:  observations y[1], y[2], ..., y[N]
        model matrices F, G, H, Q, R
        initial mean m0, initial covariance P0
Output: filtered estimates xhat[n|n], covariances P[n|n] for n = 1..N

# Initialisation
xhat_pred <- m0
P_pred    <- P0
for n = 1 to N do
    # ----- Prediction (time update) -----
    # (At n = 1 we use the initial prior; otherwise push forward.)
    if n > 1 then
        xhat_pred <- F @ xhat_filt
        P_pred    <- F @ P_filt @ F.T + G @ Q @ G.T
    end if
    # ----- Update (measurement update) -----
    S <- H @ P_pred @ H.T + R         # innovation covariance
    K <- P_pred @ H.T @ inv(S)        # Kalman gain
    innov <- y[n] - H @ xhat_pred     # innovation
    xhat_filt <- xhat_pred + K @ innov
    P_filt <- (I - K @ H) @ P_pred    # covariance update
    store xhat_filt, P_filt
end for
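The pseudocode translates almost line-for-line into NumPy. A minimal sketch follows; the function name and the use of np.linalg.solve in place of an explicit inverse are our choices, not mandated by the text:

import numpy as np

def kalman_filter(y, F, G, H, Q, R, m0, P0):
    """Discrete-time Kalman filter over observations y[0..N-1].

    y: (N, p) array; F: (d, d); G: (d, m); H: (p, d); Q: (m, m); R: (p, p).
    Returns filtered means (N, d) and covariances (N, d, d).
    """
    N, d = len(y), len(m0)
    xhat_filt = np.empty((N, d))
    P_filt = np.empty((N, d, d))
    x_pred, P_pred = m0, P0                       # initial prior
    I = np.eye(d)
    for n in range(N):
        if n > 0:                                 # time update
            x_pred = F @ xhat_filt[n - 1]
            P_pred = F @ P_filt[n - 1] @ F.T + G @ Q @ G.T
        S = H @ P_pred @ H.T + R                  # innovation covariance
        K = np.linalg.solve(S.T, H @ P_pred.T).T  # gain P H^T S^{-1}, no inv()
        innov = y[n] - H @ x_pred                 # innovation
        xhat_filt[n] = x_pred + K @ innov         # measurement update
        P_filt[n] = (I - K @ H) @ P_pred          # covariance (standard form)
    return xhat_filt, P_filt

Solving against S rather than inverting it is the usual numerically safer choice; the standard-form covariance line can be swapped for the Joseph form discussed next.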

For better numerical conditioning, the Joseph form

$$\mathbf{P}_{n|n} = (\mathbf{I} - \mathbf{K}_n\mathbf{H})\,\mathbf{P}_{n|n-1}\,(\mathbf{I} - \mathbf{K}_n\mathbf{H})^T + \mathbf{K}_n\mathbf{R}\mathbf{K}_n^T$$

is preferred: it preserves symmetry and positive definiteness of $\mathbf{P}_{n|n}$ even under finite-precision arithmetic.

Information Form of the Update

Applying the Sherman-Morrison-Woodbury identity to the covariance update gives the information-form recursion

$$\mathbf{P}_{n|n}^{-1} = \mathbf{P}_{n|n-1}^{-1} + \mathbf{H}^T\mathbf{R}^{-1}\mathbf{H}, \qquad \mathbf{P}_{n|n}^{-1}\widehat{\mathbf{x}}_{n|n} = \mathbf{P}_{n|n-1}^{-1}\widehat{\mathbf{x}}_{n|n-1} + \mathbf{H}^T\mathbf{R}^{-1}\mathbf{y}_n.$$

The matrix $\mathbf{P}^{-1}$ is called the information matrix. The information form is advantageous when the number of sensors $p$ is large and the state dimension $d$ is small, or when the prior is uninformative ($\mathbf{P}_0 \to \infty$, i.e., $\mathbf{P}_0^{-1} \to \mathbf{0}$).
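The two updates are algebraically identical, which is easy to confirm numerically. A sketch with randomly generated SPD matrices (all names and dimensions illustrative):

import numpy as np

rng = np.random.default_rng(1)
d, p = 3, 2
A = rng.normal(size=(d, d)); P_pred = A @ A.T + d * np.eye(d)   # SPD prior covariance
B = rng.normal(size=(p, p)); R = B @ B.T + p * np.eye(p)        # SPD noise covariance
H = rng.normal(size=(p, d))
x_pred = rng.normal(size=d)
y = rng.normal(size=p)

# Standard (covariance-form) update.
S = H @ P_pred @ H.T + R
K = P_pred @ H.T @ np.linalg.inv(S)
x_cov = x_pred + K @ (y - H @ x_pred)
P_cov = (np.eye(d) - K @ H) @ P_pred

# Information-form update: inverse covariances (informations) add.
J = np.linalg.inv(P_pred) + H.T @ np.linalg.inv(R) @ H          # posterior information
x_inf = np.linalg.solve(J, np.linalg.inv(P_pred) @ x_pred
                           + H.T @ np.linalg.inv(R) @ y)

print(np.allclose(np.linalg.inv(J), P_cov))   # True
print(np.allclose(x_inf, x_cov))              # True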

Example: Scalar Random-Walk Estimation

Consider the scalar LGSS

$$x_{n+1} = x_n + w_n, \qquad y_n = x_n + v_n,$$

with $w_n \sim \mathcal{N}(0, q)$, $v_n \sim \mathcal{N}(0, r)$, and prior $x_0 \sim \mathcal{N}(0, p_0)$. Derive closed-form Kalman recursions for the scalar prediction variance $P_{n|n-1}$ and the scalar gain $k_n$.
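As a check on your derivation, the recursions can be iterated numerically. A minimal sketch specialising the theorem's equations to $F = G = H = 1$ (parameter values are illustrative):

import numpy as np

def scalar_random_walk_filter(q, r, p0, n_steps):
    """Iterate the scalar Kalman recursions for x_{n+1} = x_n + w_n, y_n = x_n + v_n.

    With F = G = H = 1 the theorem's equations reduce, per step, to:
        gain                 k      = P_pred / (P_pred + r)
        posterior variance   P_filt = (1 - k) * P_pred
        prediction variance  P_pred = P_filt + q    (for the next step)
    Returns the sequences of prediction variances and gains.
    """
    p_pred, preds, gains = p0, [], []
    for _ in range(n_steps):
        k = p_pred / (p_pred + r)
        preds.append(p_pred)
        gains.append(k)
        p_filt = (1 - k) * p_pred
        p_pred = p_filt + q            # time update for the next step
    return np.array(preds), np.array(gains)

# Example: the gain settles to its steady-state value within a few steps.
preds, gains = scalar_random_walk_filter(q=0.25, r=1.0, p0=1.0, n_steps=10)
print(gains.round(4))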

Kalman Filter Tracking a 2-D Target

A constant-velocity target is tracked from noisy position measurements in 2-D. The filter combines the dynamics with observations to produce estimates that are far more accurate than the raw measurements. The $\pm 2\sigma$ error ellipse shows the filter's calibrated uncertainty.


Kalman Gain and Posterior Variance over Time

For the scalar random-walk model, the Kalman gain and the posterior variance converge monotonically to their steady-state values. The convergence rate depends on the noise ratio $q/r$: higher process noise means the filter relies more on fresh measurements and the steady-state gain is larger.
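Setting $P_{n+1|n} = P_{n|n-1} = p$ in the scalar recursion gives the steady state directly; a sketch of the calculation:

$$p = \frac{p\,r}{p + r} + q \;\Longleftrightarrow\; p^2 - qp - qr = 0 \;\Longrightarrow\; p = \frac{q + \sqrt{q^2 + 4qr}}{2}, \qquad k_\infty = \frac{p}{p + r}.$$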


The Kalman Prediction-Update Cycle

An animated walkthrough of one Kalman step: the prior belief (Gaussian) is pushed forward by $\mathbf{F}$, inflates by $\mathbf{G}\mathbf{Q}\mathbf{G}^T$, collides with a new measurement, and contracts into the posterior. The innovation is highlighted as the distance between the measurement and the predicted measurement.
Prediction widens the belief; measurement narrows it. The Kalman gain $\mathbf{K}_n$ is the optimal trade-off between the two.

Common Mistake: The Covariance Update Can Lose Symmetry

Mistake:

The "standard" form Pn∣n=(Iβˆ’KnnH)Pn∣nβˆ’1\mathbf{P}_{n|n} = (\mathbf{I}-{\mathbf{K}_n}_{n}\mathbf{H})\mathbf{P}_{n|n-1} is theoretically correct but numerically treacherous: it is not obviously symmetric, and in finite precision Pn∣n\mathbf{P}_{n|n} can develop negative eigenvalues.

Correction:

Use the Joseph form in production code:

$$\mathbf{P}_{n|n} = (\mathbf{I} - \mathbf{K}_n\mathbf{H})\,\mathbf{P}_{n|n-1}\,(\mathbf{I} - \mathbf{K}_n\mathbf{H})^T + \mathbf{K}_n\mathbf{R}\mathbf{K}_n^T.$$

It is manifestly symmetric and positive semidefinite regardless of the gain used (it even stays PSD if $\mathbf{K}_n$ is sub-optimal, which is useful when the gain is computed in reduced precision).
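As a drop-in replacement for the covariance line in the earlier NumPy sketch (the final symmetrisation is a common belt-and-braces step, our addition rather than part of the Joseph form itself):

import numpy as np

def joseph_update(P_pred, K, H, R):
    """Joseph-form covariance update: symmetric and PSD for ANY gain K."""
    d = P_pred.shape[0]
    A = np.eye(d) - K @ H
    P_filt = A @ P_pred @ A.T + K @ R @ K.T
    return 0.5 * (P_filt + P_filt.T)   # enforce exact symmetry in floating point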

Common Mistake: Filter Divergence from Model Mismatch

Mistake:

A Kalman filter with under-specified process noise (too-small $\mathbf{Q}$) produces confident estimates that drift arbitrarily far from the truth. This is called filter divergence: the covariance shrinks so fast that the filter stops paying attention to new data.

Correction:

Verify that the innovations $\{\mathbf{j}_n\}$ are white and consistent with $\mathbf{S}_n$ (the normalized innovation squared $\mathbf{j}_n^T\mathbf{S}_n^{-1}\mathbf{j}_n$ should be $\chi^2_p$-distributed). Persistent violations indicate model mismatch; the remedy is usually to inflate $\mathbf{Q}$ to cover unmodeled dynamics.
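A minimal sketch of this NIS (normalized innovation squared) consistency test, using scipy.stats.chi2 for the bounds; the 95% band and the pass criterion are conventional choices, not from the text:

import numpy as np
from scipy.stats import chi2

def nis_test(innovs, S_list, alpha=0.05):
    """Check that NIS values are consistent with a chi^2_p distribution.

    innovs: sequence of innovation vectors j_n; S_list: matching covariances S_n.
    Returns the fraction of steps whose NIS falls inside the central
    (1 - alpha) chi^2_p interval; for a well-tuned filter this should be
    close to 1 - alpha.
    """
    p = len(innovs[0])
    lo, hi = chi2.ppf(alpha / 2, df=p), chi2.ppf(1 - alpha / 2, df=p)
    nis = np.array([j @ np.linalg.solve(S, j) for j, S in zip(innovs, S_list)])
    return np.mean((nis >= lo) & (nis <= hi))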

Quick Check

In the steady state of the scalar random-walk model, as $q/r \to \infty$ (large process noise, small measurement noise), the Kalman gain $k_n$ approaches:

• 0
• 1
• 1/2
• $\sqrt{q/r}$

Quick Check

Why are the Kalman innovations $\{\mathbf{j}_n\}$ uncorrelated across time?

• Because $\mathbf{v}_n$ is white
• Because the prediction $\widehat{\mathbf{x}}_{n|n-1}$ is the projection of $\mathbf{x}_n$ onto $\mathrm{span}(\mathcal{Y}_{n-1})$, so the residual is orthogonal to everything already in that span
• Because $\mathbf{F}$ is stable
• Because Gaussian random variables are independent iff they are uncorrelated

Quick Check

In the information form, the posterior information matrix is $\mathbf{P}_{n|n}^{-1} = \mathbf{P}_{n|n-1}^{-1} + \mathbf{H}^T\mathbf{R}^{-1}\mathbf{H}$. What does this say about combining prior and measurement?

• Inverse variances (information) add
• Variances add
• The Kalman gain is $\mathbf{R}^{-1}$
• It only works when $\mathbf{P}_0 = \mathbf{0}$

⚠️ Engineering Note

Square-Root Kalman Filters

In safety-critical systems (aerospace navigation, autonomous vehicles, GNSS receivers), the standard Kalman filter is rarely used directly; instead, one propagates a Cholesky factor $\mathbf{P} = \mathbf{L}\mathbf{L}^T$ and performs the update using QR decomposition of an augmented matrix. This square-root Kalman filter preserves positive definiteness exactly (to machine precision), effectively doubles the working precision of the floating-point arithmetic, and is the form typically used in software qualified to standards such as DO-178C for avionics.
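A sketch of the QR-based measurement update, following the standard pre-array/post-array construction (function name and interface are illustrative, not from the text):

import numpy as np

def sqrt_measurement_update(L_pred, H, R_chol):
    """One square-root measurement update via QR.

    L_pred: lower Cholesky factor of P_pred (d x d).
    R_chol: lower Cholesky factor of R (p x p).
    Returns (L_filt, K): a square-root factor of P_filt and the Kalman gain.
    """
    d, p = L_pred.shape[0], R_chol.shape[0]
    # Pre-array:  [ R^{1/2}   H L ]
    #             [    0       L  ]
    pre = np.block([[R_chol, H @ L_pred],
                    [np.zeros((d, p)), L_pred]])
    # An orthogonal transform triangularises the pre-array into the post-array
    #             [ S^{1/2}              0          ]
    #             [ P H^T S^{-T/2}   P_filt^{1/2}   ]
    post = np.linalg.qr(pre.T, mode="r").T            # lower triangular
    S_half = post[:p, :p]
    K = np.linalg.solve(S_half.T, post[p:, :p].T).T   # K = (P H^T S^{-T/2}) S^{-1/2}
    L_filt = post[p:, p:]
    return L_filt, K

The covariance itself is never formed: only its factor is propagated, which is what preserves positive definiteness exactly.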

Practical Constraints

  • State dimension up to a few hundred per step
  • Double-precision arithmetic required
  • Cholesky/QR updates add ~2x computation over the naive filter

📋 Ref: Bierman, 'Factorization Methods for Discrete Sequential Estimation' (1977)

Historical Note: Kalman's 1960 Paper


Rudolf E. Kalman's 1960 paper "A New Approach to Linear Filtering and Prediction Problems" appeared in the Transactions of the ASME, Journal of Basic Engineering. Remarkably, it had been rejected by the mainstream electrical-engineering journals of the time, which still favoured the frequency-domain Wiener framework. Kalman's reframing of filtering as a state-space recursion, solvable on a digital computer rather than through spectral factorization, was simply too novel for reviewers accustomed to transfer functions.

The filter's first deployment was by Stanley Schmidt and Leonard McGee at NASA Ames, where they extended it to the nonlinear trajectory-estimation problem for Apollo. The Apollo Guidance Computer, with its roughly 4 KB of erasable magnetic-core memory and 36K words of read-only rope memory, ran what we now call the Extended Kalman Filter at roughly 1 Hz. Every minute of cislunar flight owed its navigation accuracy to this recursion.

Why This Matters: From State-Space Estimation to GNSS Receivers

Every modern GNSS (GPS, Galileo, BeiDou) receiver runs a Kalman filter. The state vector $\mathbf{x}_n$ typically contains receiver position, velocity, clock bias, and clock drift, often augmented by a model of ionospheric delay. Observations $\mathbf{y}_n$ are the pseudoranges (code-phase) and carrier phases from visible satellites. The LGSS model is tight enough in clean conditions that positioning accuracy is dominated by the observation noise $\mathbf{R}$, which itself depends on satellite geometry (GDOP). When multipath or jamming contaminates the measurements, the same innovations whiteness tests described here are used to detect the anomaly and de-weight or eject the offending satellite. The state-space toolbox of this chapter is literally the engine that makes meter-level positioning possible.

Key Takeaway

The Kalman filter is the conditional Gaussian formula, iterated. Prediction propagates beliefs forward through the dynamics; update corrects them by a Kalman gain-weighted innovation. The gain $\mathbf{K}_n = \mathbf{P}_{n|n-1}\mathbf{H}^T\mathbf{S}_n^{-1}$ is precisely the ratio of prior uncertainty to measurement uncertainty, and it is derived by a one-line orthogonality argument. Everything else is bookkeeping.