State-Space Models

Why State-Space Models?

So far our estimation machinery has assumed a joint distribution of signal and observation that was given and static. In reality, most interesting estimation problems evolve: a target moves, a channel fades, a bearing drifts, a user wanders. The signal we want has dynamics of its own, and observations accumulate over time. The question is how to combine a model of the dynamics with a model of the measurements to produce estimates that improve as observations arrive.

The state-space model is the cleanest answer available. It says: keep a compact summary of "everything you need to know about the past" in a vector $\mathbf{x}_n$ (the state) such that the future evolves as a Markov chain driven by noise, and the observations depend only on the current state. This separation of concerns is the single most useful idea in estimation for dynamical systems, and the Kalman filter is what it buys us once the model is linear and Gaussian.

Definition:

Discrete-Time Linear Gaussian State-Space Model

A discrete-time linear Gaussian state-space model (LGSS) consists of two coupled equations: the state equation
$$\mathbf{x}_{n+1} = \mathbf{F}\,\mathbf{x}_n + \mathbf{G}\,\mathbf{w}_n,$$
and the observation equation
$$\mathbf{y}_n = \mathbf{H}\,\mathbf{x}_n + \mathbf{v}_n,$$
for $n = 0, 1, 2, \dots$, together with the distributional assumptions
$$\mathbf{x}_0 \sim \mathcal{N}(\mathbf{m}_0, \mathbf{P}_0), \quad \mathbf{w}_n \sim \mathcal{N}(\mathbf{0}, \mathbf{Q}), \quad \mathbf{v}_n \sim \mathcal{N}(\mathbf{0}, \mathbf{R}),$$
with $\{\mathbf{w}_n\}$, $\{\mathbf{v}_n\}$, and $\mathbf{x}_0$ mutually independent, and both noise sequences white. The matrices $\mathbf{F} \in \mathbb{R}^{d\times d}$, $\mathbf{G} \in \mathbb{R}^{d\times q}$, $\mathbf{H} \in \mathbb{R}^{p\times d}$, $\mathbf{Q}\succeq\mathbf{0}$, $\mathbf{R}\succ\mathbf{0}$ are deterministic and, in the time-invariant case, independent of $n$.

The positive-definite assumption $\mathbf{R}\succ\mathbf{0}$ is there to avoid singular innovations; $\mathbf{Q}$ may be only positive semidefinite, since some state coordinates (e.g. the velocity in a constant-velocity model) are not driven by noise at all.
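A minimal sketch of sampling from this model may make the definition concrete. The function below is illustrative (its name and signature are not from the text); it draws one trajectory of states and observations using the matrices $(\mathbf{F},\mathbf{G},\mathbf{H},\mathbf{Q},\mathbf{R})$ and initial moments $(\mathbf{m}_0,\mathbf{P}_0)$ exactly as defined above.

```python
import numpy as np

def simulate_lgss(F, G, H, Q, R, m0, P0, N, rng=None):
    """Draw x_0..x_{N-1} and y_0..y_{N-1} from the LGSS model above."""
    rng = np.random.default_rng(rng)
    d, p = F.shape[0], H.shape[0]
    xs = np.empty((N, d))
    ys = np.empty((N, p))
    x = rng.multivariate_normal(m0, P0)                        # x_0 ~ N(m0, P0)
    for n in range(N):
        xs[n] = x
        v = rng.multivariate_normal(np.zeros(p), R)            # white obs. noise
        ys[n] = H @ x + v                                      # y_n = H x_n + v_n
        w = rng.multivariate_normal(np.zeros(G.shape[1]), Q)   # white process noise
        x = F @ x + G @ w                                      # x_{n+1} = F x_n + G w_n
    return xs, ys

# Scalar random walk as a smoke test: F = G = H = 1.
xs, ys = simulate_lgss(np.eye(1), np.eye(1), np.eye(1),
                       0.1 * np.eye(1), 0.2 * np.eye(1),
                       np.zeros(1), np.eye(1), 50, rng=0)
```

Note that the whiteness assumptions enter only through the fresh noise draws at every step; nothing about the past noises is remembered.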

Definition:

Filtering, Prediction, Smoothing

Let $\mathcal{Y}_m = \{\mathbf{y}_1, \dots, \mathbf{y}_m\}$. Three classical estimation tasks are distinguished by the relation between the target time index $n$ and the observation horizon $m$:

  • Filtering ($n = m$): compute $\widehat{\mathbf{x}}_{n|n} = \mathbb{E}[\mathbf{x}_n \mid \mathcal{Y}_n]$, the best estimate of the current state given all observations up to and including now.
  • Prediction ($n > m$): compute $\widehat{\mathbf{x}}_{n|m} = \mathbb{E}[\mathbf{x}_n \mid \mathcal{Y}_m]$, the best forecast of a future state.
  • Smoothing ($n < m$): compute $\widehat{\mathbf{x}}_{n|m}$ with $m > n$, a retrospective estimate using both past and future observations.

The Kalman filter addresses filtering and one-step prediction jointly. Smoothing requires a backward recursion and is treated in Exercise 14.

Definition:

Innovation Sequence

The innovation at time $n$ is the one-step prediction residual of the observation:
$$\mathbf{j}_n \triangleq \mathbf{y}_n - \mathbf{H}\,\widehat{\mathbf{x}}_{n|n-1} = \mathbf{y}_n - \widehat{\mathbf{y}}_{n|n-1}.$$
The innovation covariance is
$$\mathbf{S}_n \triangleq \mathbb{E}[\mathbf{j}_n \mathbf{j}_n^T] = \mathbf{H}\mathbf{P}_{n|n-1}\mathbf{H}^T + \mathbf{R}.$$

The innovations $\{\mathbf{j}_n\}$ play the role that the whitened observations played in Chapter 9: they form an uncorrelated basis for the linear span of the observations, and conditioning on $\mathcal{Y}_n$ is equivalent to conditioning on $(\mathbf{j}_1,\dots,\mathbf{j}_n)$.

Definition:

Controllability and Observability

For the time-invariant pair $(\mathbf{F},\mathbf{G}\mathbf{Q}^{1/2})$, the controllability Gramian is
$$\mathcal{C}_k = \sum_{i=0}^{k-1}\mathbf{F}^i\mathbf{G}\mathbf{Q}\mathbf{G}^T(\mathbf{F}^T)^i.$$
The pair is controllable when $\mathcal{C}_d$ has full rank $d$; the weaker notion of stabilizability, which requires only that the uncontrollable modes of $\mathbf{F}$ be stable, often suffices.

For $(\mathbf{F},\mathbf{H})$, the observability Gramian is
$$\mathcal{O}_k = \sum_{i=0}^{k-1}(\mathbf{F}^T)^i\mathbf{H}^T\mathbf{H}\mathbf{F}^i,$$
and the pair is observable when $\mathcal{O}_d$ has full rank; detectability is the analogous weaker notion, requiring only that the unobservable modes be stable.

These two conditions will reappear in Section 10.3 as the hypotheses under which the Riccati recursion has a unique stabilizing fixed point.
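The finite-horizon Gramians are straightforward to compute numerically. A possible sketch (function names are illustrative), using the constant-velocity pair from the example later in this section to check that position-only measurements still render the velocity observable through the dynamics:

```python
import numpy as np

def controllability_gramian(F, G, Q, k):
    # C_k = sum_{i=0}^{k-1} F^i G Q G^T (F^T)^i
    d = F.shape[0]
    C, Fi = np.zeros((d, d)), np.eye(d)
    for _ in range(k):
        C += Fi @ G @ Q @ G.T @ Fi.T
        Fi = F @ Fi
    return C

def observability_gramian(F, H, k):
    # O_k = sum_{i=0}^{k-1} (F^T)^i H^T H F^i
    d = F.shape[0]
    O, Fi = np.zeros((d, d)), np.eye(d)
    for _ in range(k):
        O += Fi.T @ H.T @ H @ Fi
        Fi = F @ Fi
    return O

# Constant-velocity dynamics with position-only measurements (Ts = 1):
Ts = 1.0
F = np.array([[1.0, Ts], [0.0, 1.0]])
H = np.array([[1.0, 0.0]])
O2 = observability_gramian(F, H, 2)   # O_2 = [[2, Ts], [Ts, Ts^2]], full rank
```

The full-rank check is then a single call to `np.linalg.matrix_rank` on $\mathcal{C}_d$ or $\mathcal{O}_d$.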

Definition:

Markov Property of the State

Conditional on $\mathbf{x}_n$, the future $(\mathbf{x}_{n+1}, \mathbf{x}_{n+2}, \ldots)$ is independent of the past $(\mathbf{x}_0, \ldots, \mathbf{x}_{n-1})$ and of the past observations $(\mathbf{y}_1, \ldots, \mathbf{y}_{n-1})$. Equivalently,
$$f(\mathbf{x}_{n+1} \mid \mathbf{x}_n, \mathbf{x}_{n-1}, \dots) = f(\mathbf{x}_{n+1} \mid \mathbf{x}_n).$$

This is the structural reason the Kalman filter is recursive: all the information that past observations contain about the future is compressed into the current state estimate and its covariance. Nothing else needs to be remembered.

Theorem: Propagation of State Moments

Let $\mathbf{m}_n = \mathbb{E}[\mathbf{x}_n]$ and $\mathbf{\Pi}_n = \mathrm{Cov}(\mathbf{x}_n)$ for the LGSS model of Definition 10.1. Then for all $n \geq 0$:
$$\mathbf{m}_{n+1} = \mathbf{F}\mathbf{m}_n, \qquad \mathbf{\Pi}_{n+1} = \mathbf{F}\mathbf{\Pi}_n\mathbf{F}^T + \mathbf{G}\mathbf{Q}\mathbf{G}^T.$$
Moreover, $\mathbf{x}_n$ is Gaussian for every $n$, so $\mathbf{x}_n \sim \mathcal{N}(\mathbf{m}_n, \mathbf{\Pi}_n)$ in distribution.

The mean evolves deterministically through the noise-free dynamics: zero-mean driving noise cannot shift the mean. The covariance grows by two mechanisms: the old uncertainty rotates through $\mathbf{F}$, and fresh process noise $\mathbf{G}\mathbf{Q}\mathbf{G}^T$ is injected at every step. This is the Lyapunov-type recursion that underlies Kalman prediction.
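The two recursions of the theorem can be sketched in a few lines. In a scalar random walk ($\mathbf{F} = \mathbf{G} = 1$, $\mathbf{Q} = q$) the covariance recursion reduces to $\Pi_{n+1} = \Pi_n + q$, so after $N$ steps the variance is $\Pi_0 + Nq$; the code checks exactly that (function name and values are illustrative):

```python
import numpy as np

def propagate_moments(F, G, Q, m0, Pi0, N):
    """Iterate m_{n+1} = F m_n and Pi_{n+1} = F Pi_n F^T + G Q G^T."""
    ms, Pis = [np.asarray(m0, float)], [np.asarray(Pi0, float)]
    for _ in range(N):
        ms.append(F @ ms[-1])                          # mean through noise-free dynamics
        Pis.append(F @ Pis[-1] @ F.T + G @ Q @ G.T)    # rotate old uncertainty, inject noise
    return ms, Pis

# Scalar random walk: variance grows by q = 0.5 per step.
F = np.eye(1); G = np.eye(1); Q = 0.5 * np.eye(1)
ms, Pis = propagate_moments(F, G, Q, np.zeros(1), np.eye(1), 4)
```

The linear growth of the variance here is the simplest instance of the noise-injection mechanism described above.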

Example: Constant-Velocity Target in One Dimension

A particle moves along a line. Let the state be $\mathbf{x}_n = [p_n,\, v_n]^T$, with $p_n$ the position and $v_n$ the velocity. Over a sampling interval $T_s$, the kinematic model
$$p_{n+1} = p_n + T_s v_n + \tfrac12 T_s^2 a_n, \qquad v_{n+1} = v_n + T_s a_n,$$
with random acceleration $a_n \sim \mathcal{N}(0,\sigma_a^2)$, yields the matrices of an LGSS. The sensor measures only the position, corrupted by white noise $e_n \sim \mathcal{N}(0,\sigma_v^2)$ (written $e_n$ here to avoid a clash with the velocity $v_n$). Write the LGSS matrices $(\mathbf{F},\mathbf{G},\mathbf{H},\mathbf{Q},\mathbf{R})$ and compute the two-step predicted covariance starting from $\mathbf{P}_{0|-1} = \mathbf{0}$ (perfectly known initial state).
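One way to work this example numerically (the values of $T_s$, $\sigma_a$, $\sigma_v$ below are illustrative choices, not given in the text): build the matrices and apply the covariance prediction $\mathbf{P}_{n+1} = \mathbf{F}\mathbf{P}_n\mathbf{F}^T + \mathbf{G}\mathbf{Q}\mathbf{G}^T$ twice from $\mathbf{P}_0 = \mathbf{0}$.

```python
import numpy as np

Ts, sigma_a, sigma_v = 1.0, 1.0, 0.5
F = np.array([[1.0, Ts], [0.0, 1.0]])   # kinematic transition
G = np.array([[0.5 * Ts**2], [Ts]])     # acceleration enters position and velocity
H = np.array([[1.0, 0.0]])              # position-only measurement
Q = np.array([[sigma_a**2]])
R = np.array([[sigma_v**2]])

P = np.zeros((2, 2))                    # perfectly known initial state
for _ in range(2):
    P = F @ P @ F.T + G @ Q @ G.T       # open-loop prediction, no measurements
# With Ts = sigma_a = 1 this gives P = [[2.5, 2.0], [2.0, 2.0]].
```

Note how the off-diagonal terms grow: uncertainty in the unobserved velocity couples into position uncertainty through $\mathbf{F}$.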

Realizations of a Linear Gaussian State-Space Model

Sample trajectories of the state $\mathbf{x}_n$ and observations $y_n$ for the constant-velocity model. As $\sigma_a$ increases, trajectories diverge faster; as $\sigma_v$ increases, observations become noisier around the true path.


Common Mistake: Coloured Noise Is Not Allowed (Directly)

Mistake:

Students often apply the Kalman filter with process or observation noise that is correlated across time, e.g., $\mathbf{v}_n$ generated as a filtered version of a white sequence.

Correction:

The derivation requires $\{\mathbf{w}_n\}$ and $\{\mathbf{v}_n\}$ to be white (temporally uncorrelated). If the noise is coloured, the standard recursion is no longer optimal. The standard fix is to augment the state: model the coloured noise itself as the output of a linear system driven by white noise, append its state to $\mathbf{x}_n$, and rewrite the model. The filter then runs on the augmented state. This is a routine manoeuvre, but forgetting to do it silently destroys optimality.
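A sketch of the augmentation, under the illustrative assumption that the observation noise is scalar AR(1), $v_{n+1} = a\,v_n + e_n$ with white $e_n$ (the function name and the value of $a$ are hypothetical, not from the text). Appending $v_n$ to the state gives an augmented model $\mathbf{z}_n = [\mathbf{x}_n^T, v_n]^T$ driven only by white noise:

```python
import numpy as np

def augment_ar1_obs_noise(F, G, H, a):
    """Augment a scalar-observation LGSS whose measurement noise is AR(1)."""
    d = F.shape[0]
    # Augmented transition: x evolves as before, v_n becomes a state coordinate.
    Fa = np.block([[F, np.zeros((d, 1))],
                   [np.zeros((1, d)), np.array([[a]])]])
    # Augmented noise input: process noise w_n drives x, white e_n drives v.
    Ga = np.block([[G, np.zeros((d, 1))],
                   [np.zeros((1, G.shape[1])), np.array([[1.0]])]])
    # y_n = H x_n + v_n: the noise now sits inside the augmented state,
    # so the augmented observation equation is formally noise-free (care is
    # then needed with the R > 0 assumption, e.g. a small regularizer).
    Ha = np.hstack([H, np.ones((1, 1))])
    return Fa, Ga, Ha

# Constant-velocity model with AR(1) position-measurement noise, a = 0.9:
F = np.array([[1.0, 1.0], [0.0, 1.0]])
G = np.array([[0.5], [1.0]])
H = np.array([[1.0, 0.0]])
Fa, Ga, Ha = augment_ar1_obs_noise(F, G, H, a=0.9)
```

The filter is then run on the three-dimensional augmented state rather than the original two-dimensional one.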

Block Diagram of a Linear Gaussian State-Space Model

The unit delay closes the state loop; process noise enters through $\mathbf{G}$ and observation noise is additive at the output. The Kalman filter consumes the observations $\mathbf{y}_n$ and produces estimates of $\mathbf{x}_n$.

Key Takeaway

A linear Gaussian state-space model is a Markov chain in $\mathbb{R}^d$ whose transition is a linear Gaussian kernel and whose observations are linear Gaussian functions of the state. Two recursions, one for the mean and one for the covariance, fully describe its marginal moments. The Kalman filter is what you get when you condition those moments on observations.