State-Space Models

Why State-Space Models?

So far our estimation machinery has assumed a joint distribution of signal and observation that was given and static. In reality, most interesting estimation problems evolve: a target moves, a channel fades, a bearing drifts, a user wanders. The signal we want has dynamics of its own, and observations accumulate over time. The question is how to combine a model of the dynamics with a model of the measurements to produce estimates that improve as observations arrive.

The state-space model is the cleanest answer available. It says: keep a compact summary of "everything you need to know about the past" in a vector $\mathbf{x}_n$ (the state) such that the future evolves as a Markov chain driven by noise, and the observations depend only on the current state. This separation of concerns is the single most useful idea in estimation for dynamical systems, and the Kalman filter is what it buys us once the model is linear and Gaussian.

Definition:

Discrete-Time Linear Gaussian State-Space Model

A discrete-time linear Gaussian state-space model (LGSS) consists of two coupled equations: the state equation
$$\mathbf{x}_{n+1} = \mathbf{F}\,\mathbf{x}_n + \mathbf{G}\,\mathbf{w}_n,$$
and the observation equation
$$\mathbf{y}_n = \mathbf{H}\,\mathbf{x}_n + \mathbf{v}_n,$$
for $n = 0, 1, 2, \dots$, together with the distributional assumptions
$$\mathbf{x}_0 \sim \mathcal{N}(\mathbf{m}_0, \mathbf{P}_0), \quad \mathbf{w}_n \sim \mathcal{N}(\mathbf{0}, \mathbf{Q}), \quad \mathbf{v}_n \sim \mathcal{N}(\mathbf{0}, \mathbf{R}),$$
with $\{\mathbf{w}_n\}$, $\{\mathbf{v}_n\}$, and $\mathbf{x}_0$ mutually independent, and both noise sequences white. The matrices $\mathbf{F} \in \mathbb{R}^{d\times d}$, $\mathbf{G} \in \mathbb{R}^{d\times q}$, $\mathbf{H} \in \mathbb{R}^{p\times d}$, $\mathbf{Q}\succeq\mathbf{0}$, $\mathbf{R}\succ\mathbf{0}$ are deterministic and, in the time-invariant case, independent of $n$.

The positive-definite assumption $\mathbf{R}\succ\mathbf{0}$ is there to avoid singular innovations; $\mathbf{Q}$ may be only positive semidefinite, since some state coordinates (e.g. the velocity in a constant-velocity model) are not driven by noise at all.
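A minimal sketch of sampling from this model may make the definition concrete. The function below is illustrative (its name and signature are not from the text); it draws one trajectory of states and observations using the matrices $(\mathbf{F},\mathbf{G},\mathbf{H},\mathbf{Q},\mathbf{R})$ and initial moments $(\mathbf{m}_0,\mathbf{P}_0)$ exactly as defined above.

```python
import numpy as np

def simulate_lgss(F, G, H, Q, R, m0, P0, N, rng=None):
    """Draw x_0..x_{N-1} and y_0..y_{N-1} from the LGSS model above."""
    rng = np.random.default_rng(rng)
    d, p = F.shape[0], H.shape[0]
    xs = np.empty((N, d))
    ys = np.empty((N, p))
    x = rng.multivariate_normal(m0, P0)                        # x_0 ~ N(m0, P0)
    for n in range(N):
        xs[n] = x
        v = rng.multivariate_normal(np.zeros(p), R)            # white obs. noise
        ys[n] = H @ x + v                                      # y_n = H x_n + v_n
        w = rng.multivariate_normal(np.zeros(G.shape[1]), Q)   # white process noise
        x = F @ x + G @ w                                      # x_{n+1} = F x_n + G w_n
    return xs, ys

# Scalar random walk as a smoke test: F = G = H = 1.
xs, ys = simulate_lgss(np.eye(1), np.eye(1), np.eye(1),
                       0.1 * np.eye(1), 0.2 * np.eye(1),
                       np.zeros(1), np.eye(1), 50, rng=0)
```

Note that the whiteness assumptions enter only through the fresh noise draws at every step; nothing about the past noises is remembered.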

Definition:

Filtering, Prediction, Smoothing

Let $\mathcal{Y}_m = \{\mathbf{y}_1, \dots, \mathbf{y}_m\}$. Three classical estimation tasks are distinguished by the relation between the target time index $n$ and the observation horizon $m$:

  • Filtering ($n = m$): compute $\widehat{\mathbf{x}}_{n|n} = \mathbb{E}[\mathbf{x}_n \mid \mathcal{Y}_n]$, the best estimate of the current state given all observations up to and including now.
  • Prediction ($n > m$): compute $\widehat{\mathbf{x}}_{n|m} = \mathbb{E}[\mathbf{x}_n \mid \mathcal{Y}_m]$, the best forecast of a future state.
  • Smoothing ($n < m$): compute $\widehat{\mathbf{x}}_{n|m}$ with $m > n$, a retrospective estimate using both past and future observations.

The Kalman filter addresses filtering and one-step prediction jointly. Smoothing requires a backward recursion and is treated in Exercise 14.

Definition:

Innovation Sequence

The innovation at time $n$ is the one-step prediction residual of the observation:
$$\mathbf{j}_n \triangleq \mathbf{y}_n - \mathbf{H}\,\widehat{\mathbf{x}}_{n|n-1} = \mathbf{y}_n - \widehat{\mathbf{y}}_{n|n-1}.$$
The innovation covariance is
$$\mathbf{S}_n \triangleq \mathbb{E}[\mathbf{j}_n \mathbf{j}_n^T] = \mathbf{H}\mathbf{P}_{n|n-1}\mathbf{H}^T + \mathbf{R}.$$

The innovations $\{\mathbf{j}_n\}$ play the role that the whitened observations played in Chapter 9: they form an uncorrelated basis for the linear span of the observations, and conditioning on $\mathcal{Y}_n$ is equivalent to conditioning on $(\mathbf{j}_1,\dots,\mathbf{j}_n)$.

Definition:

Controllability and Observability

For the time-invariant pair $(\mathbf{F},\mathbf{G}\mathbf{Q}^{1/2})$, the controllability Gramian is
$$\mathcal{C}_k = \sum_{i=0}^{k-1}\mathbf{F}^i\mathbf{G}\mathbf{Q}\mathbf{G}^T(\mathbf{F}^T)^i.$$
The pair is controllable when $\mathcal{C}_d$ has full rank $d$; the weaker notion of stabilizability, which requires only that the uncontrollable modes of $\mathbf{F}$ be stable, often suffices.

For $(\mathbf{F},\mathbf{H})$, the observability Gramian is
$$\mathcal{O}_k = \sum_{i=0}^{k-1}(\mathbf{F}^T)^i\mathbf{H}^T\mathbf{H}\mathbf{F}^i,$$
and the pair is observable when $\mathcal{O}_d$ has full rank; detectability is the analogous weaker notion, requiring only that the unobservable modes be stable.

These two conditions will reappear in Section 10.3 as the hypotheses under which the Riccati recursion has a unique stabilizing fixed point.
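The finite-horizon Gramians are straightforward to compute numerically. A possible sketch (function names are illustrative), using the constant-velocity pair from the example later in this section to check that position-only measurements still render the velocity observable through the dynamics:

```python
import numpy as np

def controllability_gramian(F, G, Q, k):
    # C_k = sum_{i=0}^{k-1} F^i G Q G^T (F^T)^i
    d = F.shape[0]
    C, Fi = np.zeros((d, d)), np.eye(d)
    for _ in range(k):
        C += Fi @ G @ Q @ G.T @ Fi.T
        Fi = F @ Fi
    return C

def observability_gramian(F, H, k):
    # O_k = sum_{i=0}^{k-1} (F^T)^i H^T H F^i
    d = F.shape[0]
    O, Fi = np.zeros((d, d)), np.eye(d)
    for _ in range(k):
        O += Fi.T @ H.T @ H @ Fi
        Fi = F @ Fi
    return O

# Constant-velocity dynamics with position-only measurements (Ts = 1):
Ts = 1.0
F = np.array([[1.0, Ts], [0.0, 1.0]])
H = np.array([[1.0, 0.0]])
O2 = observability_gramian(F, H, 2)   # O_2 = [[2, Ts], [Ts, Ts^2]], full rank
```

The full-rank check is then a single call to `np.linalg.matrix_rank` on $\mathcal{C}_d$ or $\mathcal{O}_d$.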

Definition:

Markov Property of the State

Conditional on $\mathbf{x}_n$, the future $(\mathbf{x}_{n+1}, \mathbf{x}_{n+2}, \ldots)$ is independent of the past $(\mathbf{x}_0, \ldots, \mathbf{x}_{n-1})$ and of the past observations $(\mathbf{y}_1, \ldots, \mathbf{y}_{n-1})$. Equivalently,
$$f(\mathbf{x}_{n+1} \mid \mathbf{x}_n, \mathbf{x}_{n-1}, \dots) = f(\mathbf{x}_{n+1} \mid \mathbf{x}_n).$$

This is the structural reason the Kalman filter is recursive: all the information that past observations contain about the future is compressed into the current state estimate and its covariance. Nothing else needs to be remembered.

Theorem: Propagation of State Moments

Let $\mathbf{m}_n = \mathbb{E}[\mathbf{x}_n]$ and $\mathbf{\Pi}_n = \mathrm{Cov}(\mathbf{x}_n)$ for the LGSS model of Definition 10.1. Then for all $n \geq 0$:
$$\mathbf{m}_{n+1} = \mathbf{F}\mathbf{m}_n, \qquad \mathbf{\Pi}_{n+1} = \mathbf{F}\mathbf{\Pi}_n\mathbf{F}^T + \mathbf{G}\mathbf{Q}\mathbf{G}^T.$$
Moreover, $\mathbf{x}_n$ is Gaussian for every $n$, so $\mathbf{x}_n \sim \mathcal{N}(\mathbf{m}_n, \mathbf{\Pi}_n)$ in distribution.

The mean evolves deterministically through the noise-free dynamics: zero-mean driving noise cannot shift the mean. The covariance grows by two mechanisms: the old uncertainty rotates through $\mathbf{F}$, and fresh process noise $\mathbf{G}\mathbf{Q}\mathbf{G}^T$ is injected at every step. This is the Lyapunov-type recursion that underlies Kalman prediction.
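The two recursions of the theorem can be sketched in a few lines. In a scalar random walk ($\mathbf{F} = \mathbf{G} = 1$, $\mathbf{Q} = q$) the covariance recursion reduces to $\Pi_{n+1} = \Pi_n + q$, so after $N$ steps the variance is $\Pi_0 + Nq$; the code checks exactly that (function name and values are illustrative):

```python
import numpy as np

def propagate_moments(F, G, Q, m0, Pi0, N):
    """Iterate m_{n+1} = F m_n and Pi_{n+1} = F Pi_n F^T + G Q G^T."""
    ms, Pis = [np.asarray(m0, float)], [np.asarray(Pi0, float)]
    for _ in range(N):
        ms.append(F @ ms[-1])                          # mean through noise-free dynamics
        Pis.append(F @ Pis[-1] @ F.T + G @ Q @ G.T)    # rotate old uncertainty, inject noise
    return ms, Pis

# Scalar random walk: variance grows by q = 0.5 per step.
F = np.eye(1); G = np.eye(1); Q = 0.5 * np.eye(1)
ms, Pis = propagate_moments(F, G, Q, np.zeros(1), np.eye(1), 4)
```

The linear growth of the variance here is the simplest instance of the noise-injection mechanism described above.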

Example: Constant-Velocity Target in One Dimension

A particle moves along a line. Let the state be $\mathbf{x}_n = [p_n,\, v_n]^T$, with $p_n$ the position and $v_n$ the velocity. Over a sampling interval $T_s$, the kinematic model
$$p_{n+1} = p_n + T_s v_n + \tfrac12 T_s^2 a_n, \qquad v_{n+1} = v_n + T_s a_n,$$
with random acceleration $a_n \sim \mathcal{N}(0,\sigma_a^2)$, yields the matrices of an LGSS. The sensor measures only the position, corrupted by white noise $e_n \sim \mathcal{N}(0,\sigma_v^2)$ (written $e_n$ here to avoid a clash with the velocity $v_n$). Write the LGSS matrices $(\mathbf{F},\mathbf{G},\mathbf{H},\mathbf{Q},\mathbf{R})$ and compute the two-step predicted covariance starting from $\mathbf{P}_{0|-1} = \mathbf{0}$ (perfectly known initial state).
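One way to work this example numerically (the values of $T_s$, $\sigma_a$, $\sigma_v$ below are illustrative choices, not given in the text): build the matrices and apply the covariance prediction $\mathbf{P}_{n+1} = \mathbf{F}\mathbf{P}_n\mathbf{F}^T + \mathbf{G}\mathbf{Q}\mathbf{G}^T$ twice from $\mathbf{P}_0 = \mathbf{0}$.

```python
import numpy as np

Ts, sigma_a, sigma_v = 1.0, 1.0, 0.5
F = np.array([[1.0, Ts], [0.0, 1.0]])   # kinematic transition
G = np.array([[0.5 * Ts**2], [Ts]])     # acceleration enters position and velocity
H = np.array([[1.0, 0.0]])              # position-only measurement
Q = np.array([[sigma_a**2]])
R = np.array([[sigma_v**2]])

P = np.zeros((2, 2))                    # perfectly known initial state
for _ in range(2):
    P = F @ P @ F.T + G @ Q @ G.T       # open-loop prediction, no measurements
# With Ts = sigma_a = 1 this gives P = [[2.5, 2.0], [2.0, 2.0]].
```

Note how the off-diagonal terms grow: uncertainty in the unobserved velocity couples into position uncertainty through $\mathbf{F}$.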

Realizations of a Linear Gaussian State-Space Model

Sample trajectories of the state $\mathbf{x}_n$ and observations $y_n$ for the constant-velocity model. As $\sigma_a$ increases, trajectories diverge faster; as $\sigma_v$ increases, observations become noisier around the true path.


Common Mistake: Coloured Noise Is Not Allowed (Directly)

Mistake:

Students often apply the Kalman filter with process or observation noise that is correlated across time, e.g., $\mathbf{v}_n$ generated as a filtered version of a white sequence.

Correction:

The derivation requires $\{\mathbf{w}_n\}$ and $\{\mathbf{v}_n\}$ to be white (temporally uncorrelated). If the noise is coloured, the standard recursion is no longer optimal. The standard fix is to augment the state: model the coloured noise itself as the output of a linear system driven by white noise, append its state to $\mathbf{x}_n$, and rewrite the model. The filter then runs on the augmented state. This is a routine manoeuvre, but forgetting to do it silently destroys optimality.
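A sketch of the augmentation, under the illustrative assumption that the observation noise is scalar AR(1), $v_{n+1} = a\,v_n + e_n$ with white $e_n$ (the function name and the value of $a$ are hypothetical, not from the text). Appending $v_n$ to the state gives an augmented model $\mathbf{z}_n = [\mathbf{x}_n^T, v_n]^T$ driven only by white noise:

```python
import numpy as np

def augment_ar1_obs_noise(F, G, H, a):
    """Augment a scalar-observation LGSS whose measurement noise is AR(1)."""
    d = F.shape[0]
    # Augmented transition: x evolves as before, v_n becomes a state coordinate.
    Fa = np.block([[F, np.zeros((d, 1))],
                   [np.zeros((1, d)), np.array([[a]])]])
    # Augmented noise input: process noise w_n drives x, white e_n drives v.
    Ga = np.block([[G, np.zeros((d, 1))],
                   [np.zeros((1, G.shape[1])), np.array([[1.0]])]])
    # y_n = H x_n + v_n: the noise now sits inside the augmented state,
    # so the augmented observation equation is formally noise-free (care is
    # then needed with the R > 0 assumption, e.g. a small regularizer).
    Ha = np.hstack([H, np.ones((1, 1))])
    return Fa, Ga, Ha

# Constant-velocity model with AR(1) position-measurement noise, a = 0.9:
F = np.array([[1.0, 1.0], [0.0, 1.0]])
G = np.array([[0.5], [1.0]])
H = np.array([[1.0, 0.0]])
Fa, Ga, Ha = augment_ar1_obs_noise(F, G, H, a=0.9)
```

The filter is then run on the three-dimensional augmented state rather than the original two-dimensional one.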

Block Diagram of a Linear Gaussian State-Space Model

The unit delay closes the state loop; process noise enters through $\mathbf{G}$ and observation noise is additive at the output. The Kalman filter consumes the observations $\mathbf{y}_n$ and produces estimates of $\mathbf{x}_n$.

Key Takeaway

A linear Gaussian state-space model is a Markov chain in $\mathbb{R}^d$ whose transition is a linear Gaussian kernel and whose observations are linear Gaussian functions of the state. Two recursions, one for the mean and one for the covariance, fully describe its marginal moments. The Kalman filter is what you get when you condition those moments on observations.