Nonlinear Extensions: EKF, UKF, and Particle Filters

Why Nonlinear Extensions Exist

The Kalman filter is optimal under a precise pair of assumptions: linear dynamics and Gaussian noise. Reality, of course, is rarely linear. A radar measures range and bearing, which are nonlinear functions of Cartesian position. A mobile user's path bends through buildings. A satellite's orbit satisfies Kepler's equation. In all these cases we still want a recursive estimator built on a state model, and we still want it to produce a mean and a covariance, but we have to confront nonlinearity.

Three standard approaches exist: linearize (EKF), propagate moments through the nonlinearity with a small set of deterministic samples (UKF), or drop Gaussianity entirely and approximate the posterior by weighted samples (particle filters). Each makes a different trade-off between computational cost and fidelity. A practitioner should understand all three and, more importantly, know when each one fails.

Definition:

Nonlinear State-Space Model

A nonlinear state-space model is a pair of equations
$$\mathbf{x}_{n+1} = f(\mathbf{x}_n) + \mathbf{w}_n, \qquad \mathbf{y}_n = h(\mathbf{x}_n) + \mathbf{v}_n,$$
where $f:\mathbb{R}^d\to\mathbb{R}^d$ and $h:\mathbb{R}^d\to\mathbb{R}^p$ are (possibly nonlinear) deterministic functions, and $\mathbf{w}_n\sim\mathcal{N}(\mathbf{0},\mathbf{Q})$, $\mathbf{v}_n\sim\mathcal{N}(\mathbf{0},\mathbf{R})$ are white Gaussian processes.

Additive Gaussian noise is the default simplification; fully nonlinear models in which the process noise enters through $f(\mathbf{x}_n,\mathbf{w}_n)$ are covered in the exercises.
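As a running concrete instance (a sketch; every function name and numeric value below is an illustrative assumption), take a 2-D constant-velocity target observed in range and bearing by a sensor at the origin, exactly the radar situation mentioned above:

import numpy as np

# State x = [px, py, vx, vy]. The dynamics f happen to be linear here, but
# the filters below treat both f and h as black boxes; h is nonlinear.
dt = 1.0  # sampling interval (illustrative value)

def f(x):
    """Constant-velocity motion: position advances by velocity * dt."""
    px, py, vx, vy = x
    return np.array([px + dt * vx, py + dt * vy, vx, vy])

def h(x):
    """Range and bearing of the target seen from a sensor at the origin."""
    px, py = x[0], x[1]
    return np.array([np.hypot(px, py), np.arctan2(py, px)])

Q = np.diag([1e-3, 1e-3, 1e-2, 1e-2])          # process noise (assumed values)
R = np.diag([0.05, (1.0 * np.pi / 180) ** 2])  # range var, 1-degree bearing std (assumed)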

Extended Kalman Filter (EKF)

Complexity: Same $\mathcal{O}(d^3)$ per step as the linear Kalman filter, plus one Jacobian evaluation per prediction and per update.
Input:  nonlinear functions f, h and their Jacobians Jf, Jh
        noise covariances Q, R
        initial mean m0, covariance P0
        observations y[1], ..., y[N]
Output: filtered estimates xhat[n|n], covariances P[n|n]

xhat_pred <- m0
P_pred <- P0
for n = 1 to N do
    # --- Prediction: propagate mean through f, linearize for covariance ---
    if n > 1 then
        F_n <- Jf(xhat_filt)              # Jacobian at last filtered estimate
        xhat_pred <- f(xhat_filt)
        P_pred <- F_n @ P_filt @ F_n.T + Q
    end if
    # --- Update: linearize h at prediction ---
    H_n <- Jh(xhat_pred)
    S <- H_n @ P_pred @ H_n.T + R
    K <- P_pred @ H_n.T @ inv(S)
    innov <- y[n] - h(xhat_pred)
    xhat_filt <- xhat_pred + K @ innov
    P_filt <- (I - K @ H_n) @ P_pred
    store xhat_filt, P_filt
end for

The EKF is the first thing anyone tries, and usually the first thing that works on mildly nonlinear problems. It breaks down when the Jacobian is a poor local approximation: near turns, singular geometries, or whenever the state uncertainty spans a region over which $f$ or $h$ changes significantly.
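Here is a minimal numpy transcription of the pseudocode for the range-bearing model sketched after the definition above (it reuses f, h, Q, R, dt from that sketch; the bearing-residual wrapping is a practical addition not shown in the pseudocode, and the whole thing is an illustrative sketch, not a hardened implementation):

import numpy as np

def Jf(x):
    """Jacobian of the constant-velocity dynamics: a constant matrix."""
    return np.array([[1, 0, dt, 0],
                     [0, 1, 0, dt],
                     [0, 0, 1, 0],
                     [0, 0, 0, 1]], dtype=float)

def Jh(x):
    """Jacobian of the range-bearing map at state x."""
    px, py = x[0], x[1]
    r2 = px**2 + py**2
    r = np.sqrt(r2)
    return np.array([[px / r, py / r, 0, 0],
                     [-py / r2, px / r2, 0, 0]])

def ekf(ys, m0, P0):
    """Run the EKF of the pseudocode; returns filtered means and covariances."""
    xf, Pf = m0.copy(), P0.copy()
    out = []
    for n, y in enumerate(ys):
        if n == 0:                     # first step: the prior acts as prediction
            xp, Pp = xf, Pf
        else:                          # prediction: linearize f at last filtered mean
            F = Jf(xf)
            xp = f(xf)
            Pp = F @ Pf @ F.T + Q
        H = Jh(xp)                     # update: linearize h at the prediction
        S = H @ Pp @ H.T + R
        K = Pp @ H.T @ np.linalg.inv(S)
        innov = y - h(xp)
        innov[1] = (innov[1] + np.pi) % (2 * np.pi) - np.pi  # wrap bearing residual
        xf = xp + K @ innov
        Pf = (np.eye(len(xf)) - K @ H) @ Pp
        out.append((xf.copy(), Pf.copy()))
    return out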

Where the EKF Comes From

The derivation is a first-order Taylor expansion. Replace $f(\mathbf{x}_n)$ by $f(\widehat{\mathbf{x}}_{n-1|n-1}) + \mathbf{F}_n(\mathbf{x}_n - \widehat{\mathbf{x}}_{n-1|n-1})$ with $\mathbf{F}_n = \left.\frac{\partial f}{\partial \mathbf{x}}\right|_{\widehat{\mathbf{x}}_{n-1|n-1}}$, and similarly for $h$ around $\widehat{\mathbf{x}}_{n|n-1}$. Feed the resulting (pretend-linear) model into the standard Kalman recursion. The catch: the errors are no longer Gaussian, only approximately so, and the filter is no longer MMSE-optimal. It is a useful approximation, but it is an approximation.
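Because the whole construction rests on the Jacobian being right, it is worth checking analytic Jacobians against finite differences. A quick sketch (the test point and tolerance are illustrative):

import numpy as np

def numerical_jacobian(g, x, eps=1e-6):
    """Central-difference Jacobian of g at x, one column per state component."""
    x = np.asarray(x, dtype=float)
    cols = []
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        cols.append((g(x + e) - g(x - e)) / (2 * eps))
    return np.column_stack(cols)

# Compare the analytic Jh from the EKF sketch against finite differences.
x0 = np.array([100.0, 50.0, 1.0, -0.5])
assert np.allclose(Jh(x0), numerical_jacobian(h, x0), atol=1e-5)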

Definition:

Sigma Points and the Unscented Transform

Given a Gaussian distribution $\mathcal{N}(\mathbf{m},\mathbf{P})$ on $\mathbb{R}^d$, the symmetric sigma-point set consists of the $2d+1$ points
$$\boldsymbol{\chi}_0 = \mathbf{m}, \quad \boldsymbol{\chi}_i = \mathbf{m} + \bigl(\sqrt{(d+\lambda)\mathbf{P}}\bigr)_i, \quad \boldsymbol{\chi}_{d+i} = \mathbf{m} - \bigl(\sqrt{(d+\lambda)\mathbf{P}}\bigr)_i,$$
for $i=1,\dots,d$, where $(\cdot)_i$ denotes the $i$th column of the matrix square root and $\lambda$ is a tuning parameter (commonly $\lambda = \alpha^2(d+\kappa)-d$ with $\alpha=10^{-3}$, $\kappa=0$). The associated weights are $w_0^{(m)} = \lambda/(d+\lambda)$, $w_0^{(c)} = \lambda/(d+\lambda) + (1-\alpha^2+\beta)$, and $w_i^{(m)} = w_i^{(c)} = 1/[2(d+\lambda)]$ for $i\geq 1$. The unscented transform of a function $g$ is
$$\mathbb{E}[g(\mathbf{x})] \approx \sum_{i=0}^{2d} w_i^{(m)} g(\boldsymbol{\chi}_i), \qquad \mathrm{Cov}(g(\mathbf{x})) \approx \sum_{i=0}^{2d} w_i^{(c)}\bigl(g(\boldsymbol{\chi}_i)-\bar g\bigr)\bigl(g(\boldsymbol{\chi}_i)-\bar g\bigr)^T,$$
where $\bar g$ is the weighted mean in the first sum.

The unscented transform matches the first two moments of the input Gaussian by construction and reproduces the transformed mean and covariance correctly up to second-order terms in the Taylor expansion of $f$ (third order for Gaussian inputs if the tuning parameter $\kappa$ is set accordingly). This is strictly better than the first-order linearization of the EKF at essentially the same cost.
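The definition translates directly into code. A sketch of the symmetric sigma-point set and the unscented transform, using the scaled-UT weights from the definition (function names and defaults are ours):

import numpy as np

def sigma_points(m, P, alpha=1e-3, beta=2.0, kappa=0.0):
    """Return the 2d+1 sigma points (as columns) plus mean/covariance weights."""
    d = m.size
    lam = alpha**2 * (d + kappa) - d
    L = np.linalg.cholesky((d + lam) * P)     # matrix square root, column convention
    chi = np.column_stack([m] + [m + L[:, i] for i in range(d)]
                              + [m - L[:, i] for i in range(d)])
    wm = np.full(2 * d + 1, 1.0 / (2 * (d + lam)))
    wc = wm.copy()
    wm[0] = lam / (d + lam)
    wc[0] = lam / (d + lam) + (1 - alpha**2 + beta)
    return chi, wm, wc

def unscented_transform(g, chi, wm, wc, noise_cov=None):
    """Propagate sigma points through g; return approximate mean and covariance."""
    Y = np.column_stack([g(chi[:, i]) for i in range(chi.shape[1])])
    ybar = Y @ wm
    D = Y - ybar[:, None]
    cov = D @ np.diag(wc) @ D.T
    if noise_cov is not None:
        cov = cov + noise_cov                 # additive noise enters here
    return ybar, cov, Y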

Unscented Kalman Filter (UKF)

Complexity: $\mathcal{O}(d^3)$ per step (dominated by the Cholesky factor needed for sigma-point generation). No Jacobians required, a genuine advantage when $f, h$ are available only as black boxes.
Input:  nonlinear f, h (no Jacobians needed)
        Q, R, initial m0, P0
        sigma-point weights w_m, w_c
Output: xhat[n|n], P[n|n]

m_filt, P_filt <- m0, P0
for n = 1 to N do
    # --- Generate sigma points from (m_filt, P_filt) ---
    chi <- sigma_points(m_filt, P_filt)
    # --- Prediction through f ---
    chi_pred <- f(chi)                        # per-column apply
    m_pred <- sum_i w_m[i] * chi_pred[:, i]
    P_pred <- sum_i w_c[i] * (chi_pred[:, i] - m_pred)(chi_pred[:, i] - m_pred).T + Q
    # --- Regenerate sigma points from predicted moments ---
    chi_obs <- sigma_points(m_pred, P_pred)
    Y <- h(chi_obs)
    y_pred <- sum_i w_m[i] * Y[:, i]
    # --- Innovation and cross-covariance ---
    S <- sum_i w_c[i] * (Y[:, i] - y_pred)(Y[:, i] - y_pred).T + R
    C_xy <- sum_i w_c[i] * (chi_obs[:, i] - m_pred)(Y[:, i] - y_pred).T
    # --- Kalman-style update ---
    K <- C_xy @ inv(S)
    m_filt <- m_pred + K @ (y[n] - y_pred)
    P_filt <- P_pred - K @ S @ K.T
    store m_filt, P_filt
end for

The UKF's most useful property is derivative-free operation: it treats $f$ and $h$ as opaque. This matters when they are simulation outputs, table look-ups, or legacy code without clean analytic derivatives.
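With the two helpers from the sigma-point sketch above, one predict-update cycle of the UKF pseudocode collapses to a few lines. Again a sketch under the additive-noise model of this section:

import numpy as np

def ukf_step(m, P, y, f, h, Q, R):
    """One predict-update cycle of the UKF pseudocode above."""
    # Prediction: push sigma points of the filtered density through f.
    chi, wm, wc = sigma_points(m, P)
    m_pred, P_pred, _ = unscented_transform(f, chi, wm, wc, noise_cov=Q)
    # Regenerate sigma points from the predicted moments, push through h.
    chi2, wm, wc = sigma_points(m_pred, P_pred)
    y_pred, S, Y = unscented_transform(h, chi2, wm, wc, noise_cov=R)
    # Cross-covariance between state and predicted observation.
    Dx = chi2 - m_pred[:, None]
    Dy = Y - y_pred[:, None]
    C_xy = Dx @ np.diag(wc) @ Dy.T
    # Kalman-style update.
    K = C_xy @ np.linalg.inv(S)
    m_new = m_pred + K @ (y - y_pred)
    P_new = P_pred - K @ S @ K.T
    return m_new, P_new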

When Gaussianity Breaks: Particle Filters

Both EKF and UKF assume the posterior is approximately Gaussian at every step. When the posterior is multimodal (ambiguous bearings, data association, tracking through clutter), this fails and the filter can become overconfident or pick the wrong mode. Particle filters bypass Gaussianity entirely: they represent the posterior $p(\mathbf{x}_n \mid \mathcal{Y}_n)$ by a weighted set of particles $\{(\mathbf{x}_n^{(i)}, w_n^{(i)})\}_{i=1}^N$, updated by sequential importance sampling with resampling.

The particle filter is consistent as $N\to\infty$ and handles arbitrary nonlinear, non-Gaussian models, but it suffers from degeneracy (a few particles accumulating all the weight) and the curse of dimensionality (the required $N$ grows exponentially in the effective dimension of the posterior). As a rule of thumb: use a Kalman variant if the posterior is unimodal or the state dimension is large (beyond roughly 10); reserve the particle filter for problems where the posterior is multimodal and the dimension is small.
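A minimal bootstrap (SIR) particle-filter step under this section's additive-Gaussian model; the Gaussian likelihood and the multinomial resampling used here are standard textbook choices, all names are ours, and particles would be initialized by sampling the prior with uniform weights:

import numpy as np

def log_gauss(r, R):
    """Log-density of N(0, R) evaluated at the residual r."""
    Rinv = np.linalg.inv(R)
    _, logdet = np.linalg.slogdet(R)
    return -0.5 * (r @ Rinv @ r + logdet + r.size * np.log(2 * np.pi))

def bootstrap_pf_step(particles, weights, y, f, h, Q, R, rng):
    """One SIR step: propagate, reweight by the likelihood, resample if needed."""
    N = particles.shape[1]
    # Propagate each particle (a column) through the dynamics plus process noise.
    noise = rng.multivariate_normal(np.zeros(Q.shape[0]), Q, size=N).T
    particles = np.column_stack([f(particles[:, i]) for i in range(N)]) + noise
    # Reweight by the Gaussian measurement likelihood, in log space for stability.
    logw = np.log(weights) + np.array(
        [log_gauss(y - h(particles[:, i]), R) for i in range(N)])
    logw -= logw.max()
    weights = np.exp(logw)
    weights /= weights.sum()
    # Resample (multinomial) when the effective sample size drops below N/2.
    if 1.0 / np.sum(weights**2) < N / 2:
        idx = rng.choice(N, size=N, p=weights)
        particles, weights = particles[:, idx], np.full(N, 1.0 / N)
    return particles, weights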

Linear vs. EKF vs. UKF vs. Particle Filter

Property                      | Kalman (linear)    | EKF                                          | UKF                      | Particle Filter
Dynamics model                | Linear             | Nonlinear $f, h$                             | Nonlinear $f, h$         | Arbitrary
Noise model                   | Gaussian           | Gaussian                                     | Gaussian                 | Any
Posterior representation      | Gaussian (exact)   | Gaussian (approx.)                           | Gaussian (approx.)       | Weighted particles
Derivatives needed            | No                 | Yes (Jacobians $\mathbf{F}_n, \mathbf{H}_n$) | No                       | No
Accuracy through nonlinearity | Exact              | 1st-order Taylor                             | 2nd-order (3rd if tuned) | Consistent as $N\to\infty$
Complexity per step           | $\mathcal{O}(d^3)$ | $\mathcal{O}(d^3)$                           | $\mathcal{O}(d^3)$       | $\mathcal{O}(Nd)$ with $N$ large
Handles multimodal posteriors | N/A                | No                                           | No                       | Yes
Typical dimension             | Any                | $\leq 100$                                   | $\leq 50$                | $\leq 10$

EKF vs. UKF on a Bearing-Only Tracking Problem

A target moves on a straight line while a stationary sensor measures only its bearing, a classic nonlinear estimation benchmark. The EKF linearizes the arctangent and can lose lock when the target crosses close to the sensor; the UKF propagates sigma points through the nonlinearity and remains consistent. Comparing the two filters' tracking errors on a common simulated trajectory makes the difference concrete.
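A sketch of such a simulation, with every numeric value an illustrative assumption (sensor at the origin, target crossing nearby); f and Q come from the model sketch at the start of the section:

import numpy as np

def h_bearing(x):
    """Bearing-only measurement: angle of the target from the origin."""
    return np.array([np.arctan2(x[1], x[0])])

def simulate_bearing_only(steps=100, sigma_b=0.02, seed=0):
    """Straight-line target passing near the sensor; noisy bearing measurements."""
    rng = np.random.default_rng(seed)
    x = np.array([-50.0, 10.0, 1.0, 0.0])   # passes within 10 m of the sensor (assumed)
    xs, ys = [], []
    for _ in range(steps):
        x = f(x) + rng.multivariate_normal(np.zeros(4), Q)
        xs.append(x.copy())
        ys.append(h_bearing(x) + sigma_b * rng.standard_normal(1))
    return np.array(xs), np.array(ys)

# Feed ys to both filters (with Jh replaced by the bearing-only Jacobian for
# the EKF); the EKF's linearization error peaks near the closest approach.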


Common Mistake: EKF's Silent Bias

Mistake:

Users assume the EKF inherits the unbiasedness of the linear Kalman filter. It does not: the linearization introduces a systematic bias proportional to the curvature of $f$ and $h$ at the current estimate. Over many steps, this bias accumulates and the filter becomes overconfident.

Correction:

Always monitor the normalized innovation squared $\boldsymbol{\nu}_n^T\mathbf{S}_n^{-1}\boldsymbol{\nu}_n$. If the time-averaged value systematically exceeds $p$ (the observation dimension), the filter is overconfident and the linearization is suspect. Switch to a UKF or inflate the process noise empirically.
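The monitor is a few lines. In this sketch the 1.5x margin is an illustrative choice; a rigorous threshold would come from chi-square quantiles of the NIS statistic:

import numpy as np

def average_nis(innovations, S_matrices):
    """Time-averaged normalized innovation squared: mean of nu^T S^{-1} nu."""
    vals = [nu @ np.linalg.solve(S, nu) for nu, S in zip(innovations, S_matrices)]
    return float(np.mean(vals))

# Under a consistent filter the average sits near p, the observation dimension.
# if average_nis(innovs, Ss) > 1.5 * p:      # 1.5 is an assumed margin
#     ...inflate Q, or switch to a UKF...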

Common Mistake: Particle Degeneracy

Mistake:

A practitioner runs a particle filter without resampling and after a few dozen steps one particle carries 99.9% of the weight. The Monte Carlo approximation has effectively collapsed to a single point, which is not a posterior at all.

Correction:

Resample whenever the effective sample size $N_{\text{eff}} = 1/\sum_i (w_n^{(i)})^2$ drops below a threshold (typically $N/2$). For high-dimensional problems, consider Rao-Blackwellized particle filters that marginalize analytically over the linear-Gaussian subsystem, or switch to a different nonlinear estimator altogether.
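Systematic resampling is the usual low-variance implementation of this rule (compared with the multinomial resampling used in the particle-filter sketch earlier). A minimal version:

import numpy as np

def effective_sample_size(w):
    """N_eff = 1 / sum_i w_i^2 for normalized weights w."""
    return 1.0 / np.sum(w**2)

def systematic_resample(w, rng):
    """Low-variance resampling: one uniform draw, N evenly spaced positions."""
    N = w.size
    positions = (rng.random() + np.arange(N)) / N
    return np.searchsorted(np.cumsum(w), positions)

# Usage inside the filter loop:
# if effective_sample_size(w) < N / 2:
#     idx = systematic_resample(w, rng)
#     particles, w = particles[:, idx], np.full(N, 1.0 / N)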

Quick Check

When should you prefer a UKF over an EKF?

• Whenever the state dimension exceeds 100
• When analytical Jacobians are unavailable or the nonlinearity is strong relative to the state uncertainty
• When the posterior is multimodal
• When the noise is non-Gaussian

Quick Check

What is the most common practical cause of EKF divergence?

• Numerical precision
• A poor initial state estimate combined with strong nonlinearity in $h$
• Too much measurement noise
• Using a sampling time that is too small

🔧 Engineering Note

EKF in Consumer GNSS

Every smartphone GNSS chipset runs an EKF (or a tightly-coupled EKF with inertial measurements). The state includes 3-D position, velocity, clock bias, clock drift, and often a per-satellite ambiguity state. Measurement models (pseudorange, Doppler, carrier phase) are nonlinear functions of the geometry, so linearization is essential. In urban canyons the EKF integrates with a pedestrian-dead-reckoning model (step counting + compass) as a second sensor stream, which is a textbook example of state augmentation.

Practical Constraints
• State dimension: 10-30 depending on ambiguity tracking
• Update rate: 1-10 Hz; prediction at IMU rate (100-1000 Hz)
• Must run in <5 mW on a mobile SoC

📋 Ref: 3GPP TS 37.355 (LTE positioning), IS-GPS-200

Historical Note: Apollo Navigation and the Birth of the EKF

1960s

The Apollo trajectory estimation problem is what made the EKF a standard tool. Stanley Schmidt at NASA Ames is credited with extending Kalman's linear filter to the nonlinear orbital mechanics and sensor (star tracker, sextant) models of Apollo. The Apollo Guidance Computer propagated the state vector at 1 Hz and updated on astronaut sextant marks. Margaret Hamilton's team at MIT Instrumentation Laboratory shepherded the code through five crewed lunar missions without a navigation loss. The EKF's reputation as a bulletproof engineering tool was forged in that decade, and it has largely been upheld ever since, despite the filter's well-known theoretical weaknesses.

🎓 CommIT Contribution (2020)

Channel Tracking via Kalman Filtering in Massive MIMO

M. B. Khalilsarai, Y. Song, S. Haghighatshoar, G. Caire, IEEE Trans. Wireless Commun.

The CommIT group has applied Kalman-style recursive estimation to the problem of tracking slowly-varying angular channel covariances in massive-MIMO systems. The state is a structured representation of the channel covariance matrix (low-rank plus Toeplitz), and the observations are pilot-aided channel estimates contaminated by estimation noise. The Kalman framework here is less about instantaneous tracking and more about exploiting the state-space structure of the slowly-varying parameters to reduce pilot overhead, an instance of the recurring theme: when something evolves smoothly, model it as a state and you will save resources.

Tags: massive MIMO, channel tracking, covariance estimation

Why This Matters: Mobility Tracking in 5G NR Positioning

5G NR positioning (Rel-16 and beyond) uses multilateration with reference signals transmitted from multiple gNBs, fused with IMU data. The UE-side estimator is an EKF whose state is $[x, y, z, \dot x, \dot y, \dot z, b_{\text{clock}}, \dot b]$, whose observations are downlink TDoAs and angle-of-arrival estimates, and whose dynamics are a constant-velocity (or constant-acceleration) model. The same chapter material (state augmentation for clock bias, innovation whiteness tests for integrity monitoring, steady-state Riccati for DoA sensitivity) is deployed here directly. The material of this chapter literally powers 5G location services.

Sigma point

One of $2d+1$ deterministic sample points chosen to match the first two moments of a $d$-dimensional Gaussian distribution. Propagated through a nonlinearity to approximate the transformed mean and covariance without computing Jacobians.

Related: unscented transform, Unscented Kalman Filter (UKF)

Particle filter

A sequential Monte Carlo estimator that represents the posterior by a weighted set of samples (particles) and updates them by importance sampling plus resampling. Handles arbitrary nonlinear non-Gaussian models; suffers from degeneracy in high dimensions.

Related: sequential importance sampling, effective sample size

Key Takeaway

The EKF linearizes; the UKF propagates sigma points; the particle filter drops Gaussianity. None is a drop-in substitute for the linear Kalman filter. Pick the extension whose approximation matches the actual posterior structure of your problem, and always validate with innovation whiteness tests; they work for EKF and UKF as well as for the linear filter.

Innovation

The one-step-ahead prediction residual $\boldsymbol{\nu}[n] = \mathbf{y}[n] - \mathbf{H}\hat{\mathbf{x}}[n|n-1]$. Under the linear Gaussian model, the innovation sequence is white with covariance $\mathbf{S}[n] = \mathbf{H}\mathbf{P}[n|n-1]\mathbf{H}^T + \mathbf{R}$ and carries all the new information that the observation brings beyond the prior prediction.

Observability

A property of the pair $(\mathbf{F},\mathbf{H})$ asserting that the initial state $\mathbf{x}[0]$ is recoverable from a finite observation record $\{\mathbf{y}[n]\}_{n=0}^{N-1}$ in the noise-free case. Detectability, the sufficient condition for DARE convergence, is the weaker requirement that all unstable modes be observable.

Discrete algebraic Riccati equation (DARE)

The nonlinear matrix equation
$$\bar{\mathbf{P}} = \mathbf{F}\bar{\mathbf{P}}\mathbf{F}^T - \mathbf{F}\bar{\mathbf{P}}\mathbf{H}^T\bigl(\mathbf{H}\bar{\mathbf{P}}\mathbf{H}^T+\mathbf{R}\bigr)^{-1}\mathbf{H}\bar{\mathbf{P}}\mathbf{F}^T + \mathbf{G}\mathbf{Q}\mathbf{G}^T$$
whose unique PSD solution (under detectability + stabilizability) is the steady-state prediction covariance of the Kalman filter for an LTI state-space model.