Nonlinear Extensions: EKF, UKF, and Particle Filters
Why Nonlinear Extensions Exist
The Kalman filter is optimal under a precise pair of assumptions: linear dynamics and Gaussian noise. Reality, of course, is rarely linear. A radar measures range and bearing, which are nonlinear functions of Cartesian position. A mobile user's path bends through buildings. A satellite's orbit satisfies Kepler's equation. In all these cases we still want a recursive estimator that uses a state model, and still want it to produce a mean and covariance, but we have to confront nonlinearity.
Three standard approaches exist: linearize (EKF), use deterministic samples to propagate moments exactly through the nonlinearity (UKF), or drop Gaussianity entirely and approximate the posterior by weighted samples (particle filters). Each makes a different trade-off between computational cost and fidelity. A practitioner should understand all three and, more importantly, know when each one fails.
Definition: Nonlinear State-Space Model
A nonlinear state-space model is a pair of equations $x_{k+1} = f(x_k) + w_k$, $y_k = h(x_k) + v_k$, where $f$ and $h$ are (possibly nonlinear) deterministic functions, and $w_k$, $v_k$ are white Gaussian processes.
Additive Gaussian noise is the default simplification; fully nonlinear "process noise enters through $f$" models, i.e. $x_{k+1} = f(x_k, w_k)$, are covered in the exercises.
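To make the definition concrete, here is a minimal NumPy simulation of such a model. The particular $f$, $h$, and noise levels are illustrative choices, not anything prescribed by the text:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy nonlinear state-space model with additive Gaussian noise:
#   x_{k+1} = f(x_k) + w_k,   y_k = h(x_k) + v_k
f = lambda x: np.sin(x)      # hypothetical nonlinear dynamics
h = lambda x: x ** 2         # hypothetical nonlinear observation
q_std, r_std = 0.1, 0.05     # process / measurement noise std devs

x = 0.5
xs, ys = [], []
for _ in range(100):
    x = f(x) + q_std * rng.standard_normal()   # propagate the state
    xs.append(x)
    ys.append(h(x) + r_std * rng.standard_normal())  # noisy observation
```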
Extended Kalman Filter (EKF)
Complexity: same per step as the linear Kalman filter, plus one Jacobian evaluation per prediction and per update.

The EKF is the first thing anyone tries, and usually the first thing that works on mildly nonlinear problems. It breaks down when the Jacobian is a poor local approximation: near turns, singular geometries, or whenever the state uncertainty spans a region over which $f$ or $h$ changes significantly.
Where the EKF Comes From
The derivation is a first-order Taylor expansion. Replace $f(x_k)$ by $f(\hat{x}_{k|k}) + F_k\,(x_k - \hat{x}_{k|k})$ with $F_k = \partial f / \partial x$ evaluated at $\hat{x}_{k|k}$, and similarly for $h$ around the predicted estimate $\hat{x}_{k|k-1}$. Feed the resulting (pretend-linear) model into the standard Kalman recursion. The catch: the errors are no longer Gaussian, only approximately so, and the filter is no longer MMSE-optimal. It is a useful approximation, but it is an approximation.
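As a sketch of the recursion just described, here is a minimal NumPy EKF predict-plus-update step; the range-only toy model at the bottom is purely illustrative:

```python
import numpy as np

def ekf_step(x, P, y, f, F_jac, h, H_jac, Q, R):
    """One EKF step for x_{k+1} = f(x_k) + w_k, y_k = h(x_k) + v_k."""
    # Predict: propagate the mean through f, the covariance through the Jacobian
    x_pred = f(x)
    F = F_jac(x)
    P_pred = F @ P @ F.T + Q
    # Update: linearize h at the predicted mean
    H = H_jac(x_pred)
    S = H @ P_pred @ H.T + R              # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)   # Kalman gain
    nu = y - h(x_pred)                    # innovation
    x_new = x_pred + K @ nu
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

# Hypothetical toy: 2-D static position, range-only measurement
f = lambda x: x
F_jac = lambda x: np.eye(2)
h = lambda x: np.array([np.hypot(x[0], x[1])])
H_jac = lambda x: (x / np.hypot(x[0], x[1]))[None, :]
x, P = np.array([1.0, 1.0]), np.eye(2)
Q, R = 0.01 * np.eye(2), np.array([[0.1]])
x, P = ekf_step(x, P, np.array([1.5]), f, F_jac, h, H_jac, Q, R)
```

The structure is identical to the linear filter; only the mean propagation and the two Jacobian evaluations differ.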
Definition: Sigma Points and the Unscented Transform
Given a Gaussian distribution $\mathcal{N}(m, P)$ on $\mathbb{R}^n$, the symmetric sigma-point set consists of the $2n+1$ points $\chi_0 = m$ and $\chi_{\pm i} = m \pm \big(\sqrt{(n+\lambda)\,P}\big)_i$ for $i = 1, \dots, n$, where $(\cdot)_i$ denotes the $i$th column of the matrix square root and $\lambda = \alpha^2(n+\kappa) - n$ is a tuning parameter (commonly with $\alpha = 10^{-3}$, $\kappa = 0$). The associated weights are $W_0^m = \lambda/(n+\lambda)$, $W_0^c = \lambda/(n+\lambda) + (1 - \alpha^2 + \beta)$ with $\beta = 2$ for Gaussian priors, and $W_i^m = W_i^c = 1/(2(n+\lambda))$ for $i \neq 0$. The unscented transform of a function $g$ is $\mu = \sum_i W_i^m\, g(\chi_i)$, $\Sigma = \sum_i W_i^c\, \big(g(\chi_i) - \mu\big)\big(g(\chi_i) - \mu\big)^{\top}$.
The unscented transform matches the first two moments of the input Gaussian by construction and captures the moments of the output of $g$ exactly up to second order (third order if the tuning parameters are set so). This is strictly better than the first-order linearization of the EKF at essentially the same cost.
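The transform is short to implement. A minimal NumPy sketch using the symmetric sigma-point set and weights defined above; the linear-map check at the bottom exploits the fact that the transform is exact for linear functions:

```python
import numpy as np

def unscented_transform(g, m, P, alpha=1e-3, kappa=0.0, beta=2.0):
    """Propagate N(m, P) through g via 2n+1 symmetric sigma points."""
    n = len(m)
    lam = alpha**2 * (n + kappa) - n
    L = np.linalg.cholesky((n + lam) * P)      # matrix square root
    sigmas = np.vstack([m, m + L.T, m - L.T])  # rows: the 2n+1 sigma points
    Wm = np.full(2 * n + 1, 1.0 / (2 * (n + lam)))
    Wc = Wm.copy()
    Wm[0] = lam / (n + lam)
    Wc[0] = lam / (n + lam) + (1 - alpha**2 + beta)
    Y = np.array([g(s) for s in sigmas])       # propagate each point
    mean = Wm @ Y
    diff = Y - mean
    cov = (Wc[:, None] * diff).T @ diff
    return mean, cov

# Sanity check on a linear map y = A x, where the UT should be exact
A = np.array([[2.0, 0.0], [1.0, 1.0]])
m, P = np.array([1.0, -1.0]), np.diag([0.5, 2.0])
mean, cov = unscented_transform(lambda x: A @ x, m, P)
```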
Unscented Kalman Filter (UKF)
Complexity: $O(n^3)$ per step (dominated by the Cholesky factorization needed for sigma-point generation). No Jacobians, a genuine advantage when $f$ and $h$ are available only as black boxes.

The UKF's most useful property is derivative-free operation: it treats $f$ and $h$ as opaque. This matters when they are simulation outputs, table look-ups, or legacy code without clean analytic derivatives.
When Gaussianity Breaks: Particle Filters
Both EKF and UKF assume the posterior is approximately Gaussian at every step. When the posterior is multimodal (ambiguous bearings, data association, tracking through clutter), this fails and the filter can become overconfident or pick the wrong mode. Particle filters bypass Gaussianity entirely: they represent the posterior by a weighted set of particles $\{x_k^{(i)}, w_k^{(i)}\}_{i=1}^{N}$, updated by sequential importance sampling with resampling.
The particle filter is consistent as $N \to \infty$ and handles arbitrary nonlinear / non-Gaussian models, but it suffers from degeneracy (a few particles accumulating all the weight) and the curse of dimensionality (the required $N$ grows exponentially in the effective dimension of the posterior). As a rule of thumb: use a Kalman variant if the posterior is unimodal or the dimension exceeds about 10; switch to a particle filter if the posterior is multimodal and the dimension is small.
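A minimal sequential-importance-sampling step with systematic resampling might look like the following NumPy sketch. The scalar toy model is hypothetical; note that $y_k = x_k^2$ makes the posterior bimodal, exactly the regime where Gaussian filters fail:

```python
import numpy as np

rng = np.random.default_rng(0)

def pf_step(particles, weights, y, f, h, q_std, r_std):
    """One SIR particle-filter step: propagate, reweight, maybe resample."""
    N = len(particles)
    # Propagate each particle through the dynamics plus process noise
    particles = f(particles) + q_std * rng.standard_normal(N)
    # Reweight by the Gaussian measurement likelihood p(y | x^(i))
    log_w = -0.5 * ((y - h(particles)) / r_std) ** 2
    weights = weights * np.exp(log_w - log_w.max())
    weights = weights / weights.sum()
    # Systematic resampling when N_eff = 1/sum(w^2) drops below N/2
    if 1.0 / np.sum(weights ** 2) < N / 2:
        cs = np.cumsum(weights)
        cs[-1] = 1.0  # guard against floating-point rounding
        positions = (rng.random() + np.arange(N)) / N
        idx = np.searchsorted(cs, positions)
        particles, weights = particles[idx], np.full(N, 1.0 / N)
    return particles, weights

# Hypothetical toy model: x_{k+1} = 0.9 x_k + w_k,  y_k = x_k^2 + v_k
N = 500
particles = rng.standard_normal(N)
weights = np.full(N, 1.0 / N)
for y in [1.0, 0.8, 0.9]:
    particles, weights = pf_step(particles, weights, y,
                                 lambda x: 0.9 * x, lambda x: x ** 2, 0.3, 0.2)
estimate = float(np.sum(weights * particles))
```

Because the squared observation cannot distinguish $+x$ from $-x$, the particle cloud maintains both modes, something no single Gaussian can represent.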
Linear vs. EKF vs. UKF vs. Particle Filter
| Property | Kalman (linear) | EKF | UKF | Particle Filter |
|---|---|---|---|---|
| Dynamics model | Linear | Nonlinear | Nonlinear | Arbitrary |
| Noise model | Gaussian | Gaussian | Gaussian | Any |
| Posterior representation | Gaussian (exact) | Gaussian (approx.) | Gaussian (approx.) | Weighted particles |
| Derivatives needed | No | Yes (Jacobians $F_k$, $H_k$) | No | No |
| Accuracy order through nonlinearity | Exact | 1st-order Taylor | 2nd-order (3rd if tuned) | Consistent as $N \to \infty$ |
| Complexity per step | $O(n^3)$ | $O(n^3)$ | $O(n^3)$ | $O(Nn)$ (with $N$ large) |
| Handles multimodal posteriors | N/A | No | No | Yes |
| Typical dimension | Any | Any | Any | Small ($\lesssim 10$) |
EKF vs. UKF on a Bearing-Only Tracking Problem
A target moves on a straight line while a stationary sensor measures only its bearing, a classic nonlinear estimation benchmark. The EKF linearizes the arctangent and can lose lock when the target crosses close to the sensor; the UKF propagates sigma points through the nonlinearity and remains consistent. The plot shows tracking errors for both.
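A sketch of the benchmark's measurement model and its Jacobian (assuming a state $[p_x, p_y, v_x, v_y]$ and a sensor at the origin; these conventions are illustrative) makes the failure mode visible: the Jacobian scales like $1/r^2$ and blows up as the target passes near the sensor.

```python
import numpy as np

def h_bearing(x):
    """Bearing from a sensor at the origin to position (x[0], x[1])."""
    return np.arctan2(x[1], x[0])

def H_bearing(x):
    """Jacobian of the bearing w.r.t. the state [px, py, vx, vy].
    The 1/r^2 factor grows without bound as the target nears the
    sensor, which is where the EKF linearization degrades."""
    r2 = x[0] ** 2 + x[1] ** 2
    return np.array([-x[1] / r2, x[0] / r2, 0.0, 0.0])

# Finite-difference check of the analytic Jacobian at a test point
x0 = np.array([3.0, 4.0, 1.0, 0.0])
eps = 1e-6
num = np.array([(h_bearing(x0 + eps * e) - h_bearing(x0 - eps * e)) / (2 * eps)
                for e in np.eye(4)])
```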
Common Mistake: EKF's Silent Bias
Mistake:
Users assume the EKF inherits the unbiasedness of the linear Kalman filter. It does not: the linearization introduces a systematic bias proportional to the curvature of $f$ and $h$ at the current estimate. Over many steps, this bias accumulates and the filter becomes overconfident.
Correction:
Always monitor the normalized innovation squared $\epsilon_k = \nu_k^{\top} S_k^{-1} \nu_k$. If its time-averaged value systematically exceeds $m$ (the observation dimension), the filter is overconfident and the linearization is suspect. Switch to a UKF or inflate the process noise empirically.
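A minimal NumPy sketch of this monitoring idea; the covariance $S$ and sample count are arbitrary test values. Innovations drawn from the filter's own claimed covariance should average close to $m$; a consistently higher average signals overconfidence:

```python
import numpy as np

def nis(nu, S):
    """Normalized innovation squared: eps_k = nu_k^T S_k^{-1} nu_k."""
    return float(nu @ np.linalg.solve(S, nu))

# Under a consistent filter, eps_k is chi-square with m degrees of
# freedom, so its time average should hover near m (here m = 2).
rng = np.random.default_rng(1)
S = np.array([[2.0, 0.3], [0.3, 1.0]])
L = np.linalg.cholesky(S)  # draw innovations with covariance exactly S
avg = np.mean([nis(L @ rng.standard_normal(2), S) for _ in range(2000)])
```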
Common Mistake: Particle Degeneracy
Mistake:
A practitioner runs a particle filter without resampling and after a few dozen steps one particle carries 99.9% of the weight. The Monte Carlo approximation has effectively collapsed to a single point, which is not a posterior at all.
Correction:
Resample whenever the effective sample size $N_{\text{eff}} = 1 / \sum_i \big(w_k^{(i)}\big)^2$ drops below a threshold (typically $N/2$). For high-dimensional problems, consider Rao-Blackwellized particle filters that marginalize analytically over the linear-Gaussian subsystem, or switch to a different nonlinear estimator altogether.
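The effective-sample-size computation is one line; a quick NumPy illustration of the healthy and degenerate extremes (the weight vectors are contrived for the demonstration):

```python
import numpy as np

def effective_sample_size(weights):
    """N_eff = 1 / sum_i (w^(i))^2 for normalized weights."""
    w = np.asarray(weights)
    return 1.0 / np.sum(w ** 2)

N = 1000
uniform = np.full(N, 1.0 / N)              # healthy: N_eff equals N
degenerate = np.full(N, 0.001 / (N - 1))   # collapsed: one particle
degenerate[0] = 0.999                      # carries 99.9% of the weight
```

With uniform weights `effective_sample_size` returns $N$; with one dominant particle it returns a value near 1, triggering the $N/2$ resampling rule immediately.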
Quick Check
When should you prefer a UKF over an EKF?
Whenever the state dimension exceeds 100
When analytical Jacobians are unavailable or the nonlinearity is strong relative to the state uncertainty
When the posterior is multimodal
When the noise is non-Gaussian
Correct. The UKF is derivative-free and captures second-order moments through the nonlinearity, so it outperforms the EKF when Jacobians are hard to get or the linearization error is significant.
Quick Check
What is the most common practical cause of EKF divergence?
Numerical precision
A poor initial state estimate combined with strong nonlinearity in $f$ or $h$
Too much measurement noise
Using a sampling time that is too small
Correct. If the initial estimate $\hat{x}_{0|0}$ is far from the truth and $f$ or $h$ is strongly nonlinear, the Jacobian is evaluated at the wrong point and the filter can update toward a wrong basin. This is the canonical EKF failure mode.
EKF in Consumer GNSS
Every smartphone GNSS chipset runs an EKF (or a tightly-coupled EKF with inertial measurements). The state includes 3-D position, velocity, clock bias, clock drift, and often a per-satellite ambiguity state. Measurement models (pseudorange, Doppler, carrier phase) are nonlinear functions of the geometry, so linearization is essential. In urban canyons the EKF integrates with a pedestrian-dead-reckoning model (step counting + compass) as a second sensor stream, which is a textbook example of state augmentation.
- State dimension: 10-30 depending on ambiguity tracking
- Update rate: 1-10 Hz; prediction at IMU rate (100-1000 Hz)
- Must run in <5 mW on a mobile SoC
Historical Note: Apollo Navigation and the Birth of the EKF
1960s. The Apollo trajectory estimation problem is what made the EKF a standard tool. Stanley Schmidt at NASA Ames is credited with extending Kalman's linear filter to the nonlinear orbital mechanics and sensor (star tracker, sextant) models of Apollo. The Apollo Guidance Computer propagated the state vector at 1 Hz and updated on astronaut sextant marks. Margaret Hamilton's team at the MIT Instrumentation Laboratory shepherded the code through five crewed lunar missions without a navigation loss. The EKF's reputation as a bulletproof engineering tool was forged in that decade, and it has largely been upheld ever since, despite its well-known theoretical weaknesses.
Channel Tracking via Kalman Filtering in Massive MIMO
The CommIT group has applied Kalman-style recursive estimation to the problem of tracking slowly-varying angular channel covariances in massive-MIMO systems. The state is a structured representation of the channel covariance matrix (low-rank plus Toeplitz), and the observations are pilot-aided channel estimates contaminated by estimation noise. The Kalman framework here is less about instantaneous tracking and more about exploiting the state-space structure of the slowly-varying parameters to reduce pilot overhead, an instance of the recurring theme: when something evolves smoothly, model it as a state and you will save resources.
Why This Matters: Mobility Tracking in 5G NR Positioning
5G NR positioning (Rel-16 and beyond) uses multilateration with reference signals transmitted from multiple gNBs, fused with IMU data. The UE-side estimator is an EKF whose state collects position, velocity, and clock-bias terms; observations are downlink TDoAs and angle-of-arrival estimates, and the dynamics are a constant-velocity (or constant-acceleration) model. The same chapter material (state augmentation for clock bias, innovation whiteness tests for integrity monitoring, steady-state Riccati for DoA sensitivity) is deployed here directly. The material of this chapter literally powers 5G location services.
Sigma point
One of $2n+1$ deterministic sample points chosen to match the first two moments of an $n$-dimensional Gaussian distribution. Propagated through a nonlinearity to approximate the transformed mean and covariance without computing Jacobians.
Particle filter
A sequential Monte Carlo estimator that represents the posterior by a weighted set of samples (particles) and updates them by importance sampling plus resampling. Handles arbitrary nonlinear non-Gaussian models; suffers from degeneracy in high dimensions.
Related: sequential importance sampling, effective sample size
Key Takeaway
The EKF linearizes; the UKF propagates sigma points; the particle filter drops Gaussianity. None is a drop-in substitute for the linear Kalman filter. Pick the extension whose approximation matches the actual posterior structure of your problem, and always validate with innovation whiteness tests β they work for EKF and UKF as well as for the linear filter.
Innovation
The one-step-ahead prediction residual $\nu_k = y_k - C\hat{x}_{k|k-1}$. Under the linear Gaussian model, the innovation sequence is white with covariance $S_k = C P_{k|k-1} C^{\top} + R$ and carries all the new information that the observation brings beyond the prior prediction.
Observability
A property of the pair $(A, C)$ asserting that the initial state is recoverable from a finite observation record in the noise-free case. Detectability, the sufficient condition for DARE convergence, is the weaker requirement that every unstable mode be observable.
Discrete algebraic Riccati equation (DARE)
The nonlinear matrix equation $P = A P A^{\top} + Q - A P C^{\top} (C P C^{\top} + R)^{-1} C P A^{\top}$ whose unique PSD solution (under detectability + stabilizability) is the steady-state prediction covariance of the Kalman filter for an LTI state-space model.