Nonlinear Extensions: EKF, UKF, and Particle Filters

Why Nonlinear Extensions Exist

The Kalman filter is optimal under a precise pair of assumptions: linear dynamics and Gaussian noise. Reality, of course, is rarely linear. A radar measures range and bearing, which are nonlinear functions of Cartesian position. A mobile user's path bends through buildings. A satellite's orbit satisfies Kepler's equation. In all these cases we still want a recursive estimator built on a state model, and we still want it to produce a mean and a covariance, but we have to confront nonlinearity.

Three standard approaches exist: linearize (EKF), propagate moments through the nonlinearity with a small set of deterministic samples (UKF), or drop Gaussianity entirely and approximate the posterior by weighted samples (particle filters). Each makes a different trade-off between computational cost and fidelity. A practitioner should understand all three and, more importantly, know when each one fails.

Definition:

Nonlinear State-Space Model

A nonlinear state-space model is a pair of equations
$$\mathbf{x}_{n+1} = f(\mathbf{x}_n) + \mathbf{w}_n, \qquad \mathbf{y}_n = h(\mathbf{x}_n) + \mathbf{v}_n,$$
where $f:\mathbb{R}^d\to\mathbb{R}^d$ and $h:\mathbb{R}^d\to\mathbb{R}^p$ are (possibly nonlinear) deterministic functions, and $\mathbf{w}_n\sim\mathcal{N}(\mathbf{0},\mathbf{Q})$, $\mathbf{v}_n\sim\mathcal{N}(\mathbf{0},\mathbf{R})$ are white Gaussian processes.

Additive Gaussian noise is the default simplification; fully nonlinear models in which the process noise enters through $f(\mathbf{x}_n,\mathbf{w}_n)$ are covered in the exercises.
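As a running concrete instance (a sketch; every function name and numeric value below is an illustrative assumption), take a 2-D constant-velocity target observed in range and bearing by a sensor at the origin, exactly the radar situation mentioned above:

import numpy as np

# State x = [px, py, vx, vy]. The dynamics f happen to be linear here, but
# the filters below treat both f and h as black boxes; h is nonlinear.
dt = 1.0  # sampling interval (illustrative value)

def f(x):
    """Constant-velocity motion: position advances by velocity * dt."""
    px, py, vx, vy = x
    return np.array([px + dt * vx, py + dt * vy, vx, vy])

def h(x):
    """Range and bearing of the target seen from a sensor at the origin."""
    px, py = x[0], x[1]
    return np.array([np.hypot(px, py), np.arctan2(py, px)])

Q = np.diag([1e-3, 1e-3, 1e-2, 1e-2])          # process noise (assumed values)
R = np.diag([0.05, (1.0 * np.pi / 180) ** 2])  # range var, 1-degree bearing std (assumed)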

Extended Kalman Filter (EKF)

Complexity: Same $\mathcal{O}(d^3)$ per step as the linear Kalman filter, plus one Jacobian evaluation per prediction and per update.
Input:  nonlinear functions f, h and their Jacobians Jf, Jh
        noise covariances Q, R
        initial mean m0, covariance P0
        observations y[1], ..., y[N]
Output: filtered estimates xhat[n|n], covariances P[n|n]

xhat_pred <- m0
P_pred <- P0
for n = 1 to N do
    # --- Prediction: propagate mean through f, linearize for covariance ---
    if n > 1 then
        F_n <- Jf(xhat_filt)              # Jacobian at last filtered estimate
        xhat_pred <- f(xhat_filt)
        P_pred <- F_n @ P_filt @ F_n.T + Q
    end if
    # --- Update: linearize h at prediction ---
    H_n <- Jh(xhat_pred)
    S <- H_n @ P_pred @ H_n.T + R
    K <- P_pred @ H_n.T @ inv(S)
    innov <- y[n] - h(xhat_pred)
    xhat_filt <- xhat_pred + K @ innov
    P_filt <- (I - K @ H_n) @ P_pred
    store xhat_filt, P_filt
end for

The EKF is the first thing anyone tries, and usually the first thing that works on mildly nonlinear problems. It breaks down when the Jacobian is a poor local approximation: near turns, singular geometries, or whenever the state uncertainty spans a region over which $f$ or $h$ changes significantly.
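Here is a minimal numpy transcription of the pseudocode for the range-bearing model sketched after the definition above (it reuses f, h, Q, R, dt from that sketch; the bearing-residual wrapping is a practical addition not shown in the pseudocode, and the whole thing is an illustrative sketch, not a hardened implementation):

import numpy as np

def Jf(x):
    """Jacobian of the constant-velocity dynamics: a constant matrix."""
    return np.array([[1, 0, dt, 0],
                     [0, 1, 0, dt],
                     [0, 0, 1, 0],
                     [0, 0, 0, 1]], dtype=float)

def Jh(x):
    """Jacobian of the range-bearing map at state x."""
    px, py = x[0], x[1]
    r2 = px**2 + py**2
    r = np.sqrt(r2)
    return np.array([[px / r, py / r, 0, 0],
                     [-py / r2, px / r2, 0, 0]])

def ekf(ys, m0, P0):
    """Run the EKF of the pseudocode; returns filtered means and covariances."""
    xf, Pf = m0.copy(), P0.copy()
    out = []
    for n, y in enumerate(ys):
        if n == 0:                     # first step: the prior acts as prediction
            xp, Pp = xf, Pf
        else:                          # prediction: linearize f at last filtered mean
            F = Jf(xf)
            xp = f(xf)
            Pp = F @ Pf @ F.T + Q
        H = Jh(xp)                     # update: linearize h at the prediction
        S = H @ Pp @ H.T + R
        K = Pp @ H.T @ np.linalg.inv(S)
        innov = y - h(xp)
        innov[1] = (innov[1] + np.pi) % (2 * np.pi) - np.pi  # wrap bearing residual
        xf = xp + K @ innov
        Pf = (np.eye(len(xf)) - K @ H) @ Pp
        out.append((xf.copy(), Pf.copy()))
    return out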

Where the EKF Comes From

The derivation is a first-order Taylor expansion. Replace $f(\mathbf{x}_n)$ by $f(\widehat{\mathbf{x}}_{n-1|n-1}) + \mathbf{F}_n(\mathbf{x}_n - \widehat{\mathbf{x}}_{n-1|n-1})$ with $\mathbf{F}_n = \left.\frac{\partial f}{\partial \mathbf{x}}\right|_{\widehat{\mathbf{x}}_{n-1|n-1}}$, and similarly for $h$ around $\widehat{\mathbf{x}}_{n|n-1}$. Feed the resulting (pretend-linear) model into the standard Kalman recursion. The catch: the errors are no longer Gaussian, only approximately so, and the filter is no longer MMSE-optimal. It is a useful approximation, but it is an approximation.
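Because the whole construction rests on the Jacobian being right, it is worth checking analytic Jacobians against finite differences. A quick sketch (the test point and tolerance are illustrative):

import numpy as np

def numerical_jacobian(g, x, eps=1e-6):
    """Central-difference Jacobian of g at x, one column per state component."""
    x = np.asarray(x, dtype=float)
    cols = []
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        cols.append((g(x + e) - g(x - e)) / (2 * eps))
    return np.column_stack(cols)

# Compare the analytic Jh from the EKF sketch against finite differences.
x0 = np.array([100.0, 50.0, 1.0, -0.5])
assert np.allclose(Jh(x0), numerical_jacobian(h, x0), atol=1e-5)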

Definition:

Sigma Points and the Unscented Transform

Given a Gaussian distribution $\mathcal{N}(\mathbf{m},\mathbf{P})$ on $\mathbb{R}^d$, the symmetric sigma-point set consists of the $2d+1$ points
$$\boldsymbol{\chi}_0 = \mathbf{m}, \quad \boldsymbol{\chi}_i = \mathbf{m} + \bigl(\sqrt{(d+\lambda)\mathbf{P}}\bigr)_i, \quad \boldsymbol{\chi}_{d+i} = \mathbf{m} - \bigl(\sqrt{(d+\lambda)\mathbf{P}}\bigr)_i,$$
for $i=1,\dots,d$, where $(\cdot)_i$ denotes the $i$th column of the matrix square root and $\lambda$ is a tuning parameter (commonly $\lambda = \alpha^2(d+\kappa)-d$ with $\alpha=10^{-3}$, $\kappa=0$). The associated weights are $w_0^{(m)} = \lambda/(d+\lambda)$, $w_0^{(c)} = \lambda/(d+\lambda) + (1-\alpha^2+\beta)$, and $w_i^{(m)} = w_i^{(c)} = 1/[2(d+\lambda)]$ for $i\geq 1$. The unscented transform of a function $g$ is
$$\mathbb{E}[g(\mathbf{x})] \approx \sum_{i=0}^{2d} w_i^{(m)} g(\boldsymbol{\chi}_i), \qquad \mathrm{Cov}(g(\mathbf{x})) \approx \sum_{i=0}^{2d} w_i^{(c)}\bigl(g(\boldsymbol{\chi}_i)-\bar g\bigr)\bigl(g(\boldsymbol{\chi}_i)-\bar g\bigr)^T,$$
where $\bar g$ is the weighted mean in the first sum.

The unscented transform matches the first two moments of the input Gaussian by construction and reproduces the transformed mean and covariance correctly up to second-order terms in the Taylor expansion of $f$ (third order for Gaussian inputs if the tuning parameter $\kappa$ is set accordingly). This is strictly better than the first-order linearization of the EKF at essentially the same cost.
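The definition translates directly into code. A sketch of the symmetric sigma-point set and the unscented transform, using the scaled-UT weights from the definition (function names and defaults are ours):

import numpy as np

def sigma_points(m, P, alpha=1e-3, beta=2.0, kappa=0.0):
    """Return the 2d+1 sigma points (as columns) plus mean/covariance weights."""
    d = m.size
    lam = alpha**2 * (d + kappa) - d
    L = np.linalg.cholesky((d + lam) * P)     # matrix square root, column convention
    chi = np.column_stack([m] + [m + L[:, i] for i in range(d)]
                              + [m - L[:, i] for i in range(d)])
    wm = np.full(2 * d + 1, 1.0 / (2 * (d + lam)))
    wc = wm.copy()
    wm[0] = lam / (d + lam)
    wc[0] = lam / (d + lam) + (1 - alpha**2 + beta)
    return chi, wm, wc

def unscented_transform(g, chi, wm, wc, noise_cov=None):
    """Propagate sigma points through g; return approximate mean and covariance."""
    Y = np.column_stack([g(chi[:, i]) for i in range(chi.shape[1])])
    ybar = Y @ wm
    D = Y - ybar[:, None]
    cov = D @ np.diag(wc) @ D.T
    if noise_cov is not None:
        cov = cov + noise_cov                 # additive noise enters here
    return ybar, cov, Y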

Unscented Kalman Filter (UKF)

Complexity: $\mathcal{O}(d^3)$ per step (dominated by the Cholesky factor needed for sigma-point generation). No Jacobians required, a genuine advantage when $f, h$ are available only as black boxes.
Input:  nonlinear f, h (no Jacobians needed)
        Q, R, initial m0, P0
        sigma-point weights w_m, w_c
Output: xhat[n|n], P[n|n]

m_filt, P_filt <- m0, P0
for n = 1 to N do
    # --- Generate sigma points from (m_filt, P_filt) ---
    chi <- sigma_points(m_filt, P_filt)
    # --- Prediction through f ---
    chi_pred <- f(chi)                        # per-column apply
    m_pred <- sum_i w_m[i] * chi_pred[:, i]
    P_pred <- sum_i w_c[i] * (chi_pred[:, i] - m_pred)(chi_pred[:, i] - m_pred).T + Q
    # --- Regenerate sigma points from predicted moments ---
    chi_obs <- sigma_points(m_pred, P_pred)
    Y <- h(chi_obs)
    y_pred <- sum_i w_m[i] * Y[:, i]
    # --- Innovation and cross-covariance ---
    S <- sum_i w_c[i] * (Y[:, i] - y_pred)(Y[:, i] - y_pred).T + R
    C_xy <- sum_i w_c[i] * (chi_obs[:, i] - m_pred)(Y[:, i] - y_pred).T
    # --- Kalman-style update ---
    K <- C_xy @ inv(S)
    m_filt <- m_pred + K @ (y[n] - y_pred)
    P_filt <- P_pred - K @ S @ K.T
    store m_filt, P_filt
end for

The UKF's most useful property is derivative-free operation: it treats $f$ and $h$ as opaque. This matters when they are simulation outputs, table look-ups, or legacy code without clean analytic derivatives.
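With the two helpers from the sigma-point sketch above, one predict-update cycle of the UKF pseudocode collapses to a few lines. Again a sketch under the additive-noise model of this section:

import numpy as np

def ukf_step(m, P, y, f, h, Q, R):
    """One predict-update cycle of the UKF pseudocode above."""
    # Prediction: push sigma points of the filtered density through f.
    chi, wm, wc = sigma_points(m, P)
    m_pred, P_pred, _ = unscented_transform(f, chi, wm, wc, noise_cov=Q)
    # Regenerate sigma points from the predicted moments, push through h.
    chi2, wm, wc = sigma_points(m_pred, P_pred)
    y_pred, S, Y = unscented_transform(h, chi2, wm, wc, noise_cov=R)
    # Cross-covariance between state and predicted observation.
    Dx = chi2 - m_pred[:, None]
    Dy = Y - y_pred[:, None]
    C_xy = Dx @ np.diag(wc) @ Dy.T
    # Kalman-style update.
    K = C_xy @ np.linalg.inv(S)
    m_new = m_pred + K @ (y - y_pred)
    P_new = P_pred - K @ S @ K.T
    return m_new, P_new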

When Gaussianity Breaks: Particle Filters

Both EKF and UKF assume the posterior is approximately Gaussian at every step. When the posterior is multimodal (ambiguous bearings, data association, tracking through clutter), this fails and the filter can become overconfident or pick the wrong mode. Particle filters bypass Gaussianity entirely: they represent the posterior $p(\mathbf{x}_n \mid \mathcal{Y}_n)$ by a weighted set of particles $\{(\mathbf{x}_n^{(i)}, w_n^{(i)})\}_{i=1}^N$, updated by sequential importance sampling with resampling.

The particle filter is consistent as $N\to\infty$ and handles arbitrary nonlinear, non-Gaussian models, but it suffers from degeneracy (a few particles accumulating all the weight) and the curse of dimensionality (the required $N$ grows exponentially in the effective dimension of the posterior). As a rule of thumb: use a Kalman variant if the posterior is unimodal or the state dimension is large (beyond roughly 10); reserve the particle filter for problems where the posterior is multimodal and the dimension is small.
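A minimal bootstrap (SIR) particle-filter step under this section's additive-Gaussian model; the Gaussian likelihood and the multinomial resampling used here are standard textbook choices, all names are ours, and particles would be initialized by sampling the prior with uniform weights:

import numpy as np

def log_gauss(r, R):
    """Log-density of N(0, R) evaluated at the residual r."""
    Rinv = np.linalg.inv(R)
    _, logdet = np.linalg.slogdet(R)
    return -0.5 * (r @ Rinv @ r + logdet + r.size * np.log(2 * np.pi))

def bootstrap_pf_step(particles, weights, y, f, h, Q, R, rng):
    """One SIR step: propagate, reweight by the likelihood, resample if needed."""
    N = particles.shape[1]
    # Propagate each particle (a column) through the dynamics plus process noise.
    noise = rng.multivariate_normal(np.zeros(Q.shape[0]), Q, size=N).T
    particles = np.column_stack([f(particles[:, i]) for i in range(N)]) + noise
    # Reweight by the Gaussian measurement likelihood, in log space for stability.
    logw = np.log(weights) + np.array(
        [log_gauss(y - h(particles[:, i]), R) for i in range(N)])
    logw -= logw.max()
    weights = np.exp(logw)
    weights /= weights.sum()
    # Resample (multinomial) when the effective sample size drops below N/2.
    if 1.0 / np.sum(weights**2) < N / 2:
        idx = rng.choice(N, size=N, p=weights)
        particles, weights = particles[:, idx], np.full(N, 1.0 / N)
    return particles, weights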

Linear vs. EKF vs. UKF vs. Particle Filter

Property                      | Kalman (linear)    | EKF                                          | UKF                      | Particle Filter
Dynamics model                | Linear             | Nonlinear $f, h$                             | Nonlinear $f, h$         | Arbitrary
Noise model                   | Gaussian           | Gaussian                                     | Gaussian                 | Any
Posterior representation      | Gaussian (exact)   | Gaussian (approx.)                           | Gaussian (approx.)       | Weighted particles
Derivatives needed            | No                 | Yes (Jacobians $\mathbf{F}_n, \mathbf{H}_n$) | No                       | No
Accuracy through nonlinearity | Exact              | 1st-order Taylor                             | 2nd-order (3rd if tuned) | Consistent as $N\to\infty$
Complexity per step           | $\mathcal{O}(d^3)$ | $\mathcal{O}(d^3)$                           | $\mathcal{O}(d^3)$       | $\mathcal{O}(Nd)$ with $N$ large
Handles multimodal posteriors | N/A                | No                                           | No                       | Yes
Typical dimension             | Any                | $\leq 100$                                   | $\leq 50$                | $\leq 10$

EKF vs. UKF on a Bearing-Only Tracking Problem

A target moves on a straight line while a stationary sensor measures only its bearing, a classic nonlinear estimation benchmark. The EKF linearizes the arctangent and can lose lock when the target crosses close to the sensor; the UKF propagates sigma points through the nonlinearity and remains consistent. Comparing the two filters' tracking errors on a common simulated trajectory makes the difference concrete.
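A sketch of such a simulation, with every numeric value an illustrative assumption (sensor at the origin, target crossing nearby); f and Q come from the model sketch at the start of the section:

import numpy as np

def h_bearing(x):
    """Bearing-only measurement: angle of the target from the origin."""
    return np.array([np.arctan2(x[1], x[0])])

def simulate_bearing_only(steps=100, sigma_b=0.02, seed=0):
    """Straight-line target passing near the sensor; noisy bearing measurements."""
    rng = np.random.default_rng(seed)
    x = np.array([-50.0, 10.0, 1.0, 0.0])   # passes within 10 m of the sensor (assumed)
    xs, ys = [], []
    for _ in range(steps):
        x = f(x) + rng.multivariate_normal(np.zeros(4), Q)
        xs.append(x.copy())
        ys.append(h_bearing(x) + sigma_b * rng.standard_normal(1))
    return np.array(xs), np.array(ys)

# Feed ys to both filters (with Jh replaced by the bearing-only Jacobian for
# the EKF); the EKF's linearization error peaks near the closest approach.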


Common Mistake: EKF's Silent Bias

Mistake:

Users assume the EKF inherits the unbiasedness of the linear Kalman filter. It does not: the linearization introduces a systematic bias proportional to the curvature of $f$ and $h$ at the current estimate. Over many steps, this bias accumulates and the filter becomes overconfident.

Correction:

Always monitor the normalized innovation squared $\boldsymbol{\nu}_n^T\mathbf{S}_n^{-1}\boldsymbol{\nu}_n$. If the time-averaged value systematically exceeds $p$ (the observation dimension), the filter is overconfident and the linearization is suspect. Switch to a UKF or inflate the process noise empirically.
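The monitor is a few lines. In this sketch the 1.5x margin is an illustrative choice; a rigorous threshold would come from chi-square quantiles of the NIS statistic:

import numpy as np

def average_nis(innovations, S_matrices):
    """Time-averaged normalized innovation squared: mean of nu^T S^{-1} nu."""
    vals = [nu @ np.linalg.solve(S, nu) for nu, S in zip(innovations, S_matrices)]
    return float(np.mean(vals))

# Under a consistent filter the average sits near p, the observation dimension.
# if average_nis(innovs, Ss) > 1.5 * p:      # 1.5 is an assumed margin
#     ...inflate Q, or switch to a UKF...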

Common Mistake: Particle Degeneracy

Mistake:

A practitioner runs a particle filter without resampling and after a few dozen steps one particle carries 99.9% of the weight. The Monte Carlo approximation has effectively collapsed to a single point, which is not a posterior at all.

Correction:

Resample whenever the effective sample size $N_{\text{eff}} = 1/\sum_i (w_n^{(i)})^2$ drops below a threshold (typically $N/2$). For high-dimensional problems, consider Rao-Blackwellized particle filters that marginalize analytically over the linear-Gaussian subsystem, or switch to a different nonlinear estimator altogether.
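Systematic resampling is the usual low-variance implementation of this rule (compared with the multinomial resampling used in the particle-filter sketch earlier). A minimal version:

import numpy as np

def effective_sample_size(w):
    """N_eff = 1 / sum_i w_i^2 for normalized weights w."""
    return 1.0 / np.sum(w**2)

def systematic_resample(w, rng):
    """Low-variance resampling: one uniform draw, N evenly spaced positions."""
    N = w.size
    positions = (rng.random() + np.arange(N)) / N
    return np.searchsorted(np.cumsum(w), positions)

# Usage inside the filter loop:
# if effective_sample_size(w) < N / 2:
#     idx = systematic_resample(w, rng)
#     particles, w = particles[:, idx], np.full(N, 1.0 / N)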

Quick Check

When should you prefer a UKF over an EKF?

• Whenever the state dimension exceeds 100
• When analytical Jacobians are unavailable or the nonlinearity is strong relative to the state uncertainty
• When the posterior is multimodal
• When the noise is non-Gaussian

Quick Check

What is the most common practical cause of EKF divergence?

• Numerical precision
• A poor initial state estimate combined with strong nonlinearity in $h$
• Too much measurement noise
• Using a sampling time that is too small

🔧 Engineering Note

EKF in Consumer GNSS

Every smartphone GNSS chipset runs an EKF (or a tightly-coupled EKF with inertial measurements). The state includes 3-D position, velocity, clock bias, clock drift, and often a per-satellite ambiguity state. Measurement models (pseudorange, Doppler, carrier phase) are nonlinear functions of the geometry, so linearization is essential. In urban canyons the EKF integrates with a pedestrian-dead-reckoning model (step counting + compass) as a second sensor stream, which is a textbook example of state augmentation.

Practical Constraints
• State dimension: 10-30 depending on ambiguity tracking
• Update rate: 1-10 Hz; prediction at IMU rate (100-1000 Hz)
• Must run in <5 mW on a mobile SoC

📋 Ref: 3GPP TS 37.355 (LTE positioning), IS-GPS-200

Historical Note: Apollo Navigation and the Birth of the EKF

1960s

The Apollo trajectory estimation problem is what made the EKF a standard tool. Stanley Schmidt at NASA Ames is credited with extending Kalman's linear filter to the nonlinear orbital mechanics and sensor (star tracker, sextant) models of Apollo. The Apollo Guidance Computer propagated the state vector at 1 Hz and updated on astronaut sextant marks. Margaret Hamilton's team at MIT Instrumentation Laboratory shepherded the code through five crewed lunar missions without a navigation loss. The EKF's reputation as a bulletproof engineering tool was forged in that decade, and it has largely been upheld ever since, despite the filter's well-known theoretical weaknesses.

🎓 CommIT Contribution (2020)

Channel Tracking via Kalman Filtering in Massive MIMO

M. B. Khalilsarai, Y. Song, S. Haghighatshoar, G. Caire, IEEE Trans. Wireless Commun.

The CommIT group has applied Kalman-style recursive estimation to the problem of tracking slowly-varying angular channel covariances in massive-MIMO systems. The state is a structured representation of the channel covariance matrix (low-rank plus Toeplitz), and the observations are pilot-aided channel estimates contaminated by estimation noise. The Kalman framework here is less about instantaneous tracking and more about exploiting the state-space structure of the slowly-varying parameters to reduce pilot overhead, an instance of the recurring theme: when something evolves smoothly, model it as a state and you will save resources.

Tags: massive MIMO, channel tracking, covariance estimation

Why This Matters: Mobility Tracking in 5G NR Positioning

5G NR positioning (Rel-16 and beyond) uses multilateration with reference signals transmitted from multiple gNBs, fused with IMU data. The UE-side estimator is an EKF whose state is $[x, y, z, \dot x, \dot y, \dot z, b_{\text{clock}}, \dot b]$, whose observations are downlink TDoAs and angle-of-arrival estimates, and whose dynamics are a constant-velocity (or constant-acceleration) model. The same chapter material (state augmentation for clock bias, innovation whiteness tests for integrity monitoring, steady-state Riccati for DoA sensitivity) is deployed here directly. The material of this chapter literally powers 5G location services.

Sigma point

One of $2d+1$ deterministic sample points chosen to match the first two moments of a $d$-dimensional Gaussian distribution. Propagated through a nonlinearity to approximate the transformed mean and covariance without computing Jacobians.

Related: unscented transform, Unscented Kalman Filter (UKF)

Particle filter

A sequential Monte Carlo estimator that represents the posterior by a weighted set of samples (particles) and updates them by importance sampling plus resampling. Handles arbitrary nonlinear non-Gaussian models; suffers from degeneracy in high dimensions.

Related: sequential importance sampling, effective sample size

Key Takeaway

The EKF linearizes; the UKF propagates sigma points; the particle filter drops Gaussianity. None is a drop-in substitute for the linear Kalman filter. Pick the extension whose approximation matches the actual posterior structure of your problem, and always validate with innovation whiteness tests; they work for EKF and UKF as well as for the linear filter.

Innovation

The one-step-ahead prediction residual $\boldsymbol{\nu}[n] = \mathbf{y}[n] - \mathbf{H}\hat{\mathbf{x}}[n|n-1]$. Under the linear Gaussian model, the innovation sequence is white with covariance $\mathbf{S}[n] = \mathbf{H}\mathbf{P}[n|n-1]\mathbf{H}^T + \mathbf{R}$ and carries all the new information that the observation brings beyond the prior prediction.

Observability

A property of the pair $(\mathbf{F},\mathbf{H})$ asserting that the initial state $\mathbf{x}[0]$ is recoverable from a finite observation record $\{\mathbf{y}[n]\}_{n=0}^{N-1}$ in the noise-free case. Detectability, the sufficient condition for DARE convergence, is the weaker requirement that all unstable modes be observable.

Discrete algebraic Riccati equation (DARE)

The nonlinear matrix equation
$$\bar{\mathbf{P}} = \mathbf{F}\bar{\mathbf{P}}\mathbf{F}^T - \mathbf{F}\bar{\mathbf{P}}\mathbf{H}^T\bigl(\mathbf{H}\bar{\mathbf{P}}\mathbf{H}^T+\mathbf{R}\bigr)^{-1}\mathbf{H}\bar{\mathbf{P}}\mathbf{F}^T + \mathbf{G}\mathbf{Q}\mathbf{G}^T$$
whose unique PSD solution (under detectability + stabilizability) is the steady-state prediction covariance of the Kalman filter for an LTI state-space model.