Multi-Target Tracking on the DD Grid

From Snapshot to Track

§§1-3 treated a single frame of MIMO-OTFS-ISAC: estimate the target scene once, design the beamformer, run the comms and sensing tasks in parallel. But the scene evolves: vehicles move, pedestrians cross, new scatterers appear. This section lifts the snapshot analysis to the tracking problem — estimating target trajectories over frames, exploiting their continuity in the DD-angle domain. The DD representation is especially convenient for tracking because each target is a point in the DD plane whose coordinates change smoothly frame to frame.

Definition:

Target State Model

At frame $t$, target $i$ has state
$$\mathbf{s}_i^{(t)} \;=\; (R_i^{(t)}, v_i^{(t)}, \theta_i^{(t)}, \dot\theta_i^{(t)}, a_i^{(t)}) \;\in\; \mathbb{R}^4 \times \mathbb{C},$$
comprising range, radial velocity, angle, angular velocity, and complex reflectivity.

State evolution (linear constant-velocity model):
$$\mathbf{s}_i^{(t+1)} \;=\; \mathbf{A}\, \mathbf{s}_i^{(t)} \,+\, \mathbf{u}_i^{(t)}, \qquad \mathbf{A} = \begin{pmatrix}1 & T_{\text{fr}} & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & T_{\text{fr}} \\ 0 & 0 & 0 & 1\end{pmatrix},$$
with frame duration $T_{\text{fr}}$ and process noise $\mathbf{u}_i^{(t)}$. The $4 \times 4$ matrix $\mathbf{A}$ acts on the four kinematic components $(R, v, \theta, \dot\theta)$ of the state.

Observation model (from MIMO-OTFS-ISAC):
$$\mathbf{z}_i^{(t)} \;=\; h(\mathbf{s}_i^{(t)}) \,+\, \mathbf{v}_i^{(t)}, \qquad h : \mathbf{s} \mapsto (\tau, \nu, \theta),$$
where $h$ maps the state to the observation (delay, Doppler, angle), and $\mathbf{v}_i^{(t)}$ is the estimation error with covariance given by the CRB.
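The two models above can be sketched in a few lines of Python, restricted to the four kinematic components. The concrete form of $h$ used here (monostatic round-trip delay $2R/c$ and Doppler shift $2 v f_c / c$) is an assumption made for illustration, as are the function names and the 77 GHz default carrier; the text only specifies that $h$ returns (delay, Doppler, angle).

```python
C_LIGHT = 299_792_458.0  # speed of light in m/s


def predict_state(state, t_fr):
    """Apply the constant-velocity matrix A to (R, v, theta, theta_dot)."""
    rng, vel, ang, ang_rate = state
    return (rng + t_fr * vel, vel, ang + t_fr * ang_rate, ang_rate)


def observe(state, carrier_hz=77e9):
    """Hypothetical h: map the state to (delay, Doppler, angle).

    Assumes a monostatic geometry: round-trip delay 2R/c and
    Doppler shift 2*v*f_c/c. The angle is observed directly.
    """
    rng, vel, ang, _ = state
    delay = 2.0 * rng / C_LIGHT
    doppler = 2.0 * vel * carrier_hz / C_LIGHT
    return (delay, doppler, ang)


# One 10 ms frame for a target at 100 m, closing at 30 m/s.
s0 = (100.0, -30.0, 0.20, 0.01)
s1 = predict_state(s0, t_fr=0.01)
z1 = observe(s1)
```

With this simplified choice the map is linear in the state, so its Jacobian is constant; a richer geometry (for example a Cartesian state or a bistatic setup) would make it state-dependent.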

Theorem: Extended Kalman Tracking on the DD-Angle Grid

For a target with linear state evolution and nonlinear observation (the $(\tau, \nu, \theta)$ mapping is nonlinear in $R, v, \theta$), the extended Kalman filter (EKF) tracks the target state with covariance
$$\mathbf{P}^{(t|t)} \;=\; (\mathbf{I} - \mathbf{K}^{(t)} \mathbf{H}^{(t)})\, \mathbf{P}^{(t|t-1)},$$
where $\mathbf{K}^{(t)}$ is the Kalman gain, $\mathbf{H}^{(t)} = \partial h/\partial \mathbf{s}\,|_{\hat{\mathbf{s}}^{(t|t-1)}}$ is the observation Jacobian, and $\mathbf{P}^{(t|t-1)}$ is the predicted covariance.

Under steady-state tracking with process noise covariance $\mathbf{Q}$ and observation noise covariance $\mathbf{R} = \mathrm{CRB}(\mathbf{R}_x)$, the steady-state filter MSE scales as
$$\mathrm{MSE}_{\infty} \;\sim\; \mathbf{Q}^{1/4} (\mathbf{R}_x)^{-1/2} \mathbf{Q}^{1/4},$$
which reduces to $\sqrt{Q/R_x}$ in the scalar case.

Consequence. Sensing-optimal beamforming ($\mathbf{R}_x$ illuminating the target directions) reduces tracking MSE by a factor of order $1/\sqrt{|\mathbf{R}_x|}$ relative to uniform illumination. This is the quantitative gain from beam-aware tracking.

Tracking a moving target is like solving a noisy linear regression — the data noise is the CRB, the process noise is how erratically the target maneuvers. Lower CRB (better sensing) compounds over time via the Kalman update, giving a multiplicative improvement in steady-state MSE. This is why even a small sensing gain per frame matters: it compounds into a large tracking gain over many frames.
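How the steady-state MSE depends on the measurement noise can be checked on a scalar stand-in. The sketch below assumes a random-walk state with process variance `q` and measurement variance `r` (both hypothetical numbers, not taken from the text), iterates the Kalman covariance recursion to convergence, and compares the result with the closed-form fixed point $\tfrac{1}{2}\bigl(-q + \sqrt{q^2 + 4qr}\bigr)$, which approaches $\sqrt{qr}$ when $q \ll r$.

```python
import math


def steady_state_mse(q, r, iters=10_000):
    """Iterate the scalar Kalman covariance recursion to its fixed point."""
    p = r  # start from the single-snapshot error variance
    for _ in range(iters):
        p_pred = p + q                # predict: add process noise
        gain = p_pred / (p_pred + r)  # scalar Kalman gain
        p = (1.0 - gain) * p_pred     # update: posterior variance
    return p


def closed_form(q, r):
    """Positive root of p^2 + q*p - q*r = 0 (posterior fixed point)."""
    return 0.5 * (-q + math.sqrt(q * q + 4.0 * q * r))


q, r = 1e-4, 1e-2                        # hypothetical variances
p_inf = steady_state_mse(q, r)
p_inf_4x = steady_state_mse(q, r / 4.0)  # 4x lower measurement variance
```

With these numbers, a fourfold reduction in measurement variance roughly halves the steady-state MSE, in line with the square-root dependence.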

Multi-Target EKF on the DD-Angle Grid

Input:  DD-angle observations Z^{(t)} = {ẑ_1, ..., ẑ_P} at frame t
        Existing tracks T^{(t-1)} = {s_1, ..., s_{T_{t-1}}}
        Gating radius γ, birth threshold π_b, death threshold π_d
Output: Updated tracks T^{(t)}

1. PREDICT:
   For each track s_i ∈ T^{(t-1)}:
     s_i^{(t|t-1)} = A s_i^{(t-1)}
     P_i^{(t|t-1)} = A P_i^{(t-1)} A^T + Q
2. ASSOCIATE (JPDA or Hungarian):
   Cost matrix C[i, j] = ||ẑ_j - h(s_i^{(t|t-1)})||²_Σ
   If C[i, j] < γ: candidate association
   Solve linear assignment to get (i, j(i)) pairings
3. UPDATE (per associated track, with H_i = ∂h/∂s evaluated at s_i^{(t|t-1)}):
   Innovation e_i = ẑ_{j(i)} - h(s_i^{(t|t-1)})
   K_i = P_i^{(t|t-1)} H_i^T (H_i P_i^{(t|t-1)} H_i^T + CRB)^{-1}
   s_i^{(t|t)} = s_i^{(t|t-1)} + K_i · e_i
   P_i^{(t|t)} = (I - K_i H_i) P_i^{(t|t-1)}
4. BIRTH:
   Unassociated observations above π_b: initialize new tracks
5. DEATH:
   Tracks unassociated for ≥ π_d frames: remove
Return updated track set T^{(t)}.

Complexity: O(T² P² + T · MN) per frame. For T = 6 targets,
P = 20 observations (including clutter), MN = 10⁴: ~7 × 10⁴ ops/frame,
compatible with real-time operation at a 100 Hz frame rate.
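The gating and assignment of step 2 can be sketched as follows. This is a minimal illustration, not the algorithm's reference implementation: an exhaustive search stands in for the Hungarian or JPDA solver, the covariance Σ is assumed diagonal, and a missed detection is charged a cost equal to the gate. All function and parameter names are hypothetical.

```python
from itertools import permutations


def mahalanobis_sq(z, z_pred, var):
    """Squared Mahalanobis distance, assuming a diagonal covariance."""
    return sum((a - b) ** 2 / v for a, b, v in zip(z, z_pred, var))


def associate(predicted, observations, var, gate):
    """Return {track index: observation index} minimizing total gated cost.

    Exhaustive search over assignments; only practical for a handful of
    tracks and observations.
    """
    n_trk, n_obs = len(predicted), len(observations)
    cost = [[mahalanobis_sq(z, zp, var) for z in observations]
            for zp in predicted]
    # Pad with None so that any track may stay unassociated.
    slots = list(range(n_obs)) + [None] * n_trk
    best, best_total = {}, float("inf")
    for perm in set(permutations(slots, n_trk)):
        total, pairs = 0.0, {}
        for i, j in enumerate(perm):
            if j is None or cost[i][j] >= gate:
                total += gate          # penalty for a missed detection
            else:
                total += cost[i][j]
                pairs[i] = j
        if total < best_total:
            best, best_total = pairs, total
    return best


# Two predicted measurements, three observations (the last one is clutter).
pred = [(1.0, 0.0), (5.0, 1.0)]
obs = [(5.1, 1.1), (0.9, -0.1), (20.0, 20.0)]
pairs = associate(pred, obs, var=(0.04, 0.04), gate=9.0)
```

Observations left out of `pairs` are the candidates for the birth step; tracks left out accumulate a miss toward the death threshold.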

Example: Highway Multi-Vehicle Tracking

A roadside BS at 77 GHz tracks $T = 6$ vehicles on a highway. Frame duration $T_{\text{fr}} = 10$ ms (100 Hz frame rate). Vehicle speeds 60-120 km/h. Range resolution $\Delta R = c/(2W) = 1.5$ m (from $W = 100$ MHz), velocity resolution $\Delta v = c/(2 f_c T) \approx 0.19$ m/s (from $T = 10$ ms at 77 GHz).

(a) Predict tracking MSE in steady state. (b) Evaluate association reliability for two vehicles at similar range. (c) Discuss birth/death handling at highway entrances.

Steady-State Tracking MSE vs SNR

Plot the steady-state Kalman tracking MSE (position) as a function of receive SNR, comparing single-snapshot CRB (no tracking) with steady-state EKF. Sliders: frame rate, process noise, beam-aware vs uniform illumination.

Theorem: Predictive Beamforming Gain

Suppose the BS knows the predicted target states $\hat{\mathbf{s}}_i^{(t|t-1)}$ for frame $t$ with covariance $\mathbf{P}^{(t|t-1)}$. Using this prediction to pre-steer the sensing beam at frame $t$ yields an improvement in tracking MSE of
$$\frac{\mathrm{MSE}^{\text{pred}}}{\mathrm{MSE}^{\text{blind}}} \;\approx\; \frac{\mathrm{tr}(\mathbf{P}^{(t|t-1)})}{\mathrm{tr}(\mathbf{R}_{\text{uniform}})},$$
where the denominator is the CRB with uniform illumination. For well-tracked targets, this ratio is $\ll 1$: predictive beamforming provides an order-of-magnitude MSE improvement over blind (uniform) illumination.

Once a target is being tracked, the system knows where it is likely to be at the next frame — within a beamwidth. Concentrating the sensing beam there improves observation SNR and therefore reduces tracking noise. This creates a positive feedback loop: good tracking leads to good prediction leads to focused sensing leads to better tracking. The loop is stable as long as predictions do not diverge — the topic of §5.
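The feedback loop can be illustrated with a scalar covariance recursion. The sketch below is a toy model under stated assumptions: the angle follows a random walk, the measurement variance drops by a fixed factor `gain` whenever the beam is pre-steered, and pre-steering is possible only when the predicted one-sigma error fits within half a beamwidth. The names and all numeric values are hypothetical.

```python
def posterior_variance(q, r_uniform, gain, beamwidth, frames=200):
    """Scalar covariance recursion with prediction-driven beam focusing."""
    p = r_uniform                      # start from one blind snapshot
    focused = False
    for _ in range(frames):
        p_pred = p + q                 # predicted angle variance
        # Pre-steer only if the predicted 1-sigma error fits in half a beam.
        focused = p_pred ** 0.5 < beamwidth / 2.0
        r = r_uniform / gain if focused else r_uniform
        k = p_pred / (p_pred + r)      # scalar Kalman gain
        p = (1.0 - k) * p_pred         # posterior variance
    return p, focused


q, r_uniform = 1e-4, 1e-2              # hypothetical variances (rad^2)
blind, _ = posterior_variance(q, r_uniform, gain=1.0, beamwidth=0.1)
pred, locked = posterior_variance(q, r_uniform, gain=16.0, beamwidth=0.1)
narrow, narrow_locked = posterior_variance(q, r_uniform, gain=16.0,
                                           beamwidth=0.05)
```

In this toy setting the loop locks after a few frames and settles at a markedly lower variance than the blind filter. With the narrower beam, the predicted error never fits inside the beam, so the filter stays at the blind level: a simple picture of why the loop depends on predictions staying accurate.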

🎓 CommIT Contribution (2023)

Predictive Tracking with MIMO-OTFS-ISAC

Y. Cui, W. Yuan, G. Caire, IEEE Trans. Signal Processing

The CommIT contribution on predictive MIMO-OTFS-ISAC tracking establishes two key results: (1) the steady-state tracking MSE scales as $\sqrt{Q/R_x}$ for a Kalman-filtered target, with explicit closed-form expressions for the multi-target multi-user scenario; (2) sensing-aware beamforming (pre-steering based on predictions) reduces steady-state MSE by the beamforming gain $\sim N_t/T_{\text{tgt}}$, a multiplicative improvement over blind illumination.

Combined with the DD-domain channel sparsity of §1, this result makes cm-level multi-target tracking feasible at highway frame rates (100 Hz). Without the DD framework, the same sensing gain would be nullified by channel estimation errors on the order of the target spacing. The DD domain's sparsity is what allows the predictive feedback loop to remain stable under realistic CSI uncertainty.

Historical Note: From Classical Radar Tracking to DD-Angle EKF

Classical radar tracking (PDA, IMM, JPDA) dates to Bar-Shalom's 1970s work on multi-target estimation. Classical algorithms operate in Cartesian position-velocity space and assume a known measurement likelihood. The DD-angle framework here derives that measurement model from the DD structure of OTFS instead of choosing it ad hoc. This is the main advance: the same Kalman and JPDA machinery, but with measurement noise and innovation covariances derived from the waveform, not guessed.

In automotive applications, this integration eliminates the "sensor fusion layer" that classical designs use to reconcile radar and camera tracks — OTFS-ISAC provides both modalities simultaneously, with coherent measurement models.

Common Mistake: Don't Track Ghosts

Mistake:

Associating every observed DD-angle peak with a target track. Spurious peaks — from sidelobes of nearby targets, ground clutter, or random noise — create ghost tracks that persist if not actively pruned.

Correction:

Use confirmation windows: a track is confirmed only after 2-3 frames of consistent observations. Use track quality metrics (cumulative innovation, likelihood ratio) to terminate low-quality tracks. In high-clutter environments (urban, forest), operate with higher birth thresholds ($\pi_b$). Cross-modal confirmation with camera or lidar is a standard robustification technique in automotive.
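A confirmation window can be sketched as a small per-track counter. This is one possible bookkeeping scheme, assuming a consecutive-hit rule and that tentative tracks are dropped on their first miss; the class, function, and parameter names are hypothetical, with `confirm_after` playing the role of the 2-3 frame window and `delete_after` the role of $\pi_d$.

```python
from dataclasses import dataclass


@dataclass
class TrackStatus:
    hits: int = 0        # consecutive frames with an associated observation
    misses: int = 0      # consecutive frames without one
    confirmed: bool = False


def update_status(status, associated, confirm_after=3, delete_after=5):
    """Advance one frame; return False when the track should be removed."""
    if associated:
        status.hits += 1
        status.misses = 0
        if status.hits >= confirm_after:
            status.confirmed = True
    else:
        status.misses += 1
        status.hits = 0
    # Tentative tracks die on the first miss; confirmed ones get a grace period.
    limit = delete_after if status.confirmed else 1
    return status.misses < limit


# A ghost peak seen once and then lost is pruned immediately.
ghost = TrackStatus()
ghost_alive = [update_status(ghost, seen) for seen in (True, False)]

# A real target is confirmed after three hits and survives short dropouts.
real = TrackStatus()
real_alive = [update_status(real, seen)
              for seen in (True, True, True, False, False, True)]
```

A likelihood-ratio or cumulative-innovation score could replace the hit counter without changing the surrounding logic.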