CUSUM and Change Detection

Detecting When the World Changes

The SPRT assumes the hypothesis is fixed before the experiment begins. In many applications the hypothesis changes during the observation window. A user enters a cell, a jammer switches on, a link experiences a shadowing event, a machine begins to degrade. The statistician must detect the change as quickly as possible while avoiding false alarms.

Formally, observations Y1,Y2,…Y_1, Y_2, \ldots are i.i.d.\ with density f0f_0 up to some unknown change-point Ξ½\nu, and with density f1β‰ f0f_1 \neq f_0 afterwards. The goal is a stopping rule Ο„\tau such that, roughly, EΞ½[(Ο„βˆ’Ξ½)+]\mathbb{E}_\nu[(\tau - \nu)^+] (the detection delay) is small, while E∞[Ο„]\mathbb{E}_\infty[\tau] (the time to a false alarm when no change occurs) is large.

The CUSUM (cumulative sum) algorithm of Page (1954) is the sequential analogue of the likelihood-ratio test for change detection. Unlike the SPRT, CUSUM runs indefinitely without terminal decision, firing an alarm whenever evidence of a change accumulates beyond a threshold. Lorden (1971) and Moustakides (1986) showed that CUSUM is minimax optimal: it minimizes the worst-case expected detection delay among all procedures with a given false-alarm rate.

Definition:

Page's CUSUM Statistic

Let β„“(y)=log⁑f1(y)f0(y)\ell(y) = \log \frac{f_1(y)}{f_0(y)} as before. The CUSUM statistic is the nonnegative process

S0=0,Sn=max⁑(0,Snβˆ’1+β„“(Yn)).S_0 = 0, \qquad S_n = \max(0, S_{n-1} + \ell(Y_n)).

Equivalently, Sn=β„“(n)βˆ’min⁑0≀k≀nβ„“(k)S_n = \ell^{(n)} - \min_{0 \leq k \leq n} \ell^{(k)}: the current LLR minus its running minimum. The CUSUM stopping rule with threshold h>0h > 0 is

Ο„h=inf⁑{nβ‰₯1:Snβ‰₯h}.\tau_h = \inf\{n \geq 1 : S_n \geq h\}.

Intuitively, SnS_n is the accumulated evidence in favor of H1\mathcal{H}_1 since the last time the evidence favored H0\mathcal{H}_0 most strongly. The reflection at zero is what distinguishes CUSUM from a plain running sum: it resets the statistic whenever past evidence points toward f0f_0, so the detector is not distracted by ancient history.

,

CUSUM (Cumulative Sum)

A sequential change-detection algorithm that accumulates log-likelihood ratios with reflection at zero and raises an alarm when the running statistic exceeds a threshold. Optimal under Lorden's minimax criterion.

Related: ARL (Average Run Length), Change Point

ARL (Average Run Length)

ARL0=E∞[Ο„]\mathrm{ARL}_0 = \mathbb{E}_\infty[\tau] is the mean time to a false alarm when no change ever occurs. ARL1=E0[Ο„]\mathrm{ARL}_1 = \mathbb{E}_0[\tau] is the mean detection delay when the change occurs at time zero. These play the roles of 1/Pf1/P_f and detection delay in change detection.

Related: CUSUM (Cumulative Sum)

Change Point

The unknown time ν∈{1,2,…,∞}\nu \in \{1, 2, \ldots, \infty\} at which the distribution of the observations switches from f0f_0 to f1f_1. Ξ½=∞\nu = \infty means the change never happens.

Related: CUSUM (Cumulative Sum)

Theorem: CUSUM as a Sequence of One-Sided SPRTs

Let Ο„h=inf⁑{n:Snβ‰₯h}\tau_h = \inf\{n : S_n \geq h\} be the CUSUM stopping time, where Sn=β„“(n)βˆ’min⁑k≀nβ„“(k)S_n = \ell^{(n)} - \min_{k \leq n} \ell^{(k)}. Then Ο„h\tau_h equals the first index at which some one-sided SPRT, started at some k∈{1,…,n}k \in \{1, \ldots, n\} with upper boundary hh and lower boundary 00, crosses hh. Equivalently, CUSUM is the supremum over all hypothetical change-points of the LLR statistic.

The running minimum acts as a "restart clock." Every time the running LLR reaches a new low, the CUSUM forgets its past and begins a fresh one-sided SPRT. The alarm fires as soon as any of these imagined restarts crosses hh.

,

Theorem: CUSUM ARL Approximations

For the CUSUM with threshold hh and log-likelihood-ratio increments with pre- and post-change means ΞΌ0=βˆ’D(f0βˆ₯f1)<0\mu_0 = -D(f_0 \| f_1) < 0 and ΞΌ1=D(f1βˆ₯f0)>0\mu_1 = D(f_1 \| f_0) > 0, neglecting excess over the boundary,

ARL0=E∞[Ο„h]β‰ˆehβˆ’hβˆ’1ΞΌ1β‹…11βˆ’eβˆ’ΞΌ1/∣μ0∣,\mathrm{ARL}_0 = \mathbb{E}_\infty[\tau_h] \approx \frac{e^h - h - 1}{\mu_1} \cdot \frac{1}{1 - e^{-\mu_1 / |\mu_0|}},

ARL1=E0[Ο„h]β‰ˆhΞΌ1.\mathrm{ARL}_1 = \mathbb{E}_0[\tau_h] \approx \frac{h}{\mu_1}.

In particular, to a first approximation ARL0β‰ˆeh/ΞΌ1\mathrm{ARL}_0 \approx e^h/\mu_1 for large hh: the false-alarm ARL grows exponentially with the threshold, while the detection delay grows only linearly.

The asymmetry between linear detection delay and exponential false-alarm time is what makes CUSUM so effective: doubling hh quadruples ARL0\mathrm{ARL}_0 while only doubling ARL1\mathrm{ARL}_1.

,

Key Takeaway

The CUSUM exchanges linear detection delay for exponential false-alarm time: ARL1β‰ˆh/D(f1βˆ₯f0)\mathrm{ARL}_1 \approx h/D(f_1 \| f_0) versus ARL0β‰ˆeh/D(f1βˆ₯f0)\mathrm{ARL}_0 \approx e^h/D(f_1 \| f_0). Every nat added to hh decuples ARL0\mathrm{ARL}_0 at the cost of one additional sample of expected detection delay.

Example: CUSUM for a Gaussian Mean Shift

Observations YiY_i are i.i.d.\ N(ΞΌ0,Οƒ2)\mathcal{N}(\mu_0, \sigma^2) before the change and N(ΞΌ1,Οƒ2)\mathcal{N}(\mu_1, \sigma^2) after, with known ΞΌ0,ΞΌ1,Οƒ2\mu_0, \mu_1, \sigma^2 and ΞΌ1>ΞΌ0\mu_1 > \mu_0. Design a CUSUM with ARL0β‰₯104\mathrm{ARL}_0 \geq 10^4 and compute the expected detection delay.

CUSUM Change Detector

Complexity: O(1)O(1) memory and per-sample work
Input: Densities f0,f1f_0, f_1; threshold h>0h > 0
Output: Alarm time Ο„\tau
1. S←0S \leftarrow 0
2. n←0n \leftarrow 0
3. loop
4. n←n+1\quad n \leftarrow n + 1
5. \quad observe YnY_n
6. S←max⁑(0, S+log⁑(f1(Yn)/f0(Yn)))\quad S \leftarrow \max(0, \, S + \log(f_1(Y_n) / f_0(Y_n)))
7. \quad if Sβ‰₯hS \geq h then return τ←n\tau \leftarrow n
8. end loop

The two-sided CUSUM (for changes in either direction) runs two statistics in parallel with opposite signs and fires when either exceeds hh. Computational cost remains O(1)O(1) per sample.

CUSUM Statistic with Injected Change Point

Simulate a sample path of observations with a change from N(0,1)\mathcal{N}(0, 1) to N(ΞΌ1,1)\mathcal{N}(\mu_1, 1) at a chosen time Ξ½\nu. The upper panel shows the observations and the true change-point; the lower panel shows the CUSUM statistic and threshold. Adjust ΞΌ1\mu_1, Ξ½\nu, and hh to explore detection delay and false alarms.

Parameters
1
100
6

SPRT vs. CUSUM

PropertySPRTCUSUM
ProblemFixed hypothesis, accept/rejectDetect change in distribution
StoppingTerminal decision at Ο„\tauAlarm and reset
ThresholdsTwo: A>0A > 0, B<0B < 0One: h>0h > 0
ReflectionNoYes, at zero
Performance metric(Ξ±,Ξ²)(\alpha, \beta) and ASN(ARL0,ARL1)(\mathrm{ARL}_0, \mathrm{ARL}_1)
OptimalityWald-Wolfowitz (double ASN)Lorden-Moustakides (minimax delay)
Typical useEarly-stop decoding, acceptance testingHandover, jammer detection, quality control

Why This Matters: Handover Triggering and Jammer Detection

Handover triggering. A UE continuously measures RSRP (reference signal received power) from serving and neighbor cells. The serving-cell RSRP is modeled as a Gaussian process centered on its slow-fading mean. When the UE moves into a new cell, the mean shifts. Detecting this shift quickly β€” but not so quickly that ping-pong handovers occur β€” is a classical change-detection problem. 3GPP events A1-A6 implement thresholded, hysteresis-based variants of CUSUM-like logic.

Jammer detection. A cognitive radio sensing receiver observes the noise-plus-interference power in its frequency band. When a jammer switches on, the mean power abruptly increases. A CUSUM on the log-power samples (or on the square of band-limited noise samples) declares the jammer within a few symbol periods, enabling fast frequency hopping or spectral notching.

πŸ”§Engineering Note

Hysteresis vs. CUSUM in 3GPP Measurement Events

3GPP measurement events A3 ("neighbor becomes offset better than serving") employ a time-to-trigger (TTT) parameter that requires the condition to hold for a consecutive number of measurement periods before reporting. This is essentially a dwell-time test, not a CUSUM.

A strict CUSUM has the theoretical advantage of reacting earlier when the mean shift is large and later when the shift is small, adapting to the realized signal strength. Some recent handover algorithms exploit CUSUM-style adaptive dwell; they remain non-standardized but show 10-30% ping-pong reduction in simulation.

Practical Constraints
  • β€’

    3GPP NR TTT values: 0, 40, 64, 80, 100, ..., 5120 ms

  • β€’

    Measurement period: 200 ms (SSB-based L3 filtering)

  • β€’

    Hysteresis margin: 0-30 dB in 0.5 dB steps

πŸ“‹ Ref: 3GPP TS 38.331, Section 5.5.4 (measurement events)

Common Mistake: CUSUM Requires Knowing the Post-Change Density

Mistake:

Designing a CUSUM for a Gaussian mean-shift detector by picking ΞΌ1=ΞΌ0+1\mu_1 = \mu_0 + 1 "because that seems reasonable," then wondering why the ARLs are way off when the real shift is ΞΌ1=ΞΌ0+3\mu_1 = \mu_0 + 3.

Correction:

The CUSUM increment β„“(y)=log⁑(f1(y)/f0(y))\ell(y) = \log(f_1(y)/f_0(y)) depends on the assumed f1f_1. Performance is optimal only when the assumed post- change density matches reality. When the shift size is unknown, use the GLR-CUSUM (generalized likelihood ratio): replace the fixed LLR with its supremum over a parameter set, at the cost of O(n)O(n) storage. Alternatively, use a windowed GLR for bounded complexity.

Common Mistake: ARL Is Not a Probability

Mistake:

Confusing 1/ARL01/\mathrm{ARL}_0 with the false-alarm probability, or reporting "the CUSUM has ARL0=104\mathrm{ARL}_0 = 10^4, so Pf=10βˆ’4P_f = 10^{-4}."

Correction:

There is no fixed sample size, so there is no per-decision false-alarm probability. ARL0\mathrm{ARL}_0 is the expected time to the first false alarm β€” with units of samples, not a probability. A useful interpretation: if the system runs indefinitely with no change, false alarms occur at rate β‰ˆ1/ARL0\approx 1/\mathrm{ARL}_0 per sample, so 1/ARL01/\mathrm{ARL}_0 plays the role of a false-alarm rate in continuous-time analogues.

Historical Note: Page and the Birth of Change Detection

1954

E. S. Page introduced CUSUM charts in 1954 for industrial quality control. Shewhart's 3Οƒ3\sigma chart (1931) would flag any point outside ΞΌΒ±3Οƒ\mu \pm 3\sigma, but it had poor detection power for small persistent mean shifts. Page's insight was to accumulate the deviations rather than threshold each individually.

Page's 1954 paper is remarkably short (six pages) and contains neither the likelihood-ratio interpretation nor the optimality theory. Those would arrive 17 years later with Lorden's 1971 proof that CUSUM is minimax with respect to worst-case change points, refined by Moustakides in 1986 to exact minimax optimality. Page himself moved on to operations research and never returned to the topic.

Historical Note: From Asymptotic to Exact Optimality

1971-1986

Lorden (1971) introduced the worst-case detection delay sup⁑νess sup⁑EΞ½[(Ο„βˆ’Ξ½)+∣Y1,…,YΞ½βˆ’1]\sup_\nu \operatorname{ess\,sup} \mathbb{E}_\nu[(\tau - \nu)^+ | Y_1, \ldots, Y_{\nu-1}] and proved that CUSUM is asymptotically minimax β€” it attains the best possible delay up to first order as ARL0β†’βˆž\mathrm{ARL}_0 \to \infty. Moustakides (1986) closed the gap by showing CUSUM is exactly minimax for every finite ARL0\mathrm{ARL}_0, using a dynamic programming argument.

This result placed CUSUM on the same footing as the SPRT: an exactly optimal sequential procedure, not merely an asymptotic one. It remains one of the cleanest optimality statements in sequential statistics.

Quick Check

What is the role of the reflection at zero in the CUSUM statistic?

To prevent the statistic from overflowing

To restart the evidence count after periods consistent with H0\mathcal{H}_0

To guarantee the CUSUM is unbiased

To make the statistic Gaussian

Quick Check

Doubling the CUSUM threshold hh approximately multiplies ARL0\mathrm{ARL}_0 by:

2

ehe^h

ehe^{h} (quadratic in the original ARL)

2h2h