Ferkans — Interactive Telecom Tutor

Detecting When the World Changes

The SPRT assumes the hypothesis is fixed before the experiment begins. In many applications, however, the distribution of the observations changes partway through the observation window: a user enters a cell, a jammer switches on, a link experiences a shadowing event, a machine begins to degrade. The statistician must detect the change as quickly as possible while avoiding false alarms.

Formally, observations $Y_1, Y_2, \ldots$ are independent, with density $f_0$ up to some unknown change-point $\nu$ and density $f_1 \neq f_0$ afterwards. The goal is a stopping rule $\tau$ such that, roughly, $\mathbb{E}_\nu[(\tau - \nu)^+]$ (the detection delay) is small, while $\mathbb{E}_\infty[\tau]$ (the time to a false alarm when no change occurs) is large.

The CUSUM (cumulative sum) algorithm of Page (1954) is the sequential analogue of the likelihood-ratio test for change detection. Unlike the SPRT, CUSUM has no terminal accept/reject decision: it runs indefinitely and raises an alarm whenever evidence of a change accumulates beyond a threshold. Lorden (1971) showed that CUSUM is asymptotically minimax optimal, and Moustakides (1986) showed that it is exactly optimal: among all procedures whose mean time to false alarm is at least a given value, it minimizes the worst-case expected detection delay.

Definition:
Page's CUSUM Statistic

Let $\ell(y) = \log \frac{f_1(y)}{f_0(y)}$ as before. The CUSUM statistic is the nonnegative process

$S_0 = 0, \qquad S_n = \max(0, S_{n-1} + \ell(Y_n)).$

Equivalently, $S_n = \ell^{(n)} - \min_{0 \leq k \leq n} \ell^{(k)}$, where $\ell^{(n)} = \sum_{i=1}^n \ell(Y_i)$ is the cumulative LLR and $\ell^{(0)} = 0$: the current LLR minus its running minimum. The CUSUM stopping rule with threshold $h > 0$ is

$\tau_h = \inf\{n \geq 1 : S_n \geq h\}.$

Intuitively, $S_n$ is the accumulated evidence in favor of $\mathcal{H}_1$ since the last time the evidence favored $\mathcal{H}_0$ most strongly. The reflection at zero is what distinguishes CUSUM from a plain running sum: it resets the statistic whenever past evidence points toward $f_0$, so the detector is not distracted by ancient history.
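As a quick numerical check that the reflected recursion and the running-minimum form describe the same process, here is a minimal Python sketch. It assumes the LLR increments $\ell(Y_i)$ are already available as a list of numbers; the helper names `cusum_recursive` and `cusum_running_min` are hypothetical, introduced only for illustration.

```python
import random


def cusum_recursive(llrs):
    """Page's recursion S_n = max(0, S_{n-1} + ell(Y_n)) with S_0 = 0."""
    s, path = 0.0, []
    for x in llrs:
        s = max(0.0, s + x)      # reflect at zero
        path.append(s)
    return path


def cusum_running_min(llrs):
    """Equivalent form S_n = ell^(n) - min_{0<=k<=n} ell^(k), with ell^(0) = 0."""
    total, running_min, path = 0.0, 0.0, []
    for x in llrs:
        total += x                              # cumulative LLR ell^(n)
        running_min = min(running_min, total)   # running minimum, includes ell^(0)
        path.append(total - running_min)
    return path


print(cusum_recursive([1, -3, 2, 2]))    # [1.0, 0.0, 2.0, 4.0]
print(cusum_running_min([1, -3, 2, 2]))  # [1.0, 0.0, 2.0, 4.0]

# Random increments with negative drift: the two paths agree up to round-off.
rng = random.Random(1)
llrs = [rng.gauss(-0.5, 1.0) for _ in range(1000)]
a, b = cusum_recursive(llrs), cusum_running_min(llrs)
print(max(abs(x - y) for x, y in zip(a, b)))
```

The recursion is the form one implements (constant memory); the running-minimum form is the one the proofs in this section manipulate.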


CUSUM (Cumulative Sum)

A sequential change-detection algorithm that accumulates log-likelihood ratios with reflection at zero and raises an alarm when the running statistic exceeds a threshold. Optimal under Lorden's minimax criterion.

ARL (Average Run Length)

$\mathrm{ARL}_0 = \mathbb{E}_\infty[\tau]$ is the mean time to a false alarm when no change ever occurs. $\mathrm{ARL}_1 = \mathbb{E}_0[\tau]$ is the mean detection delay when the change occurs at time zero. In change detection, $\mathrm{ARL}_0$ plays the role that $1/P_f$ plays in a fixed-sample test, and $\mathrm{ARL}_1$ plays the role of the expected sample size (ASN).

Related: CUSUM (Cumulative Sum)

Change Point

The unknown time $\nu \in \{1, 2, \ldots, \infty\}$ at which the distribution of the observations switches from $f_0$ to $f_1$. The value $\nu = \infty$ means the change never happens.

Related: CUSUM (Cumulative Sum)

Theorem: CUSUM as a Sequence of One-Sided SPRTs

Let $\tau_h = \inf\{n : S_n \geq h\}$ be the CUSUM stopping time, where $S_n = \ell^{(n)} - \min_{0 \leq k \leq n} \ell^{(k)}$. Then $\tau_h$ equals the first index $n$ at which some one-sided SPRT, started at some $k \in \{1, \ldots, n\}$ with upper boundary $h$ and lower boundary $0$, crosses $h$. Equivalently, CUSUM is the supremum over all hypothetical change-points of the LLR statistic.

The running minimum acts as a "restart clock." Every time the running LLR reaches a new low, the CUSUM forgets its past and begins a fresh one-sided SPRT. The alarm fires as soon as any of these imagined restarts crosses $h$.

Hint

Write $S_n = \max_{1 \leq k \leq n+1} (\ell^{(n)} - \ell^{(k-1)})$, where the term $k = n+1$ equals $0$.

Interpret the increment $\ell^{(n)} - \ell^{(k-1)}$ as an SPRT statistic that started at time $k$ .

Proof

Express the CUSUM as a maximum

By the running-minimum form of the definition, $S_n = \ell^{(n)} - \min_{0 \leq k \leq n} \ell^{(k)} = \max_{0 \leq k \leq n}(\ell^{(n)} - \ell^{(k)}) = \max_{1 \leq k \leq n+1}\ \sum_{i=k}^n \ell(Y_i),$ where the last sum is $0$ when $k = n+1$.

Recognize the inner sum as an SPRT statistic

For each fixed "hypothetical start" $k$, the inner sum $\sum_{i=k}^n \ell(Y_i)$ is the cumulative LLR of a one-sided SPRT that began observing samples at time $k$.

Conclude

The alarm $\tau_h$ fires at the first $n$ for which $\max_k(\ell^{(n)} - \ell^{(k-1)}) \geq h$, i.e., as soon as at least one of these hypothetical SPRTs has accumulated $h$ nats of evidence. $\blacksquare$
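The theorem can be checked by brute force. The sketch below is a hypothetical test harness, not part of the original text: it compares the alarm time from Page's recursion with the first $n$ at which some partial sum $\sum_{i=k}^{n} \ell(Y_i)$ reaches $h$. Integer-valued increments are assumed so that both computations are exact and rounding cannot break ties at the threshold differently.

```python
import random


def cusum_alarm(llrs, h):
    """First n (1-based) with S_n >= h under Page's recursion, or None."""
    s = 0.0
    for n, x in enumerate(llrs, start=1):
        s = max(0.0, s + x)
        if s >= h:
            return n
    return None


def max_over_starts_alarm(llrs, h):
    """First n at which some hypothetical SPRT started at k in {1, ..., n}
    has accumulated sum_{i=k}^{n} ell(Y_i) >= h.  O(n^2); for checking only."""
    prefix = [0]                           # prefix[j] = ell^(j), with ell^(0) = 0
    for x in llrs:
        prefix.append(prefix[-1] + x)
    for n in range(1, len(llrs) + 1):
        best = max(prefix[n] - prefix[k - 1] for k in range(1, n + 1))
        if best >= h:
            return n
    return None


example = [1, -3, 2, 2, 3]                 # integer LLR increments, threshold h = 6
print(cusum_alarm(example, 6), max_over_starts_alarm(example, 6))  # 5 5

# Twenty random integer sequences with negative drift: both views always agree.
rng = random.Random(0)
agree = all(
    cusum_alarm(seq, 6) == max_over_starts_alarm(seq, 6)
    for seq in ([rng.randint(-3, 2) for _ in range(300)] for _ in range(20))
)
print(agree)  # True
```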


Theorem: CUSUM ARL Approximations

For the CUSUM with threshold $h$ and log-likelihood-ratio increments with pre- and post-change means $\mu_0 = -D(f_0 \| f_1) < 0$ and $\mu_1 = D(f_1 \| f_0) > 0$, Wald's approximations (neglecting excess over the boundary) give

$\mathrm{ARL}_0 = \mathbb{E}_\infty[\tau_h] \approx \frac{e^h - h - 1}{|\mu_0|},$

$\mathrm{ARL}_1 = \mathbb{E}_0[\tau_h] \approx \frac{h + e^{-h} - 1}{\mu_1} \approx \frac{h}{\mu_1}.$

In particular, to a first approximation $\mathrm{ARL}_0 \approx e^h/|\mu_0|$ for large $h$ (for a symmetric problem such as a Gaussian mean shift, $|\mu_0| = \mu_1$): the false-alarm ARL grows exponentially with the threshold, while the detection delay grows only linearly.

The asymmetry between linear detection delay and exponential false-alarm time is what makes CUSUM so effective: doubling $h$ multiplies $\mathrm{ARL}_0$ by roughly $e^h$ (it squares the exponential factor) while only doubling $\mathrm{ARL}_1$.

Proof

Detection delay via Wald's identity

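Under $\mathcal{H}_1$ (change at time $0$), $S_n$ is a random walk with positive drift $\mu_1$ reflected at $0$. Since the drift is positive, reflection is rare and $S_n$ behaves essentially like the unreflected sum $\ell^{(n)}$. Wald's identity then gives $\mu_1 \mathbb{E}_0[\tau_h] = \mathbb{E}_0[\ell^{(\tau_h)}] \approx h$ when the overshoot is neglected, so $\mathbb{E}_0[\tau_h] \approx h/\mu_1$.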

False alarm via renewal theory

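Under $\mathcal{H}_0$, the increments $\ell(Y_n)$ have negative drift $\mu_0 < 0$, so the walk spends most of its time near zero. Each return to zero restarts the process, so the hitting time of $h$ is the first success in a sequence of independent "excursions" above zero, each of duration $O(1)$. Because $\mathbb{E}_\infty[e^{\ell(Y)}] = 1$, a single excursion reaches $h$ with probability of order $e^{-h}$, so roughly $e^h$ excursions are needed. Standard renewal analysis (see Siegmund 1985) makes this precise and yields the stated exponential scaling.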

First-order approximation

For $h$ not too small, $\mathrm{ARL}_0 \approx e^h/|\mu_0|$, and the quantity $e^h$ plays the role of $1/\alpha$ in a fixed-sample test. $\blacksquare$
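To see how much the neglected overshoot matters, the following Monte Carlo sketch compares simulated ARLs with Wald's overshoot-free approximations $\mathrm{ARL}_0 \approx (e^h - h - 1)/|\mu_0|$ and $\mathrm{ARL}_1 \approx (h + e^{-h} - 1)/\mu_1$, and with the first-order delay $h/\mu_1$. It assumes a unit-variance Gaussian mean shift of $\Delta = 1$ (so $|\mu_0| = \mu_1 = 1/2$) and a deliberately small threshold $h = 3$, chosen only so that false alarms occur often enough to simulate quickly; the helper names are hypothetical.

```python
import math
import random


def run_length(rng, h, delta, post_change):
    """Number of samples until the CUSUM alarm when every observation is drawn
    from N(delta, 1) (post_change=True) or from N(0, 1) (post_change=False)."""
    mean = delta if post_change else 0.0
    s, n = 0.0, 0
    while True:
        n += 1
        y = rng.gauss(mean, 1.0)
        s = max(0.0, s + delta * (y - delta / 2))   # Gaussian LLR increment, sigma = 1
        if s >= h:
            return n


def wald_arls(h, d0, d1):
    """Wald approximations neglecting overshoot; d0 = D(f0||f1), d1 = D(f1||f0)."""
    arl0 = (math.exp(h) - h - 1) / d0
    arl1 = (h + math.exp(-h) - 1) / d1
    return arl0, arl1


rng = random.Random(0)
h, delta, runs = 3.0, 1.0, 2000
kl = delta ** 2 / 2            # D(f0||f1) = D(f1||f0) for a Gaussian mean shift
mc0 = sum(run_length(rng, h, delta, False) for _ in range(runs)) / runs
mc1 = sum(run_length(rng, h, delta, True) for _ in range(runs)) / runs
print("Monte Carlo ARL0, ARL1:", round(mc0, 1), round(mc1, 2))
print("Wald ARL0, ARL1:       ", [round(v, 2) for v in wald_arls(h, kl, kl)])
print("First order h/mu1:     ", h / kl)
```

At this small threshold the simulated $\mathrm{ARL}_0$ comes out well above the Wald value, so the approximation is pessimistic about false alarms here; the exponential-versus-linear scaling, rather than the constants, is the robust message.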


Key Takeaway

The CUSUM exchanges linear detection delay for exponential false-alarm time: $\mathrm{ARL}_1 \approx h/D(f_1 \| f_0)$ versus $\mathrm{ARL}_0 \approx e^h/D(f_0 \| f_1)$. Every nat added to $h$ multiplies $\mathrm{ARL}_0$ by $e \approx 2.7$ (about $2.3$ nats buy a factor of $10$) at the cost of $1/D(f_1 \| f_0)$ additional samples of expected detection delay.

Example: CUSUM for a Gaussian Mean Shift

Observations $Y_i$ are i.i.d. $\mathcal{N}(\mu_0, \sigma^2)$ before the change and $\mathcal{N}(\mu_1, \sigma^2)$ after, with known $\mu_0, \mu_1, \sigma^2$ and $\mu_1 > \mu_0$. Design a CUSUM with $\mathrm{ARL}_0 \geq 10^4$ and compute the expected detection delay.

Solution

LLR increment

$\ell(y) = \frac{\mu_1 - \mu_0}{\sigma^2}\left(y - \frac{\mu_0 + \mu_1}{2}\right).$ Denote $\Delta = (\mu_1 - \mu_0)/\sigma$, so $D(f_1 \| f_0) = D(f_0 \| f_1) = \Delta^2/2$.

Choose threshold

For $\mathrm{ARL}_0 \geq 10^4$, use the first-order approximation and choose $h$ such that $e^h/(\Delta^2/2) \geq 10^4$, i.e., $h \geq \log(10^4 \Delta^2/2)$. For $\Delta = 1$ (a one-standard-deviation shift), $h \geq \log(5000) \approx 8.52$ nats.

Detection delay

$\mathrm{ARL}_1 \approx h/(\Delta^2/2) = 8.52/0.5 \approx 17$ samples. Interpretation: with a $1\sigma$ mean shift, the CUSUM raises a false alarm roughly once per $10^4$ samples and detects a real change about 17 samples after its onset, on average.
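The design steps of this example can be reproduced in a few lines. This is a sketch assuming the same first-order rule $\mathrm{ARL}_0 \approx e^h/(\Delta^2/2)$ used in the example; the function names are hypothetical. It also checks the closed-form LLR increment against a direct difference of Gaussian log-densities at one arbitrary point.

```python
import math


def gaussian_llr(y, mu0, mu1, sigma):
    """Closed-form LLR increment for N(mu1, sigma^2) versus N(mu0, sigma^2)."""
    return (mu1 - mu0) / sigma**2 * (y - (mu0 + mu1) / 2)


def log_normal_pdf(y, mu, sigma):
    return -0.5 * math.log(2 * math.pi * sigma**2) - (y - mu) ** 2 / (2 * sigma**2)


def design_cusum(delta, target_arl0):
    """First-order design: h = log(target_arl0 * delta^2 / 2), ARL1 ~ h / (delta^2 / 2)."""
    kl = delta ** 2 / 2
    h = math.log(target_arl0 * kl)
    return h, h / kl


# The closed form agrees with the difference of log-densities.
y, mu0, mu1, sigma = 0.7, 0.0, 2.0, 2.0
direct = log_normal_pdf(y, mu1, sigma) - log_normal_pdf(y, mu0, sigma)
print(abs(direct - gaussian_llr(y, mu0, mu1, sigma)) < 1e-12)  # True

h, arl1 = design_cusum(delta=1.0, target_arl0=1e4)
print(f"h = {h:.2f} nats, ARL1 = {arl1:.1f} samples")  # h = 8.52 nats, ARL1 = 17.0 samples
```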

CUSUM Change Detector

Complexity: $O(1)$ memory and per-sample work

Input: densities $f_0, f_1$; threshold $h > 0$
Output: alarm time $\tau$

1. $S \leftarrow 0$
2. $n \leftarrow 0$
3. loop
4. $\quad n \leftarrow n + 1$
5. $\quad$ observe $Y_n$
6. $\quad S \leftarrow \max(0, \, S + \log(f_1(Y_n) / f_0(Y_n)))$
7. $\quad$ if $S \geq h$ then return $\tau \leftarrow n$
8. end loop
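A runnable Python rendering of the pseudocode is sketched below. It assumes the caller supplies log-densities rather than densities (a choice made here, not in the original) so that the ratio $f_1/f_0$ never underflows or overflows; the function name `cusum_detector` is hypothetical.

```python
from typing import Callable, Iterable, Optional


def cusum_detector(samples: Iterable[float],
                   log_f0: Callable[[float], float],
                   log_f1: Callable[[float], float],
                   h: float) -> Optional[int]:
    """Page's CUSUM: return the 1-based alarm time tau, or None if the stream
    ends first.  O(1) memory: only S and n are kept."""
    if h <= 0:
        raise ValueError("threshold h must be positive")
    s = 0.0                                           # step 1
    for n, y in enumerate(samples, start=1):          # steps 2-5: loop, n <- n+1, observe Y_n
        s = max(0.0, s + log_f1(y) - log_f0(y))       # step 6: reflected LLR update
        if s >= h:                                    # step 7: alarm
            return n
    return None


def log_f0(y):
    return -0.5 * y * y            # N(0, 1) up to an additive constant


def log_f1(y):
    return -0.5 * (y - 1.0) ** 2   # N(1, 1); the shared constant cancels in the LLR


stream = [0.0] * 5 + [2.0] * 10
print(cusum_detector(stream, log_f0, log_f1, h=4.0))  # 8
```

Each post-change sample $y = 2$ contributes $\ell = 1.5$, so the statistic reaches $4.5 \geq 4$ on the third post-change sample, $n = 8$.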

The two-sided CUSUM (for changes in either direction) runs two statistics in parallel with opposite signs and fires when either exceeds $h$. Computational cost remains $O(1)$ per sample.
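For a Gaussian mean shift the two statistics can be written out explicitly. The sketch assumes unit-variance data with pre-change mean $0$ and a symmetric shift of $\pm\delta$; the function name `two_sided_cusum` is hypothetical.

```python
def two_sided_cusum(samples, delta, h):
    """Two-sided CUSUM for a shift of +/- delta in the mean of N(0, 1) data.
    Returns (alarm time, direction), or (None, None) if no alarm occurs."""
    up = down = 0.0
    for n, y in enumerate(samples, start=1):
        up = max(0.0, up + delta * y - delta**2 / 2)      # LLR of N(+delta, 1) vs N(0, 1)
        down = max(0.0, down - delta * y - delta**2 / 2)  # LLR of N(-delta, 1) vs N(0, 1)
        if up >= h:
            return n, "up"
        if down >= h:
            return n, "down"
    return None, None


print(two_sided_cusum([0.0] * 10 + [-1.5] * 10, delta=1.0, h=3.0))  # (13, 'down')
```

The $y$-terms of the two increments have opposite signs, which is what "opposite signs" amounts to in the Gaussian case; both still carry the same $-\delta^2/2$ drift term.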

CUSUM Statistic with Injected Change Point

Simulate a sample path of observations with a change from $\mathcal{N}(0, 1)$ to $\mathcal{N}(\mu_1, 1)$ at a chosen time $\nu$. The upper panel shows the observations and the true change-point; the lower panel shows the CUSUM statistic and threshold. Adjust $\mu_1$, $\nu$, and $h$ to explore detection delay and false alarms.

Parameters (defaults): post-change mean $\mu_1 = 1$; change point $\nu = 100$; threshold $h = 6$
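A text-only sketch of the same experiment is given below. It assumes the default parameters and that samples with index $n \geq \nu$ are drawn from the post-change density; the function name `simulate_change` and the seeding scheme are hypothetical.

```python
import random


def simulate_change(mu1=1.0, nu=100, h=6.0, n_max=2000, seed=0):
    """Draw N(0,1) samples before nu and N(mu1,1) from nu on, run the CUSUM for
    N(mu1,1) vs N(0,1), and return (alarm time, delay).  The delay is None for
    a false alarm (alarm before nu); both are None if no alarm occurs."""
    rng = random.Random(seed)
    s = 0.0
    for n in range(1, n_max + 1):
        y = rng.gauss(mu1 if n >= nu else 0.0, 1.0)
        s = max(0.0, s + mu1 * (y - mu1 / 2))     # LLR increment with sigma = 1
        if s >= h:
            return n, (n - nu if n >= nu else None)
    return None, None


results = [simulate_change(seed=s) for s in range(200)]
delays = [d for _, d in results if d is not None]
false_alarms = sum(1 for t, d in results if t is not None and d is None)
print(f"{false_alarms} false alarms in 200 runs; "
      f"mean delay {sum(delays) / len(delays):.1f} samples")
```

Raising `h` trades fewer false alarms for a longer mean delay, which is the same trade-off the interactive panels display.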

SPRT vs. CUSUM

Property	SPRT	CUSUM
Problem	Fixed hypothesis, accept/reject	Detect change in distribution
Stopping	Terminal decision at $\tau$	Alarm and reset
Thresholds	Two: $A > 0$, $B < 0$	One: $h > 0$
Reflection	No	Yes, at zero
Performance metric	$(\alpha, \beta)$ and ASN	$(\mathrm{ARL}_0, \mathrm{ARL}_1)$
Optimality	Wald-Wolfowitz (minimizes ASN under both hypotheses)	Lorden-Moustakides (minimax delay)
Typical use	Early-stop decoding, acceptance testing	Handover, jammer detection, quality control

Why This Matters: Handover Triggering and Jammer Detection

Handover triggering. A UE continuously measures RSRP (reference signal received power) from serving and neighbor cells. In dB, the serving-cell RSRP is commonly modeled as Gaussian around its slow-fading mean. When the UE moves into a new cell, the mean shifts. Detecting this shift quickly, but not so quickly that ping-pong handovers occur, is a classical change-detection problem. 3GPP events A1-A6 use thresholds, hysteresis, and a time-to-trigger, a simpler relative of CUSUM-style logic.

Jammer detection. A cognitive radio sensing receiver observes the noise-plus-interference power in its frequency band. When a jammer switches on, the mean power abruptly increases. A CUSUM on the log-power samples (or on the square of band-limited noise samples) can declare a strong jammer within a few symbol periods, enabling fast frequency hopping or spectral notching.
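A minimal sketch of the power-sample variant follows. It assumes complex circular Gaussian noise and a Gaussian-like jammer, so each power sample $|x|^2$ is exponentially distributed, with known mean $1$ before onset and $4$ (a 6 dB rise) after; the onset index, threshold, and function names are all hypothetical. Under this model the LLR of a power sample $p$ is $\log(\sigma_0^2/\sigma_1^2) + p\,(1/\sigma_0^2 - 1/\sigma_1^2)$.

```python
import math
import random


def power_llr(p, noise_power, jammed_power):
    """LLR of one power sample p = |x|^2: exponential with mean jammed_power
    (jammer on) versus exponential with mean noise_power (jammer off)."""
    return (math.log(noise_power / jammed_power)
            + p * (1.0 / noise_power - 1.0 / jammed_power))


def detect_jammer(powers, noise_power, jammed_power, h):
    """CUSUM on power samples; returns the 1-based alarm index or None."""
    s = 0.0
    for n, p in enumerate(powers, start=1):
        s = max(0.0, s + power_llr(p, noise_power, jammed_power))
        if s >= h:
            return n
    return None


def simulate_powers(seed, noise=1.0, jammed=4.0, onset=500, n_total=1000):
    """Hypothetical trace: exponential power samples whose mean jumps from
    noise to jammed at sample index onset."""
    rng = random.Random(seed)
    return [rng.expovariate(1.0 / (jammed if n >= onset else noise))
            for n in range(1, n_total + 1)]


alarms = [detect_jammer(simulate_powers(seed), 1.0, 4.0, h=10.0) for seed in range(50)]
delays = [a - 500 for a in alarms if a is not None and a >= 500]
print(len(delays), "of 50 runs alarmed after onset; mean delay",
      round(sum(delays) / len(delays), 1), "samples")
```

With a 6 dB jump the post-change LLR drift is $D(f_1 \| f_0) \approx 1.6$ nats per sample, which is why detection takes only a handful of samples here; weaker jammers would take proportionally longer.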

🔧Engineering Note

Hysteresis vs. CUSUM in 3GPP Measurement Events

3GPP measurement event A3 ("neighbor becomes offset better than serving") uses a time-to-trigger (TTT) parameter: the entering condition must hold throughout the TTT interval before the UE sends a measurement report. This is essentially a dwell-time test, not a CUSUM.

A strict CUSUM has the theoretical advantage of reacting earlier when the mean shift is large and later when the shift is small, adapting to the realized signal strength. Some recent handover algorithms exploit CUSUM-style adaptive dwell; they are not standardized, but have been reported to reduce ping-pong handovers by 10-30% in simulation.
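To make the adaptivity claim concrete, the sketch below compares a dwell-time rule with a CUSUM on a noise-free neighbor-minus-serving RSRP difference in dB. Everything here is a hypothetical simplification: the A3 entering condition is reduced to `diff > offset_db`, TTT is counted in measurement samples rather than milliseconds, and the CUSUM is designed for a 3 dB shift with an assumed 2 dB standard deviation. Noise is omitted so the timing of each rule is exact.

```python
def ttt_alarm(diffs, offset_db, ttt_samples):
    """Dwell-time rule: alarm once diff > offset_db has held for ttt_samples
    consecutive samples (a simplified stand-in for an A3 entering condition)."""
    run = 0
    for n, d in enumerate(diffs, start=1):
        run = run + 1 if d > offset_db else 0
        if run >= ttt_samples:
            return n
    return None


def cusum_alarm_db(diffs, design_shift_db, sigma_db, h):
    """CUSUM for a Gaussian mean shift 0 -> design_shift_db in diffs (dB)."""
    k = design_shift_db / sigma_db**2
    s = 0.0
    for n, d in enumerate(diffs, start=1):
        s = max(0.0, s + k * (d - design_shift_db / 2))
        if s >= h:
            return n
    return None


nu = 11                                    # first post-change sample
for shift in (2.0, 3.0, 9.0):              # realized neighbor advantage, dB
    diffs = [0.0] * (nu - 1) + [shift] * 30
    ttt_delay = ttt_alarm(diffs, offset_db=1.0, ttt_samples=8) - nu + 1
    cusum_delay = cusum_alarm_db(diffs, design_shift_db=3.0, sigma_db=2.0, h=5.0) - nu + 1
    print(f"shift {shift} dB: TTT delay {ttt_delay}, CUSUM delay {cusum_delay}")
```

The dwell-time rule waits 8 samples regardless of the shift, whereas the CUSUM fires after 1 sample for a 9 dB shift, 5 samples for 3 dB, and 14 samples for 2 dB. With measurement noise both rules would also need tuning against false triggers, which this noise-free sketch ignores.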

Practical Constraints

•
3GPP NR TTT values: 0, 40, 64, 80, 100, ..., 5120 ms
•
Measurement period: 200 ms (SSB-based L3 filtering)
•
Hysteresis margin: 0-15 dB in 0.5 dB steps (IE value 0..30)

📋 Ref: 3GPP TS 38.331, Section 5.5.4 (measurement events)

Common Mistake: CUSUM Requires Knowing the Post-Change Density

Mistake:

Designing a CUSUM for a Gaussian mean-shift detector by picking $\mu_1 = \mu_0 + 1$ "because that seems reasonable," then wondering why the observed ARLs are far from the design values when the real shift is $\mu_1 = \mu_0 + 3$.

Correction:

The CUSUM increment $\ell(y) = \log(f_1(y)/f_0(y))$ depends on the assumed $f_1$. Performance is optimal only when the assumed post-change density matches reality. When the shift size is unknown, use the GLR-CUSUM (generalized likelihood ratio): replace the fixed LLR with its supremum over a parameter set, at the cost of $O(n)$ storage and per-sample work. Alternatively, use a windowed GLR for bounded complexity.
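Here is a minimal sketch of the windowed GLR, assuming unit-variance Gaussian data with known pre-change mean $0$ and an unknown positive shift $\mu$; the function name and the window length are hypothetical choices. For a candidate start $k$ with $m = n - k + 1$ samples summing to $T$, maximizing $\sum_{i=k}^{n} (\mu y_i - \mu^2/2)$ over $\mu > 0$ gives $\max(0, T)^2/(2m)$. The supremum over the shift size therefore has a closed form, and only the search over start times costs work.

```python
from collections import deque


def windowed_glr_alarm(samples, h, window):
    """One-sided windowed GLR for an unknown positive mean shift in N(0, 1) data.
    For a candidate start with m samples summing to T, the supremum over mu > 0
    of sum(mu * y_i - mu**2 / 2) is max(0, T)**2 / (2 * m)."""
    recent = deque(maxlen=window)      # keep only the last `window` samples
    for n, y in enumerate(samples, start=1):
        recent.append(y)
        best, t = 0.0, 0.0
        for m, yi in enumerate(reversed(recent), start=1):   # m = n - k + 1
            t += yi
            if t > 0:
                best = max(best, t * t / (2 * m))
        if best >= h:
            return n
    return None


print(windowed_glr_alarm([0.0] * 20 + [3.0] * 10, h=5.0, window=50))  # 22
```

Memory and per-sample work are $O(\text{window})$ instead of $O(n)$; the price is that change points older than `window` samples are no longer searched over.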

Common Mistake: ARL Is Not a Probability

Mistake:

Confusing $1/\mathrm{ARL}_0$ with the false-alarm probability, or reporting "the CUSUM has $\mathrm{ARL}_0 = 10^4$, so $P_f = 10^{-4}$."

Correction:

There is no fixed sample size, so there is no per-decision false-alarm probability. $\mathrm{ARL}_0$ is the expected time to the first false alarm, measured in samples, not a probability. A useful interpretation: if the system runs indefinitely with no change and the detector restarts after every alarm, false alarms occur at a long-run rate of $1/\mathrm{ARL}_0$ per sample. For example, at 1000 samples per second, $\mathrm{ARL}_0 = 10^4$ means about one false alarm every 10 seconds. In this sense $1/\mathrm{ARL}_0$ plays the role of a false-alarm rate in continuous-time analogues.

Historical Note: Page and the Birth of Change Detection

1954

E. S. Page introduced CUSUM charts in 1954 for industrial quality control. Shewhart's $3\sigma$ chart (1931) would flag any point outside $\mu \pm 3\sigma$ , but it had poor detection power for small persistent mean shifts. Page's insight was to accumulate the deviations rather than threshold each individually.

Page's 1954 paper did not yet contain the optimality theory. That would arrive 17 years later with Lorden's 1971 proof that CUSUM is asymptotically minimax with respect to worst-case change points, refined by Moustakides in 1986 to exact minimax optimality.

Historical Note: From Asymptotic to Exact Optimality

1971-1986

Lorden (1971) introduced the worst-case detection delay $\sup_\nu \operatorname{ess\,sup} \mathbb{E}_\nu[(\tau - \nu)^+ | Y_1, \ldots, Y_{\nu-1}]$ and proved that CUSUM is asymptotically minimax — it attains the best possible delay up to first order as $\mathrm{ARL}_0 \to \infty$ . Moustakides (1986) closed the gap by showing CUSUM is exactly minimax for every finite $\mathrm{ARL}_0$ , using a dynamic programming argument.

This result placed CUSUM on the same footing as the SPRT: an exactly optimal sequential procedure, not merely an asymptotic one. It remains one of the cleanest optimality statements in sequential statistics.

Quick Check

What is the role of the reflection at zero in the CUSUM statistic?

To prevent the statistic from overflowing

To restart the evidence count after periods consistent with $\mathcal{H}_0$

To guarantee the CUSUM is unbiased

To make the statistic Gaussian

Answer: To restart the evidence count after periods consistent with $\mathcal{H}_0$

Reflection ensures past $f_0$-consistent evidence does not inflate the threshold-hitting time after a real change.

Quick Check

Doubling the CUSUM threshold $h$ approximately multiplies $\mathrm{ARL}_0$ by:

2


$e^{h}$ (quadratic in the original ARL)

$2h$

Answer: $e^{h}$ (quadratic in the original ARL)

Since $\mathrm{ARL}_0 \approx e^h$ up to a constant factor, doubling $h$ turns $e^h$ into $e^{2h} = (e^h)^2$: the exponential factor is squared, i.e., $\mathrm{ARL}_0$ is multiplied by roughly $e^h$.
