Non-Stationary Demand and Concept Drift
Real Demand Is Non-Stationary
The i.i.d. assumption underlying log-regret bounds (§20.2) is violated by real content demand:
- Daily / weekly periodicity. Work-hour news vs evening entertainment; weekday vs weekend.
- Viral spikes. A trending video's popularity rises and falls on 1-week timescales.
- Long-tail drift. Catalog popularity distribution reshapes with new content uploads.
- Seasonal shifts. Summer sports vs winter sports; movie release calendar.
Under non-stationary demand, standard log-regret bounds fail. Regret against a single static optimum becomes unbounded. The correct benchmark is dynamic regret: compare against a piecewise-constant or slowly-changing reference.
Definition: Dynamic Regret
For a non-stationary demand process with per-round delivery costs $f_t$, dynamic regret is $\mathcal{R}_T^{\text{dyn}} = \sum_{t=1}^{T} \big( f_t(x_t) - f_t(x_t^\star) \big)$, where $x_t^\star = \arg\min_x f_t(x)$. The reference is the best per-round placement (it knows the round's demand before committing). This is a much stronger benchmark than static regret.
Under non-stationarity measured by total variation $V_T = \sum_{t=2}^{T} \lVert \theta_t - \theta_{t-1} \rVert$ (how much the optimal placement drifts over rounds), dynamic regret is bounded by $\mathcal{R}_T^{\text{dyn}} = O(T^{2/3} V_T^{1/3})$ in general.
Dynamic regret can exceed static regret because the reference adapts. The bound reveals the tradeoff: slow drift (small $V_T$, i.e. $V_T = o(T)$) allows tight regret; fast drift ($V_T = \Theta(T)$) makes learning impossible.
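As a concrete illustration of the two benchmarks, the sketch below simulates a slowly drifting loss landscape and compares static and dynamic regret for a naive follow-the-leader learner. The drift model, constants, and variable names are illustrative assumptions, not from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
T, K = 3000, 5                      # rounds, candidate placements (illustrative)

# Hypothetical slowly drifting loss landscape: the expected delivery cost of
# each placement performs a small random walk over rounds.
means = rng.random((1, K)) + np.cumsum(0.01 * rng.standard_normal((T, K)), axis=0)
means = np.clip(means, 0.0, None)

# Naive follow-the-leader: play the placement with the lowest average loss so far.
played = np.zeros(T, dtype=int)
totals, counts = np.zeros(K), np.ones(K)
for t in range(T):
    played[t] = int(np.argmin(totals / counts))
    totals[played[t]] += means[t, played[t]]
    counts[played[t]] += 1

incurred = means[np.arange(T), played].sum()
static_ref = means.sum(axis=0).min()    # best single fixed placement in hindsight
dynamic_ref = means.min(axis=1).sum()   # best per-round placement (dynamic benchmark)

static_regret = incurred - static_ref
dynamic_regret = incurred - dynamic_ref
```

Since the dynamic reference picks the per-round minimum, its total is never larger than any fixed placement's, so dynamic regret is always at least the static regret.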
Theorem: Dynamic Regret for Coded Online Caching
For coded online caching under non-stationary demand with path length $V_T$, there exists an algorithm (with a restart / tracking mechanism) achieving $\mathcal{R}_T^{\text{dyn}} = O(T^{2/3} V_T^{1/3})$. For slow drift ($V_T = O(T^{\alpha})$ with $\alpha < 1$), regret is sublinear.
The algorithm uses a sliding-window estimate of recent demands, or a restart mechanism triggered when drift is detected; this is standard machinery from online learning in drifting environments. The $T^{2/3}$ scaling is inherent: tracking a changing target is harder than tracking a fixed one.
Window size selection
Use windows of $W = \Theta\big((T/V_T)^{2/3}\big)$ rounds. Within a window, drift is bounded; the average divergence (e.g. KL) from the current demand distribution is $O(W V_T / T)$.
Per-window regret
Within-window regret: $O(\sqrt{W})$ for estimation plus $O(W \, V_W)$ for drift, where $V_W$ is the variation inside the window. Number of windows: $T/W$.
Total regret
Sum over windows: $\mathcal{R}_T^{\text{dyn}} = O\big(\tfrac{T}{\sqrt{W}} + W V_T\big) = O(T^{2/3} V_T^{1/3})$ at $W = \Theta\big((T/V_T)^{2/3}\big)$ (roughly).
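The window-size tradeoff above can be checked numerically. The sketch below (the horizon, path length, and loss constants are illustrative assumptions) minimizes the two-term bound $T/\sqrt{W} + W V_T$ over integer $W$ and compares the minimizer with $\Theta((T/V_T)^{2/3})$; the integer minimizer lands at $(T/2V_T)^{2/3}$, the same order as the theory value.

```python
import numpy as np

T, V_T = 10_000, 10.0            # horizon and path length (illustrative values)

def regret_bound(W):
    # T/W windows contributing ~sqrt(W) each, plus the drift term W * V_T.
    return T / np.sqrt(W) + W * V_T

Ws = np.arange(1, T + 1)
best_W = int(Ws[np.argmin(regret_bound(Ws))])
theory_W = (T / V_T) ** (2 / 3)  # Theta((T / V_T)^{2/3})
```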
Adaptive vs Static Placement Over Time
Delivery-rate convergence of adaptive online caching vs. static MAN placement: the adaptive scheme approaches the oracle optimum as it learns, while static placement stays at the MAN baseline forever.
Drift Detection Techniques
Detecting non-stationarity in real-time enables adaptive schemes to restart or re-weight:
- Sliding window. Use the last $W$ observations; drop older ones. Simple; requires tuning $W$.
- Exponential weighting. Weight the observation from $s$ rounds ago by $\gamma^{s}$ for a decay factor $\gamma \in (0,1)$. Continuous; no window boundary.
- Change-point detection. Statistical tests (CUSUM, Shiryaev-Roberts) to identify drift events. Restart the estimator at detected change points.
- ML-based predictors. Neural networks predict future popularity from recent request history. Powerful but harder to analyze theoretically.
Practical systems use combinations: windowed estimation plus ML prediction for popularity-shift forecasting.
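Two of the detectors above can be sketched in a few lines. The synthetic stream, the decay factor $\gamma$, and the CUSUM reference level, slack, and threshold below are all illustrative assumptions; the shift is placed at window 200.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic request-rate stream: mean 1.0 for 200 windows, then 3.0 (drift event).
x = np.concatenate([rng.normal(1.0, 0.2, 200), rng.normal(3.0, 0.2, 200)])

# Exponential weighting: discount old observations by gamma per round.
gamma, ew = 0.95, float(x[0])
ew_trace = []
for v in x:
    ew = gamma * ew + (1 - gamma) * v
    ew_trace.append(ew)

# One-sided CUSUM: accumulate evidence of an upward shift above the reference
# mean mu0 plus a slack term; alarm when the statistic crosses a threshold.
mu0, slack, threshold = 1.0, 0.3, 5.0
s, change_point = 0.0, None
for t, v in enumerate(x):
    s = max(0.0, s + (v - mu0 - slack))
    if change_point is None and s > threshold:
        change_point = t
```

The exponential estimate converges smoothly to the new mean, while CUSUM raises a discrete alarm a few windows after the true change, which is what a restart mechanism would key on.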
Example: Trending Video: Concept Drift in Action
A trending video's popularity rises 100× over 1 hour (spike), then decays exponentially over 3 days. Design an online caching response.
Detection
Request count exceeds 10× the recent average for 5 consecutive windows → trigger rapid cache refresh.
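The trigger rule above (10× the recent average, sustained for 5 windows) might be sketched as follows. `make_spike_detector` and its defaults are hypothetical names, and freezing the baseline during a suspected spike is an added design choice so the surge does not inflate its own reference level.

```python
from collections import deque

def make_spike_detector(baseline_windows=24, ratio=10.0, persist=5):
    """Fire once request counts exceed `ratio` x the recent average for
    `persist` consecutive windows (thresholds from the example)."""
    history = deque(maxlen=baseline_windows)
    streak = 0
    def observe(count):
        nonlocal streak
        baseline = sum(history) / len(history) if history else count
        streak = streak + 1 if count > ratio * baseline else 0
        if streak == 0:
            # Freeze the baseline during a suspected spike so the surge
            # does not raise its own reference level.
            history.append(count)
        return streak >= persist        # True => trigger rapid cache refresh
    return observe

detect = make_spike_detector()
stream = [100] * 24 + [1500] * 6        # steady traffic, then a 15x surge
fired = [detect(c) for c in stream]
```

With this stream the detector fires on the 5th surge window and stays triggered, while normal fluctuations never start a streak.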
Re-placement
Admit trending video into caches of all edge nodes serving affected population. Evict least-popular current items.
Delivery
During the 1-hour surge, the trending video's broadcast is coded over all requesting users → massive multicast gain; delivery requires near-zero additional bandwidth.
Decay
After 3 days, popularity returns to baseline. Evict from edge; re-admit if another surge occurs.
Dynamic regret analysis
Over a 4-day horizon with one spike, the path length $V_T$ is bounded, so sublinear dynamic regret $O(T^{2/3} V_T^{1/3}) = O(T^{2/3})$ is achievable.
How Production CDNs Handle Drift
Real CDNs' approaches to non-stationary demand:
- Fast admission, slow eviction. Admit new items quickly; evict only when cache full. LRU does this implicitly.
- Popularity prediction. ML models (using publisher metadata: upload time, social signals) predict popularity, making caching forward-looking.
- Geographic specialization. Edge caches specialize per region; trending local content cached locally.
- Pre-positioning. For scheduled events (live sports, movie releases), preload caches.
Coded online caching could augment each of these with a multicast gain, but integration requires coded-delivery-aware admission logic, which has not yet been commercialized.
- Akamai: regional pre-positioning for scheduled events
- Netflix: popularity-based prediction + edge caching
- Meta/TikTok: heavy ML-based popularity prediction
- Coded augmentation: research frontier
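The fast-admission / slow-eviction pattern that LRU implements implicitly can be sketched as a minimal cache. This is a generic illustration, not any CDN's actual admission logic.

```python
from collections import OrderedDict

class LRUCache:
    """Fast admission, slow eviction: every requested item is admitted
    immediately; eviction happens only when the cache is full, removing
    the least recently used item (implicit drift tracking)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()      # keys ordered oldest -> most recent

    def request(self, key):
        hit = key in self.items
        if hit:
            self.items.move_to_end(key)         # refresh recency on a hit
        else:
            if len(self.items) >= self.capacity:
                self.items.popitem(last=False)  # evict least recently used
            self.items[key] = True              # admit the new item at once
        return hit

cache = LRUCache(capacity=2)
hits = [cache.request(k) for k in ["a", "b", "a", "c", "b"]]
# "a" miss, "b" miss, "a" hit, "c" miss (evicts "b"), "b" miss (evicts "a")
```

Because every miss admits immediately, a newly trending item enters the cache on its first request; drifted-out items linger until capacity pressure evicts them.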
Common Mistake: Don't Assume Stationarity Blindly
Mistake:
Using i.i.d. log-regret bounds to claim performance in non-stationary regimes.
Correction:
Log-regret is proven under i.i.d. demand. Under concept drift, regret against the static optimum can grow linearly in $T$. Always verify that the assumption matches the workload.
If drift is suspected: (i) measure drift rate from history; (ii) use dynamic-regret bounds; (iii) allocate algorithmic budget for tracking.
Key Takeaway
Real demand drifts; $\mathcal{R}_T^{\text{dyn}} = O(T^{2/3} V_T^{1/3})$ is the right bound. Algorithmic responses include sliding windows, exponential weighting, and change-point detection. Production CDNs handle drift with LRU + ML prediction; coded augmentation is an open research opportunity.