Regret Bounds and Online Algorithms

Theorem: Sublinear Regret for Online Coded Caching

For the online coded caching problem with $K$ users, $N$ files, cache size $M$, and $T$ rounds, there exists an online algorithm $\mathcal{A}$ with regret $\mathcal{R}_T = O\!\left(\sqrt{T K \log N}\right)$ against the best static MAN placement, in the adversarial demand setting.

The algorithm maintains a distribution over placements and updates it from observed costs (Follow-the-Perturbed-Leader style). Standard online convex optimization machinery yields the $\sqrt{T}$ bound, and coded caching's piecewise-linear cost structure is amenable to this machinery.
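As a concrete instance of this machinery, here is a minimal exponential-weights (Hedge) sketch over a toy set of candidate placements. The cost vectors, horizon, and step size are illustrative assumptions, not the coded-caching setting itself; the point is only that regret against the best static choice stays $O(\sqrt{T \log n})$.

```python
import math
import random

# Minimal Hedge sketch (toy data): maintain a distribution over n
# candidate placements and update it multiplicatively from observed
# per-round costs, which are assumed to lie in [0, 1].

def hedge(costs_per_round, eta):
    """costs_per_round: list of per-round cost vectors (one entry per
    candidate).  Returns the learner's total expected cost."""
    n = len(costs_per_round[0])
    weights = [1.0] * n
    total = 0.0
    for costs in costs_per_round:
        z = sum(weights)
        probs = [w / z for w in weights]          # current distribution
        total += sum(p * c for p, c in zip(probs, costs))
        # Multiplicative update: down-weight costly placements.
        weights = [w * math.exp(-eta * c) for w, c in zip(weights, costs)]
    return total

random.seed(0)
T, n = 2000, 4
rounds = [[random.random() for _ in range(n)] for _ in range(T)]
eta = math.sqrt(math.log(n) / T)                  # standard tuning
best_static = min(sum(r[i] for r in rounds) for i in range(n))
regret = hedge(rounds, eta) - best_static
print(f"regret after T={T} rounds: {regret:.1f} "
      f"(sqrt(T log n) ~ {math.sqrt(T * math.log(n)):.1f})")
```

The standard Hedge analysis guarantees regret at most $\ln n/\eta + \eta T/8$, which with this $\eta$ is about $1.13\sqrt{T \ln n}$, i.e. sublinear in $T$.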

Theorem: Logarithmic Regret under I.I.D. Demand

If demands are i.i.d.\ from a fixed distribution $\pi$, an online algorithm based on empirical-frequency estimation achieves $\mathcal{R}_T = O(\log T)$ in expectation.

Under i.i.d.\ demands, empirical frequencies concentrate around the true $\pi$ at rate $1/\sqrt{t}$, so the algorithm converges to the MAN-optimal placement for $\pi$. Because the excess cost is locally quadratic in the estimation error, the expected per-round regret at round $t$ decays like $O(1/t)$, and summing $\sum_{t=1}^{T} 1/t$ gives the total $O(\log T)$, dominated by the initial learning phase.
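The certainty-equivalent strategy in this proof sketch can be illustrated numerically. The sketch below uses a toy, uncoded proxy for "MAN-optimal placement for $\pi$" (cache the $M$ most frequent of $N$ files); the distribution, $N$, $M$, and $T$ are all illustrative assumptions.

```python
import random
from collections import Counter

# Toy sketch of empirical-frequency estimation: estimate the demand
# distribution pi from observations and cache the M most frequently
# requested of N files (an uncoded stand-in for the MAN placement).
# Count how many rounds the estimated placement differs from the true
# optimum -- only those rounds contribute regret.

random.seed(1)
N, M, T = 10, 3, 5000
pi = [2.0 ** -(i + 1) for i in range(N)]      # skewed true distribution
norm = sum(pi)
pi = [p / norm for p in pi]

counts = Counter()
mismatches = 0
true_top = set(range(M))                      # pi is sorted descending
for t in range(T):
    demand = random.choices(range(N), weights=pi)[0]
    counts[demand] += 1
    est_top = {f for f, _ in counts.most_common(M)}
    if est_top != true_top:
        mismatches += 1

print(f"rounds with a suboptimal placement: {mismatches} of {T}")
```

The empirical top-$M$ set locks onto the true one after a short learning phase, so the fraction of suboptimal rounds vanishes as $T$ grows, consistent with the $O(\log T)$ total.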

Figure: Regret of online algorithms. Log-log regret curves for three regimes: adversarial ($O(\sqrt{T K \log N})$), stochastic ($O(\log T)$), and LRU ($O(T)$). LRU has linear regret because it is structurally incompatible with coded delivery.

FTPL for Online Coded Caching

Complexity: $O(|\mathcal{X}|)$ per round for $|\mathcal{X}|$ candidate placements; approximate with a sampled or parameterized placement space for scalability.
Input: $K$, $N$, $M$, horizon $T$.
Output: Cache placements $\{\mathcal{Z}_k^{(t)}\}$ for each round $t$.
1. Initialize cumulative cost estimates $\hat{C}_{\mathbf{Z}} = 0$ for all candidate placements $\mathbf{Z}$.
2. Set step size $\eta = \sqrt{\log N / T}$.
3. for $t = 1$ to $T$ do
4. \quad Sample a perturbation $\xi_{\mathbf{Z}}^{(t)} \sim \mathrm{Exp}(\eta)$ for each candidate placement $\mathbf{Z}$.
5. \quad Select placement $\mathbf{Z}^{(t)} = \arg\min_{\mathbf{Z}} \bigl(\hat{C}_{\mathbf{Z}} + \xi_{\mathbf{Z}}^{(t)}\bigr)$.
6. \quad Observe demand $\mathbf{d}^{(t)}$ and compute cost $C^{(t)} = C(\mathbf{Z}^{(t)}, \mathbf{d}^{(t)})$.
7. \quad Update estimates: $\hat{C}_{\mathbf{Z}} \leftarrow \hat{C}_{\mathbf{Z}} + C(\mathbf{Z}, \mathbf{d}^{(t)})$ for all $\mathbf{Z}$.
8. end for

FTPL (Follow-the-Perturbed-Leader) is a standard online algorithm with a simple implementation: the perturbation $\boldsymbol{\xi}$ provides exploration, while following the minimum cumulative cost provides exploitation. For practical $K, N$ the full placement space is too large to enumerate, so use a parameterized placement (e.g., per-file caching weights) with continuous optimization.
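The pseudocode above can be made concrete. This is a minimal runnable sketch over an explicitly enumerated candidate set; the placements and the cost function are toy stand-ins (hypothetical), since real coded-caching costs come from the delivery scheme.

```python
import math
import random

# FTPL over an explicit candidate list, following the pseudocode:
# keep cumulative cost estimates, add a fresh Exp(eta) perturbation
# each round, and play the perturbed leader (full-information update).

def ftpl(candidates, cost_fn, demands, eta, rng):
    """cost_fn(placement, demand) -> cost in [0, 1].
    Returns the chosen-placement sequence and the learner's total cost."""
    cum = {i: 0.0 for i in range(len(candidates))}    # \hat{C}_Z
    total, chosen = 0.0, []
    for d in demands:
        xi = {i: rng.expovariate(eta) for i in cum}   # Exp(eta) noise
        i_star = min(cum, key=lambda i: cum[i] + xi[i])
        chosen.append(i_star)
        total += cost_fn(candidates[i_star], d)
        for i in cum:                                  # observe all costs
            cum[i] += cost_fn(candidates[i], d)
    return chosen, total

rng = random.Random(0)
candidates = list(range(5))                # 5 toy "placements"
def cost_fn(z, d):                         # toy piecewise-linear cost
    return abs(z - d) / 4.0
demands = [rng.choice([1, 1, 1, 2]) for _ in range(500)]
# Step size as in the pseudocode, with log N replaced by the log of
# the candidate count for this toy setting.
eta = math.sqrt(math.log(len(candidates)) / len(demands))
chosen, total = ftpl(candidates, cost_fn, demands, eta, rng)
best_static = min(sum(cost_fn(z, d) for d in demands) for z in candidates)
print(f"learner cost {total:.1f} vs best static {best_static:.1f}")
```

After the perturbations wash out, the leader concentrates on the placement that is best for the empirical demand mix, so late-round choices track the static optimum.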

Example: The Warmup Penalty in Online Caching

An online coded caching scheme is deployed for $K = 20$ users with normalized cache size $M/N = 0.2$. Estimate the "warmup" penalty: how many rounds elapse before the online cache matches static MAN performance?
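A back-of-the-envelope sketch, using the adversarial bound $\mathcal{R}_T = O(\sqrt{T K \log N})$ with constant taken as 1. Note that $N$ and the tolerance $\varepsilon$ are not given in the problem statement, so the values below are illustrative assumptions: the average per-round regret is roughly $\sqrt{K \log N / T}$, which drops below $\varepsilon$ once $T \gtrsim K \log N / \varepsilon^2$.

```python
import math

# Warmup estimate under assumed parameters (N and eps are NOT given in
# the problem; these values are illustrative).
K = 20
N = 100        # assumed library size
eps = 0.5      # assumed per-round rate tolerance (file transmissions)

# Per-round regret ~ sqrt(K log N / T) <= eps  =>  T >= K log N / eps^2
T_warmup = K * math.log(N) / eps ** 2
print(f"rounds until per-round regret < {eps}: ~{T_warmup:.0f}")
```

Under these assumptions the warmup is a few hundred rounds, consistent with the takeaway at the end of this section; tightening $\varepsilon$ by a factor of 2 quadruples the warmup.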

Common Mistake: Regret Per Round Can Be Misleading

Mistake:

Concluding from nonzero per-round regret that online caching is "always worse" than offline.

Correction:

  • Total regret $\mathcal{R}_T$ grows sublinearly ($\sqrt{T}$ or $\log T$).
  • Average per-round regret $\mathcal{R}_T / T \to 0$ as $T \to \infty$.

The online algorithm is asymptotically as good as the best static offline. Finite-horizon performance is slightly worse due to learning; asymptotically it matches.
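A quick numeric check of the correction, with unit constants chosen purely for illustration: total regret grows, yet the per-round average vanishes in both regimes.

```python
import math

# Illustrative totals R_T = sqrt(T) (adversarial-style) and
# R_T = log(T) (stochastic-style); per-round average R_T / T -> 0.
for T in [10, 100, 1000, 10000, 100000]:
    sqrt_total = math.sqrt(T)
    log_total = math.log(T)
    print(f"T={T:>6}  sqrt: total={sqrt_total:8.1f} "
          f"per-round={sqrt_total / T:.4f}   "
          f"log: total={log_total:5.1f} per-round={log_total / T:.6f}")
```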

A bigger concern cuts the other way: the offline baseline is itself restricted to the best static placement. An online algorithm with dynamic updates can even outperform the static optimum, since the regret guarantee is only relative to a static comparator.

Key Takeaway

Online coded caching achieves sublinear regret: $\mathcal{R}_T = O(\sqrt{T K \log N})$ under adversarial demands and $\mathcal{R}_T = O(\log T)$ under stochastic demands. Both decay fast enough that online performance closely matches offline static MAN within a few hundred rounds, while supporting dynamic cache updates that static schemes cannot.