Regret Bounds and Online Algorithms
Theorem: Sublinear Regret for Online Coded Caching
For the online coded caching problem with users, files, cache size , and rounds, there exists an online algorithm with regret against the best static MAN placement, in the adversarial demand setting.
The algorithm maintains a distribution over placements and updates it based on observed regret (Follow-the-Perturbed-Leader style). Standard online convex optimization machinery yields the bound. Coded caching's piecewise-linear cost structure is amenable to this machinery.
Cost-function analysis
Delivery cost is convex in (cache contents) for fixed demand β a max of linear functions.
Online convex optimization
Apply FTPL (or FTRL) with step : standard regret where diameter of cache-space scales as for discretized representations.
Cache-space structure
Cache is a discrete subset of , but relaxation to probability simplex allows continuous optimization. Apply standard bounds.
User dependency
Aggregating across users adds factor. Combined: .
Theorem: Logarithmic Regret under I.I.D. Demand
If demands are i.i.d.\ from a fixed distribution , an online algorithm based on empirical-frequency estimation achieves in expectation.
Under i.i.d., empirical demand frequencies concentrate around true at rate . The algorithm converges to the MAN-optimal placement for and incurs only regret per round once converged. Total: from the initial learning phase.
Concentration
Empirical distribution is within of w.p. (Hoeffding).
Plug-in MAN
Place cache per empirical-optimal MAN: computed from .
Regret per round
If is close to , the placement is close to optimal β regret per round decays as . Sum: .
Bias and variance
Bias from estimation error adds per round; variance adds . Total: .
Regret of Online Algorithms
Regret curves (log-log) for three regimes: adversarial (), stochastic (), LRU (). LRU has linear regret β structurally incompatible with coded delivery.
Parameters
FTPL for Online Coded Caching
Complexity: per round for candidate placements. Approximate with sampled / parameterized placement space for scalability.FTPL (Follow-the-Perturbed-Leader) is a standard online algorithm with simple implementation. The perturbation provides exploration; the minimum cost leads to exploitation. For practical : the placement space is too large β use parameterized placement (e.g., per-file weights) with continuous optimization.
Example: The Warmup Penalty in Online Caching
An online coded cache scheme is deployed on users with cache. Estimate the "warmup" penalty β how many rounds before the cache matches static MAN performance?
Target rate
MAN rate files/use.
Early-round penalty
Under i.i.d. demand: regret . At rounds: . Extra cost per round: files β 0.2% over MAN.
Adversarial regime
: at , extra cost per round . About 1.7% over MAN.
Take-away
Warmup is short: within 1000 rounds (a few hours of operation), online coded caching closely matches static MAN. Over long horizons (days/weeks), convergence is tight.
Common Mistake: Regret Per Round Can Be Misleading
Mistake:
Concluding online is "always worse" than offline from per-round regret.
Correction:
- Total regret grows sublinearly ( or ).
- Average per-round regret as .
The online algorithm is asymptotically as good as the best static offline. Finite-horizon performance is slightly worse due to learning; asymptotically it matches.
A bigger concern: the offline baseline is itself restricted (best static placement). Online with dynamic updates could beat offline static β a regret-free guarantee against static adversary.
Key Takeaway
Online coded caching achieves sublinear regret. Under adversarial demand: . Under stochastic: . Both decay fast enough that online performance closely matches offline static MAN within a few hundred rounds β while supporting dynamic cache updates that static schemes cannot.