Regret Bounds and Online Algorithms

Theorem: Sublinear Regret for Online Coded Caching

For the online coded caching problem with $K$ users, $N$ files, cache size $M$, and $T$ rounds, there exists an online algorithm $\mathcal{A}$ with regret $\mathcal{R}_T = O\!\left(\sqrt{T K \log N}\right)$ against the best static MAN placement, in the adversarial demand setting.

The algorithm maintains a distribution over placements and updates it from observed costs (Follow-the-Perturbed-Leader style). Standard online convex optimization machinery yields the $\sqrt{T}$ bound, and coded caching's piecewise-linear cost structure is amenable to this machinery.
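As a concrete instance of this machinery, here is a minimal exponential-weights (Hedge) sketch over a toy set of candidate placements. The cost vectors, horizon, and step size are illustrative assumptions, not the coded-caching setting itself; the point is only that regret against the best static choice stays $O(\sqrt{T \log n})$.

```python
import math
import random

# Minimal Hedge sketch (toy data): maintain a distribution over n
# candidate placements and update it multiplicatively from observed
# per-round costs, which are assumed to lie in [0, 1].

def hedge(costs_per_round, eta):
    """costs_per_round: list of per-round cost vectors (one entry per
    candidate).  Returns the learner's total expected cost."""
    n = len(costs_per_round[0])
    weights = [1.0] * n
    total = 0.0
    for costs in costs_per_round:
        z = sum(weights)
        probs = [w / z for w in weights]          # current distribution
        total += sum(p * c for p, c in zip(probs, costs))
        # Multiplicative update: down-weight costly placements.
        weights = [w * math.exp(-eta * c) for w, c in zip(weights, costs)]
    return total

random.seed(0)
T, n = 2000, 4
rounds = [[random.random() for _ in range(n)] for _ in range(T)]
eta = math.sqrt(math.log(n) / T)                  # standard tuning
best_static = min(sum(r[i] for r in rounds) for i in range(n))
regret = hedge(rounds, eta) - best_static
print(f"regret after T={T} rounds: {regret:.1f} "
      f"(sqrt(T log n) ~ {math.sqrt(T * math.log(n)):.1f})")
```

The standard Hedge analysis guarantees regret at most $\ln n/\eta + \eta T/8$, which with this $\eta$ is about $1.13\sqrt{T \ln n}$, i.e. sublinear in $T$.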

Theorem: Logarithmic Regret under I.I.D. Demand

If demands are i.i.d.\ from a fixed distribution $\pi$, an online algorithm based on empirical-frequency estimation achieves $\mathcal{R}_T = O(\log T)$ in expectation.

Under i.i.d.\ demands, empirical frequencies concentrate around the true $\pi$ at rate $1/\sqrt{t}$, so the algorithm converges to the MAN-optimal placement for $\pi$. Because the excess cost is locally quadratic in the estimation error, the expected per-round regret at round $t$ decays like $O(1/t)$, and summing $\sum_{t=1}^{T} 1/t$ gives the total $O(\log T)$, dominated by the initial learning phase.
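The certainty-equivalent strategy in this proof sketch can be illustrated numerically. The sketch below uses a toy, uncoded proxy for "MAN-optimal placement for $\pi$" (cache the $M$ most frequent of $N$ files); the distribution, $N$, $M$, and $T$ are all illustrative assumptions.

```python
import random
from collections import Counter

# Toy sketch of empirical-frequency estimation: estimate the demand
# distribution pi from observations and cache the M most frequently
# requested of N files (an uncoded stand-in for the MAN placement).
# Count how many rounds the estimated placement differs from the true
# optimum -- only those rounds contribute regret.

random.seed(1)
N, M, T = 10, 3, 5000
pi = [2.0 ** -(i + 1) for i in range(N)]      # skewed true distribution
norm = sum(pi)
pi = [p / norm for p in pi]

counts = Counter()
mismatches = 0
true_top = set(range(M))                      # pi is sorted descending
for t in range(T):
    demand = random.choices(range(N), weights=pi)[0]
    counts[demand] += 1
    est_top = {f for f, _ in counts.most_common(M)}
    if est_top != true_top:
        mismatches += 1

print(f"rounds with a suboptimal placement: {mismatches} of {T}")
```

The empirical top-$M$ set locks onto the true one after a short learning phase, so the fraction of suboptimal rounds vanishes as $T$ grows, consistent with the $O(\log T)$ total.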

Figure: Regret of online algorithms. Log-log regret curves for three regimes: adversarial ($O(\sqrt{T K \log N})$), stochastic ($O(\log T)$), and LRU ($O(T)$). LRU has linear regret because it is structurally incompatible with coded delivery.

FTPL for Online Coded Caching

Complexity: $O(|\mathcal{X}|)$ per round for $|\mathcal{X}|$ candidate placements; approximate with a sampled or parameterized placement space for scalability.
Input: $K$, $N$, $M$, horizon $T$.
Output: Cache placements $\{\mathcal{Z}_k^{(t)}\}$ for each round $t$.
1. Initialize cumulative cost estimates $\hat{C}_{\mathbf{Z}} = 0$ for all candidate placements $\mathbf{Z}$.
2. Set step size $\eta = \sqrt{\log N / T}$.
3. for $t = 1$ to $T$ do
4. \quad Sample a perturbation $\xi_{\mathbf{Z}}^{(t)} \sim \mathrm{Exp}(\eta)$ for each candidate placement $\mathbf{Z}$.
5. \quad Select placement $\mathbf{Z}^{(t)} = \arg\min_{\mathbf{Z}} \bigl(\hat{C}_{\mathbf{Z}} + \xi_{\mathbf{Z}}^{(t)}\bigr)$.
6. \quad Observe demand $\mathbf{d}^{(t)}$ and compute cost $C^{(t)} = C(\mathbf{Z}^{(t)}, \mathbf{d}^{(t)})$.
7. \quad Update estimates: $\hat{C}_{\mathbf{Z}} \leftarrow \hat{C}_{\mathbf{Z}} + C(\mathbf{Z}, \mathbf{d}^{(t)})$ for all $\mathbf{Z}$.
8. end for

FTPL (Follow-the-Perturbed-Leader) is a standard online algorithm with a simple implementation: the perturbation $\boldsymbol{\xi}$ provides exploration, while following the minimum cumulative cost provides exploitation. For practical $K, N$ the full placement space is too large to enumerate, so use a parameterized placement (e.g., per-file caching weights) with continuous optimization.
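The pseudocode above can be made concrete. This is a minimal runnable sketch over an explicitly enumerated candidate set; the placements and the cost function are toy stand-ins (hypothetical), since real coded-caching costs come from the delivery scheme.

```python
import math
import random

# FTPL over an explicit candidate list, following the pseudocode:
# keep cumulative cost estimates, add a fresh Exp(eta) perturbation
# each round, and play the perturbed leader (full-information update).

def ftpl(candidates, cost_fn, demands, eta, rng):
    """cost_fn(placement, demand) -> cost in [0, 1].
    Returns the chosen-placement sequence and the learner's total cost."""
    cum = {i: 0.0 for i in range(len(candidates))}    # \hat{C}_Z
    total, chosen = 0.0, []
    for d in demands:
        xi = {i: rng.expovariate(eta) for i in cum}   # Exp(eta) noise
        i_star = min(cum, key=lambda i: cum[i] + xi[i])
        chosen.append(i_star)
        total += cost_fn(candidates[i_star], d)
        for i in cum:                                  # observe all costs
            cum[i] += cost_fn(candidates[i], d)
    return chosen, total

rng = random.Random(0)
candidates = list(range(5))                # 5 toy "placements"
def cost_fn(z, d):                         # toy piecewise-linear cost
    return abs(z - d) / 4.0
demands = [rng.choice([1, 1, 1, 2]) for _ in range(500)]
# Step size as in the pseudocode, with log N replaced by the log of
# the candidate count for this toy setting.
eta = math.sqrt(math.log(len(candidates)) / len(demands))
chosen, total = ftpl(candidates, cost_fn, demands, eta, rng)
best_static = min(sum(cost_fn(z, d) for d in demands) for z in candidates)
print(f"learner cost {total:.1f} vs best static {best_static:.1f}")
```

After the perturbations wash out, the leader concentrates on the placement that is best for the empirical demand mix, so late-round choices track the static optimum.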

Example: The Warmup Penalty in Online Caching

An online coded caching scheme is deployed for $K = 20$ users with normalized cache size $M/N = 0.2$. Estimate the "warmup" penalty: how many rounds elapse before the online cache matches static MAN performance?
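A back-of-the-envelope sketch, using the adversarial bound $\mathcal{R}_T = O(\sqrt{T K \log N})$ with constant taken as 1. Note that $N$ and the tolerance $\varepsilon$ are not given in the problem statement, so the values below are illustrative assumptions: the average per-round regret is roughly $\sqrt{K \log N / T}$, which drops below $\varepsilon$ once $T \gtrsim K \log N / \varepsilon^2$.

```python
import math

# Warmup estimate under assumed parameters (N and eps are NOT given in
# the problem; these values are illustrative).
K = 20
N = 100        # assumed library size
eps = 0.5      # assumed per-round rate tolerance (file transmissions)

# Per-round regret ~ sqrt(K log N / T) <= eps  =>  T >= K log N / eps^2
T_warmup = K * math.log(N) / eps ** 2
print(f"rounds until per-round regret < {eps}: ~{T_warmup:.0f}")
```

Under these assumptions the warmup is a few hundred rounds, consistent with the takeaway at the end of this section; tightening $\varepsilon$ by a factor of 2 quadruples the warmup.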

Common Mistake: Regret Per Round Can Be Misleading

Mistake:

Concluding from nonzero per-round regret that online caching is "always worse" than offline.

Correction:

  • Total regret $\mathcal{R}_T$ grows sublinearly ($\sqrt{T}$ or $\log T$).
  • Average per-round regret $\mathcal{R}_T / T \to 0$ as $T \to \infty$.

The online algorithm is asymptotically as good as the best static offline. Finite-horizon performance is slightly worse due to learning; asymptotically it matches.
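A quick numeric check of the correction, with unit constants chosen purely for illustration: total regret grows, yet the per-round average vanishes in both regimes.

```python
import math

# Illustrative totals R_T = sqrt(T) (adversarial-style) and
# R_T = log(T) (stochastic-style); per-round average R_T / T -> 0.
for T in [10, 100, 1000, 10000, 100000]:
    sqrt_total = math.sqrt(T)
    log_total = math.log(T)
    print(f"T={T:>6}  sqrt: total={sqrt_total:8.1f} "
          f"per-round={sqrt_total / T:.4f}   "
          f"log: total={log_total:5.1f} per-round={log_total / T:.6f}")
```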

A bigger concern cuts the other way: the offline baseline is itself restricted to the best static placement. An online algorithm with dynamic updates can even outperform the static optimum, since the regret guarantee is only relative to a static comparator.

Key Takeaway

Online coded caching achieves sublinear regret: $\mathcal{R}_T = O(\sqrt{T K \log N})$ under adversarial demands and $\mathcal{R}_T = O(\log T)$ under stochastic demands. Both decay fast enough that online performance closely matches offline static MAN within a few hundred rounds, while supporting dynamic cache updates that static schemes cannot.