Exercises
ex-ch20-e01
Easy: Define regret. What does "sublinear" mean for regret?
Regret compares the algorithm's cumulative cost to the best fixed decision in hindsight.
Sublinear: $R(T)$ grows slower than $T$.
Regret
$R(T) = \sum_{t=1}^{T} c_t(a_t) - \min_a \sum_{t=1}^{T} c_t(a)$: the extra cumulative cost versus the best-hindsight placement.
Sublinear
$R(T)/T \to 0$ as $T \to \infty$, i.e., $R(T) = o(T)$. E.g., $O(\sqrt{T})$ (adversarial), $O(\log T)$ (stochastic). The average regret per round vanishes: online matches offline asymptotically.
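The definition above can be checked on a toy instance. A minimal sketch, where the cost sequence and the played actions are made up for illustration:

```python
# Regret R(T) = sum_t c_t(a_t) - min_a sum_t c_t(a)

def regret(costs, chosen):
    """costs: one dict (action -> cost) per round; chosen: action played each round."""
    online = sum(costs[t][chosen[t]] for t in range(len(costs)))
    best_fixed = min(sum(c[a] for c in costs) for a in costs[0])
    return online - best_fixed

costs = [{"a": 1.0, "b": 0.5}, {"a": 0.2, "b": 0.9}, {"a": 1.0, "b": 0.4}]
chosen = ["a", "b", "a"]                 # online play: 2.9; best fixed ("b"): 1.8
print(regret(costs, chosen))
```

Note that the comparator is the best *fixed* action: a clairvoyant per-round comparator would give dynamic regret, a strictly harder benchmark.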
ex-ch20-e02
Easy: For the given $N$, $K$, $T$: compute the regret bound for adversarial online coded caching.
Formula
$R(T) = O(\sqrt{T \log |\mathcal{A}|})$, the standard adversarial rate for an action set $\mathcal{A}$ of placements.
Value
Substitute the given $N$, $K$, $T$ into the bound.
Per-round
Average extra cost per round is $R(T)/T$ files per use. About 3% over the MAN rate in this regime.
ex-ch20-e03
Medium: Show that LRU has $\Omega(T)$ regret against the offline coded MAN baseline.
LRU lacks the coded multicasting gain entirely.
The coded MAN rate is the uncoded rate divided by $1 + KM/N$.
LRU per-round cost
LRU delivers uncoded: $K(1-h)$ per round, where $h$ is the hit rate.
MAN per-round
$R_{\text{MAN}} = \dfrac{K(1 - M/N)}{1 + KM/N}$.
Gap
A constant gap of $K(1-h) - R_{\text{MAN}} = \Theta(1)$ per round $\Rightarrow$ $\Omega(T)$ cumulative regret.
Interpretation
LRU fundamentally cannot achieve the multicasting gain. The gap is structural, not tunable.
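The constant per-round gap can be made concrete numerically. A small sketch with illustrative parameters $N$, $K$, $M$ (not from the exercise):

```python
# Per-round delivery rates: uncoded vs. MAN coded (normalized to file size).

def uncoded_rate(N, K, M):
    # each of K users misses a (1 - M/N) fraction, served by unicast
    return K * (1 - M / N)

def man_rate(N, K, M):
    # coded multicasting reduces the uncoded rate by the gain 1 + K*M/N
    return uncoded_rate(N, K, M) / (1 + K * M / N)

N, K, M = 100, 10, 20
gap = uncoded_rate(N, K, M) - man_rate(N, K, M)
print(uncoded_rate(N, K, M), man_rate(N, K, M), gap)
# a positive constant gap per round -> cumulative regret grows linearly in T
```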
ex-ch20-e04
Medium: Derive the optimal FTPL learning rate for our setting.
Standard FTPL: $\eta^* = \sqrt{\log|\mathcal{A}| / T}$.
Standard FTPL
For a finite action set of size $|\mathcal{A}|$: $\eta^* = \sqrt{\log|\mathcal{A}| / T}$, balancing the perturbation and stability terms and giving regret $O(\sqrt{T \log|\mathcal{A}|})$.
Applied to CC
Action = placement choice; with per-user MAN placement, up to $\binom{N}{M}$ choices per user, so $|\mathcal{A}| \le \binom{N}{M}^K$ and $\log|\mathcal{A}| \le K \log\binom{N}{M}$.
Approximation
$\log\binom{N}{M} \approx N\,H_2(M/N)$, where $H_2$ is the binary entropy. Hence $\eta^* \approx \sqrt{K N H_2(M/N) / T}$.
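A quick numeric sketch of the entropy approximation and the resulting rate; the parameter values are illustrative, not the exercise's:

```python
import math

def binary_entropy(p):
    # H2(p) in bits
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def log_num_placements(N, M, K):
    # log2 |A| <= K * log2 C(N, M) ~= K * N * H2(M/N)
    return K * N * binary_entropy(M / N)

def ftpl_rate(N, M, K, T):
    # eta* = sqrt(log|A| / T)
    return math.sqrt(log_num_placements(N, M, K) / T)

N, M, K, T = 100, 20, 10, 10_000
print(round(log_num_placements(N, M, K), 1), round(ftpl_rate(N, M, K, T), 3))
```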
ex-ch20-e05
Medium: For i.i.d. Zipf-$\alpha$ demand, estimate the convergence time for the empirical popularity estimate to reach within 1% KL divergence.
Zipf concentration helps: the mass sits on a small effective support, lowering the effective dimension.
KL rate: $\mathbb{E}[D_{\mathrm{KL}}(\hat p_t \,\|\, p)] \approx (N_{\text{eff}} - 1)/(2t)$.
Convergence rate
Under Zipf-$\alpha$: the effective support size $N_{\text{eff}} \ll N$ (mass concentrated on the popular files).
For 1% KL
$\mathbb{E}[D_{\mathrm{KL}}] \approx (N_{\text{eff}} - 1)/(2t)$. For KL $= 0.01$: $t \approx 50\,(N_{\text{eff}} - 1)$.
Numerical
With $N_{\text{eff}} \ll N$, $t \approx 50\,(N_{\text{eff}} - 1)$ rounds suffice for 1% KL. Very fast.
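A sketch of these magnitudes, under two stated assumptions: perplexity $e^{H(p)}$ as one common notion of effective support, and the chi-square heuristic $\mathbb{E}[D_{\mathrm{KL}}] \approx (N_{\text{eff}}-1)/(2t)$. The catalog size and exponent are illustrative:

```python
import math

def zipf_probs(N, alpha):
    w = [1 / (i ** alpha) for i in range(1, N + 1)]
    Z = sum(w)
    return [x / Z for x in w]

def effective_support(p):
    # perplexity exp(H(p)): one notion of "effective" support size
    H = -sum(q * math.log(q) for q in p if q > 0)
    return math.exp(H)

def rounds_for_kl(p, eps=0.01):
    # chi-square heuristic: E[KL] ~= (N_eff - 1) / (2 t)  =>  t ~= (N_eff - 1) / (2 eps)
    return (effective_support(p) - 1) / (2 * eps)

p = zipf_probs(1000, 1.2)
print(round(effective_support(p), 1), round(rounds_for_kl(p)))
```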
ex-ch20-e06
Hard: Prove Theorem 20.2.2 (logarithmic regret under i.i.d. demand).
Empirical concentration + a plug-in estimate.
Per-round regret comes from the estimation error.
Setup
At round $t$, the cache is placed based on $\hat p_{t-1}$. Regret per round: the cost of the $\hat p$-optimal placement minus the cost of the $p$-optimal placement.
Taylor expansion
The cost is smooth in $p$; near the optimum the first-order term vanishes, so the per-round regret is $O(\|\hat p_t - p\|^2)$.
Concentration
$\mathbb{E}\|\hat p_t - p\|^2 = O(1/t)$.
Total
$\sum_{t=1}^{T} O(1/t) = O(\log T)$.
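The final summation step is just the harmonic series; a quick numeric check that the $O(1/t)$ per-round penalties sum to $O(\log T)$:

```python
import math

# sum_{t=1}^{T} 1/t = ln T + gamma + o(1), gamma ~ 0.5772 (Euler-Mascheroni)
def harmonic(T):
    return sum(1.0 / t for t in range(1, T + 1))

for T in (10, 1000, 100_000):
    print(T, round(harmonic(T) - math.log(T), 4))   # difference stabilizes near gamma
```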
ex-ch20-e07
Medium: Compare the adversarial and stochastic regret bounds at a common horizon $T$. Which is tighter? When would you expect each to dominate?
Adversarial
$R(T) = O(\sqrt{T})$, up to logarithmic factors in $|\mathcal{A}|$.
Stochastic
$R(T) = O(\log T)$. Vastly tighter.
When stochastic applies
Strictly i.i.d. demand (rare in practice), or piecewise-stationary demand where the within-piece bound applies.
When adversarial applies
Everything else: the worst-case bound always holds, but it is pessimistic for real workloads.
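To see how far apart the two shapes are, a magnitude comparison with all constants suppressed (so only the growth rates are meaningful):

```python
import math

def adversarial_bound(T):
    return math.sqrt(T)       # O(sqrt(T)), constants dropped

def stochastic_bound(T):
    return math.log(T)        # O(log T), constants dropped

for T in (10**3, 10**5, 10**7):
    print(T, round(adversarial_bound(T)), round(stochastic_bound(T)))
```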
ex-ch20-e08
Hard: Design an adaptive algorithm for piecewise-stationary demand with $S$ change points over $T$ rounds. What is the optimal regret?
Piecewise-stationary model
Demand is i.i.d. within each of the $S+1$ pieces. The change points are unknown.
Change-point detection
Restart the estimator when a change is detected. Cost: $O(\log(T/S))$ per piece.
Total regret
$O(S \log(T/S))$ from within-piece learning $+$ $O(S)$ from transitions and detection delay. Total $O(S \log T)$.
Interpretation
For $S = O(1)$: near-stochastic performance. For large $S$: degrades toward the adversarial $O(\sqrt{T})$ rate.
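A sketch of how the $O(S \log(T/S))$ budget behaves as $S$ grows; the constant $c$ and the parameter choices are illustrative:

```python
import math

def piecewise_regret(T, S, c=1.0):
    # c * S * log(T/S): within-piece log regret plus per-change restart cost
    return c * S * math.log(T / S)

T = 10**6
for S in (1, 100, 1000):
    print(S, round(piecewise_regret(T, S)))
```

For $S = 1$ the budget stays near $\log T$; by $S \sim 10^3$ it already exceeds $\sqrt{T} = 10^3$ in this toy scaling.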
ex-ch20-e09
Medium: Why does FTPL work, even though the action space is discrete?
FTPL adds noise to explore.
Sample complexity bounds.
Discrete action space
FTPL selects from a finite set of placements; one perturbed minimization per round is feasible.
Noise for exploration
The exponential perturbation introduces randomness: rarely-chosen placements are still selected occasionally, preventing the algorithm from getting stuck.
Regret analysis
Standard FTPL regret is $O(\sqrt{T \log|\mathcal{A}|})$. It scales only logarithmically with the action space $\Rightarrow$ it works for huge placement spaces.
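The selection rule is a one-line perturbed minimization. A minimal sketch over a hypothetical discrete set of placements; the action names, cumulative costs, and perturbation scale are made up for illustration:

```python
import random

def ftpl_choose(actions, cum_cost, eta, rng):
    # Follow the Perturbed Leader: argmin of (cumulative cost - exponential noise)
    perturbed = {a: cum_cost[a] - rng.expovariate(eta) for a in actions}
    return min(perturbed, key=perturbed.get)

rng = random.Random(0)
actions = ["p1", "p2", "p3"]                      # hypothetical placements
cum = {"p1": 5.0, "p2": 4.0, "p3": 9.0}           # hypothetical cumulative costs
picks = [ftpl_choose(actions, cum, eta=1.0, rng=rng) for _ in range(1000)]
print(picks.count("p1"), picks.count("p2"), picks.count("p3"))
```

The current leader `p2` wins most draws, but the noise occasionally selects the others, which is exactly the exploration the analysis needs.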
ex-ch20-e10
Hard: Design a coded-LRU hybrid: how would you combine LRU's admission policy with MAN-style delivery structure?
Admission via LRU
Standard LRU decides which files each user caches locally.
Delivery via MAN on the cached structure
On a miss: the server coordinates with the other users' caches and sends coded XOR combinations where possible.
Coordination complexity
Requires the server to know each cache's current LRU state. Metadata-heavy; not how real CDNs work. Alternative: decentralized caching + coded delivery.
Research status
Hybrid schemes have been studied in the literature, with variable gains depending on the workload. Active area.
ex-ch20-e11
Medium: The sliding window for non-stationary learning has a trade-off: narrow vs. wide. Derive the optimal window size.
Bias-variance tradeoff: a narrow window gives high variance; a wide one gives high bias.
Tradeoff
Narrow $W$: the empirical estimate has estimation error $O(1/\sqrt{W})$. Wide $W$: drift accumulates, giving bias $O(W\Delta)$, where $\Delta$ is the per-round drift.
Optimal W
Balance the two terms: $1/\sqrt{W} \asymp W\Delta \Rightarrow W^* = \Theta(\Delta^{-2/3})$.
Resulting regret
Per-round error $O(\Delta^{1/3})$, hence total regret $O(T^{2/3} V_T^{1/3})$ with path length $V_T = T\Delta$. Matches the general dynamic-regret bound.
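A numeric check of the balancing step, with all constants suppressed; the drift value is illustrative:

```python
# error(W) ~ 1/sqrt(W) + W * drift, minimized at W* = (2 * drift)**(-2/3)

def window_error(W, drift):
    return W ** -0.5 + W * drift

def optimal_window(drift):
    # set d/dW [W^(-1/2) + W*drift] = 0  =>  W* = (2*drift)**(-2/3)
    return (2 * drift) ** (-2 / 3)

drift = 1e-4
W = optimal_window(drift)
print(round(W, 1), round(window_error(W, drift), 4))
```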
ex-ch20-e12
Medium: Explain how change-point detection (CUSUM) can improve regret.
CUSUM
Cumulative sum of deviations; alarm when the sum exceeds a threshold. Detects abrupt shifts in the demand distribution.
Algorithmic integration
On alarm: restart the empirical estimator. Within a stable phase: the logarithmic regret bound applies.
Regret benefit
Restart cost: constant per change point. With $S$ change points: total regret $O(S \log T)$. Much better than the sliding-window dynamic-regret bound when the changes are abrupt rather than smooth drift.
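A minimal one-sided CUSUM sketch; the stream, reference level, and threshold below are illustrative, not from the text:

```python
def cusum(stream, ref, h):
    """Alarm whenever the running sum of positive deviations from ref exceeds h."""
    s, alarms = 0.0, []
    for t, x in enumerate(stream):
        s = max(0.0, s + (x - ref))   # accumulate positive deviations only
        if s > h:
            alarms.append(t)          # change detected -> restart the estimator
            s = 0.0
    return alarms

stream = [0.1] * 50 + [0.9] * 10      # abrupt shift at t = 50
print(cusum(stream, ref=0.3, h=1.0))  # first alarm shortly after the shift
```

A production detector would also use a slack parameter and a two-sided statistic, but the restart-on-alarm structure is the same.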
ex-ch20-e13
Hard: How does online coded caching interact with demand privacy (Ch. 12)? Can both be achieved simultaneously?
Online + privacy
Online: cache updates per round. Privacy: mask demands. Two orthogonal concerns.
Composition
Apply Wan-Caire demand masking per round. The server sees shuffled demands and updates the cache based on the empirical distribution of the shuffled demands.
Privacy preserved
If the masking is consistent, the server's empirical estimate lives in the shuffled index space, and the cache placement pattern is robust under a uniform permutation.
Regret bound
Unchanged from the non-private online setting up to constants: the same $O(\sqrt{T})$ (adversarial) and $O(\log T)$ (stochastic) rates. Privacy is free in the online setting too.
ex-ch20-e14
Hard: In a real CDN, can we bound the "coded-caching regret" if we use LRU for admission and coded delivery afterwards? Analyze.
Hybrid description
Admission: standard LRU (uncoordinated). Delivery: on a miss, the server picks the best coded combinations over the requesting users.
Best-case delivery
Coded delivery opportunities depend on cache overlap. LRU's per-user, demand-driven structure doesn't guarantee MAN's combinatorial overlap.
Regret
The coded gain is reduced: the effective multicasting gain is $1 + gKM/N$, where $g \in [0,1]$ is the overlap rate (not the full MAN gain $1 + KM/N$). Typically $g \ll 1$.
Recovery
The hybrid is better than uncoded LRU, but worse than coded MAN with properly coordinated placement.
ex-ch20-e15
Hard: State one important open problem in online coded caching.
Option A: Instance-dependent regret
Current bounds are worst-case. Can we tighten the regret based on the observed problem instance (e.g., "easy" traces)?
Option B: Coded + hierarchical
Online coded caching in a hierarchical network (edge + regional caches). Each level has its own online dynamics; a joint algorithm is unknown.
Option C: Fairness
Coded delivery favors users with overlapping demands. Users with rare requests may be systematically under-served. Fair online coded caching is open.