Chapter Summary
Key Points
1. Data shuffling in distributed ML carries a large communication cost: each worker receives a new random subset of the dataset every epoch, so large data volumes are sent and received across the cluster.
2. Wan-Tuninetti-Caire (2020) CommIT result: coded shuffling reduces per-epoch communication relative to uncoded shuffling by a factor on the order of the multicast gain $t + 1$ defined in point 3.
3. Coded-caching analogy: worker memory = cache, new assignment = demand, shuffling = delivery. MAN-style XOR messages shuffle data for $t + 1$ workers simultaneously, where $t = KM/N$ is the number of times the dataset fits in the cluster's aggregate memory ($K$ workers, each storing $M$ out of $N$ data units); see the coded-delivery sketch after this list.
4. Practical impact at scale: at representative values of $K$ and $M$, this yields a 10-fold reduction in shuffling bandwidth. For hyperscale deployments (1000 GPUs, PB datasets), the saved inter-DC bandwidth is worth billions of dollars per year.
5. Gradient coding: storing data redundantly across workers (a replication factor of $s + 1$) tolerates any $s$ stragglers. It is a different coded-computing primitive, trading storage for reliability; see the gradient-coding sketch after this list.
6. Coded computing umbrella: coded shuffling, gradient coding, coded matrix multiplication, and coded MapReduce all share the theme that memory and storage can replace communication or recomputation. The CommIT framework unifies them.
7. Deployment reality: production ML lags theory; gradient coding and coded shuffling are research-stage. Practical adoption awaits cluster-scale integration of coded communication primitives.
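A minimal sketch of the delivery step behind points 2-4, assuming the standard MAN placement on a toy cluster (the values of $K$, $t$, the subfile size, and the uncoded baseline count are illustrative, not the chapter's exact scheme). Each XOR message is simultaneously useful to $t + 1$ workers, which is the source of the bandwidth reduction.

```python
from itertools import combinations
import numpy as np

K = 4              # number of workers (toy value)
t = 2              # t = K*M/N: normalized aggregate memory (M/N = 1/2 here)
SUBFILE_BYTES = 8  # toy subfile size

def man_shuffle_demo(seed=0):
    rng = np.random.default_rng(seed)

    # MAN placement: each of K data blocks is split into C(K, t) subfiles,
    # one per t-subset of workers; worker k caches every subfile whose
    # label contains k.
    tsubsets = list(combinations(range(K), t))
    blocks = {b: {S: rng.integers(0, 256, SUBFILE_BYTES, dtype=np.uint8)
                  for S in tsubsets}
              for b in range(K)}
    cache = {k: {(b, S): blocks[b][S]
                 for b in range(K) for S in tsubsets if k in S}
             for k in range(K)}

    # Toy shuffle: worker k is newly assigned block d[k].
    d = [int(x) for x in rng.permutation(K)]

    # Coded delivery: one XOR per (t+1)-subset T of workers. Inside T,
    # worker k is missing exactly the subfile labeled T \ {k} and caches
    # every other term, so a single message serves t + 1 workers at once.
    messages = []
    for T in combinations(range(K), t + 1):
        xor = np.zeros(SUBFILE_BYTES, dtype=np.uint8)
        for k in T:
            S = tuple(j for j in T if j != k)
            xor = xor ^ blocks[d[k]][S]
        messages.append((T, xor))

    # Decoding check for worker 0: peel off cached terms from each XOR.
    k = 0
    for T, xor in messages:
        if k not in T:
            continue
        rec = xor.copy()
        for j in T:
            if j != k:
                rec ^= cache[k][(d[j], tuple(i for i in T if i != j))]
        assert np.array_equal(rec, blocks[d[k]][tuple(j for j in T if j != k)])

    uncoded = K * len(tsubsets) * (1 - t / K)  # missing subfiles sent one by one
    coded = len(messages)                      # one subfile-sized XOR each
    print(f"uncoded load ~ {uncoded:.0f} subfiles, coded load = {coded}, "
          f"gain ~ {uncoded / coded:.1f} (theory: t + 1 = {t + 1})")

man_shuffle_demo()
```

For this toy configuration the printed gain matches $t + 1 = 3$; in a regime where $t + 1 = 10$, the same delivery step gives the 10-fold reduction cited in point 4.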
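For point 5, a minimal sketch of gradient coding in the spirit of Tandon et al. (2017): three workers, replication factor $s + 1 = 2$, and a toy least-squares gradient (the model, data, and coefficients are illustrative assumptions). The full gradient is recoverable from any two of the three coded messages, so one straggler can be ignored.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy least-squares problem: gradient of 0.5 * ||X w - y||^2 is X^T (X w - y).
X = rng.normal(size=(9, 4))
y = rng.normal(size=9)
w = rng.normal(size=4)

# Split the data into 3 partitions; g[i] is the partial gradient on partition i.
parts = np.array_split(np.arange(9), 3)
g = [X[p].T @ (X[p] @ w - y[p]) for p in parts]
full_gradient = sum(g)

# Encoding: worker i stores partitions {i, i+1 mod 3} (replication factor 2)
# and sends one coded partial gradient (classic 3-worker construction).
coded = {
    0: 0.5 * g[0] + g[1],   # worker 0 holds partitions 0, 1
    1: g[1] - g[2],         # worker 1 holds partitions 1, 2
    2: 0.5 * g[0] + g[2],   # worker 2 holds partitions 2, 0
}

# Decoding: for every pair of surviving workers there is a linear combination
# of their messages that equals the full gradient.
decode = {
    (0, 1): {0: 2.0, 1: -1.0},
    (0, 2): {0: 1.0, 2: 1.0},
    (1, 2): {1: 1.0, 2: 2.0},
}

for survivors, coeffs in decode.items():
    straggler = ({0, 1, 2} - set(survivors)).pop()
    recovered = sum(coeffs[i] * coded[i] for i in survivors)
    assert np.allclose(recovered, full_gradient)
    print(f"worker {straggler} straggling: full gradient recovered from {survivors}")
```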
Looking Ahead
Chapters 16-18 cover additional extensions — coded computing in detail, secure delivery, multi-access networks. Chapters 19-22 move to research frontiers: ISAC, online coded caching, video streaming, and open problems. This completes the tour of coded caching from foundational MAN theory to practical ML-cluster deployments — a journey through 10 years of CommIT research and its impact on the broader information-theory + computer-science communities.