Chapter Summary
Key Points
1. Data shuffling is an information-theoretic coding problem. SGD convergence theory requires random reshuffling of the data across epochs; in distributed training, this induces a per-epoch communication cost that matches or exceeds the gradient-aggregation cost. The problem is formalized in §7.2 with the placement-delivery structure of coded caching.
2. The optimal shuffling rate is achieved by the Wan / Tuninetti / Caire CommIT-group contribution. The construction uses finite-field interference alignment (Chapter 4) in the delivery phase, XOR-combining per-subset demands into single broadcasts (a toy delivery sketch follows this list). Each additional unit of per-worker memory reduces communication by a multiplicative factor.
3. Achievability matches the cut-set converse. The upper bound (construction) and the lower bound (cut-set argument) coincide, closing the rate region of the data-shuffling problem.
4. The framework extends to decentralized placement, demand privacy, and heterogeneous memory. Decentralized placement (random caching without coordination) is asymptotically optimal. Demand privacy adds a ramp-secret-sharing layer at a quantified rate penalty (a small ramp-sharing sketch also follows this list). Heterogeneous memory remains the open direction.
5. Demand privacy is the bridge to PIR. The demand-private shuffling rate quantifies the cost of hiding workers' demands from colluders. This is the algebraic predecessor of the PIR capacity formula that Chapter 13 develops in full.
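The delivery step in point 2 is easiest to see in code. Below is a minimal, illustrative sketch of one placement-delivery round in the Maddah-Ali/Niesen coded-caching style, with plain XOR over random 32-bit subfiles standing in for the finite-field combination; the parameters, variable names, and the simplification to one demanded file per worker are assumptions for illustration, not the chapter's actual scheme.

```python
# Toy placement-delivery round for data shuffling (illustrative, not the
# chapter's construction): XOR broadcasts serve t+1 workers at a time.
from itertools import combinations
import random

K = 4          # number of workers (and, in this toy, number of files)
t = 2          # cache parameter: each worker stores a t/K fraction of every file

# Placement: split every file into one 32-bit subfile per t-subset of workers.
subsets_t = list(combinations(range(K), t))
files = {n: {T: random.getrandbits(32) for T in subsets_t} for n in range(K)}

# Worker k caches subfile (n, T) for every file n and every subset T containing k.
cache = {k: {(n, T): files[n][T] for n in range(K) for T in subsets_t if k in T}
         for k in range(K)}

# New epoch: a random shuffle assigns each worker the file it must hold next.
demand = list(range(K))
random.shuffle(demand)

# Delivery: one XOR broadcast per (t+1)-subset of workers.
broadcasts = {}
for S in combinations(range(K), t + 1):
    coded = 0
    for k in S:
        # The subfile of worker k's demanded file that k is missing within S.
        coded ^= files[demand[k]][tuple(sorted(set(S) - {k}))]
    broadcasts[S] = coded

# Decoding: each worker cancels the cached subfiles from every broadcast it is in.
for k in range(K):
    recovered = {T: files[demand[k]][T] for T in subsets_t if k in T}  # already cached
    for S, coded in broadcasts.items():
        if k not in S:
            continue
        for j in S:
            if j != k:
                coded ^= cache[k][(demand[j], tuple(sorted(set(S) - {j})))]
        recovered[tuple(sorted(set(S) - {k}))] = coded
    # Worker k now holds every subfile of its newly assigned file demand[k].
    assert recovered == files[demand[k]]
```

Each of the C(K, t+1) broadcasts is useful to t+1 workers simultaneously, which is the source of the memory-for-communication gain described in point 2.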
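Point 4's ramp-secret-sharing layer can likewise be sketched in a few lines. The snippet below is a generic (T, L) ramp scheme over a prime field built from Lagrange interpolation; the field size, evaluation points, and function names are illustrative assumptions, and the chapter's demand-private construction embeds this idea into the delivery phase rather than using it stand-alone.

```python
# Generic (T, L) ramp secret sharing over a prime field (illustrative sketch).
# Any T shares reconstruct the L secret symbols; up to T - L colluding shares
# reveal nothing about them.
import random

P = 2**61 - 1  # prime field modulus (assumed; any sufficiently large prime works)

def _lagrange_eval(points, x):
    """Evaluate the unique polynomial through `points` [(x_i, y_i)] at x, mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * ((x - xj) % P) % P
                den = den * ((xi - xj) % P) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total

def ramp_share(secrets, n_shares, T):
    """Split L = len(secrets) field elements into n_shares shares with threshold T."""
    L = len(secrets)
    # Pin a degree-(T-1) polynomial at L "secret" points and T - L random masking points.
    anchor = [(-(j + 1) % P, s) for j, s in enumerate(secrets)]
    anchor += [(-(L + j + 1) % P, random.randrange(P)) for j in range(T - L)]
    # Shares are evaluations of that polynomial at x = 1 .. n_shares.
    return {i: _lagrange_eval(anchor, i) for i in range(1, n_shares + 1)}

def ramp_reconstruct(shares, L):
    """Recover the L secrets from any T shares (dict mapping x -> f(x))."""
    points = list(shares.items())
    return [_lagrange_eval(points, -(j + 1) % P) for j in range(L)]

# Example: hide a worker's 2-symbol demand among 5 shares with threshold T = 4;
# any T - L = 2 colluding shares are statistically independent of the demand.
demand = [7, 42]
shares = ramp_share(demand, n_shares=5, T=4)
subset = dict(list(shares.items())[:4])
assert ramp_reconstruct(subset, L=2) == demand
```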
Looking Ahead
Chapter 8 generalizes coded matrix multiplication (Chapter 5) and coded gradient computation (Chapter 6) to convolutions, matrix-vector products, and general tensor operations via the Lagrange coded computing framework. The data-shuffling story concludes here; Part III (Chapter 9 onward) pivots to privacy-preserving federated learning. The CommIT contributions continue: Chapters 10 and 11 carry two more major CommIT results (the Caire et al. secure-aggregation optimality result and the Jahani-Nezhad / Maddah-Ali / Caire ByzSecAgg scheme).