References & Further Reading
References
- K. Wan, D. Tuninetti, M. Ji, G. Caire, and P. Piantanida, A Framework for Coded Data Shuffling in Distributed ML Training, 2021
The CommIT-group paper establishing the optimal rate-memory tradeoff for coded data shuffling. Headline reference for this chapter. Developed at TU Berlin with collaborators.
- K. Wan, D. Tuninetti, and G. Caire, Fundamental Limits of Caching for Demand Privacy Against Colluding Users, 2021
Extension of coded caching to demand privacy — the predecessor of the PIR framework of Chapter 13. Cited in §7.4.2 of this chapter.
- M. A. Maddah-Ali and U. Niesen, Fundamental Limits of Caching, 2014
The coded-caching precursor to the coded-shuffling framework. Chapter 7's construction mirrors this directly with minor adaptations.
- K. Lee, M. Lam, R. Pedarsani, D. Papailiopoulos, and K. Ramchandran, Speeding Up Distributed Machine Learning Using Codes, 2018
General reference for coded computing for ML straggler mitigation. Useful context for Chapter 7's specialization to the shuffling step.
- S. Li, M. A. Maddah-Ali, Q. Yu, and A. S. Avestimehr, A Fundamental Tradeoff Between Computation and Communication in Distributed Computing, 2018
The MapReduce coded-shuffling result that inspired this chapter. Chapter 7 addresses a different operational problem (per-epoch ML data re-shuffling), but both rely on the same finite-field IA machinery.
- M. Gürbüzbalaban, A. Ozdaglar, and P. Parrilo, Why Random Reshuffling Beats Stochastic Gradient Descent, 2021
The landmark paper proving that random reshuffling achieves $O(1/T)$ convergence vs. the $O(1/\sqrt{T})$ of i.i.d. sampling. This is *why* shuffling matters enough to warrant a dedicated chapter.
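To make the distinction behind this result concrete: random reshuffling visits every training example exactly once per epoch via a fresh permutation, whereas i.i.d. sampling draws with replacement and so repeats some examples while missing others. A minimal sketch (the helper names are illustrative, not from any of the cited papers):

```python
import numpy as np

def reshuffled_epochs(n, num_epochs, rng):
    """Random reshuffling: a fresh permutation each epoch,
    so every example is visited exactly once per epoch."""
    for _ in range(num_epochs):
        yield rng.permutation(n)

def iid_epochs(n, num_epochs, rng):
    """I.i.d. sampling: indices drawn with replacement,
    so within an epoch some examples repeat and others are skipped."""
    for _ in range(num_epochs):
        yield rng.integers(0, n, size=n)

rng = np.random.default_rng(0)
epoch = next(reshuffled_epochs(10, 1, rng))
assert sorted(epoch) == list(range(10))  # each example seen exactly once
```

In the distributed setting of this chapter, it is precisely the per-epoch permutation that forces data movement between workers, which is the communication cost the coded-shuffling schemes reduce.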
- L. Bottou, Large-Scale Machine Learning with Stochastic Gradient Descent, 2010
Foundational SGD reference including random-reshuffling discussion. Useful for the convergence-rate context of §7.1.
- J. Jiang, B. Cui, C. Zhang, and L. Yu, Heterogeneity-aware Distributed Parameter Servers, 2018
Systems-level perspective on distributed-ML communication costs, including shuffling. Useful for understanding why coded approaches are of practical interest at production scale.
- M. M. Amiri and D. Gündüz, Federated Learning over Wireless Fading Channels, 2020
Applies coded-shuffling ideas to wireless federated learning. Relevant context for the demand-private extension of §7.4 and the FL chapters of Part III.
- Q. Yu, M. A. Maddah-Ali, and A. S. Avestimehr, Entangled Polynomial Codes for Coded Distributed Matrix Multiplication, 2020
Related coded-computing construction, useful for understanding how the polynomial-code framework of Chapter 5 composes with shuffling primitives.
- H. B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, Communication-Efficient Learning of Deep Networks from Decentralized Data, 2017
FedAvg paper — relevant because in federated learning the shuffling problem does *not* arise (each user cycles local data), providing contrast with the data-center training setting of this chapter.
- S. Li and A. S. Avestimehr, Coded Computing: Mitigating Fundamental Bottlenecks in Large-Scale Distributed Computing and Machine Learning, 2020
The survey's chapter on coded shuffling is a comprehensive secondary reference for the whole framework.
Further Reading
Resources for deepening understanding of coded data shuffling and its connections.
Decentralized coded caching, complete treatment
Maddah-Ali & Niesen, *Decentralized Coded Caching Attains Order-Optimal Memory-Rate Tradeoff*, IEEE T-IT 2015
Rigorously establishes the decentralized variant of coded caching — the closest parallel to §7.4.1's decentralized coded shuffling.
Heterogeneous coded caching
Yu, Maddah-Ali, Avestimehr, *Characterizing the Rate-Memory Tradeoff in Cache Networks within a Factor of 2*, IEEE T-IT 2019
The heterogeneous-memory open problem from §7.4 gets a partial answer here via a factor-of-2 approximation. Good starting point for extending to heterogeneous shuffling.
Coded distributed ML — unified view
Book by Ramchandran & Avestimehr (in preparation), *Coded Machine Learning*, Cambridge UP
Draft-level unified treatment of coded computing in ML, including coded shuffling, coded matrix multiplication, and coded gradient descent. Integrates the material of Chapters 5–7 of this book.
Random-reshuffling SGD convergence
Mishchenko, Khaled, Richtárik, *Random Reshuffling: Simple Analysis with Vast Improvements*, NeurIPS 2020
Tight convergence rates for random-reshuffling SGD on non-convex problems. Justifies the emphasis on proper shuffling in §7.1.