References & Further Reading
References
- K. Wan, D. Tuninetti, M. Ji, G. Caire, and P. Piantanida, A Framework for Coded Data Shuffling in Distributed ML Training, 2021
The CommIT-group paper establishing the optimal rate-memory tradeoff for coded data shuffling. Headline reference for this chapter. Developed at TU Berlin with collaborators.
- K. Wan, D. Tuninetti, and G. Caire, Fundamental Limits of Caching for Demand Privacy Against Colluding Users, 2021
Extension of coded caching to demand privacy — the predecessor of the PIR framework of Chapter 13. Cited in §7.4.2 of this chapter.
- M. A. Maddah-Ali and U. Niesen, Fundamental Limits of Caching, 2014
The coded-caching precursor to the coded-shuffling framework. Chapter 7's construction mirrors this directly with minor adaptations.
- K. Lee, M. Lam, R. Pedarsani, D. Papailiopoulos, and K. Ramchandran, Speeding Up Distributed Machine Learning Using Codes, 2018
General reference for coded computing for ML straggler mitigation. Useful context for Chapter 7's specialization to the shuffling step.
- S. Li, M. A. Maddah-Ali, Q. Yu, and A. S. Avestimehr, A Fundamental Tradeoff Between Computation and Communication in Distributed Computing, 2018
The MapReduce coded-shuffling result that inspired this chapter. Chapter 7 addresses a different operational problem (per-epoch ML data re-shuffling), but both rely on the same finite-field IA machinery.
- M. Gürbüzbalaban, A. Ozdaglar, and P. Parrilo, Why Random Reshuffling Beats Stochastic Gradient Descent, 2021
The landmark paper proving that random reshuffling achieves $O(1/T)$ convergence vs. the $O(1/\sqrt{T})$ of i.i.d. sampling. This is *why* shuffling matters enough to warrant a dedicated chapter.
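To make the distinction behind this result concrete: random reshuffling visits every training example exactly once per epoch via a fresh permutation, whereas i.i.d. sampling draws with replacement and so repeats some examples while missing others. A minimal sketch (the helper names are illustrative, not from any of the cited papers):

```python
import numpy as np

def reshuffled_epochs(n, num_epochs, rng):
    """Random reshuffling: a fresh permutation each epoch,
    so every example is visited exactly once per epoch."""
    for _ in range(num_epochs):
        yield rng.permutation(n)

def iid_epochs(n, num_epochs, rng):
    """I.i.d. sampling: indices drawn with replacement,
    so within an epoch some examples repeat and others are skipped."""
    for _ in range(num_epochs):
        yield rng.integers(0, n, size=n)

rng = np.random.default_rng(0)
epoch = next(reshuffled_epochs(10, 1, rng))
assert sorted(epoch) == list(range(10))  # each example seen exactly once
```

In the distributed setting of this chapter, it is precisely the per-epoch permutation that forces data movement between workers, which is the communication cost the coded-shuffling schemes reduce.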
- L. Bottou, Large-Scale Machine Learning with Stochastic Gradient Descent, 2010
Foundational SGD reference including random-reshuffling discussion. Useful for the convergence-rate context of §7.1.
- J. Jiang, B. Cui, C. Zhang, and L. Yu, Heterogeneity-aware Distributed Parameter Servers, 2018
Systems-level perspective on distributed-ML communication costs, including shuffling. Useful for understanding why coded approaches are of practical interest at production scale.
- M. M. Amiri and D. Gündüz, Federated Learning over Wireless Fading Channels, 2020
Applies coded-shuffling ideas to wireless federated learning. Relevant context for the demand-private extension of §7.4 and the FL chapters of Part III.
- Q. Yu, M. A. Maddah-Ali, and A. S. Avestimehr, Entangled Polynomial Codes for Coded Distributed Matrix Multiplication, 2020
Related coded-computing construction, useful for understanding how the polynomial-code framework of Chapter 5 composes with shuffling primitives.
- H. B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, Communication-Efficient Learning of Deep Networks from Decentralized Data, 2017
FedAvg paper — relevant because in federated learning the shuffling problem does *not* arise (each user cycles local data), providing contrast with the data-center training setting of this chapter.
- S. Li and A. S. Avestimehr, Coded Computing: Mitigating Fundamental Bottlenecks in Large-Scale Distributed Computing and Machine Learning, 2020
The survey's chapter on coded shuffling is a comprehensive secondary reference for the whole framework.
Further Reading
Resources for deepening understanding of coded data shuffling and its connections.
Decentralized coded caching, complete treatment
Maddah-Ali & Niesen, *Decentralized Coded Caching Attains Order-Optimal Memory-Rate Tradeoff*, IEEE T-IT 2015
Rigorously establishes the decentralized variant of coded caching — the closest parallel to §7.4.1's decentralized coded shuffling.
Heterogeneous coded caching
Yu, Maddah-Ali, Avestimehr, *Characterizing the Rate-Memory Tradeoff in Cache Networks within a Factor of 2*, IEEE T-IT 2019
The heterogeneous-memory open problem from §7.4 gets a partial answer here via a factor-of-2 approximation. Good starting point for extending to heterogeneous shuffling.
Coded distributed ML — unified view
Book by Ramchandran & Avestimehr (in preparation), *Coded Machine Learning*, Cambridge UP
Draft-level unified treatment of coded computing in ML, including coded shuffling, coded matrix multiplication, and coded gradient descent. Integrates the material of Chapters 5–7 of this book.
Random-reshuffling SGD convergence
Mishchenko, Khaled, Richtárik, *Random Reshuffling: Simple Analysis with Vast Improvements*, NeurIPS 2020
Tight convergence rates for random-reshuffling SGD on non-convex problems. Justifies the emphasis on proper shuffling in §7.1.