References & Further Reading

References

  1. K. Wan, D. Tuninetti, M. Ji, G. Caire, and P. Piantanida, A Framework for Coded Data Shuffling in Distributed ML Training, 2021

    The CommIT-group paper (developed at TU Berlin with collaborators) establishing the optimal rate-memory tradeoff for coded data shuffling; the headline reference for this chapter.

  2. K. Wan, D. Tuninetti, and G. Caire, Fundamental Limits of Caching for Demand Privacy Against Colluding Users, 2021

    Extension of coded caching to demand privacy — the predecessor of the PIR framework of Chapter 13. Cited in §7.4.2 of this chapter.

  3. M. A. Maddah-Ali and U. Niesen, Fundamental Limits of Caching, 2014

    The coded-caching precursor to the coded-shuffling framework. Chapter 7's construction mirrors this directly with minor adaptations.

  4. K. Lee, M. Lam, R. Pedarsani, D. Papailiopoulos, and K. Ramchandran, Speeding Up Distributed Machine Learning Using Codes, 2018

    General reference for coded computing for ML straggler mitigation. Useful context for Chapter 7's specialization to the shuffling step.

  5. S. Li, M. A. Maddah-Ali, Q. Yu, and A. S. Avestimehr, A Fundamental Tradeoff Between Computation and Communication in Distributed Computing, 2018

    The MapReduce coded-shuffling result that inspired this chapter. Chapter 7 addresses a different operational problem (per-epoch ML data re-shuffling), but both rely on the same finite-field IA machinery.

  6. M. Gürbüzbalaban, A. Ozdaglar, and P. Parrilo, Why Random Reshuffling Beats Stochastic Gradient Descent, 2021

    The landmark paper proving that random reshuffling achieves $O(1/T)$ convergence vs. the $O(1/\sqrt{T})$ of i.i.d. sampling. This is *why* shuffling matters enough to warrant a dedicated chapter; a minimal sketch contrasting the two sampling schemes appears after this reference list.

  7. L. Bottou, Large-Scale Machine Learning with Stochastic Gradient Descent, 2010

    Foundational SGD reference, including a discussion of random reshuffling. Useful for the convergence-rate context of §7.1.

  8. J. Jiang, B. Cui, C. Zhang, and L. Yu, Heterogeneity-aware Distributed Parameter Servers, 2018

    Systems-level perspective on distributed-ML communication costs, including shuffling. Useful for understanding why coded approaches are of practical interest at production scale.

  9. M. M. Amiri and D. Gündüz, Federated Learning over Wireless Fading Channels, 2020

    Applies coded-shuffling ideas to wireless federated learning. Relevant context for the demand-private extension of §7.4 and the FL chapters of Part III.

  10. Q. Yu, M. A. Maddah-Ali, and A. S. Avestimehr, Entangled Polynomial Codes for Coded Distributed Matrix Multiplication, 2020

    Related coded-computing construction, useful for understanding how the polynomial-code framework of Chapter 5 composes with shuffling primitives.

  11. H. B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, Communication-Efficient Learning of Deep Networks from Decentralized Data, 2017

    The FedAvg paper; relevant because in federated learning the shuffling problem does *not* arise (each user cycles through its own local data), providing a contrast with the data-center training setting of this chapter.

  12. S. Li and A. S. Avestimehr, Coded Computing: Mitigating Fundamental Bottlenecks in Large-Scale Distributed Computing and Machine Learning, 2020

    Chapter on coded shuffling in this survey. Comprehensive secondary reference for the whole framework.
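
To make the contrast highlighted in reference 6 concrete, the following minimal sketch (our own illustration, not taken from any cited paper) compares the two sampling schemes on a toy least-squares problem: i.i.d. sampling draws an example index with replacement at every step, while random reshuffling draws one fresh permutation per epoch and sweeps through it. The toy problem, step size, and function names are illustrative assumptions.

```python
# Minimal sketch: i.i.d. sampling vs. random reshuffling in SGD.
# Toy least-squares problem; not taken from any of the cited papers.
import numpy as np

rng = np.random.default_rng(0)

# Minimize (1/2n) * ||A x - b||^2 over x.
n, d = 200, 5
A = rng.normal(size=(n, d))
x_star = rng.normal(size=d)
b = A @ x_star + 0.01 * rng.normal(size=n)

def grad_i(x, i):
    """Gradient of the i-th per-example loss 0.5 * (a_i^T x - b_i)^2."""
    return (A[i] @ x - b[i]) * A[i]

def sgd(num_epochs, step, reshuffle):
    x = np.zeros(d)
    for _ in range(num_epochs):
        if reshuffle:
            order = rng.permutation(n)          # random reshuffling: without replacement
        else:
            order = rng.integers(0, n, size=n)  # i.i.d. sampling: with replacement
        for i in order:
            x -= step * grad_i(x, i)
    return x

for reshuffle in (False, True):
    x_hat = sgd(num_epochs=50, step=0.01, reshuffle=reshuffle)
    print(f"reshuffle={reshuffle}: ||x - x*|| = {np.linalg.norm(x_hat - x_star):.4e}")
```

The only difference between the two runs is whether each epoch samples indices with or without replacement; everything else (step size, epoch length, initialization) is held fixed.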

Further Reading

Resources for deepening understanding of coded data shuffling and its connections.

  • Decentralized coded caching, complete treatment

    Maddah-Ali & Niesen, *Decentralized Coded Caching Attains Order-Optimal Memory-Rate Tradeoff*, IEEE T-IT 2015

    Rigorously establishes the decentralized variant of coded caching — the closest parallel to §7.4.1's decentralized coded shuffling.

  • Heterogeneous coded caching

    Yu, Maddah-Ali, Avestimehr, *Characterizing the Rate-Memory Tradeoff in Cache Networks within a Factor of 2*, IEEE T-IT 2019

    The heterogeneous-memory open problem from §7.4 gets a partial answer here via a factor-of-2 approximation. Good starting point for extending to heterogeneous shuffling.

  • Coded distributed ML — unified view

    Book by Ramchandran & Avestimehr (in preparation), *Coded Machine Learning*, Cambridge UP

    Draft-level unified treatment of coded computing in ML, including coded shuffling, coded matrix multiplication, and coded gradient descent. Integrates the material of Chapters 5–7 of this book.

  • Random-reshuffling SGD convergence

    Mishchenko, Khaled, Richtárik, *Random Reshuffling: Simple Analysis with Vast Improvements*, NeurIPS 2020

    Tight convergence rates for random-reshuffling SGD on non-convex problems. Justifies the emphasis on proper shuffling in §7.1.