Modern ML Architectures and Quantum Frontiers
The ML Landscape Has Moved
The coded-computing framework developed in Part II was inspired by matrix multiplication and gradient averaging — the dominant computations of 2017-2019 FL. Modern ML — transformer attention, mixture-of-experts, retrieval-augmented generation — imposes new computational patterns:
- Attention softmax: non-linear computation.
- Mixture-of-experts: sparse activation of a subset of experts per input; the "computation pattern" is data-dependent.
- Retrieval-augmented generation: each inference involves a PIR-like database lookup against a frozen knowledge base.
Each of these lies outside the coded-computing framework's reach. Extending coded computing to modern ML is a research frontier — and this section maps the major directions, along with a second frontier (quantum PIR) that completes the book's information-theoretic horizon.
Definition: Coded Attention: Problem Statement
The transformer attention operation is $\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\big(QK^\top/\sqrt{d_k}\big)\,V$ for query $Q$, key $K$, and value $V$ matrices.
The central question: can attention be distributed across workers with coded redundancy for straggler tolerance?
The three sub-operations:
- $QK^\top$: matrix multiplication — coded computing works (Chapter 5).
- $\mathrm{softmax}(QK^\top/\sqrt{d_k})$: non-linear — polynomial approximation or hybrid (Section 18.1).
- $\mathrm{softmax}(\cdot)\,V$: matrix multiplication of the softmax output by $V$ — can use Chapter 5 if the softmax output is available.
The composition structure allows a partial coded solution: the first and third steps use standard coded computing, and the middle (softmax) uses approximation. End-to-end optimality is open.
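As a concrete toy illustration of this partial pipeline, the sketch below applies a polynomial (MDS) code to the $QK^\top$ stage, evaluates the softmax uncoded at the master, and finishes with the value product. The block count, worker count, evaluation points, and straggler pattern are illustrative choices for the sketch, not parameters of any scheme in the text.

```python
import numpy as np

rng = np.random.default_rng(0)
L, d = 6, 8                          # toy sequence length and head dimension
Q, K, V = (rng.normal(size=(L, d)) for _ in range(3))

# Stage 1: coded Q K^T.  Split Q into m row-blocks and encode them as evaluations
# of a matrix polynomial at n > m points; any m worker results suffice to decode.
m, n = 2, 3
Q_blocks = np.split(Q, m)
xs = np.array([1.0, 2.0, 3.0])       # one distinct evaluation point per worker

worker_out = [sum(Q_blocks[j] * x**j for j in range(m)) @ K.T for x in xs]

survivors = [0, 2]                   # pretend worker 1 straggles and is ignored
A = np.vander(xs[survivors], m, increasing=True)           # Vandermonde system
b = np.stack([worker_out[i] for i in survivors]).reshape(m, -1)
S = np.linalg.solve(A, b).reshape(L, L)   # decoded Q K^T, blocks back in order

# Stage 2: softmax -- non-linear, computed uncoded at the master in this sketch.
S = S / np.sqrt(d)
P = np.exp(S - S.max(axis=1, keepdims=True))
P = P / P.sum(axis=1, keepdims=True)

# Stage 3: P V -- another matrix product, codable the same way as stage 1.
out = P @ V

# Reference: plain uncoded attention.
S_ref = Q @ K.T / np.sqrt(d)
P_ref = np.exp(S_ref - S_ref.max(axis=1, keepdims=True))
P_ref = P_ref / P_ref.sum(axis=1, keepdims=True)
print(np.allclose(out, P_ref @ V))   # True
```

Only the two matrix-product stages get straggler tolerance here; the softmax remains a serial bottleneck, which is exactly the gap the hybrid approximation schemes target.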
Challenges in Coded Attention
Practical challenges in deploying coded attention:
- Softmax is highly non-linear. A Chebyshev approximation whose degree grows with the target accuracy is needed to achieve small error on typical attention score distributions. This shrinks the straggler tolerance compared to pure matrix multiplication.
- Matrix dimensions are large. Sequence lengths and hidden dimensions in modern transformers run into the thousands; coding schemes must scale to these sizes.
- Attention patterns are sparse. Many attention values are near-zero; sparse coded computing is more efficient but less studied.
- End-to-end optimization. The coded scheme should minimize end-to-end attention error, not the error of any intermediate step — a compositional optimization problem.
Preliminary results (Lu et al. 2023) improve the recovery threshold over naive composition; fundamental optimality is open.
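To see why the softmax cost is degree-driven, one can measure how the Chebyshev degree trades against approximation error for $\exp$ (the softmax numerator). The score range below is an assumption chosen for the sketch (max-subtracted scores are non-positive), not a property of any cited scheme.

```python
import numpy as np
from numpy.polynomial.chebyshev import Chebyshev

# Illustrative range of max-subtracted attention scores; softmax exponentiates these.
domain = [-8.0, 0.0]
x = np.linspace(domain[0], domain[1], 1001)

errs = {}
for deg in (2, 4, 8, 16):
    # Interpolate exp at the Chebyshev points of the chosen degree on `domain`.
    approx = Chebyshev.interpolate(np.exp, deg, domain=domain)
    errs[deg] = float(np.max(np.abs(approx(x) - np.exp(x))))
    print(f"degree {deg:2d}: max abs error {errs[deg]:.2e}")
```

The error decays rapidly with degree, but in a coded scheme every extra degree consumes recovery threshold, which is the trade-off the first bullet describes.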
Definition: Coded Computing for Mixture-of-Experts
Mixture-of-Experts (MoE) models select, per input, a subset of $k$ out of $N$ experts. Each input uses a different subset — the "routing" depends on the input.
Coded-computing implications:
- Different inputs use different subsets. Straggler-aware coding must account for this variability — a straggler on expert $e$ only affects the inputs routed to $e$.
- Expert imbalance. Some experts are used more than others (load imbalance). Coded schemes should balance expert load as well as provide straggler tolerance.
- Sparse activation. Only $k$ of the $N$ experts are active per input. The effective computation is sparse — coded schemes that exploit this can reduce communication.
Open framework: no coded-computing theory yet exists for MoE models. The routing structure creates a data-dependent workload distribution that breaks uniform straggler analysis.
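A minimal simulation makes the data-dependence concrete. The top-$k$ gating rule, the sizes, and the index of the straggling expert below are all hypothetical choices for the sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
n_inputs, n_experts, k = 1000, 8, 2

# Toy router: each input is routed to the k experts with the largest gating scores.
gates = rng.normal(size=(n_inputs, n_experts))
routed = np.argsort(gates, axis=1)[:, -k:]            # shape (n_inputs, k)

# Per-expert load is data-dependent and generally imbalanced.
load = np.bincount(routed.ravel(), minlength=n_experts)
print("expert load:", load)

# A straggler on expert 3 delays only the inputs routed to it, not the whole batch.
affected = np.any(routed == 3, axis=1)
print(f"straggler on expert 3 delays {affected.sum()} / {n_inputs} inputs")
```

Uniform straggler analysis assumes every worker failure hits every input equally; here the hit set is the random, input-dependent slice computed above, which is exactly what breaks the existing theory.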
Retrieval-Augmented Generation as PIR
Retrieval-augmented generation (RAG) augments a language model with a retrieval step: given a query, retrieve relevant documents from a database and condition the generation on them.
For privacy-sensitive RAG (e.g., medical RAG systems), the retrieval step is essentially a PIR problem: the user wants one document from a database without revealing which one. The PIR framework of Chapters 13-15 applies directly.
Emerging research directions:
- PIR-augmented generation. Apply Sun-Jafar PIR (Chapter 13) to the retrieval step of RAG. Trade-off: retrieval rate vs. query privacy.
- Cache-aided RAG. The language model may cache previously retrieved documents. Chapter 15's cache-aided PIR applies. The CommIT contribution (Wan-Tuninetti-Caire 2021) on demand-private cached delivery is immediately relevant.
- Federated RAG. Users host local documents; the retrieval is distributed. Combines Part IV's PIR with Part V's wireless FL.
This is a live bridge between information-theoretic PIR (Part IV) and modern generative AI. The CommIT portfolio is well-positioned to contribute to this frontier.
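As a minimal sketch of the PIR structure underlying private retrieval, the classic two-server XOR scheme (not the capacity-achieving Sun-Jafar scheme of Chapter 13) retrieves one "document" from two non-colluding replicas without revealing its index; database size and document length below are toy assumptions.

```python
import secrets
import numpy as np

# Replicated database of M documents, stored as fixed-length byte rows.
M, doc_len = 16, 32
rng = np.random.default_rng(2)
db = rng.integers(0, 256, size=(M, doc_len), dtype=np.uint8)

def server_answer(db, query_bits):
    """Return the XOR of all documents whose query bit is 1."""
    ans = np.zeros(db.shape[1], dtype=np.uint8)
    for i, bit in enumerate(query_bits):
        if bit:
            ans ^= db[i]
    return ans

# The user wants document `want` without revealing it to either server.
want = 11
q1 = [secrets.randbelow(2) for _ in range(M)]   # uniformly random subset
q2 = q1.copy()
q2[want] ^= 1                                   # also uniform on its own

a1 = server_answer(db, q1)
a2 = server_answer(db, q2)
recovered = a1 ^ a2                 # every index except `want` cancels out
print(np.array_equal(recovered, db[want]))      # True
```

Each server sees a uniformly random bit vector, so neither learns `want`; the cost is downloading two full-length answers, which is the rate penalty that capacity-achieving schemes improve on.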
Definition: Quantum PIR: Problem Statement
Quantum PIR (QPIR) considers the PIR problem where databases and/or communication channels are quantum.
Three variants:
- Classical queries, quantum databases. The databases hold quantum states; the user queries classically. Can the user retrieve a classical description of a quantum-stored file at a higher rate than classical PIR?
- Quantum queries, classical databases. The user queries with quantum states; the databases respond classically or quantumly. Quantum superposition may allow simultaneously querying many files privately.
- Quantum end-to-end. Both the databases and the communication channels are quantum.
Notable results:
- Song-Hayashi 2019: QPIR capacity matches classical PIR in some regimes.
- Allaix et al. 2022: $T$-colluding QPIR offers a strictly better rate than classical $T$-colluding PIR in some parameter ranges — a quantum advantage.
- The full quantum-classical PIR capacity gap is open.
The operational implication: for ultra-sensitive retrieval (e.g., state secrets, genomic queries), quantum PIR may offer better privacy-rate trade-off than classical.
Theorem: Quantum PIR Rate Advantage
For $T$-colluding PIR with $N$ databases and a large number of files, quantum entanglement between the databases allows a retrieval rate $R_Q = \min\{1,\, 2(N-T)/N\}$ (for $T < N$). Compare with the classical asymptotic capacity $R_C = (N-T)/N$ for a large number of files.
Quantum advantage. For moderate $T$, $R_Q > R_C$. The relative advantage grows with $T$ until it saturates at a factor of two for $T \ge N/2$: for $N = 4$, $T = 3$, $R_Q = 1/2$ vs. classical $R_C = 1/4$, about a 100% relative improvement.
Quantum scheme
Use quantum error-correcting codes with entanglement between databases. The user's query is a quantum state; each database responds with a quantum correction.
Entanglement gain
Entanglement allows the databases to share information without violating individual privacy, enabling higher-rate retrieval.
Converse
Matching quantum converse via Holevo-like bounds.
Regime of advantage
The advantage is largest for moderate $T$ (significant collusion but not full). As $T \to N$, the classical and quantum rates nearly match.
Open Frontiers Mapped
| Frontier | Status | Key open problem |
|---|---|---|
| Non-linear coded computing | Approximation-based schemes known | Information-theoretic lower bound |
| Three-axis privacy-robustness-DP | 2-axis results known; 3-axis lower bound | Jointly-optimal achievability |
| Decentralized FL | Gossip D-SGD known | Joint privacy-robustness-DP over graphs |
| Coded attention | Partial; hybrid approximations | End-to-end optimal transformer coding |
| Coded MoE | Open — no framework | Data-dependent workload distribution |
| RAG / federated RAG | Open — PIR connection unexplored | Privacy-rate trade-off for RAG |
| Quantum PIR | Advantage known in some regimes | Full quantum-classical gap |
| Quantum secure aggregation | Open — preliminary schemes | Capacity characterization |
Quantum vs. Classical PIR: The Advantage Region
Visualize the regime where quantum $T$-colluding PIR strictly outperforms classical PIR. Sweep $N$ and $T$; observe the quantum advantage growing in certain regimes, then shrinking as $T \to N$. For small $T$, the advantage is negligible — production quantum PIR deployments would be justified only for moderate $T$.
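A sketch of such a sweep, assuming the entanglement-assisted quantum rate $R_Q = \min\{1,\, 2(N-T)/N\}$ and taking the classical asymptotic (many-file) capacity $R_C = (N-T)/N$ as the baseline; this is an intuition aid under those rate expressions, not a capacity claim.

```python
def classical_rate(N, T):
    # Classical T-colluding PIR capacity in the many-file limit: (N - T) / N.
    return (N - T) / N

def quantum_rate(N, T):
    # Entanglement-assisted QPIR rate: 1 for T <= N/2, else 2(N - T) / N.
    return min(1.0, 2 * (N - T) / N)

for N in (4, 8):
    for T in range(1, N):
        rc, rq = classical_rate(N, T), quantum_rate(N, T)
        print(f"N={N} T={T}: classical {rc:.3f}  quantum {rq:.3f}  gap {rq - rc:+.3f}")
```

The absolute gap $R_Q - R_C$ peaks at $T = N/2$ and vanishes as $T \to N$, matching the qualitative regime description above.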
Historical Note: Convergence of Fields
Twenty years ago, information theory, cryptography, wireless communications, and machine learning were largely separate disciplines. This book — secure and distributed computing over wireless networks — is the product of their convergence:
- Information theory provides the converse bounds (Shannon capacity, mutual information).
- Cryptography contributes the privacy primitives (secret sharing, commitments).
- Wireless communications supplies the channel models and AirComp.
- Machine learning motivates the computational workloads (gradient descent, representation learning).
The five CommIT contributions in this book span all four fields: coded shuffling (wireless + ML), SecAgg (crypto + wireless + ML), ByzSecAgg (coding + wireless + ML), CCESA (graph theory + wireless + crypto), IT-secure FRL (information theory + wireless + ML). Each closes a gap; each opens new questions.
The open problems in this chapter — non-linear coded computing, joint privacy-robustness-DP, decentralized FL, modern-ML coding, quantum PIR — are joint problems across these fields. The next generation of results will come from readers who bring all four fluencies together.
Key Takeaway
The research frontier spans modern ML and quantum information theory. Coded attention, coded MoE, and federated RAG are bridges between this book's framework and the current generative-AI landscape. Quantum PIR offers rate advantages in specific regimes. Each direction involves joint information-theoretic, cryptographic, and ML work — precisely the convergence that produced this book. The next chapter (for the reader who goes beyond this book) is an open research program.
Closing the Book
This book developed the information-theoretic foundations of secure and distributed computing over wireless networks. The golden thread — privacy, robustness, and communication efficiency as fundamentally coupled — ran through each chapter:
- Part I: The three-way coupling in the distributed-computing problem statement and its information-theoretic formulation.
- Part II: Coded computing trades storage (computation load) for communication (straggler tolerance).
- Part III: Secure aggregation trades cryptographic overhead for privacy guarantees.
- Part IV: Private information retrieval trades database replication for retrieval rate.
- Part V: Wireless aggregation trades synchronization for bandwidth efficiency.
The five CommIT contributions mark the current research frontier; the open problems of this chapter mark where the next generation of results will appear. Information theory provides the converses; engineering gives the achievability; wireless systems bring the reality check; machine learning provides the application pressure.
The next reader — you — is invited to continue the development. There is no finite curriculum for wireless distributed computing; there are only new chapters.
Quick Check
Which of the following is NOT an open direction listed in this chapter?
- Non-linear coded computing with an information-theoretic recovery-threshold lower bound.
- Joint privacy-robustness-DP achievability.
- Quantum PIR with classical queries.
- Classical Sun-Jafar PIR capacity for non-colluding databases.
This was settled in Chapter 13 by Sun-Jafar 2017 — it is not an open problem but rather a resolved classical result.