Modern ML Architectures and Quantum Frontiers

The ML Landscape Has Moved

The coded-computing framework developed in Part II was inspired by matrix multiplication and gradient averaging — the dominant computations of 2017-2019 FL. Modern ML — transformer attention, mixture-of-experts, retrieval-augmented generation — imposes new computational patterns:

  • Attention softmax: an $n \times n$ non-linear computation.
  • Mixture-of-experts: sparse activation of a subset of experts per input; the "computation pattern" is data-dependent.
  • Retrieval-augmented generation: each inference involves a PIR-like database lookup against a frozen knowledge base.

Each of these lies outside the coded-computing framework's reach. Extending coded computing to modern ML is a research frontier — and this section maps the major directions, along with a second frontier (quantum PIR) that completes the book's information-theoretic horizon.


Definition: Coded Attention (Problem Statement)

The transformer attention operation is $\text{Attn}(\mathbf{Q}, \mathbf{K}, \mathbf{V}) = \mathrm{softmax}\!\left(\frac{\mathbf{Q}\mathbf{K}^T}{\sqrt{d_k}}\right) \mathbf{V}$ for query $\mathbf{Q}$, key $\mathbf{K}$, and value $\mathbf{V}$ matrices.

The central question: can attention be distributed across workers with coded redundancy for straggler tolerance?

The three sub-operations:

  1. $\mathbf{Q}\mathbf{K}^T$: matrix multiplication; coded computing works (Chapter 5).

  2. $\mathrm{softmax}(\cdot)$: non-linear; polynomial approximation or hybrid (Section 18.1).

  3. $\mathrm{softmax}(\cdot)\,\mathbf{V}$: matrix multiplication of the softmax output by $\mathbf{V}$; can use Chapter 5 if the softmax output is available.

The composition structure allows a partial coded solution: the first and third steps use standard coded computing, and the middle (softmax) uses approximation. End-to-end optimality is open.
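This three-step composition can be sketched numerically. In the snippet below, plain numpy matmuls stand in for the coded matrix multiplications of steps 1 and 3, and a degree-10 Chebyshev fit of $\exp$ (an assumed softmax surrogate, fit on the clipped range $[-6, 0]$) stands in for step 2; the dimensions and the fit range are illustrative choices, not part of any published scheme.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_k = 8, 4
Q, K, V = (rng.normal(size=(n, d_k)) for _ in range(3))

def softmax(S):
    E = np.exp(S - S.max(axis=1, keepdims=True))
    return E / E.sum(axis=1, keepdims=True)

# Step-2 surrogate: degree-10 Chebyshev fit of exp on [-6, 0]
# (scores are shifted so each row's max is 0, then clipped).
xs = np.linspace(-6.0, 0.0, 400)
cheb = np.polynomial.Chebyshev.fit(xs, np.exp(xs), deg=10)

def softmax_poly(S):
    Z = np.clip(S - S.max(axis=1, keepdims=True), -6.0, 0.0)
    E = cheb(Z)
    return E / E.sum(axis=1, keepdims=True)

S = (Q @ K.T) / np.sqrt(d_k)    # step 1: a coded matmul in the full scheme
A_exact = softmax(S) @ V        # reference attention output
A_hybrid = softmax_poly(S) @ V  # step 2 approximated, step 3: coded matmul
err = np.max(np.abs(A_hybrid - A_exact))
print(f"max abs attention error: {err:.2e}")
```

The point of the sketch is the composition: only the middle step carries approximation error, and that error propagates through the final matmul into the attention output.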

Challenges in Coded Attention

Practical challenges in deploying coded attention:

  • Softmax is highly non-linear. A Chebyshev approximation of degree $\sim 10$ is needed for $1\%$ error on typical attention distributions. This shrinks the straggler tolerance compared to pure matrix multiplication.

  • Matrix dimensions are large. Attention in modern transformers has $d_k \sim 100$ and $n \sim 10^3$. Coding schemes must scale.

  • Attention patterns are sparse. Many attention values are near-zero; sparse coded computing is more efficient but less studied.

  • End-to-end optimization. The coded scheme should minimize end-to-end attention error, not the error of any intermediate step; a compositional optimization problem.

Preliminary results (Lu et al. 2023) achieve a $\sim 20\%$ recovery-threshold improvement over naive composition; fundamental optimality is open.
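The degree-versus-error trade-off quoted above can be checked numerically; the score range $[-8, 0]$ below is an assumed stand-in for "typical attention distributions," not a value from the text.

```python
import numpy as np

# Assumed "typical" post-shift score range for attention logits: [-8, 0].
xs = np.linspace(-8.0, 0.0, 400)
target = np.exp(xs)

# Sweep the approximation degree and record the worst-case error of each fit.
errs = {}
for deg in (2, 4, 6, 8, 10):
    p = np.polynomial.Chebyshev.fit(xs, target, deg=deg)
    errs[deg] = np.max(np.abs(p(xs) - target))
    print(f"degree {deg:2d}: max abs error {errs[deg]:.1e}")
```

The error decays rapidly with degree; on this range, degree 10 lands well under the $1\%$ mark, consistent with the figure cited above.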

Definition: Coded Computing for Mixture-of-Experts

Mixture-of-Experts (MoE) models select, per input, a subset of $k$ out of $M$ experts. Each input uses a different subset; the "routing" depends on the input.

Coded-computing implications:

  • Different inputs use different expert subsets. Straggler-aware coding must account for this variability: a straggler on expert $e$ only affects inputs that use $e$.

  • Expert imbalance. Some experts are used more than others (load imbalance). Coded schemes should balance expert load as well as provide straggler tolerance.

  • Sparse activation. Only a $k/M$ fraction of the experts is active per input. The effective computation is sparse; coded schemes that exploit this can reduce communication.

Open framework: no coded-computing theory yet exists for MoE models. The routing structure creates a data-dependent workload distribution that breaks uniform straggler analysis.
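The data-dependent straggler effect can be illustrated by a small simulation. The uniform top-$k$ routing below is a simplifying assumption (real gating is learned and imbalanced); it shows that a single straggling expert delays only the inputs routed to it, roughly a $k/M$ fraction.

```python
import numpy as np

rng = np.random.default_rng(1)
M, k, n_inputs = 16, 2, 10_000   # illustrative expert count and top-k

# Uniform top-k routing: each input independently picks k distinct experts.
routes = np.array([rng.choice(M, size=k, replace=False)
                   for _ in range(n_inputs)])

straggler = 3                    # index of one hypothetical slow expert
affected = np.any(routes == straggler, axis=1).mean()
print(f"fraction of inputs delayed by one straggler: {affected:.3f}")
print(f"k/M prediction: {k / M:.3f}")
```

Under learned (non-uniform) gating the affected fraction concentrates on popular experts, which is exactly why uniform straggler analysis breaks down here.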


Retrieval-Augmented Generation as PIR

Retrieval-augmented generation (RAG) augments a language model with a retrieval step: given a query, retrieve relevant documents from a database and condition the generation on them.

For privacy-sensitive RAG (e.g., medical RAG systems), the retrieval step is essentially a PIR problem: the user wants one document from a database without revealing which one. The PIR framework of Chapters 13-15 applies directly.

Emerging research directions:

  1. PIR-augmented generation. Apply Sun-Jafar PIR (Ch 13) to the retrieval step of RAG. Trade-off: retrieval rate vs. query privacy.

  2. Cache-aided RAG. The language model may cache previously-retrieved documents. Chapter 15's cache-aided PIR applies. The CommIT contribution (Wan-Tuninetti-Caire 2021) on demand-private cached delivery is immediately relevant.

  3. Federated RAG. Users host local documents; the retrieval is distributed. Combines Part IV's PIR with Part V's wireless FL.

This is a live bridge between information-theoretic PIR (Part IV) and modern generative AI. The CommIT portfolio is well-positioned to contribute to this frontier.
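The retrieval-as-PIR point can be made concrete with the classic two-server XOR scheme of Chor et al., a much simpler ancestor of the Sun-Jafar codes of Chapter 13. The toy document store and fixed document length below are illustrative; the scheme assumes two non-colluding replicas.

```python
import secrets
from functools import reduce

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def server_answer(db, indices):
    # A server XORs the requested documents; each server sees a subset of
    # indices whose distribution is uniform regardless of the target i.
    return reduce(xor, (db[j] for j in indices), bytes(len(db[0])))

# Toy replicated document store (fixed-length docs for simplicity).
db = [f"doc-{j}".encode().ljust(8, b" ") for j in range(8)]

i = 5                                  # desired document index
S = {j for j in range(len(db)) if secrets.randbits(1)}
query_a = S
query_b = S ^ {i}                      # symmetric difference toggles index i

# Documents in both queries cancel under XOR; only doc i survives.
answer = xor(server_answer(db, query_a), server_answer(db, query_b))
assert answer == db[i]
print(answer)
```

Neither server alone learns anything about $i$, but the download is the whole-document size times two; the Sun-Jafar codes of Chapter 13 are precisely about improving this rate.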

Definition: Quantum PIR (Problem Statement)

Quantum PIR (QPIR) considers the PIR problem where databases and/or communication channels are quantum.

Three variants:

  1. Classical queries, quantum databases. The databases hold quantum states; the user queries classically. Can the user retrieve a classical description of a quantum-stored file at a higher rate than classical PIR?

  2. Quantum queries, classical databases. User queries with quantum states; databases respond classically or quantumly. Quantum superposition may allow simultaneously querying many files privately.

  3. Quantum end-to-end. Both databases and communication are quantum.

Notable results:

  • Song-Hayashi 2019: QPIR capacity matches classical PIR in some regimes.
  • Allaix et al. 2022: $T$-colluding QPIR offers a strictly better rate than classical $T$-colluding PIR in some parameter ranges, a quantum advantage.
  • Full quantum-classical PIR capacity gap is open.

The operational implication: for ultra-sensitive retrieval (e.g., state secrets, genomic queries), quantum PIR may offer a better privacy-rate trade-off than classical PIR.


Theorem: Quantum PIR Rate Advantage

For $T$-colluding PIR with $N$ databases and $K$ files, quantum entanglement between databases allows a retrieval rate $Q_{\text{PIR}}(N, K, T) \;\geq\; \frac{N}{N + T}$ (for $T < N/2$). Compare with the classical $C_{\text{PIR}}(N, K, T) \leq 1 - T/N$ for large $K$.

Quantum advantage. Since $N/(N+T) > (N-T)/N$ whenever $T > 0$, we have $Q_{\text{PIR}} > C_{\text{PIR}}$, and the advantage grows with $T$: for $T = N/3$, $Q_{\text{PIR}} = N/(N + N/3) = 3/4$ vs. classical $2/3$, about a $12.5\%$ relative improvement.

Open Frontiers Mapped

| Frontier | Status | Key open problem |
|---|---|---|
| Non-linear coded computing | Approximation-based schemes known | Information-theoretic lower bound |
| Three-axis privacy-robustness-DP | 2-axis results known; 3-axis lower bound | Jointly-optimal achievability |
| Decentralized FL | Gossip D-SGD known | Joint privacy-robustness-DP over graphs |
| Coded attention | Partial; hybrid approximations | End-to-end optimal transformer coding |
| Coded MoE | Open; no framework | Data-dependent workload distribution |
| RAG / federated RAG | Open; PIR connection unexplored | Privacy-rate trade-off for RAG |
| Quantum PIR | Advantage known in some regimes | Full quantum-classical gap |
| Quantum secure aggregation | Open; preliminary schemes | Capacity characterization |

Quantum vs. Classical PIR: The Advantage Region

Visualize the regime where quantum $T$-colluding PIR strictly outperforms classical PIR. Sweep $T/N$ and $N$; observe the quantum advantage growing in certain regimes, then shrinking as $T \to N$. For small $T$, the advantage is negligible; production quantum PIR deployments would be justified only for moderate $T$.
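A minimal numeric version of this sweep, taking the quantum rate as $N/(N+T)$ for $T < N/2$ and the classical large-$K$ capacity as $1 - T/N$; both expressions are assumptions of this sketch, and the choice $N = 12$ is arbitrary.

```python
# Assumed rate expressions for this sketch:
# quantum N/(N+T) (valid for T < N/2), classical 1 - T/N (large K).
N = 12
rows = []
for T in range(1, N // 2):
    q = N / (N + T)   # assumed quantum T-colluding rate
    c = 1 - T / N     # classical large-K capacity under T collusion
    rows.append((T, q, c))
    print(f"T={T:2d}  T/N={T / N:.2f}  Q={q:.3f}  C={c:.3f}  Q/C={q / c:.3f}")
```

The ratio $Q/C = N^2/(N^2 - T^2)$ is near 1 for small $T$ and grows with $T$, matching the qualitative picture above.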


Historical Note: Convergence of Fields

Twenty years ago, information theory, cryptography, wireless communications, and machine learning were largely separate disciplines. This book — secure and distributed computing over wireless networks — is the product of their convergence:

  • Information theory provides the converse bounds (Shannon capacity, mutual information).
  • Cryptography contributes the privacy primitives (secret sharing, commitments).
  • Wireless communications supplies the channel models and AirComp.
  • Machine learning motivates the computational workloads (gradient descent, representation learning).

The five CommIT contributions in this book span all four fields: coded shuffling (wireless + ML), SecAgg (crypto + wireless + ML), ByzSecAgg (coding + wireless + ML), CCESA (graph theory + wireless + crypto), IT-secure FRL (information theory + wireless + ML). Each closes a gap; each opens new questions.

The open problems in this chapter — non-linear coded computing, joint privacy-robustness-DP, decentralized FL, modern-ML coding, quantum PIR — are joint problems across these fields. The next generation of results will come from readers who bring all four fluencies together.


Key Takeaway

The research frontier spans modern ML and quantum information theory. Coded attention, coded MoE, and federated RAG are bridges between this book's framework and the current generative-AI landscape. Quantum PIR offers rate advantages in specific regimes. Each direction involves joint information-theoretic, cryptographic, and ML work, precisely the convergence that produced this book. The next chapter (for the reader who goes beyond this book) is an open research program.

Closing the Book

This book developed the information-theoretic foundations of secure and distributed computing over wireless networks. The golden thread — privacy, robustness, and communication efficiency as fundamentally coupled — ran through each chapter:

  • Part I: The three-way coupling in the distributed-computing problem statement and its information-theoretic formulation.
  • Part II: Coded computing trades storage (computation load) for communication (straggler tolerance).
  • Part III: Secure aggregation trades cryptographic overhead for privacy guarantees.
  • Part IV: Private information retrieval trades database replication for retrieval rate.
  • Part V: Wireless aggregation trades synchronization for bandwidth efficiency.

The five CommIT contributions mark the current research frontier; the open problems of this chapter mark where the next generation of results will appear. Information theory provides the converses; engineering gives the achievability; wireless systems bring the reality check; machine learning provides the application pressure.

The next reader — you — is invited to continue the development. There is no finite curriculum for wireless distributed computing; there are only new chapters.

Quick Check

Which of the following is NOT an open direction listed in this chapter?

  • Non-linear coded computing with an information-theoretic recovery-threshold lower bound.

  • Joint privacy-robustness-DP achievability.

  • Quantum PIR with classical queries.

  • Classical Sun-Jafar PIR capacity for non-colluding databases.