The Bonawitz Pairwise-Masking Protocol

Pairwise Masks That Cancel in the Aggregate

Section 10.1 established that secure aggregation requires masking each user's gradient before transmission. The question is: which masks? A naive approach — each user adds independent random noise — destroys the aggregate (the server gets G+sum of independent noise\mathbf{G} + \text{sum of independent noise}, not G\mathbf{G}). A correct approach must arrange that the masks cancel when the server sums across users.

Bonawitz et al.'s key insight: for every pair of users (i,j)(i, j), add a common pairwise mask rij\mathbf{r}_{ij} to one and subtract it from the other. Across the pair, the mask vanishes. The server, summing over all users, gets back the aggregate exactly. Each user's upload looks uniform random to the server; no individual gradient is distinguishable. This is the pairwise-masking construction, and it is the foundation of every modern secure-aggregation protocol.

The point is that this is the (n,n)(n, n)-additive secret-sharing primitive from Chapter 3 §3.1, now applied to high-dimensional gradient vectors. The construction is simple; the subtlety is in efficient key distribution (via Diffie–Hellman) and dropout handling (via Shamir shares of the seeds). Section 10.2 develops the construction; §10.3 handles dropouts; §10.4 proves optimality.

Definition:

Pairwise-Masking Secure Aggregation

The pairwise-masking secure-aggregation protocol for nn users and one server operates over a finite field Fq\mathbb{F}_q (typically q=2256q = 2^{256} for cryptographic strength). For every ordered pair (i,j)(i, j) with i<ji < j, the users share a pseudorandom mask seed rijFqd\mathbf{r}_{ij} \in \mathbb{F}_q^d generated via a Diffie–Hellman key exchange. We set rji=rij\mathbf{r}_{ji} = -\mathbf{r}_{ij} (antisymmetric).

Each user kk uploads g~k  =  gk  +  jkrkj    Fqd.\tilde{\mathbf{g}}_k \;=\; \mathbf{g}_k \;+\; \sum_{j \neq k} \mathbf{r}_{kj} \;\in\; \mathbb{F}_q^d. The server aggregates: k=1ng~k  =  kgk  +  kjkrkj  =  G  +  i<j(rij+rji)  =  G  +  0  =  G.\sum_{k=1}^n \tilde{\mathbf{g}}_k \;=\; \sum_k \mathbf{g}_k \;+\; \sum_k \sum_{j \neq k} \mathbf{r}_{kj} \;=\; \mathbf{G} \;+\; \sum_{i < j} (\mathbf{r}_{ij} + \mathbf{r}_{ji}) \;=\; \mathbf{G} \;+\; \mathbf{0} \;=\; \mathbf{G}. The masks cancel in pairs (antisymmetry), so the server receives the aggregate exactly. Each individual upload g~k\tilde{\mathbf{g}}_k is uniform over Fqd\mathbb{F}_q^d and reveals nothing about gk\mathbf{g}_k to any coalition of size n1\leq n - 1 (the mask-seeds are shared only between the two users involved, not with the server).

The construction is a direct generalization of the (n,n)(n, n)-additive secret-sharing scheme from Chapter 3: each user's gradient is split into a "masked part" that the server sees and "mask contributions" that cancel in the aggregate. The protocol's privacy against the server depends on the users' mutual independence (via DH key exchange); privacy against colluding users requires additional machinery (§10.3's Shamir-shared seeds).

Pairwise Masking

The secure-aggregation primitive in which each pair of users shares a random mask, one user adds it to their upload and the other subtracts. Pairwise masks cancel when summed over all users, revealing only the aggregate to the server.

Mask Seed rij\mathbf{r}_{ij}

A pseudorandom vector shared between a pair of users (i,j)(i, j), typically derived from a Diffie–Hellman key exchange. The pair's outputs are +mathbfrij+\\mathbf{r}_{ij} at user ii and mathbfrij-\\mathbf{r}_{ij} at user jj; the masks cancel in the aggregate.

Bonawitz et al. Secure Aggregation (Basic Version)

Complexity: Per-round communication per user: O(n)O(n) DH exchanges + dd scalars uplink. Aggregate per-round: O(n2)O(n^2) messages for DH + O(nd)O(n d) uplink.
Setup (one-time): Public parameters (g,p)(g, p) for
Diffie–Hellman over Fp\mathbb{F}_p^*. Each user kk
generates a DH key pair (skk,pkk)(\text{sk}_k, \text{pk}_k).
Round tt:
1. Public-key distribution. Users broadcast their
public keys {pkk}\{\text{pk}_k\} to the server. Server
forwards the full key list to all users.
2. Pairwise seed derivation. Each user kk, for each
other user jj, computes the shared DH secret
skj=HKDF(pkjskk)s_{kj} = \text{HKDF}(\text{pk}_j^{\text{sk}_k}) and
derives the mask seed
rkjFqd\mathbf{r}_{kj} \in \mathbb{F}_q^d via a PRG
seeded by skjs_{kj}. Set
rkj=rjk\mathbf{r}_{kj} = -\mathbf{r}_{jk} by sign
convention (e.g., ++ if k<jk < j, - otherwise).
3. Masking. User kk computes
g~k=gk+jksign(k,j)rkj\tilde{\mathbf{g}}_k = \mathbf{g}_k + \sum_{j \neq k} \text{sign}(k, j) \mathbf{r}_{kj}.
4. Upload. User kk sends g~k\tilde{\mathbf{g}}_k to
the server.
5. Aggregation. Server computes
G=kg~k\mathbf{G} = \sum_k \tilde{\mathbf{g}}_k.
6. Verification (optional). Server returns
G\mathbf{G} to users, who may verify via local
signed hash checks.

The basic version has a strict honest-but-curious assumption. Any user-dropout handling requires additional machinery — the topic of §10.3.

Theorem: Bonawitz Protocol: Correctness and Privacy

Assume the Diffie–Hellman problem is computationally hard and the pseudorandom generator used for mask derivation is a secure PRG (standard cryptographic assumptions). Then the Bonawitz secure-aggregation protocol satisfies:

  1. Correctness. The server receives the aggregate G=kgk\mathbf{G} = \sum_k \mathbf{g}_k exactly.

  2. Privacy (against honest-but-curious server). Given the uploads {g~k}\{\tilde{\mathbf{g}}_k\} and the server's protocol-specified view, the joint distribution of (g1,,gn)(\mathbf{g}_1, \ldots, \mathbf{g}_n) given the server's view is computationally indistinguishable from the uniform distribution conditioned on kgk=G\sum_k \mathbf{g}_k = \mathbf{G}.

  3. Privacy (against the server + up to Tn1T \leq n - 1 colluding users, informal). Similar guarantee holds against any coalition of Tn1T \leq n - 1 users; the precise bound is established with Shamir shares of the seeds (§10.3).

The protocol's privacy is computational, not information-theoretic, because it relies on DH and PRG hardness. In an information-theoretic model with uniform random mask seeds shared out of band, the privacy is IT-tight.

The key property: each user's upload is a sum of the gradient plus n1n - 1 pairwise masks. Of these, at most n1n - 1 are "unknown" to the server (no seeds shared with server). The mask sum is uniform modulo the pairwise-cancellation constraint, so the upload is indistinguishable from uniform to the server — except for the aggregate constraint, which is exactly the allowed leakage.

Operationally: Bonawitz's protocol reveals the aggregate and only the aggregate to the server, achieving the threat-model goal of §10.1.

Example: Bonawitz Protocol with n=4n = 4 Users

Illustrate the Bonawitz protocol with n=4n = 4 users. Specify the pairwise mask structure, each user's upload, and verify that the server's aggregate matches the true sum.

The O(n2)O(n^2) Communication Overhead

The Bonawitz protocol requires each user to share DH keys and derive mask seeds with every other userO(n)O(n) DH exchanges per user, O(n2)O(n^2) total. For moderate nn (say, n=100n = 100), this is 10410^4 DH computations per round, each taking a few milliseconds — manageable but not small. For large nn (n104n \geq 10^4), the overhead becomes prohibitive: 10810^8 DH exchanges per round, several minutes of per-round DH overhead on commodity hardware.

This O(n2)O(n^2) scaling is the structural cost of pairwise masking. CCESA (Chapter 12, another CommIT contribution) reduces it to O(nn/logn)O(n\sqrt{n/\log n}) by using a sparse random graph of pairwise masks instead of the complete graph. §10.4 (Caire et al.) proves that the O(n2)O(n^2) bound is tight within the class of uncoded groupwise-key schemes; CCESA's improvement comes from leaving this class.

Bonawitz Protocol Communication Overhead

Plot the per-round communication overhead of Bonawitz's protocol as a function of the number of users nn, for several model sizes dd. The total communication scales as O(n2)O(n^2) for the pairwise DH exchanges + O(nd)O(n d) for the gradient uploads. The n2n^2 scaling dominates for small models and large user populations.

Parameters
500

Range of users to plot

10_000_000

Number of parameters

Pairwise Masking: Masks That Cancel

Animation of the pairwise-masking construction: each pair of users adds and subtracts a common random mask. The masks visibly cancel when the server sums all uploads, revealing only the aggregate gradient.
⚠️Engineering Note

Bonawitz in Google Gboard

The Bonawitz protocol is deployed in Google's Gboard federated-learning system. Production specifics:

  • User count per round: n[50,500]n \in [50, 500] (constrained by O(n2)O(n^2) overhead).
  • DH exchange: P-256 elliptic curve, 100μ\sim 100 \mus per exchange.
  • Mask derivation: AES-based PRG with 128-bit seed, 10\sim 10 MB/s per-device throughput.
  • Per-round latency: 5\sim 52020 seconds for mask generation + upload.
  • Dropout handling: Shamir secret sharing of seeds (§10.3), threshold t=2n/3t = \lceil 2n/3\rceil.

The n500n \leq 500 ceiling reflects the O(n2)O(n^2) overhead. For larger populations, CCESA (Chapter 12) is the standard alternative.

Practical Constraints
  • Production nn: up to 500\sim 500 users per round

  • DH exchange cost: 100μ100\mus on mobile

  • Per-round overhead: 5–20 seconds added to SGD round

  • Threshold: t=2n/3t = \lceil 2n/3 \rceil for dropout tolerance

📋 Ref: Google Gboard FL deployment; Bonawitz et al. 2019

Common Mistake: Masks Must Cancel, Not Just Be Independent

Mistake:

Use independent random masks per user without the pairwise-cancellation structure.

Correction:

Adding independent random noise per user gives kg~k=G+kηk\sum_k \tilde{\mathbf{g}}_k = \mathbf{G} + \sum_k \boldsymbol{\eta}_k — the noise accumulates, and the server gets a noisy aggregate, not the true one. The pairwise-masking construction ensures exact cancellation. This is a common confusion when students first encounter the protocol; remember that the antisymmetry rkj=rjk\mathbf{r}_{kj} = -\mathbf{r}_{jk} is structural, not incidental.

Historical Note: The Bonawitz et al. Protocol

2017–present

Kallista Bonawitz, Vladimir Ivanov, Ben Kreuter, Antonio Marcedone, H. Brendan McMahan, Sarvar Patel, Daniel Ramage, Aaron Segal, and Karn Seth introduced the pairwise-masking secure-aggregation protocol at ACM CCS 2017. The paper combined several ideas from cryptography — Diffie–Hellman key exchange, Shamir secret sharing (Chapter 3), pseudorandom generators — into a single practical FL protocol. The work was motivated directly by the production need at Google Gboard: millions of users, but strong privacy guarantees.

The protocol has since become the de facto standard for production secure aggregation. Extensions and improvements — including the CommIT-group contributions of §10.4 (Caire et al. optimality), Chapter 11 (ByzSecAgg), and Chapter 12 (CCESA) — all build on the Bonawitz foundation.

Key Takeaway

Pairwise masking achieves I(gk;server view)=I(gk;G)I(\mathbf{g}_k; \text{server view}) = I(\mathbf{g}_k; \mathbf{G}) — only the aggregate leaks. The (n,n)(n, n)-additive secret-sharing primitive of Chapter 3 is applied at high dimension via Diffie–Hellman-derived pairwise seeds. The cost is O(n2)O(n^2) key exchanges per round. Section 10.3 handles dropouts; §10.4 proves the communication cost is information-theoretically tight.

Quick Check

In the Bonawitz protocol, why do the pairwise masks rij\mathbf{r}_{ij} cancel when the server aggregates?

Because the server subtracts them explicitly after receiving all uploads.

By antisymmetry: user ii adds +mathbfrij+\\mathbf{r}_{ij} and user jj adds mathbfrij-\\mathbf{r}_{ij}, so the pair sums to zero.

The masks are independent random variables, so their sum concentrates around zero by the law of large numbers.

Because they are derived from the DH key exchange, which guarantees cancellation.