The Bonawitz Pairwise-Masking Protocol
Pairwise Masks That Cancel in the Aggregate
Section 10.1 established that secure aggregation requires masking each user's gradient before transmission. The question is: which masks? A naive approach, in which each user $k$ adds independent random noise $\mathbf{n}_k$ to its gradient $\mathbf{g}_k$, destroys the aggregate: the server gets $\sum_k (\mathbf{g}_k + \mathbf{n}_k)$, not $\sum_k \mathbf{g}_k$. A correct approach must arrange that the masks cancel when the server sums across users.
Bonawitz et al.'s key insight: for every pair of users $(i, j)$, add a common pairwise mask to one upload and subtract it from the other. Across the pair, the mask vanishes. The server, summing over all users, gets back the aggregate exactly. Each user's upload looks uniformly random to the server; no individual gradient is distinguishable. This is the pairwise-masking construction, and it is the foundation of every modern secure-aggregation protocol.
The point is that this is the -additive secret-sharing primitive from Chapter 3 §3.1, now applied to high-dimensional gradient vectors. The construction is simple; the subtlety is in efficient key distribution (via Diffie–Hellman) and dropout handling (via Shamir shares of the seeds). Section 10.2 develops the construction; §10.3 handles dropouts; §10.4 proves optimality.
Definition: Pairwise-Masking Secure Aggregation
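The pairwise-masking secure-aggregation protocol for $K$ users and one server operates over a finite field $\mathbb{F}_q$ (typically for cryptographic strength). For every ordered pair $(i, j)$ with $i < j$, the users share a pseudorandom mask seed $\mathbf{s}_{ij}$ generated via a Diffie–Hellman key exchange. We set $\mathbf{s}_{ji} = -\mathbf{s}_{ij}$ (antisymmetric).
Each user $k$ uploads
$$\mathbf{y}_k = \mathbf{g}_k + \sum_{j \neq k} \mathbf{s}_{kj},$$
where $\mathbf{g}_k$ is the user's gradient and $\mathbf{s}_{kj}$ is the mask shared with user $j$. The server aggregates:
$$\sum_{k=1}^{K} \mathbf{y}_k = \sum_{k=1}^{K} \mathbf{g}_k + \sum_{k=1}^{K} \sum_{j \neq k} \mathbf{s}_{kj} = \sum_{k=1}^{K} \mathbf{g}_k.$$
The masks cancel in pairs (antisymmetry), so the server receives the aggregate exactly. Each individual upload is uniform over $\mathbb{F}_q^d$ and reveals nothing about $\mathbf{g}_k$ to any coalition of size (the mask-seeds are shared only between the two users involved, not with the server).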
The construction is a direct generalization of the -additive secret-sharing scheme from Chapter 3: each user's gradient is split into a "masked part" that the server sees and "mask contributions" that cancel in the aggregate. The protocol's privacy against the server depends on the users' mutual independence (via DH key exchange); privacy against colluding users requires additional machinery (§10.3's Shamir-shared seeds).
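The construction above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the protocol's reference implementation: the modulus `Q`, the helper names `pairwise_masks`, `masked_upload`, and `aggregate`, and the toy gradients are all hypothetical, and the masks come from Python's non-cryptographic `random` module rather than from Diffie–Hellman-derived seeds.

```python
import random

Q = 2**31 - 1  # illustrative prime modulus (assumption, not from the text)


def pairwise_masks(num_users, dim, rng):
    """Return masks[i][j] with masks[j][i] == -masks[i][j] (mod Q)."""
    masks = [[[0] * dim for _ in range(num_users)] for _ in range(num_users)]
    for i in range(num_users):
        for j in range(i + 1, num_users):
            s = [rng.randrange(Q) for _ in range(dim)]
            masks[i][j] = s                        # user i adds s
            masks[j][i] = [(-v) % Q for v in s]    # user j adds -s
    return masks


def masked_upload(k, gradient, masks):
    """Upload of user k: gradient plus all of k's pairwise masks (mod Q)."""
    out = list(gradient)
    for j, mask in enumerate(masks[k]):
        if j != k:
            out = [(a + b) % Q for a, b in zip(out, mask)]
    return out


def aggregate(vectors):
    """Server-side coordinate-wise sum (mod Q)."""
    return [sum(col) % Q for col in zip(*vectors)]


rng = random.Random(0)  # NOT cryptographically secure; for illustration only
gradients = [[3, 1, 4], [1, 5, 9], [2, 6, 5], [3, 5, 8]]
masks = pairwise_masks(len(gradients), 3, rng)
uploads = [masked_upload(k, g, masks) for k, g in enumerate(gradients)]
result = aggregate(uploads)      # what the server computes
expected = aggregate(gradients)  # the true aggregate
```

Here `result` equals `expected` because every mask appears once with each sign, while each entry of `uploads` on its own is shifted by masks the server never sees.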
Pairwise Masking
The secure-aggregation primitive in which each pair of users shares a random mask, one user adds it to their upload and the other subtracts. Pairwise masks cancel when summed over all users, revealing only the aggregate to the server.
Mask Seed
A pseudorandom vector $\mathbf{s}_{ij}$ shared between a pair of users $(i, j)$, typically derived from a Diffie–Hellman key exchange. The pair's contributions are $+\mathbf{s}_{ij}$ at user $i$ and $-\mathbf{s}_{ij}$ at user $j$; the masks cancel in the aggregate.
Bonawitz et al. Secure Aggregation (Basic Version)
Complexity: Per-round communication per user: $K-1$ DH exchanges + $d$ scalars uplink. Aggregate per-round: $O(K^2)$ messages for DH + $O(Kd)$ uplink. The basic version has a strict honest-but-curious assumption. Any user-dropout handling requires additional machinery, which is the topic of §10.3.
Theorem: Bonawitz Protocol: Correctness and Privacy
Assume the Diffie–Hellman problem is computationally hard and the pseudorandom generator used for mask derivation is a secure PRG (standard cryptographic assumptions). Then the Bonawitz secure-aggregation protocol satisfies:
- Correctness. The server receives the aggregate $\sum_k \mathbf{g}_k$ exactly.
- Privacy (against honest-but-curious server). Given the uploads $\mathbf{y}_1, \dots, \mathbf{y}_K$ and the server's protocol-specified view, the joint distribution of $(\mathbf{g}_1, \dots, \mathbf{g}_K)$ given the server's view is computationally indistinguishable from the uniform distribution conditioned on $\sum_k \mathbf{g}_k$.
- Privacy (against the server + up to colluding users, informal). A similar guarantee holds against any coalition of users; the precise bound is established with Shamir shares of the seeds (§10.3).
The protocol's privacy is computational, not information-theoretic, because it relies on DH and PRG hardness. In an information-theoretic model with uniform random mask seeds shared out of band, the privacy is IT-tight.
The key property: each user's upload is a sum of the gradient plus pairwise masks. Of these, at most are "unknown" to the server (no seeds shared with server). The mask sum is uniform modulo the pairwise-cancellation constraint, so the upload is indistinguishable from uniform to the server — except for the aggregate constraint, which is exactly the allowed leakage.
Operationally: Bonawitz's protocol reveals the aggregate and only the aggregate to the server, achieving the threat-model goal of §10.1.
Correctness via mask-cancellation
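$\sum_k \mathbf{y}_k = \sum_k \mathbf{g}_k + \sum_k \sum_{j \neq k} \mathbf{s}_{kj}$. By antisymmetry $\mathbf{s}_{ji} = -\mathbf{s}_{ij}$, every pair $\{i, j\}$ contributes $\mathbf{s}_{ij} + \mathbf{s}_{ji} = \mathbf{0}$. Sum over pairs: zero. Hence $\sum_k \mathbf{y}_k = \sum_k \mathbf{g}_k$.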
Privacy: the ideal case
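In an ideal (information-theoretic) version, the pairwise seeds $\mathbf{s}_{ij}$ are drawn independently and uniformly over $\mathbb{F}_q^d$. The server sees uploads $\mathbf{y}_1, \dots, \mathbf{y}_K$. Fixing the aggregate $\mathbf{a} = \sum_k \mathbf{g}_k$, the joint distribution of $(\mathbf{y}_1, \dots, \mathbf{y}_K)$ given $\mathbf{a}$ is uniform over $\{(\mathbf{y}_1, \dots, \mathbf{y}_K) : \sum_k \mathbf{y}_k = \mathbf{a}\}$. This reveals nothing about individual $\mathbf{g}_k$ beyond the aggregate constraint.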
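The ideal-case claim can be checked by brute force on a toy instance. The sketch below assumes three users, scalar gradients, and a tiny modulus `Q = 5` (all hypothetical choices made only so that every mask assignment can be enumerated); the function name `upload_distribution` is likewise illustrative.

```python
from collections import Counter
from itertools import product

Q = 5  # tiny modulus so all Q**3 mask choices can be enumerated


def upload_distribution(gradients):
    """Count how often each upload triple occurs over all mask choices."""
    g1, g2, g3 = gradients
    counts = Counter()
    for s12, s13, s23 in product(range(Q), repeat=3):
        y1 = (g1 + s12 + s13) % Q   # user 1 adds both of its masks
        y2 = (g2 - s12 + s23) % Q   # user 2 subtracts s12, adds s23
        y3 = (g3 - s13 - s23) % Q   # user 3 subtracts both of its masks
        counts[(y1, y2, y3)] += 1
    return counts


dist_a = upload_distribution((1, 2, 3))  # aggregate is 6 mod 5 = 1
dist_b = upload_distribution((4, 0, 2))  # different gradients, same aggregate
```

The two counters are identical, and each covers every triple whose entries sum to the aggregate equally often, so in this toy setting the uploads carry no information about the individual gradients beyond their sum.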
Reduction to DH / PRG
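In the Bonawitz protocol, the seeds are not uniform random but derived from DH via a PRG. Under the DDH assumption (Decisional Diffie–Hellman), each DH-derived key is computationally indistinguishable from a uniform key, and PRG security carries this over to the expanded mask, so the PRG output is computationally indistinguishable from uniform. Hence the server's view in the protocol is computationally indistinguishable from its view in the ideal case, which gives the claimed guarantee.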
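The seed-derivation step can be sketched as follows. Everything concrete here is an assumption for illustration: the toy group (`P`, `G`) is far too small for real security, SHA-256 in counter mode stands in for whatever PRG a deployment uses, and the names `dh_keypair`, `shared_seed`, and `expand_mask` are hypothetical.

```python
import hashlib
import secrets

P = 2**127 - 1  # a Mersenne prime; toy group, NOT secure for real use
G = 3           # illustrative base (assumption)
Q = 2**31 - 1   # illustrative modulus for mask entries (assumption)


def dh_keypair():
    """Return (private exponent, public value G^private mod P)."""
    private = secrets.randbelow(P - 2) + 1
    return private, pow(G, private, P)


def shared_seed(my_private, their_public):
    """Both users of a pair derive the same 32-byte seed."""
    shared = pow(their_public, my_private, P)
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()


def expand_mask(seed, dim):
    """PRG stand-in: SHA-256 in counter mode, reduced mod Q.

    The reduction mod Q is slightly biased; acceptable for a sketch only.
    """
    out = []
    for counter in range(dim):
        block = hashlib.sha256(seed + counter.to_bytes(4, "big")).digest()
        out.append(int.from_bytes(block[:8], "big") % Q)
    return out


priv_i, pub_i = dh_keypair()
priv_j, pub_j = dh_keypair()
mask_at_i = expand_mask(shared_seed(priv_i, pub_j), 4)   # user i adds this
mask_at_j = expand_mask(shared_seed(priv_j, pub_i), 4)   # same vector at j
minus_mask_at_j = [(-v) % Q for v in mask_at_j]          # user j subtracts it
```

Only public values cross the network; both users arrive at the same mask vector locally, and the server, which sees neither private exponent, cannot recompute it under the stated hardness assumptions.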
Example: Bonawitz Protocol with $K = 4$ Users
Illustrate the Bonawitz protocol with $K = 4$ users. Specify the pairwise mask structure, each user's upload, and verify that the server's aggregate matches the true sum.
Pairwise mask structure
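Six pairs: $(1,2), (1,3), (1,4), (2,3), (2,4), (3,4)$. Seeds $\mathbf{s}_{ij}$ shared via DH. Antisymmetry: $\mathbf{s}_{ji} = -\mathbf{s}_{ij}$.
User uploads
User 1: $\mathbf{y}_1 = \mathbf{g}_1 + \mathbf{s}_{12} + \mathbf{s}_{13} + \mathbf{s}_{14}$. User 2: $\mathbf{y}_2 = \mathbf{g}_2 - \mathbf{s}_{12} + \mathbf{s}_{23} + \mathbf{s}_{24}$. User 3: $\mathbf{y}_3 = \mathbf{g}_3 - \mathbf{s}_{13} - \mathbf{s}_{23} + \mathbf{s}_{34}$. User 4: $\mathbf{y}_4 = \mathbf{g}_4 - \mathbf{s}_{14} - \mathbf{s}_{24} - \mathbf{s}_{34}$.
Server aggregates
$\mathbf{y}_1 + \mathbf{y}_2 + \mathbf{y}_3 + \mathbf{y}_4 = \mathbf{g}_1 + \mathbf{g}_2 + \mathbf{g}_3 + \mathbf{g}_4$. All pairwise masks cancel. ✓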
Privacy check
Server sees 4 uploads. Each is the sum of the user's gradient $\mathbf{g}_k$ and 3 independent pairwise masks. Since the masks are uniform (via DH+PRG), each upload is uniform modulo the aggregate constraint. The server learns the aggregate $\sum_k \mathbf{g}_k$ and cannot distinguish individual gradients.
The Communication Overhead
The Bonawitz protocol requires each user to share DH keys and derive mask seeds with every other user — DH exchanges per user, total. For moderate (say, ), this is DH computations per round, each taking a few milliseconds — manageable but not small. For large (), the overhead becomes prohibitive: DH exchanges per round, several minutes of per-round DH overhead on commodity hardware.
This scaling is the structural cost of pairwise masking. CCESA (Chapter 12, another CommIT contribution) reduces it to by using a sparse random graph of pairwise masks instead of the complete graph. §10.4 (Caire et al.) proves that the bound is tight within the class of uncoded groupwise-key schemes; CCESA's improvement comes from leaving this class.
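The quadratic growth of the key-exchange cost can be made concrete with a small calculation. This is a back-of-the-envelope sketch: the function name `dh_overhead` is hypothetical, the per-exchange time passed in is a placeholder rather than a measured figure, and the model counts one key agreement per peer and ignores networking and batching.

```python
def dh_overhead(num_users, seconds_per_exchange):
    """Count pairwise key agreements for a complete mask graph."""
    per_user = num_users - 1                       # one exchange per peer
    pairs = num_users * (num_users - 1) // 2       # distinct pairwise keys
    return {
        "per_user_exchanges": per_user,
        "pairwise_keys": pairs,
        "per_user_seconds": per_user * seconds_per_exchange,
    }


# 0.001 s per exchange is an illustrative placeholder, not a benchmark.
for k in (10, 100, 1000):
    print(k, dh_overhead(k, 0.001))
```

Multiplying the number of users by ten multiplies the number of pairwise keys by roughly one hundred, which is the scaling the paragraph above describes.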
Bonawitz Protocol Communication Overhead
[Interactive plot: per-round communication overhead of Bonawitz's protocol as a function of the number of users $K$, for several model sizes $d$. Controls: range of users to plot, number of parameters.] The total communication scales as $O(K^2)$ for the pairwise DH exchanges plus $O(Kd)$ for the gradient uploads. The $O(K^2)$ scaling dominates for small models and large user populations.
Pairwise Masking: Masks That Cancel
Bonawitz in Google Gboard
The Bonawitz protocol is deployed in Google's Gboard federated-learning system. Production specifics:
- User count per round: (constrained by overhead).
- DH exchange: P-256 elliptic curve, s per exchange.
- Mask derivation: AES-based PRG with 128-bit seed, MB/s per-device throughput.
- Per-round latency: 5–20 seconds for mask generation + upload.
- Dropout handling: Shamir secret sharing of seeds (§10.3), threshold .
The ceiling reflects the overhead. For larger populations, CCESA (Chapter 12) is the standard alternative.
- Production : up to users per round
- DH exchange cost: s on mobile
- Per-round overhead: 5–20 seconds added to SGD round
- Threshold: for dropout tolerance
Common Mistake: Masks Must Cancel, Not Just Be Independent
Mistake:
Use independent random masks per user without the pairwise-cancellation structure.
Correction:
Adding independent random noise $\mathbf{n}_k$ per user gives $\sum_k (\mathbf{g}_k + \mathbf{n}_k) = \sum_k \mathbf{g}_k + \sum_k \mathbf{n}_k$: the noise accumulates, and the server gets a noisy aggregate, not the true one. The pairwise-masking construction ensures exact cancellation. This is a common confusion when students first encounter the protocol; remember that the antisymmetry is structural, not incidental.
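The difference between the two approaches shows up directly in a small numerical sketch. The modulus `Q`, the scalar gradients, and the use of Python's non-cryptographic `random` module are assumptions made only for illustration.

```python
import random

Q = 2**31 - 1            # illustrative modulus (assumption)
rng = random.Random(1)   # NOT cryptographically secure; illustration only
gradients = [10, 20, 30]
true_sum = sum(gradients) % Q

# Mistake: each user adds its own independent noise.
noise = [rng.randrange(Q) for _ in gradients]
independent_sum = sum(g + n for g, n in zip(gradients, noise)) % Q

# Correction: one mask per pair, added by one user and subtracted by the other.
s12, s13, s23 = (rng.randrange(Q) for _ in range(3))
uploads = [
    (gradients[0] + s12 + s13) % Q,
    (gradients[1] - s12 + s23) % Q,
    (gradients[2] - s13 - s23) % Q,
]
pairwise_sum = sum(uploads) % Q
```

With independent noise the server's sum is off by exactly the accumulated noise, which it has no way to remove; with pairwise masks the offset is zero by construction.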
Historical Note: The Bonawitz et al. Protocol
2017–present
Kallista Bonawitz, Vladimir Ivanov, Ben Kreuter, Antonio Marcedone, H. Brendan McMahan, Sarvar Patel, Daniel Ramage, Aaron Segal, and Karn Seth introduced the pairwise-masking secure-aggregation protocol at ACM CCS 2017. The paper combined several ideas from cryptography, namely Diffie–Hellman key exchange, Shamir secret sharing (Chapter 3), and pseudorandom generators, into a single practical FL protocol. The work was motivated directly by the production need at Google Gboard: millions of users, but strong privacy guarantees.
The protocol has since become the de facto standard for production secure aggregation. Extensions and improvements — including the CommIT-group contributions of §10.4 (Caire et al. optimality), Chapter 11 (ByzSecAgg), and Chapter 12 (CCESA) — all build on the Bonawitz foundation.
Key Takeaway
Pairwise masking achieves — only the aggregate leaks. The -additive secret-sharing primitive of Chapter 3 is applied at high dimension via Diffie–Hellman-derived pairwise seeds. The cost is key exchanges per round. Section 10.3 handles dropouts; §10.4 proves the communication cost is information-theoretically tight.
Quick Check
In the Bonawitz protocol, why do the pairwise masks cancel when the server aggregates?
Because the server subtracts them explicitly after receiving all uploads.
By antisymmetry: user $i$ adds $+\mathbf{s}_{ij}$ and user $j$ adds $-\mathbf{s}_{ij}$, so the pair sums to zero.
The masks are independent random variables, so their sum concentrates around zero by the law of large numbers.
Because they are derived from the DH key exchange, which guarantees cancellation.
Exactly: each pair contributes $\mathbf{s}_{ij} + \mathbf{s}_{ji} = \mathbf{0}$, and summing over all pairs gives zero. The server sees only the true aggregate.