Ferkans — Interactive Telecom Tutor

Pairwise Masks That Cancel in the Aggregate

Section 10.1 established that secure aggregation requires masking each user's gradient before transmission. The question is: which masks? A naive approach — each user adds independent random noise — destroys the aggregate (the server gets $\mathbf{G} + \text{sum of independent noise}$ , not $\mathbf{G}$ ). A correct approach must arrange that the masks cancel when the server sums across users.

Bonawitz et al.'s key insight: for every pair of users $(i, j)$ , add a common pairwise mask $\mathbf{r}_{ij}$ to one and subtract it from the other. Across the pair, the mask vanishes. The server, summing over all users, gets back the aggregate exactly. Each user's upload looks uniform random to the server; no individual gradient is distinguishable. This is the pairwise-masking construction, and it is the foundation of every modern secure-aggregation protocol.

The point is that this is the $(n, n)$ -additive secret-sharing primitive from Chapter 3 §3.1, now applied to high-dimensional gradient vectors. The construction is simple; the subtlety is in efficient key distribution (via Diffie–Hellman) and dropout handling (via Shamir shares of the seeds). Section 10.2 develops the construction; §10.3 handles dropouts; §10.4 proves optimality.

Definition:
Pairwise-Masking Secure Aggregation

The pairwise-masking secure-aggregation protocol for $n$ users and one server operates over a finite field $\mathbb{F}_q$ (typically $q = 2^{256}$ for cryptographic strength). For every ordered pair $(i, j)$ with $i < j$ , the users share a pseudorandom mask seed $\mathbf{r}_{ij} \in \mathbb{F}_q^d$ generated via a Diffie–Hellman key exchange. We set $\mathbf{r}_{ji} = -\mathbf{r}_{ij}$ (antisymmetric).

Each user $k$ uploads $\tilde{\mathbf{g}}_k \;=\; \mathbf{g}_k \;+\; \sum_{j \neq k} \mathbf{r}_{kj} \;\in\; \mathbb{F}_q^d.$ The server aggregates: $\sum_{k=1}^n \tilde{\mathbf{g}}_k \;=\; \sum_k \mathbf{g}_k \;+\; \sum_k \sum_{j \neq k} \mathbf{r}_{kj} \;=\; \mathbf{G} \;+\; \sum_{i < j} (\mathbf{r}_{ij} + \mathbf{r}_{ji}) \;=\; \mathbf{G} \;+\; \mathbf{0} \;=\; \mathbf{G}.$ The masks cancel in pairs (antisymmetry), so the server receives the aggregate exactly. Each individual upload $\tilde{\mathbf{g}}_k$ is uniform over $\mathbb{F}_q^d$ and reveals nothing about $\mathbf{g}_k$ to any coalition of size $\leq n - 1$ (the mask-seeds are shared only between the two users involved, not with the server).

The construction is a direct generalization of the $(n, n)$ -additive secret-sharing scheme from Chapter 3: each user's gradient is split into a "masked part" that the server sees and "mask contributions" that cancel in the aggregate. The protocol's privacy against the server depends on the users' mutual independence (via DH key exchange); privacy against colluding users requires additional machinery (§10.3's Shamir-shared seeds).

Pairwise Masking

The secure-aggregation primitive in which each pair of users shares a random mask, one user adds it to their upload and the other subtracts. Pairwise masks cancel when summed over all users, revealing only the aggregate to the server.

Mask Seed $\mathbf{r}_{ij}$

A pseudorandom vector shared between a pair of users $(i, j)$ , typically derived from a Diffie–Hellman key exchange. The pair's outputs are $+\\mathbf{r}_{ij}$ at user $i$ and $-\\mathbf{r}_{ij}$ at user $j$ ; the masks cancel in the aggregate.

Bonawitz et al. Secure Aggregation (Basic Version)

Complexity: Per-round communication per user:

O(n)

DH exchanges +

d

scalars uplink. Aggregate per-round:

O(n^2)

messages for DH +

O(n d)

uplink.

Setup (one-time): Public parameters

(g, p)

for

Diffie–Hellman over

\mathbb{F}_p^*

. Each user

k

generates a DH key pair

(\text{sk}_k, \text{pk}_k)

.

Round $t$ :

1. Public-key distribution. Users broadcast their

public keys

\{\text{pk}_k\}

to the server. Server

forwards the full key list to all users.

2. Pairwise seed derivation. Each user

k

, for each

other user

j

, computes the shared DH secret

s_{kj} = \text{HKDF}(\text{pk}_j^{\text{sk}_k})

and

derives the mask seed

\mathbf{r}_{kj} \in \mathbb{F}_q^d

via a PRG

seeded by

s_{kj}

. Set

\mathbf{r}_{kj} = -\mathbf{r}_{jk}

by sign

convention (e.g.,

+

if

k < j

,

-

otherwise).

3. Masking. User

k

computes

\tilde{\mathbf{g}}_k = \mathbf{g}_k + \sum_{j \neq k} \text{sign}(k, j) \mathbf{r}_{kj}

.

4. Upload. User

k

sends

\tilde{\mathbf{g}}_k

to

the server.

5. Aggregation. Server computes

\mathbf{G} = \sum_k \tilde{\mathbf{g}}_k

.

6. Verification (optional). Server returns

\mathbf{G}

to users, who may verify via local

signed hash checks.

The basic version has a strict honest-but-curious assumption. Any user-dropout handling requires additional machinery — the topic of §10.3.

Theorem: Bonawitz Protocol: Correctness and Privacy

Assume the Diffie–Hellman problem is computationally hard and the pseudorandom generator used for mask derivation is a secure PRG (standard cryptographic assumptions). Then the Bonawitz secure-aggregation protocol satisfies:

Correctness. The server receives the aggregate $\mathbf{G} = \sum_k \mathbf{g}_k$ exactly.
Privacy (against honest-but-curious server). Given the uploads $\{\tilde{\mathbf{g}}_k\}$ and the server's protocol-specified view, the joint distribution of $(\mathbf{g}_1, \ldots, \mathbf{g}_n)$ given the server's view is computationally indistinguishable from the uniform distribution conditioned on $\sum_k \mathbf{g}_k = \mathbf{G}$ .
Privacy (against the server + up to $T \leq n - 1$ colluding users, informal). Similar guarantee holds against any coalition of $T \leq n - 1$ users; the precise bound is established with Shamir shares of the seeds (§10.3).

The protocol's privacy is computational, not information-theoretic, because it relies on DH and PRG hardness. In an information-theoretic model with uniform random mask seeds shared out of band, the privacy is IT-tight.

The key property: each user's upload is a sum of the gradient plus $n - 1$ pairwise masks. Of these, at most $n - 1$ are "unknown" to the server (no seeds shared with server). The mask sum is uniform modulo the pairwise-cancellation constraint, so the upload is indistinguishable from uniform to the server — except for the aggregate constraint, which is exactly the allowed leakage.

Operationally: Bonawitz's protocol reveals the aggregate and only the aggregate to the server, achieving the threat-model goal of §10.1.

Proof

Correctness via mask-cancellation

$\sum_k \tilde{\mathbf{g}}_k = \sum_k \mathbf{g}_k + \sum_k \sum_{j \neq k} \text{sign}(k,j) \mathbf{r}_{kj}$ . By antisymmetry $\mathbf{r}_{kj} = -\mathbf{r}_{jk}$ , every pair contributes $\mathbf{r}_{ij} - \mathbf{r}_{ij} = 0$ . Sum over pairs: zero. Hence $\sum_k \tilde{\mathbf{g}}_k = \mathbf{G}$ .

Privacy: the ideal case

In an ideal (information-theoretic) version, the pairwise seeds $\mathbf{r}_{ij}$ are drawn independently uniform over $\mathbb{F}_q^d$ . The server sees $n$ uploads $\tilde{\mathbf{g}}_k = \mathbf{g}_k + \sum_j \mathbf{r}_{kj}$ . Fixing the aggregate $\mathbf{G}$ , the joint distribution of $(\tilde{\mathbf{g}}_1, \ldots, \tilde{\mathbf{g}}_n)$ given $\mathbf{G}$ is uniform over $\{(\mathbf{x}_1, \ldots, \mathbf{x}_n): \sum_k \mathbf{x}_k = \mathbf{G}\}$ . This reveals nothing about individual $\mathbf{g}_k$ beyond the aggregate constraint.

Reduction to DH / PRG

In the Bonawitz protocol, the seeds are not uniform random but derived from DH via a PRG. Under the DDH assumption (Decisional Diffie–Hellman), the PRG output is computationally indistinguishable from uniform. Hence the protocol's privacy is computationally indistinguishable from the ideal case — giving the claimed guarantee. $\blacksquare$

Example: Bonawitz Protocol with $n = 4$ Users

Illustrate the Bonawitz protocol with $n = 4$ users. Specify the pairwise mask structure, each user's upload, and verify that the server's aggregate matches the true sum.

Solution

Pairwise mask structure

Six pairs: $(1,2), (1,3), (1,4), (2,3), (2,4), (3,4)$ . Seeds $\mathbf{r}_{12}, \mathbf{r}_{13}, \ldots, \mathbf{r}_{34}$ shared via DH. Antisymmetry: $\mathbf{r}_{ji} = -\mathbf{r}_{ij}$ .

User uploads

User 1: $\tilde{\mathbf{g}}_1 = \mathbf{g}_1 + \mathbf{r}_{12} + \mathbf{r}_{13} + \mathbf{r}_{14}$ . User 2: $\tilde{\mathbf{g}}_2 = \mathbf{g}_2 - \mathbf{r}_{12} + \mathbf{r}_{23} + \mathbf{r}_{24}$ . User 3: $\tilde{\mathbf{g}}_3 = \mathbf{g}_3 - \mathbf{r}_{13} - \mathbf{r}_{23} + \mathbf{r}_{34}$ . User 4: $\tilde{\mathbf{g}}_4 = \mathbf{g}_4 - \mathbf{r}_{14} - \mathbf{r}_{24} - \mathbf{r}_{34}$ .

Server aggregates

$\sum_k \tilde{\mathbf{g}}_k = \sum_k \mathbf{g}_k + (\mathbf{r}_{12} - \mathbf{r}_{12}) + (\mathbf{r}_{13} - \mathbf{r}_{13}) + \ldots = \mathbf{G}$ . All pairwise masks cancel. ✓

Privacy check

Server sees 4 uploads. Each $\tilde{\mathbf{g}}_k$ is the sum of $\mathbf{g}_k$ and 3 independent pairwise masks. Since the masks are uniform (via DH+PRG), each upload is uniform modulo the aggregate constraint. The server learns $\mathbf{G}$ and cannot distinguish individual gradients.

The $O(n^2)$ Communication Overhead

The Bonawitz protocol requires each user to share DH keys and derive mask seeds with every other user — $O(n)$ DH exchanges per user, $O(n^2)$ total. For moderate $n$ (say, $n = 100$ ), this is $10^4$ DH computations per round, each taking a few milliseconds — manageable but not small. For large $n$ ( $n \geq 10^4$ ), the overhead becomes prohibitive: $10^8$ DH exchanges per round, several minutes of per-round DH overhead on commodity hardware.

This $O(n^2)$ scaling is the structural cost of pairwise masking. CCESA (Chapter 12, another CommIT contribution) reduces it to $O(n\sqrt{n/\log n})$ by using a sparse random graph of pairwise masks instead of the complete graph. §10.4 (Caire et al.) proves that the $O(n^2)$ bound is tight within the class of uncoded groupwise-key schemes; CCESA's improvement comes from leaving this class.

Bonawitz Protocol Communication Overhead

Plot the per-round communication overhead of Bonawitz's protocol as a function of the number of users $n$ , for several model sizes $d$ . The total communication scales as $O(n^2)$ for the pairwise DH exchanges + $O(n d)$ for the gradient uploads. The $n^2$ scaling dominates for small models and large user populations.

Parameters

n

max — users500

Range of users to plot

d

— model size10_000_000

Number of parameters

Pairwise Masking: Masks That Cancel

Animation of the pairwise-masking construction: each pair of users adds and subtracts a common random mask. The masks visibly cancel when the server sums all uploads, revealing only the aggregate gradient.

⚠️Engineering Note

Bonawitz in Google Gboard

The Bonawitz protocol is deployed in Google's Gboard federated-learning system. Production specifics:

User count per round: $n \in [50, 500]$ (constrained by $O(n^2)$ overhead).
DH exchange: P-256 elliptic curve, $\sim 100 \mu$ s per exchange.
Mask derivation: AES-based PRG with 128-bit seed, $\sim 10$ MB/s per-device throughput.
Per-round latency: $\sim 5$ – $20$ seconds for mask generation + upload.
Dropout handling: Shamir secret sharing of seeds (§10.3), threshold $t = \lceil 2n/3\rceil$ .

The $n \leq 500$ ceiling reflects the $O(n^2)$ overhead. For larger populations, CCESA (Chapter 12) is the standard alternative.

Practical Constraints

•
Production $n$ : up to $\sim 500$ users per round
•
DH exchange cost: $100\mu$ s on mobile
•
Per-round overhead: 5–20 seconds added to SGD round
•
Threshold: $t = \lceil 2n/3 \rceil$ for dropout tolerance

📋 Ref: Google Gboard FL deployment; Bonawitz et al. 2019

Common Mistake: Masks Must Cancel, Not Just Be Independent

Mistake:

Use independent random masks per user without the pairwise-cancellation structure.

Correction:

Adding independent random noise per user gives $\sum_k \tilde{\mathbf{g}}_k = \mathbf{G} + \sum_k \boldsymbol{\eta}_k$ — the noise accumulates, and the server gets a noisy aggregate, not the true one. The pairwise-masking construction ensures exact cancellation. This is a common confusion when students first encounter the protocol; remember that the antisymmetry $\mathbf{r}_{kj} = -\mathbf{r}_{jk}$ is structural, not incidental.

Historical Note: The Bonawitz et al. Protocol

2017–present

Kallista Bonawitz, Vladimir Ivanov, Ben Kreuter, Antonio Marcedone, H. Brendan McMahan, Sarvar Patel, Daniel Ramage, Aaron Segal, and Karn Seth introduced the pairwise-masking secure-aggregation protocol at ACM CCS 2017. The paper combined several ideas from cryptography — Diffie–Hellman key exchange, Shamir secret sharing (Chapter 3), pseudorandom generators — into a single practical FL protocol. The work was motivated directly by the production need at Google Gboard: millions of users, but strong privacy guarantees.

The protocol has since become the de facto standard for production secure aggregation. Extensions and improvements — including the CommIT-group contributions of §10.4 (Caire et al. optimality), Chapter 11 (ByzSecAgg), and Chapter 12 (CCESA) — all build on the Bonawitz foundation.

Key Takeaway

Pairwise masking achieves $I(\mathbf{g}_k; \text{server view}) = I(\mathbf{g}_k; \mathbf{G})$ — only the aggregate leaks. The $(n, n)$ -additive secret-sharing primitive of Chapter 3 is applied at high dimension via Diffie–Hellman-derived pairwise seeds. The cost is $O(n^2)$ key exchanges per round. Section 10.3 handles dropouts; §10.4 proves the communication cost is information-theoretically tight.

Quick Check

In the Bonawitz protocol, why do the pairwise masks $\mathbf{r}_{ij}$ cancel when the server aggregates?

Because the server subtracts them explicitly after receiving all uploads.

By antisymmetry: user $i$ adds $+\\mathbf{r}_{ij}$ and user $j$ adds $-\\mathbf{r}_{ij}$ , so the pair sums to zero.

The masks are independent random variables, so their sum concentrates around zero by the law of large numbers.

Because they are derived from the DH key exchange, which guarantees cancellation.