The $O(n^2)$ Bottleneck of Bonawitz

Why $O(n^2)$ Is a Problem at Scale

Chapter 10 established that Bonawitz's secure aggregation has $O(n^2)$ communication overhead: each pair of users must exchange a Diffie–Hellman key and derive a pairwise mask. Chapter 10's §10.4 (Caire et al. 2022) proved this overhead is information-theoretically optimal within the class of uncoded groupwise-key schemes. For $n \leq 500$ (Google's Gboard-scale FL) this is tolerable. For $n \geq 10^4$, the regime of envisioned large-scale FL with millions of devices, it is prohibitive.

The point is that $O(n^2)$ scaling fundamentally limits FL deployability. At $n = 10^6$ mobile users with $100\,\mu$s per DH exchange, the key-distribution phase alone would take $10^{12} \cdot 10^{-4} = 10^8$ seconds, roughly three years. That is clearly not workable.
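The arithmetic is worth checking directly. The sketch below redoes the estimate, taking the chapter's assumed $100\,\mu$s per exchange and, as in the text, counting $n^2$ serialized exchanges (the figure is illustrative, not a benchmark):

```python
# Back-of-envelope estimate of Bonawitz key-distribution time at n = 10^6.
n = 10**6          # users
t_dh = 100e-6      # seconds per DH exchange (assumed, as in the text)

exchanges = n * n                  # ~n^2 exchanges, serialized at the server
total_s = exchanges * t_dh         # total key-distribution time
years = total_s / (365 * 24 * 3600)

print(f"{exchanges:.0e} exchanges -> {total_s:.0e} s ~ {years:.1f} years")
```

Even under generous parallelism assumptions, shaving a constant factor off this number leaves it orders of magnitude beyond any per-round latency budget.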

CCESA (Communication-Efficient Secure Aggregation), the fourth CommIT-group contribution of Part III, breaks the $O(n^2)$ barrier by using a sparse random graph of pairwise masks instead of the complete graph. The construction achieves $O(n\sqrt{n/\log n})$ communication, asymptotically sub-quadratic, while preserving the information-theoretic privacy guarantees of Bonawitz. Section 12.1 motivates the problem; §12.2 introduces CCESA; §12.3 analyzes its guarantees; §12.4 compares it with alternatives.

Bonawitz's Cost Components Revisited

Bonawitz's per-round communication breaks down as:

  • Gradient upload: $n \cdot d$ bits; unavoidable, since each user transmits $d$ scalars.
  • DH key exchange: $\binom{n}{2} \approx n^2/2$ pairwise exchanges, $O(n^2)$ total.
  • Mask derivation: $O(nd)$ PRG evaluations per user, $O(n^2 d)$ total.
  • Shamir share exchange (for dropout handling): $O(n^2)$ share transmissions.

The $n^2$ terms dominate the overhead for moderate $n$ and become catastrophic for large $n$. The question is whether we can avoid them while preserving the privacy guarantee.
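The breakdown above can be tallied for concrete parameters. In the sketch below the message sizes (256-bit keys and shares, 32-bit gradient entries) are assumed for illustration, not fixed by the protocol:

```python
def bonawitz_costs(n, d, key_bits=256, share_bits=256, grad_bits=32):
    """Per-round Bonawitz cost components, in bits across all users.

    key_bits, share_bits, grad_bits are illustrative sizes (assumed),
    not protocol constants.
    """
    return {
        "gradient_upload": n * d * grad_bits,              # n users x d scalars
        "dh_key_exchange": (n * (n - 1) // 2) * key_bits,  # ~n^2/2 pairs
        "shamir_shares": n * (n - 1) * share_bits,         # a share to every peer
    }

costs = bonawitz_costs(n=10_000, d=10_000_000)
for name, bits in costs.items():
    print(f"{name:>16}: {bits:.2e} bits")
```

Running this makes the structure of the problem visible: the gradient term is linear in $n$, while both the key-exchange and share-exchange terms grow quadratically.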

The Caire et al. (2022) optimality result (Chapter 10 §10.4) says that within the uncoded groupwise-key class, we cannot. But the theorem's scope is precisely that class: schemes whose masks are uncoded over pairwise or group keys. CCESA leaves the class by randomizing which pairs share keys, achieving $O(n\sqrt{n/\log n})$ total communication with high probability.

Theorem: Bonawitz Is Infeasible for Large $n$

The Bonawitz protocol's per-round communication is $\Theta(n^2 + nd)$, where $d$ is the gradient dimension. For practical FL workloads:

  • $n = 100$, $d = 10^7$: $10^4 + 10^9 \approx 10^9$ bits per round. DH exchanges are negligible; the gradient dominates. Feasible.
  • $n = 10^4$, $d = 10^7$: $10^8 + 10^{11}$. DH terms become significant but are still dominated by the gradient. Marginal.
  • $n = 10^6$, $d = 10^7$: $10^{12} + 10^{13}$. DH exchanges at $\sim 100\,\mu$s each: $10^{12} \cdot 10^{-4} = 10^8$ seconds, i.e. years. Infeasible.

For large-$n$ deployments, a sub-quadratic alternative is required. CCESA is the CommIT-group answer.
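The three workload totals above follow from plugging $n$ and $d$ into $n^2 + nd$ (constants suppressed, as in the text):

```python
# Reproduce the Theta(n^2 + n*d) totals for the three workloads above.
for n, d in [(100, 10**7), (10**4, 10**7), (10**6, 10**7)]:
    dh, grad = n * n, n * d          # DH term vs. gradient term
    print(f"n={n:>7}: DH ~ {dh:.0e}  gradient ~ {grad:.0e}  total ~ {dh + grad:.0e}")
```

Note that the gradient term still exceeds the DH term even at $n = 10^6$; the DH term is infeasible not because of raw volume but because it consists of $n^2$ separate interactive exchanges.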

The $n^2$ scaling is not a constant-factor engineering issue; it is a fundamental communication-complexity limit within the Bonawitz framework. Once $n$ exceeds a few thousand, the DH-exchange phase alone takes longer than a reasonable per-round latency (seconds).

Operationally: Google's Gboard deployment stays at $n \leq 500$ per round, explicitly to avoid hitting this wall. Scaling to larger populations requires a different protocol class.


Example: The Bonawitz Wall at $n = 10^6$

Consider a hypothetical FL deployment with $n = 10^6$ mobile users and model size $d = 10^7$ scalars. Compute the per-round communication overhead of Bonawitz and identify why it fails.

The $O(n^2)$ Wall: Bonawitz vs. CCESA

Plot the per-round communication overhead as a function of $n$ for Bonawitz ($O(n^2)$) and CCESA ($O(n\sqrt{n/\log n})$). At small $n$ the curves are close; at $n \geq 10^3$ Bonawitz diverges steeply while CCESA stays manageable. The crossover point depends on constants; for typical production parameters it is around $n = 200$–$500$.
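The two curves can be tabulated without plotting. The leading constants are not specified here, so the sketch below sets them to 1 (an assumption; with production constants the crossover shifts into the $n = 200$–$500$ range stated above):

```python
import math

def bonawitz_overhead(n):
    # ~n^2/2 pairwise exchanges (complete graph)
    return n * n / 2

def ccesa_overhead(n, c=1.0):
    # c is the construction's unspecified leading constant (assumed 1 here)
    return c * n * math.sqrt(n / math.log(n))

for n in [100, 1_000, 10**4, 10**5, 10**6]:
    b, s = bonawitz_overhead(n), ccesa_overhead(n)
    print(f"n={n:>7}: Bonawitz {b:.2e}  CCESA {s:.2e}  ratio {b / s:.0f}x")
```

With these toy constants the gap is already an order of magnitude at $n = 100$ and grows steadily, which is exactly the divergence the plot describes.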

Parameters: user counts up to $n = 10{,}000$; gradient dimension $d = 10^7$ parameters.

Scaling Alternatives for Secure Aggregation

| Protocol | Communication | Privacy guarantee | Max practical $n$ |
| --- | --- | --- | --- |
| Bonawitz (Ch. 10) | $O(n^2 + nd)$ | IT, $T$ colluders | $\sim 500$ |
| ByzSecAgg (Ch. 11) | $O(n\log n + Bd)$ | IT, $T$ colluders + $B$ Byzantines | $\sim 200$ (for Byzantine) |
| CCESA (this chapter) | $O(n\sqrt{n/\log n} + nd)$ | IT w.h.p., $T$ colluders | $\sim 10^5$ |
| FastSecAgg (Kadhe 2020) | $O(n \log n + nd)$ | Partial | $\sim 10^4$ |
| Homomorphic encryption | $O(nd\kappa)$ ($\kappa$ = ciphertext overhead) | Computational | Varies |

Leaving the Uncoded Groupwise-Key Class

Caire et al. (2022) proved that within the uncoded groupwise-key class, Bonawitz's $O(n^2)$ is information-theoretically tight. To go below this bound requires leaving the class. CCESA does this in a specific way: the mask-pair structure is randomized. Which pairs share keys is determined by a random graph (Erdős–Rényi), rather than being a deterministic complete graph.

The tradeoff: the privacy guarantee becomes probabilistic (it holds with high probability over the random graph) rather than deterministic. In practice this distinction is negligible; the failure probability is at most $n^{-c}$ for some constant $c > 0$, far smaller than any real-world failure mode. But from an information-theoretic standpoint, the scheme class has changed.

CCESA is therefore not a contradiction of Caire et al.'s optimality; it is a different scheme with different properties. The optimality bound applies within its class; CCESA sits in a strictly larger class (randomized schemes).
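To make the randomized pairing concrete, here is a toy sketch, not the actual CCESA protocol: a pair shares a mask only if an Erdős–Rényi coin flip connects it, and the masks still cancel in the aggregate. The connection probability giving expected degree $\sqrt{n/\log n}$ is an assumption chosen to match the stated overhead, and masks are drawn directly rather than PRG-derived from DH keys:

```python
import math, random

def masked_updates(updates, p, seed=0):
    """Toy sparse pairwise masking over an Erdos-Renyi graph G(n, p).

    Each connected pair (i, j), i < j, draws one shared mask; user i adds
    it and user j subtracts it, so all masks cancel in the sum. Real CCESA
    derives masks from DH keys and handles dropouts; this sketch does not.
    """
    rng = random.Random(seed)
    n = len(updates)
    masked, edges = list(updates), 0
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:          # pair (i, j) shares a key
                m = rng.uniform(-1, 1)    # stand-in for a PRG-derived mask
                masked[i] += m
                masked[j] -= m
                edges += 1
    return masked, edges

n = 200
p = math.sqrt(n / math.log(n)) / n        # expected degree ~ sqrt(n / log n)
updates = [float(i % 7) / 10 for i in range(n)]
masked, edges = masked_updates(updates, p)
print(f"edges used: {edges} of {n * (n - 1) // 2} possible")
print(f"aggregate error: {abs(sum(masked) - sum(updates)):.2e}")
```

The run shows the two properties that matter: far fewer than $\binom{n}{2}$ key pairs are used, yet the server still recovers the exact sum.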

Historical Note: From Regular Graphs to Random Graphs

2020–2022

The idea of sparser mask-pairing goes back to FastSecAgg (Kadhe et al. 2020), which used regular graphs (each user masking with $O(\log n)$ others) to reduce overhead to $O(n \log n)$. FastSecAgg preserved privacy within its class but with a weaker guarantee than Bonawitz.

CCESA (2022), the CommIT-group contribution carried in Chapter 12, improved the construction by using Erdős–Rényi random graphs. The random-graph approach provides the same (probabilistic) privacy guarantee as Bonawitz at only $O(n\sqrt{n/\log n})$ overhead, closer to the information-theoretic lower bound for the sparse-graph class.

The shift from regular graphs to random graphs is conceptually simple but has subtle consequences for the privacy analysis: random graphs concentrate around their expected degree distribution, yielding guarantees that are probabilistic but tight. Section 12.3 develops the analysis.

⚠️ Engineering Note

Production Impact of the $O(n^2)$ Wall

Production FL deployments navigate the $O(n^2)$ wall in different ways:

  • Google Gboard: hard-limits round size to $n \leq 500$; multiple "cohorts" are processed sequentially. This limits parallelism and extends training wall-clock time.

  • Apple Siri: smaller round sizes ($n \leq 200$) with more rounds, at the cost of increased privacy-accounting complexity.

  • Cross-silo FL (hospitals, banks): small $n$ ($\leq 20$) per round, so $n^2$ is not a bottleneck; Bonawitz is fine.

  • Hypothetical large-$n$ FL: CCESA or similar sub-quadratic protocols are required; production adoption is growing as $n$ increases.

The $O(n^2)$ wall is not just theoretical; it constrains real production choices. CCESA (and similar protocols) are the path to FL at million-user scale.

Practical Constraints
  • Gboard production: $n \leq 500$ per round

  • Cross-silo: $n \leq 20$; $n^2$ is fine

  • Million-user FL: requires CCESA or equivalent

📋 Ref: Google FL whitepaper; Bonawitz et al. 2019 §V

Key Takeaway

Bonawitz's $O(n^2)$ overhead is the scaling bottleneck for large-$n$ FL. At $n = 10^3$, the overhead is $10^6$: tolerable. At $n = 10^6$, it is $10^{12}$: infeasible. CCESA's sparse-graph construction (§12.2) breaks this wall by moving to a randomized scheme class, achieving $O(n\sqrt{n/\log n})$ at essentially the same privacy guarantee.

Common Mistake: Constant-Factor Optimization Won't Save Bonawitz

Mistake:

Arguing that Bonawitz's $n^2$ overhead can be made tractable by optimizing the DH exchange or using faster cryptographic primitives.

Correction:

The $n^2$ scaling is algebraic, not implementation-dependent. No amount of constant-factor optimization changes the asymptotics. Even if DH exchanges were instantaneous, the $n^2$ pairwise messages (key material, Shamir shares) would still require $\Omega(n^2)$ network bandwidth.

The only way to beat $n^2$ is to change the scheme class: use fewer pairs (CCESA) or different primitives (homomorphic encryption, differential privacy). Constant-factor engineering on Bonawitz itself cannot get there.
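A quick calculation makes the point concrete: under an assumed 10-second per-round latency budget and serialized exchanges (as in this chapter's estimates), a $1000\times$ faster DH primitive only moves the wall from $n \approx 316$ to $n \approx 10^4$, still far short of $10^6$:

```python
# Constant-factor speedups only shift the n^2 wall; they never remove it.
budget = 10.0  # per-round latency budget in seconds (assumed)

for t_exchange in [100e-6, 100e-9]:  # baseline vs. a 1000x-faster primitive
    # DH phase time is n^2 * t; it exceeds the budget once n > sqrt(budget / t)
    n_wall = int((budget / t_exchange) ** 0.5)
    print(f"t = {t_exchange:.0e} s/exchange -> wall at n ~ {n_wall:,}")
```

Quadratic growth re-crosses any fixed budget at a modestly larger $n$, which is why the fix must be structural rather than constant-factor.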

Quick Check

At $n = 10^5$ users, approximately how many pairwise DH exchanges does Bonawitz's per-round communication require?

$\sim 10^5$ (linear)

$\sim 5 \cdot 10^9$ (quadratic)

$\sim 10^7$ ($n \log n$)

$\sim 10^{10}$ (cubic)