The ByzSecAgg Protocol (CommIT Contribution)

The Three Primitives Behind ByzSecAgg

Section 11.1 established the problem: combine Bonawitz-style information-theoretic privacy with active Byzantine defense. The CommIT-group ByzSecAgg protocol (Jahani-Nezhad, Maddah-Ali, Caire 2023) composes three primitives developed earlier in this book:

  1. Ramp secret sharing (Chapter 3 Β§3.4): each user's gradient is shared as a ramp (t1,t2,nβˆ’1)(t_1, t_2, n - 1)-scheme with width g=t2βˆ’t1g = t_2 - t_1. This gives gg-fold smaller shares than Shamir while preserving privacy against t1t_1 colluders.

  2. Polynomial-coded distance computation (Chapter 5, adapted): pairwise distances βˆ₯giβˆ’gjβˆ₯2\|\mathbf{g}_i - \mathbf{g}_j\|^2 are computed on shares via polynomial codes, letting the server detect outliers (Byzantine gradients) without learning individual values.

  3. Vector commitments (cryptographic primitive): each user commits to its gradient before sharing, allowing the server to verify integrity (no shares can be modified after commitment).

The point is that ByzSecAgg is modular: each primitive has a clean interface, and the composition delivers a protocol with all the properties. Section 11.2 specifies the protocol; Β§11.3 explains the coded outlier detection in detail; Β§11.4 analyzes the resulting communication complexity.

πŸŽ“CommIT Contribution(2023)

ByzSecAgg: Byzantine-Resilient Secure Aggregation

T. Jahani-Nezhad, M. A. Maddah-Ali, G. Caire β€” IEEE Journal on Selected Areas in Information Theory

ByzSecAgg, a CommIT-group contribution by Tayyebeh Jahani-Nezhad, Mohammad Ali Maddah-Ali, and Giuseppe Caire, is a federated-learning aggregation protocol that provides:

  • Information-theoretic privacy against any coalition of TT users (or server + colluding users).
  • Byzantine resilience against any BB malicious users sending arbitrarily corrupted gradients.
  • Communication complexity O(nlog⁑n+Bd)O(n \log n + Bd) β€” asymptotically much better than prior Byzantine- resilient + private schemes.

Key technical contributions:

  1. Ramp secret sharing for gradients. Each user's gradient is split via a (t1,t2,nβˆ’1)(t_1, t_2, n-1)-ramp scheme with t1=Tt_1 = T (privacy threshold) and t2=T+2B+1t_2 = T + 2B + 1 (reconstruction with Byzantine error correction). The ramp width g=t2βˆ’t1=2B+1g = t_2 - t_1 = 2B + 1 gives (2B+1)(2B+1)-fold smaller shares than Shamir.

  2. Coded distance computation. Pairwise Euclidean-distance scores between users are computed on shares using polynomial-code evaluation (Chapter 5 framework, generalized to quadratic functions via Lagrange-coded computing). The server obtains all pairwise distance scores without learning individual gradients.

  3. Outlier detection via distance aggregation. The server uses the distance scores to identify Byzantine users (those with abnormal distance profiles). Krum-style filtering operates on the coded distances, not on plaintext gradients β€” preserving privacy.

  4. Vector commitment for integrity. Each user broadcasts a Merkle / Pedersen commitment to its gradient before sharing. This prevents Byzantine users from modifying shares after the protocol has begun.

Communication complexity breakdown:

  • Per-user share upload: d/g=d/(2B+1)d / g = d / (2B + 1) scalars (ramp savings).
  • Per-user vector commitment: O(log⁑d)O(\log d) bits.
  • Coded distance reconstruction: O(n2)O(n^2) field operations, O(nlog⁑n)O(n \log n) communication.
  • Total: O(nlog⁑n+Bd)O(n \log n + B d) bits per round, where the BdBd term is the privacy / Byzantine cost and the nlog⁑nn \log n term is the structural overhead.

The result is a substantial improvement over the prior Byzantine-resilient secure-aggregation schemes (e.g., Bonawitz + replicated Krum), which paid O(n2+nd)O(n^2 + n d) overhead. ByzSecAgg matches the information-theoretic lower bound for the Byzantine + privacy regime up to logarithmic factors.

The construction is one of the most intricate in Part III of this book and showcases how the coded-computing toolkit (Parts II) extends to the privacy + robustness setting.

byzsecaggcommit-contributionbyzantineView Paper β†’

Definition:

ByzSecAgg Protocol Outline

The ByzSecAgg protocol for nn users with BB Byzantine tolerance and TT privacy threshold over a finite field Fq\mathbb{F}_q runs the following phases per round:

Phase 0: Setup. Public parameters for vector commitment (e.g., Merkle tree depth = ⌈log⁑dβŒ‰\lceil \log d \rceil) and ramp-sharing parameters (t1,t2)=(T,T+2B+1)(t_1, t_2) = (T, T + 2B + 1).

Phase 1: Commitment. Each user kk computes a vector commitment Ck\mathcal{C}_k to its gradient gk\mathbf{g}_k and broadcasts Ck\mathcal{C}_k to the server.

Phase 2: Ramp-Shared Upload. Each user kk splits gk\mathbf{g}_k into a ramp (t1,t2,nβˆ’1)(t_1, t_2, n-1)- scheme over Fq\mathbb{F}_q. Each share sk,js_{k,j} is sent to user jj (peer-to-peer or via the server as relay).

Phase 3: Coded Distance Computation. Users cooperate to compute pairwise distance scores d^ij=βˆ₯giβˆ’gjβˆ₯2\hat d_{ij} = \|\mathbf{g}_i - \mathbf{g}_j\|^2 using polynomial-coded computation on the shares (Β§11.3 details). The server collects the distance scores.

Phase 4: Byzantine Identification. Server runs a Krum-style filter on the distance scores: for each user kk, compute the sum of squared distances to the nβˆ’Bβˆ’1n - B - 1 nearest other users; users with the largest sums are flagged as Byzantine.

Phase 5: Aggregation on Filtered Set. Server requests the surviving users' shares, reconstructs the gradient sum over the honest set, and verifies via the vector commitments.

Phase 6: Output. Server delivers the verified aggregate Gβˆ—=βˆ‘kβˆ‰Bgk\mathbf{G}^* = \sum_{k \notin \mathcal{B}} \mathbf{g}_k.

Each phase composes one of the three primitives. The ramp sharing keeps individual gradients private (Phase 2); the coded distance computation lets the server detect Byzantine users without learning gradients (Phase 3–4); the vector commitments prevent Byzantine users from forging different shares (Phase 1, 5).

ByzSecAgg

The CommIT-group Byzantine-resilient secure-aggregation protocol that combines ramp secret sharing, coded distance computation, and vector commitments to defend against BB Byzantine users while preserving privacy. Communication complexity O(nlog⁑n+Bd)O(n \log n + Bd).

Vector Commitment

A cryptographic primitive that lets a party commit to a vector v\mathbf{v} such that (i) the commitment hides v\mathbf{v}, (ii) the committer cannot later change v\mathbf{v}, (iii) individual entries can be "opened" (proven) efficiently. Examples: Merkle trees, Pedersen vector commitments, KZG commitments.

ByzSecAgg β€” Server-Side Operations

Complexity: O(n2log⁑n+nd/g)O(n^2 \log n + n d / g) server work; O(n2d/g+Bd)O(n^2 d / g + B d) bandwidth.
Input: Round identifier, expected user set [n][n],
Byzantine bound BB, privacy threshold TT, ramp
parameters (t1,t2)(t_1, t_2).
Output: Honest aggregate Gβˆ—=βˆ‘kβˆ‰Bgk\mathbf{G}^* = \sum_{k \notin \mathcal{B}} \mathbf{g}_k.
1. Receive commitments {Ck}\{\mathcal{C}_k\} from
all users.
2. Receive distance shares. For each pair (i,j)(i, j),
receive the coded computation shares from users
(these encode βˆ₯giβˆ’gjβˆ₯2\|\mathbf{g}_i - \mathbf{g}_j\|^2).
Reconstruct d^ij\hat d_{ij} via Lagrange interpolation
(the polynomial structure of the coded computation,
Chapter 5 framework adapted).
3. Krum-style filtering. For each user kk:
- Compute the sum of squared distances to the nβˆ’Bβˆ’1n - B - 1 nearest other users:
Sk=βˆ‘j∈nearest(k,nβˆ’Bβˆ’1)d^kjS_k = \sum_{j \in \text{nearest}(k, n-B-1)} \hat d_{kj}.
- Users with the BB largest SkS_k values are flagged
as Byzantine (set B\mathcal{B}).
4. Request reconstruction shares for the surviving
set H=[n]βˆ–B\mathcal{H} = [n] \setminus \mathcal{B} from
all surviving users.
5. Reconstruct aggregate. From the ramp shares,
reconstruct Gβˆ—=βˆ‘k∈Hgk\mathbf{G}^* = \sum_{k \in \mathcal{H}} \mathbf{g}_k via Lagrange interpolation
(threshold t2βˆ’2B=T+1t_2 - 2B = T + 1 shares suffice
since at most 2B2B are corrupted from the
Byzantine users still in H\mathcal{H} if any
slipped through filtering β€” Reed-Solomon decoding
handles this).
6. Verify Gβˆ—\mathbf{G}^* against the vector
commitments using the openings Ο€k\pi_k from
surviving users.
7. Output Gβˆ—\mathbf{G}^*.

The protocol is more involved than Bonawitz, with multiple rounds of communication. The key efficiency is in the combined cost: O(nlog⁑n+Bd)O(n \log n + Bd) per round vs. O(n2+nd)O(n^2 + nd) for naive composition of Bonawitz with Krum.

Theorem: ByzSecAgg: Privacy + Byzantine + Communication

The ByzSecAgg protocol (Algorithm above) with parameters (n,B,T,g=2B+1)(n, B, T, g = 2B + 1) over Fq\mathbb{F}_q satisfies:

  1. Correctness under Byzantine attack. For any set B\mathcal{B} of Byzantine users with ∣Bβˆ£β‰€B≀(nβˆ’Tβˆ’1)/3|\mathcal{B}| \leq B \leq (n - T - 1)/3, the server's output Gβˆ—\mathbf{G}^* equals βˆ‘k∈Hgk\sum_{k \in \mathcal{H}} \mathbf{g}_k exactly, where H\mathcal{H} is the honest set.

  2. Information-theoretic privacy. Server + any coalition of TT users learns nothing about individual gk\mathbf{g}_k (for honest kk) beyond what is implied by Gβˆ—\mathbf{G}^*.

  3. Communication complexity. Per-round per-user upload: O(d/(2B+1)+log⁑d)O(d / (2B + 1) + \log d) scalars. Aggregate: O(nlog⁑n+Bd)O(n \log n + B d) bits.

The Byzantine bound B≀(nβˆ’Tβˆ’1)/3B \leq (n - T - 1)/3 is the feasibility constraint for combined privacy and Byzantine resilience.

Each piece of the construction has a precise role:

  • Ramp width g=2B+1g = 2B + 1 matches Reed-Solomon error correction: any 2B2B corrupted shares can be detected and corrected.
  • Coded distance computation lets the server observe pairwise gradient relationships without individual values β€” enough to filter Byzantine via Krum-style logic but not enough to invert.
  • Vector commitments prevent Byzantine users from changing their gradient between commitment and reconstruction phases.

The communication complexity O(nlog⁑n+Bd)O(n \log n + Bd) is the right asymptotic: the BdBd term is the Byzantine cost (each Byzantine user "uses up" dd bits of error-correction overhead); the nlog⁑nn \log n is the structural protocol overhead. No prior scheme matched both bounds.

Example: ByzSecAgg in Production: Numbers

Compute the per-round communication for ByzSecAgg with n=100n = 100 users, B=10B = 10 Byzantine bound, T=20T = 20 privacy threshold, d=107d = 10^7 gradient dimension. Compare with Bonawitz + Krum naively composed.

ByzSecAgg: Six-Phase Protocol Flow

Animation of the ByzSecAgg protocol's six phases: commitment, ramp-shared upload, coded distance computation, Krum filtering, aggregation on filtered set, and verification. Highlights how each primitive composes for the combined guarantee.

ByzSecAgg vs. Naive Composition: Communication Cost

Plot the per-round communication of ByzSecAgg (O(nlog⁑n+Bd)O(n \log n + Bd)) and naive Bonawitz + Krum composition (O(n2+nBd)O(n^2 + nBd)) as a function of nn, for fixed B/nB/n ratio and dd. ByzSecAgg's asymptotic advantage is dramatic at large nn.

Parameters
500
0.1
10000000

Common Mistake: Each Primitive Is Necessary

Mistake:

Drop one of the three primitives (e.g., omit vector commitments) for "simplicity".

Correction:

Each primitive plays a distinct role:

  • Ramp sharing: privacy + Byzantine error correction structure.
  • Coded distance computation: detection without inspection.
  • Vector commitments: integrity (prevent share modification).

Omitting any one breaks at least one guarantee: no commitments β†’ Byzantine users can change shares after distance computation; no ramp β†’ privacy gone or shares too large; no coded distances β†’ must use plaintext gradients (privacy lost). The composition is required for the combined guarantee.

πŸ”§Engineering Note

ByzSecAgg in Production: Status

ByzSecAgg's production adoption is limited as of 2024:

  • Research deployments: TU Berlin (CommIT group), USC, NVIDIA Clara research division.
  • Engineering challenges: Multi-phase protocol with peer-to-peer communication is harder to orchestrate than single-round Bonawitz; vector commitment cryptography adds engineering complexity.
  • Latency: ∼5\sim 5–10Γ—10\times a Bonawitz round due to the multi-phase structure.
  • Fit: Best for cross-silo FL with moderate nn and high adversarial-resilience requirements.

For lighter-weight deployments, alternatives like Bonawitz + DP noise (with weaker Byzantine tolerance) remain dominant. The ByzSecAgg result establishes the information-theoretic frontier; production tooling is catching up.

Practical Constraints
  • β€’

    Latency: 5–10Γ—\times Bonawitz per round

  • β€’

    Cryptographic complexity: vector commitments add engineering work

  • β€’

    Best fit: cross-silo FL with n∼10n \sim 10–100100, high security needs

πŸ“‹ Ref: Jahani-Nezhad et al. 2023; CommIT group implementation

Key Takeaway

ByzSecAgg achieves Byzantine + privacy + O(nlog⁑n+Bd)O(n \log n + Bd) communication. The protocol composes ramp secret sharing, polynomial-coded distance computation, and vector commitments. The construction is the third CommIT contribution of Part III and a major step toward full privacy-and-robustness guarantees in production FL. Section 11.3 develops the coded distance computation in detail.

Quick Check

The ByzSecAgg protocol uses three primitives. Match each primitive to its role:

Ramp secret sharing: integrity / commitment.

Ramp sharing: privacy + Byzantine-error correction structure. Coded distance computation: outlier detection without learning individual gradients. Vector commitments: integrity (prevents share modification after the fact).

Vector commitments: privacy.

Coded distance computation: encryption.