Ferkans — Interactive Telecom Tutor

The Three Primitives Behind ByzSecAgg

Section 11.1 established the problem: combine Bonawitz-style information-theoretic privacy with active Byzantine defense. The CommIT-group ByzSecAgg protocol (Jahani-Nezhad, Maddah-Ali, Caire 2023) composes three primitives developed earlier in this book:

Ramp secret sharing (Chapter 3 §3.4): each user's gradient is shared as a ramp $(t_1, t_2, n - 1)$-scheme with width $g = t_2 - t_1$. This gives $g$-fold smaller shares than Shamir while preserving privacy against $t_1$ colluders.
Polynomial-coded distance computation (Chapter 5, adapted): pairwise distances $\|\mathbf{g}_i - \mathbf{g}_j\|^2$ are computed on shares via polynomial codes, letting the server detect outliers (Byzantine gradients) without learning individual values.
Vector commitments (cryptographic primitive): each user commits to its gradient before sharing, allowing the server to verify integrity (no shares can be modified after commitment).

The point is that ByzSecAgg is modular: each primitive has a clean interface, and the composition delivers a protocol with all the properties. Section 11.2 specifies the protocol; §11.3 explains the coded outlier detection in detail; §11.4 analyzes the resulting communication complexity.

🎓 CommIT Contribution (2023)

ByzSecAgg: Byzantine-Resilient Secure Aggregation

T. Jahani-Nezhad, M. A. Maddah-Ali, G. Caire — IEEE Journal on Selected Areas in Information Theory

ByzSecAgg, a CommIT-group contribution by Tayyebeh Jahani-Nezhad, Mohammad Ali Maddah-Ali, and Giuseppe Caire, is a federated-learning aggregation protocol that provides:

Information-theoretic privacy against any coalition of $T$ users (or server + colluding users).
Byzantine resilience against any $B$ malicious users sending arbitrarily corrupted gradients.
Communication complexity $O(n \log n + Bd)$, asymptotically much better than prior schemes that are both Byzantine-resilient and private.

Key technical contributions:

Ramp secret sharing for gradients. Each user's gradient is split via a $(t_1, t_2, n-1)$-ramp scheme with $t_1 = T$ (privacy threshold) and $t_2 = T + 2B + 1$ (reconstruction with Byzantine error correction). The ramp width $g = t_2 - t_1 = 2B + 1$ gives $(2B+1)$-fold smaller shares than Shamir.
Coded distance computation. Pairwise Euclidean-distance scores between users are computed on shares using polynomial-code evaluation (Chapter 5 framework, generalized to quadratic functions via Lagrange-coded computing). The server obtains all pairwise distance scores without learning individual gradients.
Outlier detection via distance aggregation. The server uses the distance scores to identify Byzantine users (those with abnormal distance profiles). Krum-style filtering operates on the coded distances, not on plaintext gradients — preserving privacy.
Vector commitment for integrity. Each user broadcasts a Merkle / Pedersen commitment to its gradient before sharing. This prevents Byzantine users from modifying shares after the protocol has begun.
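The ramp-sharing step can be sketched as follows. This is a minimal illustration, not the paper's exact encoding: it assumes a Lagrange-style ramp scheme in which the $g$ secret blocks sit at evaluation points $\beta_1, \dots, \beta_g$, $t_1$ uniformly random values sit at further points, and user shares are evaluations at distinct points $\alpha_j$. The prime `P`, the point choices, and the function names are all hypothetical.

```python
import random

P = 2**31 - 1  # assumed prime field modulus, chosen only for illustration


def lagrange_eval(xs, ys, x, p=P):
    """Evaluate at x the unique polynomial of degree < len(xs) through (xs, ys)."""
    total = 0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        num = den = 1
        for j, xj in enumerate(xs):
            if j != i:
                num = num * (x - xj) % p
                den = den * (xi - xj) % p
        total = (total + yi * num * pow(den, -1, p)) % p
    return total


def ramp_share(vec, t1, g, n_shares, rng, p=P):
    """Split vec into g blocks and produce n_shares shares of length len(vec)/g."""
    assert len(vec) % g == 0
    block = len(vec) // g
    t2 = t1 + g
    betas = list(range(1, t2 + 1))                     # secret + randomness points
    alphas = list(range(t2 + 1, t2 + 1 + n_shares))    # one point per share
    shares = [[0] * block for _ in range(n_shares)]
    for c in range(block):
        # g secret values followed by t1 random masks define a degree-(t2-1) polynomial
        ys = [vec[b * block + c] % p for b in range(g)]
        ys += [rng.randrange(p) for _ in range(t1)]
        for j, a in enumerate(alphas):
            shares[j][c] = lagrange_eval(betas, ys, a, p)
    return alphas, betas[:g], shares


def ramp_reconstruct(alphas, shares, secret_points, p=P):
    """Recover the vector from any t2 shares by interpolating back to the secret points."""
    block = len(shares[0])
    out = []
    for b in secret_points:
        out.extend(lagrange_eval(alphas, [s[c] for s in shares], b, p)
                   for c in range(block))
    return out
```

With $T = 2$ and $B = 1$ (so $g = 3$, $t_2 = 5$), a length-12 vector yields shares of length 4, and any 5 of the shares reconstruct it. Error correction against corrupted shares is not shown here.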

Communication complexity breakdown:

Per-user share upload: $d / g = d / (2B + 1)$ scalars (ramp savings).
Per-user vector commitment: $O(\log d)$ bits.
Coded distance reconstruction: $O(n^2)$ field operations, $O(n \log n)$ communication.
Total: $O(n \log n + B d)$ bits per round, where the $Bd$ term is the privacy / Byzantine cost and the $n \log n$ term is the structural overhead.

The result is a substantial improvement over the prior Byzantine-resilient secure-aggregation schemes (e.g., Bonawitz + replicated Krum), which paid $O(n^2 + n d)$ overhead. ByzSecAgg matches the information-theoretic lower bound for the Byzantine + privacy regime up to logarithmic factors.

The construction is one of the most intricate in Part III of this book and showcases how the coded-computing toolkit (Part II) extends to the privacy + robustness setting.


Definition: ByzSecAgg Protocol Outline

The ByzSecAgg protocol for $n$ users with $B$ Byzantine tolerance and $T$ privacy threshold over a finite field $\mathbb{F}_q$ runs the following phases per round:

Phase 0: Setup. Public parameters for vector commitment (e.g., Merkle tree depth $\lceil \log d \rceil$) and ramp-sharing parameters $(t_1, t_2) = (T, T + 2B + 1)$.

Phase 1: Commitment. Each user $k$ computes a vector commitment $\mathcal{C}_k$ to its gradient $\mathbf{g}_k$ and broadcasts $\mathcal{C}_k$ to the server.

Phase 2: Ramp-Shared Upload. Each user $k$ splits $\mathbf{g}_k$ into a ramp $(t_1, t_2, n-1)$-scheme over $\mathbb{F}_q$. Each share $s_{k,j}$ is sent to user $j$ (peer-to-peer or via the server as relay).

Phase 3: Coded Distance Computation. Users cooperate to compute pairwise distance scores $\hat d_{ij} = \|\mathbf{g}_i - \mathbf{g}_j\|^2$ using polynomial-coded computation on the shares (§11.3 details). The server collects the distance scores.

Phase 4: Byzantine Identification. Server runs a Krum-style filter on the distance scores: for each user $k$, compute the sum of squared distances to the $n - B - 1$ nearest other users; users with the largest sums are flagged as Byzantine.

Phase 5: Aggregation on Filtered Set. Server requests the surviving users' shares, reconstructs the gradient sum over the honest set, and verifies via the vector commitments.

Phase 6: Output. Server delivers the verified aggregate $\mathbf{G}^* = \sum_{k \notin \mathcal{B}} \mathbf{g}_k$.

Each phase composes one of the three primitives. The ramp sharing keeps individual gradients private (Phase 2); the coded distance computation lets the server detect Byzantine users without learning gradients (Phase 3–4); the vector commitments prevent Byzantine users from forging different shares (Phase 1, 5).
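The idea behind Phase 3 can be sketched in a few lines. This is an illustration under simplifying assumptions, not the construction of §11.3: gradients are encoded as polynomial evaluations (secret blocks at points $1, \dots, g$, random masks at the next $t_1$ points), each user squares the difference of the two shares it holds, and the server interpolates the resulting degree-$2(t_2 - 1)$ polynomial from $2(t_2 - 1) + 1$ evaluations. The field modulus `P`, the point choices, and all function names are hypothetical, and the extra masking a real protocol would need so that these evaluations reveal only the distance is omitted.

```python
import random

P = 2**31 - 1  # assumed prime field modulus, for illustration only


def lagrange_eval(xs, ys, x, p=P):
    """Evaluate at x the unique polynomial of degree < len(xs) through (xs, ys)."""
    total = 0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        num = den = 1
        for j, xj in enumerate(xs):
            if j != i:
                num = num * (x - xj) % p
                den = den * (xi - xj) % p
        total = (total + yi * num * pow(den, -1, p)) % p
    return total


def share(vec, g, t1, alphas, rng, p=P):
    """Encode vec (g blocks) as evaluations at alphas of masked polynomials."""
    block = len(vec) // g
    betas = list(range(1, g + t1 + 1))
    out = [[0] * block for _ in alphas]
    for c in range(block):
        ys = [vec[b * block + c] % p for b in range(g)]
        ys += [rng.randrange(p) for _ in range(t1)]
        for u, a in enumerate(alphas):
            out[u][c] = lagrange_eval(betas, ys, a, p)
    return out


def local_distance_eval(share_i, share_j, p=P):
    """What one user computes: squared difference of the two shares it holds."""
    return sum((a - b) ** 2 for a, b in zip(share_i, share_j)) % p


def decode_distance(alphas, evals, g, p=P):
    """Server side: interpolate the quadratic-in-shares polynomial and sum it
    over the g secret points to obtain the squared distance (mod p)."""
    return sum(lagrange_eval(alphas, evals, beta, p) for beta in range(1, g + 1)) % p
```

For example, with $g = 3$ and $t_1 = 2$ (so $t_2 = 5$), nine users evaluating at points $6, \dots, 14$ suffice, and the decoded value equals the plaintext squared distance as long as it is smaller than the modulus.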

ByzSecAgg

The CommIT-group Byzantine-resilient secure-aggregation protocol that combines ramp secret sharing, coded distance computation, and vector commitments to defend against $B$ Byzantine users while preserving privacy. Communication complexity $O(n \log n + Bd)$.

Vector Commitment

A cryptographic primitive that lets a party commit to a vector $\mathbf{v}$ such that (i) the commitment hides $\mathbf{v}$, (ii) the committer cannot later change $\mathbf{v}$, (iii) individual entries can be "opened" (proven) efficiently. Examples: Merkle trees, Pedersen vector commitments, KZG commitments.
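A minimal sketch of a hash-based (Merkle tree) vector commitment, one of the examples listed above. It assumes SHA-256 from Python's `hashlib`, entries already encoded as byte strings, and duplicate-last-node padding on odd levels (a simplification). The function names `commit`, `open_entry`, and `verify` are illustrative and are not taken from the ByzSecAgg paper. A plain Merkle tree is binding, but it hides entries only if each leaf is salted, which this sketch omits.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_levels(leaves):
    """Return all tree levels, from hashed leaves up to the single root."""
    level = [_h(b"\x00" + leaf) for leaf in leaves]
    levels = []
    while True:
        if len(level) > 1 and len(level) % 2:
            level = level + [level[-1]]  # pad odd levels by duplicating the last node
        levels.append(level)
        if len(level) == 1:
            break
        level = [_h(b"\x01" + level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return levels


def commit(leaves):
    """Commitment = Merkle root."""
    return merkle_levels(leaves)[-1][0]


def open_entry(leaves, index):
    """Opening for one entry: the sibling hashes along the path to the root."""
    proof = []
    for level in merkle_levels(leaves)[:-1]:
        proof.append(level[index ^ 1])
        index //= 2
    return proof


def verify(root, index, leaf, proof):
    """Recompute the path from the claimed leaf and compare with the root."""
    node = _h(b"\x00" + leaf)
    for sibling in proof:
        if index % 2 == 0:
            node = _h(b"\x01" + node + sibling)
        else:
            node = _h(b"\x01" + sibling + node)
        index //= 2
    return node == root
```

The opening contains one hash per tree level, which is where the logarithmic-in-$d$ proof size comes from.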

ByzSecAgg — Server-Side Operations

Complexity: $O(n^2 \log n + n d / g)$ server work; $O(n^2 d / g + B d)$ bandwidth.

Input: Round identifier, expected user set $[n]$, Byzantine bound $B$, privacy threshold $T$, ramp parameters $(t_1, t_2)$.

Output: Honest aggregate $\mathbf{G}^* = \sum_{k \notin \mathcal{B}} \mathbf{g}_k$.

1. Receive commitments $\{\mathcal{C}_k\}$ from all users.
2. Receive distance shares. For each pair $(i, j)$, receive the coded computation shares from users (these encode $\|\mathbf{g}_i - \mathbf{g}_j\|^2$). Reconstruct $\hat d_{ij}$ via Lagrange interpolation (the polynomial structure of the coded computation, Chapter 5 framework adapted).
3. Krum-style filtering. For each user $k$:
   - Compute the sum of squared distances to the $n - B - 1$ nearest other users: $S_k = \sum_{j \in \text{nearest}(k, n-B-1)} \hat d_{kj}$.
   - Users with the $B$ largest $S_k$ values are flagged as Byzantine (set $\mathcal{B}$).
4. Request reconstruction shares for the surviving set $\mathcal{H} = [n] \setminus \mathcal{B}$ from all surviving users.
5. Reconstruct aggregate. From the ramp shares, reconstruct $\mathbf{G}^* = \sum_{k \in \mathcal{H}} \mathbf{g}_k$ via Lagrange interpolation (threshold $t_2 - 2B = T + 1$ shares suffice since at most $2B$ are corrupted from the Byzantine users still in $\mathcal{H}$ if any slipped through filtering; Reed-Solomon decoding handles this).
6. Verify $\mathbf{G}^*$ against the vector commitments using the openings $\pi_k$ from surviving users.
7. Output $\mathbf{G}^*$.

The protocol is more involved than Bonawitz, with multiple rounds of communication. The key efficiency is in the combined cost: $O(n \log n + Bd)$ per round vs. $O(n^2 + nd)$ for naive composition of Bonawitz with Krum.
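Step 3 of the algorithm can be sketched directly from its description. The code below follows the rule as stated in this section (sum over the $n - B - 1$ nearest other users, flag the $B$ largest sums); it takes a plain matrix of reconstructed squared distances as input, and the function name `krum_style_flags` is an illustrative choice.

```python
def krum_style_flags(dist, B):
    """Krum-style filter on a symmetric matrix of squared distances.

    dist[k][j] is the reconstructed squared distance between users k and j.
    Returns the per-user scores S_k and the set of B users with the largest scores.
    """
    n = len(dist)
    m = n - B - 1  # number of nearest neighbours summed per user
    scores = []
    for k in range(n):
        others = sorted(dist[k][j] for j in range(n) if j != k)
        scores.append(sum(others[:m]))
    # Flag the B users whose nearest-neighbour sums are largest.
    order = sorted(range(n), key=lambda k: scores[k], reverse=True)
    return scores, set(order[:B])
```

With one-dimensional "gradients" 0, 1, 2, 1, 100 and $B = 1$, the user holding 100 is the one flagged, since every one of its distances to the others is large.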

Theorem: ByzSecAgg: Privacy + Byzantine + Communication

The ByzSecAgg protocol (Algorithm above) with parameters $(n, B, T, g = 2B + 1)$ over $\mathbb{F}_q$ satisfies:

Correctness under Byzantine attack. For any set $\mathcal{B}$ of Byzantine users with $|\mathcal{B}| \leq B \leq (n - T - 1)/3$, the server's output $\mathbf{G}^*$ equals $\sum_{k \in \mathcal{H}} \mathbf{g}_k$ exactly, where $\mathcal{H}$ is the honest set.
Information-theoretic privacy. Server + any coalition of $T$ users learns nothing about individual $\mathbf{g}_k$ (for honest $k$) beyond what is implied by $\mathbf{G}^*$.
Communication complexity. Per-round per-user upload: $O(d / (2B + 1) + \log d)$ scalars. Aggregate: $O(n \log n + B d)$ bits.

The Byzantine bound $B \leq (n - T - 1)/3$ is the feasibility constraint for combined privacy and Byzantine resilience.
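As a quick check, take the parameters of this section's numerical example ($n = 100$, $T = 20$, $B = 10$), used here purely for illustration. The ramp parameters and the feasibility bound then evaluate to:

$$
t_1 = T = 20, \qquad t_2 = T + 2B + 1 = 41, \qquad g = t_2 - t_1 = 21, \qquad \left\lfloor \frac{n - T - 1}{3} \right\rfloor = \left\lfloor \frac{79}{3} \right\rfloor = 26 \geq B = 10 .
$$

So the example sits well inside the feasible region: up to 26 Byzantine users could be tolerated at this $n$ and $T$.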

Each piece of the construction has a precise role:

Ramp width $g = 2B + 1$ matches Reed-Solomon error correction: $2B$ redundant shares allow up to $2B$ corrupted shares to be detected, or up to $B$ corrupted shares to be corrected.
Coded distance computation lets the server observe pairwise gradient relationships without individual values — enough to filter Byzantine via Krum-style logic but not enough to invert.
Vector commitments prevent Byzantine users from changing their gradient between commitment and reconstruction phases.

The communication complexity $O(n \log n + Bd)$ is the right asymptotic: the $Bd$ term is the Byzantine cost (each Byzantine user "uses up" $d$ bits of error-correction overhead); the $n \log n$ is the structural protocol overhead. No prior scheme matched both bounds.

Proof

Privacy

Ramp $(t_1, t_2, n-1)$-sharing with $t_1 = T$ gives perfect privacy against $T$ colluders (Chapter 3 §3.4). The coded distance computation reveals only the pairwise squared distances $\|\mathbf{g}_i - \mathbf{g}_j\|^2$, so the server's view is limited to these distances and the aggregate; no individual gradient is revealed.

Byzantine correctness

The ramp threshold $t_2 = T + 2B + 1$ is chosen so that the corresponding Reed-Solomon code has minimum distance $2B + 2$, allowing decoding with up to $B$ errors (Singleton bound). Byzantine users may corrupt up to $B$ of the shares, but the remaining $n - B \geq t_2 - B = T + B + 1 \geq T + 1$ shares suffice for correct decoding.

Communication

Per-user share upload: each gradient scalar is ramp-shared into pieces of size $1/g = 1/(2B+1)$ of the original. Aggregate: $n \cdot d / g = nd / (2B+1)$ scalars + commitment overhead. Coded-distance reconstruction adds $O(n^2)$ field operations + $O(n \log n)$ messages (Lagrange interpolation).

Combined

Total communication: $O(nd / (2B+1) + n \log n + B d) = O(n \log n + Bd)$. The first term dominates for small $B$; the second for large $B$. $\blacksquare$

Example: ByzSecAgg in Production: Numbers

Compute the per-round communication for ByzSecAgg with $n = 100$ users, $B = 10$ Byzantine bound, $T = 20$ privacy threshold, $d = 10^7$ gradient dimension. Compare with Bonawitz + Krum naively composed.

Solution

ByzSecAgg cost

Per-user upload: $d / (2B+1) = 10^7 / 21 \approx 5 \cdot 10^5$ scalars + $\log d \approx 23$ bits for commitment. Aggregate per round: $n \cdot 5 \cdot 10^5 \cdot 32 + n \log n + B d = 1.6 \cdot 10^9 + n \log n + B d$ bits $\approx 200$ MB.

Bonawitz + Krum

Bonawitz per-user: $d \cdot 32 = 3.2 \cdot 10^8$ bits = $40$ MB. Plus Krum requires plaintext gradients — destroys privacy.

Naive replication of gradients across users (to provide both privacy via SecAgg and Byzantine defense) would multiply the upload by a Byzantine-tolerance factor: per-user $\sim B \cdot d \cdot 32 = 3.2 \cdot 10^9$ bits $= 400$ MB, so $n = 100$ users upload $100 \cdot 400$ MB $= 40$ GB in aggregate, roughly $200\times$ ByzSecAgg's overhead.
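The ByzSecAgg aggregate and the single-user Bonawitz figure in this example can be re-derived with a few lines of arithmetic. The sketch assumes 32-bit scalars, a base-2 logarithm, and unit constants on the $n \log n$ and $Bd$ terms, as the example does; the helper names are illustrative.

```python
import math


def byzsecagg_round_bits(n, B, d, bits_per_scalar=32):
    """Per-user share size (scalars) and aggregate bits per round,
    using the cost terms exactly as written in the example."""
    g = 2 * B + 1
    per_user_scalars = d / g                       # ramp-shared upload
    share_bits = n * per_user_scalars * bits_per_scalar
    overhead_bits = n * math.log2(n) + B * d       # unit constants assumed
    return per_user_scalars, share_bits + overhead_bits


def bits_to_mb(bits):
    """Convert bits to megabytes (10^6 bytes)."""
    return bits / 8 / 1e6


per_user, total_bits = byzsecagg_round_bits(n=100, B=10, d=10**7)
bonawitz_per_user_mb = bits_to_mb(10**7 * 32)
print(round(per_user), round(bits_to_mb(total_bits)), bonawitz_per_user_mb)
```

Without rounding $d/21$ up to $5 \cdot 10^5$, the aggregate comes out at roughly 203 MB, consistent with the "about 200 MB" figure above.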

Conclusion

ByzSecAgg is asymptotically much better, but at small $n$ the constant factors matter. For $n = 100$, both are tractable; ByzSecAgg's advantage grows quickly with $n$.

ByzSecAgg: Six-Phase Protocol Flow

Animation of the ByzSecAgg protocol's six phases: commitment, ramp-shared upload, coded distance computation, Krum filtering, aggregation on filtered set, and verification. Highlights how each primitive composes for the combined guarantee.

ByzSecAgg vs. Naive Composition: Communication Cost

Plot the per-round communication of ByzSecAgg ($O(n \log n + Bd)$) and naive Bonawitz + Krum composition ($O(n^2 + nBd)$) as a function of $n$, for fixed $B/n$ ratio and $d$. ByzSecAgg's asymptotic advantage is dramatic at large $n$.
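The two curves can be tabulated without a plotting library. This sketch evaluates the two cost models exactly as written in the plot description, with all hidden constants set to 1 and a base-2 logarithm (both assumptions), so only the trend is meaningful, not the absolute values. The function names are illustrative.

```python
import math


def byzsecagg_cost(n, B, d):
    """Cost model n log n + B d, unit constants assumed."""
    return n * math.log2(n) + B * d


def naive_cost(n, B, d):
    """Cost model n^2 + n B d for the naive composition, unit constants assumed."""
    return n ** 2 + n * B * d


def cost_table(ns, byz_fraction, d):
    """Rows of (n, B, ByzSecAgg cost, naive cost) for a fixed Byzantine fraction."""
    rows = []
    for n in ns:
        B = max(1, round(byz_fraction * n))
        rows.append((n, B, byzsecagg_cost(n, B, d), naive_cost(n, B, d)))
    return rows


for n, B, byz, naive in cost_table([50, 100, 500], 0.1, 10**7):
    print(f"n={n:4d}  B={B:3d}  naive/ByzSecAgg ratio = {naive / byz:.1f}")
```

Under these two models the ratio grows roughly linearly in $n$, because the $nBd$ term of the naive scheme is $n$ times the $Bd$ term of ByzSecAgg.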

Parameters: $n$ (max 500); $B/n$, Byzantine fraction (0.1); $d$, model size ($10^7$).

Common Mistake: Each Primitive Is Necessary

Mistake:

Drop one of the three primitives (e.g., omit vector commitments) for "simplicity".

Correction:

Each primitive plays a distinct role:

Ramp sharing: privacy + Byzantine error correction structure.
Coded distance computation: detection without inspection.
Vector commitments: integrity (prevent share modification).

Omitting any one breaks at least one guarantee: no commitments → Byzantine users can change shares after distance computation; no ramp → privacy gone or shares too large; no coded distances → must use plaintext gradients (privacy lost). The composition is required for the combined guarantee.

🔧Engineering Note

ByzSecAgg in Production: Status

ByzSecAgg's production adoption is limited as of 2024:

Research deployments: TU Berlin (CommIT group), USC, NVIDIA Clara research division.
Engineering challenges: Multi-phase protocol with peer-to-peer communication is harder to orchestrate than single-round Bonawitz; vector commitment cryptography adds engineering complexity.
Latency: $\sim 5$–$10\times$ a Bonawitz round due to the multi-phase structure.
Fit: Best for cross-silo FL with moderate $n$ and high adversarial-resilience requirements.

For lighter-weight deployments, alternatives like Bonawitz + DP noise (with weaker Byzantine tolerance) remain dominant. The ByzSecAgg result establishes the information-theoretic frontier; production tooling is catching up.

Practical Constraints

• Latency: 5–10$\times$ Bonawitz per round
• Cryptographic complexity: vector commitments add engineering work
• Best fit: cross-silo FL with $n \sim 10$–$100$, high security needs

📋 Ref: Jahani-Nezhad et al. 2023; CommIT group implementation

Key Takeaway

ByzSecAgg achieves Byzantine + privacy + $O(n \log n + Bd)$ communication. The protocol composes ramp secret sharing, polynomial-coded distance computation, and vector commitments. The construction is the third CommIT contribution of Part III and a major step toward full privacy-and-robustness guarantees in production FL. Section 11.3 develops the coded distance computation in detail.

Quick Check

The ByzSecAgg protocol uses three primitives. Match each primitive to its role:

Ramp secret sharing: integrity / commitment.

Ramp sharing: privacy + Byzantine-error correction structure. Coded distance computation: outlier detection without learning individual gradients. Vector commitments: integrity (prevents share modification after the fact).

Vector commitments: privacy.

Coded distance computation: encryption.

Correction:

Ramp sharing: privacy + Byzantine-error correction structure. Coded distance computation: outlier detection without learning individual gradients. Vector commitments: integrity (prevents share modification after the fact).

Each primitive has a distinct role; the composition delivers all guarantees simultaneously.
