Beyond Matrix Multiplication

Why Matrix Multiplication Is Not Enough

Chapters 5 and 6 gave optimal coded-computing schemes for matrix multiplication and gradient aggregation. The obvious question is: what about convolutions? What about general multivariate-polynomial tensor operations? Modern ML workloads are dominated by these: convolutions power every CNN layer; higher-order tensor contractions appear in attention mechanisms and in Einstein-summation-style ops. If coded computing cannot handle them, its practical impact is limited.

The good news is that the polynomial-code framework extends naturally to these operations. The key generalization is to encode inputs as polynomials whose product (or, more generally, composition) has the desired output blocks as coefficients. For convolutions, this is the Yu / Maddah-Ali / Avestimehr "entangled polynomial code" (§8.2). For arbitrary multivariate polynomials, this is the Lagrange Coded Computing framework of Yu / Raviv / Soleymani / Avestimehr (§8.3).

The extensions do have limits: coded computing handles polynomial (multilinear) operations well, but non-polynomial operations (ReLU activations, softmax) require approximations or hybrid approaches. Section 8.4 closes the chapter with these limitations and open problems.

Definition: Convolution as Polynomial Multiplication

For vectors $\mathbf{a} = (a_0, \ldots, a_{d-1})$ and $\mathbf{b} = (b_0, \ldots, b_{d'-1})$ over $\mathbb{F}_q$, the linear convolution $\mathbf{c} = \mathbf{a} * \mathbf{b}$ is the vector of length $d + d' - 1$ with entries

$$c_k = \sum_{i + j = k} a_i b_j, \qquad k = 0, 1, \ldots, d + d' - 2.$$

Equivalently, defining generating polynomials $p_{\mathbf{a}}(x) = \sum_i a_i x^i$ and $p_{\mathbf{b}}(x) = \sum_j b_j x^j$, the product $p_{\mathbf{a}}(x) \cdot p_{\mathbf{b}}(x) = p_{\mathbf{c}}(x) = \sum_k c_k x^k$ has the convolution output as its coefficients.

This algebraic identity is what makes coded convolution possible: evaluate the two input polynomials at $N$ distinct points, let each worker compute the local product, then interpolate the output polynomial from the responses.
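
A minimal sketch of this recipe in NumPy, working over the reals rather than $\mathbb{F}_q$ for readability (the vectors, evaluation points, worker count, and straggler pattern are illustrative assumptions, not a reference implementation):

```python
import numpy as np

# Toy inputs: a has length d = 3, b has length d' = 2.
a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0])
K = len(a) + len(b) - 1              # responses needed: degree(p_a * p_b) + 1 = d + d' - 1

# "Encode": evaluate p_a and p_b at N distinct points, one pair per simulated worker.
N = 6
xs = np.arange(1.0, N + 1)
pa_vals = np.polyval(a[::-1], xs)    # np.polyval wants the highest-degree coefficient first
pb_vals = np.polyval(b[::-1], xs)

# Each worker multiplies its two scalars locally (evaluations of p_c = p_a * p_b).
worker_outputs = pa_vals * pb_vals

# "Decode": interpolate p_c from any K responses; workers 1 and 4 are treated as stragglers.
survivors = [0, 2, 3, 5]
coeffs = np.polyfit(xs[survivors], worker_outputs[survivors], deg=K - 1)
c = coeffs[::-1]                     # back to lowest-degree-first ordering

print(np.allclose(c, np.convolve(a, b)))   # True: coefficients of p_c are the convolution
```

Any $d + d' - 1$ of the $N$ responses determine the product polynomial, which is exactly the straggler tolerance the rest of the chapter quantifies.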

Why Naive Coded-Matrix-Mult Is Not Coded Convolution

Convolution can be expressed as a matrix-vector product $\mathbf{c} = \mathbf{M}(\mathbf{a}) \mathbf{b}$, where $\mathbf{M}(\mathbf{a})$ is the circulant or Toeplitz matrix built from $\mathbf{a}$. One could therefore apply the polynomial-code matrix-multiplication framework of Chapter 5 with $\mathbf{M}(\mathbf{a})$ as the data matrix.
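
A quick check of this identity (`conv_matrix` is a throwaway helper introduced here, not notation from the text):

```python
import numpy as np

def conv_matrix(a, d_prime):
    """Build the (d + d' - 1) x d' Toeplitz-structured matrix M(a) with M(a) @ b == a * b."""
    d = len(a)
    M = np.zeros((d + d_prime - 1, d_prime))
    for j in range(d_prime):
        M[j:j + d, j] = a            # column j is a, shifted down by j positions
    return M

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0])
print(np.allclose(conv_matrix(a, len(b)) @ b, np.convolve(a, b)))   # True
```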

The catch: $\mathbf{M}(\mathbf{a})$ has size $(d + d' - 1) \times d'$ and is itself a function of $\mathbf{a}$. Encoding $\mathbf{M}(\mathbf{a})$ via polynomial codes treats $\mathbf{a}$ as a fixed data matrix: the encoder must commit to a specific $\mathbf{a}$ before distributing the computation, so only $\mathbf{b}$ is protected by the code. If both $\mathbf{a}$ and $\mathbf{b}$ are inputs (and we want to code over both), the naive approach fails.

The underlying issue is that convolution is bilinear in its two inputs, not merely linear in one. Entangled polynomial codes (§8.2) handle this by encoding both inputs jointly, in polynomials whose product structure matches the convolution.

Definition: Tensor Contraction

Let $\mathbf{T}_A \in \mathbb{F}_q^{d_1 \times d_2 \times d_3}$ and $\mathbf{T}_B \in \mathbb{F}_q^{d_3 \times d_4 \times d_5}$ be two order-3 tensors. Their tensor contraction along the common dimension $d_3$ is $\mathbf{T}_C \in \mathbb{F}_q^{d_1 \times d_2 \times d_4 \times d_5}$ with entries

$$T_C[i_1, i_2, i_4, i_5] = \sum_{i_3 = 1}^{d_3} T_A[i_1, i_2, i_3] \cdot T_B[i_3, i_4, i_5].$$

Matrix multiplication is the order-2 special case: collapse $d_2 = d_4 = 1$ and take $d_1 = m$, $d_3 = d$, $d_5 = d'$. Convolution is the special case with a Toeplitz-structured $\mathbf{T}_B$.
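
This definition is exactly an Einstein summation over the shared index $i_3$; a small NumPy check, with arbitrary illustrative sizes:

```python
import numpy as np

# Two random order-3 tensors sharing the dimension d3 (sizes chosen arbitrarily).
d1, d2, d3, d4, d5 = 2, 3, 4, 5, 6
TA = np.random.rand(d1, d2, d3)
TB = np.random.rand(d3, d4, d5)

# The contraction from the definition, summing over the shared index i3.
TC = np.einsum('abc,cde->abde', TA, TB)
print(TC.shape)                                            # (2, 3, 5, 6)

# Matrix multiplication as the order-2 special case (d2 = d4 = 1 collapsed away).
A, B = np.random.rand(4, 7), np.random.rand(7, 3)          # m x d  and  d x d'
print(np.allclose(np.einsum('ij,jk->ik', A, B), A @ B))    # True
```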

More general multilinear operations (multi-tensor contractions, polynomial evaluations on tensor inputs) can be composed from these primitives. The Lagrange Coded Computing framework of §8.3 handles arbitrary multivariate polynomial functions.

Tensor Contraction

Summation over a shared dimension between two tensors, generalizing matrix multiplication. Convolutions, attention mechanisms, and higher-order Einstein sums all reduce to tensor contractions.

Entangled Polynomial Code

A polynomial-code variant in which both input matrices are encoded jointly (their exponents "entangled" across $x$-powers) to achieve a lower recovery threshold $K = p + q - 1$ than the standard polynomial code's $K = pq$. Chapter 5 §5.4's MatDot codes are the prototype.

Theorem: Recovery Threshold for Coded Convolution

For the $(p, q)$-partitioned convolution $\mathbf{c} = \mathbf{a} * \mathbf{b}$, with $\mathbf{a}$ split into $p$ contiguous chunks and $\mathbf{b}$ split into $q$ contiguous chunks, the minimum recovery threshold is

$$K_{\text{conv}}^* = p + q - 1.$$

This is achieved by the entangled polynomial code (MatDot variant) of §5.4, specialized to the convolution setting. The recovery threshold is smaller than the polynomial code's $K = pq$ for general matrix multiplication because convolution's output has fewer degrees of freedom: it is far more structured than a full-rank matrix product.

A general matrix product has $pq$ distinct output blocks, all independent. The convolution output has only $p + q - 1$ distinct block coefficients: splitting $\mathbf{a}$ into $p$ chunks and $\mathbf{b}$ into $q$ chunks, the block-level product polynomial has degree $p + q - 2$, hence $p + q - 1$ coefficients, each a sum of chunk convolutions. So the recovery threshold scales linearly in $p + q$, not quadratically in the partition counts.
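
A sketch of this counting argument in NumPy, assuming for simplicity that all chunks have the same length $L$ (so block offsets depend only on $i + j$); the sizes are illustrative:

```python
import numpy as np

# Equal-length chunks: a is p chunks of length L, b is q chunks of length L.
p, q, L = 2, 3, 4
a, b = np.random.rand(p * L), np.random.rand(q * L)
A, B = a.reshape(p, L), b.reshape(q, L)

# Only p + q - 1 block coefficients C_k = sum_{i+j=k} A_i * B_j exist.
C = [np.zeros(2 * L - 1) for _ in range(p + q - 1)]
for i in range(p):
    for j in range(q):
        C[i + j] += np.convolve(A[i], B[j])

# Overlap-add the block coefficients at offsets k * L to rebuild the full convolution.
c = np.zeros(len(a) + len(b) - 1)
for k, Ck in enumerate(C):
    c[k * L : k * L + 2 * L - 1] += Ck

print(np.allclose(c, np.convolve(a, b)))   # True: p + q - 1 blocks carry all the information
```

Only the $p + q - 1$ block coefficients $C_k$ ever need to be recovered, which is where the linear recovery threshold comes from.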

Operationally, this means that convolutions can be distributed with better straggler tolerance than matrix multiplication at the same per-worker storage. This is why convolutions in CNN training have been a successful target for coded-computing deployments.

Example: Coded Convolution with $p = 2$, $q = 3$

Construct an entangled polynomial code for convolution with $\mathbf{a}$ split into $p = 2$ chunks and $\mathbf{b}$ split into $q = 3$ chunks. Specify the encoding polynomials and verify the recovery threshold.
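
One possible realization, sketched in NumPy over the reals with equal-length chunks; the chunk length, evaluation points, worker count, and straggler set are illustrative assumptions rather than the book's worked solution:

```python
import numpy as np

p, q, L = 2, 3, 4                        # a in p = 2 chunks, b in q = 3 chunks, length L each
K = p + q - 1                            # claimed recovery threshold: 4
N = 6                                    # simulated workers (any N >= K, distinct points)

a, b = np.random.rand(p * L), np.random.rand(q * L)
A, B = a.reshape(p, L), b.reshape(q, L)

# Encoding polynomials with vector coefficients:
#   p_A(x) = A_0 + A_1 x               (degree p - 1 = 1)
#   p_B(x) = B_0 + B_1 x + B_2 x^2     (degree q - 1 = 2)
xs = np.arange(1.0, N + 1)
enc_A = np.stack([sum(A[i] * x**i for i in range(p)) for x in xs])
enc_B = np.stack([sum(B[j] * x**j for j in range(q)) for x in xs])

# Worker n convolves its two coded chunks, producing p_A(x_n) * p_B(x_n).
outputs = np.stack([np.convolve(enc_A[n], enc_B[n]) for n in range(N)])

# Decode from any K = 4 responses: a Vandermonde solve recovers C_0, ..., C_{p+q-2}.
survivors = [0, 2, 3, 5]                 # workers 1 and 4 straggle
V = np.vander(xs[survivors], K, increasing=True)
C = np.linalg.solve(V, outputs[survivors])

# Overlap-add the recovered block coefficients and compare with the direct convolution.
c = np.zeros(len(a) + len(b) - 1)
for k in range(K):
    c[k * L : k * L + 2 * L - 1] += C[k]
print(np.allclose(c, np.convolve(a, b)))   # True with only 4 of the 6 workers
```

The decoder solves a $4 \times 4$ Vandermonde system, confirming the recovery threshold $K = p + q - 1 = 4$ regardless of which four workers respond.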

Convolution vs. Matrix Multiplication Recovery Threshold

Plot the recovery threshold $K$ for coded convolution ($K = p + q - 1$) versus coded matrix multiplication ($K = pq$) as a function of the partition counts, taking $q = p$ with partition counts up to 8 and $N = 24$ workers. Convolution's linear-in-partition-sum threshold is dramatically better than matrix multiplication's quadratic-in-partition-product for large partitions, though both schemes use the same per-worker storage.
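
A plain-Python version of the same comparison, using the plot's default parameters:

```python
# Recovery thresholds for q = p, mirroring the plot's defaults above.
N = 24                                    # workers
for p in range(1, 9):                     # partition counts 1..8
    k_conv, k_mm = 2 * p - 1, p * p       # p + q - 1 vs. pq with q = p
    mm_note = "infeasible with N workers" if k_mm > N else f"{N - k_mm} stragglers tolerated"
    print(f"p=q={p}: K_conv={k_conv:2d} ({N - k_conv} stragglers tolerated), "
          f"K_matmul={k_mm:2d} ({mm_note})")
```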

Common Mistake: Coded Convolution Is Not MatDot Coded Matrix Multiplication

Mistake:

Assume the MatDot coded-matrix-multiplication threshold $K = p + q - 1$ applies to general matrix multiplication at the same per-worker storage as the standard polynomial code.

Correction:

MatDot coded matrix multiplication (Chapter 5 §5.4) achieves $K = p + q - 1$ only at higher per-worker storage than the standard polynomial code, relaxing the $\mu = 1/p + 1/q$ optimality of polynomial codes. For convolution, $K = p + q - 1$ is the tight bound at any per-worker storage $\geq 1/p + 1/q$, because convolution's output structure has fewer degrees of freedom. The distinction matters: convolution benefits from entangled polynomial codes "for free"; general matrix multiplication pays a storage penalty.

🔧 Engineering Note

Coded Convolutions in CNN Training

CNN training distributes both the forward pass (where each layer is a convolution) and the backward pass (which involves transposed convolutions, themselves convolutions under Fourier conjugation). Coded convolution can be applied at each layer, distributing the $d \times d$ filter convolutions across workers. Early results (Dutta et al. 2018) showed 2–3× speedups on AlexNet-scale CNNs with moderate stragglers.

Practical adoption has nonetheless been limited: the per-layer coding overhead is not negligible, and many production systems use simple data-parallel replication instead. For very large convolutional networks (Transformer variants with attention cast as convolution), coded computing becomes attractive when the per-layer dimension exceeds a few thousand.

Practical Constraints
• Dutta et al. 2018: AlexNet with 2–3× straggler-resilience speedup

• Per-layer coding overhead: ~10–20% wall-clock

• Production systems (PyTorch, TF) do not routinely deploy coded convolutions

📋 Ref: Dutta et al. 2018 IEEE T-IT §III; NVIDIA cuDNN

Historical Note: The Rise of Coded Tensor Operations

2017–2019

The polynomial-code framework of Yu et al. (2017) for matrix multiplication was quickly extended: Dutta et al. (2018) added convolutions; Yu et al. (2019) generalized to arbitrary multivariate-polynomial functions via Lagrange Coded Computing (LCC). By 2020, coded computing covered essentially every linear and multilinear operation in standard ML pipelines. The extensions to non-polynomial operations (ReLU, softmax, cross-entropy) remain limited to approximations and hybrid schemes, a frontier discussed in §8.4.


Key Takeaway

Convolution's recovery threshold is $p + q - 1$, linear in the partition counts; matrix multiplication's is $pq$, quadratic. The improvement comes from the additional algebraic structure of the convolution output: fewer independent output coefficients to recover. Section 8.2 makes this precise via entangled polynomial codes; §8.3 generalizes to arbitrary multivariate polynomials via LCC.

Quick Check

For a $(p = 4, q = 5)$-partitioned convolution, the coded-convolution recovery threshold is:

$K = 8$ ($p + q - 1$)

$K = 20$ ($pq$)

$K = 9$ ($p + q$)

$K = \max(p, q) = 5$