Beyond Matrix Multiplication
Why Matrix Multiplication Is Not Enough
Chapters 5 and 6 gave optimal coded-computing schemes for matrix multiplication and gradient aggregation. The obvious question is: what about convolutions? What about general multivariate-polynomial tensor operations? Modern ML workloads are dominated by these: convolutions power every CNN layer; higher-order tensor contractions appear in attention mechanisms and in Einstein-summation-style ops. If coded computing cannot handle them, its practical impact is limited.
The point is that the polynomial-code framework extends naturally to these operations. The key generalization is to encode inputs as polynomials whose product (or, more generally, composition) has the desired output blocks as coefficients. For convolutions, this is the Yu / Maddah-Ali / Avestimehr "entangled polynomial code" (§8.2). For arbitrary multivariate polynomials, this is the Lagrange Coded Computing framework of Yu / Raviv / Soleymani / Avestimehr (§8.3).
The extensions have limits: coded computing handles polynomial (multilinear) operations well, but non-polynomial operations (ReLU activations, softmax) require approximations or hybrid approaches. Section 8.4 closes the chapter with these limitations and open problems.
Definition: Convolution as Polynomial Multiplication
For vectors $a = (a_0, \ldots, a_{m-1})$ and $b = (b_0, \ldots, b_{n-1})$ over a field $\mathbb{F}$, the linear convolution $c = a * b$ is the vector of length $m + n - 1$ with entries $c_k = \sum_{i + j = k} a_i b_j$. Equivalently, defining generating polynomials $a(x) = \sum_{i=0}^{m-1} a_i x^i$ and $b(x) = \sum_{j=0}^{n-1} b_j x^j$, the product $c(x) = a(x)\,b(x)$ has the convolution output as its coefficients.
This algebraic identity is what makes coded convolution possible: evaluate the two input polynomials at distinct points, let each worker compute the local product of its two evaluations, then interpolate the output polynomial from the responses.
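A quick numerical sanity check of the identity, as a minimal numpy sketch (the specific vectors are illustrative, not from the text):

```python
import numpy as np

# Linear convolution equals the coefficient sequence of the product
# of the two generating polynomials.
a = np.array([1.0, 2.0, 3.0])   # a(x) = 1 + 2x + 3x^2  (m = 3)
b = np.array([4.0, 5.0])        # b(x) = 4 + 5x          (n = 2)

conv = np.convolve(a, b)                                # length m + n - 1 = 4
poly_product = np.polynomial.polynomial.polymul(a, b)   # coefficients of a(x) b(x)

assert np.allclose(conv, poly_product)                  # both are [4, 13, 22, 15]
```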
Why Naive Coded-Matrix-Mult Is Not Coded Convolution
Convolution can be expressed as a matrix-vector product $c = T_a\, b$, where $T_a$ is the circulant or Toeplitz matrix built from $a$. One could therefore apply the polynomial-code matrix-multiplication framework of Chapter 5 with $T_a$ as the data matrix.
The catch: $T_a$ has size $(m + n - 1) \times n$ and is itself a function of $a$. Encoding $T_a$ via polynomial codes requires evaluating it at multiple points, which means committing to $a$ before distributing. If both $a$ and $b$ are inputs (and we want to code over both), the naive approach fails.
The deeper point is that convolution is bilinear in its two inputs, not merely linear in one. Entangled polynomial codes (§8.2) handle this by encoding both inputs jointly in a single polynomial whose product structure matches the convolution.
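To make the Toeplitz formulation concrete, here is a small numpy sketch (the vectors and the way $T_a$ is assembled are illustrative); note that $T_a$ is built entirely from $a$, which is exactly what blocks coding over both inputs:

```python
import numpy as np

# Build the (m + n - 1) x n Toeplitz matrix T_a with T_a[i, j] = a[i - j]
# and check that T_a @ b reproduces the linear convolution a * b.
a = np.array([1.0, 2.0, 3.0])   # m = 3
b = np.array([4.0, 5.0])        # n = 2
m, n = len(a), len(b)

T_a = np.zeros((m + n - 1, n))
for j in range(n):
    T_a[j : j + m, j] = a       # column j is a, shifted down by j

assert np.allclose(T_a @ b, np.convolve(a, b))   # c = T_a b
```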
Definition: Tensor Contraction
Let $A \in \mathbb{F}^{I \times J \times K}$ and $B \in \mathbb{F}^{K \times L \times M}$ be tensors of order 3 each. Their tensor contraction along the common dimension $K$ is the order-4 tensor $C$ with entries $C_{ijlm} = \sum_{k=1}^{K} A_{ijk} B_{klm}$. Matrix multiplication is the special case with singleton dimensions $J = 1$ and $M = 1$. Convolution is the special case with a Toeplitz-structured $A$.
More general multilinear operations (multi-tensor contractions, polynomial evaluations on tensor inputs) can be composed from these primitives. The Lagrange Coded Computing framework of §8.3 handles arbitrary multivariate polynomial functions.
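As a concrete illustration of the contraction, a short numpy/einsum sketch (the dimension sizes and index letters are arbitrary choices, not from the text):

```python
import numpy as np

# Order-3 tensor contraction along the shared dimension K, via einsum.
A = np.random.rand(2, 3, 4)           # shape (I, J, K)
B = np.random.rand(4, 5, 6)           # shape (K, L, M)
C = np.einsum("ijk,klm->ijlm", A, B)  # C[i,j,l,m] = sum_k A[i,j,k] * B[k,l,m]

# Matrix multiplication is the special case with singleton J and M dimensions.
M1, M2 = np.random.rand(2, 4), np.random.rand(4, 5)
C2 = np.einsum("ijk,klm->ijlm", M1[:, None, :], M2[:, :, None])
assert np.allclose(C2.squeeze(), M1 @ M2)
```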
Tensor Contraction
Summation over a shared dimension between two tensors, generalizing matrix multiplication. Convolutions, attention mechanisms, and higher-order Einstein sums all reduce to tensor contractions.
Entangled Polynomial Code
A polynomial-code variant in which both input matrices are encoded jointly (their exponents "entangled" across powers of $x$) to achieve a lower recovery threshold than the standard polynomial code's $pq$. Chapter 5 §5.4's MatDot codes are the prototype.
Theorem: Recovery Threshold for Coded Convolution
For the $(p, q)$-partitioned convolution with $a$ split into $p$ contiguous chunks and $b$ split into $q$ contiguous chunks, the minimum recovery threshold is $p + q - 1$. This is achieved by the entangled polynomial code (MatDot variant) of §5.4, specialized to the convolution setting. The recovery threshold is smaller than the polynomial code's $pq$ for general matrix multiplication because convolution's output has fewer "degrees of freedom": its structure is richer (more constrained) than a full-rank matrix product.
A general matrix product has $pq$ distinct output blocks, all independent. The convolution output has only $p + q - 1$ distinct "convolution coefficients" (the degree of the product polynomial is $p + q - 2$, which with $p + q - 1$ distinct block positions gives a smaller output size). So the recovery threshold scales linearly in $p + q$, not quadratically as $pq$.
Operationally, this means that convolutions can be distributed with better straggler tolerance than matrix multiplication at the same per-worker storage. This is why convolutions in CNN training have been a successful target for coded-computing deployments.
Convolution polynomial structure
With $a$ split into chunks $a_0, \ldots, a_{p-1}$ and $b$ into chunks $b_0, \ldots, b_{q-1}$, the block-level product $c(x) = \big(\sum_i a_i x^i\big)\big(\sum_j b_j x^j\big)$ has degree $p + q - 2$. Its $p + q - 1$ coefficients are the distinct convolution outputs.
Achievability via MatDot
Encode $a$ as a polynomial of degree $p - 1$ and $b$ as a polynomial of degree $q - 1$ with entangled exponent choices: each worker $i$ computes $a(x_i)\,b(x_i)$, one evaluation of the degree-$(p + q - 2)$ product polynomial. Any $p + q - 1$ evaluations interpolate the polynomial.
Converse
The output has $p + q - 1$ distinct coefficients; each worker response is one evaluation of the product polynomial. By the rank/dimension counting argument (Chapter 5 §5.3), any scheme needs at least $p + q - 1$ responses. Hence the threshold $p + q - 1$ is tight.
Example: Coded Convolution
Construct an entangled polynomial code for convolution with $a$ split into $p = 2$ chunks and $b$ split into $q = 3$ chunks. Specify the encoding polynomials and verify the recovery threshold.
Encoding
$a(x) = a_0 + a_1 x$ (degree 1), with $a_0, a_1$ the two chunks of $a$. $b(x) = b_0 + b_1 x + b_2 x^2$ (degree 2), with $b_0, b_1, b_2$ the three chunks of $b$.
Product polynomial
$c(x) = a(x)\,b(x)$ has degree $1 + 2 = 3$. Its four coefficients are $a_0 * b_0$, $a_0 * b_1 + a_1 * b_0$, $a_0 * b_2 + a_1 * b_1$, and $a_1 * b_2$.
Recovery
$p + q - 1 = 4$ distinct evaluations suffice to interpolate $c(x)$. With $N$ workers, the master tolerates $N - 4$ stragglers.
Comparison
A standard polynomial code for matrix multiplication with the same $p = 2$, $q = 3$ would need $pq = 6$ responses. Convolution saves 2 responses thanks to its richer output structure.
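The whole example can be checked numerically. The sketch below (chunk length, number of workers, evaluation points, and the straggler pattern are all illustrative assumptions) encodes the chunks, simulates the per-worker products, interpolates from any four responses, and compares against the uncoded convolution:

```python
import numpy as np

# p = 2, q = 3 entangled/MatDot-style coded convolution, worked numerically.
rng = np.random.default_rng(0)
p, q, L = 2, 3, 4                      # 2 chunks of a, 3 chunks of b, chunk length 4
a = rng.standard_normal(p * L)         # length-8 input a
b = rng.standard_normal(q * L)         # length-12 input b
A = a.reshape(p, L)                    # contiguous chunks a_0, a_1
B = b.reshape(q, L)                    # contiguous chunks b_0, b_1, b_2

N = 6                                  # workers; the threshold is p + q - 1 = 4
xs = np.arange(1, N + 1, dtype=float)  # distinct evaluation points

# Encoding: worker i receives a(x_i) and b(x_i), each a single chunk-sized vector.
enc_a = [sum(A[k] * x**k for k in range(p)) for x in xs]
enc_b = [sum(B[k] * x**k for k in range(q)) for x in xs]

# Each worker locally convolves its two encoded chunks (one small convolution).
results = [np.convolve(ea, eb) for ea, eb in zip(enc_a, enc_b)]

# Decoding: any p + q - 1 = 4 responses determine the degree-3 product polynomial.
survivors = [0, 2, 3, 5]               # pretend workers 1 and 4 straggled
V = np.vander(xs[survivors], p + q - 1, increasing=True)
coeffs = np.linalg.solve(V, np.stack([results[i] for i in survivors]))

# coeffs[k] is the block coefficient sum_{i+j=k} a_i * b_j; overlap-add at offset k*L.
c = np.zeros(p * L + q * L - 1)
for k in range(p + q - 1):
    c[k * L : k * L + 2 * L - 1] += coeffs[k]

assert np.allclose(c, np.convolve(a, b))   # matches the uncoded convolution
```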
Convolution vs. Matrix Multiplication Recovery Threshold
Plot the recovery threshold for coded convolution ($p + q - 1$) vs. coded matrix multiplication ($pq$) as a function of the partition counts. Convolution's linear-in-partition-sum threshold is dramatically better than matrix multiplication's quadratic-in-partition-product for large partitions, though both schemes use the same per-worker storage. A minimal plotting sketch follows the parameter list.
Parameters
Range of partition counts (with q = p)
Number of workers
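A minimal matplotlib sketch of the comparison (assuming $q = p$; the partition range and the worker count are placeholder values):

```python
import numpy as np
import matplotlib.pyplot as plt

p = np.arange(1, 21)        # partition counts, with q = p
N = 100                     # illustrative number of workers

plt.plot(p, 2 * p - 1, label="coded convolution: p + q - 1")
plt.plot(p, p * p, label="coded matrix multiplication: pq")
plt.axhline(N, color="gray", linestyle="--", label="available workers N")
plt.xlabel("partition count p (= q)")
plt.ylabel("recovery threshold")
plt.legend()
plt.show()
```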
Common Mistake: Coded Convolution is Not MatDot Coded Matrix-Multiplication
Mistake:
Assuming the MatDot threshold $p + q - 1$ applies to general matrix multiplication at the same per-worker storage as a standard polynomial code.
Correction:
MatDot coded matrix multiplication (Chapter 5 §5.4) achieves $p + q - 1$ only at higher per-worker storage than the standard polynomial code, which is why it does not overturn the optimality of polynomial codes. For convolution, $p + q - 1$ is the tight bound at any per-worker storage, because convolution's output structure has fewer degrees of freedom. The distinction matters: convolution benefits from entangled polynomial codes "for free"; general matrix multiplication pays a storage penalty.
Coded Convolutions in CNN Training
CNN training distributes both the forward pass (where each layer is a convolution) and the backward pass (which involves transposed convolutions, themselves convolutions under Fourier conjugation). Coded convolution can be applied at each layer, distributing the filter convolutions across workers. Early results (Dutta et al. 2018) showed speedups on AlexNet-scale CNNs with moderate stragglers.
Practical adoption has been limited: the per-layer coding overhead is not negligible, and many production systems use simple data-parallel replication instead. For very large convolutional networks (Transformer variants with attention-as-convolution), coded computing becomes attractive when the per-layer dimension exceeds a few thousand.
- Dutta et al. (2018): straggler-resilience speedup demonstrated on AlexNet
- Per-layer coding adds non-negligible wall-clock overhead
- Production systems (PyTorch, TF) do not routinely deploy coded convolutions
Historical Note: The Rise of Coded Tensor Operations
2017-2019: The polynomial-code framework of Yu et al. (2017) for matrix multiplication was quickly extended: Dutta et al. (2018) added convolutions; Yu et al. (2019) generalized to arbitrary multivariate-polynomial functions via Lagrange Coded Computing (LCC). By 2020, coded computing covered essentially every linear and multilinear operation in standard ML pipelines. The extensions to non-polynomial operations (ReLU, softmax, cross-entropy) remain limited to approximations and hybrid schemes, a frontier discussed in §8.4.
Key Takeaway
Convolution's recovery threshold is $p + q - 1$, linear in the partitions; matrix multiplication's is $pq$, quadratic. The improvement comes from the richer algebraic structure of the convolution output: fewer degrees of freedom in the output coefficients. Section 8.2 makes this precise via entangled polynomial codes; §8.3 generalizes to arbitrary multivariate polynomials via LCC.
Quick Check
For a $(p, q)$-partitioned convolution, the coded-convolution recovery threshold is:
($p + q - 1$)
($pq$)
()
$p + q - 1$. Linear in the partition sum, much better than the matrix-multiplication threshold $pq$.