Nomographic Functions and Pre-Processing

Beyond the Sum

Section 16.2 computed $\sum_k s_k$. Many federated learning, distributed sensing, and consensus tasks want other aggregates: the arithmetic mean, the geometric mean, the max, the empirical variance. At first glance these are different problems — the wireless MAC only adds. Which functions can AirComp compute in one channel use?

The answer is all nomographic functions: those admitting the decomposition
$$f(s_1, \ldots, s_n) \;=\; \psi\!\left(\sum_{k=1}^{n} \varphi_k(s_k)\right),$$
with pre-processing $\varphi_k$ per user and post-processing $\psi$ at the receiver. The receiver gets a sum, then undoes the pre-processing. Kolmogorov, Arnold, and Sprecher established that every continuous function of $n$ variables admits a nomographic representation with universal inner functions — so the AirComp class is, in principle, all continuous aggregates. The practical question is whether $\varphi_k$ and $\psi$ are nice (e.g., Lipschitz, monotone) and whether the noise $\mathbf{w}/\eta$ is amplified by $\psi'$. This section builds the catalog.


Definition:

Nomographic Function

A function $f: \mathcal{X}^n \to \mathbb{R}$ is nomographic if there exist pre-processing maps $\varphi_1, \ldots, \varphi_n : \mathcal{X} \to \mathbb{R}$ and a post-processing map $\psi : \mathbb{R} \to \mathbb{R}$ such that
$$f(s_1, \ldots, s_n) \;=\; \psi\!\left(\sum_{k=1}^{n} \varphi_k(s_k)\right).$$

The wireless channel computes the inner sum via MAC superposition. Each user transmits $x_k = b_k \varphi_k(s_k)$ (possibly with different transmit powers across users, because $\varphi_k(s_k)$ has a different dynamic range for each user), the receiver recovers $\sum_k \varphi_k(s_k)$ via the analysis of §16.2, and applies $\psi$.

Nomographic functions are AirComp's native computation class.
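The transmit–sum–invert pipeline can be sketched numerically. This is an illustrative simulation, not the chapter's exact channel model: the function `aircomp`, its parameter names, and the noiseless default are my own choices, with `eta` standing in for the receive scaling of §16.2.

```python
import numpy as np

rng = np.random.default_rng(0)

def aircomp(s, phi, psi, eta=1.0, noise_std=0.0):
    """One-shot AirComp sketch: users send phi(s_k), the MAC adds them,
    the receiver scales by 1/eta and applies the post-processor psi."""
    tx = phi(np.asarray(s))                                  # pre-processing at each user
    y = eta * tx.sum() + noise_std * rng.standard_normal()   # superposition + AWGN
    return psi(y / eta)                                      # post-processing at receiver

s = rng.uniform(0.5, 2.0, size=8)
n = len(s)

# Geometric mean in one "channel use": phi = log, psi(u) = exp(u/n).
est = aircomp(s, np.log, lambda u: np.exp(u / n))
true = s.prod() ** (1 / n)
```

With the noise switched off, `est` matches the directly computed geometric mean up to floating-point error; turning `noise_std` up shows the degradation analyzed later in this section.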

Example: A Catalog of Nomographic Aggregates

Express each of the following aggregates in nomographic form, identifying $\varphi_k$ and $\psi$.

(a) Arithmetic mean: $\bar{s} = \frac{1}{n}\sum_{k=1}^{n} s_k$

(b) Weighted mean: $\sum_{k=1}^{n} w_k s_k$ with known weights $w_k$

(c) Geometric mean: $\left(\prod_{k=1}^{n} s_k\right)^{1/n}$ for $s_k > 0$

(d) Empirical second moment: $\frac{1}{n}\sum_{k=1}^{n} s_k^2$

(e) Maximum (approximate): $\max_k s_k$
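Each item reduces to a $(\varphi_k, \psi)$ pair. The sketch below (variable names and sample values are mine, with part (e) taken at a finite $p$) checks that $\psi$ applied to the sum of the $\varphi_k$ recovers, or for (e) approximates, each aggregate:

```python
import numpy as np

s = np.array([0.8, 1.5, 2.0, 0.6])
n = len(s)

# (a) arithmetic mean: phi(s) = s, psi(u) = u/n
a = s.sum() / n
# (b) weighted mean: phi_k(s) = w_k * s, psi(u) = u
w = np.array([0.1, 0.2, 0.3, 0.4])
b = (w * s).sum()
# (c) geometric mean: phi(s) = log s, psi(u) = exp(u/n)
c = np.exp(np.log(s).sum() / n)
# (d) second moment: phi(s) = s**2, psi(u) = u/n
d = (s ** 2).sum() / n
# (e) approximate max: phi(s) = s**p, psi(u) = u**(1/p); exact only as p -> inf
p = 20.0
e = (s ** p).sum() ** (1 / p)
```

The smooth-max in (e) overshoots the true max by at most a factor $n^{1/p}$, which tends to 1 as $p$ grows.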

Theorem: Kolmogorov–Arnold Representation

Every continuous function $f: [0,1]^n \to \mathbb{R}$ admits a nomographic representation: there exist continuous functions $\psi_j$ and $\varphi_{k,j}$ such that
$$f(s_1, \ldots, s_n) \;=\; \sum_{j=1}^{2n+1} \psi_j\!\left(\sum_{k=1}^{n} \varphi_{k,j}(s_k)\right).$$
In particular, $f$ decomposes as a finite sum of nomographic terms, each of which can be computed over the MAC in one channel use, so the whole computation takes at most $2n+1$ channel uses.


Theorem: Post-Processing Noise Amplification

Let the AirComp receiver estimate $u = \sum_k \varphi_k(s_k)$ with $\mathsf{MSE}_u = \sigma^2/\eta^2$ (Theorem 16.2.1), and let the final estimate be $\hat{f} = \psi(\hat{u})$. Assume $\psi$ is differentiable at the true $u^{\star} = \sum_k \varphi_k(s_k)$ with derivative $\psi'(u^{\star})$. For small noise (high SNR), the first-order MSE on $f$ is
$$\mathsf{MSE}_f \;\approx\; |\psi'(u^{\star})|^2 \cdot \mathsf{MSE}_u \;=\; \frac{|\psi'(u^{\star})|^2 \, \sigma^2}{\eta^2}.$$

Effective MSE for Common Nomographic Aggregates

Compare the end-to-end AirComp MSE $\mathsf{MSE}_f = |\psi'|^2 \, \sigma^2/\eta^2$ for three nomographic aggregates: the arithmetic mean ($\psi' = 1/n$), the geometric mean ($\psi' = \bar{g}/n$ at the true geometric mean $\bar{g}$), and the second moment with the $1/n$ folded into $\varphi_k$ ($\psi' = 1$). Sweep transmit power; observe how the choice of pre/post-processing shapes the MSE.

⚠️ Engineering Note

Designing $\varphi_k$ in Practice

Practical nomographic AirComp design:

  • Match dynamic range. If $\varphi_k(s_k)$ varies by orders of magnitude across users, the weak-user bottleneck (§16.2) worsens. Center and scale $\varphi_k$ to approximately match across users before applying power control.

  • Log pre-processing for multiplicative aggregates. Geometric mean, product, and likelihood aggregation all use $\varphi_k = \log$. Numerical care: clamp $s_k \geq \epsilon > 0$ to avoid $-\infty$.

  • Choose $\psi$ to be Lipschitz at the operating point. Smooth-max ($\varphi_k(s) = s^p$, $\psi(u) = u^{1/p}$) has $\psi'(u^{\star}) = u^{\star\,(1-p)/p}/p$, which is large when $u^{\star}$ is small and $p$ is large. The approximation sharpens the max but amplifies noise. Balance empirically.

  • Universality vs. tailoring. Kolmogorov's universal $\varphi$ works for every $f$ but is only Hölder-continuous — noise amplification is uncontrolled. Bespoke $\varphi_k$ for specific aggregates (FL gradient averaging, for example) is nearly always better in practice.

  • Dither for debiasing. For non-linear $\psi$, AirComp is biased (Jensen's gap). A small dither at the transmitters can reduce this bias at modest MSE cost.
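The smooth-max tradeoff above can be made concrete: as $p$ grows, the gap between $(\sum_k s_k^p)^{1/p}$ and the true max shrinks, while $|\psi'(u^{\star})|$ blows up when the inputs are below 1. A small sketch with illustrative values:

```python
import numpy as np

s = np.array([0.2, 0.3, 0.5])       # sub-unity inputs stress the smooth-max
gaps, amps = [], []
for p in [2, 5, 10, 20]:
    u_star = np.sum(s ** p)                      # inner sum with phi(s) = s^p
    gaps.append(u_star ** (1 / p) - s.max())     # approximation gap: shrinks with p
    amps.append(u_star ** ((1 - p) / p) / p)     # |psi'(u*)|: grows with p here
```

For these inputs the gap falls by several orders of magnitude between $p=2$ and $p=20$, while the noise amplification $|\psi'(u^{\star})|$ climbs from below 1 into the tens of thousands — the empirical balancing act the bullet describes.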


📋 Ref: Goldenbaum 2013; Yang et al. 2020

Common Mistake: AirComp of Non-Linear Aggregates Is Biased

Mistake:

Treat the post-processed estimate $\hat{f} = \psi(\hat{u})$ as unbiased when $\psi$ is non-linear.

Correction:

For non-linear $\psi$,
$$\mathbb{E}[\psi(\hat{u})] = \psi(u^{\star}) + \tfrac{1}{2}\psi''(u^{\star})\,\mathsf{MSE}_u + O(\mathsf{MSE}_u^{3/2})$$
(Jensen's second-order gap). The second term is the bias. For quadratic $\psi$ (e.g., variance estimation) the bias is exactly $\tfrac{1}{2}\psi''(u^{\star})\,\mathsf{MSE}_u$ and does not vanish as $n$ grows. Only linear post-processing ($\psi$ affine) produces strictly unbiased AirComp. Design around this: (i) dither with known statistics and debias analytically; (ii) use linear $\psi$ when possible; (iii) accept a small bias in exchange for noise-efficient non-linear AirComp.
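For quadratic $\psi$ the bias formula is exact and easy to verify numerically. A sketch with $\psi(u) = u^2$ (so $\psi'' = 2$ and the predicted bias equals $\mathsf{MSE}_u$ exactly); the specific $u^{\star}$ and $\mathsf{MSE}_u$ values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
u_star, mse_u = 3.0, 0.25                      # true sum and its estimation MSE
u_hat = u_star + np.sqrt(mse_u) * rng.standard_normal(1_000_000)

def psi(u):                                    # quadratic post-processing, psi'' = 2
    return u ** 2

bias = np.mean(psi(u_hat)) - psi(u_star)       # empirical Jensen gap
predicted = 0.5 * 2 * mse_u                    # (1/2) psi'' MSE_u = mse_u here
```

Note the bias depends only on $\mathsf{MSE}_u$, not on the number of users — averaging over more devices does not make it go away, only more transmit power (smaller $\mathsf{MSE}_u$) does.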

Key Takeaway

AirComp natively computes any nomographic function $f = \psi\left(\sum_k \varphi_k(s_k)\right)$ in one channel use. This spans means, weighted sums, geometric means, second moments, and smooth maxima. Kolmogorov–Arnold guarantees every continuous aggregate has a nomographic representation (possibly over multiple channel uses). Noise is amplified by $|\psi'|$ at the operating point; non-linear $\psi$ introduces an $O(\mathsf{MSE}_u)$ bias. The function class is wide; the engineering is in matching pre-processing to the aggregate.

Historical Note: From Kolmogorov's 13th Problem to the Wireless MAC

Kolmogorov's 1957 theorem on function superposition — originally a resolution of Hilbert's 13th problem on solving polynomial equations via compositions of continuous functions — lay dormant in pure mathematics for four decades. Nazer and Gastpar (2007) noticed that the wireless MAC's natural summation makes it a physical computer for the nomographic form, coining "computation over multiple-access channels." Goldenbaum (2013) made the connection explicit for engineering practice: every continuous aggregate decomposes into at most $2n+1$ nomographic terms, each computable in one AirComp channel use. The bridge between classical mathematics and wireless systems turned a 55-year-old existence theorem into a practical aggregation framework.


Quick Check

Which of the following aggregates can AirComp compute in a single channel use?

The median of $s_1, \ldots, s_n$.

The empirical variance $\frac{1}{n}\sum_k (s_k - \bar{s})^2$.

The geometric mean $(\prod_k s_k)^{1/n}$, $s_k > 0$.

Sorting $s_1, \ldots, s_n$ into increasing order.