Nomographic Functions and Pre-Processing

Beyond the Sum

Section 16.2 computed $\sum_k s_k$. Many federated learning, distributed sensing, and consensus tasks want other aggregates: the arithmetic mean, the geometric mean, the max, the empirical variance. At first glance these are different problems — the wireless MAC only adds. Which functions can AirComp compute in one channel use?

The answer is all nomographic functions: those admitting the decomposition
$$f(s_1, \ldots, s_n) \;=\; \psi\!\left(\sum_{k=1}^{n} \varphi_k(s_k)\right),$$
with pre-processing $\varphi_k$ per user and post-processing $\psi$ at the receiver. The receiver gets a sum, then undoes the pre-processing. Kolmogorov, Arnold, and Sprecher established that every continuous function of $n$ variables admits a nomographic representation with universal inner functions — so the AirComp class is, in principle, all continuous aggregates. The practical question is whether $\varphi_k$ and $\psi$ are nice (e.g., Lipschitz, monotone) and whether the noise $\mathbf{w}/\eta$ is amplified by $\psi'$. This section builds the catalog.


Definition:

Nomographic Function

A function $f: \mathcal{X}^n \to \mathbb{R}$ is nomographic if there exist pre-processing maps $\varphi_1, \ldots, \varphi_n : \mathcal{X} \to \mathbb{R}$ and a post-processing map $\psi : \mathbb{R} \to \mathbb{R}$ such that
$$f(s_1, \ldots, s_n) \;=\; \psi\!\left(\sum_{k=1}^{n} \varphi_k(s_k)\right).$$

The wireless channel computes the inner sum via MAC superposition. Each user transmits $x_k = b_k \varphi_k(s_k)$ (possibly with different transmit powers across users, because $\varphi_k(s_k)$ has a different dynamic range for each user), the receiver recovers $\sum_k \varphi_k(s_k)$ via the analysis of §16.2, and applies $\psi$.

Nomographic functions are AirComp's native computation class.
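The transmit–sum–invert pipeline can be sketched numerically. This is an illustrative simulation, not the chapter's exact channel model: the function `aircomp`, its parameter names, and the noiseless default are my own choices, with `eta` standing in for the receive scaling of §16.2.

```python
import numpy as np

rng = np.random.default_rng(0)

def aircomp(s, phi, psi, eta=1.0, noise_std=0.0):
    """One-shot AirComp sketch: users send phi(s_k), the MAC adds them,
    the receiver scales by 1/eta and applies the post-processor psi."""
    tx = phi(np.asarray(s))                                  # pre-processing at each user
    y = eta * tx.sum() + noise_std * rng.standard_normal()   # superposition + AWGN
    return psi(y / eta)                                      # post-processing at receiver

s = rng.uniform(0.5, 2.0, size=8)
n = len(s)

# Geometric mean in one "channel use": phi = log, psi(u) = exp(u/n).
est = aircomp(s, np.log, lambda u: np.exp(u / n))
true = s.prod() ** (1 / n)
```

With the noise switched off, `est` matches the directly computed geometric mean up to floating-point error; turning `noise_std` up shows the degradation analyzed later in this section.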

Example: A Catalog of Nomographic Aggregates

Express each of the following aggregates in nomographic form, identifying $\varphi_k$ and $\psi$.

(a) Arithmetic mean: $\bar{s} = \frac{1}{n}\sum_{k=1}^{n} s_k$

(b) Weighted mean: $\sum_{k=1}^{n} w_k s_k$ with known weights $w_k$

(c) Geometric mean: $\left(\prod_{k=1}^{n} s_k\right)^{1/n}$ for $s_k > 0$

(d) Empirical second moment: $\frac{1}{n}\sum_{k=1}^{n} s_k^2$

(e) Maximum (approximate): $\max_k s_k$
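Each item reduces to a $(\varphi_k, \psi)$ pair. The sketch below (variable names and sample values are mine, with part (e) taken at a finite $p$) checks that $\psi$ applied to the sum of the $\varphi_k$ recovers, or for (e) approximates, each aggregate:

```python
import numpy as np

s = np.array([0.8, 1.5, 2.0, 0.6])
n = len(s)

# (a) arithmetic mean: phi(s) = s, psi(u) = u/n
a = s.sum() / n
# (b) weighted mean: phi_k(s) = w_k * s, psi(u) = u
w = np.array([0.1, 0.2, 0.3, 0.4])
b = (w * s).sum()
# (c) geometric mean: phi(s) = log s, psi(u) = exp(u/n)
c = np.exp(np.log(s).sum() / n)
# (d) second moment: phi(s) = s**2, psi(u) = u/n
d = (s ** 2).sum() / n
# (e) approximate max: phi(s) = s**p, psi(u) = u**(1/p); exact only as p -> inf
p = 20.0
e = (s ** p).sum() ** (1 / p)
```

The smooth-max in (e) overshoots the true max by at most a factor $n^{1/p}$, which tends to 1 as $p$ grows.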

Theorem: Kolmogorov–Arnold Representation

Every continuous function $f: [0,1]^n \to \mathbb{R}$ admits a nomographic representation: there exist continuous functions $\psi_j$ and $\varphi_{k,j}$ such that
$$f(s_1, \ldots, s_n) \;=\; \sum_{j=1}^{2n+1} \psi_j\!\left(\sum_{k=1}^{n} \varphi_{k,j}(s_k)\right).$$
In particular, $f$ decomposes as a finite sum of nomographic terms, each of which can be computed over the MAC in one channel use, so the whole computation takes at most $2n+1$ channel uses.


Theorem: Post-Processing Noise Amplification

Let the AirComp receiver estimate $u = \sum_k \varphi_k(s_k)$ with $\mathsf{MSE}_u = \sigma^2/\eta^2$ (Theorem 16.2.1), and let the final estimate be $\hat{f} = \psi(\hat{u})$. Assume $\psi$ is differentiable at the true $u^{\star} = \sum_k \varphi_k(s_k)$ with derivative $\psi'(u^{\star})$. For small noise (high SNR), the first-order MSE on $f$ is
$$\mathsf{MSE}_f \;\approx\; |\psi'(u^{\star})|^2 \cdot \mathsf{MSE}_u \;=\; \frac{|\psi'(u^{\star})|^2 \, \sigma^2}{\eta^2}.$$

Effective MSE for Common Nomographic Aggregates

Compare the end-to-end AirComp MSE $\mathsf{MSE}_f = |\psi'|^2 \, \sigma^2/\eta^2$ for three nomographic aggregates: the arithmetic mean ($\psi' = 1/n$), the geometric mean ($\psi' = \bar{g}/n$ at the true geometric mean $\bar{g}$), and the second moment with the $1/n$ folded into $\varphi_k$ ($\psi' = 1$). Sweep transmit power; observe how the choice of pre/post-processing shapes the MSE.

⚠️ Engineering Note

Designing $\varphi_k$ in Practice

Practical nomographic AirComp design:

  • Match dynamic range. If $\varphi_k(s_k)$ varies by orders of magnitude across users, the weak-user bottleneck (§16.2) worsens. Center and scale $\varphi_k$ to approximately match across users before applying power control.

  • Log pre-processing for multiplicative aggregates. Geometric mean, product, and likelihood aggregation all use $\varphi_k = \log$. Numerical care: clamp $s_k \geq \epsilon > 0$ to avoid $-\infty$.

  • Choose $\psi$ to be Lipschitz at the operating point. Smooth-max ($\varphi_k(s) = s^p$, $\psi(u) = u^{1/p}$) has $\psi'(u^{\star}) = u^{\star\,(1-p)/p}/p$, which is large when $u^{\star}$ is small and $p$ is large. The approximation sharpens the max but amplifies noise. Balance empirically.

  • Universality vs. tailoring. Kolmogorov's universal $\varphi$ works for every $f$ but is only Hölder-continuous — noise amplification is uncontrolled. Bespoke $\varphi_k$ for specific aggregates (FL gradient averaging, for example) is nearly always better in practice.

  • Dither for debiasing. For non-linear $\psi$, AirComp is biased (Jensen's gap). A small dither at the transmitters can reduce this bias at modest MSE cost.
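The smooth-max tradeoff above can be made concrete: as $p$ grows, the gap between $(\sum_k s_k^p)^{1/p}$ and the true max shrinks, while $|\psi'(u^{\star})|$ blows up when the inputs are below 1. A small sketch with illustrative values:

```python
import numpy as np

s = np.array([0.2, 0.3, 0.5])       # sub-unity inputs stress the smooth-max
gaps, amps = [], []
for p in [2, 5, 10, 20]:
    u_star = np.sum(s ** p)                      # inner sum with phi(s) = s^p
    gaps.append(u_star ** (1 / p) - s.max())     # approximation gap: shrinks with p
    amps.append(u_star ** ((1 - p) / p) / p)     # |psi'(u*)|: grows with p here
```

For these inputs the gap falls by several orders of magnitude between $p=2$ and $p=20$, while the noise amplification $|\psi'(u^{\star})|$ climbs from below 1 into the tens of thousands — the empirical balancing act the bullet describes.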


📋 Ref: Goldenbaum 2013; Yang et al. 2020

Common Mistake: AirComp of Non-Linear Aggregates Is Biased

Mistake:

Treat the post-processed estimate $\hat{f} = \psi(\hat{u})$ as unbiased when $\psi$ is non-linear.

Correction:

For non-linear $\psi$,
$$\mathbb{E}[\psi(\hat{u})] = \psi(u^{\star}) + \tfrac{1}{2}\psi''(u^{\star})\,\mathsf{MSE}_u + O(\mathsf{MSE}_u^{3/2})$$
(Jensen's second-order gap). The second term is the bias. For quadratic $\psi$ (e.g., variance estimation) the bias is exactly $\tfrac{1}{2}\psi''(u^{\star})\,\mathsf{MSE}_u$ and does not vanish as $n$ grows. Only linear post-processing ($\psi$ affine) produces strictly unbiased AirComp. Design around this: (i) dither with known statistics and debias analytically; (ii) use linear $\psi$ when possible; (iii) accept a small bias in exchange for noise-efficient non-linear AirComp.
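For quadratic $\psi$ the bias formula is exact and easy to verify numerically. A sketch with $\psi(u) = u^2$ (so $\psi'' = 2$ and the predicted bias equals $\mathsf{MSE}_u$ exactly); the specific $u^{\star}$ and $\mathsf{MSE}_u$ values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
u_star, mse_u = 3.0, 0.25                      # true sum and its estimation MSE
u_hat = u_star + np.sqrt(mse_u) * rng.standard_normal(1_000_000)

def psi(u):                                    # quadratic post-processing, psi'' = 2
    return u ** 2

bias = np.mean(psi(u_hat)) - psi(u_star)       # empirical Jensen gap
predicted = 0.5 * 2 * mse_u                    # (1/2) psi'' MSE_u = mse_u here
```

Note the bias depends only on $\mathsf{MSE}_u$, not on the number of users — averaging over more devices does not make it go away, only more transmit power (smaller $\mathsf{MSE}_u$) does.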

Key Takeaway

AirComp natively computes any nomographic function $f = \psi\left(\sum_k \varphi_k(s_k)\right)$ in one channel use. This spans means, weighted sums, geometric means, second moments, and smooth maxima. Kolmogorov–Arnold guarantees every continuous aggregate has a nomographic representation (possibly over multiple channel uses). Noise is amplified by $|\psi'|$ at the operating point; non-linear $\psi$ introduces an $O(\mathsf{MSE}_u)$ bias. The function class is wide; the engineering is in matching pre-processing to the aggregate.

Historical Note: From Kolmogorov's 13th Problem to the Wireless MAC

Kolmogorov's 1957 theorem on function superposition — originally a resolution of Hilbert's 13th problem on solving polynomial equations via compositions of continuous functions — lay dormant in pure mathematics for four decades. Nazer and Gastpar (2007) noticed that the wireless MAC's natural summation makes it a physical computer for the nomographic form, coining "computation over multiple-access channels." Goldenbaum (2013) made the connection explicit for engineering practice: every continuous aggregate decomposes into at most $2n+1$ nomographic terms, each computable in one AirComp channel use. The bridge between classical mathematics and wireless systems turned a 55-year-old existence theorem into a practical aggregation framework.


Quick Check

Which of the following aggregates can AirComp compute in a single channel use?

The median of $s_1, \ldots, s_n$.

The empirical variance $\frac{1}{n}\sum_k (s_k - \bar{s})^2$.

The geometric mean $(\prod_k s_k)^{1/n}$, $s_k > 0$.

Sorting $s_1, \ldots, s_n$ into increasing order.