Model-Based vs Data-Driven Design

Choosing the Right ML Paradigm for Wireless

The preceding sections presented three paradigms for applying machine learning to wireless communications:

  1. Black-box neural networks (Section 31.1): Learn mappings directly from data without incorporating domain knowledge.
  2. Deep unfolding / model-based ML (Section 31.2): Embed the structure of known algorithms into the network architecture.
  3. Reinforcement learning (Section 31.3): Learn policies through interaction without explicit supervision.

This section provides a systematic comparison of model-based and data-driven (black-box) approaches along four axes: sample efficiency, generalisation, interpretability, and computational cost. The goal is to equip the reader with design guidelines for selecting the appropriate paradigm for a given wireless problem.

The comparison is particularly important in wireless, where domain knowledge (channel models, signal processing theory, information-theoretic bounds) is rich and well-established. The central question is: how much domain knowledge should be injected into the ML model, and in what form?

Definition: The Model-Based to Data-Driven Spectrum

ML approaches for wireless can be placed on a continuum from fully model-based to fully data-driven:

$$\underbrace{\text{Classical algorithm}}_{\text{no learning}} \;\longrightarrow\; \underbrace{\text{Algorithm + learned hyperparams}}_{\text{minimal learning}} \;\longrightarrow\; \underbrace{\text{Deep unfolding}}_{\text{structured learning}} \;\longrightarrow\; \underbrace{\text{NN with domain features}}_{\text{informed learning}} \;\longrightarrow\; \underbrace{\text{Black-box NN}}_{\text{full learning}}$$

Each step to the right trades domain knowledge for flexibility:

  • More domain knowledge $\Rightarrow$ fewer parameters $\Rightarrow$ better sample efficiency and interpretability
  • More flexibility $\Rightarrow$ more parameters $\Rightarrow$ better potential performance under model mismatch, but a higher data requirement

The optimal operating point depends on:

  1. How accurate the domain model is
  2. How much training data is available
  3. The computational budget for training and inference
  4. Whether interpretability/certifiability is required

Sample Efficiency: The Key Advantage of Model-Based ML

Sample efficiency measures how many training examples are needed to achieve a target performance level. It is arguably the most important criterion in wireless, where labelled training data is expensive (requires over-the-air measurements or high-fidelity simulation).

Consider sparse channel estimation with $N = 32$ subcarriers, $M = 16$ measurements, and sparsity $s = 3$:

  • Black-box NN ($M \to 32 \to N$ network): $16 \times 32 + 32 + 32 \times 32 + 32 = 1600$ parameters. Needs $\sim$500--1000 training samples to converge.

  • LISTA (10 layers, only thresholds learned): 10 scalar thresholds, i.e. $10$ parameters. Needs $\sim$20--50 training samples (or even zero if initialised from ISTA and used without fine-tuning).

This $20\text{--}100\times$ advantage in sample efficiency arises because LISTA encodes the structure of the problem (the measurement matrix $\mathbf{A}$, the proximal operator) into the architecture. The black-box NN must learn this structure from data, requiring far more examples.

Rule of thumb: as a rough guide, the number of training samples should be at least $5\text{--}10\times$ the number of learnable parameters for good generalisation. Model-based approaches, by having fewer parameters, directly reduce the data requirement.
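To make the parameter-count contrast concrete, the following is a minimal NumPy sketch of a LISTA-style forward pass in which only the per-layer thresholds are learnable; the matrices are fixed functions of the known measurement matrix. The dimensions follow the example above, while the 10-layer depth and the initial threshold value are illustrative assumptions.

```python
import numpy as np

def soft_threshold(x, theta):
    """Proximal operator of the l1 norm: shrink each entry towards zero by theta."""
    return np.sign(x) * np.maximum(np.abs(x) - theta, 0.0)

def lista_forward(y, A, thetas):
    """Unfolded ISTA: one layer per entry of `thetas`.

    Only the per-layer thresholds are learnable; the weight matrices are
    fixed functions of the known measurement matrix A, as in the LISTA
    variant discussed above.
    """
    L = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of the gradient
    W = A.T / L                              # fixed input weight
    S = np.eye(A.shape[1]) - (A.T @ A) / L   # fixed recurrent weight
    x = np.zeros(A.shape[1])
    for theta in thetas:                     # one unfolded iteration per layer
        x = soft_threshold(S @ x + W @ y, theta)
    return x

rng = np.random.default_rng(0)
M, N, s = 16, 32, 3                          # dimensions from the example above
A = rng.standard_normal((M, N)) / np.sqrt(M)
x_true = np.zeros(N)
x_true[rng.choice(N, s, replace=False)] = rng.standard_normal(s)
y = A @ x_true

thetas = np.full(10, 0.05)                   # 10 layers -> 10 learnable scalars
x_hat = lista_forward(y, A, thetas)
print("learnable parameters:", thetas.size)  # 10, versus ~1600 for the black-box NN
```

Training would adjust only the 10 entries of `thetas` by backpropagation; initialising them from ISTA already gives a working estimator with zero training samples.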

Model-Based vs Black-Box Sample Efficiency

Compare the NMSE (normalised mean-squared error) of a model-based approach (LISTA with known measurement matrix) and a black-box neural network, as a function of the number of training samples. The model-based approach leverages the measurement matrix $\mathbf{A}$ to initialise its weights, achieving low NMSE even with few training samples. The black-box NN must learn everything from data and requires substantially more samples to match. Increase the SNR to see both methods improve (less noise), but the relative advantage of the model-based approach persists, especially in the low-data regime.
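The comparison in the demo can also be reproduced offline. In the sketch below, ISTA with the known matrix stands in for the model-based approach (it needs no training at all), and a ridge-regularised linear map fitted on only 20 pairs is a deliberately simple proxy for a black-box network in the few-shot regime; the dimensions, SNR, and regularisation values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
M, N, s = 16, 32, 3
n_train, n_test = 20, 200                     # few-shot training regime
snr_db = 15

A = rng.standard_normal((M, N)) / np.sqrt(M)  # known measurement matrix

def sample_batch(n):
    """Draw sparse channels and noisy measurements y = A x + w."""
    X = np.zeros((N, n))
    for i in range(n):
        X[rng.choice(N, s, replace=False), i] = rng.standard_normal(s)
    Y = A @ X
    noise_std = np.sqrt(np.mean(Y**2) * 10 ** (-snr_db / 10))
    return X, Y + noise_std * rng.standard_normal(Y.shape)

def ista(y, n_iter=200, lam=0.1):
    """Model-based baseline: uses the known A, needs no training data."""
    L = np.linalg.norm(A, 2) ** 2
    x = np.zeros(N)
    for _ in range(n_iter):
        x = x + (A.T @ (y - A @ x)) / L                    # gradient step
        x = np.sign(x) * np.maximum(np.abs(x) - lam / L, 0.0)  # shrinkage
    return x

# "Black-box" stand-in: a ridge-regularised linear map y -> x fitted
# purely from the 20 training pairs.
X_tr, Y_tr = sample_batch(n_train)
W = X_tr @ Y_tr.T @ np.linalg.inv(Y_tr @ Y_tr.T + 1e-3 * np.eye(M))

X_te, Y_te = sample_batch(n_test)
X_ista = np.column_stack([ista(Y_te[:, i]) for i in range(n_test)])
X_bb = W @ Y_te

nmse = lambda Xh: np.sum((Xh - X_te) ** 2) / np.sum(X_te ** 2)
print(f"model-based NMSE: {nmse(X_ista):.3f}, black-box NMSE: {nmse(X_bb):.3f}")
```

With 20 training samples the fitted linear map cannot exploit sparsity and generalises poorly, while ISTA, which encodes $\mathbf{A}$ and the sparsity prior, needs none.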


Generalisation: Robustness to Distribution Shift

Generalisation refers to the ability to perform well on data that differs from the training distribution. In wireless, distribution shifts are the rule rather than the exception:

  • A model trained in urban macrocell channels is deployed in an indoor small cell.
  • A model trained at 3.5 GHz is used at 28 GHz.
  • A model trained with 4 users encounters 8 users.

Model-based approaches generalise better because their inductive bias captures physical invariants that hold across distributions:

  • LISTA inherits the convergence structure of ISTA for any measurement matrix: even with learned thresholds, each layer remains a valid proximal-gradient step.
  • An unfolded WMMSE algorithm preserves the alternating optimisation structure that guarantees monotone sum-rate improvement.

Black-box NNs, in contrast, may memorise training-distribution-specific patterns that fail under shift. Empirically, deep unfolding methods degrade gracefully (e.g., a 2--5 dB NMSE increase) under moderate distribution shift, while black-box networks can fail catastrophically (10--20 dB degradation or worse).

Mitigation strategies for black-box models:

  • Domain randomisation: Train on a diverse mixture of channel models, SNR levels, and system configurations.
  • Meta-learning: Train a model that can quickly adapt to new distributions with a few gradient steps (MAML, Reptile).
  • Online fine-tuning: Continuously update the model on incoming data at deployment.
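Domain randomisation, the first strategy above, amounts to drawing each training example from a randomised mixture of channel models and operating conditions. A minimal sketch follows; the Rayleigh/Rician mixture, tap counts, K-factor, and SNR range are all illustrative assumptions, not a standardised recipe.

```python
import numpy as np

rng = np.random.default_rng(0)

def rayleigh(n_tap):
    """Rich NLOS scattering: i.i.d. complex Gaussian taps."""
    return (rng.standard_normal(n_tap) + 1j * rng.standard_normal(n_tap)) / np.sqrt(2 * n_tap)

def rician(n_tap, k_db=10.0):
    """Strong LOS component plus scattering (K-factor in dB)."""
    k = 10 ** (k_db / 10)
    los = np.zeros(n_tap, dtype=complex)
    los[0] = 1.0
    return np.sqrt(k / (k + 1)) * los + np.sqrt(1 / (k + 1)) * rayleigh(n_tap)

def randomized_sample():
    """One training example with randomised channel model, delay spread, and SNR."""
    n_tap = int(rng.integers(2, 9))                   # random delay spread (2..8 taps)
    h = rayleigh(n_tap) if rng.random() < 0.5 else rician(n_tap)
    snr_db = rng.uniform(0, 30)                       # random operating SNR
    x = rng.choice(np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2), 64)  # QPSK
    y = np.convolve(x, h)[: len(x)]
    noise_std = np.sqrt(10 ** (-snr_db / 10) / 2)
    y = y + noise_std * (rng.standard_normal(len(y)) + 1j * rng.standard_normal(len(y)))
    return x, y, h

# A model trained on this mixture sees many environments, not one.
dataset = [randomized_sample() for _ in range(1000)]
```

The same idea extends to randomising user counts, carrier frequencies, and antenna geometries whenever the simulator supports them.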

Systematic Comparison

The following table summarises the key trade-offs:

Criterion | Model-Based (Deep Unfolding) | Data-Driven (Black-Box NN)
Sample efficiency | Excellent (few parameters to learn) | Poor (many parameters)
Generalisation | Good (physics-based inductive bias) | Variable (depends on training diversity)
Performance ceiling | Limited by algorithm structure | Potentially higher (more flexible)
Interpretability | High (each layer = one algorithm iteration) | Low (opaque mapping)
Design effort | High (must derive the unfolded algorithm) | Low (standard NN architecture)
Inference speed | Fast ($L$ layers, structured operations) | Variable (depends on architecture)
Adaptability | Moderate (re-train thresholds) | High (re-train entire network)
Theoretical analysis | Easier (convergence guarantees exist) | Harder (few guarantees)

When to use model-based ML:

  • The underlying algorithm is known and performs reasonably well
  • Training data is scarce ($< 1000$ labelled samples)
  • Interpretability or safety certification is required
  • The deployment environment may differ from training

When to use black-box NN:

  • No good algorithm exists for the problem
  • Abundant training data is available ($> 10\,000$ samples)
  • The channel model is too complex for closed-form treatment
  • Maximum performance is the priority over interpretability
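The two checklists can be condensed into a rough decision heuristic. The function below is an illustrative aid only; the scoring scheme and the 1000-sample threshold come from the guidelines above, while the exact weighting is an assumption.

```python
def recommend_paradigm(n_samples: int,
                       algorithm_known: bool,
                       needs_interpretability: bool,
                       expect_distribution_shift: bool) -> str:
    """Heuristic encoding of the 'when to use' checklists above.

    Positive score -> the model-based criteria dominate; otherwise the
    black-box criteria do. The weights are illustrative, not prescriptive.
    """
    score = 0
    score += 1 if algorithm_known else -1          # a good algorithm exists?
    score += 1 if n_samples < 1000 else -1         # scarce vs abundant data
    score += 1 if needs_interpretability else 0    # certification required?
    score += 1 if expect_distribution_shift else 0 # deployment may differ
    return "model-based (deep unfolding)" if score > 0 else "black-box NN"

# Example: 200 samples, known algorithm, expected shift -> model-based.
print(recommend_paradigm(200, True, False, True))
```

In practice such a rule only frames the discussion; the hybrid designs below show that the choice is rarely all-or-nothing.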

Example: Choosing an ML Approach for MIMO Detection

A $16 \times 16$ MIMO system with 64-QAM modulation requires a detector. The classical MMSE detector achieves acceptable BER at high SNR but degrades at low SNR and under channel estimation errors. The engineering team has access to 1000 labelled training samples (pairs of transmitted symbols and received signals) and requires the detector to work across a range of channel conditions.

Recommend an ML approach and justify your choice.

Hybrid Approaches: The Best of Both Worlds

In practice, the most successful wireless ML systems are hybrid, combining model-based and data-driven components:

  1. Model-based backbone + NN refinement: Use a classical algorithm (MMSE, OFDM, LDPC decoder) as the primary processing pipeline and attach a small NN to correct residual errors (e.g., non-linear PA distortion, imperfect CSI). This is robust (the classical algorithm handles the bulk of the processing) and data-efficient (the NN only needs to learn the residual).

  2. NN feature extraction + model-based decision: Use a CNN or transformer to extract features from raw I/Q samples, then feed these features into a model-based algorithm (e.g., beamforming based on extracted AOA/AOD). The NN handles the hard perception task; the algorithm handles the structured optimisation.

  3. Learned hyperparameters: Keep the algorithm fixed but learn its hyperparameters (step sizes, regularisation weights, constellation scaling) from data. This is the lightest-weight ML integration and often provides surprisingly large gains.
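The first hybrid pattern (model-based backbone plus learned residual refinement) can be sketched as follows. The toy tanh compression stands in for PA distortion, and a least-squares linear corrector stands in for the small refinement NN; the channel, dimensions, and non-linearity are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
M, N, n_train = 24, 16, 500

A = rng.standard_normal((M, N)) / np.sqrt(M)   # known linear part of the channel

def nonlinear_channel(x):
    """Toy system: linear channel followed by mild PA-style compression."""
    y = A @ x
    return np.tanh(1.5 * y) / 1.5              # hypothetical non-linearity

X = rng.standard_normal((N, n_train))
Y = np.apply_along_axis(nonlinear_channel, 0, X)

# Stage 1 -- model-based backbone: pseudo-inverse of the *known* linear
# part handles the bulk of the estimation.
A_pinv = np.linalg.pinv(A)
X_backbone = A_pinv @ Y

# Stage 2 -- learned refinement: fit a ridge-regularised linear corrector
# on the residual the backbone leaves behind (a stand-in for a small NN).
R = X - X_backbone
C = R @ X_backbone.T @ np.linalg.inv(X_backbone @ X_backbone.T + 1e-6 * np.eye(N))
X_hybrid = X_backbone + C @ X_backbone

mse = lambda Xh: np.mean((Xh - X) ** 2)
print(f"backbone MSE: {mse(X_backbone):.4f}, hybrid MSE: {mse(X_hybrid):.4f}")
```

The corrector only has to model what the backbone misses (here, the residual distortion), which is exactly why this pattern is data-efficient.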

The overarching principle is: inject as much domain knowledge as possible into the architecture, and let the data fill in what the model misses.

Open Challenges in ML for Wireless

Despite the rapid progress, several fundamental challenges remain open:

1. Standardisation and deployment. Unlike computer vision (where ImageNet and ResNet are universal benchmarks), wireless ML lacks standardised datasets, channel models, and evaluation protocols. The O-RAN Alliance is working toward open interfaces that enable ML-based RAN intelligent controllers (RICs), but deployment in commercial networks remains limited.

2. Real-time constraints. Physical-layer processing must complete within microseconds (OFDM symbol duration of approximately 70 μs in 5G). NN inference on specialised hardware (FPGAs, ASICs) can meet this, but training and adaptation at such timescales is an open problem.
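A back-of-envelope budget makes the constraint concrete. The accelerator throughput and network size below are assumed values for illustration, not measurements of any particular device.

```python
# Operation budget within one OFDM symbol, under assumed hardware numbers.
symbol_us = 70                    # symbol duration from the text (~70 us)
accel_tops = 1.0                  # hypothetical accelerator: 10^12 ops/s
budget_ops = symbol_us * 1e-6 * accel_tops * 1e12   # ops available per symbol

# A small 3-layer MLP (256 -> 512 -> 512 -> 256), counting multiply-accumulates:
macs = 256 * 512 + 512 * 512 + 512 * 256            # ~0.5 M MACs
print(f"budget per symbol: {budget_ops:.0f} ops, network needs ~{2 * macs} ops")
print("fits within budget:", 2 * macs < budget_ops)
```

Inference of a small network fits comfortably; it is gradient-based training and adaptation at the same timescale that remains out of reach.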

3. Safety and robustness. Adversarial examples can fool NN-based detectors and decoders. Providing formal guarantees (e.g., worst-case BER bounds) for ML-based systems is largely unsolved.

4. Transfer across systems. A model trained for one carrier frequency, antenna configuration, or environment must adapt to new conditions. Meta-learning and few-shot adaptation are promising but not yet mature for wireless.

5. Energy efficiency. Training large ML models has a significant carbon footprint. Developing green AI methods that achieve good performance with minimal computation is critical for sustainable deployment.

Model-Based vs Data-Driven ML for Wireless

Criterion | Model-Based (Deep Unfolding) | Data-Driven (Black-Box NN)
Sample efficiency | Excellent (10--100 parameters) | Poor (1000+ parameters)
Generalisation | Good (physics-based bias) | Variable (depends on training)
Performance ceiling | Limited by algorithm | Potentially higher
Interpretability | High (layer = iteration) | Low (opaque)
Design effort | High (derive unfolded form) | Low (standard architecture)
Inference speed | Fast (structured ops) | Variable
Theoretical analysis | Convergence guarantees | Few guarantees
Best when | Scarce data, known algorithm | Abundant data, complex model

Key Takeaway

The central principle for ML in wireless is: inject as much domain knowledge as possible into the architecture, and let the data fill in what the model misses. Deep unfolding achieves $10\text{--}100\times$ better sample efficiency than black-box networks by encoding algorithm structure as an inductive bias. In the data-scarce regime that characterises wireless (labelled samples require over-the-air measurements or high-fidelity simulation), model-based ML is almost always the right starting point. Black-box networks should be reserved for problems where no suitable algorithm exists or the channel model is too complex for analytical treatment.

Why This Matters: Secure and Distributed Computing in the SC Book

The federated learning and secure aggregation techniques in this chapter connect to the broader theory of secure and distributed computing developed in the SC (Secure Computing) book:

  • Secure aggregation protocols: MPC-based approaches, Shamir's secret sharing, and the ByzSecAgg framework (CommIT contribution: Jahani-Nezhad, Maddah-Ali, Caire)
  • Differential privacy: Formal privacy guarantees and the privacy-utility trade-off in gradient sharing
  • Byzantine fault tolerance: Detecting and mitigating adversarial client updates in distributed training
  • Over-the-air computation: Using the wireless MAC channel for native gradient aggregation

Readers interested in the theoretical foundations of privacy-preserving distributed learning should consult the SC book.

Quick Check

A research team develops a deep unfolding network for sparse channel estimation. The network has 15 learnable thresholds (one per layer) and is initialised with the known measurement matrix. A competing team trains a fully connected black-box network with 5000 parameters from random initialisation. Both teams have access to 200 training channel realisations. Which outcome is most likely?

  • The black-box network significantly outperforms deep unfolding because it has more parameters and thus more capacity.
  • Both networks perform identically because 200 samples is sufficient for either approach.
  • The deep unfolding network outperforms the black-box network because 200 samples provides ample data for 15 parameters but severely under-trains 5000 parameters.
  • Neither network will work because 200 samples is always insufficient for ML.