Model-Based vs Data-Driven Design
Choosing the Right ML Paradigm for Wireless
The preceding sections presented three paradigms for applying machine learning to wireless communications:
- Black-box neural networks (Section 31.1): Learn mappings directly from data without incorporating domain knowledge.
- Deep unfolding / model-based ML (Section 31.2): Embed the structure of known algorithms into the network architecture.
- Reinforcement learning (Section 31.3): Learn policies through interaction without explicit supervision.
This section provides a systematic comparison of model-based and data-driven (black-box) approaches along four axes: sample efficiency, generalisation, interpretability, and computational cost. The goal is to equip the reader with design guidelines for selecting the appropriate paradigm for a given wireless problem.
The comparison is particularly important in wireless, where domain knowledge (channel models, signal processing theory, information-theoretic bounds) is rich and well-established. The central question is: how much domain knowledge should be injected into the ML model, and in what form?
Definition: The Model-Based to Data-Driven Spectrum
ML approaches for wireless can be placed on a continuum from fully model-based to fully data-driven: classical algorithm (no learning) → learned hyperparameters → deep unfolding → black-box neural network.
Each step to the right trades domain knowledge for flexibility:
- More domain knowledge → fewer parameters → better sample efficiency and interpretability
- More flexibility → more parameters → better potential performance under model mismatch, but higher data requirements
The optimal operating point depends on:
- How accurate the domain model is
- How much training data is available
- The computational budget for training and inference
- Whether interpretability/certifiability is required
Sample Efficiency: The Key Advantage of Model-Based ML
Sample efficiency measures how many training examples are needed to achieve a target performance level. It is arguably the most important criterion in wireless, where labelled training data is expensive (requires over-the-air measurements or high-fidelity simulation).
Consider sparse channel estimation with N subcarriers, M < N measurements, and sparsity K:
- Black-box NN (fully connected network): thousands of parameters. Needs 500--1000 training samples to converge.
- LISTA (10 layers, only thresholds learned): 10 scalar threshold parameters. Needs 20--50 training samples (or even 0 if initialised from ISTA and used without fine-tuning).
This advantage in sample efficiency arises because LISTA encodes the structure of the problem (the measurement matrix and the proximal operator) into the architecture. The black-box NN must learn this structure from data, requiring far more examples.
Rule of thumb: As a rough guide, the number of training samples should be at least five times the number of learnable parameters for good generalisation. Model-based approaches, by having fewer parameters, directly reduce the data requirement.
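To make the parameter-count argument concrete, here is a minimal NumPy sketch of an unfolded ISTA in which only the per-layer thresholds would be learned. The function names, matrix sizes, and threshold values are illustrative, not taken from the text:

```python
import numpy as np

def soft_threshold(v, theta):
    """Proximal operator of the l1 norm: the ISTA shrinkage step."""
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def unfolded_ista(y, A, thetas):
    """Run one ISTA iteration per entry of `thetas`.

    In a LISTA-style network, only these scalar thresholds are learned;
    the measurement matrix A is baked into the architecture.
    """
    L = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of the data-fit gradient
    x = np.zeros(A.shape[1])
    for theta in thetas:                 # each loop pass = one network layer
        x = soft_threshold(x + A.T @ (y - A @ x) / L, theta)
    return x
```

Ten layers means ten learnable scalars; a fully connected network mapping y to x would need orders of magnitude more parameters to represent the same computation.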
Model-Based vs Black-Box Sample Efficiency
Compare the NMSE (normalised mean-squared error) of a model-based approach (LISTA with known measurement matrix) and a black-box neural network, as a function of the number of training samples. The model-based approach leverages the measurement matrix to initialise its weights, achieving low NMSE even with few training samples. The black-box NN must learn everything from data and requires substantially more samples to match. Increase the SNR to see both methods improve (less noise), but the relative advantage of the model-based approach persists, especially in the low-data regime.
Generalisation: Robustness to Distribution Shift
Generalisation refers to the ability to perform well on data that differs from the training distribution. In wireless, distribution shifts are the rule rather than the exception:
- A model trained in urban macrocell channels is deployed in an indoor small cell.
- A model trained at 3.5 GHz is used at 28 GHz.
- A model trained with 4 users encounters 8 users.
Model-based approaches generalise better because their inductive bias captures physical invariants that hold across distributions:
- LISTA inherits the ISTA convergence guarantee for any measurement matrix, so even with learned thresholds it remains a valid proximal operator.
- An unfolded WMMSE algorithm preserves the alternating optimisation structure that guarantees monotone sum-rate improvement.
Black-box NNs, in contrast, may memorise training-distribution-specific patterns that fail under shift. Empirically, deep unfolding methods degrade gracefully (e.g., 2--5 dB NMSE increase) under moderate distribution shift, while black-box networks can fail catastrophically (10--20 dB degradation or worse).
Mitigation strategies for black-box models:
- Domain randomisation: Train on a diverse mixture of channel models, SNR levels, and system configurations.
- Meta-learning: Train a model that can quickly adapt to new distributions with a few gradient steps (MAML, Reptile).
- Online fine-tuning: Continuously update the model on incoming data at deployment.
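The first of these strategies, domain randomisation, can be sketched in a few lines. The SNR range, tap counts, and flat-fading model below are illustrative placeholders, not values from the text:

```python
import numpy as np

def randomised_example(rng):
    """Draw one training example from a randomised mixture of conditions
    (domain randomisation): random SNR and random channel richness."""
    snr_db = rng.uniform(0.0, 30.0)            # randomise operating SNR
    n_taps = int(rng.integers(1, 9))           # randomise delay spread
    h = (rng.normal(size=n_taps) + 1j * rng.normal(size=n_taps)) / np.sqrt(2 * n_taps)
    x = rng.choice(np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j])) / np.sqrt(2)  # QPSK
    noise = 10 ** (-snr_db / 20) * (rng.normal() + 1j * rng.normal()) / np.sqrt(2)
    y = h.sum() * x + noise                    # flat-fading observation
    return y, x

rng = np.random.default_rng(0)
dataset = [randomised_example(rng) for _ in range(1000)]
```

A network trained on such a mixture sees many operating points during training, which reduces (but does not eliminate) its sensitivity to the distribution shifts listed above.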
Systematic Comparison
The following table summarises the key trade-offs:
| Criterion | Model-Based (Deep Unfolding) | Data-Driven (Black-Box NN) |
|---|---|---|
| Sample efficiency | Excellent (few parameters to learn) | Poor (many parameters) |
| Generalisation | Good (physics-based inductive bias) | Variable (depends on training diversity) |
| Performance ceiling | Limited by algorithm structure | Potentially higher (more flexible) |
| Interpretability | High (each layer = one algorithm iteration) | Low (opaque mapping) |
| Design effort | High (must derive the unfolded algorithm) | Low (standard NN architecture) |
| Inference speed | Fast (few layers, structured operations) | Variable (depends on architecture) |
| Adaptability | Moderate (re-train thresholds) | High (re-train entire network) |
| Theoretical analysis | Easier (convergence guarantees exist) | Harder (few guarantees) |
When to use model-based ML:
- The underlying algorithm is known and performs reasonably well
- Training data is scarce (tens to hundreds of labelled samples)
- Interpretability or safety certification is required
- The deployment environment may differ from training
When to use black-box NN:
- No good algorithm exists for the problem
- Abundant training data is available (thousands of samples or more)
- The channel model is too complex for closed-form treatment
- Maximum performance is the priority over interpretability
Example: Choosing an ML Approach for MIMO Detection
A MIMO system with 64-QAM modulation requires a detector. The classical MMSE detector achieves acceptable BER at high SNR but degrades at low SNR and under channel estimation errors. The engineering team has access to 1000 labelled training samples (transmitted symbol, received signal pairs) and requires the detector to work across a range of channel conditions.
Recommend an ML approach and justify your choice.
Assess the problem characteristics
- Known algorithm exists: MMSE detection, x̂ = (H^H H + σ² I)^{-1} H^H y. Also iterative detectors: gradient descent projected onto the constellation (projected gradient descent, PGD).
- Training data: 1000 samples --- moderate.
- Deployment: Multiple channel conditions --- robustness is important.
- Dimensions: many antennas and a large constellation --- a high-dimensional discrete output.
Evaluate options
- Black-box NN (fully connected, operating per real dimension): 12,000 parameters. With 1000 training samples and a rule-of-thumb of five samples per parameter, this is data-starved. Likely to overfit.
- Deep unfolding of PGD ("DetNet"): Unfold 10 iterations of projected gradient descent. Per-layer parameters: a step size and a learned perturbation. Total: 200 parameters. Well within the data budget.
- MMSE + learned residual: Use MMSE as a first stage, then train a small NN to correct the residual. Roughly 500 parameters. Also feasible.
Recommendation
Deep unfolding (DetNet) is the recommended approach:
- Sample efficiency: 200 parameters vs 1000 samples gives a comfortable 1:5 parameter-to-sample ratio.
- Generalisation: The PGD structure ensures that the detector always moves toward lower detection cost, providing a safety net under distribution shift.
- Interpretability: Each layer has a clear meaning (one gradient descent step + projection).
- Performance: Published results show that a 10-layer DetNet approaches maximum-likelihood (ML) detection performance at a fraction of the ML detector's complexity.
The black-box NN would be preferred only if the channel model were highly non-standard (e.g., severe hardware impairments that invalidate the linear model y = Hx + n) and abundant data were available.
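A minimal sketch of the unfolded-PGD idea follows. It uses fixed step sizes and a hard nearest-symbol projection in place of DetNet's learned step sizes and soft projection, so it is an illustration of the structure rather than a faithful DetNet implementation:

```python
import numpy as np

def unfolded_pgd_detect(y, H, step_sizes, constellation):
    """One projected-gradient step per entry of `step_sizes`.

    In a DetNet-style detector the step sizes (and a learned perturbation)
    are trained from data; here they are fixed for illustration, and a hard
    nearest-symbol projection replaces the learned soft projection.
    """
    x = np.zeros(H.shape[1])
    for mu in step_sizes:
        x = x + mu * H.T @ (y - H @ x)    # gradient step on ||y - Hx||^2
        # project each coordinate onto the nearest constellation point
        x = constellation[np.argmin(np.abs(x[:, None] - constellation[None, :]), axis=1)]
    return x
```

With real BPSK symbols and a well-conditioned channel this already detects correctly in the high-SNR limit; the learned components earn their keep in the hard low-SNR and ill-conditioned cases.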
Hybrid Approaches: The Best of Both Worlds
In practice, the most successful wireless ML systems are hybrid, combining model-based and data-driven components:
- Model-based backbone + NN refinement: Use a classical algorithm (MMSE, OFDM, LDPC decoder) as the primary processing pipeline and attach a small NN to correct residual errors (e.g., non-linear PA distortion, imperfect CSI). This is robust (the classical algorithm handles the bulk of the processing) and data-efficient (the NN only needs to learn the residual).
- NN feature extraction + model-based decision: Use a CNN or transformer to extract features from raw I/Q samples, then feed these features into a model-based algorithm (e.g., beamforming based on extracted AoA/AoD). The NN handles the hard perception task; the algorithm handles the structured optimisation.
- Learned hyperparameters: Keep the algorithm fixed but learn its hyperparameters (step sizes, regularisation weights, constellation scaling) from data. This is the lightest-weight ML integration and often provides surprisingly large gains.
The overarching principle is: inject as much domain knowledge as possible into the architecture, and let the data fill in what the model misses.
Open Challenges in ML for Wireless
Despite the rapid progress, several fundamental challenges remain open:
1. Standardisation and deployment. Unlike computer vision (where ImageNet and ResNet are universal benchmarks), wireless ML lacks standardised datasets, channel models, and evaluation protocols. The O-RAN Alliance is working toward open interfaces that enable ML-based RAN intelligent controllers (RICs), but deployment in commercial networks remains limited.
2. Real-time constraints. Physical-layer processing must complete within microseconds (OFDM symbol duration ≈ 70 µs in 5G). NN inference on specialised hardware (FPGAs, ASICs) can meet this, but training and adaptation at such timescales is an open problem.
3. Safety and robustness. Adversarial examples can fool NN-based detectors and decoders. Providing formal guarantees (e.g., worst-case BER bounds) for ML-based systems is largely unsolved.
4. Transfer across systems. A model trained for one carrier frequency, antenna configuration, or environment must adapt to new conditions. Meta-learning and few-shot adaptation are promising but not yet mature for wireless.
5. Energy efficiency. Training large ML models has a significant carbon footprint. Developing green AI methods that achieve good performance with minimal computation is critical for sustainable deployment.
Model-Based vs Data-Driven ML for Wireless
| Criterion | Model-Based (Deep Unfolding) | Data-Driven (Black-Box NN) |
|---|---|---|
| Sample efficiency | Excellent (10-100 parameters) | Poor (1000+ parameters) |
| Generalisation | Good (physics-based bias) | Variable (depends on training) |
| Performance ceiling | Limited by algorithm | Potentially higher |
| Interpretability | High (layer = iteration) | Low (opaque) |
| Design effort | High (derive unfolded form) | Low (standard architecture) |
| Inference speed | Fast (structured ops) | Variable |
| Theoretical analysis | Convergence guarantees | Few guarantees |
| Best when | Scarce data, known algorithm | Abundant data, complex model |
Key Takeaway
The central principle for ML in wireless is: inject as much domain knowledge as possible into the architecture, and let the data fill in what the model misses. Deep unfolding achieves 10--100× better sample efficiency than black-box networks by encoding algorithm structure as an inductive bias. In the data-scarce regime that characterises wireless (labelled samples cost over-the-air measurements or high-fidelity simulation), model-based ML is almost always the right starting point. Black-box networks should be reserved for problems where no suitable algorithm exists or the channel model is too complex for analytical treatment.
Why This Matters: Secure and Distributed Computing in the SC Book
The federated learning and secure aggregation techniques in this chapter connect to the broader theory of secure and distributed computing developed in the SC (Secure Computing) book:
- Secure aggregation protocols: MPC-based approaches, Shamir's secret sharing, and the ByzSecAgg framework (CommIT contribution: Jahani-Nezhad, Maddah-Ali, Caire)
- Differential privacy: Formal privacy guarantees and the privacy-utility trade-off in gradient sharing
- Byzantine fault tolerance: Detecting and mitigating adversarial client updates in distributed training
- Over-the-air computation: Using the wireless MAC channel for native gradient aggregation
Readers interested in the theoretical foundations of privacy-preserving distributed learning should consult the SC book.
Quick Check
A research team develops a deep unfolding network for sparse channel estimation. The network has 15 learnable thresholds (one per layer) and is initialised with the known measurement matrix. A competing team trains a fully connected black-box network with 5000 parameters from random initialisation. Both teams have access to 200 training channel realisations. Which outcome is most likely?
The black-box network significantly outperforms deep unfolding because it has more parameters and thus more capacity
Both networks perform identically because 200 samples is sufficient for either approach
The deep unfolding network outperforms the black-box because 200 samples provides ample data for 15 parameters but severely under-trains 5000 parameters
Neither network will work because 200 samples is always insufficient for ML
With only 200 training samples, the black-box network (5000 parameters) has a parameter-to-sample ratio of 25:1, far above the recommended 1:5 ratio. It will likely overfit. The deep unfolding network (15 parameters) has a ratio of 1:13, well within the safe regime. Moreover, its physics-informed initialisation means it starts close to the optimum and needs minimal fine-tuning.
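The ratios quoted above are easy to verify directly:

```python
params_blackbox = 5000   # fully connected black-box network
params_unfolded = 15     # learnable thresholds in the deep unfolding network
n_samples = 200          # training channel realisations available

# parameter-to-sample ratio of the black-box network: 25 parameters per sample,
# far above the recommended 1:5 (at least five samples per parameter)
ratio_blackbox = params_blackbox / n_samples

# the unfolded network instead has about 13 samples per parameter
ratio_unfolded = n_samples / params_unfolded
```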