Training and Deployment Considerations
Making ML Work in Production
The theoretical sections of this chapter showed that ML receivers can match or beat their classical counterparts. The practical question is how to actually train, deploy, and maintain them. Training data, infrastructure, drift, privacy, and adversarial robustness are all real challenges. This section discusses them end-to-end and offers deployment patterns for ML-based 6G OTFS receivers.
Definition: Training Data for ML OTFS
Training data for ML OTFS receivers comes in three categories:
Simulated: synthetic channels from 3GPP models (Urban Micro, Rural Macro, HST). Cheap, diverse, controllable; effectively unlimited samples can be generated in a few hours.
Emulated: channel emulator (hardware or high-fidelity software) reproduces real-world conditions. Intermediate fidelity.
Real-world: measured channels from BS/UE logs. Most authentic but expensive and privacy-constrained; sample counts are typically orders of magnitude smaller than simulation.
Best practice: train on simulated data for bulk coverage, fine-tune on emulated data for site-specific conditions, and adapt online on real-world data for final tuning. A typical mix: 80% simulation, 15% emulation, 5% real-world.
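A minimal sketch of the 80/15/5 mix above, assuming hypothetical loader functions (`load_simulated`, `load_emulated`, `load_real`) that stand in for a 3GPP channel simulator, an emulator capture, and anonymized field logs:

```python
import numpy as np

# Hypothetical loaders: each returns (inputs, labels) of the requested size.
# Real versions would wrap a 3GPP channel simulator, an emulator capture,
# and anonymized field logs respectively.
def load_simulated(n): return np.random.randn(n, 64), np.random.randint(0, 4, n)
def load_emulated(n):  return np.random.randn(n, 64), np.random.randint(0, 4, n)
def load_real(n):      return np.random.randn(n, 64), np.random.randint(0, 4, n)

def build_training_set(total=100_000, mix=(0.80, 0.15, 0.05)):
    """Assemble one training set from the three sources in the stated ratio."""
    n_sim, n_emu, n_real = (int(total * f) for f in mix)
    parts = [load_simulated(n_sim), load_emulated(n_emu), load_real(n_real)]
    x = np.concatenate([p[0] for p in parts])
    y = np.concatenate([p[1] for p in parts])
    perm = np.random.permutation(len(x))  # shuffle so each batch mixes sources
    return x[perm], y[perm]

x_train, y_train = build_training_set()
```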
Theorem: Simulation-to-Real Gap
A NN trained on pure simulation typically shows a BER degradation of several dB when deployed on real channels (the "sim-to-real gap"). The gap arises from:
- Hardware non-idealities not captured in simulation.
- Fractional Doppler structure that differs from idealized models.
- Interference patterns from real co-deployed services.
Mitigation:
- Domain randomization: train on wide parameter distributions (not fixed values). Substantially narrows the gap.
- Domain adaptation: fine-tune on small real-world data. Recovers most of the gap.
- Adversarial training: inject synthetic perturbations. Improves OOD robustness.
Total: with these mitigations, the sim-to-real gap drops to roughly 0.5-1 dB, which is acceptable for deployment.
Simulation is the friend of ML training (infinite data, perfect labels), but a naive simulator produces non-representative data. Good practice: make the simulator as realistic as feasible (include HW imperfections, non-Gaussian noise, band-specific behaviors), then fine-tune on real-world data. The sim-to-real gap is a standard, well-studied ML deployment challenge.
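As a sketch, domain randomization amounts to drawing channel parameters from wide distributions instead of fixing them; all ranges below are illustrative assumptions, not 3GPP values:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_random_channel():
    """Draw one randomized delay-Doppler channel realization.
    Every range here is an illustrative placeholder."""
    n_paths = rng.integers(1, 7)                      # random multipath count
    return dict(
        delays_ns=rng.uniform(0, 1000, n_paths),      # wide delay spread
        doppler_hz=rng.uniform(-2000, 2000, n_paths), # includes fractional Doppler
        gains_db=rng.uniform(-20, 0, n_paths),        # per-path power
        iq_imbalance_db=rng.uniform(0, 0.5),          # HW non-ideality
        pa_clip_db=rng.uniform(3, 10),                # PA back-off / clipping
        snr_db=rng.uniform(0, 30),
    )

# Each training example gets a fresh draw, so the NN never overfits one channel.
channel = sample_random_channel()
```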
Distribution shift
Training distribution $p_{\text{train}}$ ≠ deployment distribution $p_{\text{deploy}}$. NN performance on $p_{\text{deploy}}$ is bounded in terms of the Wasserstein distance $W(p_{\text{train}}, p_{\text{deploy}})$.
Simulation weakness
Standard 3GPP models miss: HW effects (IQ imbalance, PA clipping), real spectra, interference statistics.
Mitigation effects
Domain randomization broadens $p_{\text{train}}$; domain adaptation shifts $p_{\text{train}}$ toward $p_{\text{deploy}}$.
Remaining gap
0.5-1 dB is the intrinsic sim-to-real gap for well-designed systems. Accept it and plan for it.
Definition: Federated Learning for OTFS
Federated learning (FL) trains ML models without centralizing training data:
- Each UE/BS trains locally on its own channel observations.
- Periodically, local model updates (gradients or weights) are aggregated at a server.
- Server averages updates and distributes the new global model.
- Cycle repeats.
Advantages:
- Privacy: UE data never leaves the UE.
- Efficiency: no centralized data collection.
- Personalization: each UE can fine-tune the global model locally.
Disadvantages:
- Communication overhead for model updates.
- Convergence slower than centralized.
- Federated clients may have biased (non-i.i.d.) local data.
OTFS application: federated training of NN detectors and learned pilots. Model updates: roughly 10 MB per UE per round; about 100 rounds to convergence. Total: ≈1 GB, which is acceptable.
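A minimal FedAvg sketch of the cycle above, using plain NumPy weight vectors; `grad_fn` and the least-squares stub are placeholders for the detector's real loss, and production systems would compress or quantize the uploads:

```python
import numpy as np

def grad_fn(w, data):
    """Stub gradient: least-squares loss on one client's (x, y) shard."""
    x, y = data
    return x.T @ (x @ w - y) / len(y)

def local_train(global_weights, local_data, lr=0.01, steps=10):
    """Local phase: a few gradient steps on the UE's own observations."""
    w = global_weights.copy()
    for _ in range(steps):
        w -= lr * grad_fn(w, local_data)
    return w

def fedavg_round(global_weights, client_datasets):
    """One federated round: local training everywhere, then averaging."""
    updates = [local_train(global_weights, d) for d in client_datasets]
    return np.mean(updates, axis=0)   # FedAvg: mean of the client models

clients = [(np.random.randn(100, 8), np.random.randn(100)) for _ in range(5)]
w = np.zeros(8)
for _ in range(100):                  # ~100 rounds to convergence, as above
    w = fedavg_round(w, clients)
```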
Theorem: Federated vs Centralized Performance
Federated learning with $K$ clients, each holding $n$ samples, achieves performance comparable to centralized training on $Kn$ samples, subject to:
- Clients have independent data (i.i.d. assumption).
- Aggregation happens frequently enough.
Gap: FL needs more communication rounds than centralized training, but total compute is comparable.
Practical 6G: a large population of UEs contributes, each holding many locally recorded frames; the equivalent aggregate dataset is sufficient for NN OTFS detector training.
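As a worked illustration of the scaling claim (the client and sample counts are hypothetical, since the original figures were not preserved in this text):

```latex
N_{\text{eff}} = K \cdot n,
\qquad
K = 1000 \text{ UEs},\; n = 10^{3} \text{ frames each}
\;\Rightarrow\;
N_{\text{eff}} = 10^{6} \text{ samples}.
```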
FL achieves a privacy-performance trade-off: centralized training has access to all data but violates privacy, while FL protects privacy at a slight training cost. For commercial 6G, where privacy regulation (GDPR, CCPA) forbids centralized data collection, FL is the only viable approach.
Gradient aggregation
Each client's gradient $g_k$ is an unbiased estimator of the true gradient $\nabla L(\theta)$, computed from its $n$ local samples.
FedAvg
The server averages the client models, $\theta \leftarrow \frac{1}{K}\sum_{k=1}^{K}\theta_k$; in expectation this matches the centralized gradient step.
Communication overhead
Each round uploads one model update per UE (≈10 MB, per the figures above); over ~100 rounds this totals roughly 1 GB per UE.
Convergence rate
FL converges at the same asymptotic rate as centralized training, though it typically needs more communication rounds in practice; per-round compute matches the centralized case, so total compute is comparable.
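The communication budget is easy to sanity-check; the 10 MB update size is an assumption consistent with the ~1 GB total quoted above:

```python
update_size_mb = 10    # assumed per-UE model update size (see text)
rounds = 100           # rounds to convergence quoted above
total_gb = update_size_mb * rounds / 1000
print(f"Per-UE upload over full training: {total_gb:.1f} GB")  # -> 1.0 GB
```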
Example: Federated OTFS Receiver Deployment
A Tier-1 operator deploys a federated learned-pilot receiver for 6G OTFS across 1000 UEs in 10 cities. Describe the training/deployment flow.
Initialization
The vendor trains the initial NN model on simulated data and deploys it to all UEs and BSs. Initial performance: on par with the hand-crafted baseline.
Federated training
Each UE trains a local model on its own channel logs and sends its model update to the operator's aggregation server, which averages the updates. Federated rounds run every few hours; convergence takes weeks.
Personalization
Global model plus per-UE fine-tuning: each UE stores a small personalization head, improving UE-specific performance by 1-2 dB.
Deployment
Monthly global model update. UE-side personalization continuous.
Performance
After 10 weeks, the federated model reaches 95% of the best centralized performance. Privacy is preserved, and performance keeps improving as more UEs participate.
[Interactive plot: Federated vs Centralized Training Convergence, showing training loss and inference BER vs epoch, with sliders for client count and rounds per epoch.]
Definition: Adversarial Robustness
Adversarial examples are small, specifically crafted perturbations $\delta$ (with $\|\delta\| \le \epsilon$) added to an input $x$ so that the model misclassifies $x + \delta$ despite handling $x$ correctly. For OTFS receivers, adversarial perturbations can be injected into the received signal by a malicious transmitter (the ML analog of a jamming attack).
OTFS susceptibility:
- Pure-NN detector: highly susceptible. A perturbation only 0.5 dB relative to signal power can make BER 10× worse.
- Classical MP: robust. Perturbations are unlikely to trigger ML-specific failure modes.
- Unfolded MP: in between; its model-based structure provides partial robustness.
Defenses:
- Adversarial training: include adversarial examples in training data. NN learns to resist.
- Detection: flag anomalous inputs and fall back to classical.
- Input smoothing: denoise before NN detector. Reduces perturbation magnitude.
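A sketch of PGD-based adversarial training for a toy detector, following the min-max recipe formalized below; the network size, $\epsilon$, and step counts are illustrative, not tuned for OTFS:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 4))
loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

def pgd_perturb(x, y, eps=0.1, alpha=0.02, steps=7):
    """Inner maximization: find a worst-case perturbation in the eps-ball."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss_fn(model(x + delta), y).backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()   # gradient ascent on the loss
            delta.clamp_(-eps, eps)              # project back into the ball
        delta.grad.zero_()
    return delta.detach()

# Outer minimization: one training step on the adversarially perturbed batch.
x = torch.randn(32, 64)          # stand-in for received delay-Doppler samples
y = torch.randint(0, 4, (32,))   # stand-in for transmitted symbol labels
delta = pgd_perturb(x, y)
opt.zero_grad()
loss_fn(model(x + delta), y).backward()
opt.step()
```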
Theorem: Adversarial Training Performance
An NN trained with adversarial examples keeps its BER degradation bounded under attack, whereas a naively trained NN degrades severely.
Cost: longer training time and a slight performance penalty on clean data.
For 6G V2X safety, adversarial training is mandatory.
Adversarial robustness is critical for safety-critical applications. The extra training time is a one-time cost; inference is unchanged. Clean-data performance degrades slightly, while performance under attack improves enormously.
Adversarial risk
Standard training minimizes the expected loss $\mathbb{E}_{(x,y)}[\ell(f_\theta(x), y)]$. Adversarial training minimizes the worst case over an $\epsilon$-ball around each input: $\mathbb{E}_{(x,y)}\big[\max_{\|\delta\| \le \epsilon} \ell(f_\theta(x+\delta), y)\big]$.
Convergence
The adversarial loss defines a min-max problem, typically solved by projected gradient steps on the inner maximization (Madry's method); it converges, at the cost of extra training rounds.
Performance
Under attack, the adversarially trained NN's degradation stays bounded; on clean data it pays a slight penalty.
ML OTFS Deployment Patterns
Four deployment patterns cover current practice:
Pattern 1: Pre-trained vendor models
- Vendor (Qualcomm, MediaTek) trains NN detectors on broad simulated data.
- Deployed to UE chip. Fixed. No UE-side training.
- Pro: simple. Con: no personalization.
Pattern 2: Federated personalization
- Vendor pre-trains. UE fine-tunes on local data.
- Federated updates for global improvement.
- Pro: privacy-preserving. Con: complex coordination.
Pattern 3: Online learning
- UE continuously fine-tunes NN on measured channel.
- Fastest adaptation. Privacy-preserving.
- Pro: best performance in stable environments. Con: sensitive to environment changes.
Pattern 4: Hybrid
- Vendor model + Federated updates + online fine-tuning.
- Maximum flexibility. Standard for 6G commercial deployment.
2026 reality: Pattern 1 dominates (it is simplest), Pattern 2 is in pilots, and Pattern 4 is expected from 2030 onward. A configuration sketch follows the summary list below.
- Pattern 1: vendor-only (simplest, fixed)
- Pattern 2: federated (privacy + personalization)
- Pattern 3: online (adaptive, noisy)
- Pattern 4: hybrid (6G commercial)
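A minimal configuration sketch for selecting among the four patterns; the class and field names are assumptions, not a standardized API:

```python
from dataclasses import dataclass

@dataclass
class MLDeploymentConfig:
    """Illustrative deployment knobs (hypothetical names)."""
    vendor_model_id: str          # Pattern 1: pre-trained baseline shipped on-chip
    federated_updates: bool       # Pattern 2: participate in FedAvg rounds
    online_finetune: bool         # Pattern 3: continuous on-device adaptation
    fallback_to_classical: bool   # safety net: revert to the MP detector on anomaly

# Pattern 4 (hybrid) enables everything on top of the vendor baseline.
hybrid = MLDeploymentConfig(
    vendor_model_id="otfs-detector-v1",
    federated_updates=True,
    online_finetune=True,
    fallback_to_classical=True,
)
```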
Common Mistake: Respect Privacy Constraints
Mistake:
Centralizing UE channel data for ML training, violating GDPR, CCPA, or similar privacy regulations.
Correction:
Design ML training for privacy from the ground up:
- Federated learning (§3): training happens on the UE.
- Differential privacy: add noise to model updates.
- Secure aggregation: cryptographic sum of updates.
- Anonymization: remove UE identifiers before any data handling.
Compliance costs: training complexity increases modestly, and there is a small performance gap versus non-private training; this is acceptable for regulatory compliance. Commercial deployments from 2028 onward will mandate privacy-preserving ML for the physical layer. A sketch of the differential-privacy step appears below.
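A sketch of the differential-privacy step (clip each update's norm, then add Gaussian noise before upload), in the spirit of DP-SGD; the clip norm and noise scale are illustrative and would be calibrated to a target $(\epsilon, \delta)$ budget in practice:

```python
import numpy as np

def privatize_update(update, clip_norm=1.0, noise_std=0.1, seed=None):
    """Clip the update's L2 norm, then add Gaussian noise before upload.
    clip_norm and noise_std are illustrative placeholders."""
    rng = np.random.default_rng(seed)
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / (norm + 1e-12))
    return clipped + rng.normal(0.0, noise_std, size=update.shape)

raw_update = np.random.randn(1000)        # stand-in for a UE's model delta
private_update = privatize_update(raw_update)
```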