Training and Deployment Considerations
Making ML Work in Production
The theoretical sections of this chapter showed that ML receivers can match or beat their classical counterparts. The practical question is how to actually train, deploy, and maintain them. Training data, infrastructure, drift, privacy, and adversarial robustness are all real challenges. This section discusses them end-to-end and offers deployment patterns for ML-based 6G OTFS receivers.
Definition: Training Data for ML OTFS
Training data for ML OTFS receivers comes in three categories:
Simulated: synthetic channels from 3GPP models (Urban Micro, Rural Macro, HST). Cheap, diverse, controllable; effectively unlimited samples can be generated in a few hours.
Emulated: channel emulator (hardware or high-fidelity software) reproduces real-world conditions. Intermediate fidelity.
Real-world: measured channels from BS/UE logs. Most authentic but expensive and privacy-constrained; sample counts are typically orders of magnitude smaller than simulation.
Best practice: train on simulated data for bulk coverage, fine-tune on emulated data for site-specific conditions, and adapt online on real-world data for final tuning. A typical mix: 80% simulation, 15% emulation, 5% real-world.
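A minimal sketch of the 80/15/5 mix above, assuming hypothetical loader functions (`load_simulated`, `load_emulated`, `load_real`) that stand in for a 3GPP channel simulator, an emulator capture, and anonymized field logs:

```python
import numpy as np

# Hypothetical loaders: each returns (inputs, labels) of the requested size.
# Real versions would wrap a 3GPP channel simulator, an emulator capture,
# and anonymized field logs respectively.
def load_simulated(n): return np.random.randn(n, 64), np.random.randint(0, 4, n)
def load_emulated(n):  return np.random.randn(n, 64), np.random.randint(0, 4, n)
def load_real(n):      return np.random.randn(n, 64), np.random.randint(0, 4, n)

def build_training_set(total=100_000, mix=(0.80, 0.15, 0.05)):
    """Assemble one training set from the three sources in the stated ratio."""
    n_sim, n_emu, n_real = (int(total * f) for f in mix)
    parts = [load_simulated(n_sim), load_emulated(n_emu), load_real(n_real)]
    x = np.concatenate([p[0] for p in parts])
    y = np.concatenate([p[1] for p in parts])
    perm = np.random.permutation(len(x))  # shuffle so each batch mixes sources
    return x[perm], y[perm]

x_train, y_train = build_training_set()
```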
Theorem: Simulation-to-Real Gap
A NN trained on pure simulation typically shows a BER degradation of several dB when deployed on real channels (the "sim-to-real gap"). The gap arises from:
- Hardware non-idealities not captured in simulation.
- Fractional Doppler structure that differs from idealized models.
- Interference patterns from real co-deployed services.
Mitigation:
- Domain randomization: train on wide parameter distributions (not fixed values). Substantially narrows the gap.
- Domain adaptation: fine-tune on small real-world data. Recovers most of the gap.
- Adversarial training: inject synthetic perturbations. Improves OOD robustness.
Total: with these mitigations, the sim-to-real gap drops to roughly 0.5-1 dB, which is acceptable for deployment.
Simulation is the friend of ML training (infinite data, perfect labels), but a naive simulator produces non-representative data. Good practice: make the simulator as realistic as feasible (include HW imperfections, non-Gaussian noise, band-specific behaviors), then fine-tune on real-world data. The sim-to-real gap is a standard, well-studied ML deployment challenge.
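As a sketch, domain randomization amounts to drawing channel parameters from wide distributions instead of fixing them; all ranges below are illustrative assumptions, not 3GPP values:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_random_channel():
    """Draw one randomized delay-Doppler channel realization.
    Every range here is an illustrative placeholder."""
    n_paths = rng.integers(1, 7)                      # random multipath count
    return dict(
        delays_ns=rng.uniform(0, 1000, n_paths),      # wide delay spread
        doppler_hz=rng.uniform(-2000, 2000, n_paths), # includes fractional Doppler
        gains_db=rng.uniform(-20, 0, n_paths),        # per-path power
        iq_imbalance_db=rng.uniform(0, 0.5),          # HW non-ideality
        pa_clip_db=rng.uniform(3, 10),                # PA back-off / clipping
        snr_db=rng.uniform(0, 30),
    )

# Each training example gets a fresh draw, so the NN never overfits one channel.
channel = sample_random_channel()
```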
Distribution shift
Training distribution $p_{\text{train}}$ ≠ deployment distribution $p_{\text{deploy}}$. NN performance on $p_{\text{deploy}}$ is bounded in terms of the Wasserstein distance $W(p_{\text{train}}, p_{\text{deploy}})$.
Simulation weakness
Standard 3GPP models miss: HW effects (IQ imbalance, PA clipping), real spectra, interference statistics.
Mitigation effects
Domain randomization broadens $p_{\text{train}}$; domain adaptation shifts $p_{\text{train}}$ toward $p_{\text{deploy}}$.
Remaining gap
0.5-1 dB is the intrinsic sim-to-real gap for well-designed systems. Accept it and plan for it.
Definition: Federated Learning for OTFS
Federated learning (FL) trains ML models without centralizing training data:
- Each UE/BS trains locally on its own channel observations.
- Periodically, local model updates (gradients or weights) are aggregated at a server.
- Server averages updates and distributes the new global model.
- Cycle repeats.
Advantages:
- Privacy: UE data never leaves the UE.
- Efficiency: no centralized data collection.
- Personalization: each UE can fine-tune the global model locally.
Disadvantages:
- Communication overhead for model updates.
- Convergence slower than centralized.
- Federated clients may have biased (non-i.i.d.) local data.
OTFS application: federated training of NN detectors and learned pilots. Model updates: roughly 10 MB per UE per round; about 100 rounds to convergence. Total: ≈1 GB, which is acceptable.
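A minimal FedAvg sketch of the cycle above, using plain NumPy weight vectors; `grad_fn` and the least-squares stub are placeholders for the detector's real loss, and production systems would compress or quantize the uploads:

```python
import numpy as np

def grad_fn(w, data):
    """Stub gradient: least-squares loss on one client's (x, y) shard."""
    x, y = data
    return x.T @ (x @ w - y) / len(y)

def local_train(global_weights, local_data, lr=0.01, steps=10):
    """Local phase: a few gradient steps on the UE's own observations."""
    w = global_weights.copy()
    for _ in range(steps):
        w -= lr * grad_fn(w, local_data)
    return w

def fedavg_round(global_weights, client_datasets):
    """One federated round: local training everywhere, then averaging."""
    updates = [local_train(global_weights, d) for d in client_datasets]
    return np.mean(updates, axis=0)   # FedAvg: mean of the client models

clients = [(np.random.randn(100, 8), np.random.randn(100)) for _ in range(5)]
w = np.zeros(8)
for _ in range(100):                  # ~100 rounds to convergence, as above
    w = fedavg_round(w, clients)
```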
Theorem: Federated vs Centralized Performance
Federated learning with $K$ clients, each holding $n$ samples, achieves performance comparable to centralized training on $Kn$ samples, subject to:
- Clients have independent data (i.i.d. assumption).
- Aggregation happens frequently enough.
Gap: FL needs more communication rounds than centralized training, but total compute is comparable.
Practical 6G: a large population of UEs contributes, each holding many locally recorded frames; the equivalent aggregate dataset is sufficient for NN OTFS detector training.
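As a worked illustration of the scaling claim (the client and sample counts are hypothetical, since the original figures were not preserved in this text):

```latex
N_{\text{eff}} = K \cdot n,
\qquad
K = 1000 \text{ UEs},\; n = 10^{3} \text{ frames each}
\;\Rightarrow\;
N_{\text{eff}} = 10^{6} \text{ samples}.
```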
FL achieves a privacy-performance trade-off: centralized training has access to all data but violates privacy, while FL protects privacy at a slight training cost. For commercial 6G, where privacy regulation (GDPR, CCPA) forbids centralized data collection, FL is the only viable approach.
Gradient aggregation
Each client's gradient $g_k$ is an unbiased estimator of the true gradient $\nabla L(\theta)$, computed from its $n$ local samples.
FedAvg
The server averages the client models, $\theta \leftarrow \frac{1}{K}\sum_{k=1}^{K}\theta_k$; in expectation this matches the centralized gradient step.
Communication overhead
Each round uploads one model update per UE (≈10 MB, per the figures above); over ~100 rounds this totals roughly 1 GB per UE.
Convergence rate
FL converges at the same asymptotic rate as centralized training, though it typically needs more communication rounds in practice; per-round compute matches the centralized case, so total compute is comparable.
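The communication budget is easy to sanity-check; the 10 MB update size is an assumption consistent with the ~1 GB total quoted above:

```python
update_size_mb = 10    # assumed per-UE model update size (see text)
rounds = 100           # rounds to convergence quoted above
total_gb = update_size_mb * rounds / 1000
print(f"Per-UE upload over full training: {total_gb:.1f} GB")  # -> 1.0 GB
```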
Example: Federated OTFS Receiver Deployment
A Tier-1 operator deploys a federated learned-pilot receiver for 6G OTFS across 1000 UEs in 10 cities. Describe the training/deployment flow.
Initialization
The vendor trains the initial NN model on simulated data and deploys it to all UEs and BSs. Initial performance: on par with the hand-crafted baseline.
Federated training
Each UE trains a local model on its own channel logs and sends its model update to the operator's aggregation server, which averages the updates. Federated rounds run every few hours; convergence takes weeks.
Personalization
Global model plus per-UE fine-tuning: each UE stores a small personalization head, improving UE-specific performance by 1-2 dB.
Deployment
Monthly global model update. UE-side personalization continuous.
Performance
After 10 weeks, the federated model reaches 95% of the best centralized performance. Privacy is preserved, and performance keeps improving as more UEs participate.
[Interactive plot: Federated vs Centralized Training Convergence, showing training loss and inference BER vs epoch, with sliders for client count and rounds per epoch.]
Definition: Adversarial Robustness
Adversarial examples are small, specifically crafted perturbations $\delta$ (with $\|\delta\| \le \epsilon$) added to an input $x$ so that the model misclassifies $x + \delta$ despite handling $x$ correctly. For OTFS receivers, adversarial perturbations can be injected into the received signal by a malicious transmitter (the ML analog of a jamming attack).
OTFS susceptibility:
- Pure-NN detector: highly susceptible. A perturbation only 0.5 dB relative to signal power can make BER 10× worse.
- Classical MP: robust. Perturbations are unlikely to trigger ML-specific failure modes.
- Unfolded MP: in between; its model-based structure provides partial robustness.
Defenses:
- Adversarial training: include adversarial examples in training data. NN learns to resist.
- Detection: flag anomalous inputs and fall back to classical.
- Input smoothing: denoise before NN detector. Reduces perturbation magnitude.
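A sketch of PGD-based adversarial training for a toy detector, following the min-max recipe formalized below; the network size, $\epsilon$, and step counts are illustrative, not tuned for OTFS:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 4))
loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

def pgd_perturb(x, y, eps=0.1, alpha=0.02, steps=7):
    """Inner maximization: find a worst-case perturbation in the eps-ball."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss_fn(model(x + delta), y).backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()   # gradient ascent on the loss
            delta.clamp_(-eps, eps)              # project back into the ball
        delta.grad.zero_()
    return delta.detach()

# Outer minimization: one training step on the adversarially perturbed batch.
x = torch.randn(32, 64)          # stand-in for received delay-Doppler samples
y = torch.randint(0, 4, (32,))   # stand-in for transmitted symbol labels
delta = pgd_perturb(x, y)
opt.zero_grad()
loss_fn(model(x + delta), y).backward()
opt.step()
```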
Theorem: Adversarial Training Performance
An NN trained with adversarial examples keeps its BER degradation bounded under attack, whereas a naively trained NN degrades severely.
Cost: longer training time and a slight performance penalty on clean data.
For 6G V2X safety, adversarial training is mandatory.
Adversarial robustness is critical for safety-critical applications. The extra training time is a one-time cost; inference is unchanged. Clean-data performance degrades slightly, while performance under attack improves enormously.
Adversarial risk
Standard training minimizes the expected loss $\mathbb{E}_{(x,y)}[\ell(f_\theta(x), y)]$. Adversarial training minimizes the worst case over an $\epsilon$-ball around each input: $\mathbb{E}_{(x,y)}\big[\max_{\|\delta\| \le \epsilon} \ell(f_\theta(x+\delta), y)\big]$.
Convergence
The adversarial loss defines a min-max problem, typically solved by projected gradient steps on the inner maximization (Madry's method); it converges, at the cost of extra training rounds.
Performance
Under attack, the adversarially trained NN's degradation stays bounded; on clean data it pays a slight penalty.
ML OTFS Deployment Patterns
Four deployment patterns cover current practice:
Pattern 1: Pre-trained vendor models
- Vendor (Qualcomm, MediaTek) trains NN detectors on broad simulated data.
- Deployed to UE chip. Fixed. No UE-side training.
- Pro: simple. Con: no personalization.
Pattern 2: Federated personalization
- Vendor pre-trains. UE fine-tunes on local data.
- Federated updates for global improvement.
- Pro: privacy-preserving. Con: complex coordination.
Pattern 3: Online learning
- UE continuously fine-tunes NN on measured channel.
- Fastest adaptation. Privacy-preserving.
- Pro: best performance in stable environments. Con: sensitive to environment changes.
Pattern 4: Hybrid
- Vendor model + Federated updates + online fine-tuning.
- Maximum flexibility. Standard for 6G commercial deployment.
2026 reality: Pattern 1 dominates (it is simplest), Pattern 2 is in pilots, and Pattern 4 is expected from 2030 onward. A configuration sketch follows the summary list below.
- Pattern 1: vendor-only (simplest, fixed)
- Pattern 2: federated (privacy + personalization)
- Pattern 3: online (adaptive, noisy)
- Pattern 4: hybrid (6G commercial)
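A minimal configuration sketch for selecting among the four patterns; the class and field names are assumptions, not a standardized API:

```python
from dataclasses import dataclass

@dataclass
class MLDeploymentConfig:
    """Illustrative deployment knobs (hypothetical names)."""
    vendor_model_id: str          # Pattern 1: pre-trained baseline shipped on-chip
    federated_updates: bool       # Pattern 2: participate in FedAvg rounds
    online_finetune: bool         # Pattern 3: continuous on-device adaptation
    fallback_to_classical: bool   # safety net: revert to the MP detector on anomaly

# Pattern 4 (hybrid) enables everything on top of the vendor baseline.
hybrid = MLDeploymentConfig(
    vendor_model_id="otfs-detector-v1",
    federated_updates=True,
    online_finetune=True,
    fallback_to_classical=True,
)
```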
Common Mistake: Respect Privacy Constraints
Mistake:
Centralizing UE channel data for ML training, violating GDPR, CCPA, or similar privacy regulations.
Correction:
Design ML training for privacy from the ground up:
- Federated learning (§3): training happens on the UE.
- Differential privacy: add noise to model updates.
- Secure aggregation: cryptographic sum of updates.
- Anonymization: remove UE identifiers before any data handling.
Compliance costs: training complexity increases modestly, and there is a small performance gap versus non-private training; this is acceptable for regulatory compliance. Commercial deployments from 2028 onward will mandate privacy-preserving ML for the physical layer. A sketch of the differential-privacy step appears below.
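A sketch of the differential-privacy step (clip each update's norm, then add Gaussian noise before upload), in the spirit of DP-SGD; the clip norm and noise scale are illustrative and would be calibrated to a target $(\epsilon, \delta)$ budget in practice:

```python
import numpy as np

def privatize_update(update, clip_norm=1.0, noise_std=0.1, seed=None):
    """Clip the update's L2 norm, then add Gaussian noise before upload.
    clip_norm and noise_std are illustrative placeholders."""
    rng = np.random.default_rng(seed)
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / (norm + 1e-12))
    return clipped + rng.normal(0.0, noise_std, size=update.shape)

raw_update = np.random.randn(1000)        # stand-in for a UE's model delta
private_update = privatize_update(raw_update)
```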