Deep Learning for OTFS Receivers
Why ML for OTFS?
Classical OTFS receivers are built from principled algorithms: maximum-likelihood (ML) detection, message passing, MMSE — each with well-characterized assumptions and performance bounds. They work, but they are not always optimal in practice. Real channels have imperfections: phase noise, hardware nonlinearities, interference, fractional Doppler offsets. Classical algorithms handle these imperfectly. Deep learning offers a complementary approach: learn the detector from data, absorbing the real-world non-idealities that analytical models ignore. This section surveys ML-based OTFS receivers and quantifies when they win.
Definition: Machine Learning OTFS Receiver
A machine learning OTFS receiver replaces one or more of the receiver blocks with a learnable neural network:
- NN channel estimator: takes received DD samples plus pilot symbols as input and outputs a channel estimate.
- NN detector: takes received DD samples plus the channel estimate and outputs soft symbol decisions.
- Joint NN (end-to-end): single NN from received samples to hard decisions. No explicit intermediate steps.
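The three configurations above can be sketched as function interfaces. A minimal sketch in which simple placeholder computations (one-tap estimates and sign decisions) stand in for trained networks; all names and shapes are hypothetical:

```python
import numpy as np

# Hypothetical interfaces for the three receiver configurations.
# Bodies are illustrative placeholders, not trained networks.

def nn_channel_estimator(y_dd: np.ndarray, pilots: np.ndarray) -> np.ndarray:
    """Received DD samples + pilot symbols -> channel estimate."""
    return y_dd * np.conj(pilots) / (np.abs(pilots) ** 2 + 1e-9)

def nn_detector(y_dd: np.ndarray, h_est: np.ndarray) -> np.ndarray:
    """Received DD samples + channel estimate -> soft symbol decisions."""
    return y_dd * np.conj(h_est) / (np.abs(h_est) ** 2 + 1e-9)

def joint_nn(y_dd: np.ndarray) -> np.ndarray:
    """End-to-end: received samples -> hard QPSK decisions, no explicit steps."""
    return (np.sign(y_dd.real) + 1j * np.sign(y_dd.imag)) / np.sqrt(2)
```

In a real receiver each body would be replaced by a trained network; only the input/output contracts shown here are the point.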
Architectures:
- Feedforward NN (MLP): simple, fast, limited expressivity.
- Convolutional NN (CNN): exploits spatial structure in DD grid.
- Transformer: attention over DD cells. Most expressive. Compute-heavy.
- Graph NN: factor-graph-structured, combining physics with learning.
Theorem: ML Receiver Performance Bounds
An NN-based OTFS receiver with sufficient capacity can asymptotically achieve the same BER as the optimal (maximum-likelihood) detector. The convergence rate depends on the training data size and the NN capacity; empirically, the MSE convergence rate is governed by both.
Practical performance (typical 2026 results):
- NN detector vs. MP at 15 dB SNR: near-parity, with only a fraction-of-a-dB gap once training has converged.
- NN detector vs MP under fractional Doppler: NN improves by 1-2 dB (handles imperfections classical models don't capture).
- NN detector at very high SNR: marginal gain (matches theory).
Consequence: NN receivers match or slightly beat classical detectors at typical operating points. Gain is largest where analytical models break (imperfections, finite-precision).
Neural networks are universal approximators — given enough capacity and data, they can learn any function, including the optimal detector. The question is whether this is practically useful. For idealized channels: NNs match classical but don't beat them. For real channels with non-idealities: NNs learn these patterns and beat classical. The more non-ideal the channel, the more ML wins.
Universal approximation
Cybenko 1989 / Hornik 1991: NNs with enough hidden units approximate any continuous function arbitrarily well. The optimal detector is such a function.
Training convergence
With stochastic gradient descent and adequate data, the NN parameters converge to a minimizer of the training loss: the global minimum for a convex loss, a local minimum for the non-convex losses of deep NNs.
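A toy illustration of this convergence, using SGD on a convex squared loss with a single hypothetical parameter (all values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy convex problem: recover w_true from noisy samples y = x*w_true + n
# by SGD on the squared loss (x*w - y)^2.
w_true, w, lr = 2.0, 0.0, 0.05
for _ in range(2000):
    x = rng.normal()
    y = x * w_true + 0.1 * rng.normal()
    grad = 2 * x * (x * w - y)   # gradient of (x*w - y)^2 w.r.t. w
    w -= lr * grad

# For this convex loss, SGD settles near the global minimizer w_true = 2.
```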
Generalization
Test performance tracks training performance, with a generalization gap that shrinks as the training set grows (standard statistical learning bound).
Comparison to classical
Classical maximum-likelihood detector: optimal under a known channel. NN detector: learns the implicit channel and detector jointly. Under ideal assumptions: same performance. Under non-ideal conditions: the NN wins.
Key Takeaway
NN OTFS receivers match classical detectors under idealized conditions and beat them under realistic conditions. The gain comes from learning non-idealities (fractional Doppler, hardware imperfections, non-Gaussian noise). At typical 6G operating points the improvement is on the order of 1-2 dB: marginal for ideal channels, substantial for practical ones.
Definition: CNN-Based OTFS Detector
A CNN detector for OTFS treats the DD grid as a 2D image:
- Input: received DD samples, split into real/imaginary parts as 2 channels.
- Architecture: convolutional layers (extract local DD features) → attention layers (cross-DD relationships) → dense layer (per-cell detection).
- Output: per-cell soft decisions.
Why CNN? The DD channel is locally sparse — each path contributes to neighboring DD cells only. CNN's local receptive field matches this structure.
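A small numerical sketch of this locality, assuming one path whose fractional Doppler leaks energy into the adjacent Doppler bins (grid sizes and leakage values are illustrative):

```python
import numpy as np

# Toy DD grid (Doppler x delay): one path at (k0, l0) whose fractional
# Doppler leaks into the neighbouring Doppler bins.
N, M = 8, 8
grid = np.zeros((N, M))
k0, l0 = 3, 4
grid[k0, l0] = 1.0
grid[k0 - 1, l0] = grid[k0 + 1, l0] = 0.3   # fractional-Doppler leakage

# A 3x3 receptive field centred on the path captures its whole footprint:
patch = grid[k0 - 1:k0 + 2, l0 - 1:l0 + 2]
coverage = patch.sum() / grid.sum()          # fraction of path energy seen
```

The grid is almost entirely zero, and a single 3×3 filter sees all of the path's energy, which is exactly the match between CNN receptive fields and DD sparsity described above.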
Typical architecture:
- 3-5 conv layers, 32-64 filters each.
- 2-3 attention layers, 4 heads.
- 1-2 dense layers, 128 units.
- Total parameters: typically a few hundred thousand; trainable on a modest GPU.
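A back-of-the-envelope parameter count for a network in this range, under illustrative assumptions (2 input channels for real/imaginary parts, width-32 embeddings, a 128-unit feedforward; the real total depends on the exact layer sizes):

```python
# Rough parameter count for the typical architecture above.

def conv2d_params(c_in, c_out, k=3):
    return c_in * c_out * k * k + c_out           # weights + biases

def attention_params(d, ffn=128):
    qkv_out = 4 * (d * d + d)                     # Q, K, V, output projections
    ffn_p = (d * ffn + ffn) + (ffn * d + d)       # two-layer feedforward
    return qkv_out + ffn_p

def dense_params(n_in, n_out):
    return n_in * n_out + n_out

total = (conv2d_params(2, 32) + 3 * conv2d_params(32, 32)  # 4 conv layers
         + 2 * attention_params(32)                        # 2 attention layers
         + dense_params(32, 128) + dense_params(128, 128)  # 2 dense layers
         + dense_params(128, 4))                           # QPSK soft output
# total ≈ 75k with these sizes; wider layers reach the hundreds of thousands.
```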
Theorem: CNN vs MP Performance
Under an idealized OTFS channel (integer Doppler, Gaussian noise): the CNN detector achieves BER within 0.2 dB of the MP detector.
Under realistic conditions (fractional Doppler offsets, non-Gaussian noise, phase noise): the CNN beats MP by 1-2 dB.
Under extreme conditions (fractional Doppler + phase noise + hardware distortion): CNN beats MP by 3-4 dB.
Compute: CNN training takes roughly 12 hours on one GPU (a one-time cost). Per-frame CNN inference on a UE chip runs at millisecond scale but is several times slower than MP inference.
Trade-off: the CNN is slower but more robust, suited for high-value links (URLLC, safety-critical); classical detectors suit mass-scale, low-cost IoT.
The CNN wins exactly where classical MP assumes too much. Perfect Gaussian noise, integer Doppler, linear hardware — classical is matched. Real channels with fractional Doppler, nonlinear PAs, and complex noise — the CNN adapts. The trade-off is compute: 5× slower than MP, but still real-time-feasible.
Training
Train CNN on simulated OTFS data. Include realistic channel imperfections. CNN learns these as implicit features.
Idealized test
On integer Doppler + Gaussian noise: MP is Bayes-optimal, and the CNN converges to the same performance; the slight residual gap is training noise.
Realistic test
With fractional Doppler and other imperfections, MP mis-models the channel; the CNN adapts, learning the true likelihood, for a 1-2 dB advantage.
Extreme test
With many simultaneous imperfections, MP often fails to converge, while the CNN remains robust: a 3-4 dB advantage.
Compute
CNN inference costs about 5× MP's, but latency stays at the millisecond scale, which is acceptable for URLLC.
Example: CNN Receiver for 6G URLLC
Design a CNN-based OTFS receiver for 6G URLLC (V2X safety): a target BER at 20 dB SNR, a multipath channel with fractional Doppler, and a 1 ms latency budget.
Architecture
CNN: 4 conv layers (32 filters, 3×3 kernels) → 2 transformer layers (4 heads) → dense layer (256 units). Output: soft decisions over the 4 QPSK constellation points per DD cell.
Parameters
~500k parameters. Trainable in 6-12 hours on GPU.
Training
Data: 10⁶ OTFS frames with realistic imperfections. 50 epochs with Adam, training until the MSE plateaus.
Performance
Meets the target BER at 20 dB SNR and beats MP by 2 dB under fractional Doppler.
Latency
Inference: 8 ms on mmWave UE chip. Misses 1-ms target!
Mitigation
Use a lighter architecture (2 conv layers + dense). Latency: 3 ms. BER is 1 dB worse but still meets the target. Choose the depth-latency trade-off per URLLC sub-class.
ML vs Classical OTFS Detector BER
[Interactive figure: BER vs SNR for the MP detector, the CNN detector, and other NN architectures. Sliders: fractional Doppler, mobility, noise model.]
ML Receiver Deployment in 5G/6G
ML receiver deployment status (2026):
- 5G NR: limited ML in physical layer (vendor-proprietary for MIMO detection). No ML standardization.
- 5G Advanced (Rel. 18): AI/ML framework introduced. Channel feedback compression via NN. Experimental ML detectors.
- 6G Foundation (Rel. 21): AI/ML is native in the RAN. Standardized NN architectures for channel estimation, detection, resource allocation.
- 6G Deployment (Rel. 22+): ML receivers common. Combined with OTFS: NN handles DD-domain detection + pilot optimization.
Hardware: modern UE SoCs include AI/ML accelerators (Apple Neural Engine, Qualcomm Hexagon) delivering 10-100 TOPS, making millisecond-scale OTFS-CNN inference feasible.
Privacy concerns: ML trained on UE channel data raises privacy issues (location inference). Mitigation: federated learning across UEs, so training happens on-device without centralized data collection.
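A minimal federated-averaging sketch (a single scalar "model" and synthetic per-UE data; all sizes are hypothetical), showing that only model updates, not raw channel data, leave each UE:

```python
import numpy as np

rng = np.random.default_rng(1)

def local_update(w, x, y, lr=0.05):
    """A few local SGD steps on one UE's private data (squared loss)."""
    for xi, yi in zip(x, y):
        w -= lr * 2 * xi * (xi * w - yi)
    return w

w_global, w_true = 0.0, 2.0
for _round in range(20):                  # communication rounds
    local_ws = []
    for _ue in range(4):                  # 4 UEs with private channel data
        x = rng.normal(size=8)
        y = w_true * x + 0.05 * rng.normal(size=8)
        local_ws.append(local_update(w_global, x, y))
    w_global = float(np.mean(local_ws))   # server averages models, not data
```

The server only ever sees the per-UE weights `local_ws`, never the raw samples `(x, y)`, which is the privacy property federated learning provides.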
Adversarial robustness: NN receivers can be attacked with adversarial examples. Under active jamming, however, the CNN degrades more gracefully than MP (a smoothing effect).
- 5G: vendor-proprietary ML (not standardized)
- 6G Rel. 21: native AI/ML framework
- Hardware: UE AI accelerators (10-100 TOPS)
- Privacy: federated learning for training
Common Mistake: NN Overfits to Training Channel
Mistake:
Training an NN OTFS receiver on one channel profile (e.g., 3GPP Urban Micro) and deploying in a different one (e.g., Rural Macro). The NN overfits to the training statistics, and out-of-distribution channels cause severe performance drops.
Correction:
Train the NN on diverse channel profiles covering realistic deployment scenarios, including 3GPP Urban Micro, Urban Macro, Rural Macro, Highway, LEO, and custom scenarios. Use domain randomization: randomly perturb channel parameters during training. Test on held-out channel profiles. For deployment, apply adaptive fine-tuning in the current environment. Typical practice: an 80% training / 20% adaptation budget split.
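The domain-randomization step can be sketched as a channel-parameter sampler; the profile names and ranges below are purely illustrative, not 3GPP values:

```python
import random

# Domain randomization: sample channel parameters from broad ranges spanning
# several deployment profiles, so the NN never sees only one profile's
# statistics. All ranges are illustrative.
PROFILES = {
    "urban_micro": {"paths": (4, 8), "delay_spread_ns": (50, 300),  "speed_kmh": (0, 60)},
    "highway":     {"paths": (2, 6), "delay_spread_ns": (100, 500), "speed_kmh": (60, 200)},
    "leo":         {"paths": (1, 3), "delay_spread_ns": (10, 100),  "speed_kmh": (0, 30)},
}

def sample_channel(rng=random):
    """Draw one randomized channel realization for training."""
    profile = rng.choice(list(PROFILES))
    p = PROFILES[profile]
    return {
        "profile": profile,
        "paths": rng.randint(*p["paths"]),
        "delay_spread_ns": rng.uniform(*p["delay_spread_ns"]),
        "doppler_frac": rng.uniform(-0.5, 0.5),   # offset from the bin centre
        "speed_kmh": rng.uniform(*p["speed_kmh"]),
    }
```

Each training frame is generated from a fresh `sample_channel()` draw, and held-out profiles can be excluded from `PROFILES` to measure out-of-distribution robustness.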