Exercises
ex-otfs-ch21-01
Easy: Why would an NN OTFS receiver outperform classical MP detection? List three scenarios.
Non-idealities, data-driven learning.
Fractional Doppler
Classical MP assumes integer Doppler. NN learns non-integer structure.
Hardware imperfections
Phase noise, PA nonlinearity. NN absorbs these patterns.
Non-Gaussian noise
Impulsive noise, interference. NN learns non-Gaussian likelihoods.
Idealized
On a perfect channel the NN only matches classical MP; the gains come from real-world impairments.
ex-otfs-ch21-02
Easy: What is the structural difference between a pure NN detector and an unfolded MP detector?
Structure from classical algorithm.
Pure NN
No structural assumptions: it can learn an arbitrary function of the received signal, at the cost of a large parameter count.
Unfolded MP
Inherits the MP iteration structure; only the per-iteration parameters (e.g. damping, message weights) are learnable, so the parameter count stays small.
Trade-off
Pure NN: more expressive, worse out-of-distribution (OOD). Unfolded: less expressive, better OOD, interpretable. Unfolded is preferred for safety-critical links; pure NN for stable conditions.
ex-otfs-ch21-03
Easy: Explain the simulation-to-real gap in OTFS ML deployment. What mitigations exist?
Distribution shift from sim to real.
Gap
An NN trained on simulated channels typically performs 2-3 dB worse on real channels due to hardware imperfections, mismatched path statistics, etc.
Domain randomization
Train on wide range of simulated parameters. Reduces gap to ~1 dB.
Domain adaptation
Fine-tune on small real-world sample. Recovers another 0.5 dB.
Residual gap
0.5-1 dB. Acceptable for deployment.
ex-otfs-ch21-04
Medium: A CNN-based OTFS detector has 3 conv layers (32 filters each) + 2 dense layers (256 units). Estimate the parameter count and training requirements.
Count weights per layer.
Conv layers
3 conv layers, 32 filters, 3×3 kernels. Layer 1 (2-channel I/Q input): 32 × 2 × 9 = 576 weights + 32 biases = 608. Layers 2 and 3: 32 × 32 × 9 = 9216 + 32 = 9248 each. Total conv: 608 + 2 × 9248 ≈ 19k parameters.
Dense layers
Flatten from 32 × 16 × 16 = 8192 features (example MN=256). Dense 256: 8192 × 256 + 256 = 2.1M parameters. Dense output: 256 × 4 + 4 = 1028 (4 QAM bits).
Total
~2.1M parameters. Trainable on modern GPU in ~1 hour.
Training data
For ~2M parameters, the ~10× rule of thumb gives ~2 × 10⁷ training samples. Simulated OTFS data at this scale can be generated in 1-2 hours.
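The layer-by-layer arithmetic above can be checked with a short script (layer shapes as assumed in the exercise: 3×3 kernels, I/Q input, 8192 flattened features, 4-bit output):

```python
# Parameter count for the CNN detector sketched in this exercise:
# 3 conv layers (32 filters, 3x3 kernels) + 2 dense layers (256 units, 4 outputs).

def conv_params(in_ch, out_ch, k=3):
    """Weights + biases for one 2-D conv layer with k x k kernels."""
    return out_ch * in_ch * k * k + out_ch

def dense_params(in_f, out_f):
    """Weights + biases for one fully connected layer."""
    return in_f * out_f + out_f

conv = conv_params(2, 32) + 2 * conv_params(32, 32)       # 608 + 2 * 9248
dense = dense_params(32 * 16 * 16, 256) + dense_params(256, 4)
total = conv + dense
print(conv, dense, total)   # 19104 2098436 2117540  (~2.1M)
```

The dense layer on the flattened feature map dominates, which is why the total is ~2.1M despite the compact conv stack.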
ex-otfs-ch21-05
Medium: For a learned pilot with 5 active DD cells out of 1024, what sparsity does it have and what's its regularization contribution?
Count nonzero DD cells.
Sparsity
5 active cells out of 1024. Fraction: 5/1024 ≈ 0.5%.
Regularization
The $\ell_1$ term $\lambda\|\mathbf{x}_p\|_1$ sums only 5 nonzero magnitudes, so it is small compared to the other loss terms. A larger $\lambda$ encourages even more sparsity.
Effect
Sparse pilot: less interference with data; lower PAPR. Classical comparison: 1-3% pilot overhead. Learned: 0.5% (via the sparsity encouragement in the loss).
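The sparsity fraction and penalty can be computed directly (the pilot values and $\lambda = 0.01$ are illustrative assumptions, not from the exercise):

```python
import numpy as np

# Hypothetical learned pilot: 5 active cells in a 32x32 (= 1024-cell) DD grid.
rng = np.random.default_rng(0)
pilot = np.zeros(1024)
active = rng.choice(1024, size=5, replace=False)  # 5 active DD cells
pilot[active] = rng.normal(size=5)

sparsity = np.count_nonzero(pilot) / pilot.size   # fraction of active cells
l1_penalty = 0.01 * np.abs(pilot).sum()           # lambda * ||x_p||_1, lambda assumed 0.01
print(sparsity)                                   # 5/1024 ~ 0.0049
```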
ex-otfs-ch21-06
Medium: Describe how unfolded MP-OTFS inherits the robustness of classical MP while gaining NN flexibility.
Initialization at classical, training fine-tunes.
Initialization
The unfolded NN is initialized layer by layer to match the classical MP update rules, so its output is identical to MP at the start of training.
Training
Gradient descent on the per-iteration hyperparameters. The structure is preserved; performance is fine-tuned from the classical baseline.
Robustness
Core update rule from MP provides convergence guarantees. NN layers constrained to MP-like operations.
Flexibility
Per-iteration learnable parameters: damping, weighting, activation. NN expressivity within MP structure.
Trade-off
Less expressive than pure NN. More robust. 1-2 dB better than classical; safer than pure NN.
ex-otfs-ch21-07
Medium: For federated learning with 100 UEs, each training on its locally collected channel frames, compute the effective training size and the number of rounds to convergence.
FedAvg convergence.
Effective size
Total samples: 100 × (frames per UE). Pooled across 100 UEs, this is comparable to moderate centralized training.
Rounds
FedAvg needs more communication rounds than centralized epochs for equivalent convergence; the required round count grows with data heterogeneity and shrinks with the number of local epochs per round.
Bandwidth
Per round: 10 MB × 100 UEs = 1 GB, so total traffic scales at 1 GB per round across all rounds. Spread over days to weeks, this is manageable.
Performance
After convergence: ~95% of centralized performance. Accept small gap for privacy.
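The bandwidth arithmetic is easy to sanity-check; the round count below is an assumed example value, not taken from the exercise:

```python
# Back-of-envelope FedAvg bandwidth: 100 UEs, 10 MB model upload per round.
n_ues = 100
model_mb = 10
rounds = 500                                # assumed example round count

per_round_gb = n_ues * model_mb / 1000      # 1 GB uploaded per round
total_gb = rounds * per_round_gb            # scales linearly with rounds
print(per_round_gb, total_gb)               # 1.0 500.0
```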
ex-otfs-ch21-08
Medium: An unfolded MP detector with $T$ layers, each with 100 parameters, is compared to a pure NN (CNN) with a far larger parameter count. How do their training-data requirements compare?
Parameter count determines data need.
Unfolded parameters
$T \times 100$ parameters in total. Compact.
Pure NN
Orders of magnitude more parameters — in this comparison, 125× the unfolded detector's count.
Data for convergence
Rule of thumb: 10×-100× as many training samples as parameters for good generalization, so the pure NN needs roughly 125× more data than the unfolded detector.
Practical implications
Unfolded: trainable on simulation plus a small real-world fine-tuning set. Pure NN: requires a large training dataset; federated learning helps collect it.
ex-otfs-ch21-09
Hard: Prove that an unfolded MP detector with $T$ layers matches classical MP with $T$ iterations at initialization.
Layer-wise correspondence.
Layer structure
Layer $t$ of the unfolded network applies the update $\mu^{(t+1)} = f_{\theta_t}(\mu^{(t)}, \mathbf{y})$, where $f$ has the functional form of the MP message update. At initialization $\theta_t = \theta_{\mathrm{MP}}$, identical to the classical update.
Equivalence
At initialization each layer implements the classical update rule exactly, with the learnable damping set to the classical value, so layer $t$'s output equals the output of classical iteration $t$.
After $T$ layers
Cascading $T$ such layers, the unfolded network reproduces classical MP with $T$ iterations. Performance is identical.
Training deviation
Training perturbs $\{\theta_t\}$ away from the classical values to minimize the loss on data; since the classical point is the starting point, performance can only improve on the training distribution.
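The equivalence argument can be illustrated numerically with a toy damped fixed-point iteration standing in for the MP message update (all values illustrative; the unfolded version carries one learnable damping per layer, initialized to the classical value):

```python
import numpy as np

# Surrogate for the MP message update: a damped affine fixed-point iteration
# x <- (1 - d) * x + d * f(x).
rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4)) * 0.2          # stand-in for message dependencies
b = rng.normal(size=4)
f = lambda x: A @ x + b

def classical_mp(x, T, d=0.5):
    for _ in range(T):
        x = (1 - d) * x + d * f(x)
    return x

def unfolded_mp(x, dampings):
    # One learnable damping per layer; at initialization all equal d = 0.5.
    for d in dampings:
        x = (1 - d) * x + d * f(x)
    return x

T = 6
out_classical = classical_mp(np.zeros(4), T)
out_unfolded = unfolded_mp(np.zeros(4), [0.5] * T)   # initialization point
print(np.allclose(out_classical, out_unfolded))      # True
```

At initialization the two trajectories are identical step by step; training then moves the per-layer dampings away from 0.5.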
ex-otfs-ch21-10
Hard: Design an end-to-end training pipeline for a learned OTFS pilot + NN detector on a simulated V2X environment.
Joint optimization, differentiable channel.
Data generation
Simulator produces (channel, data) pairs. Parameters: V2X channel profile, fractional Doppler, SNR distribution.
Forward model
$\mathbf{y} = \mathbf{H}(\mathbf{x}_d + \mathbf{x}_p) + \mathbf{n}$, where $\mathbf{x}_p$ is the learned pilot and $\mathbf{H}$ the DD-domain channel. The chain is differentiable end-to-end.
NN structure
Estimator $\hat{\mathbf{h}} = f_\phi(\mathbf{y})$; detector $\hat{\mathbf{x}} = g_\psi(\mathbf{y}, \hat{\mathbf{h}})$. The pilot values and both networks' parameters $(\mathbf{x}_p, \phi, \psi)$ are trained jointly.
Loss
Cross-entropy on the detected bits plus an $\ell_1$ sparsity penalty on the pilot: $\mathcal{L} = \mathrm{CE}(\mathbf{b}, \hat{\mathbf{b}}) + \lambda\|\mathbf{x}_p\|_1$.
Training
Adam optimizer, batch size 64, 10⁴ epochs. At convergence, BER is ~3 dB better than the classical baseline at 15 dB SNR.
Deployment
Export for on-chip use. Pilot stored as 5 DD cell values; detector as NN weights.
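A minimal sketch of the forward model and loss terms above, with a zero-forcing stand-in for the NN detector and illustrative sizes (a real pipeline would backpropagate through this whole chain to update pilot and detector jointly):

```python
import numpy as np

# Forward model y = H (x_d + x_p) + n in the DD domain, small grid for the sketch.
rng = np.random.default_rng(2)
MN = 64
H = np.eye(MN) + 0.1 * rng.normal(size=(MN, MN)) / np.sqrt(MN)  # surrogate DD channel
x_d = rng.choice([-1.0, 1.0], size=MN)                          # BPSK data for simplicity
x_p = np.zeros(MN)
x_p[rng.choice(MN, 5, replace=False)] = 1.0                     # 5-cell sparse pilot
noise = 0.05 * rng.normal(size=MN)

y = H @ (x_d + x_p) + noise                 # received DD signal

# Zero-forcing "detector" stands in for the trained NN g_psi here.
x_hat = np.sign(np.linalg.solve(H, y) - x_p)
ber = np.mean(x_hat != x_d)

lam = 0.01
loss_sparsity = lam * np.abs(x_p).sum()     # the l1 pilot penalty from the loss
print(ber, loss_sparsity)
```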
ex-otfs-ch21-11
Hard: For an NN OTFS receiver deployed in a mobile vehicle, design online adaptation: update NN parameters based on real-time channel measurements.
Fast online adaptation.
Adaptation window
Every 100 frames: collect channel measurements. Compare NN prediction vs classical reference.
Adaptation trigger
If |NN BER - classical BER| > threshold: trigger adaptation.
Adaptation step
Take a small gradient step on the NN parameters using recent data. Learning rate: 10⁻⁴ (small, to avoid catastrophic forgetting).
Stability
Limit adaptation to last-layer parameters. Freeze deep layers to maintain pretrained structure.
Computational
Per adaptation: 1000 frames × forward+backward = ~1M ops. At frame rate 100 Hz: 100 ms per adaptation. Feasible.
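A sketch of the last-layer-only adaptation step described above (frozen-backbone features, classical MP outputs as the reference target; all shapes and values illustrative):

```python
import numpy as np

# Online adaptation restricted to the output layer: backbone frozen, one small
# gradient step on the last-layer weights W toward the classical reference.
rng = np.random.default_rng(3)
feat = rng.normal(size=(100, 16))       # frozen-backbone features, 100 recent frames
target = rng.normal(size=(100, 4))      # reference soft outputs (e.g. from classical MP)
W = rng.normal(size=(16, 4)) * 0.1      # adaptable last-layer weights

lr = 1e-4                               # small rate avoids catastrophic forgetting
pred = feat @ W
grad = feat.T @ (pred - target) / len(feat)   # MSE gradient w.r.t. W
W_new = W - lr * grad                   # single adaptation step

loss_before = np.mean((feat @ W - target) ** 2)
loss_after = np.mean((feat @ W_new - target) ** 2)
print(loss_after < loss_before)         # True: the step reduces mismatch
```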
ex-otfs-ch21-12
Hard: Analyze the adversarial robustness gap between pure NN and unfolded MP OTFS detectors under a 0.5 dB perturbation attack.
Attack surface, Lipschitz constant.
Pure NN
The Lipschitz constant is large (expressive network), so a 0.5 dB perturbation can shift the output significantly. BER under attack: ~10× worse than clean.
Unfolded MP
The structure inherited from classical MP bounds the Lipschitz constant, so the output perturbation is bounded. BER under attack: ~2-3× worse than clean.
Gap
Unfolded: 3-5× more robust than pure NN. Important for safety-critical applications.
Adversarial training
Include perturbations in training. Both NN types improve. Unfolded still wins by ~1 dB.
ex-otfs-ch21-13
Hard: A federated learning system for an OTFS NN detector has 500 UEs with non-i.i.d. data (different channel profiles). Describe challenges and mitigations.
Client drift, weighted averaging.
Challenge: client drift
Different UEs have different channel distributions. Local gradients point in different directions. Naive averaging causes NN to zig-zag.
Mitigation: weighted averaging
Weight updates by data size or loss. Prioritize high-quality clients.
Mitigation: personalization
Shared global model + per-UE personalization head. UE-specific adaptation without losing global convergence.
Mitigation: regularization
Proximal term in the local objective (FedProx): $\frac{\mu}{2}\|\theta - \theta_{\mathrm{global}}\|^2$. Keeps each UE's update close to the global model.
Result
Convergence to a model that works well across most UEs. Each UE fine-tunes locally for its specific channel.
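The two quantitative mitigations can be sketched in a few lines (client counts, sample sizes, and $\mu$ are illustrative assumptions):

```python
import numpy as np

# Weighted FedAvg aggregation: average client models weighted by local dataset
# size, a standard mitigation for non-i.i.d. clients.
rng = np.random.default_rng(4)
n_clients, dim = 5, 8
client_models = rng.normal(size=(n_clients, dim))
n_samples = np.array([100, 400, 50, 250, 200])     # per-client data sizes

weights = n_samples / n_samples.sum()
global_model = weights @ client_models             # convex combination of models

# FedProx-style proximal penalty (mu/2) * ||theta - theta_global||^2 added to
# each local objective, pulling local solutions back toward the global model.
mu = 0.1
theta_local = client_models[0]
prox = 0.5 * mu * np.sum((theta_local - global_model) ** 2)
print(weights.sum(), prox > 0)
```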
ex-otfs-ch21-14
Hard: Compare the compute complexity of classical MP, unfolded MP, CNN, and transformer-based OTFS detectors at $MN = 10^4$ delay-Doppler symbols.
Per-frame ops for each.
Classical MP
With $n_{\mathrm{iter}}$ iterations over $MN = 10^4$ symbols and $P$ paths: $O(n_{\mathrm{iter}} \cdot MN \cdot P)$ ops/frame — on the order of $10^6$ for typical iteration and path counts.
Unfolded MP
Same structure plus per-iteration learned parameters: ~1.5× the classical MP ops/frame.
CNN
3 conv layers × 32 filters × 9-tap kernels × $MN = 10^4$ positions: 3 × 32 × 9 × 10⁴ ≈ 8.6 × 10⁶ ops/frame.
Transformer
Self-attention scales as $O((MN)^2)$: at $MN = 10^4$ the attention score matrix alone has $10^8$ entries, and with feature dimensions included the cost reaches roughly 100× the CNN's ops/frame.
Comparison
MP ≈ unfolded MP < CNN << transformer. Choice: performance vs compute budget. URLLC favors unfolded; best-performance (offline) favors transformer.
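The ordering above follows from a rough ops count at $MN = 10^4$; the path count and MP iteration count below are assumed typical values, not from the exercise:

```python
# Rough ops-per-frame comparison at MN = 1e4 DD symbols.
MN = 10**4
P, n_iter = 6, 10                          # assumed paths / MP iterations

mp_ops = n_iter * MN * P                   # classical MP: O(n_iter * MN * P)
unfolded_ops = int(1.5 * mp_ops)           # ~1.5x overhead for learned parameters
cnn_ops = 3 * 32 * 9 * MN                  # 3 conv layers, 32 filters, 3x3 kernels
attn_ops = MN**2                           # self-attention score matrix alone

print(mp_ops, unfolded_ops, cnn_ops, attn_ops)
# 600000 900000 8640000 100000000
```

Even counting only the attention score matrix, the transformer sits two orders of magnitude above the CNN, matching the MP ≈ unfolded MP < CNN << transformer ordering.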
ex-otfs-ch21-15
Hard: Design an NN OTFS channel estimator that exploits DD-domain sparsity.
Sparse prior, L1 regularization.
Sparse prior
The channel has only a handful of dominant paths, so the DD channel tensor is ~99% zeros.
NN architecture
Input: received DD samples (2 channels for I/Q). CNN: 3 layers, attention over DD. Output: sparse channel tensor.
Loss
$\mathcal{L} = \|\mathbf{h} - \hat{\mathbf{h}}\|_2^2 + \lambda\|\hat{\mathbf{h}}\|_1$ (the $\ell_1$ regularization encourages sparsity).
Performance
Compared to OMP: slightly better MSE, same compute. Compared to pure LS: 3 dB better. Benefits from joint training with detector for end-to-end optimization.
Deployment
Pre-trained on simulation, fine-tuned on deployment. Handles fractional Doppler structure automatically.
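The effect of the $\ell_1$ term can be illustrated with one proximal (ISTA) step, the classical analogue of the sparsity-encouraging loss the NN is trained with (identity observation model and all sizes are illustrative):

```python
import numpy as np

# One ISTA step on an l1-regularized LS objective: the proximal operator of
# lambda * ||h||_1 is elementwise soft-thresholding.
rng = np.random.default_rng(5)
n = 64
h_true = np.zeros(n)
h_true[rng.choice(n, 4, replace=False)] = rng.normal(size=4)  # 4-path DD channel
y = h_true + 0.01 * rng.normal(size=n)     # noisy observation (identity model)

lam = 0.05
soft = lambda x, t: np.sign(x) * np.maximum(np.abs(x) - t, 0.0)
h_hat = soft(y, lam)                       # soft-thresholding kills noise-only cells

print(np.count_nonzero(h_hat), np.count_nonzero(h_true))
```

Cells carrying only noise fall below the threshold and are zeroed, recovering the sparse support that the NN's attention layers learn to exploit.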
ex-otfs-ch21-16
Hard: Assess the standardization prospects for learned pilots in 6G. What are the 3GPP considerations?
Interoperability, testing, IPR.
Interoperability
Learned pilots are UE-capability-specific. Need standardized pilot-template exchange via RRC. Rel. 21 expected.
Testing
3GPP test: demonstrate learned pilots outperform classical across diverse profiles. Requires large simulation campaigns.
IPR
Learned pilot algorithms are somewhat patent-protected (Ma-Wang-Caire + CommIT). FRAND licensing expected.
Backward compat
Legacy UEs: fall back to classical pilots. Rel. 21 UEs: optional learned.
Timeline
Rel. 20 (2026-2028): study item. Rel. 21 (2028-2030): spec. Rel. 22 (2030+): deployment.