Ferkans — Interactive Telecom Tutor

ex32-01-sim2real-sources

Easy

List 5 sources of the sim-to-real gap for a learned OFDM radar imaging system trained on point-scatterer simulation and deployed on a TI IWR6843 in an office. For each, estimate the approximate PSNR degradation (in dB).

Hint

Think about model mismatch, hardware, and environment.

Solution

Sources and estimates

(1) Model mismatch (Born approximation vs real multipath): $\sim 5$ dB. Multipath creates ghost targets not in the model. (2) Off-grid targets (continuous positions vs grid): $\sim 2$ dB. Basis mismatch in sparse recovery. (3) Phase noise (oscillator imperfections): $\sim 1$ dB. Broadens the PSF slightly. (4) Mutual coupling (unmodelled antenna interaction): $\sim 2$ dB. Distorts the array pattern. (5) Clutter (furniture, walls not modelled as targets): $\sim 3$ dB. Elevates the noise floor. Total: $\sim 13$ dB if the contributions are assumed to add in dB, consistent with the typical 10--15 dB sim-to-real gap. $\square$

ex32-02-dynamic-prior

Easy

A radar images a scene at 10 fps. Between consecutive frames, one person walks 0.15 m. Which temporal prior (smoothness, sparse innovation, or optical flow) is most appropriate? Justify.

Hint

Consider the nature of the change: smooth, sparse, or motion-based?

Solution

Analysis

A walking person shifts by 0.15 m per frame. This is a coherent, approximately rigid-body motion, best modelled by optical flow: $\boldsymbol{\gamma}_t(\mathbf{x}) \approx \boldsymbol{\gamma}_{t-1}(\mathbf{x} - \mathbf{v}\Delta t)$ with $\Delta t = 0.1$ s and $\|\mathbf{v}\| = 0.15\ \text{m} \times 10\ \text{fps} = 1.5$ m/s.

Smoothness: inappropriate because the change is spatially localised. Sparse innovation: partially appropriate (few pixels change) but ignores the motion structure. Optical flow: most appropriate because it directly models the translational displacement. $\square$
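As a small illustration of the comparison above, the sketch below predicts the next frame of a 1-D scene in two ways: by assuming nothing changed, and by shifting the previous frame along the known displacement. The 5 cm grid, the 4-cell target, and all helper names are assumptions made for this example only.

```python
FPS = 10.0       # frame rate from the exercise
STEP_M = 0.15    # displacement per frame from the exercise
CELL_M = 0.05    # assumed grid spacing (not given in the exercise)

speed = STEP_M * FPS                    # metres per second
shift_cells = round(STEP_M / CELL_M)    # displacement in grid cells per frame

def make_frame(start, width=4, n=40):
    """1-D reflectivity profile with one block-shaped target."""
    return [1.0 if start <= i < start + width else 0.0 for i in range(n)]

def shift(frame, cells):
    """Translate the frame by `cells` grid cells, padding with zeros."""
    n = len(frame)
    return [frame[i - cells] if 0 <= i - cells < n else 0.0 for i in range(n)]

def sq_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

prev_frame = make_frame(10)
next_frame = make_frame(10 + shift_cells)

# Prediction error when the previous frame is reused unchanged
err_static = sq_error(next_frame, prev_frame)
# Prediction error when the previous frame is warped by the flow
err_flow = sq_error(next_frame, shift(prev_frame, shift_cells))
# Number of cells a sparse-innovation prior would have to explain
changed = sum(1 for x, y in zip(prev_frame, next_frame) if x != y)

print(speed, shift_cells, err_static, err_flow, changed)
```

In this toy case the flow-based prediction leaves no residual, while the static prediction leaves one error per changed cell. Real frames would add noise and non-rigid limb motion, so the flow residual would be small rather than zero.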

ex32-03-primitive-count

Easy

A conference room contains 4 walls, a table (box), 8 chairs (simplified as boxes), a projector screen (plane), and a cylindrical pillar. Compute the total parameter count for a primitive representation and the compression ratio vs a $8 \times 6 \times 3$ m voxel grid at 5 cm resolution.

Hint

Each primitive: 7 parameters (3 position, 3 scale, 1 reflectivity).

Solution

Parameter count

Primitives: 4 walls + 1 table + 8 chairs + 1 screen + 1 pillar = 15 primitives. Parameters: $15 \times 7 = 105$ .

Voxel count

Voxels: $(8/0.05) \times (6/0.05) \times (3/0.05) = 160 \times 120 \times 60 = 1{,}152{,}000$ .

Compression ratio

$\rho = 1{,}152{,}000 / 105 \approx 10{,}971\times$ . The primitive representation is about four orders of magnitude more compact. $\square$
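The arithmetic above can be reproduced in a few lines. This is only a sketch: the dictionary keys and variable names are chosen for illustration, while the counts, the 7 parameters per primitive, and the room dimensions come from the exercise and its hint.

```python
# Primitive counts taken from the exercise statement
primitives = {"wall": 4, "table": 1, "chair": 8, "screen": 1, "pillar": 1}
PARAMS_PER_PRIMITIVE = 7   # 3 position + 3 scale + 1 reflectivity (from the hint)

room_m = (8.0, 6.0, 3.0)   # room dimensions in metres
voxel_m = 0.05             # voxel edge length in metres

n_primitives = sum(primitives.values())
n_params = n_primitives * PARAMS_PER_PRIMITIVE

# Number of voxels along each axis, multiplied together
n_voxels = 1
for side in room_m:
    n_voxels *= round(side / voxel_m)

ratio = n_voxels / n_params
print(n_primitives, n_params, n_voxels, round(ratio))
```

Changing `voxel_m` shows how quickly the ratio grows: halving the voxel size multiplies the voxel count by eight while the primitive parameter count stays fixed.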

ex32-04-claim-analysis

Easy

A paper abstract claims: "We propose DeepRadar, a novel framework that achieves state-of-the-art performance, outperforming all existing methods by 8 dB PSNR on the RadarScenes dataset." Identify 4 questions you would ask before accepting this claim.

Hint

Consider baselines, dataset, and statistical rigour.

Solution

Questions

(1) What are the "existing methods" compared? Are they fairly tuned, or default parameters? (2) Is the 8 dB consistent across different scenes, or driven by easy cases? Are confidence intervals reported? (3) What is the train/test split? Is there scene overlap? (4) Is RadarScenes a standard benchmark, or curated by the authors? Is it publicly available? $\square$

ex32-05-domain-adaptation

Medium

You have a learned SAR imaging network trained on 50,000 simulated scenes and 10 real measured scenes. Design three domain adaptation strategies and predict which achieves the best real-data performance.

Hint

Consider fine-tuning, adversarial adaptation, and self-supervised.

Solution

Strategy 1: Fine-tuning

Pre-train on 50k simulations. Fine-tune on 10 real scenes with reduced learning rate ( $0.1\times$ ) for 1,000 iterations. Risk: overfitting. Mitigate: early stopping, weight decay. Expected: 5--7 dB recovery.

Strategy 2: Adversarial

Domain discriminator distinguishing sim from real features. Train imaging network to fool it. Expected: 3--5 dB recovery (unstable with few real examples).

Strategy 3: Self-supervised

Measurement consistency: hold out 20% of measurements per scene, reconstruct from 80%, predict held-out. No labels needed. Expected: 7--10 dB recovery (best).

Prediction

Strategy 3 (self-supervised) likely best because it avoids label bottleneck and adversarial instability. Combining pre-train $\to$ self-supervised fine-tune may be optimal. $\square$

ex32-06-4d-nerf

Medium

Extend the RF-NeRF framework to 4D (space + time). Describe: (1) the modified MLP input/output; (2) how to handle time; (3) training data requirements; (4) main challenges vs 3D RF-NeRF.

Hint

The simplest approach: condition the MLP on a time code.

Solution

MLP modification

Input: $(\gamma(\mathbf{x}), \gamma(t)) \to (\sigma, \rho)$ where $\gamma(\cdot)$ is positional encoding, with $L_t = 6$ frequency bands for time. Alternatively, use a per-frame latent code $\mathbf{z}_t$ (auto-decoder).
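A minimal sketch of the encoded input, assuming the common sine/cosine positional encoding with frequencies $2^l \pi$. The number of spatial bands `L_X = 10` is an assumption (the solution only fixes $L_t = 6$), and the function names are hypothetical.

```python
import math

def positional_encoding(value, num_bands):
    """Return [sin(2^l * pi * v), cos(2^l * pi * v)] for l = 0 .. num_bands - 1."""
    feats = []
    for l in range(num_bands):
        freq = (2.0 ** l) * math.pi
        feats.append(math.sin(freq * value))
        feats.append(math.cos(freq * value))
    return feats

L_X = 10   # assumed number of spatial frequency bands (not given in the exercise)
L_T = 6    # temporal frequency bands, as in the solution

def encode_query(xyz, t):
    """Concatenate the encodings of the three coordinates and of time."""
    feats = []
    for coord in xyz:
        feats.extend(positional_encoding(coord, L_X))
    feats.extend(positional_encoding(t, L_T))
    return feats

features = encode_query((0.1, -0.3, 0.5), t=0.25)
print(len(features))   # 3 * 2 * L_X + 2 * L_T input features for the MLP
```

Under these assumptions the MLP input grows from 60 to 72 features, so the architectural change for 4D is small; the cost lies in the data and training requirements.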

Time handling

Option A: single MLP conditioned on $t$ (smooth changes). Option B: deformation field $\mathbf{d}(\mathbf{x}, t)$ warping a canonical frame (better for rigid motion).

Data requirements

3D RF-NeRF: $V$ viewpoints at one time. 4D: $V$ viewpoints at $T$ time steps $= VT$ measurement sets. Typical: $T = 50$ , $V = 10$ $\to$ 500 CSI snapshots.

Challenges

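(1) $T = 50$ times more measurement data. (2) Temporal aliasing if objects move too far between frames for the frame rate to capture. (3) Roughly $50\times$ the training cost. (4) Ambiguity between a new object appearing and an existing object having moved. $\square$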

ex32-07-cross-modal

Medium

A cross-modal foundation model is pre-trained on 1 million paired (optical, RF channel) samples from simulation. Describe how to use it for RF imaging in a new building with no optical images.

Hint

The embedding captures scene structure independent of modality.

Solution

Approach

(1) Collect RF measurements in the new building. (2) Encode into shared embedding: $\mathbf{z} = f_{\mathrm{RF}}(\mathbf{y})$ . (3) Condition decoder on embedding: $\hat{\boldsymbol{\gamma}} = g(\mathbf{z}, \mathbf{y})$ . The embedding encodes scene-level features learned cross-modally.

Why it works

The embedding captures scene types (open plan, partitioned, few/many scatterers) common across modalities. Even without optical images, the RF embedding provides a useful prior.

Limitation

Novel scene types absent from pre-training may produce misleading embeddings. The model should report uncertainty (distance from nearest training embedding). $\square$

ex32-08-pareto

Medium

An ISAC system allocates power $P$ between communication ( $\alpha P$ ) and imaging ( $(1-\alpha)P$ ). Rate $R = \log_2(1 + \alpha P / \sigma^2)$ . Imaging PSNR $Q = 10\log_{10}((1-\alpha)P / \sigma^2)$ dB. Plot the Pareto frontier for $P/\sigma^2 = 30$ dB by varying $\alpha \in [0, 1]$ .

Hint

Evaluate $R$ and $Q$ at several $\alpha$ values.

Solution

Evaluation

$P/\sigma^2 = 1000$ . Key points: $\alpha = 0$ : $R = 0$ , $Q = 30$ dB. $\alpha = 0.1$ : $R = 6.66$ , $Q = 29.5$ dB. $\alpha = 0.5$ : $R = 8.97$ , $Q = 27.0$ dB. $\alpha = 0.9$ : $R = 9.82$ , $Q = 20.0$ dB. $\alpha = 1$ : $R = 9.97$ , $Q = -\infty$ .

Frontier shape

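The frontier is concave: $Q$ falls ever faster as $R$ grows, so the achievable region beneath it is convex. The "knee" lies at $\alpha \approx 0.1$ -- $0.3$ : allocating 10--30% of the power to communication costs only about 0.5--1.5 dB of imaging quality while providing roughly 6.7--8.2 bits/s/Hz. $\square$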
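The frontier points can be tabulated directly from the two formulas in the exercise. The sketch below uses hypothetical function names and adds $\alpha = 0.3$ to the sampled values to show the end of the knee region.

```python
import math

SNR_TOTAL = 10 ** (30 / 10)   # P / sigma^2 = 30 dB as a linear ratio

def rate(alpha):
    """Communication rate in bits/s/Hz for power fraction alpha."""
    return math.log2(1 + alpha * SNR_TOTAL)

def imaging_psnr_db(alpha):
    """Imaging PSNR in dB for the remaining power fraction 1 - alpha."""
    if alpha >= 1.0:
        return float("-inf")   # no power left for imaging
    return 10 * math.log10((1 - alpha) * SNR_TOTAL)

alphas = [0.0, 0.1, 0.3, 0.5, 0.9, 1.0]
frontier = [(a, rate(a), imaging_psnr_db(a)) for a in alphas]

for a, r, q in frontier:
    print(f"alpha={a:.1f}  R={r:.2f} bits/s/Hz  Q={q:.2f} dB")
```

Plotting `Q` against `R` for a dense sweep of `alpha` gives the frontier itself; the coarse table is enough to see that most of the rate is gained before much imaging quality is lost.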

ex32-09-scalability

Medium

Estimate memory requirements for a 3DGS digital twin of a $500 \times 500$ m campus. Propose a hierarchical scheme fitting within 32 GB GPU memory.

Hint

Divide into blocks; only render near the query point.

Solution

Naive estimate

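Campus footprint: $500 \times 500 = 250{,}000$ m $^2$ ; presumably counting building surfaces as well, take the total surface area as $\sim 500{,}000$ m $^2$ . At 1 Gaussian per 10 m $^2$ of surface: $50{,}000$ Gaussians. Memory: $50{,}000 \times 60$ bytes $= 3$ MB. Storage is therefore not the bottleneck; the cost lies in rendering against every Gaussian for many queries.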

Hierarchical scheme

Divide into $50 \times 50 = 2{,}500$ blocks of $10 \times 10$ m. Each building block: $\sim 2{,}000$ Gaussians. Active zone (100 m radius): $\sim 300$ blocks $\times 2{,}000 = 600{,}000$ Gaussians $= 36$ MB. Distant zone: path-loss model (no Gaussians). Stream blocks in/out as UEs move. Fits within 32 GB. $\square$
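The block budget above can be checked with a short calculation. This sketch takes the 60 bytes per Gaussian, the block size, and the density from the solution; the variable names are illustrative, and the active block count uses a simple area ratio rather than an exact tiling of the circle.

```python
import math

BYTES_PER_GAUSSIAN = 60        # per-Gaussian storage used in the solution
BLOCK_SIDE_M = 10.0
CAMPUS_SIDE_M = 500.0
GAUSSIANS_PER_BLOCK = 2_000
ACTIVE_RADIUS_M = 100.0
GPU_BUDGET_BYTES = 32 * 10**9

blocks_per_side = int(CAMPUS_SIDE_M / BLOCK_SIDE_M)
total_blocks = blocks_per_side ** 2

# Area of the active circle divided by the area of one block
active_blocks = math.pi * ACTIVE_RADIUS_M ** 2 / BLOCK_SIDE_M ** 2

active_bytes = active_blocks * GAUSSIANS_PER_BLOCK * BYTES_PER_GAUSSIAN
all_blocks_bytes = total_blocks * GAUSSIANS_PER_BLOCK * BYTES_PER_GAUSSIAN

print(f"active: {active_blocks:.0f} blocks, {active_bytes / 1e6:.1f} MB")
print(f"all blocks resident: {all_blocks_bytes / 1e6:.0f} MB")
```

Under these numbers even keeping every block resident stays far below the 32 GB budget, which suggests the block scheme mainly limits the per-query rendering work rather than the memory footprint.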

ex32-10-assumption-audit

Medium

Read the following signal model and list all assumptions (explicit and implicit): "We consider a MIMO radar with $N_t$ transmit and $N_r$ receive antennas observing $K$ point targets in the far field. The received signal is $\mathbf{y} = \mathbf{A}\boldsymbol{\gamma} + \mathbf{w}$ , where $\mathbf{A}$ is the known sensing matrix, $\boldsymbol{\gamma} \in \mathbb{R}^N$ is the sparse scene, and $\mathbf{w} \sim \mathcal{CN}(0, \sigma^2\mathbf{I})$ ."

Hint

Count both stated and implied assumptions.

Solution

Explicit assumptions

(1) Point targets. (2) Far field. (3) Known $\mathbf{A}$ . (4) Sparse scene ( $K$ targets). (5) AWGN noise. (6) Known $\sigma^2$ .

Implicit assumptions

(7) $\boldsymbol{\gamma} \in \mathbb{R}^N$ : real-valued reflectivity (no phase). (8) On-grid targets. (9) Born approximation (linear model). (10) Stationary scene. (11) Narrowband (single $\mathbf{A}$ ). (12) Perfect calibration. $\square$

ex32-11-baseline-fairness

Medium

A paper compares its deep unrolling method to: (1) matched filter, (2) LASSO with $\lambda = 0.1$ , (3) ISTA with 50 iterations. Critique each baseline's fairness. Propose improvements.

Hint

Is each baseline given its best chance?

Solution

Matched filter

Fair as a lower bound but too weak for meaningful comparison. Keep but add stronger methods.

LASSO with $\lambda = 0.1$

Unfair: $\lambda = 0.1$ is arbitrary. Performance is highly sensitive to $\lambda$ . Fix: 5-fold cross-validation.

ISTA with 50 iterations

Potentially unfair: 50 may be insufficient. Fix: run to convergence ( $\|\boldsymbol{\gamma}^{(k+1)} - \boldsymbol{\gamma}^{(k)}\| < 10^{-6}$ ) or max 500 iterations.
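The stopping rule above can be sketched as follows: a minimal pure-Python ISTA for a small real-valued problem, not the paper's implementation. The helper names, the $2 \times 2$ test matrix, and the step size $1/\|\mathbf{A}\|_F^2$ (a conservative bound on the inverse Lipschitz constant) are assumptions for this example.

```python
import math

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def rmatvec(A, r):
    """Compute A^T r."""
    return [sum(A[m][j] * r[m] for m in range(len(A))) for j in range(len(A[0]))]

def soft_threshold(v, t):
    return [math.copysign(max(abs(x) - t, 0.0), x) for x in v]

def ista(A, y, lam, tol=1e-6, max_iter=500):
    """Minimise 0.5*||y - A g||^2 + lam*||g||_1.

    Stops when the update norm drops below tol or max_iter is reached.
    Returns (estimate, iterations used, converged flag).
    """
    step = 1.0 / sum(a * a for row in A for a in row)   # 1 / ||A||_F^2
    g = [0.0] * len(A[0])
    for k in range(1, max_iter + 1):
        residual = [yi - pi for yi, pi in zip(y, matvec(A, g))]
        moved = [gi + step * ci for gi, ci in zip(g, rmatvec(A, residual))]
        g_new = soft_threshold(moved, step * lam)
        change = math.sqrt(sum((a - b) ** 2 for a, b in zip(g_new, g)))
        g = g_new
        if change < tol:
            return g, k, True
    return g, max_iter, False

# Hypothetical, mildly ill-conditioned test problem
A = [[1.0, 0.0],
     [0.0, 0.3]]
y = matvec(A, [1.0, 2.0])
lam = 0.01

g_short, it_short, ok_short = ista(A, y, lam, max_iter=50)
g_full, it_full, ok_full = ista(A, y, lam, max_iter=500)
print(ok_short, it_short, ok_full, it_full, g_full)
```

On this toy problem the 50-iteration run stops before the tolerance is met while the 500-iteration run converges, which is the failure mode the critique is concerned with. Reporting the iteration count and the converged flag alongside the results makes the baseline's budget visible to readers.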

Missing baselines

Add: (4) ADMM (different optimiser), (5) LISTA (learned unrolling), (6) U-Net (learned non-physics). This spans classical to learned. $\square$

ex32-12-resolution-chart

Hard

A learned SAR imaging method claims $2\times$ super-resolution. Analyse: (1) is this physically possible? (2) When does super-resolution become hallucination? (3) How would you test whether it is genuine?

Hint

Super-resolution from priors depends on SNR and scene statistics.

Solution

Physical possibility

Yes, for sparse scenes. The Rayleigh limit assumes no prior; with $K$ point targets, the CRB on separation can be much smaller at sufficient SNR.

Hallucination boundary

Occurs when the prior dominates: (1) low SNR ( $< 10$ dB); (2) scene types not in training; (3) two targets closer than the limit may merge or split incorrectly.

Testing

(1) Resolution chart: vary the target separation and plot the probability $P_d$ of resolving both targets. (2) Noise sensitivity: genuine super-resolution degrades as SNR falls; hallucinated detail does not. (3) OOD test: evaluate on extended targets (not in training). (4) Uncertainty map: high uncertainty near the resolution limit $\to$ prior-dominated. $\square$

ex32-13-fisher-information

Hard

For a linear array of $N_r = 16$ elements at half-wavelength spacing imaging a 2D scene at 28 GHz with 200 MHz bandwidth, compute the Fisher information matrix and determine the range and cross-range CRB for a target at broadside, range 10 m.

Hint

The Fisher information is $(1/\sigma^2)\mathbf{A}^H\mathbf{A}$ .

Use the resolution formulae for range and cross-range.

Solution

Range CRB

Range resolution: $\Delta r = c/(2W) = 3 \times 10^8 / (2 \times 200 \times 10^6) = 0.75$ m. The CRB for range estimation of a single target: $\sigma_r^2 \geq \Delta r^2 / (8\pi^2 \text{SNR})$ . The exercise does not specify the SNR; assuming SNR $= 20$ dB (a factor of 100): $\sigma_r \geq 0.75 / (8\pi^2 \times 100)^{1/2} \approx 0.0084$ m $= 8.4$ mm.

Cross-range CRB

Array aperture: with $\lambda = c/f = 0.0107$ m, $D = (N_r - 1) \times \lambda/2 = 15 \times 0.0107/2 = 0.080$ m. Cross-range resolution: $\Delta x = \lambda R / D = 2R/(N_r - 1) = 1.33$ m. CRB: $\sigma_x \geq 1.33 / (8\pi^2 \times 100)^{1/2} \approx 0.015$ m $= 1.5$ cm.

Interpretation

Range resolution is limited by bandwidth (0.75 m); cross-range by aperture (1.33 m). The CRBs show that at 20 dB SNR a single target can be localised to centimetre accuracy, much finer than the resolution cell. For multiple targets within one resolution cell the CRB degrades, and separating them requires super-resolution. $\square$
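The numbers in this solution can be reproduced with the sketch below. It uses the same single-target bound $\Delta / \sqrt{8\pi^2\,\text{SNR}}$ and the 20 dB SNR assumed in the solution; the variable and function names are illustrative. Small differences from hand-rounded values come from rounding $\lambda$ and $D$.

```python
import math

C = 3e8                  # speed of light in m/s, rounded as in the solution
FC = 28e9                # carrier frequency in Hz
BW = 200e6               # bandwidth in Hz
N_R = 16                 # number of array elements
RANGE_M = 10.0           # target range in metres
SNR = 10 ** (20 / 10)    # 20 dB, the value assumed in the solution

wavelength = C / FC
aperture = (N_R - 1) * wavelength / 2
range_res = C / (2 * BW)
cross_res = wavelength * RANGE_M / aperture

def crb_std(resolution, snr):
    """Single-target bound used in the solution: resolution / sqrt(8 pi^2 SNR)."""
    return resolution / math.sqrt(8 * math.pi ** 2 * snr)

sigma_r = crb_std(range_res, SNR)
sigma_x = crb_std(cross_res, SNR)

print(f"lambda = {wavelength * 1e3:.2f} mm, D = {aperture * 1e3:.1f} mm")
print(f"range: resolution {range_res:.2f} m, CRB {sigma_r * 1e3:.1f} mm")
print(f"cross-range: resolution {cross_res:.2f} m, CRB {sigma_x * 1e2:.2f} cm")
```

Because both bounds scale as $1/\sqrt{\text{SNR}}$, every 6 dB of extra SNR roughly halves the localisation error, while the resolution figures stay fixed by bandwidth and aperture.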

ex32-14-primitive-optimisation

Hard

Formulate the gradient of the data-fidelity loss with respect to the position $\mathbf{p}_k$ of a box primitive. Assume the box has half-extents $\mathbf{s}_k$ and the forward model uses the Born approximation with far-field steering vectors.

Hint

The measurement response of a shifted box involves a phase term $e^{-j\mathbf{k}\cdot\mathbf{p}_k}$ .

Differentiate the complex exponential.

Solution

Forward model

The measurement response of box $k$ at measurement point $m$ (wavenumber $\mathbf{k}_m$ ): $[\mathbf{A}[\mathcal{P}_k]]_m = \alpha_k \, e^{-j\mathbf{k}_m \cdot \mathbf{p}_k} \prod_{d=1}^{3} \text{sinc}(k_{m,d} s_{k,d})$ where $\text{sinc}(x) = \sin(x)/x$ arises from the Fourier transform of the rectangular profile along each axis (the constant volume factor is absorbed into $\alpha_k$ ).

Loss gradient

Loss: $\mathcal{L} = \|\mathbf{y} - \sum_k \mathbf{a}_k\|^2$ where $\mathbf{a}_k = \mathbf{A}[\mathcal{P}_k]$ . Residual: $\mathbf{r} = \mathbf{y} - \sum_k \mathbf{a}_k$ . By the chain rule, $\frac{\partial \mathcal{L}}{\partial \mathbf{p}_k} = -2\text{Re}\left[\sum_m r_m^* \frac{\partial a_{k,m}}{\partial \mathbf{p}_k}\right]$ . Only the phase term depends on $\mathbf{p}_k$ , so $\frac{\partial a_{k,m}}{\partial \mathbf{p}_k} = -j\mathbf{k}_m \, a_{k,m}$ . Substituting and using $\text{Re}[jz] = -\text{Im}[z]$ : $\frac{\partial \mathcal{L}}{\partial \mathbf{p}_k} = 2\text{Re}\left[j\sum_m r_m^* \mathbf{k}_m a_{k,m}\right] = -2\text{Im}\left[\sum_m r_m^* \mathbf{k}_m a_{k,m}\right]$ . $\square$
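A finite-difference check is a quick way to validate the sign and scaling of such a gradient. The sketch below evaluates the analytic form $-2\,\text{Im}[\sum_m r_m^* \mathbf{k}_m a_{k,m}]$, which follows from $\partial a_{k,m}/\partial \mathbf{p}_k = -j\mathbf{k}_m a_{k,m}$ and $\text{Re}[jz] = -\text{Im}[z]$, for a single box and compares it with central differences. The wavevectors, box size, positions, and function names are all hypothetical test values.

```python
import cmath
import math

def sinc(x):
    """Unnormalised sinc, sin(x)/x, with the limit 1 at x = 0."""
    return 1.0 if abs(x) < 1e-12 else math.sin(x) / x

def box_response(k, p, s, alpha=1.0):
    """alpha * exp(-j k.p) * prod_d sinc(k_d * s_d) for one wavevector k."""
    phase = cmath.exp(-1j * sum(kd * pd for kd, pd in zip(k, p)))
    envelope = math.prod(sinc(kd * sd) for kd, sd in zip(k, s))
    return alpha * phase * envelope

def loss(y, ks, p, s):
    return sum(abs(ym - box_response(k, p, s)) ** 2 for ym, k in zip(y, ks))

def grad_position(y, ks, p, s):
    """Analytic gradient: -2 * Im[ sum_m conj(r_m) * k_m * a_m ]."""
    grad = [0.0, 0.0, 0.0]
    for ym, k in zip(y, ks):
        a = box_response(k, p, s)
        r = ym - a
        w = (r.conjugate() * a).imag
        for d in range(3):
            grad[d] += -2.0 * k[d] * w
    return grad

# Hypothetical measurement geometry and box
ks = [(1.0, 0.5, -0.3), (-0.7, 1.2, 0.4), (0.2, -0.9, 1.1), (1.5, 0.1, 0.6)]
s = (0.4, 0.3, 0.2)
p_true = (0.5, -0.2, 0.3)    # position that generated the data
p = (0.3, 0.1, 0.0)          # current estimate
y = [box_response(k, p_true, s) for k in ks]

analytic = grad_position(y, ks, p, s)

# Central finite differences along each axis
eps = 1e-6
numeric = []
for d in range(3):
    p_plus, p_minus = list(p), list(p)
    p_plus[d] += eps
    p_minus[d] -= eps
    numeric.append((loss(y, ks, p_plus, s) - loss(y, ks, p_minus, s)) / (2 * eps))

print(analytic)
print(numeric)
```

If the two printed vectors agree to several digits, the analytic expression can be used with confidence inside a gradient-based optimiser; a sign error would show up immediately as vectors of equal magnitude and opposite sign.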

ex32-15-imaging-capacity

Hard

Derive the imaging capacity for a MIMO radar with $M = N_t N_r$ measurements imaging an $N$ -voxel scene. Show that when $\mathbf{A}$ has rank $r \leq M$ , the imaging capacity is $C = \sum_{k=1}^{r} \log_2(1 + \text{SNR} \cdot \sigma_k^2 / N)$ where $\sigma_k$ are the singular values.

Hint

Use the mutual information formula for Gaussian channels.

Solution

Mutual information

Model: $\mathbf{y} = \mathbf{A}\boldsymbol{\gamma} + \mathbf{w}$ with $\boldsymbol{\gamma} \sim \mathcal{CN}(\mathbf{0}, P/N \cdot \mathbf{I}_N)$ and $\mathbf{w} \sim \mathcal{CN}(\mathbf{0}, \sigma^2 \mathbf{I}_M)$ . $I(\boldsymbol{\gamma}; \mathbf{y}) = h(\mathbf{y}) - h(\mathbf{y}|\boldsymbol{\gamma}) = \log\det(\mathbf{R}_y) - \log\det(\sigma^2 \mathbf{I})$ .

Covariance

$\mathbf{R}_y = P/N \cdot \mathbf{A}\mathbf{A}^H + \sigma^2 \mathbf{I}$ . Using SVD $\mathbf{A} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^H$ : $\mathbf{R}_y = \mathbf{U}(P/N \cdot \boldsymbol{\Sigma}\boldsymbol{\Sigma}^H + \sigma^2 \mathbf{I})\mathbf{U}^H$ .

Capacity

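$C = \log_2\det(\mathbf{I} + \text{SNR}/N \cdot \boldsymbol{\Sigma}\boldsymbol{\Sigma}^H) = \sum_{k=1}^{r} \log_2(1 + \text{SNR} \cdot \sigma_k^2 / N)$ where $\text{SNR} = P/\sigma^2$ and the logarithm is taken to base 2 so that $C$ is in bits. Only the $r$ non-zero singular values contribute. This has the same form as the capacity of a MIMO channel with equal power allocation, with the sensing matrix $\mathbf{A}$ playing the role of the channel matrix. $\square$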
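The equality between the determinant form and the sum over singular values can be checked numerically on a matrix small enough to handle by hand. The sketch below uses a hypothetical $2 \times 2$ real sensing matrix, for which $\det(\mathbf{I} + c\,\mathbf{G}) = 1 + c\,\text{tr}(\mathbf{G}) + c^2 \det(\mathbf{G})$ with $\mathbf{G} = \mathbf{A}\mathbf{A}^T$; the function name and test values are assumptions.

```python
import math

def imaging_capacity(singular_values, snr, n_voxels):
    """C = sum_k log2(1 + snr * sigma_k^2 / N) over the non-zero singular values."""
    return sum(math.log2(1 + snr * sv ** 2 / n_voxels)
               for sv in singular_values if sv > 0)

# Hypothetical 2x2 real sensing matrix
A = [[2.0, 1.0],
     [0.0, 1.0]]
N = 2
SNR = 100.0

# Gram matrix G = A A^T; its eigenvalues are the squared singular values of A
g11 = A[0][0] ** 2 + A[0][1] ** 2
g22 = A[1][0] ** 2 + A[1][1] ** 2
g12 = A[0][0] * A[1][0] + A[0][1] * A[1][1]
trace = g11 + g22
det = g11 * g22 - g12 ** 2
disc = math.sqrt(trace ** 2 - 4 * det)
singular_values = [math.sqrt((trace + disc) / 2), math.sqrt((trace - disc) / 2)]

# Sum over singular values
c_sum = imaging_capacity(singular_values, SNR, N)

# Direct evaluation of log2 det(I + (SNR/N) G) for a 2x2 matrix
c = SNR / N
c_det = math.log2(1 + c * trace + c ** 2 * det)

print(c_sum, c_det)
```

For larger matrices the singular values would come from a numerical SVD, but the per-mode sum is evaluated in exactly the same way.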

ex32-16-uncertainty

Hard

Design an uncertainty quantification method for a learned RF imaging system deployed in an autonomous vehicle. Provide: (1) per-pixel confidence; (2) overall reliability score; (3) failure detection mechanism.

Hint

Consider MC dropout and conformal prediction.

Solution

Per-pixel confidence

MC Dropout: run the network $M = 10$ times with dropout active at inference. The per-pixel variance across the passes, $\sigma^2(\mathbf{x}) = \text{Var}_m[\hat{\gamma}_m(\mathbf{x})]$ with $m = 1, \dots, M$ , estimates epistemic uncertainty. Overhead: $10\times$ inference ( $\sim 100$ ms).

Reliability score

$R = |\{\mathbf{x} : \sigma^2(\mathbf{x}) < \tau\}| / N_{\text{pixels}}$ . If $R < 0.8$ , the reconstruction is unreliable. Threshold $\tau$ calibrated on validation set.

Failure detection

Conformal prediction: conformity score $s = \|\mathbf{y} - \mathbf{A}\hat{\boldsymbol{\gamma}}\|^2 / \sigma_n^2$ . If $s > q_{1-\alpha}$ (the quantile calibrated on held-out data), flag as failure. Distribution-free guarantee: provided test and calibration data are exchangeable, the rate of false failure flags is $\leq \alpha$ . Total overhead: $\sim 100$ ms, which just fits the 100 ms frame period of a 10 Hz radar. $\square$
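The three quantities above can be sketched without any network: the code below takes the stochastic reconstructions and the calibration scores as given lists. All numbers are made up for illustration, the function names are hypothetical, and the quantile rule is the standard split-conformal choice, assumed here since the solution does not spell it out.

```python
import math

def per_pixel_variance(samples):
    """Sample variance across stochastic forward passes, one value per pixel."""
    m, n = len(samples), len(samples[0])
    means = [sum(s[i] for s in samples) / m for i in range(n)]
    return [sum((s[i] - means[i]) ** 2 for s in samples) / (m - 1) for i in range(n)]

def reliability_score(variances, tau):
    """Fraction of pixels whose variance is below the threshold tau."""
    return sum(1 for v in variances if v < tau) / len(variances)

def conformal_threshold(calibration_scores, alpha):
    """Split-conformal quantile: the ceil((n + 1)(1 - alpha))-th smallest score."""
    n = len(calibration_scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return float("inf")   # too few calibration points for this alpha
    return sorted(calibration_scores)[rank - 1]

# Made-up example: 3 dropout passes over a 4-pixel image
samples = [[1.0, 0.0, 2.0, 0.5],
           [1.0, 0.2, 2.0, 1.5],
           [1.0, 0.1, 2.0, 1.0]]
variances = per_pixel_variance(samples)
score = reliability_score(variances, tau=0.05)
unreliable = score < 0.8

# Made-up calibration scores 1.0 .. 19.0 and a target false-flag rate of 10%
calibration = [float(i) for i in range(1, 20)]
q = conformal_threshold(calibration, alpha=0.1)
flag_failure = 21.5 > q    # a new frame with conformity score 21.5

print(variances, score, unreliable, q, flag_failure)
```

In a deployed system the variance threshold `tau` and the calibration scores would come from a validation set recorded with the same sensor, since the conformal guarantee only holds when calibration and test data are exchangeable.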

ex32-17-hidden-assumptions

Hard

A paper trains a neural network for mmWave radar imaging on 50,000 simulated indoor scenes (point model) and reports 30 dB PSNR on simulated test data. It then shows one "qualitative" real measurement result. Identify all methodological concerns and propose fixes.

Hint

Consider inverse crime, sim-to-real gap, and statistical validity.

Solution

Concerns

(1) Inverse crime: same forward model for training and testing; 30 dB PSNR is inflated. (2) Sim-to-real gap: no quantitative real-data metrics. (3) Single real measurement: proves nothing; could be cherry-picked. (4) No ground truth for real data. (5) Point model: real scenes have extended targets.

Fixes

(1) Different forward models for train/test (ray tracing for test). (2) Collect $\geq 10$ real scenes with ground truth (laser scan). (3) Report quantitative real-data metrics. (4) Fine-tune on real examples. (5) Use extended target models in training. (6) Add ray-tracing data alongside point models. $\square$

ex32-18-ablation-design

Challenge

A paper proposes "RF-ResNet" for through-wall radar imaging, combining: (A) physics-informed residual network, (B) wall-clutter removal, (C) complex-valued convolutions, (D) frequency-aware positional encoding. Design a minimal ablation study with 6 variants.

Hint

Include single-component removals and key replacements.

Solution

Ablation table

#	Variant	A	B	C	D
1	Full RF-ResNet	Y	Y	Y	Y
2	w/o wall removal (B)	Y	N	Y	Y
3	w/o complex conv (C $\to$ real)	Y	Y	N	Y
4	w/o freq encoding (D)	Y	Y	Y	N
5	w/o physics (A $\to$ plain ResNet)	N	Y	Y	Y
6	Minimal (plain ResNet, no B/D)	N	N	Y	N

Expected insights

Comparing 1 vs 2: if wall removal contributes $> 3$ dB, pre-processing dominates. 1 vs 3: complex convolutions capture phase (a gain is expected for coherent imaging). 1 vs 4: contribution of the frequency-aware positional encoding. 1 vs 5: value of the physics-informed architecture. Variant 6: lower bound. $\square$

ex32-19-literature-survey

Challenge

You are writing the related work section for a paper on learned indoor imaging from Wi-Fi signals. Identify the 4 research communities whose work you should cite, list 2 key papers from each, and explain how each community's perspective differs.

Hint

RF imaging draws from signal processing, ML, wireless, and computational imaging.

Solution

Communities

(1) Signal processing: resolution limits, sparse recovery. Papers: Candes et al. (2014, super-resolution); Potter et al. (2010, CS-SAR). Perspective: guarantees, worst-case. (2) Machine learning: architectures, generalisation. Papers: Monga et al. (2021, unrolling); Ongie et al. (2020, deep inverse). Perspective: data-driven, empirical. (3) Wireless communications: channel models, ISAC. Papers: Liu et al. (2022, ISAC survey); Caire (2026, illumination model). Perspective: system-level, standards. (4) Computational imaging: physics-based reconstruction. Papers: Mildenhall et al. (2021, NeRF); Tulsiani et al. (2017, primitives). Perspective: novel view synthesis.

Why all matter

Missing any community risks reinventing techniques, wrong baselines, or ignoring constraints. Reviewers at TSP/JSAC span all four communities. $\square$

ex32-20-research-proposal

Challenge

Write a 1-page research proposal for a 3-year PhD project on one open problem from this chapter. Include: (1) problem statement; (2) three research questions; (3) proposed approach; (4) expected contributions; (5) timeline with milestones.

Hint

Choose a focused problem with clear deliverables.

Solution

Example: Primitive-Based RF Scene Reconstruction

Problem: Current RF imaging methods use millions of parameters (voxels, neural fields) to represent scenes with inherently low-dimensional geometric structure, leading to high computational cost and poor physical interpretability.

Research questions: (Q1) Can indoor RF scenes be decomposed into $\leq 30$ geometric primitives with reconstruction quality matching voxel-based methods? (Q2) What is the optimal primitive dictionary (box, cylinder, superquadric) for indoor RF imaging at sub-6 GHz and mmWave? (Q3) Can BIM-initialised primitive representations improve reconstruction accuracy and reduce the data requirement?

Approach: Year 1: Differentiable primitive rendering for RF. Greedy decomposition algorithm. Validation on 2D simulated scenes. Year 2: Extension to 3D. Neural primitive predictor from matched-filter images. Real-data validation on 5 rooms. Year 3: BIM integration. Multi-frequency primitive sharing. Campus-scale demonstration.

Expected contributions: (1) First primitive-based reconstruction for RF imaging with $>10{,}000\times$ compression. (2) Differentiable RF primitive rendering library (open source). (3) BIM-integrated digital twin pipeline. (4) Dataset: 20 rooms with ground truth + BIM.

Timeline: M1-M6: literature, 2D implementation. M7-M12: first paper (2D results). M13-M18: 3D extension. M19-M24: real data, second paper. M25-M30: BIM integration. M31-M36: thesis, third paper. $\square$
