Over-the-Air Measurements and Dataset Curation

Why Real Channels Embarrass Good Models

Chapter 2 introduced the statistical channel models that every subsequent chapter relies on: Rayleigh fading, one-ring covariance, clustered delay-line, 3GPP TR 38.901. These models are elegant because they are closed: a small set of parameters (angular spread, Ricean K-factor, path-loss exponent) reproduces the qualitative behavior of real channels. A channel sounder shows us the gap between the closed world and the real world, and that gap is where the fun happens. Real channels have moments of blockage, non-Gaussian tails, unexpected specular components, and occasional pathologies that no model predicts. OTA measurement campaigns are how we discover which of those pathologies matter.

This section covers the architecture of a channel-sounding testbed, the logistics of running a measurement campaign, and the question that has taken over 3GPP since Release 18: how do we curate the resulting dataset so that AI/ML for the physical layer works in the field?

Definition:

Channel Sounder Architecture

A channel sounder is a measurement system whose purpose is to record the true propagation channel \mathbf{H}(t, f, \mathbf{r}_{\rm tx}, \mathbf{r}_{\rm rx}) over time, frequency, and transmit/receive locations. The three standard architectures:

  1. Correlator sounder. A known PN sequence is transmitted continuously; the receiver correlates and recovers the delay profile. Simple and robust but gives only the delay response.

  2. Chirp sounder. A linear FM sweep with bandwidth W is transmitted; the receiver dechirps and FFTs to recover the frequency response across the full bandwidth. Standard for sub-6 GHz.

  3. MIMO sounder. A full N_t \times N_r sweep using orthogonal codes (Hadamard or Golay) on the transmit antennas and coherent reception on all receive antennas. Gives the full spatial-temporal-spectral channel tensor.
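
To make architecture 1 concrete, here is a minimal NumPy sketch of a correlator sounder: a known ±1 PN sequence passes through a toy two-path channel, and circular cross-correlation at the receiver recovers the delay profile. The sequence length, tap delays and gains, and noise level are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Known +/-1 PN sequence of length L, transmitted repeatedly.
L = 1023
pn = rng.choice([-1.0, 1.0], size=L)

# Toy two-path channel: a strong tap at delay 5 samples and a weaker
# complex tap at delay 42 samples (delays and gains are illustrative).
h = np.zeros(L, dtype=complex)
h[5] = 1.0
h[42] = 0.4 * np.exp(1j * 0.7)

# Received signal = circular convolution with the channel, plus noise.
rx = np.fft.ifft(np.fft.fft(pn) * np.fft.fft(h))
rx += 0.05 * (rng.standard_normal(L) + 1j * rng.standard_normal(L))

# Correlator: circular cross-correlation with the known PN sequence.
# The PN autocorrelation is nearly a delta, so the result approximates
# the channel's delay profile, with peaks at the path delays.
pdp = np.fft.ifft(np.fft.fft(rx) * np.conj(np.fft.fft(pn))) / L

peaks = sorted(int(i) for i in np.argsort(np.abs(pdp))[-2:])
print(peaks)  # the two strongest taps sit at the path delays: [5, 42]
```

The recovered tap magnitudes (about 1.0 and 0.4) match the channel gains, which is exactly why the correlator architecture is "simple and robust but gives only the delay response."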

The raw output is a hypercube \mathbf{H} \in \mathbb{C}^{N_t \times N_r \times N_{\rm sc} \times N_{\rm snap}} that grows rapidly with array size and measurement duration. A day of MIMO sounding at 100 MHz bandwidth on a 64 \times 4 array produces hundreds of gigabytes of data.

Sounder Hypercube

The raw output of a channel sounder: a four-dimensional complex tensor indexed by transmit antenna, receive antenna, OFDM subcarrier, and time snapshot. A typical measurement session produces 10^{10} complex entries, which is why channel sounding drives storage and I/O requirements that outstrip real-time testbed needs.
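
The hypercube's growth is easy to check with back-of-envelope arithmetic. The sketch below assumes a 64 x 4 array, the NR 100 MHz / 60 kHz configuration (135 PRBs, 1620 subcarriers), and a snapshot rate and session length that are illustrative rather than tied to any specific campaign.

```python
# Back-of-envelope size of the sounder hypercube
# H in C^{N_t x N_r x N_sc x N_snap}.
N_t, N_r = 64, 4       # transmit / receive antennas
N_sc = 1620            # NR 100 MHz at 60 kHz subcarrier spacing (135 PRBs)
snaps_per_s = 10       # assumed channel-snapshot rate
hours = 8              # one deployment day

N_snap = snaps_per_s * 3600 * hours
entries = N_t * N_r * N_sc * N_snap     # complex entries in the tensor
bytes_total = entries * 8               # complex64: 8 bytes per entry

print(f"{entries:.1e} complex entries, {bytes_total / 1e9:.0f} GB")
# -> 1.2e+11 complex entries, 956 GB
```

Even at this modest snapshot rate, a single deployment day approaches a terabyte, consistent with the hundreds-of-gigabytes figure above.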

🔧 Engineering Note

HHI Massive MIMO Testbeds as Channel Sounders

The Fraunhofer Heinrich-Hertz-Institut (HHI) in Berlin operates a family of massive MIMO testbeds that serve jointly as real-time demonstrators and as channel sounders for 3GPP contributions. HHI's KIARA platform is a 128-element planar array at 26 GHz with 400 MHz instantaneous bandwidth, mounted on mobile carts for indoor and outdoor campaigns. The data collected by HHI testbeds has fed directly into 3GPP TR 38.901 revisions (channel model extensions for FR2) and into the 3GPP AI/ML workitem (Rel-18) that produced the reference datasets for model training.

The HHI testbeds are an example of how a single experimental platform can simultaneously serve (i) a real-time technology demonstration, (ii) an academic channel-measurement program, and (iii) a standards contribution. Most of the FR2 channel statistics in 3GPP TR 38.901 Rel-18 trace back to HHI sounding campaigns.

Practical Constraints
  • HHI KIARA operates at 26 GHz with 400 MHz bandwidth on a 128-element planar array
  • Raw data rates exceed 20 Gb/s during active sounding
  • Campaigns typically produce 1--10 TB of raw data per deployment day

📋 Ref: 3GPP TR 38.901 (channel models) and 3GPP TR 38.843 (AI/ML for NR)

Definition:

Over-the-Air Measurement Campaign

An OTA campaign is an instrumented field experiment in which a testbed collects data under real propagation conditions. A campaign has four phases:

  1. Site survey and link budget. Identify candidate locations, verify line-of-sight and NLOS geometries, estimate path loss with a coarse link-budget model, and select receiver positions that give a range of measurement conditions.

  2. Calibration and shakedown. Verify reciprocity calibration, synchronization, and data-logging pipeline on the first day. A fraction of the campaign budget is always consumed by discovering the first bug in the field.

  3. Data collection. Run scheduled sounding sessions that sweep the intended user positions and scenarios (indoor, outdoor, vehicular, blockage). Log both raw I/Q and extracted KPIs.

  4. Curation and analysis. Post-process the raw data into a labelled dataset: link-level channel matrices, path-loss samples, delay-Doppler profiles, and outlier annotations. This is the step that takes the most time.

Measured Path Loss vs Log-Distance Model

Scatter plot of simulated OTA path-loss measurements against a log-distance reference model, with configurable path-loss exponent, shadowing standard deviation, and blockage probability. It illustrates the three systematic deviations that every real campaign encounters: bias, scatter, and an outlier tail.

Parameters: path-loss exponent 3.5; shadowing standard deviation 6 dB; probability of heavy blockage per sample 0.1; 500 samples.
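
The figure's behavior can be reproduced in a few lines of NumPy. The sketch below uses the widget's default exponent, shadowing standard deviation, blockage probability, and sample count; the 20 dB blockage penalty, reference loss, and distance range are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Log-distance path loss with lognormal shadowing and occasional
# heavy blockage.
n_samples = 500
pl_exp = 3.5             # path-loss exponent
shadow_sigma = 6.0       # shadowing standard deviation [dB]
p_block = 0.1            # probability of heavy blockage per sample
block_loss = 20.0        # assumed extra loss when blocked [dB]
d0, pl0 = 1.0, 40.0      # assumed reference distance [m] and loss [dB]

d = rng.uniform(10, 300, n_samples)                   # Tx-Rx distances [m]
pl_model = pl0 + 10 * pl_exp * np.log10(d / d0)       # reference model
dev = (rng.normal(0, shadow_sigma, n_samples)                   # scatter
       + block_loss * (rng.uniform(size=n_samples) < p_block))  # outliers
pl_measured = pl_model + dev

# The three systematic deviations the figure illustrates:
bias = dev.mean()         # ~ p_block * block_loss = 2 dB
scatter = dev.std()       # exceeds shadow_sigma because of blockage
tail = (dev > 15).mean()  # heavy-outlier fraction
print(f"bias {bias:.1f} dB, scatter {scatter:.1f} dB, tail {tail:.1%}")
```

Note that even a 10% blockage probability biases the mean by about 2 dB and inflates the scatter well beyond the shadowing sigma, which is why fitting a log-distance model to raw campaign data without outlier annotation is misleading.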

Definition:

The Reality Gap

The reality gap is the difference between the per-user rate predicted by a theoretical model (closed-form Shannon bound, asymptotic expression, or simulation under a 3GPP CDL model) and the per-user rate measured in an OTA campaign with the same nominal parameters. Measured-vs-modeled gaps reported in the literature range from 1.5 to 6 dB at sub-6 GHz and from 3 to 12 dB at FR2. The gap has three identifiable contributions:

  • Systematic modeling bias. The model uses wrong parameters, wrong statistics, or wrong geometry for the deployment scenario. Correctable by refitting the model.

  • Hardware residual. Calibration, synchronization, phase noise, ADC quantization, PA nonlinearity: the impairments covered in Sections 26.2--26.4. Correctable by better engineering but never zero.

  • Non-Gaussian pathologies. Occasional blockage events, human body shadowing, unexpected specular reflections, mobility-induced Doppler spikes. These are the tail of the channel distribution and they defeat models that average them out.

The gap is the headline number every measurement paper should report and decompose.

Example: Decomposing the Reality Gap for a Sub-6 GHz Trial

An outdoor LuMaMi-class trial at 3.7 GHz measures a mean per-user throughput of 320 Mb/s, while the zero-forcing Shannon bound under a 3GPP UMa model with the same link budget predicts 440 Mb/s. Decompose the 120 Mb/s reality gap into plausible contributions.
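
One hedged way to work the example: invert the Shannon formula to express both rates as an effective per-user SINR, state the gap in dB, and then attribute fractions of the gap to the three contributions. The 100 MHz bandwidth and the 40/35/25 split below are assumptions for illustration, not measured values.

```python
import numpy as np

# Measured vs modeled per-user rate from the trial description.
B = 100e6                 # assumed bandwidth [Hz] (illustrative)
r_model = 440e6           # ZF Shannon-bound prediction [b/s]
r_meas = 320e6            # measured mean throughput [b/s]

# Invert the Shannon formula to express each rate as an effective SINR.
sinr_model = 2 ** (r_model / B) - 1
sinr_meas = 2 ** (r_meas / B) - 1
gap_db = 10 * np.log10(sinr_model / sinr_meas)
print(f"effective SINR gap: {gap_db:.1f} dB")   # ~3.9 dB

# An assumed (not measured) attribution of the gap to the three
# contributions of the reality-gap definition.
split = {"systematic modeling bias": 0.40,
         "hardware residual": 0.35,
         "non-Gaussian pathologies": 0.25}
for name, frac in split.items():
    print(f"  {name}: {frac * gap_db:.1f} dB")
```

The resulting ~3.9 dB effective gap sits inside the 1.5--6 dB sub-6 GHz range quoted in the definition, which is the sanity check any such decomposition should pass.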


Definition:

Dataset Curation for AI/ML Physical Layer

The 3GPP Rel-18 AI/ML workitem (TR 38.843) defines three reference use cases for learned physical-layer components: CSI feedback, beam management, and positioning. All three require labelled datasets of real channels, which only come from OTA testbeds. A curated dataset has:

  1. Raw channel tensor. The sounder hypercube, stored in a compressed binary format (e.g., HDF5 with complex dtype).

  2. Metadata. Tx/Rx positions, carrier frequency, bandwidth, subcarrier spacing, calibration state, environmental conditions (temperature, time of day, nominal blockage).

  3. Outlier annotations. A per-snapshot flag marking known pathologies (blockage, calibration-update instants, human interference). Training models on unannotated pathologies is the single biggest source of erratic model behavior.

  4. Train/val/test splits. Designed to prevent leakage between spatially nearby samples. Random splits defeat the purpose of evaluating generalization; the splits should be by position, by time, or by scenario.

  5. Licensing and anonymization. If the dataset includes user-equipment positions, they must be anonymized before public release. 3GPP has standardized this in Rel-18.

The effort to build a curated dataset typically exceeds the effort to build the testbed that collected it.
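
A minimal curation sketch covering items 1--3: one container holding the channel tensor, the metadata, and the per-snapshot outlier flags. The production format named above is HDF5; this sketch uses NumPy's .npz container as a lightweight stand-in, and every field value is illustrative.

```python
import json
import os
import tempfile
import numpy as np

rng = np.random.default_rng(2)

# Toy hypercube (N_t, N_r, N_sc, N_snap); real ones have ~1e10 entries.
H = (rng.standard_normal((4, 2, 16, 8))
     + 1j * rng.standard_normal((4, 2, 16, 8))).astype(np.complex64)

# Metadata and per-snapshot outlier annotations (values illustrative).
meta = {"carrier_hz": 3.7e9, "bandwidth_hz": 100e6,
        "calibration_state": "reciprocity-cal at session start",
        "scenario": "outdoor NLOS"}
outlier_flag = np.zeros(8, dtype=bool)
outlier_flag[3] = True          # an annotated blockage snapshot

# Write tensor + metadata + flags into one compressed container.
path = os.path.join(tempfile.mkdtemp(), "campaign_day1.npz")
np.savez_compressed(path, H=H, outlier_flag=outlier_flag,
                    meta=json.dumps(meta))

# Read back: dtype, flags, and metadata survive the round trip.
loaded = np.load(path)
meta_back = json.loads(str(loaded["meta"]))
print(loaded["H"].dtype, int(loaded["outlier_flag"].sum()),
      meta_back["scenario"])
```

Keeping the outlier flags and calibration state in the same file as the tensor is the point: a model trainer who receives only the tensor cannot exclude the pathological snapshots.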

🎓 CommIT Contribution (2023)

OTA Datasets for Rel-18 AI/ML for NR

G. Caire, CommIT research group, Fraunhofer HHI partners – 3GPP TSG-RAN1 Rel-18 AI/ML contributions

As part of the CommIT group's standards engagement, Caire's team and Fraunhofer HHI partners contributed curated OTA datasets to the 3GPP Rel-18 AI/ML workitem. The contributions established the annotation conventions now in use (outlier flags, calibration-state metadata, positional anonymization) and delivered baseline datasets from sub-6 GHz and FR2 sounding campaigns that the 3GPP evaluation methodology relies on. The same datasets also feed back into the Massive Beams product development, giving the startup a calibrated view of how real cell-free channels depart from the textbook models.


Common Mistake: Testing Only on Simulated Channels

Mistake:

Evaluating a new MIMO algorithm only on synthetic 3GPP CDL or Rayleigh channels, claiming it is ready for deployment, and skipping the OTA measurement step.

Correction:

Synthetic channels do not contain blockage tails, human interference, or the long-lived deterministic reflections that dominate real environments. An algorithm that excels on CDL can collapse under real conditions. Every serious MIMO research contribution should include OTA validation on a testbed or, if unavailable, on a publicly curated measurement dataset.

Common Mistake: Random Splits on a Spatially Correlated Dataset

Mistake:

Randomly splitting an OTA dataset into train/val/test sets. Since consecutive snapshots are highly correlated, a random split leaks information between train and test and produces wildly optimistic evaluation metrics.

Correction:

Split by position (disjoint measurement tracks), by time (whole sessions), or by scenario (indoor vs outdoor). The split methodology should be explicit in the dataset card so that readers can judge whether reported performance is a real test of generalization.
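
A group-wise split along measurement tracks can be sketched as follows; the snapshot count, track count, and 7/1/2 track allocation are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

# 1000 snapshots along 10 measurement tracks.  Consecutive snapshots on
# a track are spatially correlated, so whole tracks must stay together.
n_snap, n_tracks = 1000, 10
track_id = np.repeat(np.arange(n_tracks), n_snap // n_tracks)

# Allocate whole tracks to the splits (7/1/2 tracks, illustrative).
tracks = rng.permutation(n_tracks)
train_tr, val_tr, test_tr = tracks[:7], tracks[7:8], tracks[8:]

train_idx = np.flatnonzero(np.isin(track_id, train_tr))
val_idx = np.flatnonzero(np.isin(track_id, val_tr))
test_idx = np.flatnonzero(np.isin(track_id, test_tr))

# No track appears in two splits, so no spatial leakage between them.
assert not set(track_id[train_idx]) & set(track_id[test_idx])
print(len(train_idx), len(val_idx), len(test_idx))  # 700 100 200
```

The same pattern generalizes to splitting by session (time) or by scenario: the grouping variable changes, the leakage check does not.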

Historical Note: From COST 259 to 3GPP TR 38.901

1999--2020

The 3GPP TR 38.901 channel model, which underlies every 5G NR system-level simulation, is the end product of two decades of OTA measurement campaigns. The lineage traces from the European COST 259 model of the late 1990s, through COST 2100 (geometry-based stochastic, 2007), WINNER II (2007), and IMT-A submissions, to the 3GPP spatial channel model (SCM, 2003) and its 5G descendants. Each step folded in new measurement data from academic and industrial testbeds (Lund, HHI, Aalborg, NYU WIRELESS, Huawei), and each step expanded the modeled frequency range, array sizes, and blockage statistics. The pattern will continue: 3GPP TR 38.901 Rel-19 is expected to integrate AI/ML-trained channel components directly into the reference model.

Why This Matters: From Testbeds to 6G Commercialization

Every 6G technology listed in Chapter 27 (cell-free at scale, holographic MIMO surfaces, XL-MIMO, RIS-assisted links, ISAC) will enter standardization through exactly the testbed pathway described in this chapter. The sequence is always: theoretical result, academic testbed demonstration, industrial testbed validation, standards contribution, commercial product. Books like this one exist in part to shorten each step. The Massive Beams TU Berlin spin-off is a concrete example: a theoretical insight (cell-free coherent processing, Chapter 11) becomes a testbed result (Chapter 15), then a product and a 3GPP contribution.

Key Takeaway

OTA campaigns are where theories are tried. A channel sounder captures the full space-time-frequency channel tensor; a measurement campaign produces the raw data that exposes blockage, phase-noise pathologies, and modeling errors; dataset curation turns the raw data into something the 3GPP AI/ML workitem (or the next MIMO algorithm paper) can actually use. The reality gap between theory and measurement is the most informative single number any testbed can publish, and decomposing it is the professional responsibility of anyone building massive MIMO hardware.

Quick Check

In a well-engineered sub-6 GHz massive MIMO trial, which contribution typically dominates the reality gap between the theoretical per-user rate and the measured per-user rate?

Fixed-point quantization in the ZF combiner

Residual calibration error

Occasional heavy blockage events in the measurement window

Thermal noise at the ADC