Over-the-Air Measurements and Dataset Curation
Why Real Channels Embarrass Good Models
Chapter 2 introduced the statistical channel models that every subsequent chapter relies on: Rayleigh fading, one-ring covariance, clustered delay-line, 3GPP TR 38.901. These models are elegant because they are closed: a small set of parameters (angular spread, Ricean K-factor, path-loss exponent) reproduces the qualitative behavior of real channels. A channel sounder shows us the gap between the closed world and the real world, and that gap is where the fun happens. Real channels have moments of blockage, non-Gaussian tails, unexpected specular components, and occasional pathologies that no model predicts. OTA measurement campaigns are how we discover which of those pathologies matter.
This section covers the architecture of a channel-sounding testbed, the logistics of running a measurement campaign, and the question that has taken over 3GPP since Release 18: how do we curate the resulting dataset so that AI/ML for physical layer works in the field?
Definition: Channel Sounder Architecture
Channel Sounder Architecture
A channel sounder is a measurement system whose purpose is to record the true propagation channel over time, frequency, and transmit/receive locations. The three standard architectures:
- Correlator sounder. A known PN sequence is transmitted continuously; the receiver correlates against it and recovers the delay profile. Simple and robust, but gives only the delay response.
- Chirp sounder. A linear FM sweep spanning the measurement bandwidth is transmitted; the receiver dechirps and FFTs to recover the frequency response across the full bandwidth. Standard for sub-6 GHz.
- MIMO sounder. A full sweep using orthogonal codes (Hadamard or Golay) on the transmit antennas and coherent reception on all receive antennas. Gives the full spatial-temporal-spectral channel tensor.
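The orthogonal-code separation at the heart of a MIMO sounder can be sketched in a few lines. This toy example uses a Hadamard matrix; the array size and the flat single-tap channel are illustrative assumptions, not a real sounder configuration:

```python
import numpy as np
from scipy.linalg import hadamard

# Toy MIMO sounding sketch: each Tx antenna transmits one row of a
# Hadamard matrix; the receiver separates the per-antenna channel
# gains by correlating the superimposed signal with each code.
rng = np.random.default_rng(0)
n_tx = 4
codes = hadamard(n_tx).astype(float)      # rows are orthogonal +/-1 codes
h = rng.normal(size=n_tx) + 1j * rng.normal(size=n_tx)  # flat channel per Tx antenna

# Received chips: superposition of all Tx codes weighted by their channels
rx = codes.T @ h

# Correlate with each code; orthogonality (codes @ codes.T = n_tx * I)
# recovers the channel exactly in this noiseless sketch
h_hat = codes @ rx / n_tx
print(np.allclose(h_hat, h))
```

In a real sounder the same correlation runs per delay tap and per subcarrier, which is what fills the four-dimensional hypercube described next.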
The raw output is a hypercube that grows rapidly with array size and measurement duration. A day of MIMO sounding at 100 MHz bandwidth on a large array produces hundreds of gigabytes of data.
Sounder Hypercube
The raw output of a channel sounder: a four-dimensional complex tensor indexed by transmit antenna, receive antenna, OFDM subcarrier, and time snapshot. A typical measurement session produces tens of billions of complex entries, which is why channel sounding drives storage and I/O requirements that outstrip real-time testbed needs.
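A back-of-envelope estimate makes the storage pressure concrete. All figures below are illustrative assumptions, not the specifications of any particular testbed:

```python
# Back-of-envelope storage estimate for a sounder hypercube
# (illustrative parameters, not tied to any specific testbed).
n_tx, n_rx = 64, 16          # transmit / receive antennas
n_sc = 1200                  # OFDM subcarriers
snap_rate = 10               # snapshots per second
duration_s = 2 * 3600        # one two-hour sounding session
bytes_per_entry = 8          # complex64: 4 B real + 4 B imag

entries = n_tx * n_rx * n_sc * snap_rate * duration_s
gigabytes = entries * bytes_per_entry / 1e9
print(f"{entries:.2e} complex entries, {gigabytes:.0f} GB raw")
```

Even at a modest 10 snapshots per second, a single session lands in the hundreds-of-gigabytes range quoted above; raising the snapshot rate or the array size pushes a deployment day into terabytes.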
HHI Massive MIMO Testbeds as Channel Sounders
The Fraunhofer Heinrich-Hertz-Institut (HHI) in Berlin operates a family of massive MIMO testbeds that serve jointly as real-time demonstrators and as channel sounders for 3GPP contributions. HHI's KIARA platform is a 128-element planar array at 26 GHz with 400 MHz instantaneous bandwidth, mounted on mobile carts for indoor and outdoor campaigns. The data collected by HHI testbeds has fed directly into 3GPP TR 38.901 revisions (channel model extensions for FR2) and into the 3GPP AI/ML workitem (Rel-18) that produced the reference datasets for model training.
The HHI testbeds are an example of how a single experimental platform can simultaneously serve (i) a real-time technology demonstration, (ii) an academic channel-measurement program, and (iii) a standards contribution. Most of the FR2 channel statistics in 3GPP TR 38.901 Rel-18 trace back to HHI sounding campaigns.
- HHI KIARA operates at 26 GHz with 400 MHz bandwidth on a 128-element planar array
- Raw data rates exceed 20 Gb/s during active sounding
- Campaigns typically produce 1--10 TB of raw data per deployment day
Definition: Over-the-Air Measurement Campaign
Over-the-Air Measurement Campaign
An OTA campaign is an instrumented field experiment in which a testbed collects data under real propagation conditions. A campaign has four phases:
- Site survey and link budget. Identify candidate locations, verify line-of-sight and NLOS geometries, estimate path loss with a coarse link-budget model, and select receiver positions that give a range of measurement conditions.
- Calibration and shakedown. Verify reciprocity calibration, synchronization, and the data-logging pipeline on the first day. A fraction of the campaign budget is always consumed by discovering the first bug in the field.
- Data collection. Run scheduled sounding sessions that sweep the intended user positions and scenarios (indoor, outdoor, vehicular, blockage). Log both raw I/Q and extracted KPIs.
- Curation and analysis. Post-process the raw data into a labelled dataset: link-level channel matrices, path-loss samples, delay-Doppler profiles, and outlier annotations. This is the step that takes the most time.
Measured Path Loss vs Log-Distance Model
Scatter plot of simulated OTA path-loss measurements against a log-distance reference model, with configurable path-loss exponent, shadowing standard deviation, and blockage probability. Illustrates the three systematic deviations (bias, scatter, and outlier tail) that every real campaign encounters.
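A minimal sketch of the simulation behind such a scatter plot, with illustrative parameter values:

```python
import numpy as np

# Log-distance reference model plus lognormal shadowing plus a heavy
# blockage tail. All parameter values are illustrative assumptions.
rng = np.random.default_rng(1)
n = 500
pl_exponent = 3.2            # path-loss exponent
shadow_sigma_db = 6.0        # shadowing standard deviation [dB]
p_block = 0.05               # probability of heavy blockage per sample
block_loss_db = 25.0         # extra loss when blocked [dB]
pl0_db, d0 = 40.0, 1.0       # reference loss [dB] at reference distance [m]

d = rng.uniform(10, 300, size=n)                      # Tx-Rx distances [m]
model = pl0_db + 10 * pl_exponent * np.log10(d / d0)  # log-distance reference
measured = (model
            + rng.normal(0, shadow_sigma_db, n)           # scatter
            + block_loss_db * (rng.random(n) < p_block))  # outlier tail

bias = np.mean(measured - model)
print(f"mean deviation from model: {bias:.2f} dB")
```

The mean deviation captures the bias, the shadowing term the scatter, and the rare 25 dB excursions the outlier tail; refitting the exponent removes the bias but never the tail.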
Definition: The Reality Gap
The Reality Gap
The reality gap is the difference between the per-user rate predicted by a theoretical model (closed-form Shannon bound, asymptotic expression, or simulation under a 3GPP CDL model) and the per-user rate measured in an OTA campaign with the same nominal parameters. Measured-vs-modeled gaps reported in the literature range from 1.5 to 6 dB at sub-6 GHz and from 3 to 12 dB at FR2. The gap has three identifiable contributions:
- Systematic modeling bias. The model uses wrong parameters, wrong statistics, or wrong geometry for the deployment scenario. Correctable by refitting the model.
- Hardware residual. Calibration, synchronization, phase noise, ADC quantization, PA nonlinearity (the impairments covered in Sections 26.2--26.4). Correctable by better engineering but never zero.
- Non-Gaussian pathologies. Occasional blockage events, human body shadowing, unexpected specular reflections, mobility-induced Doppler spikes. These are the tail of the channel distribution and they defeat models that average them out.
The gap is the headline number every measurement paper should report and decompose.
Example: Decomposing the Reality Gap for a Sub-6 GHz Trial
An outdoor LuMaMi-class trial at 3.7 GHz measures a mean per-user throughput of 320 Mb/s, while the zero-forcing Shannon bound under a 3GPP UMa model with the same link budget predicts 440 Mb/s. Decompose the 120 Mb/s reality gap into plausible contributions.
Quantify the calibration residual
From Section 26.3, a well-tuned testbed achieves a reciprocity-calibration residual equivalent to a 1 dB SINR penalty at the operating point. A 1 dB SINR loss on a link operating at 10 dB SNR translates, via the Shannon formula, to a rate penalty of about 50 Mb/s.
Account for synchronization
Residual CFO after per-slot tracking and residual timing jitter together cost another 0.5 dB, adding about 25 Mb/s to the gap.
Fixed-point and DAC/ADC effects
LuMaMi's 16-bit mantissa and 12-bit ADC give a combined effective SINR floor around 40 dB, irrelevant at the 10 dB operating SNR. So this contribution is essentially 0 Mb/s.
The remaining gap: blockage
The residual 40--50 Mb/s gap corresponds to roughly 5--10% of the measurement window being in heavy blockage, during which the per-user throughput collapses to a fraction of nominal. This matches the HHI and LuMaMi reports: blockage is the single largest contributor to the reality gap once hardware impairments are properly engineered.
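The Shannon-formula steps of this decomposition can be checked numerically. The per-user bandwidth below is an assumed value, so the figures show the order of magnitude of each contribution rather than reproducing the example's exact numbers:

```python
import numpy as np

# Map dB-level SINR penalties to rate losses via the Shannon formula.
# Bandwidth and operating SNR are illustrative assumptions.
bw_hz = 100e6                # assumed effective per-user bandwidth
snr_db = 10.0                # nominal operating SNR

def shannon_rate(snr_db):
    """Single-link Shannon rate [bit/s] at the given SNR in dB."""
    return bw_hz * np.log2(1 + 10 ** (snr_db / 10))

nominal = shannon_rate(snr_db)
after_cal = shannon_rate(snr_db - 1.0)   # minus 1 dB calibration penalty
after_sync = shannon_rate(snr_db - 1.5)  # minus another 0.5 dB sync penalty

print(f"nominal rate:       {nominal / 1e6:.0f} Mb/s")
print(f"calibration cost:   {(nominal - after_cal) / 1e6:.0f} Mb/s")
print(f"synchronization:    {(after_cal - after_sync) / 1e6:.0f} Mb/s")
```

Whatever gap remains after subtracting these engineered losses from the measured shortfall is, as the example argues, attributable to the blockage tail.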
Definition: Dataset Curation for AI/ML Physical Layer
Dataset Curation for AI/ML Physical Layer
The 3GPP Rel-18 AI/ML workitem (TR 38.843) defines three reference use cases for learned physical-layer components: CSI feedback, beam management, and positioning. All three require labelled datasets of real channels, which only come from OTA testbeds. A curated dataset has:
- Raw channel tensor. The sounder hypercube, stored in a compressed binary format (e.g., HDF5 with a complex dtype).
- Metadata. Tx/Rx positions, carrier frequency, bandwidth, subcarrier spacing, calibration state, environmental conditions (temperature, time of day, nominal blockage).
- Outlier annotations. A per-snapshot flag marking known pathologies (blockage, calibration-update instants, human interference). Training models on unannotated pathologies is the single biggest source of pathological model behavior.
- Train/val/test splits. Designed to prevent leakage between spatially nearby samples. Random splits defeat the purpose of evaluating generalization; the splits should be by position, by time, or by scenario.
- Licensing and anonymization. If the dataset includes user-equipment positions, they must be anonymized before public release. 3GPP has standardized this in Rel-18.
The effort to build a curated dataset typically exceeds the effort to build the testbed that collected it.
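A hypothetical sketch of this layout in HDF5 via h5py; the file, dataset, and attribute names are illustrative, not a standardized schema:

```python
import numpy as np
import h5py

# Hypothetical curated-dataset layout: raw channel tensor with metadata
# attached as HDF5 attributes, plus a per-snapshot outlier annotation.
rng = np.random.default_rng(2)
n_tx, n_rx, n_sc, n_snap = 8, 4, 64, 100

with h5py.File("campaign_day1.h5", "w") as f:
    # Raw channel tensor: complex64, gzip-compressed on disk
    H = (rng.normal(size=(n_tx, n_rx, n_sc, n_snap))
         + 1j * rng.normal(size=(n_tx, n_rx, n_sc, n_snap)))
    d = f.create_dataset("channel", data=H.astype(np.complex64),
                         compression="gzip")
    # Metadata travels with the data, not in a side file
    d.attrs["carrier_hz"] = 3.7e9
    d.attrs["bandwidth_hz"] = 100e6
    d.attrs["calibration_state"] = "post-cal"
    # Per-snapshot outlier flag: True where blockage was observed
    f.create_dataset("outlier_flag", data=rng.random(n_snap) < 0.05)
```

Keeping the outlier flags and calibration state inside the same file means a training pipeline cannot accidentally load the tensor without its annotations.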
OTA Datasets for Rel-18 AI/ML for NR
As part of the CommIT group's standards engagement, Caire's team and Fraunhofer HHI partners contributed curated OTA datasets to the 3GPP Rel-18 AI/ML workitem. The contributions established the annotation conventions now in use (outlier flags, calibration-state metadata, positional anonymization) and delivered baseline datasets from sub-6 GHz and FR2 sounding campaigns that the 3GPP evaluation methodology relies on. The same datasets also feed back into the Massive Beams product development, giving the startup a calibrated view of how real cell-free channels depart from the textbook models.
Common Mistake: Testing Only on Simulated Channels
Mistake:
Evaluating a new MIMO algorithm only on synthetic 3GPP CDL or Rayleigh channels, claiming it is ready for deployment, and skipping the OTA measurement step.
Correction:
Synthetic channels do not contain blockage tails, human interference, or the long-lived deterministic reflections that dominate real environments. An algorithm that excels on CDL can collapse under real conditions. Every serious MIMO research contribution should include OTA validation on a testbed or, if unavailable, on a publicly curated measurement dataset.
Common Mistake: Random Splits on a Spatially Correlated Dataset
Mistake:
Randomly splitting an OTA dataset into train/val/test sets. Since consecutive snapshots are highly correlated, a random split leaks information between train and test and produces wildly optimistic evaluation metrics.
Correction:
Split by position (disjoint measurement tracks), by time (whole sessions), or by scenario (indoor vs outdoor). The split methodology should be explicit in the dataset card so that readers can judge whether reported performance is a real test of generalization.
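A track-level split of the kind described above can be sketched as follows; the track counts and IDs are hypothetical:

```python
import numpy as np

# Leakage-free split: assign whole measurement tracks, not individual
# snapshots, to train/val/test. Track layout is hypothetical.
rng = np.random.default_rng(3)
track_id = np.repeat(np.arange(20), 50)   # 20 tracks x 50 snapshots each

tracks = rng.permutation(20)
train_tracks, val_tracks, test_tracks = tracks[:14], tracks[14:17], tracks[17:]

train_idx = np.flatnonzero(np.isin(track_id, train_tracks))
val_idx = np.flatnonzero(np.isin(track_id, val_tracks))
test_idx = np.flatnonzero(np.isin(track_id, test_tracks))

# No track (hence no spatially correlated neighborhood) straddles splits
assert set(track_id[train_idx]).isdisjoint(track_id[test_idx])
print(len(train_idx), len(val_idx), len(test_idx))  # 700 150 150
```

A random per-snapshot split of the same data would place correlated neighbors of nearly every test snapshot in the training set, which is exactly the leakage the mistake above describes.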
Historical Note: From COST 259 to 3GPP TR 38.901
1999--2020. The 3GPP TR 38.901 channel model, which underlies every 5G NR system-level simulation, is the end product of two decades of OTA measurement campaigns. The lineage traces from the European COST 259 model of the late 1990s, through COST 2100 (geometry-based stochastic, 2007), WINNER II (2007), and the IMT-A submissions, to the 3GPP spatial channel model (SCM, 2003) and its 5G descendants. Each step folded in new measurement data from academic and industrial testbeds (Lund, HHI, Aalborg, NYU WIRELESS, Huawei), and each step expanded the modeled frequency range, array sizes, and blockage statistics. The pattern will continue: 3GPP TR 38.901 Rel-19 is expected to integrate AI/ML-trained channel components directly into the reference model.
Why This Matters: From Testbeds to 6G Commercialization
Every 6G technology listed in Chapter 27 (cell-free at scale, holographic MIMO surfaces, XL-MIMO, RIS-assisted links, ISAC) will enter standardization through exactly the testbed pathway described in this chapter. The sequence is always: theoretical result, academic testbed demonstration, industrial testbed validation, standards contribution, commercial product. Books like this one exist in part to shorten each step. The Massive Beams TU Berlin spin-off is a concrete example: a theoretical insight (cell-free coherent processing, Chapter 11) becomes a testbed result (Chapter 15), then a product and a 3GPP contribution.
Key Takeaway
OTA campaigns are where theories are tried. A channel sounder captures the full space-time-frequency channel tensor; a measurement campaign produces the raw data that exposes blockage, phase-noise pathologies, and modeling errors; dataset curation turns the raw data into something the 3GPP AI/ML workitem (or the next MIMO algorithm paper) can actually use. The reality gap between theory and measurement is the most informative single number any testbed can publish, and decomposing it is the professional responsibility of anyone building massive MIMO hardware.
Quick Check
In a well-engineered sub-6 GHz massive MIMO trial, which contribution typically dominates the reality gap between the theoretical per-user rate and the measured per-user rate?
- Fixed-point quantization in the ZF combiner
- Residual calibration error
- Occasional heavy blockage events in the measurement window
- Thermal noise at the ADC
Once calibration, synchronization, and mantissa widths are properly engineered, the dominant reality-gap contributor is the non-Gaussian tail of the channel, specifically blockage events that are not present in standard CDL or Rayleigh simulations. This is why blockage annotation is the centerpiece of Rel-18 dataset curation.