Benchmarking and Fair Comparisons
Why Baselines Make or Break a Paper
A new algorithm is only as impressive as the baselines it is compared against. Showing that "our method beats random" is uninformative. Showing that "our method beats the state-of-the-art under identical conditions" is the standard of evidence expected in wireless research.
Fair benchmarking requires three things:
- Same system model: identical channel, CSI, and constraint assumptions for all schemes.
- Same metric: compare sum rate to sum rate, not sum rate to max-min rate.
- Same resources: if your scheme uses more power, antennas, or computation, say so.
This section catalogs standard baselines and common comparison pitfalls.
Common Baseline Comparisons in Wireless Research
The following table lists frequently used baseline pairs. When proposing a new scheme, compare against the appropriate baselines for your problem category.
| Problem area | Standard baselines | What to show |
|---|---|---|
| MIMO precoding | MRT, ZF, MMSE, WMMSE | Sum rate vs. SNR, vs. number of users |
| Multiple access | OMA (TDMA/OFDMA), NOMA, RSMA | Sum rate, user fairness |
| Channel estimation | LS, MMSE, genie-aided (perfect CSI) | MSE vs. SNR, rate with estimated CSI |
| Detection | ZF, MMSE, ML (or near-ML like sphere decoding) | BER vs. SNR, complexity comparison |
| Resource allocation | Equal allocation, water-filling, exhaustive search | Rate/EE vs. number of users/subcarriers |
| Beamforming (single-user) | MRT, eigen-beamforming | Rate vs. SNR |
| Massive MIMO | MR combining, ZF, MMSE, capacity bound | Rate vs. antennas, vs. users |
| RIS/IRS | No RIS, random phase, SDR upper bound | Rate gain vs. number of elements |
| Deep learning approaches | Model-based (WMMSE, ZF), other DL methods | Rate, convergence speed, complexity |
A paper that does not include at least the basic baselines for its category will face criticism during peer review.
Definition: Fair Comparison
A comparison between schemes A and B is fair if:
- Both schemes operate under identical system assumptions (same number of antennas $M$, number of users $K$, SNR, channel model, CSI quality)
- Both schemes use identical resources (same total transmit power, same bandwidth, same number of time slots)
- Both schemes are evaluated with the same performance metric
- Both schemes are given their best operating parameters (e.g., optimal regularization for MMSE, not an arbitrary value)
- The same Monte Carlo realizations are used for both schemes (using a shared random seed eliminates inter-experiment variance)
Violating any of these conditions biases the comparison.
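The last condition is the easiest to get wrong silently. Below is a minimal sketch of a shared-seed Monte Carlo harness, assuming two hypothetical rate functions `scheme_a` and `scheme_b` (anything that maps a channel matrix and SNR to a sum rate):

```python
import numpy as np

def run_comparison(scheme_a, scheme_b, M=8, K=4, snr_db=10.0, n_trials=1000, seed=0):
    """Evaluate two schemes on identical channel realizations (shared seed)."""
    rng = np.random.default_rng(seed)   # ONE generator feeds both schemes
    snr = 10 ** (snr_db / 10)
    rates_a, rates_b = [], []
    for _ in range(n_trials):
        # Draw one realization and evaluate BOTH schemes on it, so the
        # difference in averages is free of inter-experiment variance.
        H = (rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M))) / np.sqrt(2)
        rates_a.append(scheme_a(H, snr))
        rates_b.append(scheme_b(H, snr))
    return np.mean(rates_a), np.mean(rates_b)
```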
Example: Spotting an Unfair Comparison
A paper proposes a NOMA scheme and compares it to OFDMA. The simulation shows that NOMA achieves 30% higher sum rate. However, you notice the following in the simulation setup:
- NOMA: 2 users per subcarrier, successive interference cancellation (SIC) with perfect CSI
- OFDMA: 1 user per subcarrier, no CSI at the transmitter
Is this a fair comparison? Identify the biases.
Bias 1 (CSI asymmetry): NOMA uses perfect CSI (needed for SIC ordering and power allocation), while OFDMA has no CSIT. Giving OFDMA the same CSI would allow frequency-selective scheduling, dramatically improving its performance.
Bias 2 (user pairing advantage): NOMA serves 2 users per subcarrier, effectively doubling spectral efficiency by exploiting the near-far effect. A fair comparison should either (a) allow OFDMA the same total resources or (b) compare at the same total number of served users.
Bias 3 (perfect SIC assumption): In practice, SIC suffers from error propagation and imperfect channel estimation. Assuming perfect SIC inflates NOMA's gains.
Conclusion: The 30% gain is an artifact of the asymmetric assumptions, not an inherent advantage of NOMA.
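To see how the gap shrinks under symmetric assumptions, here is a toy two-user computation, assuming both schemes get the same total power, bandwidth, and perfect CSI; the channel gains and power split are illustrative values, not taken from the example paper:

```python
import numpy as np

P, N0 = 1.0, 1.0          # total transmit power, noise PSD (normalized)
g1, g2 = 10.0, 0.5        # strong-user / weak-user channel power gains

# NOMA: superposition over the full band; the weak user gets most of the power.
a1, a2 = 0.2, 0.8         # power split, a1 + a2 = 1
r1_noma = np.log2(1 + a1 * P * g1 / N0)                  # strong user, after SIC
r2_noma = np.log2(1 + a2 * P * g2 / (a1 * P * g2 + N0))  # weak user, interference-limited

# OFDMA: orthogonal halves of the band (so half the noise power each),
# same total power P split across the two users.
p1, p2 = 0.5 * P, 0.5 * P
r1_ofdma = 0.5 * np.log2(1 + p1 * g1 / (0.5 * N0))
r2_ofdma = 0.5 * np.log2(1 + p2 * g2 / (0.5 * N0))

print(f"NOMA : sum rate = {r1_noma + r2_noma:.2f} bit/s/Hz")
print(f"OFDMA: sum rate = {r1_ofdma + r2_ofdma:.2f} bit/s/Hz")
```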
Pitfall: Unfair Comparisons
The most common ways papers create unfair comparisons:
- Giving the proposed scheme better CSI than the baselines
- Optimizing the proposed scheme's parameters while using default parameters for baselines
- Comparing at a single SNR point where the proposed scheme happens to excel, hiding regions where it does not
- Using a simple baseline when a strong one exists (e.g., comparing to MRT instead of MMSE)
- Ignoring complexity: A scheme that needs $\mathcal{O}(M^3)$ computation vs. a baseline with $\mathcal{O}(MK)$ should acknowledge this
- Cherry-picking user geometries that favor the proposed scheme (e.g., specific near-far ratios for NOMA)
When reading a paper, always ask: "Would the proposed scheme still win if the baseline were given the same advantages?"
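One of these pitfalls, default parameters for baselines, is cheap to avoid. A minimal sketch, assuming a regularized-ZF (MMSE-style) baseline whose regularization is grid-searched rather than hard-coded; the sweep range and setup are illustrative:

```python
import numpy as np

def sum_rate_rzf(H, alpha, P):
    """Sum rate of RZF precoding W = H^H (H H^H + alpha I)^{-1},
    normalized to total power P, with unit noise power."""
    K, M = H.shape
    W = H.conj().T @ np.linalg.inv(H @ H.conj().T + alpha * np.eye(K))
    W = W * np.sqrt(P) / np.linalg.norm(W, 'fro')   # enforce the power budget
    G = H @ W                                       # effective K x K channel
    sig = np.abs(np.diag(G)) ** 2
    intf = np.sum(np.abs(G) ** 2, axis=1) - sig
    return np.sum(np.log2(1 + sig / (intf + 1.0)))

rng = np.random.default_rng(1)
M, K, P = 8, 4, 10.0
H = (rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M))) / np.sqrt(2)

# Grid-search the regularization instead of using an arbitrary default;
# for this setup the classical choice alpha = K / P should be near-optimal.
alphas = np.logspace(-3, 2, 50)
best = max(alphas, key=lambda a: sum_rate_rzf(H, a, P))
print(f"best alpha = {best:.3g} (K/P = {K / P:.3g}), "
      f"rate = {sum_rate_rzf(H, best, P):.2f} bit/s/Hz")
```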
MRT vs. ZF vs. MMSE Precoding: Sum Rate
Compare the sum-rate performance of three standard linear precoders as a function of SNR. Varying the number of base station antennas $M$ and users $K$ shows how the gap between schemes changes. Key observations: MRT is optimal at low SNR, ZF dominates at high SNR, and MMSE interpolates between both. As $M/K$ grows (massive MIMO regime), all three converge.
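A minimal Monte Carlo sketch of this comparison, assuming i.i.d. Rayleigh channels, unit noise power, and a total-power constraint (simplifications relative to a full system simulation); the printed averages should reproduce the crossover described above:

```python
import numpy as np

def sum_rate(H, W, P):
    """Sum rate for precoder W (M x K), total power P, unit noise power."""
    W = W * np.sqrt(P) / np.linalg.norm(W, 'fro')
    G = H @ W
    sig = np.abs(np.diag(G)) ** 2
    intf = np.sum(np.abs(G) ** 2, axis=1) - sig
    return np.sum(np.log2(1 + sig / (intf + 1.0)))

M, K, n_trials = 8, 4, 200
rng = np.random.default_rng(0)
for snr_db in (-10, 0, 10, 20, 30):
    P = 10 ** (snr_db / 10)
    acc = {"MRT": 0.0, "ZF": 0.0, "MMSE": 0.0}
    for _ in range(n_trials):
        H = (rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M))) / np.sqrt(2)
        Hh = H.conj().T
        acc["MRT"]  += sum_rate(H, Hh, P)                          # matched filter
        acc["ZF"]   += sum_rate(H, Hh @ np.linalg.inv(H @ Hh), P)  # channel inversion
        acc["MMSE"] += sum_rate(H, Hh @ np.linalg.inv(H @ Hh + (K / P) * np.eye(K)), P)  # RZF
    print(snr_db, "dB:", {k: round(v / n_trials, 2) for k, v in acc.items()})
```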
Complexity Must Accompany Performance
Performance gains are meaningless without context. A paper that proposes a new detector must compare both performance (BER) and complexity (flops, latency, memory).
Standard complexity measures in wireless research:
| Operation | Complexity (order of real multiplications) |
|---|---|
| MRT precoding | $\mathcal{O}(MK)$ |
| ZF precoding | $\mathcal{O}(MK^2 + K^3)$ |
| MMSE precoding | $\mathcal{O}(MK^2 + K^3)$ |
| WMMSE (per iteration) | $\mathcal{O}(KM^2 + M^3)$ |
| Sphere decoding (worst case) | $\mathcal{O}(|\mathcal{A}|^K)$ (exponential) |
| MMSE-SIC detection | $\mathcal{O}(MK^3 + K^4)$ |
| ML detection | $\mathcal{O}(|\mathcal{A}|^K)$ (exhaustive) |
Here $M$ is the number of antennas, $K$ the number of users (or streams), and $\mathcal{A}$ the symbol constellation.
A useful visualization: plot performance vs. complexity (e.g., BER vs. flops per symbol) rather than just performance vs. SNR. This Pareto frontier shows the true value of a new algorithm.
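A minimal plotting sketch, assuming each scheme has already been profiled into a (flops per symbol, BER) pair; all numbers below are placeholders, not measurements:

```python
import matplotlib.pyplot as plt

# Hypothetical profiling results: name -> (flops per symbol, BER at 10 dB SNR).
results = {"ZF": (2e3, 1e-2), "MMSE": (3e3, 4e-3),
           "Sphere": (5e5, 1e-4), "Proposed": (2e4, 5e-4)}
for name, (flops, ber) in results.items():
    plt.scatter(flops, ber)
    plt.annotate(name, (flops, ber))
plt.xscale("log")
plt.yscale("log")
plt.xlabel("flops per symbol")
plt.ylabel("BER @ 10 dB SNR")
plt.title("Performance vs. complexity (lower-left is better)")
plt.show()
```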
Upper and Lower Bounds as Benchmarks
Beyond comparing to practical schemes, strong papers include theoretical bounds:
Upper bounds (optimistic benchmarks):
- Perfect CSI at both transmitter and receiver
- Dirty-paper coding (DPC) capacity for the broadcast channel
- Genie-aided detection (interference perfectly known)
Lower bounds (pessimistic benchmarks):
- Worst-case channel realization
- Treating interference as noise
- Single-antenna (SISO) baseline
Showing that your scheme approaches the upper bound or significantly outperforms the lower bound strengthens the paper. If your scheme is within 1 dB of the capacity upper bound, further improvement is not practically meaningful.
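As one concrete upper bound, the perfect-CSI point-to-point MIMO capacity with water-filling power allocation is straightforward to compute. A sketch, assuming unit noise variance and an i.i.d. Rayleigh test channel:

```python
import numpy as np

def waterfilling_capacity(H, P, sigma2=1.0):
    """Capacity of y = Hx + n under total power P, via water-filling
    over the eigenmodes of H^H H."""
    g = np.linalg.svd(H, compute_uv=False) ** 2 / sigma2  # eigenmode SNRs
    g = np.sort(g)[::-1]
    for n in range(len(g), 0, -1):
        mu = (P + np.sum(1.0 / g[:n])) / n                # candidate water level
        if mu > 1.0 / g[n - 1]:                           # all n modes get power
            p = np.maximum(mu - 1.0 / g[:n], 0.0)
            return np.sum(np.log2(1 + p * g[:n]))
    return 0.0

rng = np.random.default_rng(2)
H = (rng.standard_normal((4, 8)) + 1j * rng.standard_normal((4, 8))) / np.sqrt(2)
print(f"capacity upper bound: {waterfilling_capacity(H, P=10.0):.2f} bit/s/Hz")
```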
Quick Check
A paper compares a new beamforming scheme against ZF precoding. The new scheme uses estimated CSI from uplink pilots, while ZF uses perfect CSI. The paper shows ZF is better. What is the problem with this comparison?
- Nothing; the comparison is fair
- The comparison favors the proposed scheme
- ZF with perfect CSI is an upper bound, not a fair baseline; a fair comparison would give ZF the same estimated CSI
- ZF should never be used as a baseline
Answer: the third option. ZF with perfect CSI is a useful upper bound but should be labeled as such. For a fair comparison, both schemes should use the same estimated channel. Alternatively, show both: ZF with perfect CSI as a bound and ZF with estimated CSI as the fair baseline.
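A sketch of the repaired comparison, assuming a simple LS-style estimation-error model in which the estimate is the true channel plus noise scaled by the pilot SNR (the pilot model and all parameters are illustrative):

```python
import numpy as np

def zf_rate(H_true, H_est, P):
    """Sum rate when the ZF precoder is built from H_est but the signal
    propagates through H_true (unit noise power)."""
    W = H_est.conj().T @ np.linalg.inv(H_est @ H_est.conj().T)
    W = W * np.sqrt(P) / np.linalg.norm(W, 'fro')
    G = H_true @ W
    sig = np.abs(np.diag(G)) ** 2
    intf = np.sum(np.abs(G) ** 2, axis=1) - sig
    return np.sum(np.log2(1 + sig / (intf + 1.0)))

rng = np.random.default_rng(3)
M, K, P, pilot_snr = 8, 4, 10.0, 10.0
H = (rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M))) / np.sqrt(2)
# LS-style estimate: true channel plus estimation noise scaled by pilot SNR.
H_ls = H + (rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M))) / np.sqrt(2 * pilot_snr)

print(f"ZF, perfect CSI (bound): {zf_rate(H, H,    P):.2f} bit/s/Hz")
print(f"ZF, LS-estimated CSI   : {zf_rate(H, H_ls, P):.2f} bit/s/Hz")
```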
Standard Simulation Scenarios
Using standardized scenarios enables cross-paper comparison. The most widely adopted are:
3GPP evaluation methodology:
- Urban Macro (UMa): ISD 500 m, 2 GHz carrier
- Urban Micro (UMi): ISD 200 m, 3.5 GHz carrier
- Indoor Hotspot (InH): open office, 30 GHz carrier
- Rural Macro (RMa): ISD 1732 m, 700 MHz carrier
Channel models:
- 3GPP TR 38.901 (NR channel model)
- COST 2100 / QuaDRiGa
- IEEE 802.11 TGn/TGax (for Wi-Fi)
- Saleh-Valenzuela (clustered mmWave)
Link-level parameters (common defaults):
- OFDM with 15 kHz subcarrier spacing (NR numerology 0)
- 5G NR LDPC codes
- Resource block: 12 subcarriers × 14 OFDM symbols (one slot)
Using these standard scenarios does not replace a custom evaluation, but including at least one standard scenario makes the work comparable to other published results.
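One lightweight way to make the scenario explicit is to encode it as a config object. A sketch, with field names of my own choosing and values taken from the list above (the InH ISD is not listed, so it is left unset):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Scenario:
    name: str
    carrier_ghz: float             # carrier frequency [GHz]
    isd_m: Optional[float] = None  # inter-site distance [m], if specified

SCENARIOS = {
    "UMa": Scenario("Urban Macro", 2.0, 500.0),
    "UMi": Scenario("Urban Micro", 3.5, 200.0),
    "InH": Scenario("Indoor Hotspot (open office)", 30.0),  # ISD not listed above
    "RMa": Scenario("Rural Macro", 0.7, 1732.0),
}

print(SCENARIOS["UMa"])
```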
Definition: Pareto Efficiency in Algorithm Comparison
An algorithm A is Pareto-dominated by algorithm B if B achieves equal or better performance (rate, BER) and equal or lower complexity (flops, latency), with strict improvement in at least one, under the same system assumptions.
A fair comparison should plot the Pareto frontier: performance vs. complexity for all compared schemes. An algorithm on the frontier cannot be improved in one metric without degrading the other. Algorithms below the frontier are strictly dominated and should not be recommended for any operating point.
This is more informative than showing performance at a single SNR point, which hides the complexity dimension.
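A minimal sketch of extracting the non-dominated set from (complexity, performance) pairs; the scheme names and numbers are illustrative, not measurements:

```python
def pareto_frontier(schemes):
    """schemes: dict name -> (complexity, performance); lower complexity and
    higher performance are better. Returns the non-dominated set."""
    frontier = {}
    for name, (c, p) in schemes.items():
        dominated = any(c2 <= c and p2 >= p and (c2, p2) != (c, p)
                        for n2, (c2, p2) in schemes.items() if n2 != name)
        if not dominated:
            frontier[name] = (c, p)
    return frontier

schemes = {"MRT": (1e2, 10.0), "ZF": (1e3, 18.0), "MMSE": (1e3, 19.0),
           "WMMSE": (1e5, 21.0), "NewAlg": (1e4, 18.5)}
print(pareto_frontier(schemes))  # ZF and NewAlg are dominated by MMSE here
```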
Standard Linear Precoder Comparison
| Precoder | Formula | Complexity | Low-SNR behavior | High-SNR behavior |
|---|---|---|---|---|
| MRT | $W \propto H^H$ | $\mathcal{O}(MK)$ | Optimal (maximises received power) | Poor (ignores MUI) |
| ZF | $W \propto H^H (H H^H)^{-1}$ | $\mathcal{O}(MK^2 + K^3)$ | Noise enhancement (severe for ill-conditioned $H$) | Optimal (eliminates MUI) |
| MMSE (RZF) | $W \propto H^H (H H^H + \frac{K\sigma^2}{P} I)^{-1}$ | $\mathcal{O}(MK^2 + K^3)$ | Reduces to MRT | Reduces to ZF |
Here $H \in \mathbb{C}^{K \times M}$ is the downlink channel, $P$ the total transmit power, $\sigma^2$ the noise variance, and MUI multi-user interference.
Key Takeaway
A comparison is only as good as its weakest baseline. Always compare against the strongest known practical scheme under identical system assumptions. Include at least one information-theoretic bound (capacity, DPC region) to show how much room for improvement remains. Plot performance and complexity.
Baseline
A reference scheme against which a proposed algorithm is compared. Standard baselines in wireless MIMO: MRT, ZF, MMSE for precoding; LS, MMSE for channel estimation; OMA for multiple access. A fair paper includes at least the relevant baselines for its problem category.
Related: Monte Carlo Simulation, SNR (Signal-to-Noise Ratio)
Pareto Frontier
The set of schemes that are not dominated in both performance and complexity. A Pareto-optimal scheme cannot be improved in one metric without degrading the other. The standard way to compare algorithms with different complexity-performance trade-offs.
Related: Baseline