Benchmarking and Fair Comparisons

Why Baselines Make or Break a Paper

A new algorithm is only as impressive as the baselines it is compared against. Showing that "our method beats random" is uninformative. Showing that "our method beats the state-of-the-art under identical conditions" is the standard of evidence expected in wireless research.

Fair benchmarking requires three things:

  1. Same system model: identical channel, CSI, and constraint assumptions for all schemes.
  2. Same metric: compare sum rate to sum rate, not sum rate to max-min rate.
  3. Same resources: if your scheme uses more power, antennas, or computation, say so.

This section catalogs standard baselines and common comparison pitfalls.

Common Baseline Comparisons in Wireless Research

The following table lists frequently used baseline pairs. When proposing a new scheme, compare against the appropriate baselines for your problem category.

Problem area | Standard baselines | What to show
MIMO precoding | MRT, ZF, MMSE, WMMSE | Sum rate vs. SNR, vs. K
Multiple access | OMA (TDMA/OFDMA), NOMA, RSMA | Sum rate, user fairness
Channel estimation | LS, MMSE, genie-aided (perfect CSI) | MSE vs. SNR, rate with estimated CSI
Detection | ZF, MMSE, ML (or near-ML like sphere decoding) | BER vs. SNR, complexity comparison
Resource allocation | Equal allocation, water-filling, exhaustive search | Rate/EE vs. number of users/subcarriers
Beamforming (single-user) | MRT, eigen-beamforming | Rate vs. SNR
Massive MIMO | MR combining, ZF, MMSE, capacity bound | Rate vs. N antennas, vs. K users
RIS/IRS | No RIS, random phase, SDR upper bound | Rate gain vs. number of elements
Deep learning approaches | Model-based (WMMSE, ZF), other DL methods | Rate, convergence speed, complexity

A paper that does not include at least the basic baselines for its category will face criticism during peer review.

Definition:

Fair Comparison

A comparison between schemes A and B is fair if:

  1. Both schemes operate under identical system assumptions (same N_t, N_r, K, channel model, CSI quality)
  2. Both schemes use identical resources (same total transmit power, same bandwidth, same number of time slots)
  3. Both schemes are evaluated with the same performance metric
  4. Both schemes are given their best operating parameters (e.g., optimal regularization for MMSE, not an arbitrary value)
  5. The same Monte Carlo realizations are used for both schemes (using a shared random seed eliminates inter-experiment variance)

Violating any of these conditions biases the comparison.
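The shared-realization requirement (condition 5) can be sketched as a small Monte Carlo harness. This is an illustrative setup, not a prescribed one: the function name, the i.i.d. Rayleigh channel, and the SINR-based sum-rate formula are assumptions chosen for the sketch.

```python
import numpy as np

def compare_sum_rate(precoders, n_t=8, k=4, snr_db=10.0, n_trials=500, seed=0):
    """Evaluate every precoder on the SAME channel realizations.

    precoders: dict mapping scheme name -> function H (k x n_t) -> W (n_t x k).
    Returns average sum rate (bits/s/Hz) per scheme.
    """
    rng = np.random.default_rng(seed)      # shared seed: identical draws for all schemes
    snr = 10 ** (snr_db / 10)
    rates = {name: 0.0 for name in precoders}
    for _ in range(n_trials):
        # One i.i.d. Rayleigh channel draw, reused by every scheme in this trial.
        h = (rng.standard_normal((k, n_t)) + 1j * rng.standard_normal((k, n_t))) / np.sqrt(2)
        for name, precode in precoders.items():
            w = precode(h)
            w = w / np.linalg.norm(w)      # same total transmit power for all schemes
            g = h @ w                      # effective channel: g[i, j] = user i, beam j
            sig = snr * np.abs(np.diag(g)) ** 2
            interf = snr * (np.sum(np.abs(g) ** 2, axis=1) - np.abs(np.diag(g)) ** 2)
            rates[name] += np.sum(np.log2(1 + sig / (1 + interf))) / n_trials
    return rates
```

Because every scheme sees the same channel draws and the same power budget, any difference in the returned rates reflects the precoders themselves rather than sampling noise.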

Example: Spotting an Unfair Comparison

A paper proposes a NOMA scheme and compares it to OFDMA. The simulation shows that NOMA achieves a 30% higher sum rate. However, you notice the following in the simulation setup:

  • NOMA: 2 users per subcarrier, successive interference cancellation (SIC) with perfect CSI
  • OFDMA: 1 user per subcarrier, no CSI at the transmitter

Is this a fair comparison? Identify the biases.

Pitfall: Unfair Comparisons

The most common ways papers create unfair comparisons:

  • Giving the proposed scheme better CSI than the baselines
  • Optimizing the proposed scheme's parameters while using default parameters for baselines
  • Comparing at a single SNR point where the proposed scheme happens to excel, hiding regions where it does not
  • Using a simple baseline when a strong one exists (e.g., comparing to MRT instead of MMSE)
  • Ignoring complexity: a scheme that needs O(N^3) computation vs. a baseline with O(N) should acknowledge this
  • Cherry-picking user geometries that favor the proposed scheme (e.g., specific near-far ratios for NOMA)

When reading a paper, always ask: "Would the proposed scheme still win if the baseline were given the same advantages?"

MRT vs. ZF vs. MMSE Precoding: Sum Rate

Compare the sum-rate performance of three standard linear precoders as a function of SNR. Adjust the number of base station antennas N and users K to see how the gap between schemes changes. Key observations: MRT is optimal at low SNR, ZF dominates at high SNR, and MMSE interpolates between the two. As N/K grows (the massive MIMO regime), all three converge.

[Interactive chart: sum rate vs. SNR for MRT, ZF, and MMSE. Default parameters: N = 16 antennas, K = 4 users.]
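As a rough sketch, the three linear precoders can be written directly from their closed-form expressions. The regularization choice α = K/SNR for MMSE/RZF is a common textbook default assumed here, not the only valid one.

```python
import numpy as np

def mrt(h):
    """Maximum ratio transmission: W = H^H (matched filter)."""
    return h.conj().T

def zf(h):
    """Zero forcing: W = H^H (H H^H)^{-1}, removes multi-user interference."""
    return h.conj().T @ np.linalg.inv(h @ h.conj().T)

def mmse(h, snr):
    """Regularized ZF / MMSE: W = H^H (H H^H + alpha I)^{-1}.

    alpha = K / snr is a common choice; it makes MMSE reduce to MRT
    as snr -> 0 and to ZF as snr -> infinity.
    """
    k = h.shape[0]
    alpha = k / snr
    return h.conj().T @ np.linalg.inv(h @ h.conj().T + alpha * np.eye(k))
```

Note how the limiting behavior in the code mirrors the qualitative claim: for very low SNR the regularizer dominates and the MMSE direction matches MRT; for very high SNR it vanishes and MMSE coincides with ZF.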

Complexity Must Accompany Performance

Performance gains are meaningless without context. A paper that proposes a new detector must compare both performance (BER) and complexity (flops, latency, memory).

Standard complexity measures in wireless research:

Operation | Complexity (real multiplications)
MRT precoding | O(N_t K)
ZF precoding | O(N_t K^2 + K^3)
MMSE precoding | O(N_t K^2 + K^3)
WMMSE (per iteration) | O(N_t K^2 + K^3)
Sphere decoding (worst case) | O(M^{N_t}) (exponential)
MMSE-SIC detection | O(N_t^2 K)
ML detection | O(M^{N_t}) (exhaustive)

A useful visualization: plot performance vs. complexity (e.g., BER vs. flops per symbol) rather than just performance vs. SNR. This Pareto frontier shows the true value of a new algorithm.
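To make the growth rates concrete, the order-of-growth formulas can be evaluated for an example configuration. The values N_t = 64, K = 8, M = 4 are illustrative choices for this sketch, not taken from any standard, and constant factors are dropped.

```python
# Order-of-growth formulas evaluated for one example configuration
# (illustrative values; constants dropped).
n_t, k, m = 64, 8, 4

complexity = {
    "MRT precoding":         n_t * k,
    "ZF/MMSE precoding":     n_t * k**2 + k**3,
    "WMMSE (per iteration)": n_t * k**2 + k**3,
    "MMSE-SIC detection":    n_t**2 * k,
    "ML detection":          m ** n_t,   # exponential in N_t: dwarfs everything else
}

for name, flops in complexity.items():
    print(f"{name:>22}: {flops:.3g}")
```

Even at this modest size, the exponential ML count is astronomically larger than any of the polynomial schemes, which is exactly why near-ML methods like sphere decoding exist.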

Upper and Lower Bounds as Benchmarks

Beyond comparing to practical schemes, strong papers include theoretical bounds:

Upper bounds (optimistic benchmarks):

  • Perfect CSI at both transmitter and receiver
  • Dirty-paper coding (DPC) capacity for the broadcast channel
  • Genie-aided detection (interference perfectly known)

Lower bounds (pessimistic benchmarks):

  • Worst-case channel realization
  • Treating interference as noise
  • Single-antenna (SISO) baseline

Showing that your scheme approaches the upper bound, or clearly outperforms the lower bound, strengthens the paper. For example, if your scheme is within 1 dB of the capacity upper bound, little practical room for improvement remains.
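As a minimal illustration of bracketing a scheme between bounds, consider a two-user setting with a genie-aided upper bound (interference perfectly cancelled) and a treat-interference-as-noise lower bound. The function and its gain parameters are hypothetical, introduced only for this sketch.

```python
import numpy as np

def bounds_two_user(g_direct, g_cross, snr):
    """Per-user rate bounds (bits/s/Hz) in a 2-user interference setting.

    g_direct: power gain of the desired link.
    g_cross:  power gain of the interfering link.
    Upper bound: genie-aided, interference perfectly known and removed.
    Lower bound: treat interference as noise (TIN).
    """
    upper = np.log2(1 + snr * g_direct)
    lower = np.log2(1 + snr * g_direct / (1 + snr * g_cross))
    return lower, upper
```

Any achievable scheme for this user must land between the two returned values; a scheme whose rate hugs the upper bound leaves little to gain from better interference handling.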

Quick Check

A paper compares a new beamforming scheme against ZF precoding. The new scheme uses estimated CSI from uplink pilots, while ZF uses perfect CSI. The paper shows ZF is better. What is the problem with this comparison?

  • Nothing; the comparison is fair
  • The comparison favors the proposed scheme
  • ZF with perfect CSI is an upper bound, not a fair baseline; a fair comparison would give ZF the same estimated CSI
  • ZF should never be used as a baseline

Standard Simulation Scenarios

Using standardized scenarios enables cross-paper comparison. The most widely adopted are:

3GPP evaluation methodology:

  • Urban Macro (UMa): ISD 500 m, 2 GHz carrier
  • Urban Micro (UMi): ISD 200 m, 3.5 GHz carrier
  • Indoor Hotspot (InH): open office, 30 GHz carrier
  • Rural Macro (RMa): ISD 1732 m, 700 MHz carrier

Channel models:

  • 3GPP TR 38.901 (NR channel model)
  • COST 2100 / QuaDRiGa
  • IEEE 802.11 TGn/TGax (for Wi-Fi)
  • Saleh-Valenzuela (clustered mmWave)

Link-level parameters (common defaults):

  • OFDM with 15 kHz subcarrier spacing (NR numerology 0)
  • 5G NR LDPC codes
  • Resource block: 12 subcarriers × 14 OFDM symbols

Using these standard scenarios does not replace a custom evaluation, but including at least one standard scenario makes the work comparable to other published results.
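For reproducibility it helps to keep the scenario parameters in one configuration table that the simulation reads from. This sketch only records the values listed above; the dictionary layout itself is an assumption.

```python
# 3GPP evaluation scenarios (ISD and carrier as listed in the text above).
SCENARIOS = {
    "UMa": {"isd_m": 500,  "carrier_ghz": 2.0},   # Urban Macro
    "UMi": {"isd_m": 200,  "carrier_ghz": 3.5},   # Urban Micro
    "InH": {"isd_m": None, "carrier_ghz": 30.0},  # Indoor Hotspot (open office; no ISD given)
    "RMa": {"isd_m": 1732, "carrier_ghz": 0.7},   # Rural Macro
}

def carrier_hz(scenario):
    """Look up the carrier frequency in Hz for a named scenario."""
    return SCENARIOS[scenario]["carrier_ghz"] * 1e9
```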

Definition:

Pareto Efficiency in Algorithm Comparison

An algorithm A is Pareto-dominated by algorithm B if B achieves equal or better performance (rate, BER) and equal or lower complexity (flops, latency) under the same system assumptions, with strict improvement in at least one of the two metrics.

A fair comparison should plot the Pareto frontier: performance vs. complexity for all compared schemes. An algorithm on the frontier cannot be improved in one metric without degrading the other. Algorithms below the frontier are strictly dominated and should not be recommended for any operating point.

This is more informative than showing performance at a single SNR point, which hides the complexity dimension.
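The frontier can be extracted mechanically from (complexity, performance) pairs. This sketch assumes lower complexity and higher performance are better; the scheme names and numbers in the usage example are placeholders.

```python
def pareto_frontier(schemes):
    """Return the names of non-dominated schemes.

    schemes: dict name -> (complexity, performance),
    where lower complexity and higher performance are better.
    """
    frontier = []
    for a, (ca, pa) in schemes.items():
        dominated = any(
            cb <= ca and pb >= pa and (cb < ca or pb > pa)  # at least one strict improvement
            for b, (cb, pb) in schemes.items() if b != a
        )
        if not dominated:
            frontier.append(a)
    return frontier
```

For example, a scheme with the same complexity as another but lower rate is dominated and drops off the frontier, while a cheap low-rate scheme and an expensive high-rate scheme can both remain on it.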

Standard Linear Precoder Comparison

Precoder | Formula | Complexity | Low-SNR behavior | High-SNR behavior
MRT | W = H^H | O(N_t K) | Optimal (maximizes received power) | Poor (ignores MUI)
ZF | W = H^H (H H^H)^{-1} | O(N_t K^2 + K^3) | Poor (noise enhancement) | Optimal (eliminates MUI)
MMSE (RZF) | W = H^H (H H^H + α I)^{-1} | O(N_t K^2 + K^3) | Reduces to MRT | Reduces to ZF

Key Takeaway

A comparison is only as good as its weakest baseline. Always compare against the strongest known practical scheme under identical system assumptions. Include at least one information-theoretic bound (capacity, DPC region) to show how much room for improvement remains. Plot complexity alongside performance.

Baseline

A reference scheme against which a proposed algorithm is compared. Standard baselines in wireless MIMO: MRT, ZF, MMSE for precoding; LS, MMSE for channel estimation; OMA for multiple access. A fair paper includes at least the relevant baselines for its problem category.

Related: Monte Carlo Simulation, SNR (Signal-to-Noise Ratio)

Pareto Frontier

The set of schemes that are not dominated in both performance and complexity. A Pareto-optimal scheme cannot be improved in one metric without degrading the other. The standard way to compare algorithms with different complexity-performance trade-offs.

Related: Baseline