Benchmarking and Fair Comparisons
Why Baselines Make or Break a Paper
A new algorithm is only as impressive as the baselines it is compared against. Showing that "our method beats random" is uninformative. Showing that "our method beats the state-of-the-art under identical conditions" is the standard of evidence expected in wireless research.
Fair benchmarking requires three things:
- Same system model: identical channel, CSI, and constraint assumptions for all schemes.
- Same metric: compare sum rate to sum rate, not sum rate to max-min rate.
- Same resources: if your scheme uses more power, antennas, or computation, say so.
This section catalogs standard baselines and common comparison pitfalls.
Common Baseline Comparisons in Wireless Research
The following table lists frequently used baseline pairs. When proposing a new scheme, compare against the appropriate baselines for your problem category.
| Problem area | Standard baselines | What to show |
|---|---|---|
| MIMO precoding | MRT, ZF, MMSE, WMMSE | Sum rate vs. SNR, vs. number of users |
| Multiple access | OMA (TDMA/OFDMA), NOMA, RSMA | Sum rate, user fairness |
| Channel estimation | LS, MMSE, genie-aided (perfect CSI) | MSE vs. SNR, rate with estimated CSI |
| Detection | ZF, MMSE, ML (or near-ML like sphere decoding) | BER vs. SNR, complexity comparison |
| Resource allocation | Equal allocation, water-filling, exhaustive search | Rate/EE vs. number of users/subcarriers |
| Beamforming (single-user) | MRT, eigen-beamforming | Rate vs. SNR |
| Massive MIMO | MR combining, ZF, MMSE, capacity bound | Rate vs. antennas, vs. users |
| RIS/IRS | No RIS, random phase, SDR upper bound | Rate gain vs. number of elements |
| Deep learning approaches | Model-based (WMMSE, ZF), other DL methods | Rate, convergence speed, complexity |
A paper that does not include at least the basic baselines for its category will face criticism during peer review.
Definition: Fair Comparison
A comparison between schemes A and B is fair if:
- Both schemes operate under identical system assumptions (same number of antennas $M$, number of users $K$, SNR, channel model, CSI quality)
- Both schemes use identical resources (same total transmit power, same bandwidth, same number of time slots)
- Both schemes are evaluated with the same performance metric
- Both schemes are given their best operating parameters (e.g., optimal regularization for MMSE, not an arbitrary value)
- The same Monte Carlo realizations are used for both schemes (using a shared random seed eliminates inter-experiment variance)
Violating any of these conditions biases the comparison.
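The last condition is the easiest to get wrong silently. Below is a minimal sketch of a shared-seed Monte Carlo harness, assuming two hypothetical rate functions `scheme_a` and `scheme_b` (anything that maps a channel matrix and SNR to a sum rate):

```python
import numpy as np

def run_comparison(scheme_a, scheme_b, M=8, K=4, snr_db=10.0, n_trials=1000, seed=0):
    """Evaluate two schemes on identical channel realizations (shared seed)."""
    rng = np.random.default_rng(seed)   # ONE generator feeds both schemes
    snr = 10 ** (snr_db / 10)
    rates_a, rates_b = [], []
    for _ in range(n_trials):
        # Draw one realization and evaluate BOTH schemes on it, so the
        # difference in averages is free of inter-experiment variance.
        H = (rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M))) / np.sqrt(2)
        rates_a.append(scheme_a(H, snr))
        rates_b.append(scheme_b(H, snr))
    return np.mean(rates_a), np.mean(rates_b)
```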
Example: Spotting an Unfair Comparison
A paper proposes a NOMA scheme and compares it to OFDMA. The simulation shows that NOMA achieves 30% higher sum rate. However, you notice the following in the simulation setup:
- NOMA: 2 users per subcarrier, successive interference cancellation (SIC) with perfect CSI
- OFDMA: 1 user per subcarrier, no CSI at the transmitter
Is this a fair comparison? Identify the biases.
Bias 1 (CSI asymmetry): NOMA uses perfect CSI (needed for SIC ordering and power allocation), while OFDMA has no CSIT. Giving OFDMA the same CSI would allow frequency-selective scheduling, dramatically improving its performance.
Bias 2 (user pairing advantage): NOMA serves 2 users per subcarrier, effectively doubling spectral efficiency by exploiting the near-far effect. A fair comparison should either (a) allow OFDMA the same total resources or (b) compare at the same total number of served users.
Bias 3 (perfect SIC assumption): In practice, SIC suffers from error propagation and imperfect channel estimation. Assuming perfect SIC inflates NOMA's gains.
Conclusion: The 30% gain is an artifact of the asymmetric assumptions, not an inherent advantage of NOMA.
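To see how the gap shrinks under symmetric assumptions, here is a toy two-user computation, assuming both schemes get the same total power, bandwidth, and perfect CSI; the channel gains and power split are illustrative values, not taken from the example paper:

```python
import numpy as np

P, N0 = 1.0, 1.0          # total transmit power, noise PSD (normalized)
g1, g2 = 10.0, 0.5        # strong-user / weak-user channel power gains

# NOMA: superposition over the full band; the weak user gets most of the power.
a1, a2 = 0.2, 0.8         # power split, a1 + a2 = 1
r1_noma = np.log2(1 + a1 * P * g1 / N0)                  # strong user, after SIC
r2_noma = np.log2(1 + a2 * P * g2 / (a1 * P * g2 + N0))  # weak user, interference-limited

# OFDMA: orthogonal halves of the band (so half the noise power each),
# same total power P split across the two users.
p1, p2 = 0.5 * P, 0.5 * P
r1_ofdma = 0.5 * np.log2(1 + p1 * g1 / (0.5 * N0))
r2_ofdma = 0.5 * np.log2(1 + p2 * g2 / (0.5 * N0))

print(f"NOMA : sum rate = {r1_noma + r2_noma:.2f} bit/s/Hz")
print(f"OFDMA: sum rate = {r1_ofdma + r2_ofdma:.2f} bit/s/Hz")
```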
Pitfall: Unfair Comparisons
The most common ways papers create unfair comparisons:
- Giving the proposed scheme better CSI than the baselines
- Optimizing the proposed scheme's parameters while using default parameters for baselines
- Comparing at a single SNR point where the proposed scheme happens to excel, hiding regions where it does not
- Using a simple baseline when a strong one exists (e.g., comparing to MRT instead of MMSE)
- Ignoring complexity: A scheme that needs $\mathcal{O}(M^3)$ computation vs. a baseline with $\mathcal{O}(MK)$ should acknowledge this
- Cherry-picking user geometries that favor the proposed scheme (e.g., specific near-far ratios for NOMA)
When reading a paper, always ask: "Would the proposed scheme still win if the baseline were given the same advantages?"
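One of these pitfalls, default parameters for baselines, is cheap to avoid. A minimal sketch, assuming a regularized-ZF (MMSE-style) baseline whose regularization is grid-searched rather than hard-coded; the sweep range and setup are illustrative:

```python
import numpy as np

def sum_rate_rzf(H, alpha, P):
    """Sum rate of RZF precoding W = H^H (H H^H + alpha I)^{-1},
    normalized to total power P, with unit noise power."""
    K, M = H.shape
    W = H.conj().T @ np.linalg.inv(H @ H.conj().T + alpha * np.eye(K))
    W = W * np.sqrt(P) / np.linalg.norm(W, 'fro')   # enforce the power budget
    G = H @ W                                       # effective K x K channel
    sig = np.abs(np.diag(G)) ** 2
    intf = np.sum(np.abs(G) ** 2, axis=1) - sig
    return np.sum(np.log2(1 + sig / (intf + 1.0)))

rng = np.random.default_rng(1)
M, K, P = 8, 4, 10.0
H = (rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M))) / np.sqrt(2)

# Grid-search the regularization instead of using an arbitrary default;
# for this setup the classical choice alpha = K / P should be near-optimal.
alphas = np.logspace(-3, 2, 50)
best = max(alphas, key=lambda a: sum_rate_rzf(H, a, P))
print(f"best alpha = {best:.3g} (K/P = {K / P:.3g}), "
      f"rate = {sum_rate_rzf(H, best, P):.2f} bit/s/Hz")
```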
MRT vs. ZF vs. MMSE Precoding: Sum Rate
Compare the sum-rate performance of three standard linear precoders as a function of SNR. Varying the number of base station antennas $M$ and users $K$ shows how the gap between schemes changes. Key observations: MRT is optimal at low SNR, ZF dominates at high SNR, and MMSE interpolates between both. As $M/K$ grows (massive MIMO regime), all three converge.
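A minimal Monte Carlo sketch of this comparison, assuming i.i.d. Rayleigh channels, unit noise power, and a total-power constraint (simplifications relative to a full system simulation); the printed averages should reproduce the crossover described above:

```python
import numpy as np

def sum_rate(H, W, P):
    """Sum rate for precoder W (M x K), total power P, unit noise power."""
    W = W * np.sqrt(P) / np.linalg.norm(W, 'fro')
    G = H @ W
    sig = np.abs(np.diag(G)) ** 2
    intf = np.sum(np.abs(G) ** 2, axis=1) - sig
    return np.sum(np.log2(1 + sig / (intf + 1.0)))

M, K, n_trials = 8, 4, 200
rng = np.random.default_rng(0)
for snr_db in (-10, 0, 10, 20, 30):
    P = 10 ** (snr_db / 10)
    acc = {"MRT": 0.0, "ZF": 0.0, "MMSE": 0.0}
    for _ in range(n_trials):
        H = (rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M))) / np.sqrt(2)
        Hh = H.conj().T
        acc["MRT"]  += sum_rate(H, Hh, P)                          # matched filter
        acc["ZF"]   += sum_rate(H, Hh @ np.linalg.inv(H @ Hh), P)  # channel inversion
        acc["MMSE"] += sum_rate(H, Hh @ np.linalg.inv(H @ Hh + (K / P) * np.eye(K)), P)  # RZF
    print(snr_db, "dB:", {k: round(v / n_trials, 2) for k, v in acc.items()})
```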
Complexity Must Accompany Performance
Performance gains are meaningless without context. A paper that proposes a new detector must compare both performance (BER) and complexity (flops, latency, memory).
Standard complexity measures in wireless research:
| Operation | Complexity (order of real multiplications) |
|---|---|
| MRT precoding | $\mathcal{O}(MK)$ |
| ZF precoding | $\mathcal{O}(MK^2 + K^3)$ |
| MMSE precoding | $\mathcal{O}(MK^2 + K^3)$ |
| WMMSE (per iteration) | $\mathcal{O}(KM^2 + M^3)$ |
| Sphere decoding (worst case) | $\mathcal{O}(|\mathcal{A}|^K)$ (exponential) |
| MMSE-SIC detection | $\mathcal{O}(MK^3 + K^4)$ |
| ML detection | $\mathcal{O}(|\mathcal{A}|^K)$ (exhaustive) |
Here $M$ is the number of antennas, $K$ the number of users (or streams), and $\mathcal{A}$ the symbol constellation.
A useful visualization: plot performance vs. complexity (e.g., BER vs. flops per symbol) rather than just performance vs. SNR. This Pareto frontier shows the true value of a new algorithm.
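A minimal plotting sketch, assuming each scheme has already been profiled into a (flops per symbol, BER) pair; all numbers below are placeholders, not measurements:

```python
import matplotlib.pyplot as plt

# Hypothetical profiling results: name -> (flops per symbol, BER at 10 dB SNR).
results = {"ZF": (2e3, 1e-2), "MMSE": (3e3, 4e-3),
           "Sphere": (5e5, 1e-4), "Proposed": (2e4, 5e-4)}
for name, (flops, ber) in results.items():
    plt.scatter(flops, ber)
    plt.annotate(name, (flops, ber))
plt.xscale("log")
plt.yscale("log")
plt.xlabel("flops per symbol")
plt.ylabel("BER @ 10 dB SNR")
plt.title("Performance vs. complexity (lower-left is better)")
plt.show()
```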
Upper and Lower Bounds as Benchmarks
Beyond comparing to practical schemes, strong papers include theoretical bounds:
Upper bounds (optimistic benchmarks):
- Perfect CSI at both transmitter and receiver
- Dirty-paper coding (DPC) capacity for the broadcast channel
- Genie-aided detection (interference perfectly known)
Lower bounds (pessimistic benchmarks):
- Worst-case channel realization
- Treating interference as noise
- Single-antenna (SISO) baseline
Showing that your scheme approaches the upper bound or significantly outperforms the lower bound strengthens the paper. If your scheme is within 1 dB of the capacity upper bound, further improvement is not practically meaningful.
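As one concrete upper bound, the perfect-CSI point-to-point MIMO capacity with water-filling power allocation is straightforward to compute. A sketch, assuming unit noise variance and an i.i.d. Rayleigh test channel:

```python
import numpy as np

def waterfilling_capacity(H, P, sigma2=1.0):
    """Capacity of y = Hx + n under total power P, via water-filling
    over the eigenmodes of H^H H."""
    g = np.linalg.svd(H, compute_uv=False) ** 2 / sigma2  # eigenmode SNRs
    g = np.sort(g)[::-1]
    for n in range(len(g), 0, -1):
        mu = (P + np.sum(1.0 / g[:n])) / n                # candidate water level
        if mu > 1.0 / g[n - 1]:                           # all n modes get power
            p = np.maximum(mu - 1.0 / g[:n], 0.0)
            return np.sum(np.log2(1 + p * g[:n]))
    return 0.0

rng = np.random.default_rng(2)
H = (rng.standard_normal((4, 8)) + 1j * rng.standard_normal((4, 8))) / np.sqrt(2)
print(f"capacity upper bound: {waterfilling_capacity(H, P=10.0):.2f} bit/s/Hz")
```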
Quick Check
A paper compares a new beamforming scheme against ZF precoding. The new scheme uses estimated CSI from uplink pilots, while ZF uses perfect CSI. The paper shows ZF is better. What is the problem with this comparison?
- Nothing; the comparison is fair
- The comparison favors the proposed scheme
- ZF with perfect CSI is an upper bound, not a fair baseline; a fair comparison would give ZF the same estimated CSI
- ZF should never be used as a baseline
Answer: the third option. ZF with perfect CSI is a useful upper bound but should be labeled as such. For a fair comparison, both schemes should use the same estimated channel. Alternatively, show both: ZF with perfect CSI as a bound and ZF with estimated CSI as the fair baseline.
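A sketch of the repaired comparison, assuming a simple LS-style estimation-error model in which the estimate is the true channel plus noise scaled by the pilot SNR (the pilot model and all parameters are illustrative):

```python
import numpy as np

def zf_rate(H_true, H_est, P):
    """Sum rate when the ZF precoder is built from H_est but the signal
    propagates through H_true (unit noise power)."""
    W = H_est.conj().T @ np.linalg.inv(H_est @ H_est.conj().T)
    W = W * np.sqrt(P) / np.linalg.norm(W, 'fro')
    G = H_true @ W
    sig = np.abs(np.diag(G)) ** 2
    intf = np.sum(np.abs(G) ** 2, axis=1) - sig
    return np.sum(np.log2(1 + sig / (intf + 1.0)))

rng = np.random.default_rng(3)
M, K, P, pilot_snr = 8, 4, 10.0, 10.0
H = (rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M))) / np.sqrt(2)
# LS-style estimate: true channel plus estimation noise scaled by pilot SNR.
H_ls = H + (rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M))) / np.sqrt(2 * pilot_snr)

print(f"ZF, perfect CSI (bound): {zf_rate(H, H,    P):.2f} bit/s/Hz")
print(f"ZF, LS-estimated CSI   : {zf_rate(H, H_ls, P):.2f} bit/s/Hz")
```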
Standard Simulation Scenarios
Using standardized scenarios enables cross-paper comparison. The most widely adopted are:
3GPP evaluation methodology:
- Urban Macro (UMa): ISD 500 m, 2 GHz carrier
- Urban Micro (UMi): ISD 200 m, 3.5 GHz carrier
- Indoor Hotspot (InH): open office, 30 GHz carrier
- Rural Macro (RMa): ISD 1732 m, 700 MHz carrier
Channel models:
- 3GPP TR 38.901 (NR channel model)
- COST 2100 / QuaDRiGa
- IEEE 802.11 TGn/TGax (for Wi-Fi)
- Saleh-Valenzuela (clustered mmWave)
Link-level parameters (common defaults):
- OFDM with 15 kHz subcarrier spacing (NR numerology 0)
- 5G NR LDPC codes
- Resource block: 12 subcarriers × 14 OFDM symbols (one slot)
Using these standard scenarios does not replace a custom evaluation, but including at least one standard scenario makes the work comparable to other published results.
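One lightweight way to make the scenario explicit is to encode it as a config object. A sketch, with field names of my own choosing and values taken from the list above (the InH ISD is not listed, so it is left unset):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Scenario:
    name: str
    carrier_ghz: float             # carrier frequency [GHz]
    isd_m: Optional[float] = None  # inter-site distance [m], if specified

SCENARIOS = {
    "UMa": Scenario("Urban Macro", 2.0, 500.0),
    "UMi": Scenario("Urban Micro", 3.5, 200.0),
    "InH": Scenario("Indoor Hotspot (open office)", 30.0),  # ISD not listed above
    "RMa": Scenario("Rural Macro", 0.7, 1732.0),
}

print(SCENARIOS["UMa"])
```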
Definition: Pareto Efficiency in Algorithm Comparison
An algorithm A is Pareto-dominated by algorithm B if B achieves equal or better performance (rate, BER) and equal or lower complexity (flops, latency), with strict improvement in at least one, under the same system assumptions.
A fair comparison should plot the Pareto frontier: performance vs. complexity for all compared schemes. An algorithm on the frontier cannot be improved in one metric without degrading the other. Algorithms below the frontier are strictly dominated and should not be recommended for any operating point.
This is more informative than showing performance at a single SNR point, which hides the complexity dimension.
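A minimal sketch of extracting the non-dominated set from (complexity, performance) pairs; the scheme names and numbers are illustrative, not measurements:

```python
def pareto_frontier(schemes):
    """schemes: dict name -> (complexity, performance); lower complexity and
    higher performance are better. Returns the non-dominated set."""
    frontier = {}
    for name, (c, p) in schemes.items():
        dominated = any(c2 <= c and p2 >= p and (c2, p2) != (c, p)
                        for n2, (c2, p2) in schemes.items() if n2 != name)
        if not dominated:
            frontier[name] = (c, p)
    return frontier

schemes = {"MRT": (1e2, 10.0), "ZF": (1e3, 18.0), "MMSE": (1e3, 19.0),
           "WMMSE": (1e5, 21.0), "NewAlg": (1e4, 18.5)}
print(pareto_frontier(schemes))  # ZF and NewAlg are dominated by MMSE here
```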
Standard Linear Precoder Comparison
| Precoder | Formula | Complexity | Low-SNR behavior | High-SNR behavior |
|---|---|---|---|---|
| MRT | $W \propto H^H$ | $\mathcal{O}(MK)$ | Optimal (maximises received power) | Poor (ignores MUI) |
| ZF | $W \propto H^H (H H^H)^{-1}$ | $\mathcal{O}(MK^2 + K^3)$ | Noise enhancement (severe for ill-conditioned $H$) | Optimal (eliminates MUI) |
| MMSE (RZF) | $W \propto H^H (H H^H + \frac{K\sigma^2}{P} I)^{-1}$ | $\mathcal{O}(MK^2 + K^3)$ | Reduces to MRT | Reduces to ZF |
Here $H \in \mathbb{C}^{K \times M}$ is the downlink channel, $P$ the total transmit power, $\sigma^2$ the noise variance, and MUI multi-user interference.
Key Takeaway
A comparison is only as good as its weakest baseline. Always compare against the strongest known practical scheme under identical system assumptions. Include at least one information-theoretic bound (capacity, DPC region) to show how much room for improvement remains. Plot performance and complexity.
Baseline
A reference scheme against which a proposed algorithm is compared. Standard baselines in wireless MIMO: MRT, ZF, MMSE for precoding; LS, MMSE for channel estimation; OMA for multiple access. A fair paper includes at least the relevant baselines for its problem category.
Related: Monte Carlo Simulation, SNR (Signal-to-Noise Ratio)
Pareto Frontier
The set of schemes that are not dominated in both performance and complexity. A Pareto-optimal scheme cannot be improved in one metric without degrading the other. The standard way to compare algorithms with different complexity-performance trade-offs.
Related: Baseline