Evaluation Metrics

What Gets Measured Gets Optimised

The choice of evaluation metric shapes the conclusions of every RF imaging paper. PSNR, SSIM, and LPIPS can disagree on which reconstruction is "better." Shape metrics (Chamfer, IoU) matter for 3D reconstruction. Detection metrics ($P_D$, ROC) matter for radar applications. Computational metrics (FLOPs, time) determine practical deployability. This section defines each metric precisely and shows when they agree and disagree.

Definition:

Peak Signal-to-Noise Ratio (PSNR)

PSNR measures the reconstruction quality relative to the dynamic range of the image:

$$\mathrm{PSNR} = 10\log_{10}\!\left( \frac{\max_q |\mathbf{c}_{q}|^2}{\frac{1}{Q}\sum_{q=1}^{Q}|\hat{\mathbf{c}}_q - \mathbf{c}_{q}|^2} \right) = 10\log_{10}\!\left(\frac{\mathbf{c}_{\max}^{2}}{\mathrm{MSE}}\right)$$

where $\mathbf{c}_{\max}$ is the maximum scene value and $\mathrm{MSE}$ is the mean squared error over $Q$ voxels.

Typical values: $> 30$ dB (good), $> 40$ dB (excellent), $< 20$ dB (poor).

PSNR is the most widely used metric but has limitations: it does not capture perceptual quality, structural preservation, or the distribution of errors across the image. PSNR favours smooth reconstructions that minimise total error energy.
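A minimal numpy sketch of the definition above (the function name and the convention of taking the peak over the ground-truth scene are our choices; some libraries use the peak of the data type instead):

```python
import numpy as np

def psnr_db(c_hat: np.ndarray, c: np.ndarray) -> float:
    """PSNR in dB. `c` is the ground-truth scene, `c_hat` the reconstruction;
    both may be complex, matching the |.|^2 terms in the definition."""
    mse = np.mean(np.abs(c_hat - c) ** 2)   # MSE over all Q voxels
    peak = np.max(np.abs(c)) ** 2           # maximum scene power
    return 10.0 * np.log10(peak / mse)
```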

Definition:

Structural Similarity Index (SSIM)

SSIM compares local image patches in terms of luminance, contrast, and structure:

$$\mathrm{SSIM}(\mathbf{x}, \hat{\mathbf{x}}) = \frac{(2\mu_x\mu_{\hat{x}} + c_1)(2\sigma_{x\hat{x}} + c_2)}{(\mu_x^2 + \mu_{\hat{x}}^2 + c_1)(\sigma_x^2 + \sigma_{\hat{x}}^2 + c_2)}$$

where $\mu$, $\sigma^2$, and $\sigma_{x\hat{x}}$ are the local mean, variance, and cross-covariance computed over a sliding window, and $c_1, c_2$ are stabilisation constants.

SSIM $\in [0, 1]$, where $1$ indicates perfect reconstruction. The overall SSIM is the average over all windows (MSSIM).

SSIM is more perceptually meaningful than PSNR because it captures structural information. For RF imaging, SSIM correlates better with target detection performance than PSNR.
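In practice SSIM is rarely implemented by hand; a sketch using scikit-image's implementation on a magnitude image (the test images here are synthetic placeholders):

```python
import numpy as np
from skimage.metrics import structural_similarity

rng = np.random.default_rng(0)
truth = rng.random((128, 128))                       # placeholder magnitude image
recon = truth + 0.05 * rng.standard_normal(truth.shape)

# data_range must cover the dynamic range of the images; the result is the
# mean SSIM (MSSIM) over all sliding windows.
mssim = structural_similarity(truth, recon,
                              data_range=float(truth.max() - truth.min()))
print(f"MSSIM = {mssim:.3f}")
```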

Definition:

Learned Perceptual Image Patch Similarity (LPIPS)

LPIPS measures perceptual distance using features from a pretrained deep network (typically VGG or AlexNet):

$$\mathrm{LPIPS}(\mathbf{x}, \hat{\mathbf{x}}) = \sum_l w_l \cdot \|f_l(\mathbf{x}) - f_l(\hat{\mathbf{x}})\|_2^2$$

where $f_l$ extracts features at layer $l$ and $w_l$ are learned weights. Lower LPIPS means higher perceptual similarity.

LPIPS captures high-level structural features that PSNR and SSIM miss. However, it was trained on natural images, so its relevance to RF reflectivity maps (which look different from photographs) is an open question.
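A usage sketch with the reference `lpips` package (assumption: single-channel RF magnitude maps are replicated to three channels and rescaled to $[-1, 1]$, which is a workaround rather than a validated practice):

```python
import torch
import lpips  # pip install lpips

loss_fn = lpips.LPIPS(net='alex')   # AlexNet backbone; 'vgg' also available

# Placeholder single-channel images in [-1, 1], shape (N, 1, H, W).
x = torch.rand(1, 1, 128, 128) * 2.0 - 1.0
x_hat = (x + 0.1 * torch.randn_like(x)).clamp(-1.0, 1.0)

# LPIPS expects 3-channel inputs, so replicate the channel.
d = loss_fn(x.repeat(1, 3, 1, 1), x_hat.repeat(1, 3, 1, 1))
print(f"LPIPS = {d.item():.4f}")    # lower = more perceptually similar
```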

Definition:

Normalised Mean Squared Error (NMSE)

NMSE measures the relative reconstruction error:

$$\mathrm{NMSE} = \frac{\|\hat{\mathbf{c}} - \mathbf{c}\|_2^2}{\|\mathbf{c}\|_2^2}$$

NMSE $= 0$ is perfect reconstruction; NMSE $= 1$ means the reconstruction error equals the signal energy.

In dB: $\mathrm{NMSE}_{\mathrm{dB}} = 10\log_{10}(\mathrm{NMSE})$. Typical targets: $< -20$ dB (good), $< -30$ dB (excellent).

NMSE is preferred over PSNR when comparing across different scenes because it is normalised by the signal energy. PSNR depends on the dynamic range of each specific image.
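The corresponding one-liner, under the same naming conventions as the PSNR sketch above:

```python
import numpy as np

def nmse_db(c_hat: np.ndarray, c: np.ndarray) -> float:
    """NMSE in dB; 0 dB means the error energy equals the signal energy."""
    return 10.0 * np.log10(np.sum(np.abs(c_hat - c) ** 2) /
                           np.sum(np.abs(c) ** 2))
```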

Definition:

Chamfer and Hausdorff Distances

For 3D reconstructions represented as point sets P\mathcal{P} (reconstruction) and Q\mathcal{Q} (ground truth):

Chamfer distance: $$d_C(\mathcal{P}, \mathcal{Q}) = \frac{1}{|\mathcal{P}|}\sum_{\mathbf{p} \in \mathcal{P}} \min_{\mathbf{q} \in \mathcal{Q}} \|\mathbf{p} - \mathbf{q}\|^2 + \frac{1}{|\mathcal{Q}|}\sum_{\mathbf{q} \in \mathcal{Q}} \min_{\mathbf{p} \in \mathcal{P}} \|\mathbf{q} - \mathbf{p}\|^2$$

Hausdorff distance: $$d_H(\mathcal{P}, \mathcal{Q}) = \max\!\left( \max_{\mathbf{p} \in \mathcal{P}} \min_{\mathbf{q} \in \mathcal{Q}} \|\mathbf{p} - \mathbf{q}\|,\; \max_{\mathbf{q} \in \mathcal{Q}} \min_{\mathbf{p} \in \mathcal{P}} \|\mathbf{q} - \mathbf{p}\| \right)$$

Chamfer is the average nearest-neighbour distance (sensitive to overall shape). Hausdorff is the worst-case distance (sensitive to outliers).
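Both distances reduce to nearest-neighbour queries, so a KD-tree implementation is straightforward. A sketch with scipy (the function name is ours; `scipy.spatial.distance.directed_hausdorff` is an alternative for the Hausdorff terms):

```python
import numpy as np
from scipy.spatial import cKDTree

def chamfer_hausdorff(P: np.ndarray, Q: np.ndarray) -> tuple[float, float]:
    """Chamfer (symmetric mean of squared NN distances) and Hausdorff
    (worst-case NN distance) between point sets of shape (N, 3)."""
    d_pq, _ = cKDTree(Q).query(P)   # distance from each p to its nearest q
    d_qp, _ = cKDTree(P).query(Q)   # distance from each q to its nearest p
    chamfer = float(np.mean(d_pq ** 2) + np.mean(d_qp ** 2))
    hausdorff = float(max(d_pq.max(), d_qp.max()))
    return chamfer, hausdorff
```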

Definition:

IoU and F-Score for Volumetric Evaluation

For volumetric (voxel-based) reconstructions:

Intersection over Union (IoU): $$\mathrm{IoU} = \frac{|\mathcal{V}_{\mathrm{pred}} \cap \mathcal{V}_{\mathrm{true}}|}{|\mathcal{V}_{\mathrm{pred}} \cup \mathcal{V}_{\mathrm{true}}|}$$ where $\mathcal{V}$ is the set of occupied voxels after thresholding. IoU $\in [0, 1]$; higher is better.

F-score at threshold $\tau$: $$F_\tau = \frac{2 \cdot \mathrm{Precision}_\tau \cdot \mathrm{Recall}_\tau}{\mathrm{Precision}_\tau + \mathrm{Recall}_\tau}$$ where precision and recall are computed by counting predicted points within distance $\tau$ of a ground-truth point and vice versa. The F-score at multiple thresholds (e.g., $\tau \in \{\lambda/4, \lambda/2, \lambda\}$) characterises both coarse and fine shape recovery.
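A sketch of both, assuming a voxel grid for IoU and point sets for the F-score (the occupancy threshold and function names are our choices):

```python
import numpy as np
from scipy.spatial import cKDTree

def voxel_iou(pred: np.ndarray, true: np.ndarray, thresh: float = 0.5) -> float:
    """IoU between two reflectivity volumes after occupancy thresholding."""
    p, t = pred > thresh, true > thresh
    union = np.logical_or(p, t).sum()
    return float(np.logical_and(p, t).sum() / union) if union else 1.0

def f_score(P: np.ndarray, Q: np.ndarray, tau: float) -> float:
    """F-score at distance tau: P = predicted points, Q = ground truth."""
    precision = float((cKDTree(Q).query(P)[0] < tau).mean())
    recall = float((cKDTree(P).query(Q)[0] < tau).mean())
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```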

Definition:

Detection Metrics

For radar detection applications:

Probability of detection $P_D$: the fraction of true targets that are correctly detected (above threshold).

Probability of false alarm $P_{\mathrm{FA}}$: the fraction of non-target locations that exceed the detection threshold.

ROC curve: plots $P_D$ vs. $P_{\mathrm{FA}}$ as the threshold varies. The area under the ROC curve (AUC) summarises detection performance in a single number ($\mathrm{AUC} \in [0.5, 1]$).

Operating point: typically report $P_D$ at a fixed $P_{\mathrm{FA}} = 10^{-4}$ or $10^{-6}$, reflecting the practical requirement of low false alarm rates.
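Estimating $P_D$ at a fixed $P_{\mathrm{FA}}$ amounts to setting the threshold at the appropriate quantile of the noise-only detector outputs. A numpy sketch (the Gaussian scores are placeholders; note that estimating $P_{\mathrm{FA}} = 10^{-4}$ reliably requires far more than $10^4$ background samples):

```python
import numpy as np

def pd_at_pfa(target_scores: np.ndarray, background_scores: np.ndarray,
              pfa: float = 1e-4) -> float:
    """P_D at fixed P_FA: threshold at the (1 - pfa) quantile of the
    background scores, then count targets exceeding it."""
    thresh = np.quantile(background_scores, 1.0 - pfa)
    return float(np.mean(target_scores > thresh))

rng = np.random.default_rng(0)
background = rng.standard_normal(1_000_000)   # noise-only detector outputs
targets = 4.0 + rng.standard_normal(1_000)    # target-present outputs
print(f"P_D at P_FA=1e-4: {pd_at_pfa(targets, background):.3f}")
```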

PSNR vs SSIM vs $P_D$: When Metrics Disagree

Apply different types of degradation to a 1D scene with 4 targets on a smooth background and observe how PSNR, SSIM, and detection probability $P_D$ respond differently (a numerical sketch follows the list):

  • Noise degrades PSNR but may preserve target peaks ($P_D$ stays high).
  • Blur preserves PSNR (low total error) but destroys structure (SSIM drops, $P_D$ drops).
  • Missing data affects all metrics.
  • Structured artifacts can fool PSNR while SSIM and $P_D$ degrade.

Diamond markers show detected (green) vs. missed (red) targets.
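A minimal numerical version of this demonstration (the scene, degradation levels, and the crude fixed-threshold detector are illustrative choices):

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d
from skimage.metrics import structural_similarity

rng = np.random.default_rng(0)
targets = [40, 100, 160, 220]
scene = 0.1 * np.ones(256)     # smooth background
scene[targets] = 1.0           # 4 point targets

def report(name: str, recon: np.ndarray) -> None:
    mse = np.mean((recon - scene) ** 2)
    psnr = 10 * np.log10(scene.max() ** 2 / mse)
    ssim = structural_similarity(scene, recon, data_range=1.0)
    p_d = np.mean(recon[targets] > 0.5)   # crude fixed-threshold detector
    print(f"{name:8s}  PSNR={psnr:5.1f} dB  SSIM={ssim:.2f}  P_D={p_d:.2f}")

# Noise and blur can yield similar PSNR, yet blur wipes out the target peaks.
report("noisy", scene + 0.1 * rng.standard_normal(scene.size))
report("blurred", gaussian_filter1d(scene, sigma=4))
```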

ROC Curves for Different Reconstruction Methods

Compare ROC curves for matched filter, LASSO, and a deep network at different SNR levels. At low SNR, the gap between methods widens. The deep network achieves higher $P_D$ at the same $P_{\mathrm{FA}}$ because it has learned to separate targets from noise.

Definition:

Computational Metrics

Beyond reconstruction quality, practical deployment requires:

  • FLOPs: floating-point operations per reconstruction. Matched filter: $O(MQ)$. LASSO (ISTA, $T$ iterations): $O(T \cdot MQ)$. Deep network: depends on architecture.

  • Inference time: wall-clock time for a single reconstruction on specified hardware (GPU model, batch size).

  • Memory: peak GPU memory during reconstruction.

  • Training time: total GPU-hours for learned methods (amortised over all future reconstructions).

  • Number of parameters: for neural networks, total trainable parameters.

Report computational metrics alongside quality metrics. A method that achieves 0.5 dB higher PSNR but takes $100\times$ longer may not be preferable.
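A sketch covering the last three bullets using PyTorch (names are ours; FLOP counting typically needs a profiler such as torch.profiler or a third-party counter, so it is omitted here):

```python
import time
import torch

def profile(model: torch.nn.Module, x: torch.Tensor,
            n_warmup: int = 5, n_runs: int = 50) -> None:
    """Report trainable parameter count and mean inference time."""
    n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    model.eval()
    with torch.no_grad():
        for _ in range(n_warmup):        # warm-up runs (JIT, caches)
            model(x)
        if x.is_cuda:
            torch.cuda.synchronize()     # GPU timing needs a sync point
        t0 = time.perf_counter()
        for _ in range(n_runs):
            model(x)
        if x.is_cuda:
            torch.cuda.synchronize()
        dt = (time.perf_counter() - t0) / n_runs
    print(f"{n_params/1e6:.2f} M params, {dt*1e3:.2f} ms per reconstruction")
```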

Evaluation Metrics Overview

| Metric | Type | Range | Best For | Limitation |
| --- | --- | --- | --- | --- |
| PSNR (dB) | Image quality | $[0, \infty)$ | Overall error | Ignores structure |
| SSIM | Image quality | $[0, 1]$ | Structural fidelity | Local windows only |
| LPIPS | Perceptual | $[0, \infty)$ | Perceptual quality | Trained on natural images |
| NMSE (dB) | Image quality | $(-\infty, 0]$ | Cross-scene comparison | Same limitations as PSNR |
| Chamfer | Shape (3D) | $[0, \infty)$ | Average shape error | Ignores outliers |
| Hausdorff | Shape (3D) | $[0, \infty)$ | Worst-case error | Sensitive to outliers |
| IoU | Volume | $[0, 1]$ | Occupancy overlap | Threshold-dependent |
| $P_D$ at $P_{\mathrm{FA}}$ | Detection | $[0, 1]$ | Target detection | Binary (no localisation) |
| AUC-ROC | Detection | $[0.5, 1]$ | Overall detectability | Averages over all thresholds |
| FLOPs | Compute | $[0, \infty)$ | Computational cost | Hardware-dependent |

Example: Metric Disagreement in Practice

Two reconstruction algorithms produce images with PSNR 28 dB and 25 dB respectively. However, the second algorithm has a higher target detection probability ($P_D = 0.95$ vs. $0.87$ at $P_{\mathrm{FA}} = 10^{-4}$). Explain this discrepancy.

⚠️Engineering Note

Report Multiple Metrics β€” Always

No single metric captures all aspects of reconstruction quality. Every RF imaging paper should report at minimum:

  1. One image quality metric (PSNR or NMSE).
  2. One structural metric (SSIM).
  3. One task-specific metric ($P_D$, Chamfer, or IoU depending on the application).
  4. Computational cost (inference time and memory).

Papers reporting only PSNR are incomplete. Reviewers should request additional metrics when they are missing.

Common Mistake: Reporting Only PSNR

Mistake:

Comparing algorithms using only PSNR and concluding that the one with higher PSNR is "better."

Correction:

PSNR measures total squared error, which favours smooth reconstructions. A method with 3 dB lower PSNR may preserve edges and targets better (higher SSIM, higher $P_D$). Always report SSIM and a task-specific metric alongside PSNR.

Common Mistake: Blindly Using LPIPS for RF Images

Mistake:

Using LPIPS (trained on natural images) as the primary metric for RF reflectivity maps without questioning its relevance.

Correction:

LPIPS was trained on natural photographs and may not capture the perceptual features relevant to RF images (which are sparse, have different statistics, and are interpreted differently). Use LPIPS as a supplementary metric, not as the primary one. For RF imaging, SSIM and detection metrics are more meaningful.

Historical Note: SSIM: Beyond Mean Squared Error

2004

SSIM was introduced by Wang, Bovik, Sheikh, and Simoncelli in 2004, motivated by the observation that MSE (and hence PSNR) correlates poorly with human perception of image quality. The paper has been cited over 50,000 times and SSIM has become a standard metric across all imaging fields. The key insight: the human visual system is adapted to structural information, not pixel-wise error.

PSNR (Peak Signal-to-Noise Ratio)

A logarithmic measure of reconstruction quality defined as $10\log_{10}(\mathbf{c}_{\max}^{2} / \mathrm{MSE})$. Higher is better. Standard image quality metric but insensitive to structural degradation.

ROC Curve

Receiver Operating Characteristic: a plot of $P_D$ vs. $P_{\mathrm{FA}}$ as the detection threshold varies. Summarises the trade-off between detecting true targets and generating false alarms.

Quick Check

Algorithm A achieves PSNR = 30 dB, SSIM = 0.75. Algorithm B achieves PSNR = 27 dB, SSIM = 0.92. For a target detection task, which is likely better?

A, because it has higher PSNR

B, because higher SSIM indicates better structural preservation

Cannot determine without detection metrics

They are equivalent

Key Takeaway

PSNR measures total error energy; widely used but insensitive to structure. SSIM captures structural similarity and correlates better with detection. LPIPS captures perceptual quality but was trained on natural images. For 3D reconstruction, use Chamfer/Hausdorff distance and IoU. For detection, use ROC curves and $P_D$ at fixed $P_{\mathrm{FA}}$. Report multiple metrics: no single number tells the full story.