Semantic Metrics

What Makes a Good Reconstruction?

Classical information theory measures distortion by comparing source and reconstruction symbol by symbol: MSE, Hamming distance, absolute error. But humans and machines evaluate quality differently. Two images can have the same MSE yet look vastly different — one with imperceptible high-frequency noise, the other with visible blurring. A speech signal with low MSE may sound robotic, while one with higher MSE but preserved prosody sounds natural. The question is: can we define distortion measures that capture perceptual or semantic quality, and what are the information-theoretic implications?

Definition:

Semantic Distortion Measures

A semantic distortion measure $d_{\text{sem}}(s, \hat{s})$ evaluates the quality of a reconstruction $\hat{s}$ based on task-relevant or perceptual criteria rather than symbol-by-symbol fidelity. Common examples:

  • Perceptual quality: $d_{\text{perc}}(s, \hat{s}) = \|f(s) - f(\hat{s})\|^2$, where $f$ is a learned feature extractor (e.g., VGG features for images, wav2vec for audio)
  • Task accuracy: $d_{\text{task}}(s, \hat{s}) = \mathbb{1}[c(\hat{s}) \neq c(s)]$, where $c$ is a classifier — distortion is 0 if the task output is preserved, 1 otherwise
  • Semantic similarity: $d_{\text{sim}}(s, \hat{s}) = 1 - \cos(e(s), e(\hat{s}))$, where $e$ is an embedding function (e.g., CLIP for image-text alignment)
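
The three measures above can be sketched directly. The feature extractor `f`, classifier `c`, and embedding `e` below are toy stand-ins for illustration, not real pretrained models:

```python
import numpy as np

def d_perc(s, s_hat, f):
    """Perceptual distortion: squared distance in feature space."""
    return float(np.sum((f(s) - f(s_hat)) ** 2))

def d_task(s, s_hat, c):
    """Task distortion: 0 if the classifier's output is preserved, else 1."""
    return 0.0 if c(s_hat) == c(s) else 1.0

def d_sim(s, s_hat, e):
    """Semantic-similarity distortion: 1 - cosine similarity of embeddings."""
    u, v = e(s), e(s_hat)
    return 1.0 - float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy stand-ins for the learned components (illustrative assumptions):
f = lambda s: s[:4]                    # "feature extractor": first 4 dims
c = lambda s: int(np.mean(s) > 0)      # "classifier": sign of the mean
e = lambda s: s / np.linalg.norm(s)    # "embedding": unit-normalized input

rng = np.random.default_rng(0)
s = rng.normal(size=16)
s_hat = s + 0.1 * rng.normal(size=16)  # mildly distorted reconstruction
print(d_perc(s, s_hat, f), d_task(s, s_hat, c), d_sim(s, s_hat, e))
```

Note that a reconstruction can have large pixel-level error yet zero task distortion, which is exactly the gap these measures are designed to expose.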

Theorem: Rate-Distortion Theory with Semantic Distortion

For a source $S$ with distribution $P_S$ and a semantic distortion measure $d_{\text{sem}}(s, \hat{s})$, the rate-distortion function is

$$R_{\text{sem}}(D) = \min_{P_{\hat{S}|S}:\, \mathbb{E}[d_{\text{sem}}(S, \hat{S})] \leq D} I(S; \hat{S})$$

When $d_{\text{sem}}$ is a feature-space MSE $\|f(S) - f(\hat{S})\|^2$, this reduces to the rate-distortion function of the feature vector $f(S)$:

$$R_{\text{sem}}(D) \leq R_{f(S)}(D)$$

with equality when $f$ is sufficient for reconstruction.

The rate-distortion function with semantic distortion can be much lower than with MSE because the semantic measure ignores irrelevant variations. If $f(S)$ is a low-dimensional feature, then $R_{f(S)}(D) \ll R_S(D)$ — we need far fewer bits to preserve features than to preserve pixels. This is why semantic communication can achieve dramatic compression gains.

Example: MSE vs. Perceptual Distortion for Images

A $d$-dimensional Gaussian source has covariance $\Sigma_S$ with eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_d$. A perceptual feature extractor projects onto the top $m \ll d$ principal components: $f(S) = U_m^\top S \in \mathbb{R}^m$. Compare $R_{\text{MSE}}(D)$ with $R_{\text{perc}}(D)$.
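
A minimal numerical sketch of this comparison uses reverse water-filling for the Gaussian rate-distortion function. The decaying spectrum, feature dimension, and distortion level below are assumptions chosen for illustration:

```python
import numpy as np

def gaussian_rd(eigvals, D):
    """Rate (bits) to reconstruct a Gaussian vector with the given spectrum
    at total squared-error distortion D, via reverse water-filling."""
    lam = np.sort(np.asarray(eigvals, float))[::-1]
    # Bisect for the water level theta with sum(min(theta, lam)) = D.
    lo, hi = 0.0, lam[0]
    for _ in range(100):
        theta = 0.5 * (lo + hi)
        if np.minimum(theta, lam).sum() < D:
            lo = theta
        else:
            hi = theta
    d_i = np.minimum(theta, lam)           # per-component distortion
    return float(0.5 * np.sum(np.log2(lam / d_i)))

# Spectrum with fast decay: most energy sits in the top few components.
d, m = 64, 4
lam = 1.0 / (1 + np.arange(d)) ** 2        # eigenvalues of Sigma_S (assumed)
R_mse = gaussian_rd(lam, D=0.1)            # must preserve all d components
R_perc = gaussian_rd(lam[:m], D=0.1)       # only the top-m feature components
print(R_mse, R_perc)                       # R_perc is far smaller than R_mse
```

Because the perceptual measure discards the $d - m$ tail components entirely, the feature rate-distortion curve sits well below the pixel-level one at every distortion level.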

Definition:

The Perception-Distortion Tradeoff

The perception-distortion tradeoff (Blau and Michaeli, 2018) states that for any reconstruction $\hat{S}$ of source $S$:

$$\text{MSE}(S, \hat{S}) + \lambda \cdot d_{\text{perc}}(P_S, P_{\hat{S}}) \geq D_{\min}(\lambda)$$

where $d_{\text{perc}}(P_S, P_{\hat{S}})$ measures the divergence between the distribution of the source and the distribution of reconstructions (e.g., FID for images). Perfect perception ($P_{\hat{S}} = P_S$) requires higher MSE, and low MSE requires imperfect perception (blurring).

This tradeoff explains why MMSE estimators produce blurry images: they minimize MSE but the output distribution PS^P_{\hat{S}} concentrates around the conditional mean, losing the sharpness of PSP_S. Generative models (GANs, diffusion models) sacrifice MSE to improve perceptual quality by generating realistic-looking samples.
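This can be seen numerically in a scalar Gaussian denoising problem, where both the MMSE estimator and posterior sampling have closed forms. The variances below are toy assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
sigma2, sigma2_n, n = 1.0, 1.0, 200_000     # assumed signal/noise variances
s = rng.normal(0.0, np.sqrt(sigma2), n)     # source S ~ N(0, sigma2)
y = s + rng.normal(0.0, np.sqrt(sigma2_n), n)  # noisy observation Y

a = sigma2 / (sigma2 + sigma2_n)            # Wiener gain, here a = 0.5

# (1) MMSE estimator: minimizes MSE, but its output is "blurry" --
#     Var(a*Y) = a*sigma2 < sigma2, so the output distribution is too narrow.
s_mmse = a * y
mse_mmse = np.mean((s - s_mmse) ** 2)       # = a*sigma2_n = 0.5 in theory

# (2) Posterior sampling: draw from P(S|Y).  Its marginal matches P_S
#     exactly (perfect "perception"), but the MSE doubles.
s_post = a * y + rng.normal(0.0, np.sqrt(a * sigma2_n), n)
mse_post = np.mean((s - s_post) ** 2)       # = 2*a*sigma2_n = 1.0 in theory

print(mse_mmse, np.var(s_mmse))             # low MSE, shrunken variance
print(mse_post, np.var(s_post))             # doubled MSE, variance = sigma2
```

The posterior sampler pays exactly a factor of two in MSE to restore the source distribution, a known extreme point of the tradeoff: you can have the right distribution or the minimum error, not both.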

Perception-Distortion Tradeoff

Visualize the tradeoff between MSE (distortion) and distributional divergence (perception) for a Gaussian source with varying compression rate.


Classical vs. Semantic Distortion Measures

| Metric | Formula | What It Measures | Limitations |
| --- | --- | --- | --- |
| MSE | $\|s - \hat{s}\|^2 / d$ | Per-dimension squared error | Does not correlate with perception; penalizes irrelevant details |
| SSIM | Luminance × contrast × structure | Structural similarity | Hand-crafted; not differentiable end-to-end |
| LPIPS | $\|f(s) - f(\hat{s})\|^2$ (VGG features) | Learned perceptual similarity | Depends on pretrained network; not interpretable |
| FID | $\|\mu_S - \mu_{\hat{S}}\|^2 + \text{tr}(\Sigma_S + \Sigma_{\hat{S}} - 2(\Sigma_S \Sigma_{\hat{S}})^{1/2})$ | Distributional divergence | Requires many samples; ignores per-sample quality |
| Task accuracy | $\mathbb{P}[c(\hat{s}) = c(s)]$ | Downstream task performance | Task-specific; binary (no gradation) |
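
As a sketch, the Fréchet distance that FID is built on can be computed directly between two Gaussians (a real FID pipeline first extracts Inception-v3 features; here we work with the Gaussians themselves). The symmetric-square-root identity avoids taking the root of the non-symmetric product $\Sigma_S \Sigma_{\hat{S}}$:

```python
import numpy as np

def _sqrtm_psd(mat):
    """Matrix square root of a symmetric PSD matrix via eigendecomposition."""
    w, v = np.linalg.eigh(mat)
    w = np.clip(w, 0.0, None)              # guard tiny negative eigenvalues
    return (v * np.sqrt(w)) @ v.T

def frechet_distance(mu1, cov1, mu2, cov2):
    """Squared Wasserstein-2 distance between two Gaussians -- the quantity
    FID evaluates on Inception features.  Uses the identity
    tr((C1 C2)^{1/2}) = tr((C2^{1/2} C1 C2^{1/2})^{1/2})."""
    s2 = _sqrtm_psd(cov2)
    cross = _sqrtm_psd(s2 @ cov1 @ s2)
    return float(np.sum((mu1 - mu2) ** 2)
                 + np.trace(cov1) + np.trace(cov2) - 2.0 * np.trace(cross))

# Sanity checks where the answer is known in closed form:
d = 8
mu, cov = np.zeros(d), np.eye(d)
print(frechet_distance(mu, cov, mu, cov))        # identical Gaussians -> 0
print(frechet_distance(mu, cov, mu + 1.0, cov))  # pure mean shift -> d = 8
```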

Common Mistake: FID with Few Samples Is Unreliable

Mistake:

Computing the FID (Fréchet Inception Distance) between small batches of generated images (e.g., $N < 1000$) and using it to compare semantic communication systems.

Correction:

FID estimates the Wasserstein-2 distance between Gaussian fits to the Inception feature distributions. With small samples, the covariance estimate is biased, and FID can vary by 30–50% across random seeds. Use at least $N = 10{,}000$ samples for stable FID estimates, or use alternatives such as the Kernel Inception Distance (KID), which has an unbiased estimator, or the CMMD metric.
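
The small-sample bias is easy to reproduce: two batches drawn from the same distribution have a true FID of zero, yet the small-batch estimate lands far above it. The feature dimension and batch sizes below are illustrative assumptions:

```python
import numpy as np

def fid_from_samples(x, y):
    """Frechet distance between Gaussian fits to two sample sets (rows)."""
    mu1, mu2 = x.mean(axis=0), y.mean(axis=0)
    c1, c2 = np.cov(x, rowvar=False), np.cov(y, rowvar=False)
    # Symmetric square root of c2, then of c2^{1/2} c1 c2^{1/2}.
    w, v = np.linalg.eigh(c2)
    s2 = (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T
    w, _ = np.linalg.eigh(s2 @ c1 @ s2)
    tr_cross = np.sqrt(np.clip(w, 0.0, None)).sum()
    return float(np.sum((mu1 - mu2) ** 2)
                 + np.trace(c1) + np.trace(c2) - 2.0 * tr_cross)

# Both batches come from the SAME distribution, so the true FID is 0.
rng = np.random.default_rng(0)
d = 16                                   # feature dimension (assumption)
small = fid_from_samples(rng.normal(size=(100, d)),
                         rng.normal(size=(100, d)))
large = fid_from_samples(rng.normal(size=(10_000, d)),
                         rng.normal(size=(10_000, d)))
print(small, large)                      # small-N estimate sits far above 0
```

The bias grows roughly like $d^2/N$ (with $d = 2048$ Inception features it is far worse than in this toy), which is why large-sample FID or an unbiased estimator is essential for fair system comparisons.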

Historical Note: From Weaver to Bao: 70 Years of Semantic Communication

1949–present

After Weaver's 1949 articulation of the three levels of communication, the semantic level lay largely dormant for decades. Carnap and Bar-Hillel attempted a formal theory of "semantic information" in the 1950s, but it was too rigid to be practical. The modern revival began around 2019–2021, driven by three forces: (1) the success of deep learning in representation learning, which made it possible to learn semantic features; (2) 5G systems approaching the Shannon limit, motivating beyond-Shannon gains; and (3) the rise of machine-to-machine communication (IoT, autonomous driving), where "meaning" is well defined as task performance. Bao et al. (2011) proposed one of the first formal frameworks for semantic communication, and Bourtsoulatze et al. (2019) demonstrated learned JSCC for images, showing that neural networks could discover efficient source-channel codes without hand-designed coding schemes.

⚠️ Engineering Note

Deployment Challenges for Semantic Communication

Deploying semantic communication in real systems faces several practical challenges:

  1. Model sharing: Both transmitter and receiver must have compatible neural network models. Unlike standard codecs, DeepJSCC models are not standardized.
  2. Computational cost: Neural network inference at both ends requires GPUs or dedicated hardware, which may not be available at edge devices.
  3. Generalization: A model trained for one source distribution (e.g., faces) may fail on another (e.g., landscapes). Domain adaptation or universal models are needed.
  4. Security: Adversarial attacks can exploit the neural network's learned representation to cause targeted misreconstruction.
  5. Interpretability: Unlike separate coding, it is hard to debug or verify DeepJSCC systems because the latent representation has no standard structure.

Quick Check

The perception-distortion tradeoff says that:

Low MSE always means high perceptual quality

Perfect perceptual quality ($P_{\hat{S}} = P_S$) requires accepting higher MSE

MSE and perceptual quality always improve together

The tradeoff only exists for non-Gaussian sources

Key Takeaway

Semantic distortion measures capture task-relevant or perceptual quality that classical MSE misses. The rate-distortion function under semantic distortion can be dramatically lower than under MSE, providing the information-theoretic foundation for semantic communication gains. However, the perception-distortion tradeoff warns that no reconstruction can simultaneously minimize MSE and match the source distribution — system designers must choose where on this tradeoff to operate.

Why This Matters: Semantic Communication and 6G

Semantic communication is a leading candidate for 6G systems, where the goal is to support AI-native applications (autonomous driving, immersive XR, digital twins) that require task-relevant information rather than bit-perfect reconstruction. The 3GPP and ITU are investigating semantic communication as a key technology for IMT-2030. See Book telecom, Ch. 32 for the broader 6G context.

Semantic Distortion

A quality measure that evaluates reconstruction based on task-relevant features or perceptual similarity rather than symbol-by-symbol error.

Related: Semantic Distortion Measures

Perception-Distortion Tradeoff

The fundamental tradeoff between reconstruction fidelity (low MSE) and realism (matching the source distribution), first formalized by Blau and Michaeli (2018).

Related: The Perception-Distortion Tradeoff