The Lossy Compression Problem

When Lossless Is Not Enough

In Chapter 5 we proved that lossless compression requires at least $H(X)$ bits per symbol. For a continuous source (e.g., a Gaussian), $H(X) = \infty$ in the discrete sense: we would need infinitely many bits to describe each sample exactly. Even for discrete sources with large alphabets (e.g., images with 256 gray levels), the entropy may be impractically high. The solution is lossy compression: we allow the reconstructed output $\hat{X}$ to differ from the source $X$, as long as the difference (distortion) stays below an acceptable threshold. The fundamental question becomes: what is the minimum rate $R$ needed to achieve average distortion at most $D$?

Definition:

Distortion Measure

A distortion measure is a function $d : \mathcal{X} \times \hat{\mathcal{X}} \to [0, \infty)$ that quantifies how "bad" a reconstruction $\hat{x}$ is when the source produced $x$. Common choices:

  • Hamming distortion (discrete): $d(x, \hat{x}) = \mathbf{1}\{x \neq \hat{x}\}$
  • Squared error (continuous): $d(x, \hat{x}) = (x - \hat{x})^2$
  • Absolute error: $d(x, \hat{x}) = |x - \hat{x}|$

For a pair of sequences, the average distortion is $d(\mathbf{x}, \hat{\mathbf{x}}) = \frac{1}{n}\sum_{i=1}^n d(x_i, \hat{x}_i)$.
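The per-symbol averaging above is straightforward to mirror in code. A minimal sketch (the function names `hamming`, `squared_error`, and `avg_distortion` are illustrative, not from the text):

```python
def hamming(x, xhat):
    """Hamming distortion: 1 if the symbols differ, else 0."""
    return 0.0 if x == xhat else 1.0

def squared_error(x, xhat):
    """Squared-error distortion for continuous alphabets."""
    return (x - xhat) ** 2

def avg_distortion(xs, xhats, d):
    """Average distortion d(x, xhat) = (1/n) * sum_i d(x_i, xhat_i)."""
    assert len(xs) == len(xhats)
    return sum(d(x, xh) for x, xh in zip(xs, xhats)) / len(xs)

print(avg_distortion([0, 1, 1, 0], [0, 1, 0, 0], hamming))    # 0.25
print(avg_distortion([1.0, 2.0], [1.5, 2.0], squared_error))  # 0.125
```

Any non-negative `d` plugs into the same average, which is why the theory treats the distortion measure as a free parameter.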

Definition:

Lossy Source Code

A $(2^{nR}, n)$ lossy source code consists of:

  • An encoder $f : \mathcal{X}^n \to \{1, 2, \ldots, 2^{nR}\}$ mapping source sequences to indices,
  • A decoder $g : \{1, 2, \ldots, 2^{nR}\} \to \hat{\mathcal{X}}^n$ mapping indices to reconstruction sequences.

The codebook is $\mathcal{C} = \{g(1), g(2), \ldots, g(2^{nR})\} \subset \hat{\mathcal{X}}^n$, the set of possible reconstructions.

The encoder picks the "closest" codeword to the source sequence (nearest-neighbor encoding), and the decoder outputs it. The rate $R$ determines the codebook size: higher rate means more codewords, better coverage of the source space, and lower distortion.
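Nearest-neighbor encoding can be sketched concretely. The following toy code draws a random codebook (as in the random-coding arguments of later sections) and encodes by minimum per-symbol squared error; the parameter values and names are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n, R = 8, 0.5
M = int(2 ** (n * R))                    # |C| = 2^{nR} = 16 codewords

codebook = rng.standard_normal((M, n))   # C = {g(1), ..., g(M)}, drawn i.i.d.

def encode(x):
    """f: index of the codeword closest to x (nearest neighbor)."""
    dists = ((codebook - x) ** 2).mean(axis=1)  # per-symbol squared error
    return int(np.argmin(dists))

def decode(i):
    """g: output the i-th codeword as the reconstruction."""
    return codebook[i]

x = rng.standard_normal(n)
xhat = decode(encode(x))
print(((x - xhat) ** 2).mean())          # distortion achieved on this block
```

Note the roles: the encoder transmits only an index ($nR$ bits), and all reconstruction quality is baked into how well the codebook covers $\hat{\mathcal{X}}^n$.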

Definition:

Achievable Rate-Distortion Pair

A rate-distortion pair $(R, D)$ is achievable if there exists a sequence of $(2^{nR}, n)$ codes with $\lim_{n \to \infty} \mathbb{E}\left[\frac{1}{n}\sum_{i=1}^n d(X_i, \hat{X}_i)\right] \leq D$. The rate-distortion region is the closure of the set of all achievable pairs.

Example: Scalar Quantization as a Simple Lossy Code

A Gaussian source $X \sim \mathcal{N}(0, 1)$ is quantized to $M = 4$ levels using a uniform quantizer with step size $\Delta$. Compute the distortion as a function of $\Delta$ and find the optimal $\Delta$.

Scalar Quantization of a Gaussian Source

Visualize scalar quantization of a Gaussian source. Adjust the number of levels and observe the reconstruction quality, distortion, and rate. Compare uniform and Lloyd-Max quantizers.

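One way to work the example numerically is a Monte Carlo estimate of the mean squared error, with a grid search over the step size. This is a sketch under illustrative choices (sample size, grid range, and a mid-rise level placement at $\pm\Delta/2, \pm 3\Delta/2$ are assumptions, not part of the problem statement):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(200_000)         # samples of X ~ N(0, 1)

def quantize(x, delta, M=4):
    """Uniform M-level mid-rise quantizer: levels at (k + 1/2) * delta."""
    k = np.clip(np.floor(x / delta), -M // 2, M // 2 - 1)
    return (k + 0.5) * delta

deltas = np.linspace(0.3, 2.0, 200)
mse = [np.mean((x - quantize(x, d)) ** 2) for d in deltas]
best = deltas[int(np.argmin(mse))]
print(f"optimal Delta ~ {best:.3f}, MSE ~ {min(mse):.4f}")
```

The search exposes the tradeoff in the distortion: small $\Delta$ shrinks granular error inside the quantizer's range but grows overload error in the Gaussian tails, so the optimum sits at an interior $\Delta$ close to 1 for $M = 4$.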

Common Mistake: Distortion Is Not Error Probability

Mistake:

Confusing the rate-distortion framework with the channel coding framework by treating distortion $D$ as an "error probability."

Correction:

In lossless coding, the goal is $P_e \to 0$. In lossy coding, the goal is $D \leq D_0$ for a target distortion $D_0 > 0$. The distortion can be any non-negative function, not just a 0/1 indicator. The rate-distortion function tells us the minimum rate to achieve distortion $D$, just as capacity tells us the maximum rate with $P_e \to 0$. The two are dual problems.

Distortion measure

A function $d(x, \hat{x})$ quantifying the cost of representing source symbol $x$ by reconstruction $\hat{x}$. Common measures: Hamming distance (discrete), squared error (continuous), perceptual metrics (images/audio).

Lossy source code

An encoder-decoder pair that maps source sequences to a codebook of reconstruction sequences, allowing controlled distortion. The rate $R = \frac{1}{n}\log_2 |\mathcal{C}|$, in bits per source symbol, fixes the codebook size $|\mathcal{C}| = 2^{nR}$.
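The rate-to-codebook-size relationship is a one-line computation; a minimal check (the values $n = 8$, $|\mathcal{C}| = 16$ are illustrative):

```python
import math

n, M = 8, 16                 # block length and codebook size |C|
R = math.log2(M) / n         # rate R = (1/n) log2 |C|
print(R)                     # 0.5 bits per source symbol
```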

Key Takeaway

Lossy compression allows controlled distortion to reduce the bit rate below the lossless limit. A distortion measure $d(x, \hat{x})$ quantifies reconstruction quality, and the rate-distortion pair $(R, D)$ specifies the tradeoff. Scalar quantization is the simplest lossy code but falls short of the information-theoretic limit; the rate-distortion function, derived in the next sections, gives the exact minimum rate.