The Lossy Compression Problem
When Lossless Is Not Enough
In Chapter 5 we proved that lossless compression requires at least bits per symbol. For a continuous source (e.g., a Gaussian), in the discrete sense β we would need infinitely many bits to describe each sample exactly. Even for discrete sources with large alphabets (e.g., images with 256 gray levels), the entropy may be impractically high. The solution is lossy compression: we allow the reconstructed output to differ from the source , as long as the difference (distortion) is below an acceptable threshold. The fundamental question becomes: what is the minimum rate needed to achieve average distortion at most ?
Definition: Distortion Measure
Distortion Measure
A distortion measure is a function that quantifies how "bad" a reconstruction is when the source produced . Common choices:
- Hamming distortion (discrete):
- Squared error (continuous):
- Absolute error:
For a pair of sequences, the average distortion is
Definition: Lossy Source Code
Lossy Source Code
A lossy source code consists of:
- An encoder mapping source sequences to indices,
- A decoder mapping indices to reconstruction sequences.
The codebook is , the set of possible reconstructions.
The encoder picks the "closest" codeword to the source sequence (nearest-neighbor encoding), and the decoder outputs it. The rate determines the codebook size: higher rate means more codewords, better coverage of the source space, and lower distortion.
Definition: Achievable Rate-Distortion Pair
Achievable Rate-Distortion Pair
A rate-distortion pair is achievable if there exists a sequence of codes with The rate-distortion region is the closure of all achievable pairs.
Example: Scalar Quantization as a Simple Lossy Code
A Gaussian source is quantized to levels using a uniform quantizer with step size . Compute the distortion as a function of and find the optimal .
Quantizer description
The uniform quantizer with levels and step size has decision boundaries at and reconstruction points at . The rate is bits/sample.
Distortion computation
The distortion is where is the quantized value. For large (overload region dominates), . For the granular region, (uniform quantization noise). The optimal balances granular and overload distortion. Numerically, for : optimal , giving .
Comparison with rate-distortion
The Gaussian rate-distortion function (derived in Section 4) gives at bits. The uniform quantizer achieves β about 2.7 dB worse than the rate-distortion limit. The gap is closed by non-uniform (Lloyd-Max) quantizers and entropy coding.
Scalar Quantization of a Gaussian Source
Visualize scalar quantization of a Gaussian source. Adjust the number of levels and observe the reconstruction quality, distortion, and rate. Compare uniform and Lloyd-Max quantizers.
Parameters
Common Mistake: Distortion Is Not Error Probability
Mistake:
Confusing the rate-distortion framework with the channel coding framework by treating distortion as an "error probability."
Correction:
In lossless coding, the goal is . In lossy coding, the goal is for a target distortion . The distortion can be any non-negative function, not just a 0/1 indicator. The rate-distortion function tells us the minimum rate to achieve distortion , just as capacity tells us the maximum rate with . The two are dual problems.
Distortion measure
A function quantifying the cost of representing source symbol by reconstruction . Common measures: Hamming distance (discrete), squared error (continuous), perceptual metrics (images/audio).
Lossy source code
An encoder-decoder pair that maps source sequences to a codebook of reconstruction sequences, allowing controlled distortion. The rate determines the codebook size.
Key Takeaway
Lossy compression allows controlled distortion to reduce the bit rate below the lossless limit. A distortion measure quantifies reconstruction quality, and the rate-distortion pair specifies the tradeoff. Scalar quantization is the simplest lossy code but falls short of the information-theoretic limit β the rate-distortion function, derived in the next sections, gives the exact minimum rate.