Transform Coding and Quantization
From Theory to JPEG
Rate-distortion theory tells us what is achievable; this section shows how it is achieved in practice. The dominant architecture in modern compression is transform coding: apply a unitary transform (DCT, wavelet) to decorrelate the source, quantize each coefficient independently, then entropy-code the quantized values. This three-stage pipeline approaches the rate-distortion bound for Gaussian sources and is the foundation of JPEG, JPEG 2000, H.264/AVC, HEVC, and essentially all image and video compression standards.
Definition: Transform Coding
Transform coding consists of three stages:
- Transform: Apply a unitary (orthogonal) transform $T$ to decorrelate the source: $\mathbf{y} = T\mathbf{x}$. The transform coefficients $y_i$ are (approximately) independent with variances $\sigma_i^2$.
- Quantize: Apply scalar quantization to each coefficient independently, allocating bits according to the variances $\sigma_i^2$ (reverse waterfilling).
- Entropy code: Encode the quantized coefficients losslessly using arithmetic coding or Huffman coding.
The key insight: unitary transforms preserve distortion ($\|\mathbf{x} - \hat{\mathbf{x}}\|^2 = \|\mathbf{y} - \hat{\mathbf{y}}\|^2$), so we can work in the transform domain without penalty. If the transform achieves perfect decorrelation (the KLT for Gaussian sources), the problem reduces to independent scalar quantization, which we know how to do optimally.
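To make the pipeline concrete, here is a minimal NumPy sketch on a synthetic AR(1) Gaussian source. It is illustrative only: the transform is the exact KLT (eigenvectors of the covariance), the step sizes are a crude variance-based allocation rather than true reverse waterfilling, and the entropy-coding stage is omitted. It also checks numerically that the orthogonal transform preserves mean-squared distortion.

```python
# Minimal transform-coding sketch (toy setup, not a real codec).
import numpy as np

rng = np.random.default_rng(0)
n, rho, num_blocks = 8, 0.95, 10_000

# AR(1) covariance and its KLT (eigenvectors of the covariance matrix).
cov = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
eigvals, T = np.linalg.eigh(cov)               # columns of T form the KLT basis

# Draw source blocks and transform them.
X = rng.multivariate_normal(np.zeros(n), cov, size=num_blocks)   # shape (blocks, n)
Y = X @ T                                                         # transform coefficients

# Crude bit allocation: step size proportional to each coefficient's std (illustrative).
steps = 0.5 * np.sqrt(Y.var(axis=0))
Y_hat = np.round(Y / steps) * steps            # independent uniform scalar quantization
X_hat = Y_hat @ T.T                            # inverse transform (T is orthogonal)

# Unitary transforms preserve distortion: the two MSEs agree.
print(np.mean((X - X_hat) ** 2), np.mean((Y - Y_hat) ** 2))
```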
Definition: Lloyd-Max Quantizer
The Lloyd-Max algorithm designs an optimal scalar quantizer for a given source distribution by iterating between two steps:
- Nearest-neighbor condition: Given reconstruction points $\{\hat{x}_1, \dots, \hat{x}_K\}$, set the decision boundaries to the midpoints: $b_i = (\hat{x}_i + \hat{x}_{i+1})/2$.
- Centroid condition: Given decision regions, set each reconstruction point to the centroid of its region: $\hat{x}_i = \mathbb{E}[X \mid b_{i-1} < X \le b_i]$.
Iterate until convergence. The algorithm converges to a local optimum of the distortion $D = \mathbb{E}[(X - \hat{X})^2]$.
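A sample-based sketch of the two conditions, run on draws from a standard Gaussian (a minimal NumPy implementation; the quantile initialization and fixed iteration count are arbitrary choices, not part of the algorithm's definition):

```python
import numpy as np

def lloyd_max(samples, levels=8, iters=100):
    # Initialize reconstruction points at evenly spaced quantiles of the samples.
    points = np.quantile(samples, (np.arange(levels) + 0.5) / levels)
    for _ in range(iters):
        # Nearest-neighbor condition: decision boundaries at the midpoints.
        boundaries = (points[:-1] + points[1:]) / 2
        cells = np.digitize(samples, boundaries)
        # Centroid condition: each reconstruction point becomes its cell's mean.
        for i in range(levels):
            members = samples[cells == i]
            if members.size:
                points[i] = members.mean()
    return points

rng = np.random.default_rng(1)
x = rng.standard_normal(100_000)
points = lloyd_max(x)
boundaries = (points[:-1] + points[1:]) / 2
x_hat = points[np.digitize(x, boundaries)]
print("distortion:", np.mean((x - x_hat) ** 2))   # roughly 0.035 for 8 levels (3 bits/sample)
```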
Lloyd-Max without entropy coding achieves distortion approximately $D \approx \frac{\sqrt{3}\pi}{2}\,\sigma^2\,2^{-2R}$ for Gaussian sources, matching the rate-distortion slope but with a constant gap. Adding entropy coding (non-uniform quantizer + arithmetic coder) closes most of this gap.
Theorem: Entropy-Coded Scalar Quantization Gap (Gish-Pierce)
For a continuous source with finite differential entropy, a uniform scalar quantizer followed by an optimal entropy coder operates, at high rate, within a constant gap of the rate-distortion function: $R_{\mathrm{ECSQ}}(D) \approx h(X) - \tfrac{1}{2}\log_2(12D)$, where $h(X)$ is the differential entropy. For a Gaussian source: $R_{\mathrm{ECSQ}}(D) \approx R(D) + \tfrac{1}{2}\log_2\tfrac{\pi e}{6} \approx R(D) + 0.254$ bits per sample, i.e., 1.53 dB in distortion.
The 1.53 dB gap (a factor of $\pi e/6$ in distortion) comes from the difference between a uniform quantization cell (a cube) and the optimal quantization cell (a sphere, in high dimensions). This is the "shaping loss": the price of using scalar quantization instead of vector quantization. It can be reduced by dithering (adding controlled randomness) or by using lattice quantizers with better shaping.
High-rate approximation
At high rate, a uniform quantizer with step $\Delta$ produces quantization noise approximately uniform on $[-\Delta/2, \Delta/2]$ with variance $D = \Delta^2/12$. The entropy of the quantized output is approximately $H \approx h(X) - \log_2 \Delta$. Setting $\Delta = \sqrt{12D}$: $R_{\mathrm{ECSQ}} \approx h(X) - \tfrac{1}{2}\log_2(12D)$, so for a Gaussian source with $h(X) = \tfrac{1}{2}\log_2(2\pi e \sigma^2)$ this gives $R_{\mathrm{ECSQ}} \approx \tfrac{1}{2}\log_2\tfrac{2\pi e\,\sigma^2}{12D}$. The R-D bound gives $R(D) = \tfrac{1}{2}\log_2\tfrac{\sigma^2}{D}$. Gap: $\tfrac{1}{2}\log_2\tfrac{\pi e}{6} \approx 0.254$ bits per sample, equivalently a distortion ratio of $\pi e/6 \approx 1.53$ dB.
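The gap can be checked numerically. A minimal Monte Carlo sketch (NumPy; the step $\Delta = 0.25$ is an arbitrary high-rate choice): uniformly quantize a unit-variance Gaussian, estimate the output entropy, and compare with $R(D)$ at the measured distortion.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(1_000_000)
delta = 0.25                                    # small step: high-rate regime
q = np.round(x / delta).astype(int)

# Empirical entropy of the quantized output, in bits per sample.
_, counts = np.unique(q, return_counts=True)
p = counts / counts.sum()
H = -np.sum(p * np.log2(p))

D = np.mean((x - delta * q) ** 2)               # close to delta**2 / 12
R_of_D = 0.5 * np.log2(1.0 / D)                 # Gaussian R(D) with sigma^2 = 1
print(f"ECSQ rate {H:.3f} bits, R(D) {R_of_D:.3f} bits, gap {H - R_of_D:.3f} bits")
# The gap should be near 0.5*log2(pi*e/6) ~ 0.254 bits, i.e. about 1.53 dB.
```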
Example: The JPEG Compression Pipeline
Describe how JPEG implements transform coding and identify the rate-distortion implications at each stage.
Block DCT
JPEG divides the image into 8×8 blocks and applies the 2D DCT to each. The DCT approximately decorrelates natural images (it is close to the KLT for stationary first-order Markov processes). The 64 DCT coefficients have rapidly decaying variances; most of the energy is in the low-frequency coefficients.
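A sketch of the block-transform stage: the orthonormal DCT-II matrix built directly in NumPy and applied separably to one 8×8 block (the smooth toy block below stands in for image data).

```python
import numpy as np

N = 8
k = np.arange(N)[:, None]
n = np.arange(N)[None, :]
C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n + 1) * k / (2 * N))
C[0, :] = np.sqrt(1.0 / N)          # rescale the DC row; C is now orthonormal

def dct2(block):
    return C @ block @ C.T           # separable 2D DCT of one 8x8 block

def idct2(coeffs):
    return C.T @ coeffs @ C          # inverse transform (C is orthogonal)

block = np.linspace(0, 255, 64).reshape(8, 8)     # a smooth toy "image" block
coeffs = dct2(block)
print(np.allclose(idct2(coeffs), block))           # True: perfect reconstruction
print(np.abs(coeffs).round(1))                     # energy concentrates in low frequencies
```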
Quantization
Each DCT coefficient is divided by a quantization step from the "quantization matrix" and rounded to the nearest integer. The quantization matrix implements a crude form of reverse waterfilling: large steps for high-frequency coefficients (low variance, perceptually less important), small steps for low-frequency (high variance, perceptually critical). The quality factor scales the entire matrix.
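A sketch of this stage, following the dct2/idct2 helpers above. The matrix here is an illustrative stand-in with steps that grow with frequency, not the standard JPEG luminance table, and the single quality_scale parameter plays the role of the quality factor.

```python
import numpy as np

# Illustrative per-frequency step sizes: larger steps at higher frequencies,
# a crude stand-in for the JPEG quantization matrix and reverse waterfilling.
base = 8 + 4 * (np.arange(8)[:, None] + np.arange(8)[None, :])

def quantize(coeffs, quality_scale=1.0):
    Q = np.maximum(1, np.round(base * quality_scale))   # quality factor scales the matrix
    return np.round(coeffs / Q).astype(int), Q

def dequantize(levels, Q):
    return levels * Q

# Usage with the dct2/idct2 sketch above (hypothetical toy pipeline):
#   levels, Q = quantize(dct2(block), quality_scale=2.0)   # larger scale = lower quality
#   block_hat = idct2(dequantize(levels, Q))
```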
Entropy coding
The quantized coefficients are entropy-coded using Huffman coding (baseline JPEG) or arithmetic coding (JPEG with arithmetic option). Run-length coding of zero coefficients exploits the sparsity created by quantization.
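A sketch of the symbol-forming step that feeds the entropy coder: zig-zag scan and zero-run collection for one quantized block, plus an empirical-entropy rate proxy. The helper names are my own, and JPEG's actual run/size Huffman symbols and DC differential coding are not reproduced here.

```python
from collections import Counter
import numpy as np

def zigzag_indices(n=8):
    # Order (row, col) pairs along anti-diagonals with alternating direction,
    # the usual zig-zag scan for an n x n block.
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

def run_length(levels):
    # levels: the n x n matrix of quantized coefficients for one block.
    scan = [levels[r, c] for r, c in zigzag_indices(levels.shape[0])]
    pairs, run = [], 0
    for v in scan[1:]:              # the DC coefficient is coded separately in JPEG
        if v == 0:
            run += 1
        else:
            pairs.append((run, int(v)))
            run = 0
    return pairs                    # trailing zeros are signaled by an end-of-block symbol

def empirical_entropy(symbols):
    counts = np.array(list(Counter(symbols).values()), dtype=float)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))   # bits per symbol, a proxy for the coded rate
```

Because aggressive quantization zeroes most high-frequency coefficients, the (run, value) pairs are few and highly skewed, which is exactly what the entropy coder exploits.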
Rate-distortion perspective
At high quality (quality factor near 100): near-lossless operation, and the rate approaches the entropy of the DCT coefficients. At low quality (quality factor near 1): aggressive quantization, the rate approaches 0, and the distortion approaches the variance of the source. The quality factor traces out a curve on the rate-distortion plane, typically 3-6 dB above the Gaussian R-D bound (due to non-Gaussianity and the 8×8 block constraint).
ECSQ vs. Rate-Distortion Bound
Compare the distortion-rate curves of: (i) the Gaussian rate-distortion bound, (ii) entropy-coded uniform quantizer (ECSQ), and (iii) fixed-rate uniform quantizer. Observe the 1.53 dB gap between ECSQ and the R-D bound.
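A sketch of this comparison under the same assumptions (unit-variance Gaussian, Monte Carlo operating points; the step sizes and the 4-sigma loading factor are arbitrary choices, and plotting is omitted in favor of printing the points):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(500_000)

def ecsq_point(delta):
    # Uniform quantizer followed by an ideal entropy coder (rate = output entropy).
    q = np.round(x / delta)
    _, counts = np.unique(q, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p))), float(np.mean((x - delta * q) ** 2))

def fixed_rate_point(bits, load=4.0):
    # Uniform quantizer spanning [-load, load] with 2**bits levels and fixed-length codes.
    levels = 2 ** bits
    delta = 2 * load / levels
    q = np.clip(np.round(x / delta), -(levels // 2), levels // 2 - 1)
    return float(bits), float(np.mean((x - delta * q) ** 2))

for label, (R, D) in [("ECSQ ", ecsq_point(0.25)), ("ECSQ ", ecsq_point(0.125)),
                      ("fixed", fixed_rate_point(4)), ("fixed", fixed_rate_point(5))]:
    gap_db = 10 * np.log10(D / 2.0 ** (-2 * R))     # distance above the Gaussian bound
    print(f"{label}: R = {R:.2f} bits, D = {D:.5f}, gap to R(D) = {gap_db:.2f} dB")
```

The ECSQ points should sit roughly 1.5 dB above the bound, while the (unoptimized) fixed-rate quantizer sits substantially higher.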
Blahut-Arimoto Algorithm for $R(D)$
The Blahut-Arimoto algorithm alternates between optimizing the test channel $p(\hat{x} \mid x)$ (for a fixed reproduction marginal) and updating the marginal $q(\hat{x})$ (for a fixed test channel). This is the same algorithmic structure as the Blahut-Arimoto algorithm for channel capacity, another manifestation of the source-channel duality. The slope parameter $s$ controls the operating point on the $R(D)$ curve: large $s$ gives low distortion (high rate), small $s$ gives high distortion. Complexity is $O(|\mathcal{X}|\,|\hat{\mathcal{X}}|)$ per iteration, and convergence is geometric.
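A minimal NumPy sketch of the algorithm for a discrete source. The standard exponential-reweighting update is used for the test channel, and the example is a binary source with Hamming distortion so the output can be checked against the closed form $R(D) = 1 - H_b(D)$.

```python
import numpy as np

def blahut_arimoto_rd(p_x, dist, s, iters=500):
    """p_x: source pmf; dist: |X| x |Xhat| distortion matrix; s > 0: slope parameter."""
    q = np.full(dist.shape[1], 1.0 / dist.shape[1])       # reproduction marginal q(xhat)
    for _ in range(iters):
        # Optimize the test channel for the current marginal.
        W = q[None, :] * np.exp(-s * dist)                 # unnormalized p(xhat | x)
        W /= W.sum(axis=1, keepdims=True)
        # Update the marginal for the current test channel.
        q = p_x @ W
    D = float(np.sum(p_x[:, None] * W * dist))
    R = float(np.sum(p_x[:, None] * W * np.log2(W / q[None, :])))
    return R, D

# Binary symmetric source, Hamming distortion: closed form R(D) = 1 - H_b(D).
p_x = np.array([0.5, 0.5])
dist = np.array([[0.0, 1.0], [1.0, 0.0]])
H_b = lambda d: float(-d * np.log2(d) - (1 - d) * np.log2(1 - d))
for s in (8.0, 4.0, 2.0, 1.0):
    R, D = blahut_arimoto_rd(p_x, dist, s)
    print(f"s = {s:>3}: R = {R:.4f}, D = {D:.4f}, closed form = {1 - H_b(D):.4f}")
```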
Historical Note: Blahut and Arimoto: Two Independent Discoveries
In 1972, Richard Blahut (Bell Labs) and Suguru Arimoto (University of Tokyo) independently discovered the alternating optimization algorithm for computing channel capacity. Blahut extended it to rate-distortion in the same paper. The algorithm is a special case of coordinate descent (or block relaxation) on a convex objective, which guarantees convergence. It remains the standard numerical method for computing $R(D)$ for discrete sources, and its structure has influenced the design of iterative algorithms in communications (turbo decoding, expectation-maximization) and machine learning (variational inference).
Practical Compression Standards and Their R-D Performance
| Standard | Transform | Quantization | Entropy Coder | Gap to R(D) |
|---|---|---|---|---|
| JPEG | 8×8 DCT | Uniform (per-coeff) | Huffman | ~5 dB |
| JPEG 2000 | Wavelet (9/7) | Uniform (deadzone) | EBCOT (arithmetic) | ~3 dB |
| H.264/AVC | 4×4/8×8 DCT + prediction | Uniform | CABAC (arithmetic) | ~2-4 dB |
| HEVC/H.265 | Variable-size DCT + prediction | Uniform | CABAC | ~1.5-3 dB |
| VVC/H.266 | Multi-type transform | Dependent scalar | CABAC | ~1-2.5 dB |
Transform coding
A lossy compression architecture: apply a decorrelating transform (DCT, wavelet, KLT), quantize the transform coefficients, and entropy-code the result. The foundation of JPEG, HEVC, and nearly all practical image/video compression.
Shaping loss
The gap between scalar quantization and the rate-distortion bound, arising from the suboptimality of cubic quantization cells vs. spherical ones. For uniform ECSQ on Gaussian sources: 1.53 dB (a factor of $\pi e/6$).
Key Takeaway
Transform coding (transform + quantize + entropy code) is the practical realization of rate-distortion theory. The KLT decorrelates optimally, reducing the problem to independent scalar quantization with reverse waterfilling. Entropy-coded scalar quantization operates within 1.53 dB of the Gaussian R-D bound. Modern codecs (HEVC, VVC) close much of this gap through prediction, variable-size transforms, and context-adaptive entropy coding. The Blahut-Arimoto algorithm computes $R(D)$ numerically for any discrete source.