Transform Coding and Quantization

From Theory to JPEG

Rate-distortion theory tells us what is achievable; this section shows how it is achieved in practice. The dominant architecture in modern lossy compression is transform coding: apply a unitary transform (DCT, wavelet) to decorrelate the source, quantize each coefficient independently, then entropy-code the quantized values. This three-stage pipeline approaches the rate-distortion bound for Gaussian sources and is the foundation of JPEG, JPEG 2000, H.264/AVC, HEVC, and essentially all image and video compression standards.

Definition: Transform Coding

Transform coding consists of three stages:

  1. Transform: Apply a unitary (orthogonal) transform $\mathbf{U}$ to decorrelate the source: $\mathbf{Y} = \mathbf{U}\mathbf{X}$. The transform coefficients $Y_1, \ldots, Y_k$ are (approximately) independent with variances $\sigma_1^2, \ldots, \sigma_k^2$.
  2. Quantize: Apply scalar quantization to each coefficient $Y_i$ independently, allocating bits according to the variance (reverse waterfilling).
  3. Entropy code: Encode the quantized coefficients losslessly using arithmetic coding or Huffman coding.

The key insight: unitary transforms preserve distortion ($\|\mathbf{X} - \hat{\mathbf{X}}\|^2 = \|\mathbf{Y} - \hat{\mathbf{Y}}\|^2$), so we can work in the transform domain without penalty. If the transform achieves perfect decorrelation (the KLT for Gaussian sources), the problem reduces to independent scalar quantization, which we know how to do optimally.
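To make the pipeline concrete, here is a minimal NumPy/SciPy sketch (not from the original text) that runs all three stages on a synthetic AR(1) Gaussian source, using the orthonormal DCT as the decorrelating transform; the block size, quantizer step, and source parameters are illustrative choices. It also checks numerically that the unitary transform preserves squared error.

```python
import numpy as np
from scipy.fft import dct, idct  # orthonormal DCT-II and its inverse

rng = np.random.default_rng(0)

# Synthetic correlated Gaussian source: AR(1) with coefficient 0.9,
# reshaped into blocks of k = 8 samples.
k, n_blocks = 8, 10_000
x = np.zeros(k * n_blocks)
for i in range(1, x.size):
    x[i] = 0.9 * x[i - 1] + rng.standard_normal()
X = x.reshape(n_blocks, k)

# Stage 1 -- transform: the orthonormal DCT approximately decorrelates
# AR(1) sources (it is close to the KLT for high correlation).
Y = dct(X, norm="ortho", axis=1)
print("coefficient variances:", Y.var(axis=0).round(2))  # strongly unequal

# Stage 2 -- quantize: uniform scalar quantizer with a common step here;
# a real coder would pick per-coefficient steps via reverse waterfilling.
step = 0.5
Y_hat = step * np.round(Y / step)

# Stage 3 -- entropy coding of the integer indices round(Y / step) would go here.

# Unitary transforms preserve squared error, so distortion can be
# measured in either domain:
X_hat = idct(Y_hat, norm="ortho", axis=1)
print(np.mean((X - X_hat) ** 2), np.mean((Y - Y_hat) ** 2))  # equal up to float error
```

With `norm="ortho"` the DCT matrix is orthogonal, which is exactly the property the distortion-preservation identity above relies on.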

Definition: Lloyd-Max Quantizer

The Lloyd-Max algorithm designs an optimal scalar quantizer for a given source distribution by iterating between two steps:

  1. Nearest-neighbor condition: Given reconstruction points $\hat{y}_1, \ldots, \hat{y}_M$, set the decision boundaries to the midpoints: $b_i = (\hat{y}_i + \hat{y}_{i+1})/2$.
  2. Centroid condition: Given decision regions, set each reconstruction point to the centroid of its region: $\hat{y}_i = \mathbb{E}[Y \mid Y \in \text{region}_i]$.

Iterate until convergence. The algorithm converges to a local optimum of the distortion $\mathbb{E}[(Y - Q(Y))^2]$.
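A minimal sketch of the iteration follows. It designs the quantizer on empirical samples, which turns the centroid step into a sample mean (in one dimension this is exactly k-means); the function name and quantile initialization are illustrative choices, not part of the original text.

```python
import numpy as np

def lloyd_max(samples, M, n_iters=100):
    """Design an M-level Lloyd-Max scalar quantizer on empirical samples."""
    # Initialize reconstruction points at data quantiles (keeps regions nonempty).
    points = np.quantile(samples, (np.arange(M) + 0.5) / M)
    for _ in range(n_iters):
        # 1. Nearest-neighbor condition: decision boundaries at midpoints.
        bounds = (points[:-1] + points[1:]) / 2
        # 2. Centroid condition: each point moves to the mean of its region.
        region = np.searchsorted(bounds, samples)
        points = np.array([samples[region == i].mean() for i in range(M)])
    return points, bounds

rng = np.random.default_rng(1)
y = rng.standard_normal(200_000)
points, _ = lloyd_max(y, M=4)
print(points.round(3))  # ~[-1.51, -0.453, 0.453, 1.51]: the classic 2-bit Gaussian design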

Lloyd-Max without entropy coding achieves distortion proportional to $\sigma^2 2^{-2R}$ for Gaussian sources, matching the rate-distortion slope but with a constant-factor gap. Adding entropy coding (e.g., a uniform quantizer followed by an arithmetic coder) closes most of this gap.

Theorem: Entropy-Coded Scalar Quantization Gap (Gish-Pierce)

For a continuous source with finite differential entropy $h = h(X)$, a uniform scalar quantizer followed by an optimal entropy coder achieves, at high rate, distortion $$D_{\mathrm{ECSQ}}(R) \approx \frac{2^{2h} \cdot 2^{-2R}}{12},$$ a factor of $\frac{\pi e}{6}$ above the Shannon lower bound $\frac{2^{2h} \cdot 2^{-2R}}{2\pi e}$. For a Gaussian source, where that lower bound is tight, $$D_{\mathrm{ECSQ}} = \frac{\pi e}{6} \cdot D_{RD},$$ i.e. about 1.53 dB above the rate-distortion function.

The 1.53 dB gap (a factor of $\pi e/6 \approx 1.42$) comes from the difference between a uniform quantization cell (a cube) and the optimal quantization cell (a sphere, in high dimensions). This is the "shaping loss": the price of using scalar quantization instead of vector quantization. It can be reduced by dithering (adding controlled randomness) or by using lattice quantizers with better shaping.
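Dithering is easy to demonstrate numerically. The sketch below (parameters illustrative) applies subtractive dither: a uniform random offset, known to both encoder and decoder, is added before quantization and subtracted after. The resulting error is uniform on $(-\Delta/2, \Delta/2)$ with variance $\Delta^2/12$ and uncorrelated with the source, whatever the source distribution.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(100_000)
step = 0.5

# Subtractive dither: u is pseudo-random and shared by encoder and decoder.
u = rng.uniform(-step / 2, step / 2, size=x.size)
x_hat = step * np.round((x + u) / step) - u

err = x_hat - x
print(err.var(), step**2 / 12)             # error variance = Delta^2 / 12
print(np.corrcoef(x, err)[0, 1].round(4))  # ~0: error uncorrelated with the source
```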

Example: The JPEG Compression Pipeline

Describe how JPEG implements transform coding and identify the rate-distortion implications at each stage.
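One way to make the stages concrete is a runnable sketch, given below under stated assumptions: the quantization matrix is the example luminance table from the JPEG standard (ISO/IEC 10918-1, Annex K), the DCT normalization is simplified to the orthonormal convention, and the entropy stage (zigzag scan + run-length + Huffman) is elided.

```python
import numpy as np
from scipy.fft import dctn, idctn

# JPEG's example luminance quantization table (ISO/IEC 10918-1, Annex K).
# Fine steps for low frequencies (top-left), coarse for high frequencies:
# a fixed, perceptually tuned bit allocation.
Q = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99],
], dtype=float)

def jpeg_block_roundtrip(block, scale=1.0):
    """Quantize and reconstruct one 8x8 pixel block, JPEG-style.

    Stage 3 (zigzag + run-length + Huffman) is elided; `scale` plays
    the role of the quality setting.
    """
    Y = dctn(block - 128.0, norm="ortho")   # stage 1: 2-D DCT on level-shifted pixels
    q = np.round(Y / (Q * scale))           # stage 2: per-coefficient uniform quantizer
    return idctn(q * (Q * scale), norm="ortho") + 128.0

rng = np.random.default_rng(3)
block = rng.integers(0, 256, (8, 8)).astype(float)
print(np.mean((block - jpeg_block_roundtrip(block)) ** 2))  # per-block MSE
```

Rate-distortion reading: the table is a fixed bit allocation in the transform domain. Coarse steps on high-frequency coefficients spend few bits where distortion is perceptually tolerable, and scaling the whole table moves the operating point along the R-D curve.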

ECSQ vs. Rate-Distortion Bound

Compare the distortion-rate curves of: (i) the Gaussian rate-distortion bound, (ii) entropy-coded uniform quantizer (ECSQ), and (iii) fixed-rate uniform quantizer. Observe the 1.53 dB gap between ECSQ and the R-D bound.

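Since the plot itself is not reproduced here, the following sketch tabulates the three curves for a unit-variance Gaussian at rates $R = 1$ to $6$ bits: the bound $D(R) = 2^{-2R}$, the high-rate ECSQ approximation $(\pi e/6)\,2^{-2R}$, and a simulated fixed-rate uniform quantizer with its step chosen by grid search. Sample size and step grid are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
y = rng.standard_normal(500_000)  # unit-variance Gaussian source

def uniform_mse(samples, M, step):
    """MSE of a fixed-rate, M-level uniform mid-rise quantizer."""
    idx = np.clip(np.floor(samples / step) + M // 2, 0, M - 1)
    rec = (idx - M // 2 + 0.5) * step
    return np.mean((samples - rec) ** 2)

print("gap factor pi*e/6 =", 10 * np.log10(np.pi * np.e / 6), "dB")  # ~1.53 dB
for R in range(1, 7):
    d_rd = 2.0 ** (-2 * R)                    # Gaussian R-D bound
    d_ecsq = (np.pi * np.e / 6) * d_rd        # high-rate ECSQ approximation
    d_fix = min(uniform_mse(y, 2 ** R, s)     # best fixed-rate uniform quantizer
                for s in np.linspace(0.05, 4.0, 200))
    print(f"R={R}  D_RD={d_rd:.5f}  D_ECSQ={d_ecsq:.5f}  D_fixed={d_fix:.5f}")
```

Note that the $\pi e/6$ expression is a high-rate asymptotic, so the comparison is most meaningful toward $R = 6$.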

Blahut-Arimoto Algorithm for $R(D)$

Complexity: $O(|\mathcal{X}| \cdot |\hat{\mathcal{X}}|)$ per iteration. Convergence is geometric.
Input: Source distribution $P_X$, distortion measure $d(x, \hat{x})$, Lagrange multiplier $\lambda > 0$
Output: Optimal test channel $P^*_{\hat{X}|X}$ and rate $R(D)$
Initialize: $q(\hat{x}) = 1/|\hat{\mathcal{X}}|$ (uniform marginal)
Repeat until convergence:
1. Update test channel:
$$P_{\hat{x}|x} = \frac{q(\hat{x}) \exp(-\lambda \cdot d(x, \hat{x}))}{\sum_{\hat{x}'} q(\hat{x}') \exp(-\lambda \cdot d(x, \hat{x}'))}$$
2. Update marginal:
$$q(\hat{x}) = \sum_x P_X(x) P_{\hat{x}|x}$$
3. Compute
$$D = \sum_{x, \hat{x}} P_X(x) P_{\hat{x}|x}\, d(x, \hat{x}), \qquad R = \sum_{x, \hat{x}} P_X(x) P_{\hat{x}|x} \log \frac{P_{\hat{x}|x}}{q(\hat{x})}$$
Varying $\lambda$ traces out the $R(D)$ curve.

The Blahut-Arimoto algorithm alternates between optimizing the test channel (for fixed marginal) and updating the marginal (for fixed test channel). This is the same alternating structure as the Blahut-Arimoto algorithm for channel capacity, another manifestation of the source-channel duality. The parameter $\lambda$ controls the operating point on the $R(D)$ curve: large $\lambda$ gives low distortion (high rate), small $\lambda$ gives high distortion (low rate).
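Below is a direct NumPy transcription of the algorithm box; the function name and the binary test case are mine. For a uniform binary source with Hamming distortion, the output can be checked against the known closed form $R(D) = 1 - h_2(D)$.

```python
import numpy as np

def blahut_arimoto_rd(p_x, d, lam, n_iters=500):
    """One (R, D) point for a discrete source; vary lam to trace the curve.

    p_x : source pmf, shape (n,)
    d   : distortion matrix d[x, x_hat], shape (n, m)
    lam : Lagrange multiplier; large lam -> low distortion, high rate
    """
    n, m = d.shape
    q = np.full(m, 1.0 / m)                    # uniform reconstruction marginal
    for _ in range(n_iters):
        w = q * np.exp(-lam * d)               # unnormalized test channel
        P = w / w.sum(axis=1, keepdims=True)   # P[x, :] = P(x_hat | x)
        q = p_x @ P                            # marginal induced by the channel
    joint = p_x[:, None] * P
    D = np.sum(joint * d)
    R = np.sum(joint * np.log2(P / q))         # mutual information, in bits
    return R, D

# Binary uniform source, Hamming distortion: closed form R(D) = 1 - h2(D).
p_x = np.array([0.5, 0.5])
d = np.array([[0.0, 1.0], [1.0, 0.0]])
for lam in (1.0, 2.0, 4.0, 8.0):
    R, D = blahut_arimoto_rd(p_x, d, lam)
    h2 = -D * np.log2(D) - (1 - D) * np.log2(1 - D)
    print(f"lam={lam:>4}: D={D:.4f}, R={R:.4f}, 1-h2(D)={1 - h2:.4f}")
```

Sweeping `lam` from small to large moves along the curve from high distortion at low rate toward zero distortion at the source entropy, as the text above describes.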

Historical Note: Blahut and Arimoto, Two Independent Discoveries (1972)

Richard Blahut (Bell Labs) and Suguru Arimoto (University of Tokyo) independently discovered the alternating optimization algorithm for computing channel capacity in 1972. Blahut extended it to rate-distortion in the same paper. The algorithm is a special case of coordinate descent (or block relaxation) on a convex objective, which guarantees convergence. It remains the standard numerical method for computing $R(D)$ for discrete sources, and its structure has influenced the design of iterative algorithms in communications (turbo decoding, expectation-maximization) and machine learning (variational inference).

Practical Compression Standards and Their R-D Performance

| Standard   | Transform                      | Quantization              | Entropy Coder               | Gap to $R(D)$ |
|------------|--------------------------------|---------------------------|-----------------------------|---------------|
| JPEG       | 8×8 DCT                        | Uniform (per-coefficient) | Huffman                     | ~5 dB         |
| JPEG 2000  | Wavelet (9/7)                  | Deadzone uniform          | EBCOT (embedded arithmetic) | ~3 dB         |
| H.264/AVC  | 4×4/8×8 DCT + prediction       | Uniform                   | CABAC (arithmetic)          | ~2-4 dB       |
| HEVC/H.265 | Variable-size DCT + prediction | Uniform                   | CABAC                       | ~1.5-3 dB     |
| VVC/H.266  | Multi-type transform           | Dependent scalar          | CABAC                       | ~1-2.5 dB     |

Transform coding

A lossy compression architecture: apply a decorrelating transform (DCT, wavelet, KLT), quantize the transform coefficients, and entropy-code the result. The foundation of JPEG, HEVC, and nearly all practical image/video compression.

Shaping loss

The gap between scalar quantization and the rate-distortion bound, arising from the suboptimality of cubic quantization cells vs. spherical ones. For uniform ECSQ on Gaussian sources: 1.53 dB (a factor of $\pi e/6$).

Key Takeaway

Transform coding (transform + quantize + entropy code) is the practical realization of rate-distortion theory. The KLT decorrelates optimally, reducing the problem to independent scalar quantization with reverse waterfilling. Entropy-coded scalar quantization operates within 1.53 dB of the Gaussian R-D bound. Modern codecs (HEVC, VVC) close much of this gap through prediction, variable-size transforms, and context-adaptive entropy coding. The Blahut-Arimoto algorithm computes R(D)R(D) numerically for any discrete source.