The Rate-Distortion Function

The Information-Theoretic Minimum

What is the absolute minimum rate needed to represent a source at distortion $D$? Shannon answered this in 1959 with the rate-distortion function, a convex optimization over "test channels" $P_{\hat{X}|X}$ that maps each distortion level to its minimum rate. The answer is elegant: $R(D) = \min I(X;\hat{X})$ subject to $\mathbb{E}[d] \leq D$. Intuitively, the mutual information measures how many bits we need to "communicate" the reconstruction $\hat{X}$ to the decoder, and we minimize it subject to the distortion constraint.

Definition:

Rate-Distortion Function

For a source $X \sim P_X$ with distortion measure $d : \mathcal{X} \times \hat{\mathcal{X}} \to [0, \infty)$, the rate-distortion function is
$$R(D) = \min_{\substack{P_{\hat{X}|X} : \\ \mathbb{E}[d(X, \hat{X})] \leq D}} I(X; \hat{X}).$$
The minimization is over all conditional distributions (test channels) $P_{\hat{X}|X}$ that achieve average distortion at most $D$. The joint distribution is $P_{X,\hat{X}}(x, \hat{x}) = P_X(x)\, P_{\hat{X}|X}(\hat{x}|x)$.

The test channel $P_{\hat{X}|X}$ is a mathematical device; it does not correspond to a physical channel. It represents the "best possible" stochastic mapping from source to reconstruction. The source distribution $P_X$ is fixed; only the test channel is optimized.
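As a concrete sketch of the objects in the definition (the source parameter and channel entries are illustrative choices, not from the text): fix a Bernoulli(0.3) source, pick an arbitrary test channel, build the joint distribution $P_{X,\hat{X}}(x,\hat{x}) = P_X(x)\,P_{\hat{X}|X}(\hat{x}|x)$, and evaluate both the objective $I(X;\hat{X})$ and the constraint $\mathbb{E}[d]$.

```python
import math

# Illustrative sketch: P_X is fixed; the test channel P(xhat|x) is the free
# variable that the rate-distortion program optimizes over.
p_x = [0.7, 0.3]                       # Bernoulli(0.3) source
channel = [[0.9, 0.1],                 # P(xhat | x=0)  (arbitrary example)
           [0.2, 0.8]]                 # P(xhat | x=1)
d = [[0, 1], [1, 0]]                   # Hamming distortion

# Joint distribution P(x, xhat) = P_X(x) P(xhat|x) and output marginal q(xhat)
joint = [[p_x[x] * channel[x][y] for y in range(2)] for x in range(2)]
q = [joint[0][y] + joint[1][y] for y in range(2)]

# Mutual information I(X; Xhat) in bits, and average distortion E[d]
I = sum(joint[x][y] * math.log2(joint[x][y] / (p_x[x] * q[y]))
        for x in range(2) for y in range(2))
E_d = sum(joint[x][y] * d[x][y] for x in range(2) for y in range(2))
print(I, E_d)
```

Any such channel gives one feasible point; $R(D)$ is the smallest $I(X;\hat{X})$ achievable among all channels whose $\mathbb{E}[d]$ is at most $D$.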

Theorem: Properties of the Rate-Distortion Function

For any source and bounded distortion measure:

  1. $R(D)$ is non-increasing in $D$: allowing more distortion never requires more rate.
  2. $R(D)$ is convex in $D$.
  3. $R(D) \geq 0$ for all $D$, with $R(D) = 0$ iff $D \geq D_{\max} = \min_{\hat{x}} \mathbb{E}[d(X, \hat{x})]$.
  4. For discrete sources with Hamming distortion: $R(0) = H(X)$.
  5. $R(D)$ is continuous on $[0, D_{\max}]$.
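Property 3 can be checked directly: $D_{\max}$ is the distortion of the best constant reconstruction, since at that distortion the encoder can send zero bits and the decoder always outputs the same $\hat{x}$. A minimal sketch for a Bernoulli(0.3) source with Hamming distortion (the numbers are illustrative):

```python
# D_max = min over xhat of E[d(X, xhat)]: the best single guess for X.
p_x = [0.7, 0.3]                      # Bernoulli(0.3) source
d = [[0, 1], [1, 0]]                  # Hamming distortion
D_max = min(sum(p_x[x] * d[x][y] for x in range(2)) for y in range(2))
print(D_max)                          # always guessing xhat = 0 costs p = 0.3
```

For Hamming distortion this reduces to $D_{\max} = \min(p, 1-p)$, which matches the endpoint $R(p) = 0$ for $p \leq 1/2$.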

Example: Binary Source with Hamming Distortion

Compute $R(D)$ for a Bernoulli($p$) source with Hamming distortion $d(x, \hat{x}) = \mathbf{1}\{x \neq \hat{x}\}$, where $p \leq 1/2$.
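The closed form is the one plotted below: $R(D) = H(p) - H(D)$ for $0 \leq D \leq p$, and $R(D) = 0$ for $D \geq p$. A small sketch evaluating it (function names are illustrative):

```python
import math

def h2(q):
    """Binary entropy in bits."""
    return 0.0 if q in (0.0, 1.0) else -q * math.log2(q) - (1 - q) * math.log2(1 - q)

def binary_rd(p, D):
    """R(D) = H(p) - H(D) for 0 <= D <= p, clipped to 0 beyond D = p."""
    return max(h2(p) - h2(min(D, p)), 0.0)

# Endpoints match the general properties: R(0) = H(p) and R(p) = 0
print(binary_rd(0.3, 0.0), binary_rd(0.3, 0.3))
```

The curve is convex and non-increasing, exactly as the theorem above requires.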

Binary Rate-Distortion Function

Plot $R(D) = H(p) - H(D)$ for a binary source. Adjust the source parameter $p$ and observe how the curve changes. Note the convex, non-increasing shape and the endpoints $R(0) = H(p)$ and $R(p) = 0$.


Rate-Distortion as a Convex Program

The rate-distortion optimization $\min_{P_{\hat{X}|X}} I(X;\hat{X})$ subject to $\mathbb{E}[d] \leq D$ is a convex program: the mutual information is convex in $P_{\hat{X}|X}$ for fixed $P_X$ (this is the opposite of the channel capacity problem, where $I$ is concave in $P_X$), and the distortion constraint is linear. Convexity means KKT conditions are sufficient for optimality, and iterative algorithms like Blahut-Arimoto converge to the global optimum. This is not just a mathematical convenience; it is why rate-distortion theory is computationally tractable.

Rate-distortion function $R(D)$

The minimum mutual information $I(X;\hat{X})$ over all test channels achieving average distortion at most $D$. The fundamental limit of lossy compression: no code at rate $R < R(D)$ can achieve distortion $D$.

Related: Rate-Distortion Function

Test channel

The conditional distribution $P_{\hat{X}|X}$ in the rate-distortion optimization. It represents the stochastic mapping from source to reconstruction that minimizes the required rate at a given distortion level. Not a physical channel.

Key Takeaway

The rate-distortion function $R(D) = \min_{P_{\hat{X}|X}: \mathbb{E}[d] \leq D} I(X;\hat{X})$ is the information-theoretic minimum rate for lossy compression at distortion $D$. It is convex and non-increasing in $D$, ranges from $H(X)$ (lossless) to 0 ($D = D_{\max}$), and is computed by a convex optimization over test channels. For a binary source with Hamming distortion: $R(D) = H(p) - H(D)$ for $0 \leq D \leq p$.