The Rate-Distortion Function
The Information-Theoretic Minimum
What is the absolute minimum rate needed to represent a source at distortion $D$? Shannon answered this in 1959 with the rate-distortion function, a convex optimization over "test channels" that maps each distortion level to its minimum rate. The answer is elegant: $R(D) = \min_{p(\hat{x}|x)} I(X;\hat{X})$ subject to $\mathbb{E}[d(X,\hat{X})] \le D$. Intuitively, the mutual information measures how many bits we need to "communicate" the reconstruction to the decoder, and we minimize it subject to the distortion constraint.
Definition: Rate-Distortion Function
Rate-Distortion Function
For a source $X \sim p(x)$ with distortion measure $d(x,\hat{x})$, the rate-distortion function is $R(D) = \min_{p(\hat{x}|x):\, \mathbb{E}[d(X,\hat{X})] \le D} I(X;\hat{X})$. The minimization is over all conditional distributions $p(\hat{x}|x)$ (test channels) that achieve average distortion at most $D$. The joint distribution is $p(x,\hat{x}) = p(x)\,p(\hat{x}|x)$.
The test channel is a mathematical device: it does not correspond to a physical channel. It represents the "best possible" stochastic mapping from source to reconstruction. The source distribution is fixed; only the test channel is optimized.
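The definition can be made concrete for a small binary source. The sketch below (an illustrative brute-force search, not an efficient algorithm) grids over all binary test channels $p(\hat{x}|x)$, keeps those satisfying the distortion constraint, and takes the minimum mutual information; the values of `p`, `D`, and the grid resolution are arbitrary choices for the example.

```python
import itertools
import numpy as np

def h2(q):
    """Binary entropy in bits, with h2(0) = h2(1) = 0."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * np.log2(q) - (1 - q) * np.log2(1 - q)

def rd_grid_search(p=0.3, D=0.1, steps=201):
    """Approximate R(D) for a Bernoulli(p) source under Hamming distortion
    by brute-force search over binary test channels.

    A binary test channel has two free parameters:
        a = P(Xhat = 1 | X = 0),  b = P(Xhat = 1 | X = 1).
    """
    best = np.inf
    for a, b in itertools.product(np.linspace(0, 1, steps), repeat=2):
        # Average Hamming distortion E[1{X != Xhat}]
        dist = (1 - p) * a + p * (1 - b)
        if dist > D:
            continue  # violates the distortion constraint
        # I(X; Xhat) = H(Xhat) - H(Xhat | X)
        q1 = (1 - p) * a + p * b          # marginal P(Xhat = 1)
        mi = h2(q1) - (1 - p) * h2(a) - p * h2(b)
        best = min(best, mi)
    return best
```

Since every feasible test channel has $I(X;\hat{X}) \ge R(D)$, the grid search approaches the true value $h(p) - h(D)$ from above as the grid is refined.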
Theorem: Properties of the Rate-Distortion Function
For any source and bounded distortion measure:
- $R(D)$ is non-increasing in $D$: allowing more distortion never requires more rate.
- $R(D)$ is convex in $D$.
- $R(D) \ge 0$ for all $D$, with $R(D) = 0$ iff $D \ge D_{\max} = \min_{\hat{x}} \mathbb{E}[d(X,\hat{x})]$.
- For discrete sources with Hamming distortion: $R(0) = H(X)$.
- $R(D)$ is continuous on $(D_{\min}, \infty)$, where $D_{\min} = \mathbb{E}[\min_{\hat{x}} d(X,\hat{x})]$.
Non-increasing
If $D_1 \le D_2$, then every test channel feasible for $D_1$ is also feasible for $D_2$. Minimizing over a larger set gives a smaller value: $R(D_2) \le R(D_1)$.
Convexity
Let $D_\lambda = \lambda D_1 + (1-\lambda) D_2$ for $\lambda \in [0,1]$. Consider the test channel $p_\lambda(\hat{x}|x) = \lambda p_1(\hat{x}|x) + (1-\lambda) p_2(\hat{x}|x)$, where $p_1, p_2$ achieve $R(D_1), R(D_2)$ respectively. The distortion of $p_\lambda$ is at most $\lambda D_1 + (1-\lambda) D_2 = D_\lambda$ (by linearity of expectation). The mutual information satisfies $I(p_\lambda) \le \lambda I(p_1) + (1-\lambda) I(p_2)$ by convexity of mutual information in the conditional distribution $p(\hat{x}|x)$. Therefore $R(D_\lambda) \le \lambda R(D_1) + (1-\lambda) R(D_2)$.
$R(D_{\max}) = 0$
At $D = D_{\max}$, the optimal test channel is $p(\hat{x}|x) = \mathbf{1}\{\hat{x} = \hat{x}^*\}$: deterministic, always outputting the best constant reconstruction $\hat{x}^* = \arg\min_{\hat{x}} \mathbb{E}[d(X,\hat{x})]$. Then $\hat{X}$ is independent of $X$, so $I(X;\hat{X}) = 0$.
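This endpoint is easy to check numerically. A small sketch, assuming a Bernoulli(0.3) source and Hamming distortion: $D_{\max}$ is the smallest average distortion achievable by any constant reconstruction.

```python
import numpy as np

px = np.array([0.7, 0.3])                  # assumed Bernoulli(0.3) source
d = np.array([[0.0, 1.0],                  # Hamming distortion d[x, xhat]
              [1.0, 0.0]])

# D_max = min over constant reconstructions xhat of E[d(X, xhat)]
Dmax = min(float(px @ d[:, j]) for j in range(d.shape[1]))
best_const = int(np.argmin([px @ d[:, j] for j in range(d.shape[1])]))
# Always outputting xhat* = 0 gives distortion P(X = 1) = 0.3,
# and a constant output carries zero mutual information about X.
```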
Example: Binary Source with Hamming Distortion
Compute $R(D)$ for a Bernoulli($p$) source with Hamming distortion $d(x,\hat{x}) = \mathbf{1}\{x \neq \hat{x}\}$, where $p \le 1/2$.
Set up the optimization
The source is $X \sim \text{Bernoulli}(p)$ and the reconstruction alphabet is $\{0,1\}$. Take the test channel to be a BSC with crossover probability $D$ in the backward direction: $X = \hat{X} \oplus Z$ with $Z \sim \text{Bernoulli}(D)$ independent of $\hat{X}$, so $P(X \neq \hat{X}) = D$. The distortion constraint is $\mathbb{E}[d(X,\hat{X})] = P(X \neq \hat{X}) \le D$.
Compute mutual information
For a BSC($D$) test channel: $I(X;\hat{X}) = H(X) - H(X|\hat{X}) = h(p) - h(D)$, where the marginal probability $q = P(\hat{X} = 1)$ is chosen so that $q(1-D) + (1-q)D = p$ and $h(\cdot)$ is the binary entropy function.
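This identity can be verified numerically. The sketch below builds the joint distribution from the backward-channel construction ($X = \hat{X} \oplus Z$, $Z \sim \text{Bernoulli}(D)$) with the illustrative values $p = 0.3$, $D = 0.1$:

```python
import numpy as np

def h2(q):
    """Binary entropy in bits."""
    return -q * np.log2(q) - (1 - q) * np.log2(1 - q)

p, D = 0.3, 0.1                      # illustrative source and distortion
q = (p - D) / (1 - 2 * D)            # P(Xhat = 1), solves q(1-D) + (1-q)D = p

# Joint p(xhat, x): rows index Xhat, columns index X
joint = np.array([[(1 - q) * (1 - D), (1 - q) * D],
                  [q * D,             q * (1 - D)]])

pX = joint.sum(axis=0)
assert np.allclose(pX, [1 - p, p])               # marginal of X is Bern(p)
assert np.isclose(joint[0, 1] + joint[1, 0], D)  # P(X != Xhat) = D

# I(X;Xhat) = H(X) - H(X|Xhat), and H(X|Xhat) = h(D) by construction
H_X = h2(pX[1])
cond = joint / joint.sum(axis=1, keepdims=True)  # p(x | xhat)
H_X_given = -(joint * np.log2(cond)).sum()
I = H_X - H_X_given                              # equals h(p) - h(D)
```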
Minimize over test channels
The BSC($D$) test channel achieves the minimum (the reader should verify this using KKT conditions or the Blahut-Arimoto algorithm). Therefore: $R(D) = h(p) - h(D)$ for $0 \le D \le p$, and $R(D) = 0$ for $D > p$.
Interpretation
At $D = 0$ (lossless): $R(0) = h(p) = H(X)$, recovering the lossless source coding theorem. At $D = p$ (maximum useful distortion): $R(p) = 0$, since always outputting $\hat{X} = 0$ achieves distortion $p$ with zero bits. The rate-distortion function smoothly interpolates between these extremes.
Binary Rate-Distortion Function
Plot $R(D) = h(p) - h(D)$ for a binary source. Adjust the source parameter $p$ and observe how the curve changes. Note the convex, non-increasing shape and the endpoints $R(0) = h(p)$ and $R(p) = 0$.
Parameters
Rate-Distortion as a Convex Program
The rate-distortion optimization $\min_{p(\hat{x}|x)} I(X;\hat{X})$ subject to $\mathbb{E}[d(X,\hat{X})] \le D$ is a convex program: the mutual information $I(X;\hat{X})$ is convex in $p(\hat{x}|x)$ for fixed $p(x)$ (this is the opposite of the channel capacity problem, where $I$ is concave in $p(x)$), and the distortion constraint is linear in $p(\hat{x}|x)$. Convexity means KKT conditions are sufficient for optimality, and iterative algorithms like Blahut-Arimoto converge to the global optimum. This is not just a mathematical convenience: it is why rate-distortion theory is computationally tractable.
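Since Blahut-Arimoto is the standard iterative solver here, a minimal sketch (not a production implementation) is shown below. Each value of an assumed trade-off parameter `beta` yields one point on the $R(D)$ curve; for the binary-Hamming case the output can be checked against $h(p) - h(D)$.

```python
import numpy as np

def blahut_arimoto_rd(px, d, beta, n_iter=2000):
    """One point on the rate-distortion curve via Blahut-Arimoto.

    px   : source distribution p(x), shape (nx,)
    d    : distortion matrix d[x, xhat], shape (nx, nxhat)
    beta : trade-off parameter (larger beta -> lower distortion)
    Returns (R, D) with R in bits.
    """
    nxhat = d.shape[1]
    q = np.full(nxhat, 1.0 / nxhat)        # reconstruction marginal q(xhat)
    w = 2.0 ** (-beta * d)                  # exponential distortion weights
    for _ in range(n_iter):
        cond = q * w                        # unnormalized p(xhat | x)
        cond /= cond.sum(axis=1, keepdims=True)
        q = px @ cond                       # re-estimate the marginal
    joint = px[:, None] * cond
    D = float((joint * d).sum())            # achieved distortion
    R = float((joint * np.log2(cond / q)).sum())  # I(X; Xhat) in bits
    return R, D
```

For a Bernoulli(0.3) source with Hamming distortion and `beta=2.0`, the achieved distortion comes out near 0.2, and the returned rate matches $h(0.3) - h(D)$, the known curve for this source.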
Rate-distortion function
The minimum mutual information over all test channels achieving average distortion at most $D$. The fundamental limit of lossy compression: no code at rate $R < R(D)$ can achieve distortion $D$.
Test channel
The conditional distribution $p(\hat{x}|x)$ in the rate-distortion optimization. It represents the stochastic mapping from source to reconstruction that minimizes the required rate at a given distortion level. Not a physical channel.
Key Takeaway
The rate-distortion function $R(D)$ is the information-theoretic minimum rate for lossy compression at distortion $D$. It is convex and non-increasing in $D$, ranges from $H(X)$ (lossless, $D = 0$) to $0$ ($D \ge D_{\max}$), and is computed by a convex optimization over test channels. For a binary source with Hamming distortion: $R(D) = h(p) - h(D)$ for $0 \le D \le p$.