Rate-Distortion with Side Information (Wyner-Ziv)

Side Information at the Decoder

Suppose the decoder has access to side information $Y$ correlated with the source $X$. Intuitively, the decoder can use $Y$ to reduce the number of bits it needs from the encoder. The Wyner-Ziv theorem (1976) quantifies this precisely. The surprise is that for Gaussian sources with squared-error distortion, having side information only at the decoder is just as good as having it at both encoder and decoder. This "no rate loss" result is remarkable and has profound implications for distributed compression and sensor networks.

Definition: Wyner-Ziv Rate-Distortion

The Wyner-Ziv rate-distortion function for a source $X$ with decoder side information $Y$ and distortion measure $d$ is

$$R_{WZ}(D) = \min_{\substack{U :\, U \to X \to Y \\ g :\, \mathcal{U} \times \mathcal{Y} \to \hat{\mathcal{X}} \\ \mathbb{E}[d(X, g(U, Y))] \leq D}} \big[ I(X; U) - I(U; Y) \big]$$

where $U$ is an auxiliary random variable forming the Markov chain $U \to X \to Y$, and $g$ is the decoder's reconstruction function, which uses both $U$ and $Y$.
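
A useful identity makes the objective easier to interpret: under the Markov constraint, the penalized rate collapses to a conditional mutual information. One line of chain rule shows it:

$$I(U; X, Y) = I(U; Y) + I(U; X \mid Y) = I(U; X) + I(U; Y \mid X),$$

and $U \to X \to Y$ forces $I(U; Y \mid X) = 0$, so

$$I(X; U) - I(U; Y) = I(X; U \mid Y).$$

Comparing with $R_{XY}(D)$ below, the only difference is that the Wyner-Ziv minimization is restricted to reconstructions routed through such an auxiliary $U$.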

Theorem: Wyner-Ziv Theorem

For a memoryless source-side information pair $(X, Y)$ with distortion measure $d$:

  1. The minimum achievable rate with side information at both encoder and decoder is $R_{XY}(D) = \min_{P_{\hat{X}|X,Y} :\, \mathbb{E}[d] \leq D} I(X; \hat{X} \mid Y)$.
  2. The minimum achievable rate with side information only at the decoder (Wyner-Ziv) is $R_{WZ}(D) = \min_U [I(X; U) - I(U; Y)]$ as defined above.
  3. In general, $R_{WZ}(D) \geq R_{XY}(D)$: not having side information at the encoder can hurt.
  4. Gaussian case: for $(X, Y)$ jointly Gaussian and squared-error distortion, $R_{WZ}(D) = R_{XY}(D)$, so side information at the encoder is worthless! (A closed form is given after this list.)
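
For later use, the quadratic-Gaussian case admits a standard closed form. Writing $\sigma_{X \mid Y}^2$ for the conditional variance of $X$ given $Y$, both rates equal

$$R_{WZ}(D) = R_{XY}(D) = \frac{1}{2} \log_2 \frac{\sigma_{X \mid Y}^2}{D}, \qquad 0 < D \leq \sigma_{X \mid Y}^2,$$

and zero for larger $D$: the side information acts exactly as if it had shrunk the source variance from $\sigma_X^2$ to $\sigma_{X \mid Y}^2$.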

The Wyner-Ziv encoder uses binning: it groups codewords into bins and sends only the bin index. The decoder, knowing $Y$, can disambiguate which codeword within the bin was intended. The rate saving $I(U; Y)$ is exactly the information the side information provides about the auxiliary variable $U$.
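
To make binning concrete, here is a minimal Python sketch of its lossless special case (Slepian-Wolf-style syndrome binning) built on the (7,4) Hamming code; the `encode`/`decode` names and the one-bit-flip side-information model are illustrative assumptions, not constructions from the text. The bins are the cosets of the code, the bin index is the syndrome, and the decoder uses $Y$ to pick the unique bin member within Hamming distance 1.

```python
import numpy as np

# Parity-check matrix of the (7,4) Hamming code: column j (0-indexed)
# is the binary expansion of j + 1, so a single bit flip at position j
# yields the syndrome bin(j + 1).
H = np.array([[(j >> k) & 1 for j in range(1, 8)] for k in (2, 1, 0)])

def encode(x):
    """Encoder: send only the 3-bit bin index (syndrome), not the 7-bit block."""
    return (H @ x) % 2

def decode(bin_index, y):
    """Decoder: find the unique member of the bin within Hamming distance 1 of y."""
    s_err = (bin_index + H @ y) % 2   # syndrome of the error pattern x XOR y
    x_hat = y.copy()
    if s_err.any():                   # nonzero syndrome locates the flipped bit
        pos = int("".join(map(str, s_err)), 2) - 1
        x_hat[pos] ^= 1
    return x_hat

rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=7)        # 7-bit source block
y = x.copy()
y[rng.integers(7)] ^= 1               # side information: x with one bit flipped

x_hat = decode(encode(x), y)          # 3 bits sent instead of 7
print("x     =", x)
print("y     =", y)
print("x_hat =", x_hat, "| recovered:", bool((x_hat == x).all()))
```

The encoder's rate drops from 7 bits to 3 because the decoder can resolve the remaining ambiguity within the bin using $Y$; full Wyner-Ziv coding adds a quantization step before binning, which is where the auxiliary variable $U$ enters.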

The Gaussian result is surprising: even though the encoder does not know $Y$, it can structure its code so that the decoder's side information is just as useful as if the encoder had known it. This fails for general sources; in the binary case, $R_{WZ}(D) > R_{XY}(D)$ at intermediate distortion levels.
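
For reference, in the standard doubly symmetric binary setup ($Y$ is $X$ passed through a BSC with crossover probability $p$, Hamming distortion), the two functions are usually quoted as

$$R_{XY}(D) = h(p) - h(D), \qquad R_{WZ}(D) = \mathrm{l.c.e.}\big[h(p \ast D) - h(D)\big], \qquad 0 \leq D \leq p,$$

where $h$ is the binary entropy function, $p \ast D = p(1 - D) + D(1 - p)$, and $\mathrm{l.c.e.}$ denotes the lower convex envelope taken together with the point $(p, 0)$. The envelope sits strictly above $h(p) - h(D)$ for $0 < D < p$, which is the strict loss referred to above.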


Example: Wyner-Ziv for Gaussian Source and Side Information

$X \sim \mathcal{N}(0, 1)$ and $Y = X + N$ with $N \sim \mathcal{N}(0, 1)$ independent of $X$. Compute $R_{WZ}(D)$ for $D = 0.25$ and compare with $R(D)$ without side information.
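
A worked solution using the quadratic-Gaussian closed form above. Since $\operatorname{Cov}(X, Y) = \sigma_X^2 = 1$ and $\sigma_Y^2 = 2$,

$$\sigma_{X \mid Y}^2 = \sigma_X^2 - \frac{\operatorname{Cov}(X, Y)^2}{\sigma_Y^2} = 1 - \frac{1}{2} = \frac{1}{2},$$

so

$$R_{WZ}(0.25) = \frac{1}{2} \log_2 \frac{1/2}{1/4} = 0.5 \text{ bits}, \qquad R(0.25) = \frac{1}{2} \log_2 \frac{1}{1/4} = 1 \text{ bit}.$$

The decoder's side information halves the required rate at this distortion.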

Quick Check

For a binary source $X$ and decoder side information $Y = X \oplus N$ (a BSC observation), is $R_{WZ}(D) = R_{XY}(D)$ always?

Yes, side information at the encoder never helps for any source

No, there is generally a strict loss for non-Gaussian sources

It depends on the correlation between $X$ and $Y$

Yes, because binning is always optimal

Why This Matters: Wyner-Ziv in Distributed Sensor Networks

In a sensor network, multiple sensors observe correlated measurements and must compress them for transmission to a fusion center. Wyner-Ziv coding allows each sensor to compress its data without knowing what the other sensors have observed; the fusion center uses all received data as side information when decoding each sensor's message. This is the foundation of distributed source coding, covered in Chapter 7. See also Book telecom, Ch. 11, for the information-theoretic perspective on multi-terminal systems.

Key Takeaway

The Wyner-Ziv theorem characterizes lossy compression when the decoder has correlated side information. The rate is $R_{WZ}(D) = \min_U [I(X; U) - I(U; Y)]$, achieved by binning. For Gaussian sources with squared-error distortion, side information at the encoder is worthless ($R_{WZ} = R_{XY}$), a remarkable result with no counterpart for general sources. This is the lossy analogue of Slepian-Wolf coding for lossless compression.