Rate-Distortion with Side Information (Wyner-Ziv)

Side Information at the Decoder

Suppose the decoder has access to side information $Y$ correlated with the source $X$. Intuitively, the decoder can use $Y$ to reduce the number of bits it needs from the encoder. The Wyner-Ziv theorem (1976) quantifies this precisely. The surprise is that for Gaussian sources with squared-error distortion, having side information only at the decoder is just as good as having it at both encoder and decoder. This "no rate loss" result is remarkable and has profound implications for distributed compression and sensor networks.

Definition: Wyner-Ziv Rate-Distortion

The Wyner-Ziv rate-distortion function for a source $X$ with decoder side information $Y$ and distortion measure $d$ is

$$R_{WZ}(D) = \min_{\substack{U :\, U \to X \to Y \\ g :\, \mathcal{U} \times \mathcal{Y} \to \hat{\mathcal{X}} \\ \mathbb{E}[d(X, g(U, Y))] \leq D}} \big[ I(X; U) - I(U; Y) \big]$$

where $U$ is an auxiliary random variable forming the Markov chain $U \to X \to Y$, and $g$ is the decoder's reconstruction function, which uses both $U$ and $Y$.
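
A useful identity makes the objective easier to interpret: under the Markov constraint, the penalized rate collapses to a conditional mutual information. One line of chain rule shows it:

$$I(U; X, Y) = I(U; Y) + I(U; X \mid Y) = I(U; X) + I(U; Y \mid X),$$

and $U \to X \to Y$ forces $I(U; Y \mid X) = 0$, so

$$I(X; U) - I(U; Y) = I(X; U \mid Y).$$

Comparing with $R_{XY}(D)$ below, the only difference is that the Wyner-Ziv minimization is restricted to reconstructions routed through such an auxiliary $U$.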

Theorem: Wyner-Ziv Theorem

For a memoryless source-side information pair $(X, Y)$ with distortion measure $d$:

  1. The minimum achievable rate with side information at both encoder and decoder is $R_{XY}(D) = \min_{P_{\hat{X}|X,Y} :\, \mathbb{E}[d] \leq D} I(X; \hat{X} \mid Y)$.
  2. The minimum achievable rate with side information only at the decoder (Wyner-Ziv) is $R_{WZ}(D) = \min_U [I(X; U) - I(U; Y)]$ as defined above.
  3. In general, $R_{WZ}(D) \geq R_{XY}(D)$: not having side information at the encoder can hurt.
  4. Gaussian case: for $(X, Y)$ jointly Gaussian and squared-error distortion, $R_{WZ}(D) = R_{XY}(D)$, so side information at the encoder is worthless! (A closed form is given after this list.)
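
For later use, the quadratic-Gaussian case admits a standard closed form. Writing $\sigma_{X \mid Y}^2$ for the conditional variance of $X$ given $Y$, both rates equal

$$R_{WZ}(D) = R_{XY}(D) = \frac{1}{2} \log_2 \frac{\sigma_{X \mid Y}^2}{D}, \qquad 0 < D \leq \sigma_{X \mid Y}^2,$$

and zero for larger $D$: the side information acts exactly as if it had shrunk the source variance from $\sigma_X^2$ to $\sigma_{X \mid Y}^2$.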

The Wyner-Ziv encoder uses binning: it groups codewords into bins and sends only the bin index. The decoder, knowing $Y$, can disambiguate which codeword within the bin was intended. The rate saving $I(U; Y)$ is exactly the information the side information provides about the auxiliary variable $U$.
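
To make binning concrete, here is a minimal Python sketch of its lossless special case (Slepian-Wolf-style syndrome binning) built on the (7,4) Hamming code; the `encode`/`decode` names and the one-bit-flip side-information model are illustrative assumptions, not constructions from the text. The bins are the cosets of the code, the bin index is the syndrome, and the decoder uses $Y$ to pick the unique bin member within Hamming distance 1.

```python
import numpy as np

# Parity-check matrix of the (7,4) Hamming code: column j (0-indexed)
# is the binary expansion of j + 1, so a single bit flip at position j
# yields the syndrome bin(j + 1).
H = np.array([[(j >> k) & 1 for j in range(1, 8)] for k in (2, 1, 0)])

def encode(x):
    """Encoder: send only the 3-bit bin index (syndrome), not the 7-bit block."""
    return (H @ x) % 2

def decode(bin_index, y):
    """Decoder: find the unique member of the bin within Hamming distance 1 of y."""
    s_err = (bin_index + H @ y) % 2   # syndrome of the error pattern x XOR y
    x_hat = y.copy()
    if s_err.any():                   # nonzero syndrome locates the flipped bit
        pos = int("".join(map(str, s_err)), 2) - 1
        x_hat[pos] ^= 1
    return x_hat

rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=7)        # 7-bit source block
y = x.copy()
y[rng.integers(7)] ^= 1               # side information: x with one bit flipped

x_hat = decode(encode(x), y)          # 3 bits sent instead of 7
print("x     =", x)
print("y     =", y)
print("x_hat =", x_hat, "| recovered:", bool((x_hat == x).all()))
```

The encoder's rate drops from 7 bits to 3 because the decoder can resolve the remaining ambiguity within the bin using $Y$; full Wyner-Ziv coding adds a quantization step before binning, which is where the auxiliary variable $U$ enters.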

The Gaussian result is surprising: even though the encoder does not know $Y$, it can structure its code so that the decoder's side information is just as useful as if the encoder had known it. This fails for general sources; in the binary case, $R_{WZ}(D) > R_{XY}(D)$ at intermediate distortion levels.
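
For reference, in the standard doubly symmetric binary setup ($Y$ is $X$ passed through a BSC with crossover probability $p$, Hamming distortion), the two functions are usually quoted as

$$R_{XY}(D) = h(p) - h(D), \qquad R_{WZ}(D) = \mathrm{l.c.e.}\big[h(p \ast D) - h(D)\big], \qquad 0 \leq D \leq p,$$

where $h$ is the binary entropy function, $p \ast D = p(1 - D) + D(1 - p)$, and $\mathrm{l.c.e.}$ denotes the lower convex envelope taken together with the point $(p, 0)$. The envelope sits strictly above $h(p) - h(D)$ for $0 < D < p$, which is the strict loss referred to above.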


Example: Wyner-Ziv for Gaussian Source and Side Information

$X \sim \mathcal{N}(0, 1)$ and $Y = X + N$ with $N \sim \mathcal{N}(0, 1)$ independent of $X$. Compute $R_{WZ}(D)$ for $D = 0.25$ and compare with $R(D)$ without side information.
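
A worked solution using the quadratic-Gaussian closed form above. Since $\operatorname{Cov}(X, Y) = \sigma_X^2 = 1$ and $\sigma_Y^2 = 2$,

$$\sigma_{X \mid Y}^2 = \sigma_X^2 - \frac{\operatorname{Cov}(X, Y)^2}{\sigma_Y^2} = 1 - \frac{1}{2} = \frac{1}{2},$$

so

$$R_{WZ}(0.25) = \frac{1}{2} \log_2 \frac{1/2}{1/4} = 0.5 \text{ bits}, \qquad R(0.25) = \frac{1}{2} \log_2 \frac{1}{1/4} = 1 \text{ bit}.$$

The decoder's side information halves the required rate at this distortion.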

Quick Check

For a binary source $X$ and decoder side information $Y = X \oplus N$ (a BSC observation), is $R_{WZ}(D) = R_{XY}(D)$ always?

Yes, side information at the encoder never helps for any source

No, there is generally a strict loss for non-Gaussian sources

It depends on the correlation between $X$ and $Y$

Yes, because binning is always optimal

Why This Matters: Wyner-Ziv in Distributed Sensor Networks

In a sensor network, multiple sensors observe correlated measurements and must compress them for transmission to a fusion center. Wyner-Ziv coding allows each sensor to compress its data without knowing what the other sensors have observed; the fusion center uses all received data as side information when decoding each sensor's message. This is the foundation of distributed source coding, covered in Chapter 7. See also Book telecom, Ch. 11, for the information-theoretic perspective on multi-terminal systems.

Key Takeaway

The Wyner-Ziv theorem characterizes lossy compression when the decoder has correlated side information. The rate is $R_{WZ}(D) = \min_U [I(X; U) - I(U; Y)]$, achieved by binning. For Gaussian sources with squared-error distortion, side information at the encoder is worthless ($R_{WZ} = R_{XY}$), a remarkable result with no counterpart for general sources. This is the lossy analogue of Slepian-Wolf coding for lossless compression.