Source Coding with a Helper

When Side Information is Rate-Limited

In the Wyner-Ziv problem (Chapter 6), the decoder has full access to side information $Y^n$. In the Slepian-Wolf problem (Chapter 7), both sources are encoded at positive rates. But what about the intermediate case: one encoder compresses $X$, while a helper who observes correlated $Y$ can send a rate-limited description to the decoder?

This is the source coding with a helper problem. The helper cannot send $Y^n$ in full, since it has a rate budget $R_Y$, but even a partial description of $Y$ can help the decoder reconstruct $X$. The natural question is: what rate pair $(R_X, R_Y)$ is needed to achieve distortion $D$?

This problem connects to Wyner's common information, a measure of the shared structure between $X$ and $Y$ that is more nuanced than mutual information.

[Interactive: Distributed Video Coding Pipeline. A step-by-step walkthrough of the pipeline: low-complexity encoder (DCT + quantization + syndrome computation), side information generation at the decoder, and belief propagation decoding with side information.]

Definition: Source Coding with a Helper

Let $(X, Y) \sim P_{XY}$ with distortion measure $d : \mathcal{X} \times \hat{\mathcal{X}} \to [0, \infty)$.

  • Main encoder: observes $X^n$, sends index $f_X(X^n) \in [1 : 2^{nR_X}]$
  • Helper encoder: observes $Y^n$, sends index $f_Y(Y^n) \in [1 : 2^{nR_Y}]$
  • Decoder: produces $\hat{X}^n = g(f_X(X^n), f_Y(Y^n))$

The rate-distortion region is the closure of the set of triples $(R_X, R_Y, D)$ for which $\mathbb{E}[d(X^n, \hat{X}^n)] \leq D + \epsilon$ for all $\epsilon > 0$ and sufficiently large $n$.

When $R_Y = 0$, this reduces to standard rate-distortion. When $R_Y \geq H(Y)$, the decoder has full side information and we recover the Wyner-Ziv problem. The interesting regime is $0 < R_Y < H(Y)$.
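To make the endpoints concrete, here is a minimal numeric sketch (using the doubly symmetric binary source from the example later in this section, with crossover $p = 0.1$; the parameter values are illustrative only):

```python
import numpy as np

def h2(q):
    """Binary entropy in bits, with the convention h2(0) = h2(1) = 0."""
    q = np.clip(q, 1e-12, 1 - 1e-12)
    return float(-q * np.log2(q) - (1 - q) * np.log2(1 - q))

p = 0.1              # Y = X xor Z with Z ~ Bernoulli(p)
H_X = 1.0            # X ~ Bernoulli(1/2), so H(X) = 1 bit
H_Y = 1.0            # Y is also uniform, so H(Y) = 1 bit
H_X_given_Y = h2(p)  # given Y, X differs with probability p

print(f"R_Y = 0      =>  R_X = H(X)   = {H_X:.3f} bits/symbol")
print(f"R_Y >= H(Y)  =>  R_X = H(X|Y) = {H_X_given_Y:.3f} bits/symbol")
print(f"interesting regime: 0 < R_Y < {H_Y:.3f}")
```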

Definition: Wyner's Common Information

The Wyner common information between $X$ and $Y$ is defined as:

$$C_W(X; Y) = \min_{W : X - W - Y} I(X, Y; W)$$

where the minimum is over all random variables $W$ such that $X$ and $Y$ are conditionally independent given $W$: $P_{XY|W}(x, y | w) = P_{X|W}(x|w) P_{Y|W}(y|w)$.

Intuitively, $W$ is a common cause that renders $X$ and $Y$ independent; among all such variables, $C_W$ charges the description rate $I(X, Y; W)$ of the cheapest one. (Minimizing $H(W)$ instead would only give an upper bound, since $I(X, Y; W) \leq H(W)$.)

Wyner's common information satisfies $I(X; Y) \leq C_W(X; Y) \leq \min(H(X), H(Y))$. It equals $\min(H(X), H(Y))$ when one variable is a deterministic function of the other. The lower bound is typically not tight: for jointly Gaussian $(X, Y)$ with correlation coefficient $\rho \in (0, 1)$, $C_W = \frac{1}{2}\log\frac{1+\rho}{1-\rho}$, which strictly exceeds $I(X; Y) = \frac{1}{2}\log\frac{1}{1-\rho^2}$. Unlike mutual information, common information captures the "shared randomness" that cannot be split: it measures the cost of generating $(X, Y)$ from independent components via a common random variable.
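As a numeric sanity check on this chain of inequalities, the following sketch compares $I(X; Y)$ with the value $I(X, Y; W)$ achieved for the doubly symmetric binary source by the symmetric construction $X = W \oplus N_1$, $Y = W \oplus N_2$, where $W \sim \text{Bernoulli}(1/2)$ and $N_1, N_2 \sim \text{Bernoulli}(a_0)$ are independent with $2a_0(1 - a_0) = p$ (the construction known to achieve $C_W$ for this source):

```python
import numpy as np

def h2(q):
    """Binary entropy in bits."""
    q = np.clip(q, 1e-12, 1 - 1e-12)
    return float(-q * np.log2(q) - (1 - q) * np.log2(1 - q))

p = 0.1                             # DSBS crossover probability
I_XY = 1 - h2(p)                    # I(X;Y) = H(Y) - H(Y|X)

# Symmetric candidate: W ~ Bern(1/2), X = W ^ N1, Y = W ^ N2,
# with N1, N2 iid Bern(a0) chosen so X -> Y behaves like a BSC(p).
a0 = (1 - np.sqrt(1 - 2 * p)) / 2   # smaller root of 2a(1-a) = p
C_W = (1 + h2(p)) - 2 * h2(a0)      # I(X,Y;W) = H(X,Y) - H(X,Y|W)

print(f"I(X;Y) = {I_XY:.3f} bits")  # about 0.531
print(f"C_W    = {C_W:.3f} bits")   # about 0.873, strictly larger
```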

Wyner's common information

A measure of shared structure between two random variables, defined as the minimum rate needed to generate $(X, Y)$ from conditionally independent components. It is at least as large as mutual information and captures aspects of dependence that mutual information misses.

Related: Mutual Information, Common randomness, Source Coding with a Helper

Theorem: Rate Region for Source Coding with a Helper (Lossless Case)

For lossless reconstruction of $X$ ($D = 0$, with $d$ the Hamming distortion), the minimum rate for the main encoder as a function of the helper rate $R_Y$ is:

$$R_X(R_Y) = \min_{W : X - Y - W,\; I(Y; W) \leq R_Y} H(X \mid W)$$

In particular:

  • At $R_Y = 0$: $R_X = H(X)$ (no help)
  • At $R_Y \geq H(Y)$: $R_X = H(X|Y)$ (full side-information benefit, the Slepian-Wolf corner point)

The transition between these extremes depends on the joint distribution $P_{XY}$.
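Both endpoints fall out of the formula by plugging in extreme choices of the auxiliary variable:

$$W = \text{const}: \qquad I(Y; W) = 0, \qquad H(X \mid W) = H(X),$$

$$W = Y: \qquad I(Y; W) = H(Y), \qquad H(X \mid W) = H(X \mid Y).$$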

The helper sends a compressed version $W$ of $Y$ at rate $R_Y$. The decoder uses $W$ as side information to decode $X$, needing rate $H(X|W)$. The question is: which $W$ minimizes $H(X|W)$ subject to the rate constraint $I(Y; W) \leq R_Y$?

Intuitively, the helper should focus on sending the parts of $Y$ that are most informative about $X$. This is the same flavor of optimization that underlies Wyner's common information: the helper tries to convey the component of $Y$ that is "common" with $X$.


Example: Helper for the Doubly Symmetric Binary Source

Let $X \sim \text{Bernoulli}(1/2)$ and $Y = X \oplus Z$ where $Z \sim \text{Bernoulli}(p)$, $p = 0.1$. The helper observes $Y$ and can send at rate $R_Y$. What is the minimum rate $R_X$ for lossless reconstruction of $X$ as a function of $R_Y$?
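Here is a sketch of the answer, assuming the known parametric solution for this source: by Mrs. Gerber's Lemma, an optimal auxiliary is $W = Y \oplus V$ with $V \sim \text{Bernoulli}(\beta)$, giving $R_Y = 1 - \mathcal{H}_2(\beta)$ and $R_X = \mathcal{H}_2(p * \beta)$, where $p * \beta = p(1-\beta) + \beta(1-p)$ is binary convolution. Sweeping $\beta$ from $1/2$ down to $0$ traces the whole trade-off:

```python
import numpy as np

def h2(q):
    """Binary entropy in bits."""
    q = np.clip(q, 1e-12, 1 - 1e-12)
    return float(-q * np.log2(q) - (1 - q) * np.log2(1 - q))

def bconv(a, b):
    """Binary convolution a * b = a(1-b) + b(1-a)."""
    return a * (1 - b) + b * (1 - a)

p = 0.1
for beta in [0.5, 0.3, 0.2, 0.1, 0.05, 0.0]:
    R_Y = 1 - h2(beta)         # helper rate I(Y;W) for W = Y xor V
    R_X = h2(bconv(p, beta))   # main rate H(X|W)
    print(f"beta = {beta:.2f}:  R_Y = {R_Y:.3f},  R_X = {R_X:.3f}")
```

At $\beta = 1/2$ the helper is useless ($R_Y = 0$, $R_X = 1$); at $\beta = 0$ it describes $Y$ losslessly ($R_Y = 1$, $R_X = \mathcal{H}_2(0.1) \approx 0.469$), matching the two endpoints of the theorem.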

Historical Note: Two Notions of Common Information

1975

There are actually two distinct notions of "common information" in information theory, both introduced in the 1970s. Wyner's common information (1975, discussed here) is the minimum rate to generate the joint distribution from conditionally independent components. Gács and Körner's common information (1973) is the maximum rate of "common randomness" that can be extracted from the joint distribution.

These two quantities can be very different: for the DSBS, Wyner's common information is $1 + \mathcal{H}_2(p) - 2\mathcal{H}_2(a_0)$ with $a_0 = \frac{1}{2}(1 - \sqrt{1 - 2p})$ (close to 1 for small $p$), while the Gács-Körner common information is 0 (there is no nontrivial common part). The gap between them reflects different operational settings: generation vs. extraction of shared randomness.

Quick Check

In source coding with a helper, what happens when the helper's rate $R_Y$ exceeds $H(Y)$?

Further increasing $R_Y$ continues to reduce $R_X$

The minimum $R_X$ saturates at $H(X|Y)$

The system becomes equivalent to joint encoding

Common Mistake: Confusing Common Information with Mutual Information

Mistake:

Treating Wyner's common information $C_W(X; Y)$ as if it equals the mutual information $I(X; Y)$. While $C_W \geq I$ always holds, the two can differ substantially, for discrete and Gaussian sources alike.

Correction:

Wyner's common information measures the minimum rate of a common variable that renders $X$ and $Y$ conditionally independent, while mutual information measures the total statistical dependence. The two coincide only in degenerate cases (e.g., independent variables, or one variable a deterministic function of the other). Even for jointly Gaussian variables the gap is strict, and for discrete sources such as the DSBS $C_W$ can likewise be much larger than $I$.
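A quick numeric comparison, assuming the known closed forms for the bivariate Gaussian with correlation coefficient $\rho$: $I = \frac{1}{2}\log_2\frac{1}{1-\rho^2}$ and $C_W = \frac{1}{2}\log_2\frac{1+\rho}{1-\rho}$:

```python
import numpy as np

for rho in [0.1, 0.5, 0.9, 0.99]:
    I  = 0.5 * np.log2(1 / (1 - rho**2))       # mutual information (bits)
    Cw = 0.5 * np.log2((1 + rho) / (1 - rho))  # Wyner common information
    print(f"rho = {rho:.2f}:  I = {I:.3f}  C_W = {Cw:.3f}  gap = {Cw - I:.3f}")
```

The gap works out to $\log_2(1 + \rho)$, approaching a full bit as $\rho \to 1$.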

Key Takeaway

Source coding with a helper bridges the gap between standard compression ($R_Y = 0$) and compression with full side information ($R_Y \geq H(Y)$, where $R_X$ saturates at the Slepian-Wolf bound $H(X|Y)$). The optimal helper strategy conveys the component of $Y$ that is "common" with $X$, an idea formalized by Wyner's common information. The rate savings from even a small helper rate can be substantial when the sources are strongly correlated.