The Berger-Tung Inner Bound

From Lossless to Lossy Distributed Coding

The Slepian-Wolf theorem solves the distributed lossless source coding problem completely. But what if we allow distortion? Two encoders observe correlated sources $X^n$ and $Y^n$, compress them separately, and a joint decoder must reconstruct both within prescribed distortion levels. This is the distributed lossy source coding problem, and it is fundamentally harder than Slepian-Wolf.

Unlike the lossless case, the complete rate-distortion region for distributed lossy coding is unknown in general. The best known achievable region is the Berger-Tung inner bound, which combines random binning with rate-distortion codebooks. We study this bound, its tightness conditions, and the important special case of the CEO problem.

CEO Problem: Sum Rate vs. Distortion

The CEO problem: as the number of agents $K$ increases, the minimum distortion improves, but with diminishing returns per agent. Each curve shows the sum-rate versus distortion tradeoff for a different number of agents.

Definition: Distributed Lossy Source Coding

Let $(X, Y) \sim P_{XY}$ with distortion measures $d_X : \mathcal{X} \times \hat{\mathcal{X}} \to [0, \infty)$ and $d_Y : \mathcal{Y} \times \hat{\mathcal{Y}} \to [0, \infty)$.

A $(2^{nR_X}, 2^{nR_Y}, n)$ distributed lossy source code consists of:

  • Encoder 1: $f_1 : \mathcal{X}^n \to [1 : 2^{nR_X}]$
  • Encoder 2: $f_2 : \mathcal{Y}^n \to [1 : 2^{nR_Y}]$
  • Decoder: $g : [1 : 2^{nR_X}] \times [1 : 2^{nR_Y}] \to \hat{\mathcal{X}}^n \times \hat{\mathcal{Y}}^n$

A rate-distortion tuple $(R_X, R_Y, D_X, D_Y)$ is achievable if for every $\epsilon > 0$ and sufficiently large $n$, there exists a code with:

$$\mathbb{E}\left[\frac{1}{n}\sum_{i=1}^n d_X(X_i, \hat{X}_i)\right] \leq D_X + \epsilon, \quad \mathbb{E}\left[\frac{1}{n}\sum_{i=1}^n d_Y(Y_i, \hat{Y}_i)\right] \leq D_Y + \epsilon$$

Definition: The Berger-Tung Inner Bound

The Berger-Tung achievable rate region for distortion pair $(D_X, D_Y)$ is the set of rate pairs $(R_X, R_Y)$ satisfying:

$$R_X \geq I(X; U \mid V), \quad R_Y \geq I(Y; V \mid U), \quad R_X + R_Y \geq I(X, Y; U, V)$$

for some conditional distributions $P_{U|X}$ and $P_{V|Y}$ (with $U - X - Y - V$ forming a Markov chain) and reconstruction functions $\hat{x}(U, V)$, $\hat{y}(U, V)$ such that:

$$\mathbb{E}[d_X(X, \hat{x}(U, V))] \leq D_X, \quad \mathbb{E}[d_Y(Y, \hat{y}(U, V))] \leq D_Y$$

The auxiliary random variables $U$ and $V$ play the role of compressed descriptions of $X$ and $Y$, respectively.

The Berger-Tung bound combines two ideas: rate-distortion codebooks (to introduce the auxiliary variables $U, V$) and Slepian-Wolf binning (to exploit the correlation between $U^n$ and $V^n$ at the decoder). The bound is known to be tight in several important special cases, but whether it is tight in general remains an open problem.
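To make the region concrete, the three mutual-information bounds can be evaluated numerically for a small hypothetical example: a doubly symmetric binary source with binary symmetric test channels as the auxiliary descriptions. The parameters below (crossover probabilities 0.1 and 0.2) are illustrative choices, not from the text.

```python
import numpy as np

def entropy(p):
    """Entropy in bits of a pmf given as a flat array."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Hypothetical example: X ~ Bern(1/2), Y = X XOR Bern(0.1), with test
# channels U = X XOR Bern(0.2) and V = Y XOR Bern(0.2), so that
# U - X - Y - V forms a Markov chain and p(u,x,y,v) = p(x,y)p(u|x)p(v|y).
p0, q = 0.1, 0.2
P = np.zeros((2, 2, 2, 2))  # axes: (u, x, y, v)
for x in range(2):
    for y in range(2):
        pxy = 0.5 * ((1 - p0) if x == y else p0)
        for u in range(2):
            for v in range(2):
                pu = (1 - q) if u == x else q
                pv = (1 - q) if v == y else q
                P[u, x, y, v] = pxy * pu * pv

def mi(P, A, B):
    """Mutual information I(A; B) between axis groups A and B of pmf P."""
    rest = set(range(P.ndim))
    pa = P.sum(axis=tuple(rest - set(A))).ravel()
    pb = P.sum(axis=tuple(rest - set(B))).ravel()
    pab = P.sum(axis=tuple(rest - set(A) - set(B))).ravel()
    return entropy(pa) + entropy(pb) - entropy(pab)

# The three Berger-Tung constraints, using I(X;U|V) = I(X;U,V) - I(X;V):
R_X   = mi(P, [1], [0, 3]) - mi(P, [1], [3])  # I(X; U | V)
R_Y   = mi(P, [2], [0, 3]) - mi(P, [2], [0])  # I(Y; V | U)
R_sum = mi(P, [1, 2], [0, 3])                 # I(X, Y; U, V)
```

By the symmetry of this example the two corner rates coincide; sweeping the test-channel parameter $q$ traces out different points of the achievable region.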


Berger-Tung inner bound

The best known achievable rate-distortion region for distributed lossy source coding. It combines rate-distortion codebooks with random binning. The bound is tight for the quadratic Gaussian CEO problem and for certain symmetric sources.


Theorem: Achievability of the Berger-Tung Bound

The rate-distortion tuples in the Berger-Tung region are achievable. Specifically, for any $(R_X, R_Y)$ in the interior of the Berger-Tung region for distortion $(D_X, D_Y)$, there exists a sequence of distributed lossy source codes achieving these rates and distortions.

The proof combines two techniques we have already seen:

  1. Covering: Each encoder generates a rate-distortion codebook of auxiliary sequences. Encoder 1 finds a $U^n$ in its codebook that is jointly typical with $X^n$ (this is the covering lemma at work). Similarly, Encoder 2 finds a $V^n$ jointly typical with $Y^n$.

  2. Binning: The auxiliary sequences $U^n$ and $V^n$ are correlated (through the Markov chain $U - X - Y - V$). Instead of sending the full codebook indices, each encoder sends only a bin index; the decoder uses joint typicality of $(U^n, V^n)$ to resolve the bin ambiguity, just as in Slepian-Wolf.

The point is that we get the rate savings of both lossy compression (through the auxiliary codebooks) and distributed coding (through binning).

Definition: The CEO Problem

The CEO problem (or indirect distributed source coding) models a scenario where $K$ agents observe noisy versions of a single source $X$:

$$Y_k = X + N_k, \quad k = 1, \ldots, K$$

where $N_1, \ldots, N_K$ are independent noise terms. The agents separately encode their observations and send them to a "CEO" (central decoder) who must reconstruct $X$ within distortion $D$.

The CEO must determine: what is the minimum total rate $R_{\text{sum}} = R_1 + \cdots + R_K$ needed to achieve distortion $D$?

This is a special case of the distributed lossy source coding problem where:

  • Only one underlying source $X$ needs to be reconstructed
  • The observations $Y_k$ are conditionally independent given $X$
  • The distortion constraint is on $\mathbb{E}[d(X, \hat{X})]$
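The infinite-rate limit of this problem is plain MMSE estimation: if the CEO could see all $K$ observations directly, the best achievable squared error in the symmetric Gaussian case would be $1/(\sigma_X^{-2} + K\sigma_N^{-2})$. A small Monte Carlo sketch, with assumed parameters, checking that floor:

```python
import numpy as np

rng = np.random.default_rng(0)
K, var_x, var_n = 4, 1.0, 1.0  # illustrative parameters
n = 200_000

x = rng.normal(0.0, np.sqrt(var_x), size=n)
y = x[None, :] + rng.normal(0.0, np.sqrt(var_n), size=(K, n))

# With infinite rate the CEO sees Y_1..Y_K exactly; the best it can do
# is the MMSE estimate of X, a scaled sum of the observations.
w = (1.0 / var_n) / (1.0 / var_x + K / var_n)  # weight per observation
x_hat = w * y.sum(axis=0)
d_emp = np.mean((x - x_hat) ** 2)

d_min = 1.0 / (1.0 / var_x + K / var_n)  # = 1/(sigma_X^-2 + K sigma_N^-2)
```

The empirical squared error `d_emp` matches the analytic floor `d_min` (here $1/5$); no finite-rate code can do better than this.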

CEO problem

A distributed source coding problem where multiple agents observe noisy versions of a single underlying source and must compress their observations for a central decoder to reconstruct the original source. Named by Berger, Zhang, and Viswanathan (1996) as an analogy to a corporate CEO receiving reports from multiple department heads.


Theorem: Quadratic Gaussian CEO Problem

Let $X \sim \mathcal{N}(0, \sigma_X^2)$ and $Y_k = X + N_k$ where $N_k \sim \mathcal{N}(0, \sigma_N^2)$ are i.i.d., independent of $X$. Under squared-error distortion, the minimum sum rate to achieve distortion $D$ is:

$$R_{\text{sum}}(D) = \begin{cases} \dfrac{1}{2}\log\dfrac{\sigma_X^2}{D} + \dfrac{K}{2}\log\dfrac{K\sigma_N^{-2}}{K\sigma_N^{-2} + \sigma_X^{-2} - D^{-1}} & \text{if } D_{\min} < D \leq D_{\max} \\ 0 & \text{if } D > D_{\max} \end{cases}$$

where $D_{\min} = \frac{1}{\sigma_X^{-2} + K\sigma_N^{-2}}$ is the MMSE with infinite rate, and $D_{\max} = \sigma_X^2$ is the distortion with zero rate.

The Berger-Tung bound is tight for this case. The first term is the remote rate-distortion cost of describing $X$ itself, and the second is the extra price the $K$ agents pay for separately quantizing their noisy observations. As $K \to \infty$, the minimum distortion $D_{\min} = \frac{1}{\sigma_X^{-2} + K\sigma_N^{-2}} \to 0$, while the rate per agent needed to reach any fixed target distortion shrinks: each additional agent contributes diminishing but nonzero marginal information about $X$.
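As a sketch of this asymptotic behavior, the standard closed form for the symmetric quadratic Gaussian CEO sum rate can be evaluated at a fixed target distortion while $K$ grows. The parameter values below are illustrative.

```python
import math

def ceo_sum_rate(D, K, var_x=1.0, var_n=1.0):
    """Minimum sum rate (bits/sample) for the symmetric quadratic
    Gaussian CEO problem; valid for D_min < D <= var_x."""
    denom = K / var_n + 1.0 / var_x - 1.0 / D
    return 0.5 * math.log2(var_x / D) + (K / 2) * math.log2((K / var_n) / denom)

# At a fixed target distortion, the sum rate stays bounded as K grows,
# so the rate per agent shrinks toward zero.
D = 0.5
rates = {K: ceo_sum_rate(D, K) for K in (2, 4, 16, 64)}
```

With these numbers the sum rate decreases monotonically in $K$ toward a finite limit, so the per-agent rate vanishes; by contrast, pushing $D$ down toward $D_{\min}$ makes the second logarithm, and hence the sum rate, blow up.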


Example: Two-Agent Gaussian CEO Problem

Consider the CEO problem with $K = 2$ agents, $X \sim \mathcal{N}(0, 1)$, and $Y_k = X + N_k$ where $N_k \sim \mathcal{N}(0, 1)$ for $k = 1, 2$. Find the minimum sum rate $R_1 + R_2$ to achieve distortion $D = 0.4$ under squared-error distortion.
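A quick numerical check, plugging the standard closed form for the symmetric quadratic Gaussian CEO sum rate into these parameters:

```python
import math

var_x, var_n, K, D = 1.0, 1.0, 2, 0.4

# Feasibility: D must exceed the infinite-rate MMSE floor D_min = 1/3
d_min = 1.0 / (1.0 / var_x + K / var_n)
assert d_min < D <= var_x

r_x = 0.5 * math.log2(var_x / D)  # remote rate-distortion term, ~0.661 bits
r_obs = (K / 2) * math.log2((K / var_n) / (K / var_n + 1.0 / var_x - 1.0 / D))
r_sum = r_x + r_obs  # total minimum sum rate, ~2.661 bits per sample
```

Here the observation-noise term contributes exactly 2 bits ($\log_2(2/0.5)$), and the remote rate-distortion term $\tfrac{1}{2}\log_2 2.5 \approx 0.661$ bits, for a minimum sum rate of about 2.66 bits per sample.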


Distributed vs. Joint Source Coding

| Property | Joint Encoding | Distributed (Slepian-Wolf / Berger-Tung) |
|---|---|---|
| Encoder cooperation | Full: encoders can communicate | None: encoders operate independently |
| Lossless rate region | $R_X + R_Y \geq H(X,Y)$ | Same (Slepian-Wolf theorem) |
| Lossy rate region | Known exactly (joint rate-distortion) | Berger-Tung inner bound (tight in some cases) |
| Proof technique | Single-user rate-distortion | Random binning + covering lemma |
| Practical codes | Standard compression (JPEG, H.264) | LDPC syndromes, distributed video coding |
| Key open problem | None (fully solved) | Is Berger-Tung tight in general? |

Common Mistake: The Berger-Tung Bound is Not Always Tight

Mistake:

Assuming the Berger-Tung inner bound gives the exact rate-distortion region for all distributed lossy source coding problems.

Correction:

The Berger-Tung bound is an inner bound (achievable region) but it is not known to be tight in general. It is tight for specific cases: the quadratic Gaussian CEO problem, the Gaussian two-terminal source coding problem with individual distortion constraints, and certain symmetric sources. The general tightness remains one of the important open problems in network information theory.

⚠️Engineering Note

CEO Problem in Wireless Sensor Networks

The CEO problem directly models wireless sensor networks: multiple sensors observe noisy versions of a physical quantity and transmit compressed observations to a fusion center. The key engineering insight concerns scaling with the number of sensors $K$: the distortion floor $D_{\min}$ falls as $O(1/K)$, and for any fixed target distortion above that floor the required sum rate stays bounded, so the rate per sensor shrinks as sensors are added. Chasing the floor itself is expensive, however: the sum rate diverges as the target distortion approaches $D_{\min}$.

In practice, the agents often communicate over noisy channels (not noiseless rate-limited links), leading to the joint source-channel CEO problem. The separation principle does not apply in this multiuser setting, making the problem significantly harder. See Chapter 8 for further discussion.

Key Takeaway

The Berger-Tung inner bound generalizes Slepian-Wolf to the lossy setting by combining rate-distortion codebooks with random binning. Each encoder uses a codebook to create a compressed description (auxiliary variable), then uses binning to exploit the correlation at the decoder. The bound is tight for the Gaussian CEO problem, where the sum rate has a clean closed-form expression, but its general tightness remains open.