Ferkans — Interactive Telecom Tutor

The Key Idea Behind Superposition Coding

The central question for the degraded BC is: how should the transmitter encode two independent messages — one for the strong user and one for the weak user — into a single transmitted sequence?

The insight is beautifully simple. Think of two "layers" of information:

Coarse layer (cloud center): Encode the weak user's message $W_2$ into a codeword $U^n$ from a codebook that the weak user can decode despite its noisy channel.
Fine layer (satellite): For each cloud center $U^n$ , generate a sub-codebook of "satellite" codewords. Encode the strong user's message $W_1$ by choosing the appropriate satellite within the cloud.

The weak user (receiver 2) decodes only the cloud center — it treats the satellite detail as additional noise. The strong user (receiver 1) first decodes the cloud center (which it can do because it has a better channel than the weak user), then decodes the satellite within that cloud. This is successive decoding applied to the BC.

The point is that the strong user "peels off" the layers in order of decreasing coarseness, while the weak user decodes only the coarsest layer. This layered structure is what we call superposition coding.

Definition: Superposition Code

A superposition code for the degraded DM-BC $X \to Y_1 \to Y_2$ with an auxiliary random variable $U$ is constructed as follows:

Codebook generation. Fix a distribution $p(u) p(x|u)$ .

Generate $2^{nR_{2}}$ independent codewords $u^n(w_2) \sim \prod_{i=1}^n p(u_i)$ , one for each weak-user message $w_2 \in \{1, \ldots, 2^{nR_{2}}\}$ . These are the cloud centers.
For each cloud center $u^n(w_2)$ , generate $2^{nR_{1}}$ conditionally independent codewords $x^n(w_1, w_2) \sim \prod_{i=1}^n p(x_i | u_i(w_2))$ , one for each strong-user message $w_1 \in \{1, \ldots, 2^{nR_{1}}\}$ . These are the satellites.

Encoding. To send $(w_1, w_2)$ , transmit $x^n(w_1, w_2)$ .

Decoding (weak user). Receiver 2 finds the unique $\hat{w}_2$ such that $(u^n(\hat{w}_2), y_2^n)$ is jointly typical.

Decoding (strong user). Receiver 1 finds the unique $(\hat{w}_1, \hat{w}_2)$ such that $(u^n(\hat{w}_2), x^n(\hat{w}_1, \hat{w}_2), y_1^n)$ is jointly typical. Alternatively, it can decode in two stages: first find $\hat{w}_2$ from $(u^n, y_1^n)$ , then find $\hat{w}_1$ from $(x^n(\cdot, \hat{w}_2), y_1^n)$ .
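The construction above can be sketched for binary alphabets. This is a toy illustration under stated assumptions, not the scheme analyzed in the proof: it assumes $U \sim \text{Bernoulli}(1/2)$ and $p(x|u)$ a binary symmetric channel with a hypothetical parameter `beta`, it indexes messages from 0, and it swaps joint-typicality decoding for minimum Hamming distance decoding. All function names and the noise levels in the demo are illustrative choices.

```python
import random

def bernoulli_bits(n, prob, rng):
    """Draw n i.i.d. Bernoulli(prob) bits."""
    return [1 if rng.random() < prob else 0 for _ in range(n)]

def xor_bits(a, b):
    return [x ^ y for x, y in zip(a, b)]

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def generate_codebook(n, num_w1, num_w2, beta, rng):
    """Cloud centers u^n(w2) ~ Bern(1/2); satellites x^n(w1, w2) = u^n(w2) XOR v^n, v ~ Bern(beta)."""
    clouds = [bernoulli_bits(n, 0.5, rng) for _ in range(num_w2)]
    satellites = {
        (w1, w2): xor_bits(clouds[w2], bernoulli_bits(n, beta, rng))
        for w2 in range(num_w2)
        for w1 in range(num_w1)
    }
    return clouds, satellites

def bsc(bits, crossover, rng):
    """Pass bits through a binary symmetric channel."""
    return xor_bits(bits, bernoulli_bits(len(bits), crossover, rng))

def decode_weak(clouds, y2):
    """Receiver 2: pick the nearest cloud center (stand-in for typicality decoding)."""
    return min(range(len(clouds)), key=lambda w2: hamming(clouds[w2], y2))

def decode_strong(satellites, y1):
    """Receiver 1: pick the nearest satellite over all (w1, w2) pairs."""
    return min(satellites, key=lambda pair: hamming(satellites[pair], y1))

rng = random.Random(0)
clouds, satellites = generate_codebook(n=200, num_w1=4, num_w2=4, beta=0.1, rng=rng)
x = satellites[(2, 1)]      # encoding: transmit x^n(w1, w2)
y1 = bsc(x, 0.02, rng)      # strong user's channel (assumed noise level)
y2 = bsc(y1, 0.05, rng)     # weak user sees a further degraded copy
print(decode_strong(satellites, y1), decode_weak(clouds, y2))
```

The weak user never looks at the satellites: to it, the offset between $x^n$ and its cloud center is just extra noise on top of the channel.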

The auxiliary random variable $U$ controls the "resolution" of the cloud centers. Choosing $U = X$ collapses the code to a single layer with no satellites, so $R_{1} = 0$ (everything goes to the weak user). Choosing $U = \text{const}$ removes the cloud structure, so $R_{2} = 0$ (everything goes to the strong user). The optimal $U$ splits resources between the two users.

Definition: Auxiliary Random Variable in the BC

The auxiliary random variable $U$ in the degraded BC characterization satisfies:

$U \to X \to (Y_1, Y_2)$ forms a Markov chain, meaning the channel outputs depend on $U$ only through the transmitted symbol $X$.
The joint distribution factors as $p(u) p(x|u) p(y_1, y_2|x)$ .
$U$ represents the "common" information visible to both receivers.
The conditional distribution $p(x|u)$ encodes the "private" information for the strong user.

The choice of $p(u, x)$ controls the tradeoff between $R_{1}$ and $R_{2}$: the more of $X$ that $U$ determines (i.e., $I(U; Y_2)$ close to $I(X; Y_2)$), the more the weak user is favored; the less $U$ determines, the more rate is left for the strong user.

Theorem: Capacity Region of the Degraded Broadcast Channel

The capacity region of the degraded DM-BC $p(y_1, y_2 | x)$ with $X \to Y_1 \to Y_2$ is the closure of the set of rate pairs $(R_{1}, R_{2})$ satisfying

$R_{2} \leq I(U; Y_2),$ $R_{1} \leq I(X; Y_1 | U),$

for some distribution $p(u) p(x | u)$ with $|\mathcal{U}| \leq \min\{|\mathcal{X}|, |\mathcal{Y}_1|, |\mathcal{Y}_2|\} + 1$ .

The weak user decodes only $U$ (the cloud center), seeing a point-to-point channel from $U$ to $Y_2$ — hence the bound $R_{2} \leq I(U; Y_2)$ . The strong user, having decoded $U$ , sees an effective channel from $X$ to $Y_1$ with $U$ as side information — hence $R_{1} \leq I(X; Y_1 | U)$ . The full region is swept by varying the distribution $p(u, x)$ .
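The two bounds can be evaluated numerically for any finite-alphabet choice of $p(u) p(x|u)$. The sketch below assumes a physically degraded channel given as a cascade $p(y_1|x)$ followed by $p(y_2|y_1)$, with every channel stored as a row-stochastic nested list; the function names and the binary example values are illustrative, not taken from the text.

```python
import math

def entropy(pmf):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)

def output_pmf(input_pmf, channel):
    """Push a pmf through a channel with rows channel[a][b] = P(b | a)."""
    num_out = len(channel[0])
    return [sum(input_pmf[a] * channel[a][b] for a in range(len(input_pmf)))
            for b in range(num_out)]

def compose(first, second):
    """Cascade two channels: first, then second."""
    return [output_pmf(row, second) for row in first]

def mutual_information(input_pmf, channel):
    """I(A; B) = H(B) - H(B | A)."""
    h_out = entropy(output_pmf(input_pmf, channel))
    h_cond = sum(input_pmf[a] * entropy(channel[a]) for a in range(len(input_pmf)))
    return h_out - h_cond

def superposition_rates(p_u, p_x_given_u, p_y1_given_x, p_y2_given_y1):
    """Return (I(X; Y1 | U), I(U; Y2)) for the chain U -> X -> Y1 -> Y2."""
    u_to_y2 = compose(compose(p_x_given_u, p_y1_given_x), p_y2_given_y1)
    r2 = mutual_information(p_u, u_to_y2)
    # I(X; Y1 | U) is the p(u)-average of I(X; Y1 | U = u), with X ~ p(x | u).
    r1 = sum(p_u[u] * mutual_information(p_x_given_u[u], p_y1_given_x)
             for u in range(len(p_u)))
    return r1, r2

def bsc_matrix(p):
    return [[1 - p, p], [p, 1 - p]]

r1, r2 = superposition_rates([0.5, 0.5], bsc_matrix(0.1), bsc_matrix(0.05), bsc_matrix(0.1))
print(f"R1 <= {r1:.4f}, R2 <= {r2:.4f}")
```

Passing an identity matrix for `p_x_given_u` (that is, $U = X$) drives the first bound to zero, and a single-symbol $U$ drives the second to zero, matching the two extreme choices of $U$.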

Proof

Achievability — Superposition coding

Fix $p(u) p(x|u)$ and construct the superposition code as in the definition above. We analyze the error probability.

Error event for the weak user: Receiver 2 makes an error if $(u^n(w_2), y_2^n) \notin \mathcal{T}_\epsilon^{(n)}$ (the true cloud center is not typical with the output) or if some other cloud center is jointly typical with $y_2^n$. The probability of the first event vanishes as $n \to \infty$ by the law of large numbers. By the packing lemma (Chapter 3), the probability of the second event vanishes if $R_{2} < I(U; Y_2) - \delta(\epsilon)$.

Achievability — Strong user decoding

Error event for the strong user: Receiver 1 uses joint typicality to find $(\hat{w}_1, \hat{w}_2)$ . By the union bound and the packing lemma applied twice:

The probability that a wrong cloud center passes the typicality test vanishes if $R_{2} < I(U; Y_1) - \delta(\epsilon)$ (which is implied by $R_{2} < I(U; Y_2)$ because degradedness gives $I(U; Y_1) \geq I(U; Y_2)$ by the data processing inequality).
Conditioning on the correct cloud center, the probability that a wrong satellite passes the test vanishes if $R_{1} < I(X; Y_1 | U) - \delta(\epsilon)$ .

Therefore, both error probabilities tend to zero for all $(R_{1}, R_{2})$ in the interior of the stated region.

Converse — Fano's inequality

Suppose a sequence of $(2^{nR_{1}}, 2^{nR_{2}}, n)$ codes achieves $P_e^{(n)} \to 0$ . By Fano's inequality:

$nR_{2} \leq I(W_2; Y_2^n) + n\epsilon_n,$ $nR_{1} \leq I(W_1; Y_1^n | W_2) + n\epsilon_n,$

where $\epsilon_n \to 0$ .

Converse — Single-letterization

For the weak user's rate, by the chain rule:

$I(W_2; Y_2^n) = \sum_{i=1}^n I(W_2; Y_{2,i} | Y_2^{i-1}) \leq \sum_{i=1}^n I(W_2, Y_2^{i-1}; Y_{2,i}).$

Define $U_i = (W_2, Y_2^{i-1})$ . Then $I(W_2; Y_2^n) \leq \sum_{i=1}^n I(U_i; Y_{2,i})$ .

For the strong user's rate:

$I(W_1; Y_1^n | W_2) = \sum_{i=1}^n I(W_1; Y_{1,i} | W_2, Y_1^{i-1}).$

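Write each term as $H(Y_{1,i} | W_2, Y_1^{i-1}) - H(Y_{1,i} | W_1, W_2, Y_1^{i-1})$. Since $X_i$ is a function of $(W_1, W_2)$ and the channel is memoryless, the second entropy equals $H(Y_{1,i} | X_i) = H(Y_{1,i} | X_i, U_i)$. For the first, degradedness makes $Y_2^{i-1} \to (W_2, Y_1^{i-1}) \to Y_{1,i}$ a Markov chain, so $H(Y_{1,i} | W_2, Y_1^{i-1}) = H(Y_{1,i} | W_2, Y_1^{i-1}, Y_2^{i-1}) \leq H(Y_{1,i} | W_2, Y_2^{i-1}) = H(Y_{1,i} | U_i)$. Combining the two, $I(W_1; Y_{1,i} | W_2, Y_1^{i-1}) \leq I(X_i; Y_{1,i} | U_i)$.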

Introducing a time-sharing variable $Q$ uniform on $\{1,\ldots,n\}$ , independent of everything, and defining $U = (U_Q, Q)$ , $X = X_Q$ , $Y_1 = Y_{1,Q}$ , $Y_2 = Y_{2,Q}$ :

$R_{2} \leq I(U; Y_2) + \epsilon_n, \qquad R_{1} \leq I(X; Y_1 | U) + \epsilon_n.$

The Markov chain $U \to X \to Y_1 \to Y_2$ holds by construction.
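Written out, the time-sharing step is the following (a sketch using only the definitions of $Q$, $U$, $X$, $Y_1$, $Y_2$ above, after dividing the single-letterized sums by $n$):

$\frac{1}{n} \sum_{i=1}^n I(U_i; Y_{2,i}) = I(U_Q; Y_{2,Q} | Q) \leq I(U_Q, Q; Y_{2,Q}) = I(U; Y_2),$

$\frac{1}{n} \sum_{i=1}^n I(X_i; Y_{1,i} | U_i) = I(X_Q; Y_{1,Q} | U_Q, Q) = I(X; Y_1 | U).$

The inequality in the first line is the chain rule $I(U_Q, Q; Y_{2,Q}) = I(Q; Y_{2,Q}) + I(U_Q; Y_{2,Q} | Q)$ with the first term nonnegative.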

Cardinality bound

The bound $|\mathcal{U}| \leq |\mathcal{X}| + 1$ (the $|\mathcal{X}|$ term of the minimum in the theorem) follows from the support lemma (a Carathéodory-type argument): we need to preserve the marginal $p(x)$ (which requires $|\mathcal{X}| - 1$ constraints) plus the two mutual information values $I(U; Y_2)$ and $I(X; Y_1 | U)$, giving $|\mathcal{X}| + 1$ constraints in total.


The Proof Pattern: Achievability by Superposition, Converse by Fano

Notice the same proof architecture we have been seeing throughout the book: achievability uses a structured random codebook (here, superposition instead of i.i.d.) with typicality decoding, and the converse uses Fano's inequality followed by single-letterization.

The key difference from the point-to-point setting is that the converse introduces an auxiliary random variable $U_i = (W_2, Y_2^{i-1})$ to capture the "state" accumulated by the weak user. This is the same technique as in Gel'fand–Pinsker coding (Chapter 12), and it will reappear in every multiuser converse in the book.

The degradedness condition is used in the achievability to ensure that the strong user can decode the weak user's cloud center (because $I(U; Y_1) \geq I(U; Y_2)$ ). Without degradedness, this guarantee fails, and we need more sophisticated techniques (Marton coding, Chapter 16).

Example: Capacity Region of the Binary Symmetric Broadcast Channel

Consider the degraded BSC broadcast channel from the earlier example: $Y_1 = X \oplus Z_1$ with $Z_1 \sim \text{Bernoulli}(p_1)$ , and $Y_2 = Y_1 \oplus Z_3$ with $Z_3 \sim \text{Bernoulli}(p_3)$ , so user 2 sees a BSC $(p_2)$ where $p_2 = p_1 * p_3$ . Find the capacity region.

Solution

Choose the auxiliary variable

For the binary case, the optimal choice is $U \sim \text{Bernoulli}(1/2)$ and $X = U \oplus V$ where $V \sim \text{Bernoulli}(\beta)$ is independent of $U$ . The parameter $\beta \in [0, 1/2]$ controls the tradeoff between the two users.

Compute the weak user's rate

The channel from $U$ to $Y_2$ is a BSC with crossover probability $\beta * p_2$ . Therefore:

$R_{2} \leq I(U; Y_2) = 1 - h_b(\beta * p_2),$

where $h_b$ is the binary entropy function and $*$ denotes binary convolution.

Compute the strong user's rate

The conditional mutual information is:

$R_{1} \leq I(X; Y_1 | U) = H(Y_1 | U) - H(Y_1 | X).$

Given $U$ , the channel from $X = U \oplus V$ to $Y_1 = X \oplus Z_1$ produces $Y_1 = U \oplus V \oplus Z_1$ . So $Y_1 | U$ has the distribution of $V \oplus Z_1 \sim \text{Bernoulli}(\beta * p_1)$ :

$R_{1} \leq h_b(\beta * p_1) - h_b(p_1).$

The capacity region

The capacity region is the set of $(R_{1}, R_{2})$ satisfying:

$R_{2} \leq 1 - h_b(\beta * p_2), \qquad R_{1} \leq h_b(\beta * p_1) - h_b(p_1),$

for some $\beta \in [0, 1/2]$ . Setting $\beta = 0$ gives $(R_{1}, R_{2}) = (0, 1 - h_b(p_2))$ , allocating everything to the weak user. Setting $\beta = 1/2$ gives $(R_{1}, R_{2}) = (1 - h_b(p_1), 0)$ , allocating everything to the strong user. Intermediate $\beta$ traces the boundary of the capacity region.
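The boundary can be traced numerically from these two formulas. The sketch below uses $p_1 = 0.05$ and $p_3 = 0.1$ (the parameter values shown in the comparison widget later on this page); the function names and the grid of $\beta$ values are illustrative choices.

```python
import math

def h_b(p):
    """Binary entropy in bits, with h_b(0) = h_b(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def conv(a, b):
    """Binary convolution a * b = a(1 - b) + (1 - a)b."""
    return a * (1 - b) + (1 - a) * b

def bsc_bc_rate_pair(beta, p1, p3):
    """Boundary point (R1, R2) of the BSC broadcast channel for a given beta."""
    p2 = conv(p1, p3)                       # weak user's overall crossover probability
    r1 = h_b(conv(beta, p1)) - h_b(p1)      # I(X; Y1 | U)
    r2 = 1.0 - h_b(conv(beta, p2))          # I(U; Y2)
    return r1, r2

def boundary(p1, p3, steps=10):
    """Sweep beta over [0, 1/2] on an evenly spaced grid."""
    return [bsc_bc_rate_pair(0.5 * k / steps, p1, p3) for k in range(steps + 1)]

for r1, r2 in boundary(p1=0.05, p3=0.1, steps=5):
    print(f"R1 = {r1:.4f}, R2 = {r2:.4f}")
```

As $\beta$ grows from 0 to 1/2, the printed pairs move from the weak user's corner to the strong user's corner, with $R_1$ increasing and $R_2$ decreasing.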

Quick Check

In superposition coding for the degraded BC, the strong user (receiver 1) decodes the weak user's cloud center first, then decodes its own satellite. Why can receiver 1 reliably decode the cloud center?

Because $I(U; Y_1) \geq I(U; Y_2)$ by the data processing inequality

Because receiver 1 has access to side information about $U$

Because the cloud center is encoded at a higher rate

Because the encoder uses a different codebook for receiver 1

Correction:

Because $I(U; Y_1) \geq I(U; Y_2)$ by the data processing inequality

Since $X \to Y_1 \to Y_2$ is Markov, $U \to Y_1 \to Y_2$ is also Markov (because $U \to X \to Y_1 \to Y_2$ ). The data processing inequality gives $I(U; Y_1) \geq I(U; Y_2)$ . Since the weak user can decode the cloud at rate $R_{2} \leq I(U; Y_2)$ , the strong user can certainly decode it too.
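The inequality can be checked numerically on the binary symmetric example. This is a sanity check under the assumptions of that example (uniform $U$, $X = U \oplus V$ with $V \sim \text{Bernoulli}(\beta)$), not a proof; the function names and the parameter values $p_1 = 0.05$, $p_3 = 0.1$ are illustrative.

```python
import math

def h_b(p):
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def conv(a, b):
    """Binary convolution a * b = a(1 - b) + (1 - a)b."""
    return a * (1 - b) + (1 - a) * b

def cloud_informations(beta, p1, p3):
    """Return (I(U; Y1), I(U; Y2)) for uniform U and X = U xor V, V ~ Bernoulli(beta)."""
    p2 = conv(p1, p3)
    return 1.0 - h_b(conv(beta, p1)), 1.0 - h_b(conv(beta, p2))

# Evaluate both quantities on a grid of beta values in [0, 1/2].
checks = [cloud_informations(0.05 * k, p1=0.05, p3=0.1) for k in range(11)]
dpi_holds = all(i_y1 >= i_y2 - 1e-12 for i_y1, i_y2 in checks)
print(dpi_holds)
```

Any rate $R_2$ that the weak user can support on the cloud layer therefore also fits under the strong user's limit $I(U; Y_1)$.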

Common Mistake: Confusing Which User's Message Is the Cloud

Mistake:

Encoding the strong user's message as the cloud center and the weak user's message as the satellite.

Correction:

The cloud center always carries the weak user's message. This is because the cloud center must be decodable by both receivers, and the weak user is the bottleneck. The satellite detail is decoded only by the strong user, who first peels off the cloud.

If you reversed the assignment, the weak user would need to decode the satellite detail — which it cannot do because it has a worse channel. The whole point of superposition coding is that information is layered from coarse (for the weak) to fine (for the strong).

Common Mistake: The Sum Rate Is Not Simply $I(X; Y_1)$

Mistake:

Claiming that the sum rate $R_{1} + R_{2}$ of the degraded BC equals $I(X; Y_1)$ (the capacity of the strong user's channel).

Correction:

The sum rate satisfies $R_{1} + R_{2} \leq I(X; Y_1|U) + I(U; Y_2)$, which is not $I(X; Y_1)$. Because $U \to X \to Y_1$ is a Markov chain, $I(X; Y_1) = I(U; Y_1) + I(X; Y_1|U)$, so $I(X; Y_1|U) + I(U; Y_2) \leq I(X; Y_1)$, with strict inequality whenever $I(U; Y_2) < I(U; Y_1)$ (the weak user extracts less information from the cloud center than the strong user could).

Consequently the sum rate never exceeds $C_{1} = \max_{p(x)} I(X; Y_1)$, which is attained at the corner point $(C_{1}, 0)$. A point with $R_{2} > 0$ pays for the weak user's rate with a reduction in $R_{1}$ that is at least as large.
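For the binary symmetric example, the sum rate along the boundary can be compared with the strong user's capacity $C_1 = 1 - h_b(p_1)$. The sketch below assumes the boundary parametrization by $\beta$ from that example; the names and the values $p_1 = 0.05$, $p_3 = 0.1$ are illustrative.

```python
import math

def h_b(p):
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def conv(a, b):
    """Binary convolution a * b = a(1 - b) + (1 - a)b."""
    return a * (1 - b) + (1 - a) * b

def sum_rate(beta, p1, p3):
    """R1 + R2 at the boundary point indexed by beta."""
    p2 = conv(p1, p3)
    r1 = h_b(conv(beta, p1)) - h_b(p1)
    r2 = 1.0 - h_b(conv(beta, p2))
    return r1 + r2

p1, p3 = 0.05, 0.1
c1 = 1.0 - h_b(p1)                                   # strong user's capacity
sums = [sum_rate(0.5 * k / 20, p1, p3) for k in range(21)]
for k in (0, 10, 20):
    print(f"beta = {0.5 * k / 20:.3f}: sum rate = {sums[k]:.4f}, C1 = {c1:.4f}")
```

On this grid the sum rate stays below $C_1$ and reaches it only at $\beta = 1/2$, where the weak user's rate is zero.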

Superposition Coding vs. Time-Sharing

Compare the capacity region of the degraded BC achieved by superposition coding against the time-sharing (TDMA) straight line. Adjust channel parameters to see how much superposition coding gains.

Parameters: $p_1 = 0.05$ (strong user crossover probability), $p_3 = 0.1$ (degrading channel crossover probability)
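The comparison can also be reproduced offline. The sketch below assumes the binary symmetric broadcast channel with the parameter values listed above, and takes the time-sharing line to be the segment joining $(C_1, 0)$ and $(0, C_2)$ with $C_k = 1 - h_b(p_k)$; the function names and grid size are illustrative.

```python
import math

def h_b(p):
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def conv(a, b):
    """Binary convolution a * b = a(1 - b) + (1 - a)b."""
    return a * (1 - b) + (1 - a) * b

def superposition_point(beta, p1, p2):
    """Boundary point (R1, R2) achieved by superposition coding."""
    return h_b(conv(beta, p1)) - h_b(p1), 1.0 - h_b(conv(beta, p2))

def time_sharing_r2(r1, c1, c2):
    """R2 on the straight line between (C1, 0) and (0, C2) at a given R1."""
    return c2 * (1.0 - r1 / c1)

p1, p3 = 0.05, 0.1
p2 = conv(p1, p3)
c1, c2 = 1.0 - h_b(p1), 1.0 - h_b(p2)

gaps = []
for k in range(51):
    beta = 0.5 * k / 50
    r1, r2 = superposition_point(beta, p1, p2)
    gaps.append((r2 - time_sharing_r2(r1, c1, c2), beta))

max_gap, best_beta = max(gaps)
print(f"largest R2 gain over time-sharing: {max_gap:.4f} bits at beta = {best_beta:.2f}")
```

The gap is zero at both corner points and positive in between, which is the region where the superposition curve bulges above the time-sharing line.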

Capacity Region

The closure of the set of all achievable rate tuples $(R_{1}, \ldots, R_{K})$ for a multiuser channel. For the degraded BC, the capacity region is a two-dimensional set in the $(R_{1}, R_{2})$ plane, bounded by the curves $R_{2} = I(U; Y_2)$ and $R_{1} = I(X; Y_1 | U)$ as $p(u, x)$ varies.

Related: Broadcast Channel (BC)

Key Takeaway

Superposition coding is the information-theoretic principle that layered encoding (a coarse common layer decodable by all users plus a fine private layer decodable only by the best user) achieves the capacity region of degraded broadcast channels. The strong user successively decodes all layers; the weak user decodes only the coarsest. This idea extends naturally to $K$ users and is a building block for more general broadcast channel coding strategies.

Superposition Coding: The Cloud-Satellite Structure

The cloud-satellite structure of superposition coding: cloud centers carry the weak user's message (decoded by both users), while satellites within each cloud carry the strong user's message (decoded only by the strong user after peeling off the cloud).
