Superposition Coding and the Capacity Region

The Key Idea Behind Superposition Coding

The central question for the degraded BC is: how should the transmitter encode two independent messages β€” one for the strong user and one for the weak user β€” into a single transmitted sequence?

The insight is beautifully simple. Think of two "layers" of information:

  1. Coarse layer (cloud center): Encode the weak user's message W2W_2 into a codeword UnU^n from a codebook that the weak user can decode despite its noisy channel.

  2. Fine layer (satellite): For each cloud center UnU^n, generate a sub-codebook of "satellite" codewords. Encode the strong user's message W1W_1 by choosing the appropriate satellite within the cloud.

The weak user (receiver 2) decodes only the cloud center β€” it treats the satellite detail as additional noise. The strong user (receiver 1) first decodes the cloud center (which it can do because it has a better channel than the weak user), then decodes the satellite within that cloud. This is successive decoding applied to the BC.

The point is that the strong user "peels off" the layers in order of decreasing coarseness, while the weak user decodes only the coarsest layer. This layered structure is what we call superposition coding.

Definition:

Superposition Code

A superposition code for the degraded DM-BC X→Y1→Y2X \to Y_1 \to Y_2 with an auxiliary random variable UU is constructed as follows:

Codebook generation. Fix a distribution p(u)p(x∣u)p(u) p(x|u).

  1. Generate 2nR22^{nR_{2}} independent codewords un(w2)∼∏i=1np(ui)u^n(w_2) \sim \prod_{i=1}^n p(u_i), one for each weak-user message w2∈{1,…,2nR2}w_2 \in \{1, \ldots, 2^{nR_{2}}\}. These are the cloud centers.

  2. For each cloud center un(w2)u^n(w_2), generate 2nR12^{nR_{1}} conditionally independent codewords xn(w1,w2)∼∏i=1np(xi∣ui(w2))x^n(w_1, w_2) \sim \prod_{i=1}^n p(x_i | u_i(w_2)), one for each strong-user message w1∈{1,…,2nR1}w_1 \in \{1, \ldots, 2^{nR_{1}}\}. These are the satellites.

Encoding. To send (w1,w2)(w_1, w_2), transmit xn(w1,w2)x^n(w_1, w_2).

Decoding (weak user). Receiver 2 finds the unique w^2\hat{w}_2 such that (un(w^2),y2n)(u^n(\hat{w}_2), y_2^n) is jointly typical.

Decoding (strong user). Receiver 1 finds the unique (w^1,w^2)(\hat{w}_1, \hat{w}_2) such that (un(w^2),xn(w^1,w^2),y1n)(u^n(\hat{w}_2), x^n(\hat{w}_1, \hat{w}_2), y_1^n) is jointly typical. Alternatively, it can decode in two stages: first find w^2\hat{w}_2 from (un,y1n)(u^n, y_1^n), then find w^1\hat{w}_1 from (xn(β‹…,w^2),y1n)(x^n(\cdot, \hat{w}_2), y_1^n).

The auxiliary random variable UU controls the "resolution" of the cloud centers. Choosing U=XU = X collapses the code to a single layer (all power to the weak user). Choosing U=constU = \text{const} means no cloud structure (all power to the strong user). The optimal UU splits resources between the two users.

Definition:

Auxiliary Random Variable in the BC

The auxiliary random variable UU in the degraded BC characterization satisfies:

  • Uβ†’Xβ†’(Y1,Y2)U \to X \to (Y_1, Y_2) forms a Markov chain, meaning UU is determined before the channel is applied.
  • The joint distribution factors as p(u)p(x∣u)p(y1,y2∣x)p(u) p(x|u) p(y_1, y_2|x).
  • UU represents the "common" information visible to both receivers.
  • The conditional distribution p(x∣u)p(x|u) encodes the "private" information for the strong user.

The choice of p(u,x)p(u, x) controls the tradeoff between R1R_{1} and R2R_{2}: more "structure" in UU (i.e., I(U;X)I(U; X) close to I(X;Y2)I(X; Y_2)) favors the weak user; less structure favors the strong user.

Theorem: Capacity Region of the Degraded Broadcast Channel

The capacity region of the degraded DM-BC p(y1,y2∣x)p(y_1, y_2 | x) with Xβ†’Y1β†’Y2X \to Y_1 \to Y_2 is the closure of the set of rate pairs (R1,R2)(R_{1}, R_{2}) satisfying

R2≀I(U;Y2),R_{2} \leq I(U; Y_2), R1≀I(X;Y1∣U),R_{1} \leq I(X; Y_1 | U),

for some distribution p(u)p(x∣u)p(u) p(x | u) with ∣Uβˆ£β‰€min⁑{∣X∣,∣Y1∣,∣Y2∣}+1|\mathcal{U}| \leq \min\{|\mathcal{X}|, |\mathcal{Y}_1|, |\mathcal{Y}_2|\} + 1.

The weak user decodes only UU (the cloud center), seeing a point-to-point channel from UU to Y2Y_2 β€” hence the bound R2≀I(U;Y2)R_{2} \leq I(U; Y_2). The strong user, having decoded UU, sees an effective channel from XX to Y1Y_1 with UU as side information β€” hence R1≀I(X;Y1∣U)R_{1} \leq I(X; Y_1 | U). The full region is swept by varying the distribution p(u,x)p(u, x).

,

The Proof Pattern: Achievability by Superposition, Converse by Fano

Notice the same proof architecture we have been seeing throughout the book: achievability uses a structured random codebook (here, superposition instead of i.i.d.) with typicality decoding, and the converse uses Fano's inequality followed by single-letterization.

The key difference from the point-to-point setting is that the converse introduces an auxiliary random variable Ui=(W2,Y2iβˆ’1)U_i = (W_2, Y_2^{i-1}) to capture the "state" accumulated by the weak user. This is the same technique as in Gel'fand–Pinsker coding (Chapter 12), and it will reappear in every multiuser converse in the book.

The degradedness condition is used in the achievability to ensure that the strong user can decode the weak user's cloud center (because I(U;Y1)β‰₯I(U;Y2)I(U; Y_1) \geq I(U; Y_2)). Without degradedness, this guarantee fails, and we need more sophisticated techniques (Marton coding, Chapter 16).

Example: Capacity Region of the Binary Symmetric Broadcast Channel

Consider the degraded BSC broadcast channel from the earlier example: Y1=XβŠ•Z1Y_1 = X \oplus Z_1 with Z1∼Bernoulli(p1)Z_1 \sim \text{Bernoulli}(p_1), and Y2=Y1βŠ•Z3Y_2 = Y_1 \oplus Z_3 with Z3∼Bernoulli(p3)Z_3 \sim \text{Bernoulli}(p_3), so user 2 sees a BSC(p2)(p_2) where p2=p1βˆ—p3p_2 = p_1 * p_3. Find the capacity region.

Quick Check

In superposition coding for the degraded BC, the strong user (receiver 1) decodes the weak user's cloud center first, then decodes its own satellite. Why can receiver 1 reliably decode the cloud center?

Because I(U;Y1)β‰₯I(U;Y2)I(U; Y_1) \geq I(U; Y_2) by the data processing inequality

Because receiver 1 has access to side information about UU

Because the cloud center is encoded at a higher rate

Because the encoder uses a different codebook for receiver 1

Common Mistake: Confusing Which User's Message Is the Cloud

Mistake:

Encoding the strong user's message as the cloud center and the weak user's message as the satellite.

Correction:

The cloud center always carries the weak user's message. This is because the cloud center must be decodable by both receivers, and the weak user is the bottleneck. The satellite detail is decoded only by the strong user, who first peels off the cloud.

If you reversed the assignment, the weak user would need to decode the satellite detail β€” which it cannot do because it has a worse channel. The whole point of superposition coding is that information is layered from coarse (for the weak) to fine (for the strong).

Common Mistake: The Sum Rate Is Not Simply I(X;Y1)I(X; Y_1)

Mistake:

Claiming that the sum rate R1+R2R_{1} + R_{2} of the degraded BC equals I(X;Y1)I(X; Y_1) (the capacity of the strong user's channel).

Correction:

The sum rate satisfies R1+R2=I(X;Y1∣U)+I(U;Y2)R_{1} + R_{2} = I(X; Y_1|U) + I(U; Y_2), not I(X;Y1)I(X; Y_1). In general, I(X;Y1∣U)+I(U;Y2)<I(X;Y1)I(X; Y_1|U) + I(U; Y_2) < I(X; Y_1) because I(U;Y2)<I(U;Y1)I(U; Y_2) < I(U; Y_1) (the weak user extracts less information from the cloud center than the strong user could).

The maximum sum rate is achieved at one of the corner points: either (C1,0)(C_{1}, 0) or a point where R2>0R_{2} > 0 but R1R_{1} is correspondingly reduced.

Superposition Coding vs. Time-Sharing

Compare the capacity region of the degraded BC achieved by superposition coding against the time-sharing (TDMA) straight line. Adjust channel parameters to see how much superposition coding gains.

Parameters
0.05
0.1

Capacity Region

The set of all achievable rate tuples (R1,…,RK)(R_{1}, \ldots, R_{K}) for a multiuser channel. For the degraded BC, the capacity region is a two-dimensional set in the (R1,R2)(R_{1}, R_{2}) plane, bounded by the curves R2=I(U;Y2)R_{2} = I(U; Y_2) and R1=I(X;Y1∣U)R_{1} = I(X; Y_1 | U) as p(u,x)p(u, x) varies.

Related: Broadcast Channel (BC)

Time-Sharing

A simple transmission strategy where different fractions of the total time are allocated to different users. User kk receives data during its allocated fraction ΞΈk\theta_k, achieving rate ΞΈkCk\theta_k C_{k}. For the BC, the time-sharing region is a convex hull of corner points β€” always a subset of the superposition coding region.

Related: Superposition Coding

Key Takeaway

Superposition coding is the information-theoretic principle that layered encoding β€” sending a coarse common message visible to all and a fine private message visible only to the best user β€” achieves the capacity region of degraded broadcast channels. The strong user successively decodes all layers; the weak user decodes only the coarsest. This idea extends naturally to KK users and is the foundation of all broadcast channel coding strategies.

Superposition Coding: The Cloud-Satellite Structure

The cloud-satellite structure of superposition coding: cloud centers carry the weak user's message (decoded by both users), while satellites within each cloud carry the strong user's message (decoded only by the strong user after peeling off the cloud).