Superposition Coding and the Capacity Region
The Key Idea Behind Superposition Coding
The central question for the degraded BC is: how should the transmitter encode two independent messages β one for the strong user and one for the weak user β into a single transmitted sequence?
The insight is beautifully simple. Think of two "layers" of information:
-
Coarse layer (cloud center): Encode the weak user's message into a codeword from a codebook that the weak user can decode despite its noisy channel.
-
Fine layer (satellite): For each cloud center , generate a sub-codebook of "satellite" codewords. Encode the strong user's message by choosing the appropriate satellite within the cloud.
The weak user (receiver 2) decodes only the cloud center β it treats the satellite detail as additional noise. The strong user (receiver 1) first decodes the cloud center (which it can do because it has a better channel than the weak user), then decodes the satellite within that cloud. This is successive decoding applied to the BC.
The point is that the strong user "peels off" the layers in order of decreasing coarseness, while the weak user decodes only the coarsest layer. This layered structure is what we call superposition coding.
Definition: Superposition Code
Superposition Code
A superposition code for the degraded DM-BC with an auxiliary random variable is constructed as follows:
Codebook generation. Fix a distribution .
-
Generate independent codewords , one for each weak-user message . These are the cloud centers.
-
For each cloud center , generate conditionally independent codewords , one for each strong-user message . These are the satellites.
Encoding. To send , transmit .
Decoding (weak user). Receiver 2 finds the unique such that is jointly typical.
Decoding (strong user). Receiver 1 finds the unique such that is jointly typical. Alternatively, it can decode in two stages: first find from , then find from .
The auxiliary random variable controls the "resolution" of the cloud centers. Choosing collapses the code to a single layer (all power to the weak user). Choosing means no cloud structure (all power to the strong user). The optimal splits resources between the two users.
Definition: Auxiliary Random Variable in the BC
Auxiliary Random Variable in the BC
The auxiliary random variable in the degraded BC characterization satisfies:
- forms a Markov chain, meaning is determined before the channel is applied.
- The joint distribution factors as .
- represents the "common" information visible to both receivers.
- The conditional distribution encodes the "private" information for the strong user.
The choice of controls the tradeoff between and : more "structure" in (i.e., close to ) favors the weak user; less structure favors the strong user.
Theorem: Capacity Region of the Degraded Broadcast Channel
The capacity region of the degraded DM-BC with is the closure of the set of rate pairs satisfying
for some distribution with .
The weak user decodes only (the cloud center), seeing a point-to-point channel from to β hence the bound . The strong user, having decoded , sees an effective channel from to with as side information β hence . The full region is swept by varying the distribution .
Achievability β Superposition coding
Fix and construct the superposition code as in the definition above. We analyze the error probability.
Error event for the weak user: Receiver 2 makes an error if (the true cloud center is not typical with the output) or if some other cloud center is jointly typical. By the packing lemma (Chapter 3), the probability of the second event vanishes if .
Achievability β Strong user decoding
Error event for the strong user: Receiver 1 uses joint typicality to find . By the union bound and the packing lemma applied twice:
-
The probability that a wrong cloud center passes the typicality test vanishes if (which is implied by because degradedness gives by the data processing inequality).
-
Conditioning on the correct cloud center, the probability that a wrong satellite passes the test vanishes if .
Therefore, both error probabilities tend to zero for all in the interior of the stated region.
Converse β Fano's inequality
Suppose a sequence of codes achieves . By Fano's inequality:
where .
Converse β Single-letterization
For the weak user's rate, by the chain rule:
Define . Then .
For the strong user's rate:
Since is a function of and the channel is memoryless, .
Introducing a time-sharing variable uniform on , independent of everything, and defining , , , :
The Markov chain holds by construction.
Cardinality bound
The bound follows from the support lemma (CarathΓ©odory-type argument): we need to preserve the marginal (which requires constraints) plus the two mutual information values and , giving constraints total.
The Proof Pattern: Achievability by Superposition, Converse by Fano
Notice the same proof architecture we have been seeing throughout the book: achievability uses a structured random codebook (here, superposition instead of i.i.d.) with typicality decoding, and the converse uses Fano's inequality followed by single-letterization.
The key difference from the point-to-point setting is that the converse introduces an auxiliary random variable to capture the "state" accumulated by the weak user. This is the same technique as in Gel'fandβPinsker coding (Chapter 12), and it will reappear in every multiuser converse in the book.
The degradedness condition is used in the achievability to ensure that the strong user can decode the weak user's cloud center (because ). Without degradedness, this guarantee fails, and we need more sophisticated techniques (Marton coding, Chapter 16).
Example: Capacity Region of the Binary Symmetric Broadcast Channel
Consider the degraded BSC broadcast channel from the earlier example: with , and with , so user 2 sees a BSC where . Find the capacity region.
Choose the auxiliary variable
For the binary case, the optimal choice is and where is independent of . The parameter controls the tradeoff between the two users.
Compute the weak user's rate
The channel from to is a BSC with crossover probability . Therefore:
where is the binary entropy function and denotes binary convolution.
Compute the strong user's rate
The conditional mutual information is:
Given , the channel from to produces . So has the distribution of :
The capacity region
The capacity region is the set of satisfying:
for some . Setting gives , allocating everything to the weak user. Setting gives , allocating everything to the strong user. Intermediate traces the boundary of the capacity region.
Quick Check
In superposition coding for the degraded BC, the strong user (receiver 1) decodes the weak user's cloud center first, then decodes its own satellite. Why can receiver 1 reliably decode the cloud center?
Because by the data processing inequality
Because receiver 1 has access to side information about
Because the cloud center is encoded at a higher rate
Because the encoder uses a different codebook for receiver 1
Since is Markov, is also Markov (because ). The data processing inequality gives . Since the weak user can decode the cloud at rate , the strong user can certainly decode it too.
Common Mistake: Confusing Which User's Message Is the Cloud
Mistake:
Encoding the strong user's message as the cloud center and the weak user's message as the satellite.
Correction:
The cloud center always carries the weak user's message. This is because the cloud center must be decodable by both receivers, and the weak user is the bottleneck. The satellite detail is decoded only by the strong user, who first peels off the cloud.
If you reversed the assignment, the weak user would need to decode the satellite detail β which it cannot do because it has a worse channel. The whole point of superposition coding is that information is layered from coarse (for the weak) to fine (for the strong).
Common Mistake: The Sum Rate Is Not Simply
Mistake:
Claiming that the sum rate of the degraded BC equals (the capacity of the strong user's channel).
Correction:
The sum rate satisfies , not . In general, because (the weak user extracts less information from the cloud center than the strong user could).
The maximum sum rate is achieved at one of the corner points: either or a point where but is correspondingly reduced.
Superposition Coding vs. Time-Sharing
Compare the capacity region of the degraded BC achieved by superposition coding against the time-sharing (TDMA) straight line. Adjust channel parameters to see how much superposition coding gains.
Parameters
Capacity Region
The set of all achievable rate tuples for a multiuser channel. For the degraded BC, the capacity region is a two-dimensional set in the plane, bounded by the curves and as varies.
Related: Broadcast Channel (BC)
Time-Sharing
A simple transmission strategy where different fractions of the total time are allocated to different users. User receives data during its allocated fraction , achieving rate . For the BC, the time-sharing region is a convex hull of corner points β always a subset of the superposition coding region.
Related: Superposition Coding
Key Takeaway
Superposition coding is the information-theoretic principle that layered encoding β sending a coarse common message visible to all and a fine private message visible only to the best user β achieves the capacity region of degraded broadcast channels. The strong user successively decodes all layers; the weak user decodes only the coarsest. This idea extends naturally to users and is the foundation of all broadcast channel coding strategies.