Converse for the Degraded Broadcast Channel

Why the Converse Requires New Tools

In Section 15.2, we proved the converse for the discrete degraded BC using Fano's inequality and single-letterization. That argument establishes the capacity region in terms of an auxiliary random variable $U$ and a distribution $p(u, x)$. For the discrete case, the optimization over $p(u, x)$ is a finite-dimensional problem.

For the Gaussian BC, however, we need to show that the optimal $(U, X)$ is jointly Gaussian — i.e., that the Gaussian superposition coding scheme is optimal among all possible coding strategies. This is the hard part, and it requires the entropy power inequality (EPI).

The argument, due to Bergmans (1973), is one of the most elegant applications of the EPI in information theory. The key idea is that Gaussian noise is the "worst" noise (it maximizes conditional entropy for a given power), which translates into Gaussian inputs being the "best" for the broadcast channel.

Definition: Entropy Power

The entropy power of a continuous random variable $X$ with differential entropy $h(X)$ is defined as:

$$N(X) = \frac{1}{2\pi e}\, e^{2h(X)}.$$

Equivalently, $N(X)$ is the variance of a Gaussian random variable with the same differential entropy as $X$. If $X \sim \mathcal{N}(0, \sigma^2)$, then $N(X) = \sigma^2$.
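As a quick numerical illustration, here is a minimal Python sketch (entropies in nats; function names are ours) computing the entropy power of a Gaussian and of a uniform variable:

```python
import numpy as np

def entropy_power(h_nats):
    """Entropy power N(X) = exp(2*h(X)) / (2*pi*e), with h(X) in nats."""
    return np.exp(2.0 * h_nats) / (2.0 * np.pi * np.e)

# Gaussian X ~ N(0, sigma^2): h(X) = 0.5*ln(2*pi*e*sigma^2), so N(X) = sigma^2.
sigma2 = 3.0
h_gauss = 0.5 * np.log(2 * np.pi * np.e * sigma2)
print(entropy_power(h_gauss))          # 3.0: entropy power equals the variance

# Uniform X on [0, 1]: h(X) = ln(1) = 0, so N(X) = 1/(2*pi*e) ~ 0.0585,
# strictly below the variance 1/12 ~ 0.0833, because the Gaussian maximizes
# entropy for a given variance.
print(entropy_power(0.0), 1 / 12)
```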

The entropy power provides a "Gaussian equivalent" for any continuous distribution. The EPI says that this equivalent behaves super-additively under addition of independent random variables.

Theorem: The Entropy Power Inequality

For independent continuous random variables $X$ and $Y$ in $\mathbb{R}$:

$$N(X + Y) \geq N(X) + N(Y),$$

or equivalently,

$$e^{2h(X+Y)} \geq e^{2h(X)} + e^{2h(Y)}.$$

Equality holds if and only if both $X$ and $Y$ are Gaussian.

The EPI says that adding independent random variables together produces "at least as much entropy" as adding independent Gaussians with the same individual entropy powers. Since the Gaussian maximizes entropy for a given variance, the EPI captures the idea that Gaussian noise is the "most entropic" perturbation.
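A concrete check: for two independent Uniform[0,1] variables the entropies below are closed-form, so the inequality can be verified directly. A minimal Python sketch (entropies in nats):

```python
import numpy as np

# EPI check for independent X, Y ~ Uniform[0, 1] (entropies in nats).
# h(X) = h(Y) = ln(1) = 0. The sum X + Y is triangular on [0, 2] with density
# f(s) = s on [0,1] and 2 - s on [1,2]; h(X+Y) = -2 * int_0^1 s*ln(s) ds = 1/2.
h_x = h_y = 0.0
h_sum = 0.5

lhs = np.exp(2 * h_sum)                   # e^{2h(X+Y)} = e ~ 2.718
rhs = np.exp(2 * h_x) + np.exp(2 * h_y)   # e^{2h(X)} + e^{2h(Y)} = 2
print(lhs, rhs, lhs >= rhs)               # strict: uniforms are not Gaussian
```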

A useful corollary: if $Z \sim \mathcal{N}(0, N)$, then for any $X$ independent of $Z$:

$$h(X + Z) \geq \frac{1}{2}\log\bigl(2\pi e\,(N(X) + N)\bigr)$$

with equality iff $X$ is Gaussian.
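The corollary can also be checked numerically. A sketch (Python with SciPy; entropies in nats) for $X \sim$ Uniform$[0,1]$ and $Z \sim \mathcal{N}(0, 1)$, where the density of $X + Z$ is $\Phi(t) - \Phi(t-1)$:

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

# X ~ Uniform[0,1], so N(X) = 1/(2*pi*e); Z ~ N(0, N) with N = 1.
f = lambda t: norm.cdf(t) - norm.cdf(t - 1)   # density of X + Z
integrand = lambda t: -f(t) * np.log(f(t)) if f(t) > 0 else 0.0
h_sum, _ = quad(integrand, -10, 11)           # h(X + Z) by numerical integration

N_x = 1 / (2 * np.pi * np.e)                  # entropy power of Uniform[0,1]
bound = 0.5 * np.log(2 * np.pi * np.e * (N_x + 1.0))
print(h_sum, bound, h_sum >= bound)           # h(X+Z) ~ 1.45 >= bound ~ 1.447
```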

Theorem: Bergmans' Converse for the Gaussian BC

For the Gaussian broadcast channel with $N_1 < N_2$ and power $P$, any achievable rate pair $(R_1, R_2)$ must satisfy

$$R_2 \leq \frac{1}{2}\log\!\left(1 + \frac{\alpha P}{(1-\alpha)P + N_2}\right), \qquad R_1 \leq \frac{1}{2}\log\!\left(1 + \frac{(1-\alpha)P}{N_1}\right),$$

for some $\alpha \in [0,1]$. Thus the superposition coding region is exactly the capacity region.
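A short Python sketch (names ours) that traces this boundary by sweeping $\alpha$, with rates in bits per channel use:

```python
import numpy as np

def bc_boundary(P, N1, N2, num=101):
    """Boundary of the superposition-coding region: sweep alpha in [0, 1]."""
    alpha = np.linspace(0, 1, num)
    R2 = 0.5 * np.log2(1 + alpha * P / ((1 - alpha) * P + N2))
    R1 = 0.5 * np.log2(1 + (1 - alpha) * P / N1)
    return R1, R2

R1, R2 = bc_boundary(P=10, N1=1, N2=5)
# alpha = 0 gives the strong user's single-user capacity; alpha = 1 the weak user's.
print(R1[0], R2[0])    # ~ (1.73, 0.0)
print(R1[-1], R2[-1])  # ~ (0.0, 0.79)
```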

The converse shows that no coding scheme — however clever — can achieve rates outside the superposition coding region. The key tool is the entropy power inequality, which proves that Gaussian inputs are the best choice for $U$ and $X$.

The argument proceeds in two steps: first, Fano's inequality gives rate constraints in terms of differential entropies; second, the EPI converts these into the Gaussian expressions.


The Role of the EPI in the Converse

The EPI is used in one specific place in Bergmans' converse: to lower-bound $h(Y_2 \mid U)$, the conditional entropy at the weak receiver. Without the EPI, we could not rule out the possibility that a non-Gaussian choice of $(U, X)$ might yield a higher $I(U; Y_2)$ than the Gaussian choice.

The EPI essentially says: "adding Gaussian noise is the worst thing that can happen to your mutual information." Since the weak user suffers from more noise ($N_2 > N_1$), this worst-case noise argument is what forces the Gaussian optimality.
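To make this step concrete, here is a compressed single-letter sketch of the chain of inequalities (a schematic of the standard argument; the full proof applies $n$-letter conditional versions of these bounds):

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
Degradedness lets us write $Y_2 = Y_1 + Z'$ with
$Z' \sim \mathcal{N}(0,\, N_2 - N_1)$ independent of $(U, Y_1)$.
The conditional EPI then gives
\begin{align}
  e^{2h(Y_2 \mid U)} &\ge e^{2h(Y_1 \mid U)} + 2\pi e\,(N_2 - N_1).
\end{align}
Fano's step traps $h(Y_1 \mid U)$ between $\tfrac{1}{2}\log 2\pi e N_1$ and
$\tfrac{1}{2}\log 2\pi e (P + N_1)$, so we may define $\alpha \in [0,1]$ by
$h(Y_1 \mid U) = \tfrac{1}{2}\log\bigl(2\pi e\,((1-\alpha)P + N_1)\bigr)$.
Substituting into the EPI bound,
\begin{align}
  h(Y_2 \mid U) &\ge \tfrac{1}{2}\log\bigl(2\pi e\,((1-\alpha)P + N_2)\bigr),\\
  I(U; Y_2) &= h(Y_2) - h(Y_2 \mid U)
    \le \tfrac{1}{2}\log\frac{P + N_2}{(1-\alpha)P + N_2}
    = \tfrac{1}{2}\log\Bigl(1 + \frac{\alpha P}{(1-\alpha)P + N_2}\Bigr),
\end{align}
which is exactly the $R_2$ bound in Bergmans' theorem.
\end{document}
```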

This is a recurring theme in Gaussian information theory: the extremal properties of the Gaussian distribution (maximum entropy, worst-case noise, EPI) are the tools that convert achievability results (which work for any distribution) into tight capacity characterizations.

Example: Verifying the Converse for a Specific Rate Pair

For the Gaussian BC with $P = 10$, $N_1 = 1$, $N_2 = 5$, verify that the rate pair $(R_1, R_2) = (1.5, 0.5)$ is outside the capacity region.
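A minimal Python check, sweeping $\alpha$ over the boundary formulas from the theorem (variable names are ours; rates in bits per channel use):

```python
import numpy as np

# Is (R1, R2) = (1.5, 0.5) achievable for P = 10, N1 = 1, N2 = 5?
P, N1, N2 = 10.0, 1.0, 5.0
R1_target, R2_target = 1.5, 0.5

alpha = np.linspace(0, 1, 100001)
R2 = 0.5 * np.log2(1 + alpha * P / ((1 - alpha) * P + N2))
R1 = 0.5 * np.log2(1 + (1 - alpha) * P / N1)

# Achievable iff some alpha satisfies both rate bounds simultaneously.
feasible = (R1 >= R1_target) & (R2 >= R2_target)
print(feasible.any())    # False: the pair lies outside the region

mask = R2 >= R2_target   # R2 >= 0.5 forces alpha >= 0.75 ...
print(R1[mask].max())    # ... which caps R1 at 0.5*log2(3.5) ~ 0.904
```

The check makes the reason explicit: $R_2 \geq 0.5$ requires $\alpha \geq 0.75$, which leaves at most $(1-\alpha)P = 2.5$ units of power for user 1, capping $R_1$ at about $0.90$ bits, well short of the target $1.5$.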

Preview: MAC-BC Duality

There is a remarkable duality between the MAC and BC capacity regions for Gaussian channels: the capacity region of the Gaussian BC with total power $P$ equals the capacity region of the "dual" Gaussian MAC with individual power constraints that sum to $P$.
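A numerical taste of the duality (a Python sketch under the scalar duality of Jindal, Vishwanath, and Goldsmith; the dual-MAC form and the matching power split below are worked out by hand for this one point and are not part of the original text):

```python
import numpy as np

# BC: Y_i = X + Z_i with Z_i ~ N(0, N_i), total power P.
# Dual MAC: Y = X1/sqrt(N1) + X2/sqrt(N2) + Z with Z ~ N(0, 1), P1 + P2 <= P.
P, N1, N2 = 10.0, 1.0, 5.0

# One BC boundary point, at alpha = 0.75: (R1, R2) ~ (0.904, 0.5).
alpha = 0.75
R1_bc = 0.5 * np.log2(1 + (1 - alpha) * P / N1)
R2_bc = 0.5 * np.log2(1 + alpha * P / ((1 - alpha) * P + N2))

# The same point sits on the dual MAC boundary with the split P1 = P2 = 5,
# decoding user 1 first (user 2's signal treated as noise), then user 2 cleanly.
P1, P2 = 5.0, 5.0
R1_mac = 0.5 * np.log2(1 + (P1 / N1) / (1 + P2 / N2))
R2_mac = 0.5 * np.log2(1 + P2 / N2)
print((R1_bc, R2_bc), (R1_mac, R2_mac))  # both ~ (0.904, 0.5)
```

Note that the power split differs between the two regions (the BC point uses a $7.5/2.5$ split, the dual MAC point a $5/5$ split); the duality matches regions, not individual power allocations.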

This duality, formalized by Vishwanath, Jindal, and Goldsmith (2003), means that computing the BC capacity region (hard, because the converse requires the EPI) can be reduced to computing the MAC capacity region (easier, because the MAC converse uses standard Fano arguments).

The duality extends to the MIMO case and is the computational engine behind practical beamforming design for the MIMO broadcast channel. We develop this fully in Chapter 16.

Quick Check

Under what condition does equality hold in the entropy power inequality $N(X + Y) \geq N(X) + N(Y)$?

When $X$ and $Y$ have the same variance

When both $X$ and $Y$ are Gaussian

When $X$ and $Y$ are identically distributed

When $X$ or $Y$ is zero (deterministic)

Entropy Power

For a continuous random variable $X$, the entropy power is $N(X) = \frac{1}{2\pi e} e^{2h(X)}$. It equals the variance of a Gaussian with the same differential entropy. The entropy power inequality states $N(X+Y) \geq N(X) + N(Y)$ for independent $X, Y$.

Related: Entropy Power Inequality (EPI)

Entropy Power Inequality (EPI)

A fundamental inequality in information theory: $e^{2h(X+Y)} \geq e^{2h(X)} + e^{2h(Y)}$ for independent continuous random variables $X$ and $Y$. Equality holds iff both are Gaussian. The EPI is the key tool for proving converse results in Gaussian broadcast and interference channel problems.

Related: Entropy Power

🎓 CommIT Contribution (2006)

MIMO Broadcast Channel Capacity via Dirty Paper Coding

H. Weingarten, Y. Steinberg, S. Shamai (Shitz), G. Caire. IEEE Transactions on Information Theory.

The scalar Gaussian BC converse by Bergmans relies on the EPI, which does not have a natural matrix extension for the MIMO case. The capacity region of the MIMO BC was established through a different route: Weingarten, Steinberg, and Shamai (with contributions from Caire and Shamai's earlier work on the MIMO BC) proved that dirty paper coding (DPC) achieves the capacity region.

The proof uses the MAC-BC duality (the capacity region of the MIMO BC equals a transformed version of the MIMO MAC region) and the channel enhancement technique. This result resolved one of the major open problems in network information theory and provided the theoretical foundation for MU-MIMO precoding in 4G/5G systems.

Caire's work on the iterative water-filling algorithm for computing the DPC rate region is the practical bridge between the theoretical capacity result and implementable beamforming designs (see Book telecom, Chapter 17).

Tags: MIMO, broadcast channel, dirty paper coding, capacity region