Non-Shannon Inequalities

Beyond Shannon's Inequalities

All the information inequalities we have used in this book (the chain rule, the data processing inequality, submodularity of entropy) follow from a single source: the non-negativity of conditional mutual information, or equivalently, the non-negativity of KL divergence. These are called Shannon-type (or polymatroidal) inequalities. For three or fewer random variables, Shannon-type inequalities characterize all valid entropy relationships.

But for four or more random variables, there exist valid inequalities that cannot be derived from Shannon's axioms. These are non-Shannon inequalities, and their existence has profound implications for network information theory and network coding.

Definition:

The Entropy Region

For $n$ random variables $(X_1, \ldots, X_n)$, the entropy vector is the $(2^n - 1)$-dimensional vector $\mathbf{h} = (H(X_S))_{S \subseteq [n],\, S \neq \emptyset}$ containing the entropy of every non-empty subset. The entropy region $\Gamma_n^*$ is the closure of the set of entropy vectors achievable by some joint distribution on $(X_1, \ldots, X_n)$. The Shannon outer bound $\Gamma_n$ is the cone defined by all Shannon-type inequalities (elemental inequalities plus monotonicity and submodularity).

For $n \leq 3$, $\Gamma_n^* = \Gamma_n$: Shannon-type inequalities are sufficient. For $n \geq 4$, $\Gamma_n^* \subsetneq \Gamma_n$: non-Shannon inequalities carve out additional constraints that the entropy region must satisfy.
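The entropy vector is easy to compute directly from a joint distribution. The sketch below (function and variable names are our own, not from the text) enumerates every non-empty subset $S$ and computes $H(X_S)$ from the corresponding marginal:

```python
import itertools
import math

def entropy_vector(pmf, n):
    """Compute H(X_S), in bits, for every non-empty subset S of [n].

    pmf maps outcome tuples (x_1, ..., x_n) to probabilities.
    Returns a dict from frozenset(S) to H(X_S).
    """
    h = {}
    for r in range(1, n + 1):
        for S in itertools.combinations(range(n), r):
            # Marginalize the joint pmf onto the coordinates in S.
            marg = {}
            for outcome, p in pmf.items():
                key = tuple(outcome[i] for i in S)
                marg[key] = marg.get(key, 0.0) + p
            h[frozenset(S)] = -sum(p * math.log2(p)
                                   for p in marg.values() if p > 0)
    return h

# Illustrative distribution: X1, X2 fair independent bits, X3 = X1 XOR X2.
pmf = {(a, b, a ^ b): 0.25 for a in (0, 1) for b in (0, 1)}
h = entropy_vector(pmf, 3)
# Each H(X_i) = 1 bit; any pair (hence the triple) has entropy 2 bits.
```

For $n = 3$ this produces a $7$-dimensional vector, matching the $2^n - 1$ count in the definition.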

Theorem: The Zhang-Yeung Inequality

For four random variables $(X_1, X_2, X_3, X_4)$, the following inequality holds:
$$2I(X_3; X_4) \leq I(X_1; X_2) + I(X_1; X_3, X_4) + 3I(X_3; X_4 \mid X_1) + I(X_3; X_4 \mid X_2)$$
This inequality is not implied by any combination of Shannon-type inequalities. It was the first non-Shannon inequality discovered (Zhang and Yeung, 1998).

The Zhang-Yeung inequality reveals that the entropy structure of four or more random variables is richer than what Shannon's axioms capture. Intuitively, there are correlations among four variables that cannot be decomposed into pairwise and conditional pairwise relationships. This is analogous to how a tetrahedron has geometric properties not determined by its faces alone.
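The inequality can be spot-checked numerically. A minimal sketch (helper names are ours): expand each mutual information term into joint entropies via $I(X;Y \mid Z) = H(XZ) + H(YZ) - H(XYZ) - H(Z)$, then verify the slack is non-negative on random joint distributions over four bits. Of course, no amount of sampling proves the inequality; this only illustrates its content.

```python
import itertools
import math
import random

def subset_entropy(pmf, S):
    """Entropy H(X_S) in bits for the marginal on coordinates S."""
    marg = {}
    for outcome, p in pmf.items():
        key = tuple(outcome[i] for i in S)
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0)

def zhang_yeung_slack(pmf):
    """RHS minus LHS of the Zhang-Yeung inequality (coordinates
    0..3 stand for X1..X4); non-negative for every distribution."""
    H = lambda *S: subset_entropy(pmf, S)
    I34   = H(2) + H(3) - H(2, 3)                    # I(X3;X4)
    I12   = H(0) + H(1) - H(0, 1)                    # I(X1;X2)
    I1_34 = H(0) + H(2, 3) - H(0, 2, 3)              # I(X1;X3,X4)
    I34_1 = H(0, 2) + H(0, 3) - H(0, 2, 3) - H(0)    # I(X3;X4|X1)
    I34_2 = H(1, 2) + H(1, 3) - H(1, 2, 3) - H(1)    # I(X3;X4|X2)
    return I12 + I1_34 + 3 * I34_1 + I34_2 - 2 * I34

# Spot-check on random joint distributions over {0,1}^4.
random.seed(0)
for _ in range(100):
    weights = [random.random() for _ in range(16)]
    total = sum(weights)
    pmf = {outcome: w / total
           for outcome, w in zip(itertools.product((0, 1), repeat=4),
                                  weights)}
    assert zhang_yeung_slack(pmf) >= -1e-9
```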

Definition:

Polymatroidal Region and Shannon-Type Inequalities

The polymatroidal region $\Gamma_n$ is the set of all vectors $\mathbf{h} \in \mathbb{R}^{2^n - 1}$ satisfying:

  1. $h(\emptyset) = 0$ (normalization)
  2. $h(S) \leq h(T)$ for $S \subseteq T$ (monotonicity)
  3. $h(S \cup T) + h(S \cap T) \leq h(S) + h(T)$ for all $S, T$ (submodularity)

These are the Shannon-type inequalities. They define a polyhedral cone. An inequality is non-Shannon if it is valid for $\Gamma_n^*$ but not implied by the polymatroidal constraints.

Implications for Network Coding and Capacity

Non-Shannon inequalities have direct implications for:

  1. Network coding capacity: The capacity region of a network coding problem is characterized by the projection of $\Gamma_n^*$ onto the rate variables. Since $\Gamma_n^* \subsetneq \Gamma_n$, outer bounds based on Shannon-type inequalities alone may be strictly loose. Tighter outer bounds using non-Shannon inequalities have been found for specific networks.

  2. Distributed source coding: The rate region for distributed lossy compression depends on the entropy structure of the sources. Non-Shannon inequalities can tighten the outer bound.

  3. Secret key agreement: The secret key capacity for multiple terminals involves optimizing over the entropy region. Non-Shannon constraints can change the capacity.

The difficulty is that infinitely many non-Shannon inequalities exist for $n \geq 4$, and no finite characterization of $\Gamma_n^*$ is known for any $n \geq 4$. This means that proving tight outer bounds for multi-terminal problems is fundamentally harder than in the two-user and three-user cases.

Example: A Non-Shannon Inequality Tightens a Network Coding Bound

Consider a network coding problem with four source-sink pairs where the Shannon outer bound gives capacity region $\mathcal{C}_{\text{Shannon}}$. Show that the Zhang-Yeung inequality can strictly tighten this bound.

Historical Note: The Discovery of Non-Shannon Inequalities

1998–present

For decades after Shannon's work, it was widely believed (or at least hoped) that the basic information inequalities, namely the non-negativity of entropy, mutual information, and conditional mutual information, were sufficient to characterize all entropy relationships. Zhang and Yeung's 1998 discovery that this is false for $n \geq 4$ was a conceptual earthquake. It showed that the entropy region has a fundamentally more complex structure than the polyhedral cone defined by Shannon's axioms. Since then, infinitely many non-Shannon inequalities have been discovered (Dougherty, Freiling, Zeger; Matúš), but a complete characterization of $\Gamma_n^*$ for any $n \geq 4$ remains one of the deepest open problems in information theory.

Common Mistake: Shannon Inequalities Are NOT Sufficient for Four or More Variables

Mistake:

Proving an outer bound for a network with four or more auxiliary random variables using only Shannon-type inequalities and claiming it is tight.

Correction:

For problems involving four or more random variables, Shannon-type inequalities may not characterize the entropy region completely. The outer bound may be strictly loose. To claim tightness, either prove a matching inner bound, or explicitly verify that non-Shannon inequalities do not further constrain the region. In practice, for many problems of interest, Shannon-type inequalities are sufficient, but this must be verified, not assumed.

Entropy Region

The set of all achievable entropy vectors for $n$ random variables, characterizing all possible relationships among the joint entropies of all subsets.

Related: The Entropy Region

Non-Shannon Inequality

An information inequality that holds for all joint distributions but cannot be derived from the basic Shannon-type inequalities (non-negativity of mutual information, chain rule, submodularity).

Related: The Zhang-Yeung Inequality