Non-Shannon Inequalities

Beyond Shannon's Inequalities

All the information inequalities we have used in this book (the chain rule, the data processing inequality, submodularity of entropy) follow from a single source: the non-negativity of conditional mutual information, or equivalently, the non-negativity of KL divergence. These are called Shannon-type (or polymatroidal) inequalities. For three or fewer random variables, Shannon-type inequalities characterize all valid entropy relationships.

But for four or more random variables, there exist valid inequalities that cannot be derived from Shannon's axioms. These are non-Shannon inequalities, and their existence has profound implications for network information theory and network coding.

Definition:

The Entropy Region

For $n$ random variables $(X_1, \ldots, X_n)$, the entropy vector is the $(2^n - 1)$-dimensional vector $\mathbf{h} = (H(X_S))_{S \subseteq [n],\, S \neq \emptyset}$ containing the entropy of every non-empty subset. The entropy region $\Gamma_n^*$ is the closure of the set of entropy vectors achievable by some joint distribution on $(X_1, \ldots, X_n)$. The Shannon outer bound $\Gamma_n$ is the cone defined by all Shannon-type inequalities (elemental inequalities plus monotonicity and submodularity).

For $n \leq 3$, $\Gamma_n^* = \Gamma_n$: Shannon-type inequalities are sufficient. For $n \geq 4$, $\Gamma_n^* \subsetneq \Gamma_n$: non-Shannon inequalities carve out additional constraints that the entropy region must satisfy.
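The entropy vector is easy to compute directly from a joint distribution. The sketch below (function and variable names are our own, not from the text) enumerates every non-empty subset $S$ and computes $H(X_S)$ from the corresponding marginal:

```python
import itertools
import math

def entropy_vector(pmf, n):
    """Compute H(X_S), in bits, for every non-empty subset S of [n].

    pmf maps outcome tuples (x_1, ..., x_n) to probabilities.
    Returns a dict from frozenset(S) to H(X_S).
    """
    h = {}
    for r in range(1, n + 1):
        for S in itertools.combinations(range(n), r):
            # Marginalize the joint pmf onto the coordinates in S.
            marg = {}
            for outcome, p in pmf.items():
                key = tuple(outcome[i] for i in S)
                marg[key] = marg.get(key, 0.0) + p
            h[frozenset(S)] = -sum(p * math.log2(p)
                                   for p in marg.values() if p > 0)
    return h

# Illustrative distribution: X1, X2 fair independent bits, X3 = X1 XOR X2.
pmf = {(a, b, a ^ b): 0.25 for a in (0, 1) for b in (0, 1)}
h = entropy_vector(pmf, 3)
# Each H(X_i) = 1 bit; any pair (hence the triple) has entropy 2 bits.
```

For $n = 3$ this produces a $7$-dimensional vector, matching the $2^n - 1$ count in the definition.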

Theorem: The Zhang-Yeung Inequality

For four random variables $(X_1, X_2, X_3, X_4)$, the following inequality holds:
$$2I(X_3; X_4) \leq I(X_1; X_2) + I(X_1; X_3, X_4) + 3I(X_3; X_4 \mid X_1) + I(X_3; X_4 \mid X_2)$$
This inequality is not implied by any combination of Shannon-type inequalities. It was the first non-Shannon inequality discovered (Zhang and Yeung, 1998).

The Zhang-Yeung inequality reveals that the entropy structure of four or more random variables is richer than what Shannon's axioms capture. Intuitively, there are correlations among four variables that cannot be decomposed into pairwise and conditional pairwise relationships. This is analogous to how a tetrahedron has geometric properties not determined by its faces alone.
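The inequality can be spot-checked numerically. A minimal sketch (helper names are ours): expand each mutual information term into joint entropies via $I(X;Y \mid Z) = H(XZ) + H(YZ) - H(XYZ) - H(Z)$, then verify the slack is non-negative on random joint distributions over four bits. Of course, no amount of sampling proves the inequality; this only illustrates its content.

```python
import itertools
import math
import random

def subset_entropy(pmf, S):
    """Entropy H(X_S) in bits for the marginal on coordinates S."""
    marg = {}
    for outcome, p in pmf.items():
        key = tuple(outcome[i] for i in S)
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0)

def zhang_yeung_slack(pmf):
    """RHS minus LHS of the Zhang-Yeung inequality (coordinates
    0..3 stand for X1..X4); non-negative for every distribution."""
    H = lambda *S: subset_entropy(pmf, S)
    I34   = H(2) + H(3) - H(2, 3)                    # I(X3;X4)
    I12   = H(0) + H(1) - H(0, 1)                    # I(X1;X2)
    I1_34 = H(0) + H(2, 3) - H(0, 2, 3)              # I(X1;X3,X4)
    I34_1 = H(0, 2) + H(0, 3) - H(0, 2, 3) - H(0)    # I(X3;X4|X1)
    I34_2 = H(1, 2) + H(1, 3) - H(1, 2, 3) - H(1)    # I(X3;X4|X2)
    return I12 + I1_34 + 3 * I34_1 + I34_2 - 2 * I34

# Spot-check on random joint distributions over {0,1}^4.
random.seed(0)
for _ in range(100):
    weights = [random.random() for _ in range(16)]
    total = sum(weights)
    pmf = {outcome: w / total
           for outcome, w in zip(itertools.product((0, 1), repeat=4),
                                  weights)}
    assert zhang_yeung_slack(pmf) >= -1e-9
```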

Definition:

Polymatroidal Region and Shannon-Type Inequalities

The polymatroidal region $\Gamma_n$ is the set of all vectors $\mathbf{h} \in \mathbb{R}^{2^n - 1}$ satisfying:

  1. $h(\emptyset) = 0$ (normalization)
  2. $h(S) \leq h(T)$ for $S \subseteq T$ (monotonicity)
  3. $h(S \cup T) + h(S \cap T) \leq h(S) + h(T)$ for all $S, T$ (submodularity)

These are the Shannon-type inequalities. They define a polyhedral cone. An inequality is non-Shannon if it is valid for $\Gamma_n^*$ but not implied by the polymatroidal constraints.

Implications for Network Coding and Capacity

Non-Shannon inequalities have direct implications for:

  1. Network coding capacity: The capacity region of a network coding problem is characterized by the projection of $\Gamma_n^*$ onto the rate variables. Since $\Gamma_n^* \subsetneq \Gamma_n$, outer bounds based on Shannon-type inequalities alone may be strictly loose. Tighter outer bounds using non-Shannon inequalities have been found for specific networks.

  2. Distributed source coding: The rate region for distributed lossy compression depends on the entropy structure of the sources. Non-Shannon inequalities can tighten the outer bound.

  3. Secret key agreement: The secret key capacity for multiple terminals involves optimizing over the entropy region. Non-Shannon constraints can change the capacity.

The difficulty is that infinitely many non-Shannon inequalities exist for $n \geq 4$, and no finite characterization of $\Gamma_n^*$ is known for any $n \geq 4$. This means that proving tight outer bounds for multi-terminal problems is fundamentally harder than in the two-user and three-user cases.

Example: A Non-Shannon Inequality Tightens a Network Coding Bound

Consider a network coding problem with four source-sink pairs where the Shannon outer bound gives capacity region $\mathcal{C}_{\text{Shannon}}$. Show that the Zhang-Yeung inequality can strictly tighten this bound.

Historical Note: The Discovery of Non-Shannon Inequalities

1998–present

For decades after Shannon's work, it was widely believed (or at least hoped) that the basic information inequalities, namely the non-negativity of entropy, mutual information, and conditional mutual information, were sufficient to characterize all entropy relationships. Zhang and Yeung's 1998 discovery that this is false for $n \geq 4$ was a conceptual earthquake. It showed that the entropy region has a fundamentally more complex structure than the polyhedral cone defined by Shannon's axioms. Since then, infinitely many non-Shannon inequalities have been discovered (Dougherty, Freiling, Zeger; Matúš), but a complete characterization of $\Gamma_n^*$ for any $n \geq 4$ remains one of the deepest open problems in information theory.

Common Mistake: Shannon Inequalities Are NOT Sufficient for Four or More Variables

Mistake:

Proving an outer bound for a network with four or more auxiliary random variables using only Shannon-type inequalities and claiming it is tight.

Correction:

For problems involving four or more random variables, Shannon-type inequalities may not characterize the entropy region completely. The outer bound may be strictly loose. To claim tightness, either prove a matching inner bound, or explicitly verify that non-Shannon inequalities do not further constrain the region. In practice, for many problems of interest, Shannon-type inequalities are sufficient, but this must be verified, not assumed.

Entropy Region

The set of all achievable entropy vectors for $n$ random variables, characterizing all possible relationships among the joint entropies of all subsets.

Related: The Entropy Region

Non-Shannon Inequality

An information inequality that holds for all joint distributions but cannot be derived from the basic Shannon-type inequalities (non-negativity of mutual information, chain rule, submodularity).

Related: The Zhang-Yeung Inequality