Jensen's Inequality
Convexity Meets Expectation
Jensen's inequality is arguably the single most important inequality in information theory. Its statement is geometric: if you take the expectation of a convex function of a random variable, the result is at least as large as the function evaluated at the mean. This simple observation has profound consequences: it implies the non-negativity of Kullback-Leibler divergence ($D(p \| q) \ge 0$), the concavity of entropy, and the fact that fading hurts average capacity in wireless channels. Let us make this precise.
Definition: Convex and Concave Functions
A function $f$ is convex if for all $x_1, x_2$ and $\lambda \in [0, 1]$:
$$f(\lambda x_1 + (1 - \lambda) x_2) \le \lambda f(x_1) + (1 - \lambda) f(x_2).$$
It is strictly convex if equality holds only when $\lambda \in \{0, 1\}$ or $x_1 = x_2$. A function $f$ is (strictly) concave if $-f$ is (strictly) convex. Equivalently, for twice-differentiable $f$: $f$ is convex iff $f''(x) \ge 0$ for all $x$.
Key examples: $x^2$ is convex; $\log x$ is concave on $(0, \infty)$; $e^x$ is convex; $\sqrt{x}$ is concave on $[0, \infty)$.
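These examples are easy to probe numerically. A minimal sketch (the grid of test points is an arbitrary choice) that checks the chord condition on sampled pairs:

```python
import numpy as np

def chord_condition_holds(f, xs, lams, tol=1e-12):
    """Empirically test f(l*x1 + (1-l)*x2) <= l*f(x1) + (1-l)*f(x2) on a grid."""
    x1, x2 = np.meshgrid(xs, xs)
    return all(
        np.all(f(lam * x1 + (1 - lam) * x2) <= lam * f(x1) + (1 - lam) * f(x2) + tol)
        for lam in lams
    )

xs = np.linspace(0.1, 10.0, 50)        # stay on (0, inf) so log is defined
lams = np.linspace(0.0, 1.0, 11)
print(chord_condition_holds(np.square, xs, lams))             # True: x^2 is convex
print(chord_condition_holds(lambda x: -np.log(x), xs, lams))  # True: log is concave
```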
Theorem: Jensen's Inequality
Let $X$ be a random variable with finite mean and let $f$ be a convex function. Then:
$$\mathbb{E}[f(X)] \ge f(\mathbb{E}[X]),$$
provided $\mathbb{E}[f(X)]$ exists. If $f$ is concave, the inequality reverses: $\mathbb{E}[f(X)] \le f(\mathbb{E}[X])$. If $f$ is strictly convex (or strictly concave), equality holds if and only if $X$ is constant almost surely.
A convex function "bows upward," so the average of the function values is at least as large as the function evaluated at the average input. Geometrically: the chord connecting two points on a convex curve lies on or above the curve. Taking a weighted average (expectation) over many points preserves this property.
Supporting hyperplane
Since $f$ is convex, there exists a supporting line (supporting hyperplane) at the point $\mu = \mathbb{E}[X]$:
$$f(x) \ge f(\mu) + c\,(x - \mu) \quad \text{for all } x.$$
Here $c = f'(\mu)$ if $f$ is differentiable; otherwise $c$ is a subgradient of $f$ at $\mu$.
Substitute $X$ and take expectations
Substituting $x = X$ gives the pointwise inequality
$$f(X) \ge f(\mu) + c\,(X - \mu).$$
Taking expectations of both sides:
$$\mathbb{E}[f(X)] \ge f(\mu) + c\,\mathbb{E}[X - \mu] = f(\mu) = f(\mathbb{E}[X]).$$
Strictness condition
If $f$ is strictly convex, the supporting-line inequality is strict for every $x \ne \mu$, so the inequality above is strict unless $X = \mu$ a.s. Therefore equality in Jensen requires $X = \mathbb{E}[X]$ almost surely.
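To see the theorem numerically, here is a minimal sanity check (assumptions mine: an exponential $X$ and $f(x) = x^2$; any convex $f$ with $\mathbb{E}[f(X)]$ finite would do):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=1_000_000)   # E[X] = 2, E[X^2] = 8

f = np.square                                    # strictly convex
lhs = f(x.mean())                                # f(E[X])  -> about 4
rhs = f(x).mean()                                # E[f(X)]  -> about 8
print(f"f(E[X]) = {lhs:.3f}  <=  E[f(X)] = {rhs:.3f}")
```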
Example: Jensen Implies Non-Negativity of KL Divergence
Use Jensen's inequality to prove the information inequality: $D(p \| q) \ge 0$ for any two probability distributions $p$ and $q$ on the same alphabet, with equality iff $p = q$.
Setup
Write the negative divergence as an expectation under $p$:
$$-D(p \| q) = \sum_x p(x) \log \frac{q(x)}{p(x)} = \mathbb{E}_p\!\left[\log \frac{q(X)}{p(X)}\right],$$
where the sum runs over the support of $p$.
Apply Jensen
Since $\log$ is concave (equivalently, $-\log$ is convex):
$$\mathbb{E}_p\!\left[\log \frac{q(X)}{p(X)}\right] \le \log \mathbb{E}_p\!\left[\frac{q(X)}{p(X)}\right] = \log \sum_x p(x)\,\frac{q(x)}{p(x)} = \log \sum_x q(x) \le \log 1 = 0.$$
Hence $D(p \| q) \ge 0$.
Equality condition
Equality in Jensen for the strictly concave $\log$ requires $q(X)/p(X)$ to be constant a.s. under $p$, which means $q(x) = c\,p(x)$ for all $x$ in the support of $p$. Since both distributions sum to 1, $c = 1$ and $p = q$.
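A small numerical illustration of the result (a sketch; the two distributions below are made-up examples):

```python
import numpy as np

def kl_divergence(p, q):
    """D(p||q) = sum_x p(x) * log(p(x)/q(x)) in nats; terms with p(x)=0 contribute 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])
print(kl_divergence(p, q))   # strictly positive since p != q
print(kl_divergence(p, p))   # exactly 0: the equality case p = q
```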
Example: Fading Hurts Average Capacity
Consider a fading channel with instantaneous SNR $\gamma$ (random) and capacity $C(\gamma) = \log(1 + \gamma)$ nats. Show that the average capacity under fading is strictly less than the capacity at the average SNR.
Identify the concave function
$C(\gamma) = \log(1 + \gamma)$ is strictly concave for $\gamma \ge 0$ (since $C''(\gamma) = -(1 + \gamma)^{-2} < 0$).
Apply Jensen (concave version)
By Jensen's inequality for concave $C$:
$$\mathbb{E}[\log(1 + \gamma)] \le \log(1 + \mathbb{E}[\gamma]),$$
with equality iff $\gamma$ is deterministic (no fading).
Interpretation
The left side is the ergodic capacity under fading; the right side is the AWGN capacity at the mean SNR. The strict inequality says that randomness in the channel SNR hurts average throughput relative to a non-fading channel with the same mean SNR. This is a fundamental result in wireless communications: fading imposes a capacity penalty (unless we can adapt to it via CSI).
Jensen's Gap for Fading Capacity
Compare the ergodic capacity $\mathbb{E}[\log(1 + \gamma)]$ with the AWGN capacity $\log(1 + \bar{\gamma})$ as the fading severity varies. The gap quantifies the penalty of fading.
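Here is a simulation sketch of that comparison (assumptions mine: Nakagami-$m$ fading, i.e. Gamma-distributed SNR with shape $m$ and mean $\bar{\gamma} = 10$; smaller $m$ means more severe fading, and $m = 1$ is Rayleigh):

```python
import numpy as np

rng = np.random.default_rng(1)
mean_snr = 10.0                                    # average SNR, linear scale

for m in [0.5, 1.0, 2.0, 5.0, 50.0]:               # fading severity (small m = severe)
    snr = rng.gamma(shape=m, scale=mean_snr / m, size=1_000_000)
    ergodic = np.log1p(snr).mean()                 # E[log(1 + SNR)] in nats
    awgn = np.log1p(mean_snr)                      # log(1 + E[SNR])
    print(f"m = {m:5.1f}: ergodic = {ergodic:.4f}, AWGN = {awgn:.4f}, "
          f"Jensen gap = {awgn - ergodic:.4f}")
```

The gap shrinks toward zero as $m$ grows and the SNR concentrates around its mean, matching the equality condition of the theorem.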
*Figure: geometric proof of Jensen's inequality.*
Definition: Jensen's Inequality (Multivariate)
If $f: \mathbb{R}^n \to \mathbb{R}$ is convex and $X$ is a random vector with finite mean:
$$\mathbb{E}[f(X)] \ge f(\mathbb{E}[X]).$$
The proof follows the same supporting hyperplane argument, using a subgradient $g \in \partial f(\mathbb{E}[X])$ in place of the scalar slope.
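A quick vector-valued check (a sketch; the Gaussian vector and $f(x) = \|x\|^2$ are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(loc=[1.0, -2.0], scale=1.0, size=(1_000_000, 2))  # random vector in R^2

f = lambda v: np.sum(np.square(v), axis=-1)   # f(x) = ||x||^2 is convex on R^2
print(f(X.mean(axis=0)), f(X).mean())         # about 5.0 <= about 7.0
```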
Jensen and the AM-GM Inequality
The classical AM-GM inequality $\sqrt[n]{x_1 \cdots x_n} \le \frac{1}{n}(x_1 + \cdots + x_n)$ is a special case of Jensen applied to $f(x) = -\log x$ (convex) with the uniform distribution on $\{x_1, \dots, x_n\}$. More generally, for any concave $f$ and weights $w_i \ge 0$ with $\sum_i w_i = 1$: $f\big(\sum_i w_i x_i\big) \ge \sum_i w_i f(x_i)$.
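A tiny numeric check of this route to AM-GM (sketch; the sample values are arbitrary):

```python
import numpy as np

x = np.array([1.0, 4.0, 9.0, 16.0])          # arbitrary positive values
arithmetic = x.mean()
geometric = np.exp(np.log(x).mean())         # exp(E[log X]) <= E[X] by Jensen
print(f"GM = {geometric:.4f}  <=  AM = {arithmetic:.4f}")
```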
Common Mistake: Getting the Direction Wrong
Mistake:
Writing $\mathbb{E}[\log(1 + \gamma)] \ge \log(1 + \mathbb{E}[\gamma])$ and concluding that fading helps.
Correction:
$\log(1 + \gamma)$ is concave, so Jensen gives $\mathbb{E}[\log(1 + \gamma)] \le \log(1 + \mathbb{E}[\gamma])$. The inequality reverses relative to the convex case. Always check the direction: for convex $f$, $\mathbb{E}[f(X)]$ sits on the larger side ("$\ge$"); for concave $f$, on the smaller side ("$\le$").
Fading Channel Capacity with CSI
Caire and Shamai characterized the capacity of fading channels with various degrees of channel state information (CSI) at the transmitter and receiver. Jensen's inequality is central to their analysis: without CSI at the transmitter, the capacity loss due to fading is exactly the Jensen gap $\log(1 + \mathbb{E}[\gamma]) - \mathbb{E}[\log(1 + \gamma)]$. With transmitter CSI, water-filling partially recovers this loss.
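A simulation sketch of that effect (assumptions mine: unit average power, exponential channel power gains with mean 10, and the standard water-filling allocation $p(g) = (1/\lambda - 1/g)^+$ with $\lambda$ set by bisection; this is an illustration, not the paper's construction):

```python
import numpy as np

rng = np.random.default_rng(3)
g = rng.exponential(scale=10.0, size=200_000)    # channel power gains, E[g] = 10

# Water-filling: p(g) = max(0, 1/lam - 1/g), with lam chosen so E[p(g)] = 1.
lo, hi = 1e-6, 1e6
for _ in range(200):                             # bisection on the water level lam
    lam = np.sqrt(lo * hi)
    if np.maximum(0.0, 1.0 / lam - 1.0 / g).mean() > 1.0:
        lo = lam                                 # power too high -> raise lam
    else:
        hi = lam                                 # power too low  -> lower lam
p = np.maximum(0.0, 1.0 / lam - 1.0 / g)

no_csit = np.log1p(g).mean()        # constant power p = 1: E[log(1 + g)]
with_wf = np.log1p(g * p).mean()    # transmitter CSI: E[log(1 + g p(g))]
awgn    = np.log1p(g.mean())        # AWGN benchmark at the mean SNR
print(f"AWGN = {awgn:.4f}, water-filling = {with_wf:.4f}, no CSIT = {no_csit:.4f}")
```

Water-filling is optimal among power allocations with the same average power, so `with_wf` can never fall below `no_csit`; the remaining distance to the AWGN benchmark is the part of the Jensen gap that CSI cannot recover here.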
Key Takeaway
Jensen's inequality connects the geometry of convex/concave functions with expectation. Its two most important applications for us are: (1) the information inequality $D(p \| q) \ge 0$, which is the foundation of information theory, and (2) the fading penalty $\mathbb{E}[\log(1 + \gamma)] \le \log(1 + \mathbb{E}[\gamma])$, which quantifies how randomness in the channel hurts throughput.
Historical Note: Johan Jensen and the 1906 Paper
Johan Ludwig William Valdemar Jensen (1859--1925) was a Danish mathematician and engineer who worked at the Copenhagen Telephone Company. He published the inequality that bears his name in 1906 in Acta Mathematica. The inequality was known in special cases (AM-GM, Cauchy) much earlier, but Jensen gave the first general formulation for convex functions. His engineering background is fitting: the inequality has become one of the most-used tools in communications theory.
Convex function
A function satisfying $f(\lambda x + (1 - \lambda) y) \le \lambda f(x) + (1 - \lambda) f(y)$ for all $x, y$ and $\lambda \in [0, 1]$. Equivalently, $f'' \ge 0$ if twice differentiable.
Jensen gap
The difference $\mathbb{E}[f(X)] - f(\mathbb{E}[X])$ for convex $f$. Measures how much randomness in $X$ inflates the expected value of $f(X)$.
Related: {{Ref:Gloss Convex Function}}
Comparison of Probability Inequalities
| Inequality | Requirement | Bound | Tail decay | Tightness |
|---|---|---|---|---|
| Markov | $X \ge 0$, finite $\mathbb{E}[X]$ | $P(X \ge a) \le \mathbb{E}[X]/a$ | $\propto 1/a$ | Loosest; tight for a two-point mass |
| Chebyshev | Finite variance $\sigma^2$ | $P(\lvert X - \mu \rvert \ge a) \le \sigma^2/a^2$ | $\propto 1/a^2$ | Tight for a two-point mass |
| Chernoff | Finite MGF in a neighborhood of 0 | $P(X \ge a) \le \min_{s > 0} e^{-sa}\,\mathbb{E}[e^{sX}]$ | Exponential | Exponentially tight (correct rate) |
| Hoeffding | Independent, bounded $X_i \in [a_i, b_i]$ | $P(\bar{X} - \mu \ge t) \le \exp\!\big(-2n^2 t^2 / \sum_i (b_i - a_i)^2\big)$ | Exponential in $n$ | Good for bounded sums; ignores variance |
| Jensen | Convex $f$, finite $\mathbb{E}[X]$ | $f(\mathbb{E}[X]) \le \mathbb{E}[f(X)]$ | N/A (bounds expectations) | Tight iff $X$ is a.s. constant |
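To make the relative looseness concrete, a small comparison sketch (assumptions mine: $X \sim \mathrm{Exp}(1)$, so $\mathbb{E}[X] = \mathrm{Var}[X] = 1$, and threshold $a = 10$):

```python
import numpy as np

a = 10.0
true_tail = np.exp(-a)                          # P(X >= a) = e^{-a} for Exp(1)
markov = 1.0 / a                                # E[X]/a
chebyshev = 1.0 / (a - 1.0) ** 2                # P(|X - 1| >= a - 1) <= Var/(a-1)^2
s = np.linspace(0.01, 0.99, 999)                # MGF E[e^{sX}] = 1/(1-s) for s < 1
chernoff = np.min(np.exp(-s * a) / (1.0 - s))   # min_s e^{-sa} E[e^{sX}]
print(f"true = {true_tail:.2e}, Markov = {markov:.2e}, "
      f"Chebyshev = {chebyshev:.2e}, Chernoff = {chernoff:.2e}")
```

The printed bounds decrease from Markov to Chebyshev to Chernoff, with Chernoff alone matching the exponential decay of the true tail.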
Quick Check
You know only that $X \ge 0$ and $\mathbb{E}[X] < \infty$. Which inequality gives a bound on $P(X \ge a)$?
Chebyshev
Markov
Chernoff
Hoeffding
Markov requires only $X \ge 0$ and a finite mean. It gives $P(X \ge a) \le \mathbb{E}[X]/a$.