Jensen's Inequality

Convexity Meets Expectation

Jensen's inequality is arguably the single most important inequality in information theory. Its statement is geometric: if you have a convex function and take its expectation, the result is at least as large as the function evaluated at the mean. This simple observation has profound consequences: it implies the non-negativity of Kullback-Leibler divergence ($D(P \| Q) \geq 0$), the concavity of entropy, and the fact that fading hurts average capacity in wireless channels. Let us make this precise.

Definition: Convex and Concave Functions

A function $g: \mathbb{R} \to \mathbb{R}$ is convex if for all $x, y \in \mathbb{R}$ and $\lambda \in [0, 1]$:
$$g(\lambda x + (1-\lambda) y) \leq \lambda g(x) + (1-\lambda) g(y).$$
It is strictly convex if equality holds only when $\lambda \in \{0, 1\}$ or $x = y$. A function is (strictly) concave if $-g$ is (strictly) convex. Equivalently, for twice-differentiable $g$: $g$ is convex iff $g''(x) \geq 0$ for all $x$.

Key examples: $e^x$ is convex; $\log x$ is concave on $(0, \infty)$; $x^2$ is convex; $\sqrt{x}$ is concave on $[0, \infty)$.
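As a quick sanity check, the chord condition can be tested numerically. The sketch below (assuming numpy; the sample intervals are arbitrary choices) verifies it for $e^x$ and $\log x$:

```python
import numpy as np

def chord_gap(g, x, y, lam):
    """Chord value minus function value at the interpolated point.
    Non-negative for every lam in [0, 1]  <=>  g is convex on [x, y]."""
    z = lam * x + (1 - lam) * y
    return lam * g(x) + (1 - lam) * g(y) - g(z)

lam = np.linspace(0, 1, 101)
# e^x is convex: the chord lies on or above the curve, so the gap is >= 0.
assert np.all(chord_gap(np.exp, 0.0, 2.0, lam) >= 0)
# log x is concave on (0, inf): the gap is <= 0.
assert np.all(chord_gap(np.log, 0.5, 4.0, lam) <= 0)
print("chord checks passed")
```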

Theorem: Jensen's Inequality

Let $X$ be a random variable with finite mean and $g$ a convex function. Then:
$$g(\mathbb{E}[X]) \leq \mathbb{E}[g(X)],$$
provided $\mathbb{E}[g(X)]$ exists. If $g$ is concave, the inequality reverses: $g(\mathbb{E}[X]) \geq \mathbb{E}[g(X)]$. If $g$ is strictly convex (or strictly concave), equality holds if and only if $X$ is constant almost surely.

A convex function "bows upward," so the average of the function values is at least as large as the function evaluated at the average input. Geometrically, the chord connecting two points on a convex curve lies on or above the curve, and taking a weighted average (expectation) over many points preserves this relationship.
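A small Monte Carlo sketch makes this concrete (numpy, with $X \sim \text{Exp}(1)$ and $g(x) = x^2$ chosen purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=1_000_000)  # X ~ Exp(1), so E[X] = 1

# Note: g(x) = e^x would violate the "provided E[g(X)] exists" caveat here,
# since E[e^X] diverges for Exp(1). Use the convex g(x) = x**2 instead.
print(np.mean(x) ** 2)   # g(E[X])  ~ 1.0
print(np.mean(x ** 2))   # E[g(X)]  ~ 2.0, at least as large as g(E[X])
```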


Example: Jensen Implies Non-Negativity of KL Divergence

Use Jensen's inequality to prove the information inequality: $D(P \| Q) \geq 0$ for any two probability distributions $P, Q$ on the same alphabet, with equality iff $P = Q$.
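For reference, one standard route (a sketch; the sum runs over the support of $P$) writes the divergence as an expectation under $P$ and applies Jensen to the strictly concave $\log$:
$$-D(P \| Q) = \mathbb{E}_P\!\left[\log \frac{Q(X)}{P(X)}\right] \leq \log \mathbb{E}_P\!\left[\frac{Q(X)}{P(X)}\right] = \log \sum_{x:\, P(x) > 0} Q(x) \leq \log 1 = 0.$$
Equality in the Jensen step forces $Q(X)/P(X)$ to be constant almost surely, which together with normalization gives $P = Q$.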

Example: Fading Hurts Average Capacity

Consider a fading channel with instantaneous SNR $\gamma > 0$ (random) and instantaneous capacity $\log(1 + \gamma)$ nats. Show that the average capacity under fading is strictly less than the capacity at the average SNR whenever $\gamma$ is not deterministic.
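A quick simulation sketch (assuming Rayleigh fading, so that $\gamma$ is exponentially distributed; the mean SNR of 10 is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(1)
avg_snr = 10.0                                           # mean SNR, linear scale
gamma = rng.exponential(scale=avg_snr, size=1_000_000)   # Rayleigh fading SNRs

ergodic = np.mean(np.log(1 + gamma))   # E[log(1 + gamma)], nats
awgn = np.log(1 + avg_snr)             # log(1 + E[gamma]), nats
print(f"ergodic capacity: {ergodic:.3f} nats")
print(f"AWGN capacity:    {awgn:.3f} nats")
print(f"Jensen gap:       {awgn - ergodic:.3f} nats")
```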

Jensen's Gap for Fading Capacity

Compare the ergodic capacity $\mathbb{E}[\log(1 + \gamma)]$ with the AWGN capacity $\log(1 + \mathbb{E}[\gamma])$ as the fading severity varies. The gap quantifies the penalty of fading.
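One way to tabulate this gap numerically (a sketch assuming Nakagami-$m$ fading, under which $\gamma$ is Gamma-distributed with shape $m$; larger $m$ means milder fading):

```python
import numpy as np

rng = np.random.default_rng(2)
avg_snr = 10.0
awgn = np.log(1 + avg_snr)

for m in [0.5, 1, 2, 4, 8, 16]:
    # Nakagami-m fading: gamma ~ Gamma(shape=m, scale=avg_snr/m), mean avg_snr.
    g = rng.gamma(shape=m, scale=avg_snr / m, size=500_000)
    ergodic = np.mean(np.log(1 + g))
    print(f"m = {m:4}: Jensen gap = {awgn - ergodic:.3f} nats")
```

As $m$ grows, $\gamma$ concentrates around its mean and the gap shrinks toward zero, consistent with the equality condition of Jensen's inequality.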


Geometric Proof of Jensen's Inequality

Visual demonstration of Jensen's inequality: the chord of a convex function lies above the function, and the tangent line lies below. Expectation as a weighted average preserves these relationships.
The blue curve is $g(x) = e^x$ (convex). The red dot at $(\mathbb{E}[X], g(\mathbb{E}[X]))$ lies below the green dot at $(\mathbb{E}[X], \mathbb{E}[g(X)])$.
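The figure can be reproduced with a short matplotlib sketch (using a two-point distribution on $\{0, 2\}$, chosen here for illustration):

```python
import numpy as np
import matplotlib.pyplot as plt

x1, x2, p = 0.0, 2.0, 0.5               # two-point distribution, P(X = x1) = p
mean_x = p * x1 + (1 - p) * x2          # E[X]
xs = np.linspace(-0.5, 2.5, 200)

plt.plot(xs, np.exp(xs), "b", label=r"$g(x) = e^x$")
plt.plot([x1, x2], [np.exp(x1), np.exp(x2)], "k--", label="chord")
plt.plot(mean_x, np.exp(mean_x), "ro", label=r"$g(\mathbb{E}[X])$")
plt.plot(mean_x, p * np.exp(x1) + (1 - p) * np.exp(x2), "go",
         label=r"$\mathbb{E}[g(X)]$")
plt.legend()
plt.show()
```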

Theorem: Jensen's Inequality (Multivariate)

If $g: \mathbb{R}^n \to \mathbb{R}$ is convex and $\mathbf{X} \in \mathbb{R}^n$ is a random vector with finite mean:
$$g(\mathbb{E}[\mathbf{X}]) \leq \mathbb{E}[g(\mathbf{X})].$$
The proof follows the same supporting-hyperplane argument, using a subgradient $\mathbf{v} \in \partial g(\mathbb{E}[\mathbf{X}])$.
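Spelled out (a sketch, writing $\boldsymbol{\mu} = \mathbb{E}[\mathbf{X}]$): convexity guarantees a supporting hyperplane at $\boldsymbol{\mu}$,
$$g(\mathbf{x}) \geq g(\boldsymbol{\mu}) + \mathbf{v}^\top (\mathbf{x} - \boldsymbol{\mu}) \quad \text{for all } \mathbf{x},$$
and substituting $\mathbf{X}$ and taking expectations kills the linear term: $\mathbb{E}[g(\mathbf{X})] \geq g(\boldsymbol{\mu}) + \mathbf{v}^\top(\mathbb{E}[\mathbf{X}] - \boldsymbol{\mu}) = g(\mathbb{E}[\mathbf{X}])$.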

Jensen and the AM-GM Inequality

The classical AM-GM inequality $\frac{1}{n}\sum_i x_i \geq (\prod_i x_i)^{1/n}$ is a special case of Jensen applied to $g(x) = -\log x$ (convex) with the uniform distribution on $\{x_1, \ldots, x_n\}$. More generally, for any concave $\phi$: $\phi(\mathbb{E}[X]) \geq \mathbb{E}[\phi(X)]$.
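Concretely (a sketch with $x_i > 0$): take $X$ uniform on $\{x_1, \ldots, x_n\}$ and apply Jensen to the convex $g(x) = -\log x$:
$$-\log\!\left(\frac{1}{n}\sum_{i=1}^n x_i\right) \leq \frac{1}{n}\sum_{i=1}^n \left(-\log x_i\right) = -\log\!\left(\prod_{i=1}^n x_i\right)^{1/n},$$
and negating and exponentiating both sides recovers AM-GM.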

Common Mistake: Getting the Direction Wrong

Mistake:

Writing $\mathbb{E}[\log(1 + \gamma)] \geq \log(1 + \mathbb{E}[\gamma])$ and concluding that fading helps.

Correction:

$\log$ is concave, so Jensen gives $\mathbb{E}[\log(1+\gamma)] \leq \log(1 + \mathbb{E}[\gamma])$. The inequality reverses relative to the convex case. Always check the direction: convex $g$ gives $\mathbb{E}[g(X)] \geq g(\mathbb{E}[X])$; concave $g$ gives $\mathbb{E}[g(X)] \leq g(\mathbb{E}[X])$.

🎓 CommIT Contribution (1999)

Fading Channel Capacity with CSI

G. Caire and S. Shamai (Shitz), IEEE Trans. Inf. Theory, vol. 45, no. 6

Caire and Shamai characterized the capacity of fading channels with various degrees of channel state information at transmitter and receiver. Jensen's inequality is central to their analysis: without CSI at the transmitter, the capacity loss due to fading is exactly the Jensen gap $\log(1 + \mathbb{E}[\gamma]) - \mathbb{E}[\log(1 + \gamma)]$. With transmitter CSI, water-filling partially recovers this loss.

Tags: fading, capacity, CSI
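The water-filling allocation mentioned above can be sketched numerically. The snippet below is a minimal illustration of the classic allocation $P(\gamma) = \max(0, \mu - 1/\gamma)$ with a bisection on the water level $\mu$, not the paper's exact development:

```python
import numpy as np

def waterfill(gammas, p_avg):
    """Water-filling power over equiprobable fading states, with mean(P) = p_avg."""
    lo, hi = 1e-9, 1e9            # bracket for the water level mu
    for _ in range(100):          # bisection: mean power is increasing in mu
        mu = (lo + hi) / 2
        if np.mean(np.maximum(0.0, mu - 1.0 / gammas)) > p_avg:
            hi = mu
        else:
            lo = mu
    return np.maximum(0.0, (lo + hi) / 2 - 1.0 / gammas)

rng = np.random.default_rng(3)
gammas = rng.exponential(scale=10.0, size=100_000)   # SNRs at unit transmit power
p = waterfill(gammas, p_avg=1.0)
no_csit = np.mean(np.log(1 + gammas))                # constant power P = 1
csit = np.mean(np.log(1 + p * gammas))               # water-filling with CSIT
print(f"no-CSIT ergodic capacity: {no_csit:.3f} nats")
print(f"water-filling capacity:   {csit:.3f} nats")
```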

Key Takeaway

Jensen's inequality connects the geometry of convex/concave functions with expectation. Its two most important applications for us are: (1) the information inequality $D(P \| Q) \geq 0$, which is the foundation of information theory, and (2) the fading penalty $\mathbb{E}[\log(1+\gamma)] \leq \log(1+\mathbb{E}[\gamma])$, which quantifies how randomness in the channel hurts throughput.

Historical Note: Johan Jensen and the 1906 Paper

1906

Johan Ludwig William Valdemar Jensen (1859–1925) was a Danish mathematician and engineer who worked at the Copenhagen Telephone Company. He published the inequality that bears his name in 1906 in Acta Mathematica. Special cases (such as the AM-GM inequality, due to Cauchy) were known much earlier, but Jensen gave the first general formulation for convex functions. His engineering background is fitting: the inequality has become one of the most-used tools in communications theory.

Convex function

A function $g$ satisfying $g(\lambda x + (1-\lambda)y) \leq \lambda g(x) + (1-\lambda) g(y)$ for all $x, y$ and $\lambda \in [0,1]$. Equivalently, $g''(x) \geq 0$ if twice differentiable.

Jensen gap

The difference $\mathbb{E}[g(X)] - g(\mathbb{E}[X]) \geq 0$ for convex $g$. Measures how much randomness in $X$ inflates the expected value of $g$.

Related: Convex function

Comparison of Probability Inequalities

| Inequality | Requirement | Bound on $\mathbb{P}(X \geq a)$ | Tail decay | Tightness |
|---|---|---|---|---|
| Markov | $X \geq 0$, finite $\mathbb{E}[X]$ | $\mathbb{E}[X]/a$ | $O(1/a)$ | Loosest; tight for a two-point mass |
| Chebyshev | Finite $\text{Var}(X)$ | $\sigma^2/(a-\mu)^2$ | $O(1/a^2)$ | Tight for a two-point mass |
| Chernoff | Finite MGF in a neighborhood of 0 | $\inf_{t>0} e^{-ta} M_X(t)$ | Exponential | Exponentially tight (correct rate) |
| Hoeffding | Independent, bounded $X_i \in [a_i, b_i]$ | $\exp(-2t^2/\sum_i (b_i-a_i)^2)$ | Exponential in $n$ | Good for bounded sums; ignores variance |
| Jensen | Convex $g$ | N/A (bounds expectations) | N/A | Tight iff $X$ is constant |
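To make the comparison concrete, here is a small numeric sketch (Markov vs. Chebyshev vs. Chernoff for $X \sim \text{Exp}(1)$, a hypothetical choice; the Chernoff infimum is evaluated in closed form):

```python
import numpy as np

# X ~ Exp(1): E[X] = 1, Var(X) = 1, MGF M(t) = 1/(1 - t) for t < 1.
a = np.array([2.0, 5.0, 10.0])

true_tail = np.exp(-a)              # exact P(X >= a)
markov = 1.0 / a                    # E[X] / a
chebyshev = 1.0 / (a - 1.0) ** 2    # sigma^2 / (a - mu)^2
# Chernoff: inf over 0 < t < 1 of e^{-ta}/(1 - t), attained at t = 1 - 1/a.
chernoff = a * np.exp(-(a - 1))

for ai, tt, m, c, ch in zip(a, true_tail, markov, chebyshev, chernoff):
    print(f"a = {ai:4}: exact {tt:.2e}  Markov {m:.2e}  "
          f"Chebyshev {c:.2e}  Chernoff {ch:.2e}")
```

Chernoff's exponential decay dominates for large $a$, even though Markov or Chebyshev can be tighter at moderate thresholds.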

Quick Check

You know only that $X \geq 0$ and $\mathbb{E}[X] = 5$. Which inequality gives a bound on $\mathbb{P}(X \geq 100)$?

Chebyshev

Markov

Chernoff

Hoeffding