Markov's Inequality

Why Tail Bounds?

In many engineering problems we know the mean (or a few moments) of a random variable but not its full distribution. The question is: how much can we say about the probability that the random variable is far from its mean? Probability inequalities give us universal answers --- bounds that hold for every distribution satisfying the moment conditions. Markov's inequality is the simplest and most fundamental of these bounds. Every other inequality in this chapter is, at its core, a consequence of Markov applied to a cleverly chosen non-negative function.

Definition: Tail Probability

For a random variable $X$ and a threshold $a \in \mathbb{R}$, the upper tail probability is $\mathbb{P}(X \geq a)$. More generally, for any event $A$, we write $\mathbb{P}(A) = \mathbb{E}[I_A]$, where $I_A$ is the indicator function of $A$.
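
Because $\mathbb{P}(A) = \mathbb{E}[I_A]$, a tail probability can be estimated by averaging indicator samples. A minimal Monte Carlo sketch of this identity (the exponential distribution and sample size here are illustrative choices, not from the text):

```python
import random

# Estimate P(X >= a) as the empirical mean of indicator samples,
# using the identity P(A) = E[I_A]. With X ~ Exp(1), the exact answer is e^{-a}.
random.seed(0)
a = 2.0
n = 100_000

hits = sum(1 for _ in range(n) if random.expovariate(1.0) >= a)
print(f"estimated P(X >= {a}) = {hits / n:.4f}")   # exact: e^{-2} ~ 0.1353
```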

Theorem: Markov's Inequality

Let $X$ be a non-negative random variable with finite mean. Then for every $a > 0$, $$\mathbb{P}(X \geq a) \leq \frac{\mathbb{E}[X]}{a}.$$

Proof sketch: The indicator $I_{\{X \geq a\}}$ is bounded above by the linear function $X/a$ for all $X \geq 0$: since $a\,I_{\{X \geq a\}} \leq X$ pointwise, taking expectations of both sides gives $a\,\mathbb{P}(X \geq a) \leq \mathbb{E}[X]$. The bound uses only the first moment, so it must be loose enough to accommodate the worst-case distribution (a two-point mass).
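
The pointwise domination $I_{\{x \geq a\}} \leq x/a$ that drives the proof can be checked directly; the following sketch (an illustration added here, not part of the original) verifies it on a grid:

```python
a = 2.0  # any positive threshold works

# The step function jumps to 1 at x = a, exactly where the line x/a reaches 1.
for k in range(1, 1001):
    x = 0.01 * k                        # grid over (0, 10]
    indicator = 1.0 if x >= a else 0.0
    assert indicator <= x / a

print("I_{x >= a} <= x/a holds at every grid point")
```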

Example: Markov Bound for the Exponential Distribution

Let $X \sim \text{Exp}(\lambda)$ with $\mathbb{E}[X] = 1/\lambda$. Compare the Markov bound $\mathbb{P}(X \geq a) \leq \mathbb{E}[X]/a$ with the exact tail probability for $a = 5/\lambda$. Markov gives $\mathbb{P}(X \geq 5/\lambda) \leq \frac{1/\lambda}{5/\lambda} = \frac{1}{5} = 0.2$, while the exact tail is $e^{-5} \approx 0.0067$, so the bound is valid but roughly $30\times$ larger than the truth.
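
A quick numerical check of this comparison (a minimal sketch; the value $\lambda = 1$ is our choice, and the ratio is the same for every $\lambda$):

```python
import math

lam = 1.0                          # rate parameter; the comparison is scale-free
a = 5.0 / lam

markov_bound = (1.0 / lam) / a     # E[X]/a = 1/5
exact_tail = math.exp(-lam * a)    # P(X >= a) = e^{-lam * a} = e^{-5}

print(f"Markov bound:  {markov_bound:.4f}")                 # 0.2000
print(f"exact tail:    {exact_tail:.4f}")                   # 0.0067
print(f"bound / exact: {markov_bound / exact_tail:.1f}x")   # ~29.7x
```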

Generalized Markov

If $g$ is a non-negative, non-decreasing function and $g(a) > 0$, then $$\mathbb{P}(X \geq a) \leq \frac{\mathbb{E}[g(X)]}{g(a)}.$$ Chebyshev's inequality uses $g(x) = x^2$ applied to $|X - \mu|$, and the Chernoff bound uses $g(x) = e^{tx}$ with $t > 0$. The art of tail bounding is choosing $g$ wisely.
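
To see why the choice of $g$ matters, here is a sketch (our example, not from the original text) comparing the three bounds for $X \sim \text{Exp}(1)$ at $a = 5$; the Chernoff parameter $t = 1 - 1/a$ is the value that minimizes the bound for this distribution:

```python
import math

# Generalized Markov for X ~ Exp(1) at threshold a = 5:
#   g(x) = x         -> Markov
#   g(x) = (x - mu)^2 -> Chebyshev (Markov applied to |X - mu|)
#   g(x) = e^{tx}    -> Chernoff, using E[e^{tX}] = 1/(1 - t) for t < 1
a, mean, var = 5.0, 1.0, 1.0

markov = mean / a
chebyshev = var / (a - mean) ** 2          # P(X >= a) <= P(|X - mu| >= a - mu)
t = 1.0 - 1.0 / a                          # minimizes e^{-ta} / (1 - t)
chernoff = math.exp(-t * a) / (1.0 - t)
exact = math.exp(-a)

for name, value in [("Markov", markov), ("Chebyshev", chebyshev),
                    ("Chernoff", chernoff), ("exact tail", exact)]:
    print(f"{name:>10}: {value:.4f}")
```

Note that at $a = 5$ Chebyshev happens to be tighter than Chernoff; as $a$ grows, the Chernoff bound $a e^{1-a}$ decays exponentially and overtakes Chebyshev's $1/(a-1)^2$.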

Markov Bound Tightness

Compare the Markov bound $\mathbb{E}[X]/a$ against the exact tail probability $\mathbb{P}(X \geq a)$ for several distributions. Observe how the bound is always valid but can be very loose.
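
In the same spirit, a small sketch (our choice of distributions; the interactive version may use others) evaluating the bound against exact tails, including the two-point mass for which Markov holds with equality:

```python
import math

a = 2.0

# (name, mean, exact P(X >= a)) for a few non-negative distributions.
cases = [
    ("Exp(1)", 1.0, math.exp(-a)),
    # Uniform(0, 4): mean 2, P(X >= a) = (4 - a)/4 -- the bound is vacuous here
    ("Uniform(0,4)", 2.0, (4.0 - a) / 4.0),
    # Two-point mass: X = a w.p. 1/2, X = 0 w.p. 1/2 -- Markov is tight
    ("two-point", 0.5 * a, 0.5),
]

for name, mean, exact in cases:
    bound = mean / a
    print(f"{name:>13}: bound {bound:.3f} vs exact {exact:.3f}")
```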


Figure: Indicator vs. Linear Bound
The step function $I_{\{x \geq a\}}$ (red) lies below the line $x/a$ (blue) for all $x \geq 0$. Taking expectations of both sides yields Markov's inequality.

Historical Note: Andrey Markov and the Inequality

Andrey Andreyevich Markov (1856--1922) published this inequality in 1884 as part of his work on the convergence of sums of random variables. Although the bound is more elementary than Chebyshev's better-known inequality, which follows from it, Chebyshev's earlier work on moments heavily influenced Markov. The simplicity and universality of Markov's inequality make it one of the most-used tools in probability theory, despite being, by construction, the loosest possible moment-based bound.

Common Mistake: Markov Requires Non-Negativity

Mistake:

Applying Markov's inequality to a random variable that takes negative values: writing $\mathbb{P}(X \geq 5) \leq \mathbb{E}[X]/5$ when $X$ can be negative.

Correction:

Markov's inequality requires $X \geq 0$ almost surely. For a general $X$, apply Markov to $|X|$ or $(X - c)^2$ for some constant $c$. Chebyshev's inequality is precisely Markov applied to $(X - \mu)^2$.
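
To see the failure concretely (the numbers here are our own illustration): let $X = \pm 10$ with probability $1/2$ each, so $\mathbb{E}[X] = 0$; the naive "bound" claims $\mathbb{P}(X \geq 5) \leq 0$, yet the true probability is $0.5$. Applying Markov to $|X|$ instead is valid, if vacuous here:

```python
# X = -10 or +10 with probability 1/2 each; X is NOT non-negative.
mean_X = 0.5 * (-10) + 0.5 * 10    # E[X] = 0
exact = 0.5                        # P(X >= 5) = P(X = +10)

naive_bound = mean_X / 5           # 0.0 -- false, since 0.5 > 0: Markov does not apply
mean_abs = 0.5 * 10 + 0.5 * 10     # E[|X|] = 10
valid_bound = mean_abs / 5         # P(|X| >= 5) <= 2 -- vacuous, but never wrong

print(f"exact = {exact}, naive = {naive_bound}, via |X| = {valid_bound}")
```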

Tail probability

The probability that a random variable exceeds a given threshold: $\mathbb{P}(X \geq a)$. Tail bounds provide upper bounds on this quantity using partial information about the distribution (moments, MGF).

Related: Concentration

Indicator function

$I_A = 1$ if event $A$ occurs, $0$ otherwise. Satisfies $\mathbb{P}(A) = \mathbb{E}[I_A]$.