Markov's Inequality

Why Tail Bounds?

In many engineering problems we know the mean (or a few moments) of a random variable but not its full distribution. The question is: how much can we say about the probability that the random variable is far from its mean? Probability inequalities give us universal answers --- bounds that hold for every distribution satisfying the moment conditions. Markov's inequality is the simplest and most fundamental of these bounds. Every other inequality in this chapter is, at its core, a consequence of Markov applied to a cleverly chosen non-negative function.

Definition: Tail Probability

For a random variable $X$ and a threshold $a \in \mathbb{R}$, the upper tail probability is $\mathbb{P}(X \geq a)$. More generally, for any event $A$, we write $\mathbb{P}(A) = \mathbb{E}[I_A]$, where $I_A$ is the indicator function of $A$.
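
Because $\mathbb{P}(A) = \mathbb{E}[I_A]$, a tail probability can be estimated by averaging indicator samples. A minimal Monte Carlo sketch of this identity (the exponential distribution and sample size here are illustrative choices, not from the text):

```python
import random

# Estimate P(X >= a) as the empirical mean of indicator samples,
# using the identity P(A) = E[I_A]. With X ~ Exp(1), the exact answer is e^{-a}.
random.seed(0)
a = 2.0
n = 100_000

hits = sum(1 for _ in range(n) if random.expovariate(1.0) >= a)
print(f"estimated P(X >= {a}) = {hits / n:.4f}")   # exact: e^{-2} ~ 0.1353
```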

Theorem: Markov's Inequality

Let $X$ be a non-negative random variable with finite mean. Then for every $a > 0$, $$\mathbb{P}(X \geq a) \leq \frac{\mathbb{E}[X]}{a}.$$

Proof sketch: The indicator $I_{\{X \geq a\}}$ is bounded above by the linear function $X/a$ for all $X \geq 0$: since $a\,I_{\{X \geq a\}} \leq X$ pointwise, taking expectations of both sides gives $a\,\mathbb{P}(X \geq a) \leq \mathbb{E}[X]$. The bound uses only the first moment, so it must be loose enough to accommodate the worst-case distribution (a two-point mass).
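
The pointwise domination $I_{\{x \geq a\}} \leq x/a$ that drives the proof can be checked directly; the following sketch (an illustration added here, not part of the original) verifies it on a grid:

```python
a = 2.0  # any positive threshold works

# The step function jumps to 1 at x = a, exactly where the line x/a reaches 1.
for k in range(1, 1001):
    x = 0.01 * k                        # grid over (0, 10]
    indicator = 1.0 if x >= a else 0.0
    assert indicator <= x / a

print("I_{x >= a} <= x/a holds at every grid point")
```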

Example: Markov Bound for the Exponential Distribution

Let $X \sim \text{Exp}(\lambda)$ with $\mathbb{E}[X] = 1/\lambda$. Compare the Markov bound $\mathbb{P}(X \geq a) \leq \mathbb{E}[X]/a$ with the exact tail probability for $a = 5/\lambda$. Markov gives $\mathbb{P}(X \geq 5/\lambda) \leq \frac{1/\lambda}{5/\lambda} = \frac{1}{5} = 0.2$, while the exact tail is $e^{-5} \approx 0.0067$, so the bound is valid but roughly $30\times$ larger than the truth.
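
A quick numerical check of this comparison (a minimal sketch; the value $\lambda = 1$ is our choice, and the ratio is the same for every $\lambda$):

```python
import math

lam = 1.0                          # rate parameter; the comparison is scale-free
a = 5.0 / lam

markov_bound = (1.0 / lam) / a     # E[X]/a = 1/5
exact_tail = math.exp(-lam * a)    # P(X >= a) = e^{-lam * a} = e^{-5}

print(f"Markov bound:  {markov_bound:.4f}")                 # 0.2000
print(f"exact tail:    {exact_tail:.4f}")                   # 0.0067
print(f"bound / exact: {markov_bound / exact_tail:.1f}x")   # ~29.7x
```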

Generalized Markov

If $g$ is a non-negative, non-decreasing function and $g(a) > 0$, then $$\mathbb{P}(X \geq a) \leq \frac{\mathbb{E}[g(X)]}{g(a)}.$$ Chebyshev's inequality uses $g(x) = x^2$ applied to $|X - \mu|$, and the Chernoff bound uses $g(x) = e^{tx}$ with $t > 0$. The art of tail bounding is choosing $g$ wisely.
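
To see why the choice of $g$ matters, here is a sketch (our example, not from the original text) comparing the three bounds for $X \sim \text{Exp}(1)$ at $a = 5$; the Chernoff parameter $t = 1 - 1/a$ is the value that minimizes the bound for this distribution:

```python
import math

# Generalized Markov for X ~ Exp(1) at threshold a = 5:
#   g(x) = x         -> Markov
#   g(x) = (x - mu)^2 -> Chebyshev (Markov applied to |X - mu|)
#   g(x) = e^{tx}    -> Chernoff, using E[e^{tX}] = 1/(1 - t) for t < 1
a, mean, var = 5.0, 1.0, 1.0

markov = mean / a
chebyshev = var / (a - mean) ** 2          # P(X >= a) <= P(|X - mu| >= a - mu)
t = 1.0 - 1.0 / a                          # minimizes e^{-ta} / (1 - t)
chernoff = math.exp(-t * a) / (1.0 - t)
exact = math.exp(-a)

for name, value in [("Markov", markov), ("Chebyshev", chebyshev),
                    ("Chernoff", chernoff), ("exact tail", exact)]:
    print(f"{name:>10}: {value:.4f}")
```

Note that at $a = 5$ Chebyshev happens to be tighter than Chernoff; as $a$ grows, the Chernoff bound $a e^{1-a}$ decays exponentially and overtakes Chebyshev's $1/(a-1)^2$.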

Markov Bound Tightness

Compare the Markov bound $\mathbb{E}[X]/a$ against the exact tail probability $\mathbb{P}(X \geq a)$ for several distributions. Observe how the bound is always valid but can be very loose.
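
In the same spirit, a small sketch (our choice of distributions; the interactive version may use others) evaluating the bound against exact tails, including the two-point mass for which Markov holds with equality:

```python
import math

a = 2.0

# (name, mean, exact P(X >= a)) for a few non-negative distributions.
cases = [
    ("Exp(1)", 1.0, math.exp(-a)),
    # Uniform(0, 4): mean 2, P(X >= a) = (4 - a)/4 -- the bound is vacuous here
    ("Uniform(0,4)", 2.0, (4.0 - a) / 4.0),
    # Two-point mass: X = a w.p. 1/2, X = 0 w.p. 1/2 -- Markov is tight
    ("two-point", 0.5 * a, 0.5),
]

for name, mean, exact in cases:
    bound = mean / a
    print(f"{name:>13}: bound {bound:.3f} vs exact {exact:.3f}")
```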


Figure: Indicator vs. Linear Bound
The step function $I_{\{x \geq a\}}$ (red) lies below the line $x/a$ (blue) for all $x \geq 0$. Taking expectations of both sides yields Markov's inequality.

Historical Note: Andrey Markov and the Inequality

Andrey Andreyevich Markov (1856--1922) published this inequality in 1884 as part of his work on the convergence of sums of random variables. Although the bound is more elementary than Chebyshev's better-known inequality, which follows from it, Chebyshev's earlier work on moments heavily influenced Markov. The simplicity and universality of Markov's inequality make it one of the most-used tools in probability theory, despite being, by construction, the loosest possible moment-based bound.

Common Mistake: Markov Requires Non-Negativity

Mistake:

Applying Markov's inequality to a random variable that takes negative values: writing $\mathbb{P}(X \geq 5) \leq \mathbb{E}[X]/5$ when $X$ can be negative.

Correction:

Markov's inequality requires $X \geq 0$ almost surely. For a general $X$, apply Markov to $|X|$ or $(X - c)^2$ for some constant $c$. Chebyshev's inequality is precisely Markov applied to $(X - \mu)^2$.
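
To see the failure concretely (the numbers here are our own illustration): let $X = \pm 10$ with probability $1/2$ each, so $\mathbb{E}[X] = 0$; the naive "bound" claims $\mathbb{P}(X \geq 5) \leq 0$, yet the true probability is $0.5$. Applying Markov to $|X|$ instead is valid, if vacuous here:

```python
# X = -10 or +10 with probability 1/2 each; X is NOT non-negative.
mean_X = 0.5 * (-10) + 0.5 * 10    # E[X] = 0
exact = 0.5                        # P(X >= 5) = P(X = +10)

naive_bound = mean_X / 5           # 0.0 -- false, since 0.5 > 0: Markov does not apply
mean_abs = 0.5 * 10 + 0.5 * 10     # E[|X|] = 10
valid_bound = mean_abs / 5         # P(|X| >= 5) <= 2 -- vacuous, but never wrong

print(f"exact = {exact}, naive = {naive_bound}, via |X| = {valid_bound}")
```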

Tail probability

The probability that a random variable exceeds a given threshold: $\mathbb{P}(X \geq a)$. Tail bounds provide upper bounds on this quantity using partial information about the distribution (moments, MGF).

Related: Concentration

Indicator function

$I_A = 1$ if event $A$ occurs, $0$ otherwise. Satisfies $\mathbb{P}(A) = \mathbb{E}[I_A]$.