Markov's Inequality
Why Tail Bounds?
In many engineering problems we know the mean (or a few moments) of a random variable but not its full distribution. The question is: how much can we say about the probability that the random variable is far from its mean? Probability inequalities give us universal answers --- bounds that hold for every distribution satisfying the moment conditions. Markov's inequality is the simplest and most fundamental of these bounds. Every other inequality in this chapter is, at its core, a consequence of Markov applied to a cleverly chosen non-negative function.
Definition: Tail Probability
Tail Probability
For a random variable $X$ and a threshold $t \in \mathbb{R}$, the upper tail probability is $P(X \ge t)$. More generally, for any event $A$, we write $P(A) = E[\mathbf{1}_A]$, where $\mathbf{1}_A$ is the indicator function of $A$.
Theorem: Markov's Inequality
Let $X$ be a non-negative random variable with finite mean. Then for every $t > 0$,
$$P(X \ge t) \le \frac{E[X]}{t}.$$
The indicator $\mathbf{1}\{x \ge t\}$ is bounded above by the linear function $x/t$ for all $x \ge 0$. Taking expectations of both sides gives the result. The bound uses only the first moment, so it must be loose enough to accommodate the worst-case distribution (a two-point mass).
Indicator bound
For $x \ge 0$ and $t > 0$, the inequality $\mathbf{1}\{x \ge t\} \le x/t$ holds pointwise: if $x \ge t$ then $x/t \ge 1 = \mathbf{1}\{x \ge t\}$, and if $x < t$ then $\mathbf{1}\{x \ge t\} = 0 \le x/t$.
Take expectations
Since $\mathbf{1}\{X \ge t\} \le X/t$ pointwise and expectation is monotone,
$$P(X \ge t) = E[\mathbf{1}\{X \ge t\}] \le E\!\left[\frac{X}{t}\right] = \frac{E[X]}{t}.$$
Tightness
Equality holds for the two-point distribution with $X = t$ with probability $\mu/t$ and $X = 0$ otherwise, where $\mu = E[X] \le t$. This is the worst-case distribution: all the mass contributing to the tail sits exactly at the threshold.
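The tightness claim is easy to sanity-check numerically. A minimal sketch, with illustrative values $\mu = 2$, $t = 5$ (any $0 < \mu \le t$ works):

```python
# Two-point distribution: X = t with probability mu/t, X = 0 otherwise
# (illustrative values mu = 2, t = 5, not prescribed by the text).
mu, t = 2.0, 5.0
p_at_t = mu / t                          # P(X = t)
mean = t * p_at_t + 0.0 * (1 - p_at_t)   # E[X] recovers mu
tail = p_at_t                            # P(X >= t) = P(X = t)
markov_bound = mean / t
assert abs(tail - markov_bound) < 1e-12  # Markov holds with equality
print(tail, markov_bound)                # both 0.4
```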
Example: Markov Bound for the Exponential Distribution
Let $X \sim \mathrm{Exp}(\lambda)$ with $\lambda = 1$, so $E[X] = 1$. Compare the Markov bound with the exact tail probability for $t = 5$.
Markov bound: $P(X \ge 5) \le \dfrac{E[X]}{5} = \dfrac{1}{5} = 0.2$.
Exact tail: $P(X \ge 5) = e^{-5} \approx 0.0067$.
Comparison
The Markov bound overshoots by a factor of about $0.2 / e^{-5} \approx 30$. This looseness is the price we pay for using only the first moment. We will see that Chebyshev (using two moments) and Chernoff (using the MGF) yield progressively tighter bounds.
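These numbers are quick to reproduce; a short check of the example's values:

```python
import math

# Markov bound vs. exact tail for X ~ Exp(1) at t = 5 (values from the example).
t = 5.0
mean = 1.0                 # E[X] = 1/lambda with lambda = 1
markov = mean / t          # Markov bound: 0.2
exact = math.exp(-t)       # exact tail: e^{-5}
print(markov, exact, markov / exact)   # ratio is roughly 30
```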
Generalized Markov
If $\phi$ is a non-negative, non-decreasing function and $\phi(t) > 0$, then
$$P(X \ge t) \le \frac{E[\phi(X)]}{\phi(t)}.$$
Chebyshev uses $\phi(x) = x^2$ (applied to $|X - \mu|$), and Chernoff uses $\phi(x) = e^{\lambda x}$. The art of tail bounding is choosing $\phi$ wisely.
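To make the choice of $\phi$ concrete, here is a sketch comparing three choices for $X \sim \mathrm{Exp}(1)$ at $t = 5$ (the distribution and threshold reuse the example above; the Chernoff parameter $s = 1 - 1/t$ is the standard optimizer for the exponential MGF and is an added detail, not from the text):

```python
import math

t = 5.0
# phi(x) = x: plain Markov, E[X]/t
bound_markov = 1.0 / t
# phi(x) = x^2: second-moment bound, E[X^2]/t^2 with E[X^2] = 2 for Exp(1)
bound_moment2 = 2.0 / t**2
# phi(x) = e^{sx}: Chernoff, using E[e^{sX}] = 1/(1-s) for s < 1
s = 1.0 - 1.0 / t
bound_chernoff = math.exp(-s * t) / (1.0 - s)
exact = math.exp(-t)
# Every choice of phi yields a valid bound; better phi means a smaller bound.
for b in (bound_markov, bound_moment2, bound_chernoff):
    assert exact <= b
print(bound_markov, bound_moment2, bound_chernoff, exact)
```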
Markov Bound Tightness
Compare the Markov bound against the exact tail probability for several distributions. Observe how the bound is always valid but can be very loose.
[Interactive figure: adjustable parameters; panels "Markov bound vs. exact tail" and "Indicator vs. Linear Bound".]
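The same comparison can be run offline with a Monte Carlo sketch (the three distributions and the threshold $t = 3$ are illustrative choices, not prescribed by the demo):

```python
import random

random.seed(0)
n, t = 200_000, 3.0

# Empirical tail P(X >= t) vs. Markov bound E[X]/t for a few
# non-negative distributions (sampled with the standard library).
samplers = {
    "Exp(1)":         lambda: random.expovariate(1.0),
    "Uniform(0,2)":   lambda: random.uniform(0.0, 2.0),
    "LogNormal(0,1)": lambda: random.lognormvariate(0.0, 1.0),
}
for name, draw in samplers.items():
    xs = [draw() for _ in range(n)]
    mean = sum(xs) / n
    tail = sum(x >= t for x in xs) / n
    # The bound always holds, but its looseness varies by distribution.
    print(f"{name:>15}: tail ~ {tail:.4f} <= bound {mean / t:.4f}")
```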
Historical Note: Andrey Markov and the Inequality
Andrey Andreyevich Markov (1856--1922) published this inequality in 1884 as part of his work on the convergence of sums of random variables. Chebyshev, Markov's teacher, had published his better-known inequality earlier, and his work on moments heavily influenced Markov. The simplicity and universality of Markov's inequality make it one of the most-used tools in probability theory --- despite being, by construction, the loosest possible moment-based bound.
Common Mistake: Markov Requires Non-Negativity
Mistake:
Applying Markov's inequality to a random variable that takes negative values: writing $P(X \ge t) \le E[X]/t$ when $X$ can be negative. The pointwise bound $\mathbf{1}\{x \ge t\} \le x/t$ fails for $x < 0$, so the conclusion can be false.
Correction:
Markov's inequality requires $X \ge 0$ almost surely. For a general $X$, apply Markov to $|X|$ or to $(X - c)^2$ for some constant $c$. Chebyshev's inequality is precisely Markov applied to $(X - \mu)^2$.
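As a concrete sketch of that correction, applying Markov to the non-negative variable $(X - \mu)^2$ recovers Chebyshev. Here with $X \sim \mathcal{N}(0, 1)$ (a signed variable Markov alone cannot handle; the choice of distribution and $k = 2$ are illustrative):

```python
import random

random.seed(1)
mu, sigma, k = 0.0, 1.0, 2.0     # standard normal, two-sigma threshold
n = 200_000
xs = [random.gauss(mu, sigma) for _ in range(n)]
# Markov on (X - mu)^2 >= (k*sigma)^2 gives Var(X) / (k*sigma)^2 = 1/k^2.
tail = sum((x - mu) ** 2 >= (k * sigma) ** 2 for x in xs) / n
chebyshev = 1.0 / k**2
print(tail, chebyshev)           # empirical tail well below the 0.25 bound
```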
Tail probability
The probability that a random variable exceeds a given threshold: $P(X \ge t)$. Tail bounds provide upper bounds on this quantity using partial information about the distribution (moments, MGF).
Related: {{Ref:Gloss Concentration}}
Indicator function
$\mathbf{1}_A = 1$ if event $A$ occurs, $0$ otherwise. Satisfies $E[\mathbf{1}_A] = P(A)$.