Chebyshev's Inequality
Beyond the First Moment
Markov's inequality uses only the mean. If we also know the variance, we can say much more. The idea is simple: apply Markov's inequality to the non-negative random variable $(X - \mu)^2$, which encodes how far $X$ deviates from its mean. The result is Chebyshev's inequality --- a two-sided tail bound that depends only on the first two moments.
Theorem: Chebyshev's Inequality
Let $X$ be a random variable with mean $\mu$ and finite variance $\sigma^2$. Then for every $t > 0$,
$$P(|X - \mu| \ge t) \le \frac{\sigma^2}{t^2}.$$
Equivalently, setting $t = k\sigma$ for $k > 0$:
$$P(|X - \mu| \ge k\sigma) \le \frac{1}{k^2}.$$
At most a fraction $1/k^2$ of the probability mass can lie more than $k$ standard deviations from the mean --- regardless of the shape of the distribution. For $k = 2$, at most 25% of the mass lies beyond $\mu \pm 2\sigma$; for $k = 3$, at most 11%. Compare this with the Gaussian, where $P(|X - \mu| \ge 2\sigma) \approx 4.6\%$.
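To make this gap concrete, here is a short Python sketch (the helper names are ours) tabulating the distribution-free Chebyshev bound $1/k^2$ against the exact two-sided Gaussian tail:

```python
import math

def chebyshev_bound(k: float) -> float:
    """Chebyshev: P(|X - mu| >= k*sigma) <= 1/k^2, for any distribution."""
    return 1.0 / k**2

def gaussian_two_sided_tail(k: float) -> float:
    """Exact P(|Z| >= k) for Z ~ N(0, 1), via the complementary error function."""
    return math.erfc(k / math.sqrt(2))

for k in (2, 3, 4):
    print(f"k={k}: Chebyshev <= {chebyshev_bound(k):.4f}, "
          f"Gaussian = {gaussian_two_sided_tail(k):.6f}")
```

For $k = 2$ the Gaussian tail is roughly 0.0455 against the bound 0.25; the gap widens rapidly with $k$, which is exactly the looseness the distribution-free guarantee pays for.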
Define a non-negative RV
Let $Y = (X - \mu)^2$. Then $Y \ge 0$ and $E[Y] = \mathrm{Var}(X) = \sigma^2$.
Apply Markov
For $t > 0$, applying Markov's inequality (see Markov's Inequality) to $Y$ with threshold $t^2$:
$$P(Y \ge t^2) \le \frac{E[Y]}{t^2} = \frac{\sigma^2}{t^2}.$$
Translate back
The event $\{Y \ge t^2\} = \{(X - \mu)^2 \ge t^2\}$ is the same as $\{|X - \mu| \ge t\}$. Therefore
$$P(|X - \mu| \ge t) = P(Y \ge t^2) \le \frac{\sigma^2}{t^2}.$$
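The chain of inequalities above can be checked exactly on a small discrete distribution --- here a fair six-sided die, chosen purely for illustration:

```python
# Verify Chebyshev's bound exactly for a fair six-sided die.
values = [1, 2, 3, 4, 5, 6]
probs = [1/6] * 6

mu = sum(v * p for v, p in zip(values, probs))             # 3.5
var = sum((v - mu)**2 * p for v, p in zip(values, probs))  # 35/12

def tail(t):
    """Exact P(|X - mu| >= t) for the die."""
    return sum(p for v, p in zip(values, probs) if abs(v - mu) >= t)

for t in (1.5, 2.0, 2.5):
    print(f"t={t}: exact {tail(t):.3f} <= bound {var / t**2:.3f}")
```

At $t = 2$ the exact tail is $1/3$ while the bound is $\frac{35/12}{4} \approx 0.729$: valid, but loose, as the worst-case guarantee must be.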
Example: Chebyshev for the Uniform Distribution
Let $X \sim \mathrm{Unif}(0, 1)$, with $\mu = 1/2$ and $\sigma^2 = 1/12$. Use Chebyshev to bound $P(|X - 1/2| \ge 1/4)$ and compare with the exact value.
Chebyshev bound
$$P\left(\left|X - \tfrac{1}{2}\right| \ge \tfrac{1}{4}\right) \le \frac{\sigma^2}{t^2} = \frac{1/12}{(1/4)^2} = \frac{4}{3}.$$
The bound exceeds 1, so it is useless in this case. Chebyshev only becomes informative when $t > \sigma$ (here $\sigma = 1/\sqrt{12} \approx 0.289 > 1/4$).
Exact value
$$P\left(\left|X - \tfrac{1}{2}\right| \ge \tfrac{1}{4}\right) = P\left(X \le \tfrac{1}{4}\right) + P\left(X \ge \tfrac{3}{4}\right) = \tfrac{1}{4} + \tfrac{1}{4} = \tfrac{1}{2}.$$
Lesson
Chebyshev is useful when the threshold $t$ is large relative to the standard deviation. For $t = 1/2$, we get the bound $\frac{1/12}{(1/2)^2} = \frac{1}{3}$, and the exact answer is 0 (since $|X - 1/2| \le 1/2$ with probability 1, and the endpoints carry zero probability mass).
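A minimal sketch of the example's numbers, assuming $X \sim \mathrm{Unif}(0, 1)$ as above:

```python
# Chebyshev vs. exact tail for X ~ Unif(0, 1): mu = 1/2, sigma^2 = 1/12.
mu, var = 0.5, 1/12

def chebyshev(t):
    """Chebyshev bound on P(|X - 1/2| >= t)."""
    return var / t**2

def exact_tail(t):
    """Exact P(|X - 1/2| >= t): the mass outside [1/2 - t, 1/2 + t]."""
    return max(0.0, 1 - 2*t)

for t in (0.25, 0.5):
    print(f"t={t}: Chebyshev bound {chebyshev(t):.3f}, exact {exact_tail(t):.3f}")
```

At $t = 0.25$ the bound is a vacuous $4/3$ against an exact $1/2$; at $t = 0.5$ it is $1/3$ against an exact $0$.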
Definition: Cantelli's Inequality (One-Sided Chebyshev)
Cantelli's Inequality (One-Sided Chebyshev)
For a random variable $X$ with mean $\mu$ and variance $\sigma^2$, and any $t > 0$:
$$P(X - \mu \ge t) \le \frac{\sigma^2}{\sigma^2 + t^2}.$$
This is tighter than the bound $\sigma^2/t^2$ that the two-sided Chebyshev inequality gives for a one-sided tail (note that simply halving the two-sided bound is not valid without a symmetry assumption).
Cantelli's inequality is proved by applying Markov to $(X - \mu + c)^2$ for an optimally chosen $c > 0$; the minimizing choice turns out to be $c = \sigma^2/t$.
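A quick numerical comparison of Cantelli's one-sided bound against the two-sided Chebyshev bound at the same threshold ($\sigma = 1$ is an arbitrary illustrative choice):

```python
# Cantelli vs. two-sided Chebyshev at the same threshold t, with sigma = 1.
sigma = 1.0

def cantelli(t):
    """One-sided: P(X - mu >= t) <= sigma^2 / (sigma^2 + t^2)."""
    return sigma**2 / (sigma**2 + t**2)

def chebyshev_two_sided(t):
    """Two-sided: P(|X - mu| >= t) <= sigma^2 / t^2."""
    return sigma**2 / t**2

for t in (0.5, 1.0, 2.0):
    print(f"t={t}: Cantelli {cantelli(t):.3f}, "
          f"two-sided Chebyshev {chebyshev_two_sided(t):.3f}")
```

Cantelli is always strictly below the two-sided bound; at $t = \sigma$, for example, it gives $1/2$ where two-sided Chebyshev gives the vacuous $1$.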
Chebyshev Bound vs. Exact Tail
Compare the Chebyshev bound with the exact tail probability for various distributions.
Common Mistake: Chebyshev Treats All Distributions as Worst-Case
Mistake:
Using Chebyshev's inequality when the distribution is known to be light-tailed (e.g., Gaussian or bounded) and concluding that the tails are heavy.
Correction:
Chebyshev must accommodate the worst-case distribution with the given mean and variance --- a discrete distribution placing mass $1/(2k^2)$ at each of $\mu \pm k\sigma$ and the remaining mass at $\mu$, which attains the bound with equality. For light-tailed distributions, the Chernoff bound (Section 10.3) or Hoeffding's inequality (Section 10.4) give exponentially tighter bounds.
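The extremal distribution can be verified directly; this sketch uses our own illustrative choices $\mu = 0$, $\sigma = 1$, $k = 2$:

```python
# The distribution that makes Chebyshev tight at k standard deviations:
# mass 1/(2k^2) at mu - k*sigma and mu + k*sigma, mass 1 - 1/k^2 at mu.
mu, sigma, k = 0.0, 1.0, 2.0

points = [mu - k*sigma, mu, mu + k*sigma]
probs = [1/(2*k**2), 1 - 1/k**2, 1/(2*k**2)]

mean = sum(x * p for x, p in zip(points, probs))
var = sum((x - mean)**2 * p for x, p in zip(points, probs))
tail = sum(p for x, p in zip(points, probs) if abs(x - mean) >= k*sigma)

# The mean is mu, the variance is sigma^2, and the tail equals 1/k^2 exactly.
print(f"mean={mean}, var={var}, tail={tail}, bound={1/k**2}")
```

The construction has exactly the prescribed mean and variance, yet puts the full budget $1/k^2$ of probability at distance $k\sigma$ --- so no bound using only these two moments can do better.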
Chebyshev and the Weak Law of Large Numbers
Chebyshev's inequality gives the simplest proof of the Weak Law of Large Numbers. For i.i.d. $X_1, \dots, X_n$ with mean $\mu$ and variance $\sigma^2$, the sample mean $\bar{X}_n = \frac{1}{n} \sum_{i=1}^n X_i$ has $\mathrm{Var}(\bar{X}_n) = \sigma^2/n$. Chebyshev gives:
$$P(|\bar{X}_n - \mu| \ge \epsilon) \le \frac{\sigma^2}{n\epsilon^2} \to 0$$
as $n \to \infty$, for every $\epsilon > 0$. This is convergence in probability.
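A Monte Carlo sketch of this rate (the sample sizes, trial count, and $\epsilon$ are arbitrary illustrative choices) using $X_i \sim \mathrm{Unif}(0, 1)$:

```python
import random

random.seed(0)

# WLLN rate check for X_i ~ Unif(0, 1): mu = 1/2, sigma^2 = 1/12.
mu, var, eps, trials = 0.5, 1/12, 0.05, 2000

results = {}
for n in (10, 100, 1000):
    # Empirical frequency of the deviation event {|Xbar_n - mu| >= eps}.
    hits = sum(
        abs(sum(random.random() for _ in range(n)) / n - mu) >= eps
        for _ in range(trials)
    )
    bound = min(var / (n * eps**2), 1.0)  # Chebyshev, capped at 1
    results[n] = (hits / trials, bound)

for n, (empirical, bound) in results.items():
    print(f"n={n}: empirical tail {empirical:.3f} <= Chebyshev bound {bound:.3f}")
```

Both the empirical deviation frequency and the bound shrink as $n$ grows, with the bound falling at the promised $1/n$ rate (the empirical tail falls much faster, since the uniform is light-tailed).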
Quick Check
If $X$ has mean $\mu$ and standard deviation $\sigma$, what does Chebyshev's inequality say about $P(|X - \mu| \ge 2\sigma)$?
Chebyshev gives $P(|X - \mu| \ge 2\sigma) \le 1/k^2 = 1/4$ for $k = 2$.
Historical Note: Pafnuty Chebyshev and the Method of Moments
Pafnuty Lvovich Chebyshev (1821--1894) proved his inequality around 1867 as part of a broader program to develop the method of moments for studying convergence of distributions. His student Markov later isolated the simpler first-moment version. Chebyshev's approach --- bounding tail probabilities using moment information --- became a cornerstone of probability theory and remains the first tool we reach for when exact distributions are unavailable.
Concentration inequality
A bound on the probability that a random variable deviates from some value (typically its mean). Markov, Chebyshev, Chernoff, and Hoeffding are all concentration inequalities, with increasing tightness.
Related: {{Ref:Gloss Tail Probability}}