Chebyshev's Inequality

Beyond the First Moment

Markov's inequality uses only the mean. If we also know the variance, we can say much more. The idea is simple: apply Markov to the non-negative random variable $(X - \mu)^2$, which encodes how far $X$ deviates from its mean. The result is Chebyshev's inequality --- a two-sided tail bound that depends only on the first two moments.

Theorem: Chebyshev's Inequality

Let $X$ be a random variable with mean $\mu = \mathbb{E}[X]$ and finite variance $\text{Var}(X) = \sigma^2$. Then for every $\epsilon > 0$,

$$\mathbb{P}(|X - \mu| \geq \epsilon) \leq \frac{\sigma^2}{\epsilon^2}.$$

Equivalently, setting $\epsilon = k\sigma$ for $k > 0$:

$$\mathbb{P}(|X - \mu| \geq k\sigma) \leq \frac{1}{k^2}.$$

Proof: $|X - \mu| \geq \epsilon$ holds exactly when $(X - \mu)^2 \geq \epsilon^2$, so Markov applied to $(X - \mu)^2$ gives $\mathbb{P}((X - \mu)^2 \geq \epsilon^2) \leq \mathbb{E}[(X - \mu)^2]/\epsilon^2 = \sigma^2/\epsilon^2$.

At most a fraction $1/k^2$ of the probability mass can lie more than $k$ standard deviations from the mean --- regardless of the shape of the distribution. For $k = 2$, at most 25% of the mass lies beyond $\mu \pm 2\sigma$; for $k = 3$, at most 11%. Compare this with the Gaussian, where $\mathbb{P}(|Z| \geq 3) \approx 0.27\%$.
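A quick numerical check makes the gap concrete. The following is a minimal sketch (assuming `scipy` is available) that prints the distribution-free bound $1/k^2$ next to the exact Gaussian tail $\mathbb{P}(|Z| \geq k)$ for a few values of $k$:

```python
# Chebyshev's distribution-free bound 1/k^2 versus the exact
# Gaussian tail P(|Z| >= k) = 2 * (1 - Phi(k)).
from scipy.stats import norm

for k in [1, 2, 3, 4]:
    chebyshev = min(1 / k**2, 1.0)   # the bound is vacuous at k = 1
    gaussian = 2 * norm.sf(k)        # sf = survival function = 1 - CDF
    print(f"k={k}:  Chebyshev <= {chebyshev:.4f}   Gaussian = {gaussian:.6f}")
```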


Example: Chebyshev for the Uniform Distribution

Let $X \sim \text{Uniform}(0, 1)$ with $\mu = 1/2$ and $\sigma^2 = 1/12$. Use Chebyshev to bound $\mathbb{P}(|X - 1/2| \geq 1/4)$ and compare with the exact value.
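One way to work the numbers is the short sketch below; note that at this $\epsilon$ the Chebyshev bound exceeds 1, so it is vacuous, while the exact tail is $1/2$:

```python
# X ~ Uniform(0, 1): mu = 1/2, sigma^2 = 1/12, eps = 1/4.
var, eps = 1 / 12, 1 / 4

chebyshev = var / eps**2          # = 4/3 > 1, so the bound says nothing here
exact = 1 / 4 + 1 / 4             # P(X <= 1/4) + P(X >= 3/4) = 1/2
print(f"Chebyshev bound: {min(chebyshev, 1.0):.4f}  (raw value {chebyshev:.4f})")
print(f"Exact tail:      {exact:.4f}")
```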

Theorem: Cantelli's Inequality (One-Sided Chebyshev)

For a random variable $X$ with mean $\mu$ and variance $\sigma^2$, and any $a > 0$:

$$\mathbb{P}(X - \mu \geq a) \leq \frac{\sigma^2}{\sigma^2 + a^2}.$$

This is always tighter than the bound $\sigma^2/a^2$ obtained by applying the two-sided Chebyshev inequality to a one-sided tail.

Cantelli's inequality is proved by applying Markov to $(X - \mu + t)^2$ for an optimally chosen $t > 0$: for any $t > 0$, $\mathbb{P}(X - \mu \geq a) \leq \mathbb{E}[(X - \mu + t)^2]/(a + t)^2 = (\sigma^2 + t^2)/(a + t)^2$, and minimizing over $t$ gives $t = \sigma^2/a$, which yields the stated bound.
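As a sanity check, the sketch below compares Cantelli's bound, the two-sided Chebyshev bound used one-sidedly, and the exact upper tail for $X \sim \text{Exponential}(1)$, where $\mu = \sigma^2 = 1$ and $\mathbb{P}(X - \mu \geq a) = e^{-(1+a)}$:

```python
# Cantelli vs. plain Chebyshev vs. the exact upper tail of Exponential(1).
import math

sigma2 = 1.0  # Exponential(1): mu = 1, sigma^2 = 1
for a in [0.5, 1.0, 2.0, 3.0]:
    cantelli = sigma2 / (sigma2 + a**2)
    chebyshev = min(sigma2 / a**2, 1.0)   # two-sided bound, applied one-sidedly
    exact = math.exp(-(1.0 + a))          # P(X - mu >= a)
    print(f"a={a}: Cantelli {cantelli:.4f}  Chebyshev {chebyshev:.4f}  exact {exact:.4f}")
```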

Chebyshev Bound vs. Exact Tail

Compare the Chebyshev bound $\sigma^2/\epsilon^2$ with the exact tail probability $\mathbb{P}(|X - \mu| \geq \epsilon)$ for various distributions.

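In place of the interactive figure, here is a static sketch of the same comparison (assuming `scipy` is available); the exact tail comes from each distribution's CDF:

```python
# Chebyshev bound sigma^2/eps^2 versus the exact two-sided tail
# P(|X - mu| >= eps) for a few distributions with known moments.
from scipy import stats

distributions = {
    "Normal(0,1)":    stats.norm(0, 1),
    "Uniform(0,1)":   stats.uniform(0, 1),
    "Exponential(1)": stats.expon(),
}

for name, dist in distributions.items():
    mu, var = dist.mean(), dist.var()
    for eps in [0.5, 1.0, 2.0]:
        bound = min(var / eps**2, 1.0)
        exact = dist.cdf(mu - eps) + dist.sf(mu + eps)  # lower + upper tail
        print(f"{name:15s} eps={eps}: bound {bound:.4f}  exact {exact:.4f}")
```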

Common Mistake: Chebyshev Treats All Distributions as Worst-Case

Mistake:

Using Chebyshev's inequality when the distribution is known to be light-tailed (e.g., Gaussian or bounded) and concluding that the tails are heavy.

Correction:

Chebyshev must accommodate the worst-case distribution with the given mean and variance --- a discrete distribution that places mass $1/(2k^2)$ at each of $\mu \pm k\sigma$ and the remaining $1 - 1/k^2$ at $\mu$, attaining the bound with equality. For light-tailed distributions, the Chernoff bound (Section 10.3) or Hoeffding's inequality (Section 10.4) gives exponentially tighter bounds.
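To preview the gap quantitatively, the sketch below bounds the deviation of the mean of $n$ i.i.d. variables in $[0, 1]$ (worst-case variance $1/4$ per term) using Chebyshev and Hoeffding's $2e^{-2n\epsilon^2}$; the exponential decay wins quickly:

```python
# Chebyshev (~1/n) versus Hoeffding (~e^{-cn}) for the mean of n i.i.d.
# variables bounded in [0, 1]; worst-case variance of each term is 1/4.
import math

eps = 0.1
for n in [10, 100, 1000]:
    chebyshev = min(0.25 / (n * eps**2), 1.0)
    hoeffding = min(2 * math.exp(-2 * n * eps**2), 1.0)
    print(f"n={n}: Chebyshev {chebyshev:.2e}   Hoeffding {hoeffding:.2e}")
```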

Chebyshev and the Weak Law of Large Numbers

Chebyshev's inequality gives the simplest proof of the Weak Law of Large Numbers. For i.i.d. $X_1, \ldots, X_n$ with mean $\mu$ and variance $\sigma^2$, the sample mean $\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i$ has $\text{Var}(\bar{X}_n) = \sigma^2/n$. Chebyshev gives:

$$\mathbb{P}(|\bar{X}_n - \mu| \geq \epsilon) \leq \frac{\sigma^2}{n\epsilon^2} \to 0$$

as $n \to \infty$. This is convergence in probability.
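A short Monte Carlo sketch (using `numpy`) shows the empirical deviation probability for $X_i \sim \text{Uniform}(0, 1)$ sitting below the $\sigma^2/(n\epsilon^2)$ bound and shrinking as $n$ grows:

```python
# Empirical P(|Xbar_n - mu| >= eps) versus Chebyshev's sigma^2/(n eps^2)
# for X_i ~ Uniform(0, 1), where mu = 1/2 and sigma^2 = 1/12.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma2, eps, trials = 0.5, 1 / 12, 0.05, 10_000

for n in [10, 100, 1000]:
    means = rng.uniform(0, 1, size=(trials, n)).mean(axis=1)
    empirical = np.mean(np.abs(means - mu) >= eps)
    bound = min(sigma2 / (n * eps**2), 1.0)
    print(f"n={n}: empirical {empirical:.4f}   Chebyshev bound {bound:.4f}")
```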

Quick Check

If $X$ has mean $\mu$ and standard deviation $\sigma$, what does Chebyshev's inequality say about $\mathbb{P}(|X - \mu| \geq 3\sigma)$?

$\leq 1/3$

$\leq 1/9$

$\leq 1/6$

$\leq 1/27$

Historical Note: Pafnuty Chebyshev and the Method of Moments


Pafnuty Lvovich Chebyshev (1821--1894) proved his inequality around 1867 as part of a broader program to develop the method of moments for studying convergence of distributions. His student Markov later isolated the simpler first-moment version. Chebyshev's approach --- bounding tail probabilities using moment information --- became a cornerstone of probability theory and remains the first tool we reach for when exact distributions are unavailable.

Concentration inequality

A bound on the probability that a random variable deviates from some value (typically its mean). Markov, Chebyshev, Chernoff, and Hoeffding are all concentration inequalities, with increasing tightness.

Related: Tail Probability