Common Discrete Distributions

A Toolkit of Named Distributions

A handful of discrete distributions appear so frequently in applications that they have earned their own names. Each captures a different random mechanism: coin flips (Bernoulli/binomial), waiting times (geometric/negative binomial), rare events (Poisson), equal likelihood (discrete uniform), and sampling without replacement (hypergeometric). Knowing these distributions — and recognizing when a problem maps to one of them — saves enormous effort in both analysis and computation.

Definition:

Bernoulli Distribution

$X \sim \text{Bernoulli}(p)$ with $p \in [0, 1]$:

$$P(k) = p^k (1-p)^{1-k}, \quad k \in \{0, 1\}.$$

  • Mean: $\mathbb{E}[X] = p$.
  • Variance: $\text{Var}(X) = p(1-p)$.
  • MGF: $M_X(t) = (1-p) + pe^t$.

The Bernoulli distribution models a single trial with two outcomes. It is the building block for the binomial distribution (sum of i.i.d. Bernoulli trials).

Definition:

Binomial Distribution

$X \sim \text{Binomial}(n, p)$: the number of successes in $n$ independent Bernoulli($p$) trials.

$$P(k) = \binom{n}{k} p^k (1-p)^{n-k}, \quad k = 0, 1, \ldots, n.$$

  • Mean: $\mathbb{E}[X] = np$.
  • Variance: $\text{Var}(X) = np(1-p)$.
  • MGF: $M_X(t) = \left[(1-p) + pe^t\right]^n$.

Theorem: Mean and Variance of the Binomial via Indicators

If $X \sim \text{Binomial}(n, p)$, then $\mathbb{E}[X] = np$ and $\text{Var}(X) = np(1-p)$. To see this, write $X = \sum_{i=1}^{n} I_i$, where $I_i$ is the indicator of success on trial $i$: linearity of expectation gives $\mathbb{E}[X] = np$, and independence of the trials gives $\text{Var}(X) = \sum_{i=1}^{n} p(1-p) = np(1-p)$.
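The closed forms can be checked against a direct computation from the PMF; a minimal sketch in Python, with $n$ and $p$ chosen purely for illustration:

```python
from math import comb

# Illustrative parameters
n, p = 10, 0.3

# Binomial PMF over the full support 0..n
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

# Mean and variance computed directly from the PMF
mean = sum(k * pmf[k] for k in range(n + 1))
var = sum(k**2 * pmf[k] for k in range(n + 1)) - mean**2

# Compare with the closed forms np and np(1-p)
print(mean, n * p)           # both ~ 3.0
print(var, n * p * (1 - p))  # both ~ 2.1
```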

Example: Bit Errors in a Transmitted Block

A transmitter sends a block of $n = 1000$ bits over a binary symmetric channel with bit error probability $p = 0.01$. Let $X$ be the number of errors. Find $\mathbb{E}[X]$ and $\text{Var}(X)$, and estimate $\mathbb{P}(X > 15)$.
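One way to work this example numerically is to evaluate the exact binomial tail and compare it with the Poisson approximation with $\lambda = np$; a sketch using only the Python standard library:

```python
from math import comb, exp, factorial

n, p = 1000, 0.01
lam = n * p  # Poisson parameter for the approximation

mean = n * p           # E[X] = np = 10
var = n * p * (1 - p)  # Var(X) = np(1-p) = 9.9

# Exact binomial tail: P(X > 15) = 1 - P(X <= 15)
exact = 1 - sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(16))

# Poisson approximation with lambda = np = 10
approx = 1 - sum(exp(-lam) * lam**k / factorial(k) for k in range(16))

print(mean, var)      # 10 and 9.9 (up to floating point)
print(exact, approx)  # both on the order of 0.05
```

The two tail probabilities agree to about two decimal places, which is typical for the large-$n$, small-$p$ regime of the law of rare events.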

Binomial PMF Explorer

Explore how the binomial PMF changes shape as $n$ and $p$ vary. For large $n$ and small $p$, the binomial approaches the Poisson. For moderate $p$ and large $n$, it approaches the Gaussian.


Definition:

Geometric Distribution

$X \sim \text{Geometric}(p)$: the number of trials until the first success.

$$P(k) = (1-p)^{k-1} p, \quad k = 1, 2, 3, \ldots$$

  • Mean: $\mathbb{E}[X] = 1/p$.
  • Variance: $\text{Var}(X) = (1-p)/p^2$.
  • MGF: $M_X(t) = \frac{pe^t}{1 - (1-p)e^t}$ for $t < -\ln(1-p)$.

Some authors define the geometric distribution as the number of failures before the first success, giving $P(k) = (1-p)^k p$ for $k = 0, 1, 2, \ldots$ and mean $(1-p)/p$. We follow the convention of Caire's FSP course, counting the trial on which the first success occurs.


Theorem: Memoryless Property of the Geometric Distribution

If $X \sim \text{Geometric}(p)$, then for all $m, n \geq 1$:

$$\mathbb{P}(X > m + n \mid X > m) = \mathbb{P}(X > n).$$

Moreover, the geometric distribution is the only discrete distribution with this memoryless property.

Given that you have already waited $m$ trials without success, the remaining wait has the same distribution as if you were starting fresh. This is the discrete analogue of the memoryless property of the exponential distribution.
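The property follows directly from the geometric tail $\mathbb{P}(X > k) = (1-p)^k$, and can be checked numerically; a minimal sketch (the values of $p$, $m$, $n$ are arbitrary):

```python
p = 0.3      # illustrative success probability
m, n = 5, 3  # arbitrary waiting times

# Geometric tail: P(X > k) = (1-p)^k
def tail(k):
    return (1 - p) ** k

# Conditional tail P(X > m+n | X > m) equals the unconditional P(X > n)
conditional = tail(m + n) / tail(m)
print(conditional, tail(n))  # identical up to floating point
```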


Geometric Memoryless Property

Visualize the memoryless property: the conditional PMF of $X - m$ given $X > m$ is identical to the original PMF, regardless of $m$.


Definition:

Negative Binomial Distribution

$X \sim \text{NegBin}(r, p)$: the number of trials until the $r$-th success.

$$P(k) = \binom{k-1}{r-1} p^r (1-p)^{k-r}, \quad k = r, r+1, r+2, \ldots$$

  • Mean: $\mathbb{E}[X] = r/p$.
  • Variance: $\text{Var}(X) = r(1-p)/p^2$.
  • MGF: $M_X(t) = \left(\frac{pe^t}{1 - (1-p)e^t}\right)^r$ for $t < -\ln(1-p)$.

The negative binomial is a sum of $r$ independent Geometric($p$) random variables. For $r = 1$, it reduces to the geometric distribution.
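The sum-of-geometrics representation is easy to check by simulation; a sketch using only the standard library (the parameters and seed are illustrative):

```python
import random

random.seed(0)
r, p = 3, 0.4  # illustrative parameters

def geometric(p):
    """Draw the number of trials until the first success."""
    k = 1
    while random.random() >= p:
        k += 1
    return k

# One NegBin(r, p) draw = sum of r independent Geometric(p) draws
samples = [sum(geometric(p) for _ in range(r)) for _ in range(100_000)]

sample_mean = sum(samples) / len(samples)
print(sample_mean)  # close to r/p = 7.5
```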

Definition:

Poisson Distribution

$X \sim \text{Poisson}(\lambda)$ with $\lambda > 0$:

$$P(k) = \frac{e^{-\lambda} \lambda^k}{k!}, \quad k = 0, 1, 2, \ldots$$

  • Mean: $\mathbb{E}[X] = \lambda$.
  • Variance: $\text{Var}(X) = \lambda$.
  • MGF: $M_X(t) = \exp\!\left(\lambda(e^t - 1)\right)$.

The Poisson distribution is remarkable in that its mean and variance are equal. This "equi-dispersion" property is a quick diagnostic: if the sample mean and variance of a count data set are roughly equal, the Poisson may be a good model.


Theorem: Poisson Limit Theorem (Law of Rare Events)

If $X_n \sim \text{Binomial}(n, p_n)$ with $np_n \to \lambda$ as $n \to \infty$, then for every fixed $k \geq 0$:

$$\lim_{n \to \infty} \mathbb{P}(X_n = k) = \frac{e^{-\lambda} \lambda^k}{k!}.$$

When the number of trials is large but each trial has a small success probability, the binomial distribution is well approximated by the Poisson. This is the "law of rare events" — the Poisson distribution naturally governs counts of rare occurrences.
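The convergence is fast enough to see directly at a fixed value of $k$; a sketch, with $\lambda$ and $k$ fixed arbitrarily and $p_n = \lambda/n$:

```python
from math import comb, exp, factorial

lam, k = 4.0, 2  # fixed arbitrarily for illustration
poisson = exp(-lam) * lam**k / factorial(k)

errors = []
for n in (10, 100, 1000, 10000):
    p = lam / n  # so that np = lambda exactly
    binom = comb(n, k) * p**k * (1 - p)**(n - k)
    errors.append(abs(binom - poisson))
    print(n, binom)  # approaches the Poisson value as n grows

print(poisson)
```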


Poisson PMF vs $\lambda$

Explore how the Poisson PMF changes as $\lambda$ increases. For large $\lambda$, the distribution becomes approximately Gaussian by the CLT.


Definition:

Discrete Uniform Distribution

$X \sim \text{Uniform}\{a, a+1, \ldots, b\}$ for integers $a \leq b$:

$$P(k) = \frac{1}{b - a + 1}, \quad k = a, a+1, \ldots, b.$$

  • Mean: $\mathbb{E}[X] = (a + b)/2$.
  • Variance: $\text{Var}(X) = \frac{(b - a)(b - a + 2)}{12}$.

The fair die ($a = 1$, $b = 6$) is the prototypical example. The discrete uniform distribution maximizes entropy over its support — we will formalize this in Section 5.5.
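For the fair die, both closed forms can be checked directly against the PMF; a minimal sketch:

```python
a, b = 1, 6  # the fair die
values = range(a, b + 1)
n = b - a + 1

# Mean and variance computed directly from the uniform PMF
mean = sum(values) / n
var = sum(k * k for k in values) / n - mean**2

print(mean)  # 3.5, matching (a+b)/2
print(var)   # 35/12 ~ 2.9167, matching (b-a)(b-a+2)/12
```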

Definition:

Hypergeometric Distribution

Draw $r$ items without replacement from a population of $n$ items, of which $n_1$ are "good." Let $X$ = number of good items drawn.

$$P(k) = \frac{\binom{n_1}{k}\binom{n - n_1}{r - k}}{\binom{n}{r}}, \quad k = \max(0, r - n + n_1), \ldots, \min(r, n_1).$$

  • Mean: $\mathbb{E}[X] = r \cdot n_1/n$.
  • Variance: $\text{Var}(X) = r \cdot \frac{n_1}{n} \cdot \frac{n - n_1}{n} \cdot \frac{n - r}{n - 1}$.

When $n$ is much larger than $r$, sampling without replacement is approximately the same as sampling with replacement, and the hypergeometric approaches the Binomial$(r, n_1/n)$. The correction factor $(n - r)/(n - 1)$ is the finite-population correction.
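The approximation can be seen numerically by comparing the two PMFs when the population dwarfs the sample; a sketch (the population sizes below are illustrative):

```python
from math import comb

n, n1, r = 10_000, 3_000, 10  # population far larger than the draw
p = n1 / n

max_diff = 0.0
for k in range(r + 1):
    hyper = comb(n1, k) * comb(n - n1, r - k) / comb(n, r)
    binom = comb(r, k) * p**k * (1 - p)**(r - k)
    max_diff = max(max_diff, abs(hyper - binom))

print(max_diff)  # small: the two PMFs nearly coincide
```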


Summary of Common Discrete Distributions

| Distribution | PMF $P(k)$ | Mean | Variance |
|---|---|---|---|
| Bernoulli$(p)$ | $p^k(1-p)^{1-k}$, $k \in \{0,1\}$ | $p$ | $p(1-p)$ |
| Binomial$(n,p)$ | $\binom{n}{k}p^k(1-p)^{n-k}$ | $np$ | $np(1-p)$ |
| Geometric$(p)$ | $(1-p)^{k-1}p$, $k \geq 1$ | $1/p$ | $(1-p)/p^2$ |
| NegBin$(r,p)$ | $\binom{k-1}{r-1}p^r(1-p)^{k-r}$ | $r/p$ | $r(1-p)/p^2$ |
| Poisson$(\lambda)$ | $e^{-\lambda}\lambda^k/k!$ | $\lambda$ | $\lambda$ |
| Uniform$\{a,\ldots,b\}$ | $1/(b-a+1)$ | $(a+b)/2$ | $(b-a)(b-a+2)/12$ |
| Hypergeometric | $\binom{n_1}{k}\binom{n-n_1}{r-k}/\binom{n}{r}$ | $rn_1/n$ | See definition |

Why This Matters: Poisson Models for Network Traffic

The Poisson distribution is the workhorse model for packet arrivals in telecommunication networks. If users initiate sessions independently at a low individual rate, the total number of arrivals in a time interval is approximately Poisson (by the law of rare events). This is the starting point for queueing theory — the Poisson arrival process feeds into the M/M/1 queue, the simplest model for a network switch.

⚠️ Engineering Note

When Poisson Fails: Overdispersion in Real Networks

Real network traffic often exhibits overdispersion — the variance exceeds the mean, violating the Poisson assumption. This arises from burstiness and long-range dependence. The negative binomial distribution, which allows $\text{Var}(X) > \mathbb{E}[X]$, is a common alternative. Always check equi-dispersion before blindly applying Poisson models.
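A quick diagnostic along these lines, run on hypothetical per-interval packet counts (the data below are invented for illustration):

```python
from statistics import mean, variance

# Hypothetical bursty packet counts per interval (invented data)
counts = [0, 2, 1, 9, 0, 0, 3, 12, 1, 0, 8, 2, 0, 1, 15, 0, 2, 1, 0, 7]

m, v = mean(counts), variance(counts)
print(m, v)  # here the sample variance far exceeds the sample mean

# Crude threshold: flag data whose variance well exceeds the mean
if v > 1.5 * m:
    print("overdispersed: a negative binomial model may fit better")
```

For a Poisson sample the two statistics would be roughly equal; the gap here is the overdispersion signature.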

Quick Check

A random variable $X$ has PMF $P(k) = e^{-3} \cdot 3^k / k!$ for $k = 0, 1, 2, \ldots$ What are $\mathbb{E}[X]$ and $\text{Var}(X)$?

  • $\mathbb{E}[X] = 3$, $\text{Var}(X) = 9$
  • $\mathbb{E}[X] = 3$, $\text{Var}(X) = 3$
  • $\mathbb{E}[X] = 9$, $\text{Var}(X) = 3$
  • $\mathbb{E}[X] = 3$, $\text{Var}(X) = \sqrt{3}$

Historical Note: Poisson and the Law of Rare Events


Siméon Denis Poisson published his eponymous distribution in 1837 in his work Recherches sur la probabilité des jugements. But the distribution did not become famous until Ladislaus Bortkiewicz's 1898 monograph, which used it to model the number of Prussian cavalry soldiers killed by horse kicks per year — the classic "law of small numbers" example. Bortkiewicz showed that across 14 Prussian army corps over 20 years, the number of deaths per corps per year followed a Poisson distribution with $\lambda \approx 0.7$ remarkably well.

Poisson Distribution

A discrete distribution with parameter $\lambda > 0$, PMF $e^{-\lambda}\lambda^k/k!$, and the property that mean equals variance equals $\lambda$.

Related: Probability Mass Function (PMF)

Moment Generating Function (MGF)

$M_X(t) = \mathbb{E}[e^{tX}]$. When it exists in a neighborhood of $t = 0$, it uniquely determines the distribution and generates moments via $\mathbb{E}[X^k] = M_X^{(k)}(0)$.

Related: Expectation, Variance

Key Takeaway

The Poisson distribution arises as the limit of the binomial when $n$ is large and $p$ is small with $np \to \lambda$. Whenever you count rare events among many independent trials, the Poisson is the natural model. Its defining signature is that mean equals variance.