The Poisson Approximation

Why the Poisson Distribution?

The binomial distribution $\text{Bin}(n, p)$ is the natural model for counting successes in $n$ independent trials. But when $n$ is large and $p$ is small, so that the expected count $\lambda = np$ is moderate, the binomial PMF is numerically unwieldy (large factorials). The Poisson distribution provides an elegant and accurate approximation that depends on the single parameter $\lambda$. More than a mere computational convenience, the Poisson distribution is the natural model for rare events: the number of packet arrivals in a time slot, the number of bit errors in a long codeword, the number of active users in a massive access system.

Definition: Poisson Distribution

A discrete random variable $X$ has the Poisson distribution with parameter $\lambda > 0$, written $X \sim \text{Poisson}(\lambda)$, if
$$P(k) = \mathbb{P}(X = k) = e^{-\lambda} \frac{\lambda^k}{k!}, \qquad k = 0, 1, 2, \ldots$$

The PMF sums to 1 because $\sum_{k=0}^{\infty} \lambda^k / k! = e^{\lambda}$. The mean and variance are both equal to $\lambda$: $\mathbb{E}[X] = \text{Var}(X) = \lambda$.
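
A quick numerical sanity check of these facts (a minimal sketch using SciPy; the value $\lambda = 2.5$ is an arbitrary choice):

```python
# Sanity check of the definition (a sketch): the PMF sums to 1 and the
# mean and variance both come out to lambda.
from scipy.stats import poisson

lam = 2.5
dist = poisson(lam)
print(sum(dist.pmf(k) for k in range(200)))  # ~ 1.0 (truncated sum)
print(dist.mean(), dist.var())               # both equal lambda = 2.5
```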


Theorem: Mean and Variance of the Poisson Distribution

If $X \sim \text{Poisson}(\lambda)$, then $\mathbb{E}[X] = \lambda$ and $\text{Var}(X) = \lambda$.

The equality of mean and variance is a distinctive fingerprint of the Poisson distribution. In data analysis, if the sample variance of count data is close to the sample mean, a Poisson model is a natural first choice.

Theorem: Poisson Limit Theorem

Let $X_n \sim \text{Bin}(n, p_n)$ where $n p_n \to \lambda > 0$ as $n \to \infty$. Then for every fixed $k \geq 0$:
$$\lim_{n \to \infty} \binom{n}{k} p_n^k (1 - p_n)^{n-k} = e^{-\lambda} \frac{\lambda^k}{k!}$$
That is, $X_n$ converges in distribution to $\text{Poisson}(\lambda)$.

When $n$ is large and $p$ is small, each trial is an almost-never event but there are many of them. The total count depends only on the product $\lambda = np$. The Poisson distribution captures this regime: it is the distribution of the total count of many independent rare events.
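
A sketch of the standard argument (taking $p_n = \lambda/n$ for concreteness) shows where each piece of the limit comes from: the falling-factorial ratio tends to 1, and $(1 - \lambda/n)^n \to e^{-\lambda}$:
$$\binom{n}{k}\left(\frac{\lambda}{n}\right)^{k}\left(1-\frac{\lambda}{n}\right)^{n-k}
= \frac{\lambda^{k}}{k!}\cdot\frac{n(n-1)\cdots(n-k+1)}{n^{k}}\cdot\left(1-\frac{\lambda}{n}\right)^{n}\cdot\left(1-\frac{\lambda}{n}\right)^{-k}
\longrightarrow \frac{\lambda^{k}}{k!}\cdot 1\cdot e^{-\lambda}\cdot 1.$$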


Example: Typographical Errors

A 500-page book has 200 typographical errors scattered uniformly at random. What is the probability that a given page has no errors? Exactly 1 error?
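
Each error lands on a given page with probability $1/500$, so the page's error count is $\text{Bin}(200, 1/500)$, well approximated by $\text{Poisson}(0.4)$. A worked check (a minimal sketch using SciPy):

```python
# Worked check (a sketch): errors hit a given page independently with
# probability p = 1/500, so the page count is Bin(200, 1/500) ~ Poisson(0.4).
from scipy.stats import binom, poisson

n, p = 200, 1 / 500
lam = n * p  # 0.4

for k in (0, 1):
    exact = binom.pmf(k, n, p)
    approx = poisson.pmf(k, lam)
    print(f"P(X = {k}): exact binomial = {exact:.4f}, Poisson approx = {approx:.4f}")
# Poisson gives P(0) = e^(-0.4) ~ 0.670 and P(1) = 0.4 e^(-0.4) ~ 0.268.
```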

Theorem: Le Cam's Inequality

Let $X_1, \ldots, X_n$ be independent Bernoulli random variables with $\mathbb{P}(X_i = 1) = p_i$. Let $W = \sum_{i=1}^{n} X_i$ and $\lambda = \sum_{i=1}^{n} p_i$. Then:
$$d_{\mathrm{TV}}(\mathcal{L}(W), \text{Poisson}(\lambda)) \leq \sum_{i=1}^{n} p_i^2$$
where $d_{\mathrm{TV}}$ denotes total variation distance: $d_{\mathrm{TV}}(P, Q) = \frac{1}{2}\sum_{k=0}^{\infty} |P(k) - Q(k)|$.

The bound says the Poisson approximation is accurate when each individual $p_i$ is small, even if the $p_i$ are not all equal. The total variation distance is at most $\sum p_i^2$, which is small when each $p_i \ll 1$. For the homogeneous case $p_i = p$, the bound becomes $np^2 = \lambda p$, which vanishes as $p \to 0$ with $\lambda$ fixed.
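
The bound can also be checked numerically for unequal $p_i$. The sketch below (the specific $p_i$ values are arbitrary illustrative choices) builds the exact PMF of $W$ by convolving Bernoulli PMFs and compares it to $\text{Poisson}(\lambda)$:

```python
# Numerical check of Le Cam's bound (a sketch, not a library routine).
import numpy as np
from scipy.stats import poisson

p = np.array([0.01, 0.02, 0.005, 0.03, 0.015] * 40)  # 200 small, unequal p_i (illustrative)
lam = p.sum()

# Exact PMF of W = sum of independent Bernoulli(p_i) via repeated convolution.
pmf = np.array([1.0])
for pi in p:
    pmf = np.convolve(pmf, [1 - pi, pi])

k = np.arange(len(pmf))
# Poisson mass beyond len(p) is negligible here, so truncating the sum is harmless.
tv = 0.5 * np.sum(np.abs(pmf - poisson.pmf(k, lam)))

print(f"lambda       = {lam:.3f}")
print(f"TV distance  = {tv:.5f}")
print(f"Le Cam bound = {np.sum(p**2):.5f}")
```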


Le Cam for the Homogeneous Case

When $p_i = p$ for all $i$, Le Cam's inequality gives:
$$d_{\mathrm{TV}}(\text{Bin}(n, p), \text{Poisson}(np)) \leq np^2 = \lambda p$$
So if $\lambda = np$ is moderate and $p$ is small (equivalently, $n$ is large), the Poisson approximation is accurate. For instance, with $n = 1000$ and $p = 0.005$ ($\lambda = 5$), the bound is $5 \times 0.005 = 0.025$: the total variation distance is at most 2.5%.
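
A minimal check of the quoted numbers (a sketch assuming SciPy is available):

```python
# Homogeneous case n = 1000, p = 0.005: compare the actual TV distance
# to the Le Cam bound np^2 = 0.025.
import numpy as np
from scipy.stats import binom, poisson

n, p = 1000, 0.005
lam = n * p
k = np.arange(0, n + 1)  # Poisson mass beyond n is negligible here
tv = 0.5 * np.sum(np.abs(binom.pmf(k, n, p) - poisson.pmf(k, lam)))
print(f"TV distance = {tv:.5f}  (Le Cam bound: {n * p**2:.5f})")
```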

Poisson Approximation to the Binomial

Compare the binomial PMF $\text{Bin}(n, \lambda/n)$ to the Poisson PMF $\text{Poisson}(\lambda)$ and observe convergence as $n$ grows. The total variation distance is displayed.

(Interactive demo; default parameters $\lambda = 5$, $n = 20$.)
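
For readers without the interactive demo, a non-interactive stand-in (a sketch assuming SciPy) prints the total variation distance for a few values of $n$:

```python
# Fix lambda = 5 and watch the TV distance to Poisson(5) shrink as n grows.
import numpy as np
from scipy.stats import binom, poisson

lam = 5.0
for n in (10, 20, 50, 100, 500):
    k = np.arange(0, n + 40)  # include the Poisson tail; binom.pmf is 0 for k > n
    tv = 0.5 * np.sum(np.abs(binom.pmf(k, n, lam / n) - poisson.pmf(k, lam)))
    print(f"n = {n:4d}: d_TV(Bin(n, 5/n), Poisson(5)) = {tv:.4f}")
```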

Birthday Problem: Collision Probability

The birthday problem asks: in a group of $k$ people, what is the probability that at least two share a birthday? Each of the $\binom{k}{2}$ pairs collides with small probability $1/365$, so the number of colliding pairs is approximately Poisson with mean $\binom{k}{2}/365$. Compare the exact formula with the Poisson approximation in the sketch below.

(Interactive demo; parameters: 365 days, group size $k = 60$.)
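
A small comparison of the exact formula with the Poisson pair-counting approximation (a sketch; 365 equally likely birthdays assumed):

```python
# Exact birthday-collision probability vs. the Poisson approximation.
import numpy as np

days = 365
for k in (10, 23, 40, 60):
    # Exact: P(all k birthdays distinct) = prod_{i=0}^{k-1} (1 - i/365).
    p_exact = 1.0 - np.prod(1.0 - np.arange(k) / days)
    # Poisson approx: colliding pairs ~ Poisson(C(k,2)/365), so
    # P(at least one collision) ~ 1 - exp(-k(k-1)/(2*365)).
    p_pois = 1.0 - np.exp(-k * (k - 1) / (2 * days))
    print(f"k = {k:2d}: exact = {p_exact:.4f}, Poisson approx = {p_pois:.4f}")
```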

Example: Prussian Horse Kicks (the Classic Example)

Ladislaus Bortkiewicz (1898) collected data on deaths from horse kicks in the Prussian army: 14 corps observed over 20 years (280 corps-years), with a total of 196 deaths. Test whether the Poisson model fits.
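
A goodness-of-fit sketch, using the counts commonly reported for this dataset (144, 91, 32, 11, and 2 corps-years with 0, 1, 2, 3, and 4 deaths, respectively):

```python
# Chi-squared goodness-of-fit check of the Poisson model (a sketch).
import numpy as np
from scipy.stats import poisson, chi2

observed = np.array([144, 91, 32, 11, 2])   # corps-years with 0..4 deaths
deaths = np.arange(5)
n_obs = observed.sum()                       # 280 corps-years
lam = (observed * deaths).sum() / n_obs      # 196 / 280 = 0.7 deaths per corps-year

# Expected counts under Poisson(0.7); last cell collects the tail k >= 4.
expected = n_obs * poisson.pmf(deaths, lam)
expected[-1] = n_obs * (1 - poisson.cdf(3, lam))

chi2_stat = np.sum((observed - expected) ** 2 / expected)
dof = len(observed) - 1 - 1                  # cells - 1 - one estimated parameter
p_value = chi2.sf(chi2_stat, dof)
# Note: the last cell has a small expected count; merging it with k = 3 is a common refinement.
print(f"lambda-hat = {lam:.2f}, chi2 = {chi2_stat:.2f}, dof = {dof}, p = {p_value:.3f}")
```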

Example: Packet Arrivals in a Time Slot

A router receives packets from $n = 10{,}000$ independent sources, each transmitting in a given time slot with probability $p = 5 \times 10^{-4}$. What is the probability that at most 3 packets arrive?
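
A worked check (a sketch; the exact binomial answer is included for comparison):

```python
# With lambda = n*p = 5, P(at most 3 packets) is the Poisson CDF at 3.
from scipy.stats import binom, poisson

n, p = 10_000, 5e-4
lam = n * p  # 5.0
print(f"Poisson approx : {poisson.cdf(3, lam):.4f}")  # about 0.265
print(f"Exact binomial : {binom.cdf(3, n, p):.4f}")
# Le Cam bound on the total variation error: n*p^2 = 0.0025.
```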

Why This Matters: Poisson Model for Massive Access

In massive machine-type communication (mMTC), a large number $N$ of devices (say $N = 10^6$) are registered, but in any given time slot each device transmits with a small probability $p$ (say $p = 10^{-4}$). The number of active devices $K$ is then well-modeled by $\text{Poisson}(\lambda)$ with $\lambda = Np = 100$. This Poisson model is the foundation of the unsourced random access framework developed by Polyanskiy (2017), which is a major research direction in the CommIT group.

🔧 Engineering Note

When to Use Poisson vs. Exact Binomial

In modern computing, evaluating the binomial PMF is trivial for moderate $n$ (say $n \leq 10^6$). The Poisson approximation is valuable when: (1) $n$ is unknown or variable but $\lambda$ can be estimated; (2) the model is inherently Poisson (e.g., arrivals from a Poisson process); (3) you need closed-form expressions for further analysis (e.g., deriving capacity formulas that involve sums over Poisson-distributed counts).

Historical Note: Siméon Denis Poisson and the Law of Small Numbers

19th–20th century

Siméon Denis Poisson published his treatise Recherches sur la probabilité des jugements in 1837, where the distribution that bears his name appears as a limit of the binomial. The catchy name "law of small numbers" (Gesetz der kleinen Zahlen) was coined by Ladislaus Bortkiewicz in 1898, who demonstrated the empirical fit on data ranging from Prussian cavalry deaths to children's suicides. The Poisson distribution later became the cornerstone of queueing theory through the work of A. K. Erlang on telephone traffic (1909).

Historical Note: Lucien Le Cam and Approximation Theory

20th century

Lucien Le Cam proved his celebrated inequality in 1960 while working at UC Berkeley. The result was a byproduct of his broader program on the approximation of statistical experiments. Le Cam's inequality remains the standard tool for bounding the error of the Poisson approximation and has been refined by Stein, Chen, and others into the powerful "Stein-Chen method" for Poisson approximation of dependent rare events.

Poisson Limit: $\text{Bin}(n, \lambda/n) \to \text{Poisson}(\lambda)$

Watch the binomial PMF bars morph into the Poisson PMF as $n$ increases from 5 to 500 with $\lambda = 5$ fixed.
As $n \to \infty$ with $np = \lambda$ fixed, the binomial bars converge to the Poisson dots.

Theorem: Sum of Independent Poisson Random Variables

If $X \sim \text{Poisson}(\lambda_1)$ and $Y \sim \text{Poisson}(\lambda_2)$ are independent, then $X + Y \sim \text{Poisson}(\lambda_1 + \lambda_2)$.

Poisson counts are closed under addition of independent components. If packets arrive from two independent sources at rates $\lambda_1$ and $\lambda_2$, the total arrival count is Poisson with rate $\lambda_1 + \lambda_2$. This superposition property is fundamental to queueing theory.
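
For completeness, the standard convolution argument is one application of the binomial theorem:
$$\mathbb{P}(X + Y = k) = \sum_{j=0}^{k} e^{-\lambda_1}\frac{\lambda_1^{j}}{j!}\; e^{-\lambda_2}\frac{\lambda_2^{k-j}}{(k-j)!} = \frac{e^{-(\lambda_1+\lambda_2)}}{k!}\sum_{j=0}^{k}\binom{k}{j}\lambda_1^{j}\lambda_2^{k-j} = e^{-(\lambda_1+\lambda_2)}\frac{(\lambda_1+\lambda_2)^{k}}{k!}.$$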

Common Mistake: Mean Equals Variance Does Not Imply Poisson

Mistake:

Concluding that data is Poisson just because the sample mean and sample variance are approximately equal.

Correction:

The condition $\mathbb{E}[X] = \text{Var}(X)$ is necessary but not sufficient for a Poisson distribution. Other count distributions can match it too: for example, the two-point distribution with $\mathbb{P}(X = 0) = \mathbb{P}(X = 2) = 1/2$ has mean 1 and variance 1 but is certainly not Poisson. A proper goodness-of-fit test (e.g., chi-squared) is needed.

Common Mistake: Applying Poisson Approximation When pp Is Not Small

Mistake:

Using the Poisson approximation $\text{Bin}(n, p) \approx \text{Poisson}(np)$ when $p$ is, say, 0.3 and $n = 50$.

Correction:

Le Cam's bound gives $d_{\mathrm{TV}} \leq np^2 = 50 \times 0.09 = 4.5$, which is useless (total variation is at most 1). The Poisson approximation requires $p \ll 1$. For moderate $p$ and large $n$, use the normal approximation (CLT) instead.
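
A quick numerical illustration of how poor the fit is in this regime (a sketch assuming SciPy); the variance mismatch is the tell-tale sign:

```python
# Compare Bin(50, 0.3) with Poisson(15): the TV distance is large and the
# variances (10.5 vs 15) disagree badly.
import numpy as np
from scipy.stats import binom, poisson

n, p = 50, 0.3
lam = n * p
k = np.arange(0, n + 40)  # include the Poisson tail beyond n
tv = 0.5 * np.sum(np.abs(binom.pmf(k, n, p) - poisson.pmf(k, lam)))
print(f"d_TV(Bin(50, 0.3), Poisson(15)) = {tv:.3f}")
print(f"variances: binomial = {n * p * (1 - p):.1f}, Poisson = {lam:.1f}")
```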

Quick Check

If $X \sim \text{Bin}(1000, 0.003)$, which distribution best approximates $X$?

$\text{Poisson}(3)$

$\text{Poisson}(0.003)$

$\mathcal{N}(3, 3)$

$\text{Poisson}(1000)$

Quick Check

Le Cam's inequality bounds the total variation distance between the distribution of a sum of independent Bernoullis and a Poisson by:

$\sum_i p_i$

$\max_i p_i$

$\sum_i p_i^2$

$(\sum_i p_i)^2$

🎓 CommIT Contribution (2017)

Unsourced Random Access and the Poisson User Model

Y. Polyanskiy, G. Caire. IEEE International Symposium on Information Theory (ISIT).

The unsourced random access paradigm introduced by Polyanskiy (2017) models the number of active users in a massive IoT system as a Poisson random variable. This elegant abstraction removes the need to identify individual users and focuses on the fundamental limits of communicating a list of messages. The CommIT group at TU Berlin has been a leading contributor to practical coding schemes for this setting, including slotted transmission and tree-based decoding algorithms.

Tags: massive-access, poisson-model, random-access

Poisson distribution

A discrete distribution on $\{0, 1, 2, \ldots\}$ with parameter $\lambda > 0$ and PMF $\mathbb{P}(X = k) = e^{-\lambda}\lambda^k/k!$. Both the mean and variance equal $\lambda$.

Related: binomial distribution

total variation distance

For discrete distributions $P$ and $Q$ on the same space: $d_{\mathrm{TV}}(P, Q) = \frac{1}{2}\sum_x |P(x) - Q(x)|$. It equals the maximum difference $\max_A |P(A) - Q(A)|$ over all events $A$.

Related: Poisson distribution

Key Takeaway

The Poisson limit theorem shows that $\text{Bin}(n, \lambda/n) \to \text{Poisson}(\lambda)$ as $n \to \infty$: the Poisson distribution is the natural model for the count of many independent rare events. Le Cam's inequality $d_{\mathrm{TV}} \leq \sum p_i^2$ quantifies the approximation error. The Poisson distribution's additivity and single-parameter simplicity make it indispensable for modeling packet arrivals, interference events, and user activity in telecommunications.