Binomial, Poisson, and Exponential Distributions
Success counts, event counts, waiting times — three distributions that look unrelated until you slide n and p and watch them merge into one road.
Discrete & Exponential — Counting Probability Models
- Step 1: n=20, p=0.5 → symmetric bell. Change p to 0.1 → skews right.
- Step 2: Set n=50, p=0.06 → np = 3. Compare with Poisson(λ=3) below: the two are nearly identical.
- Step 3: Keep n large, p small → watch binomial converge toward Poisson.
// Formula used here — Binomial
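The worked example that follows is consistent with the standard binomial probability mass function. Written with the same symbols n, p, and k used on this page, it reads:

```latex
P(X = k) = \binom{n}{k}\, p^{k}\, (1-p)^{\,n-k}, \qquad k = 0, 1, \dots, n
```

Here \binom{n}{k} is the same quantity the example writes as ₁₀C₃ when n = 10 and k = 3.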
Worked example (flip a coin 10 times, probability of exactly 3 heads)
• p=0.5, n=10, k=3
• p³ = 0.5³: probability of "3 heads in a row"
• (1−p)⁷ = 0.5⁷: probability of "remaining 7 are all tails"
• Multiply: probability of one specific sequence like HHHTTTTTTT
• But there are other arrangements with exactly 3 heads (HTHTTT… etc.). Total: ₁₀C₃ = 120 ways
• So P(X=3) = 120 × 0.5³ × 0.5⁷ ≈ 0.117
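The coin-flip arithmetic above can be reproduced in a few lines. This is a minimal sketch using only the Python standard library; the function name `binomial_pmf` is an illustrative choice, not something defined by the simulator.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

ways = comb(10, 3)              # arrangements with exactly 3 heads in 10 flips
one_sequence = 0.5**3 * 0.5**7  # probability of one specific sequence
p_three_heads = binomial_pmf(3, 10, 0.5)

print(ways, round(p_three_heads, 4))  # 120 0.1172
```

Multiplying `ways` by `one_sequence` gives the same number as `binomial_pmf(3, 10, 0.5)`, which is the "count the arrangements, then multiply" step in the bullets.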
Key properties
• Mean = np (10 × 0.5 = 5 — matches intuition)
• Variance = np(1−p). Variance is largest when p = 0.5 — outcomes are most uncertain when it's a coin toss
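Both properties can be checked against the definition rather than taken on faith. The sketch below sums over the PMF directly and then compares np(1−p) across a few values of p; the helper names are hypothetical.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def mean_and_variance(n: int, p: float) -> tuple[float, float]:
    """Mean and variance computed from the PMF, without the shortcut formulas."""
    mean = sum(k * binomial_pmf(k, n, p) for k in range(n + 1))
    var = sum((k - mean) ** 2 * binomial_pmf(k, n, p) for k in range(n + 1))
    return mean, var

mean, var = mean_and_variance(10, 0.5)  # should agree with np and np(1-p)

# Variance n*p*(1-p) for several p values, to see where it peaks
variances = {p: 10 * p * (1 - p) for p in (0.1, 0.3, 0.5, 0.7, 0.9)}
widest_p = max(variances, key=variances.get)
print(mean, var, widest_p)
```

For n = 10 and p = 0.5 the sums come out at 5 and 2.5 up to floating-point error, and among the p values tried the variance is largest at p = 0.5.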
- Step 1: λ=1 → concentrated near 0. λ=5 → approaches a bell shape.
- Step 2: Raise λ to 20 → looks just like a normal bell (Central Limit Theorem preview).
- Step 3: λ is both the mean and variance. Set λ=3 and check "E[X] = Var[X] = λ = 3.00" at the top-left of the graph.
// Formula used here — Poisson
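The worked example that follows is consistent with the standard Poisson probability mass function, written here with the page's symbols λ and k:

```latex
P(X = k) = \frac{\lambda^{k}\, e^{-\lambda}}{k!}, \qquad k = 0, 1, 2, \dots
```

Each bullet in the example corresponds to one factor: λ^k, e^{−λ}, and k!.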
Worked example (average 3 emails per hour — probability of exactly 5?)
• λ=3, k=5
• λ⁵ = 3⁵ = 243: one factor of λ per event (a larger λ makes high counts more likely)
• e⁻³ ≈ 0.050: the probability of seeing zero events — baseline
• 5! = 120: we don't care about the order of the 5 arrivals
• P(X=5) = 243 × 0.050 / 120 ≈ 0.101
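The email example can be checked numerically. This is a small sketch with the standard library only; `poisson_pmf` is an illustrative name.

```python
from math import exp, factorial

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam)."""
    return lam**k * exp(-lam) / factorial(k)

p_zero_emails = poisson_pmf(0, 3.0)  # equals e^-3, the "baseline" factor
p_five_emails = poisson_pmf(5, 3.0)
print(round(p_zero_emails, 3), round(p_five_emails, 3))
```

Using the unrounded e⁻³ gives a value of about 0.1008 for five emails, which agrees with the hand calculation to the precision shown there.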
Connection to binomial — worth remembering
• Make n very large and p very small while keeping np = λ → binomial becomes Poisson
• Try n=50, p=0.06 in the simulator above — it matches Poisson(λ=3) almost perfectly
• The defining property: mean = variance = λ. If the sample variance of your data is far from its sample mean, a plain Poisson model is probably a poor fit
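The convergence can also be seen outside the simulator. The sketch below holds np = 3 fixed and measures the largest pointwise gap between the two PMFs as n grows. The helper `max_pmf_gap` and the cutoff of 30 on k are assumptions made for illustration (counts beyond that have negligible probability when λ = 3).

```python
from math import comb, exp, factorial

def binomial_pmf(k: int, n: int, p: float) -> float:
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k: int, lam: float) -> float:
    return lam**k * exp(-lam) / factorial(k)

def max_pmf_gap(n: int, lam: float, k_max: int = 30) -> float:
    """Largest |Binomial(n, lam/n) - Poisson(lam)| over k = 0..min(n, k_max)."""
    p = lam / n
    return max(
        abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam))
        for k in range(min(n, k_max) + 1)
    )

gaps = {n: max_pmf_gap(n, 3.0) for n in (10, 50, 500)}
for n, gap in gaps.items():
    print(n, round(gap, 4))
```

The gap shrinks as n increases with np held at 3, which is the numerical counterpart of sliding n up and p down in the simulator.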
- Step 1: λ=1 → mean wait = 1. λ=2 → mean wait = 0.5. Larger λ = "happens sooner."
- Step 2: Set λ to 0.1 → long tail. Rare events have long waits.
- Step 3: Memoryless: having waited 10 min doesn't change how much longer you'll wait.
// Formula used here — Exponential
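The worked example that follows is consistent with the standard exponential density and its survival function, written here with rate λ and time t:

```latex
f(t) = \lambda e^{-\lambda t}, \qquad P(X > t) = e^{-\lambda t}, \qquad t \ge 0
```

The mean of this distribution is 1/λ, which is why a mean gap of 5 minutes corresponds to λ = 1/5 per minute.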
Worked example (buses arrive every 5 min on average. Probability of waiting more than 10 min?)
• λ = 1/5 (0.2 per minute), t = 10
• P(X > 10) = e^(−0.2 × 10) = e⁻² ≈ 0.135 → about 13.5%
• The survival formula P(X > t) = e^(−λt) gives the probability of waiting longer than t. It decays exponentially toward zero as t grows
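The bus example takes one line to verify. This sketch assumes the rate is expressed per minute, as in the bullets; `exponential_survival` is an illustrative name.

```python
from math import exp

def exponential_survival(t: float, lam: float) -> float:
    """P(X > t) for X ~ Exponential(rate=lam)."""
    return exp(-lam * t)

lam = 1 / 5  # one bus every 5 minutes on average, so 0.2 per minute
p_wait_over_10 = exponential_survival(10, lam)
print(round(p_wait_over_10, 3))  # 0.135
```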
Mirror relationship with Poisson
• Poisson counts "how many events in a fixed time" (count distribution)
• Exponential measures "how long until the next event" (time distribution)
• Same λ, two perspectives — one for counts, one for waiting times
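The two perspectives can be linked by simulation: draw exponential gaps between events, then count how many events land in each unit of time. The sketch below is one way to do that with the standard library. The seed, the rate of 3, and the 10,000 intervals are arbitrary choices for illustration.

```python
import random

random.seed(0)        # fixed seed so the run is repeatable
lam = 3.0             # events per unit time
n_intervals = 10_000  # number of unit-length time windows to observe

# Walk along the time axis in exponential steps and tally events per window
counts = [0] * n_intervals
t = random.expovariate(lam)
while t < n_intervals:
    counts[int(t)] += 1
    t += random.expovariate(lam)

mean_count = sum(counts) / n_intervals
var_count = sum((c - mean_count) ** 2 for c in counts) / n_intervals
print(round(mean_count, 2), round(var_count, 2))
```

If the gaps are exponential with rate λ, the per-window counts should have mean and variance both close to λ, which is the Poisson signature described earlier.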
Memorylessness — the defining feature of the exponential
• Even after waiting 10 minutes, the probability of "arriving within the next 5 min" is the same as for someone who just showed up
• "I've waited a long time so it must come soon" doesn't apply
• This only holds when the event rate is constant over time (e.g., radioactive decay)
• Real buses have schedules, so real waiting is not memoryless
• The opposite, in fact: with a schedule, the longer you've waited the closer the next bus is — so arrival probability rises with waiting time. That's what being non-memoryless actually looks like
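Memorylessness is a statement about conditional probability, and it can be checked directly from the survival function. The sketch below uses λ = 0.2 per minute from the bus example; `prob_within_next` is a hypothetical helper.

```python
from math import exp

def survival(t: float, lam: float) -> float:
    """P(X > t) for X ~ Exponential(rate=lam)."""
    return exp(-lam * t)

def prob_within_next(window: float, already_waited: float, lam: float) -> float:
    """P(X <= already_waited + window | X > already_waited)."""
    still_waiting = survival(already_waited, lam)
    arrives_in_window = still_waiting - survival(already_waited + window, lam)
    return arrives_in_window / still_waiting

lam = 0.2
fresh = prob_within_next(5, 0, lam)       # someone who just showed up
after_ten = prob_within_next(5, 10, lam)  # someone who has waited 10 minutes
print(round(fresh, 3), round(after_ten, 3))
```

Both values equal 1 − e⁻¹ (about 0.632), because the factor e^(−λ × 10) cancels in the ratio. A scheduled bus would not have this cancellation.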
// How the three distributions connect — learning them separately is a waste
Textbooks place these on different pages, making them look like three unrelated distributions. In reality they're deeply connected. Once the links become visible, the whole picture snaps into focus.
- Binomial → Poisson: defect rate 0.5%, inspect 200 items (n=200, p=0.005) → approximate with Poisson λ=1
- Poisson ↔ Exponential: "λ events per hour" is Poisson; "time until next event" is exponential. Same phenomenon, two angles
- Binomial → Normal: when np ≥ 5 and n(1−p) ≥ 5, approximate with N(np, np(1−p)) — a consequence of the Central Limit Theorem
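The binomial-to-normal link can be checked on a small case. The sketch below compares an exact binomial tail with N(np, np(1−p)) for n = 20, p = 0.5, where np = n(1−p) = 10. The +0.5 continuity correction is an addition not mentioned above, and the function names are illustrative.

```python
from math import comb, erf, sqrt

def binomial_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def normal_cdf(x: float, mean: float, var: float) -> float:
    """P(Y <= x) for Y ~ N(mean, var), via the error function."""
    return 0.5 * (1 + erf((x - mean) / sqrt(2 * var)))

n, p = 20, 0.5
exact = binomial_cdf(8, n, p)
# +0.5 is the continuity correction (an assumption added for this sketch)
approx = normal_cdf(8 + 0.5, n * p, n * p * (1 - p))
print(round(exact, 4), round(approx, 4))
```

With these inputs the two values agree to roughly three decimal places, which gives a feel for how usable the rule of thumb is at moderate n.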
// Common misconceptions
Poisson does not approximate every binomial. It works only when p is small and n is large. Try using Poisson to approximate a coin flip (p=0.5, n=20): it won't even be close. Use the simulator above to see for yourself.
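The coin-flip case can be put in numbers. This sketch compares the two models at the center of the distribution (k = 10) and compares their variances; the variable names are illustrative.

```python
from math import comb, exp, factorial

n, p = 20, 0.5
lam = n * p  # the Poisson that shares the binomial's mean
k = 10

binomial_at_center = comb(n, k) * p**k * (1 - p) ** (n - k)
poisson_at_center = lam**k * exp(-lam) / factorial(k)

binomial_variance = n * p * (1 - p)  # 5.0
poisson_variance = lam               # 10.0, since Poisson forces variance = mean

print(round(binomial_at_center, 3), round(poisson_at_center, 3))
```

The binomial puts about 0.176 on exactly 10 heads while the Poisson puts about 0.125 there. The Poisson is spread too wide because its variance is twice the binomial's.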
"I've waited a long time, so it must come soon" is wrong under exponential (memoryless) assumptions. With a real bus schedule, the hazard (roughly, the chance of an arrival in the next minute given none so far) rises the longer you've waited; that is what being non-memoryless looks like mathematically (territory handled by Weibull or deterministic arrival models). The exponential applies only to events that occur at a constant rate, independent of how long you have already waited.
※ Weibull is outside StatPlay's scope, but in most textbooks it sits as the very next chapter after the exponential. If the idea pulls you in, that's the page to flip to.
np is the mean, not the variance. The variance is np(1−p). Forgetting the (1−p) factor is a classic slip. Variance is smallest when p is near 0 or 1 (outcomes are nearly certain).
// Shapes you'll meet again
Around discrete distributions, the same rewrites and the same mean/variance pairings keep showing up.
- The complement shape: "probability of 3 or more successes" reappears as 1 − P(0) − P(1) − P(2). "Subtract the lower tail from the whole" is the same picture each time
- The Poisson-approximation shape: in regimes where n is large and p is small (defect rate 2%, n = 100), Poisson with λ = np = 2 stands in for the binomial
- The exponential survival shape: with a 1000-hour mean lifetime (λ = 1/1000 per hour), P(X > 500) = e^(−0.5) ≈ 0.607. P(X > t) = e^(−λt) is the same shape that returns whenever lifetimes or waiting times are involved
- Mean–variance pairs: Poisson carries mean = variance = λ, while binomial carries variance = np(1−p). Each distribution travels with its own paired shape
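Two of these shapes are easy to confirm numerically. The sketch below computes "3 or more successes" both by the complement and by direct summation, then evaluates the lifetime example. The binomial parameters n = 100, p = 0.02 are borrowed from the defect-rate bullet as an assumed illustration, since the complement bullet does not fix any.

```python
from math import comb, exp

def binomial_pmf(k: int, n: int, p: float) -> float:
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 100, 0.02  # assumed example values

# Complement shape: 1 - P(0) - P(1) - P(2)
p_at_least_3 = 1 - sum(binomial_pmf(k, n, p) for k in range(3))
# Direct summation over k = 3..n, for comparison
direct = sum(binomial_pmf(k, n, p) for k in range(3, n + 1))

# Survival shape: mean lifetime 1000 hours, so lam = 1/1000 per hour
survival_500 = exp(-500 / 1000)

print(round(p_at_least_3, 4), round(direct, 4), round(survival_500, 3))
```

The complement needs three PMF terms where the direct sum needs ninety-eight, which is why the rewrite keeps turning up.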
// Further reading
The Birthday Paradox — 50% with Just 23 People: the complement-event picture exposes how badly intuition fails on probability.