Standard Normal — The Origin of Everything
Honestly — without this single curve, none of what follows (tests, confidence intervals, the t-distribution, regression) would work.
The standard normal N(0, 1) is a bell curve with mean 0 and standard deviation 1.
The one-line standardization trick, z = (x − μ) / σ, collapses every normal distribution onto this same curve, and that's why a single printed table can serve probability calculations for the entire world.
In other words, it's not the final boss of statistics; it's the origin. Once you own this, the rest reads as "applications of the standard normal".
- Step 1: Set k to 1.0 → area is ~68%. That's "±1σ covers roughly two-thirds."
- Step 2: Set k to 1.96 → area is ~95%. You'll see this number everywhere in testing and CIs.
- Step 3: Stretch k to 3.0 → nearly 100%. Almost nothing lies outside.
▶ "68 - 95 - 99.7" — no memorization, just see it
Slide the width k; the blue-filled area IS the probability.
± 1σ already covers ~68%, ± 2σ is ~95%, ± 3σ is nearly everything.
That famous number z = 1.96? It's the two-tail 5% critical value — hypothesis tests and confidence intervals all start there.
- Step 1: Press ▶ Standardize → the μ=2, σ=1.5 curve morphs smoothly into N(0,1).
- Step 2: Change μ to −2, σ to 2.5, then ▶ again → a totally different curve snaps onto the same pink one.
- Step 3: Pause the progress slider midway → watch μ approach 0 and σ approach 1 in real time.
- Step 4: Set σ to 0.5 → a sharp peak flattens as it merges into the standard normal.
▶ Watch every normal collapse onto "that one curve"
Height, IQ, blood pressure readings, factory part errors — real-world normal-ish things all have different means and spreads.
Yet apply z = (x − μ) / σ and they all snap onto that pink curve.
It auto-plays on scroll (▶ to replay). That's why one standard-normal table is all any normal-distribution calculation ever needs.
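If you want a numeric check to go with the animation, here is a minimal NumPy sketch, independent of the lab's internals: draw from two different normals, apply z = (x − μ) / σ, and confirm both standardized samples end up with mean ≈ 0 and sd ≈ 1.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two different normals: a heights-like N(170, 6^2) and an errors-like N(-2, 2.5^2)
for mu, sigma in [(170, 6), (-2, 2.5)]:
    x = rng.normal(mu, sigma, size=100_000)
    z = (x - mu) / sigma                         # the one-line standardization
    print(f"N({mu}, {sigma}^2): standardized mean = {z.mean():+.3f}, sd = {z.std():.3f}")
# Both lines print ~0 and ~1: every normal collapses onto N(0, 1)
```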
Normal Distribution — Shaping Mean & Spread
The general version of the standard normal is N(μ, σ²).
μ sets the center, σ sets the spread.
Slide the parameters and the curve glides; the probability of falling inside [a, b] (pink area) updates live.
That pink area IS the "percentage" you hear in the news.
Say adult male heights follow N(170, 36) (mean 170cm, σ=6cm). What share falls in 165–175cm?
Standardize and compute z-scores — you get ≈ 59.6%.
Test scores, measurement errors, IQ — anything roughly normal gets its "X% of people in this range" from exactly this area.
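The height figure is easy to reproduce; here is a quick check with scipy.stats.norm (it comes out around 0.595, so the 59.6% above is just a rounding difference):

```python
from scipy.stats import norm

mu, sigma = 170, 6                     # N(170, 36): mean 170 cm, sd 6 cm
a, b = 165, 175

# Integrate the normal directly...
p_direct = norm.cdf(b, mu, sigma) - norm.cdf(a, mu, sigma)

# ...or standardize first and use the standard normal N(0, 1)
z_a, z_b = (a - mu) / sigma, (b - mu) / sigma
p_std = norm.cdf(z_b) - norm.cdf(z_a)

print(p_direct, p_std)                 # both ~0.595: roughly 59.5-59.6% of men fall in 165-175 cm
```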
The sliders below use a standardized scale (μ=0, σ=1 range) so you can feel the same principle.
Tip: drag directly on the graph to move the a/b bounds — whichever handle is closest follows your finger.
- Step 1: Keep μ=0, σ=1, set a=−1, b=1 → ~68.3%. That's "±1σ covers roughly two-thirds."
- Step 2: Shrink σ to 0.5 → the pink area for the same [−1, 1] jumps to ~95%. Less spread = almost everyone is in range.
- Step 3: Slide μ to 2 → the whole curve shifts. Same a,b, but area changes dramatically.
- Step 4: Drag directly on the graph → the nearest a/b boundary follows your finger.
Probability Rules — Intuition with Venn Diagrams
Addition rule, multiplication rule, conditional probability — see them as areas before memorizing formulas.
Symbol cheat-sheet: ∪ = "or" (union), ∩ = "and" (intersection), P(A|B) = "probability of A given B happened"
P(A∪B) = P(A) + P(B) − P(A∩B) is just "area of two circles minus the overlap."
Conditional probability P(A|B) is "the fraction of B's circle occupied by A."
Press the Independence button to snap P(A∩B) = P(A)·P(B) — that's what independence means.
Draw one card from a 52-card deck. A = heart (13/52 = 0.25), B = face card (12/52 ≈ 0.23).
A∩B = heart face card (3/52 ≈ 0.06). → P(A∪B) = 0.25 + 0.23 − 0.06 = 0.42.
Independence example: two dice. A = 1st is even, B = 2nd is ≥3. The 1st roll doesn't affect the 2nd, so independent. P(A∩B) = 1/2 × 2/3 = 1/3.
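Both examples can be checked by brute-force enumeration. A small sketch (the card encoding is mine, not part of the lab):

```python
from fractions import Fraction
from itertools import product

# Deck example: A = heart, B = face card (J, Q, K)
suits, ranks = "SHDC", range(1, 14)                # 1 = Ace, 11 = J, 12 = Q, 13 = K
deck = [(s, r) for s in suits for r in ranks]
A = {c for c in deck if c[0] == "H"}
B = {c for c in deck if c[1] >= 11}
P = lambda E: Fraction(len(E), len(deck))
print(P(A | B), P(A) + P(B) - P(A & B))            # both 11/26, about 0.423: the addition rule

# Dice example: A = 1st die even, B = 2nd die >= 3 (independent)
outcomes = list(product(range(1, 7), repeat=2))
A2 = {o for o in outcomes if o[0] % 2 == 0}
B2 = {o for o in outcomes if o[1] >= 3}
p = lambda E: Fraction(len(E), len(outcomes))
print(p(A2 & B2), p(A2) * p(B2))                   # both 1/3: the multiplication rule for independence
```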
- Step 1: Set P(A)=0.4, P(B)=0.3, P(A∩B)=0.12 → check that the P(A∪B) readout below the graph shows 0.58. That's 0.4+0.3−0.12 = 0.58, the addition rule.
- Step 2: Press "Set Independent" → P(A∩B) auto-adjusts to P(A)·P(B). That's what independence means.
- Step 3: Drag P(A∩B) near 0 → the circles separate. This is "mutually exclusive" (can't happen together).
- Step 4: Push P(A∩B) close to P(B) → the P(A|B) readout approaches 1. If B happens, A almost certainly happens too.
▶ Interactive Venn Diagram
Bayes' Theorem — The Posterior Plot Twist
"The test has 99% sensitivity & 95% specificity, and you tested positive" — is there a 99% chance you're sick?
…Answer: only 16.7%. More than half of doctors get this classic quiz wrong.
Walk through the numbers. In a town of 1,000, 10 people are sick and 990 are healthy. Test everyone and 60 come back positive — 10 truly sick (true positives) + 50 healthy but wrongly flagged (false positives). If you're one of those 60, your chance of actually being sick is 10 ÷ 60 = 16.7%. The larger the healthy majority, the more false positives dilute the real cases.
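Here is the same arithmetic packed into a small function, a minimal sketch of Bayes' theorem for a diagnostic test:

```python
def ppv(prevalence, sensitivity, specificity):
    """P(sick | positive) via Bayes' theorem."""
    tp = prevalence * sensitivity                  # true-positive probability mass
    fp = (1 - prevalence) * (1 - specificity)      # false-positive probability mass
    return tp / (tp + fp)

# The quiz: 1% prevalence, 99% sensitivity, 95% specificity
print(f"{ppv(0.01, 0.99, 0.95):.1%}")   # ~16.7%, not 99%

# Same test, screening the general population vs. a high-risk group
print(f"{ppv(0.001, 0.99, 0.95):.1%}")  # ~2%: almost every positive is a false alarm
print(f"{ppv(0.4,   0.99, 0.95):.1%}")  # ~93%: the prior dominates
```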
- Step 1: Sweep prevalence between 0.001 (general population) and 0.4 (high-risk group) → with the same 99% sensitivity and 95% specificity, PPV swings from 1.96% to 92.86%. Feel how strongly the prior dominates.
- Step 2: At prevalence 0.001, PPV is just 1.96% — only ~2 of 100 positives are actually sick. Feel the gap from intuition.
- Step 3: Keep prevalence at 0.001 but raise specificity to 99.9% → false positives plummet, PPV improves dramatically.
- Step 4: Watch the "town of 1,000" diagram and compare TP (true positive) vs FP (false positive) counts.
Discrete & Exponential — Counting Probability Models
- Step 1: Binomial: set n=20, p=0.5 → symmetric bell. Change p to 0.1 → skews right.
- Step 2: Set binomial n=50, p=0.06 → np≈3. Compare with Poisson(λ=3) — nearly identical shapes (see the numeric check after this list).
- Step 3: Set Poisson λ=3 and check "E[X] = Var[X] = λ = 3.00" in the top-left of the graph. Raise λ to 20 → it approaches a bell.
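That Binomial(50, 0.06) vs. Poisson(3) comparison from Step 2 is easy to verify with scipy.stats, independent of the lab:

```python
import numpy as np
from scipy.stats import binom, poisson

k = np.arange(0, 11)
pmf_binom = binom.pmf(k, n=50, p=0.06)    # np = 3
pmf_pois  = poisson.pmf(k, mu=3)

for ki, b, p in zip(k, pmf_binom, pmf_pois):
    print(f"k={ki:2d}  binomial={b:.4f}  poisson={p:.4f}")
print("max |diff| =", np.abs(pmf_binom - pmf_pois).max())   # under 0.01: the shapes nearly coincide
```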
- Step 1: λ=3 → average of 3 events. Most mass sits between 0 and 6. Check "E[X] = Var[X] = λ = 3.00" at the top-left.
- Step 2: Lower λ to 1 → peak shifts to 0. Rare events dominate.
- Step 3: Raise λ to 10 → starts looking bell-shaped. The normal is emerging.
- Step 4: Push λ to 20 → nearly normal in shape. As λ grows, Poisson can be approximated by a normal distribution (Central Limit Theorem at work).
- Step 1: λ=1 → mean wait time = 1. The curve drops sharply.
- Step 2: Set λ to 0.2 → gentle decay. Rare events mean long waits.
- Step 3: Set λ to 3 → steep drop. Frequent events = short waits (mean 1/3).
- Step 4: Every λ gives the same "L" shape. That is memorylessness — past waiting does not affect the future.
(Diagram captions: "Each minute: call or no call" for the event stream; the wait between calls averages 60/λ minutes.)
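Step 4's memorylessness claim can be checked by simulation. A minimal NumPy sketch; λ = 3 and the two thresholds are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(1)
lam = 3.0                                   # rate: 3 events per unit time
waits = rng.exponential(scale=1 / lam, size=1_000_000)

# Memorylessness: P(W > s + t | W > s) should equal P(W > t) for any s
s, t = 0.5, 0.4
p_cond  = (waits > s + t).mean() / (waits > s).mean()
p_fresh = (waits > t).mean()
print(p_cond, p_fresh, np.exp(-lam * t))    # all ~0.30: waiting so far tells you nothing
```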
▼ What comes next — into Inference
Probability and distributions are locked in. Next up: inferring population truths from samples.
The Central Limit Theorem guarantees that sample means become approximately normal; the Law of Large Numbers says they converge to the truth.
Then confidence intervals (how precise?) and hypothesis tests (is there a difference?),
capped by the t, χ², F trio for the real world where σ is unknown.
Statistical Inference — learning about populations from samples
Central Limit Theorem — Why Normal Is King
A fact worth pausing on — no matter how skewed the base distribution is,
if you take n samples and average, then repeat, the distribution of those averages
converges on its own to a bell (normal).
The lab below shows left = the raw skewed source side-by-side with right = the sample-mean distribution,
so you can watch the bell emerge. Crank n up and the bell tightens (SE = σ/√n).
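If you want numbers to go with the picture, here is a minimal sketch in the same spirit as the lab (an exponential source; the exact settings are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
raw = rng.exponential(scale=1.0, size=(100_000, 30))    # heavily skewed source with sigma = 1

for n in (1, 5, 30):
    means = raw[:, :n].mean(axis=1)                     # distribution of the sample mean
    print(f"n={n:2d}  mean={means.mean():.3f}  SE={means.std():.3f}  (theory {1 / np.sqrt(n):.3f})")
# The spread of the sample means shrinks like sigma/sqrt(n); by n=30 the histogram is close to a bell
```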
- Step 1: Use the dropdown above to choose "Exponential" and drag the n slider to 1, hit ▶ → still heavily skewed. Not a bell at all.
- Step 2: Drag the n slider to 5 and run → starting to look bell-ish, but still skewed.
- Step 3: Drag the n slider to 30 and run → nearly normal. "n≥30" is a practical rule of thumb, not a theorem — heavily skewed distributions may need more.
- Step 4: Switch the dropdown to "Bimodal" and repeat → even a two-peaked distribution morphs into a bell. Worth watching twice.
Law of Large Numbers — Converging to Truth
10 heads in a row at the start of a coin-flip? Not that weird.
But flip it 10,000 times and the head-ratio locks onto almost exactly 0.5.
That's the Law of Large Numbers — the more samples you draw, the more observed values get pulled toward the truth.
This is why statistics counts as evidence, not a vague hunch.
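A minimal coin-flip sketch of the same idea (10,000 flips, p = 0.5):

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.5
flips = rng.random(10_000) < p                  # one long run of coin flips
running = np.cumsum(flips) / np.arange(1, flips.size + 1)

for n in (10, 100, 1_000, 10_000):
    print(f"after {n:>6} flips: head ratio = {running[n - 1]:.4f}")
# Early values wobble; by 10,000 flips the ratio has locked onto ~0.5
```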
- Step 1: Set p=0.5, hit ▶ → the line wobbles wildly at first, then gets pulled toward 0.5.
- Step 2: RESET and run again → the early path is different every time, but it always converges.
- Step 3: Change p to 0.8 and simulate → the red line now converges to 0.8.
- Step 4: Set p to 0.05 (rare event) → it hugs zero early on, but still converges to p. The law holds.
Confidence Interval — What 95% Really Means
The 95% confidence interval is famously misunderstood.
It does NOT mean "the true value is inside with 95% probability". The correct reading:
"repeat this sampling many times, and ~95% of the resulting intervals will capture the true value".
The lab below brute-forces that intuition. Thin pink = the unlucky intervals that missed.
Once the pink share settles around ~5%, you've got it.
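The brute-force logic fits in a few lines. This sketch assumes a normal population with known σ, which may differ from the lab's exact setup:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
mu, sigma, n, conf = 50, 10, 30, 0.95
z = norm.ppf(1 - (1 - conf) / 2)                 # 1.96 for 95%

hits, trials = 0, 10_000
for _ in range(trials):
    sample = rng.normal(mu, sigma, n)
    half = z * sigma / np.sqrt(n)                # half-width of the known-sigma interval
    hits += sample.mean() - half <= mu <= sample.mean() + half

print(hits / trials)                             # ~0.95: about 95% of the intervals capture the truth
```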
- Step 1: At 95% confidence, n=30, hit ▶ → pink (missed) intervals should be ~5% of the total.
- Step 2: Drop confidence to 80% and regenerate → more pink. Narrower net = more misses.
- Step 3: Back to 95%, set n to 200 → intervals get much tighter. The power of large samples.
- Step 4: Set n to 5 → intervals are huge. With few samples, you need a wide net to catch the truth.
Hypothesis Testing — Reject or Fail to Reject
Think of testing as a trial.
You start by assuming H₀ ("the drug has no effect" = "innocent"). Then if your computed test statistic z lands in the pre-chosen rejection region, you convict — that is, reject H₀.
Two panels below: ① geometry of z and rejection regions (two-sided, right, left), and ② false alarms (α) vs. misses (β).
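For reference, the p-value arithmetic behind panel ① looks like this; z_test is a hypothetical helper, not the lab's actual code:

```python
from scipy.stats import norm

def z_test(z, alpha=0.05, kind="two-sided"):
    """p-value and verdict for an observed z statistic."""
    if kind == "two-sided":
        p = 2 * (1 - norm.cdf(abs(z)))
    elif kind == "right":
        p = 1 - norm.cdf(z)
    else:                                    # "left"
        p = norm.cdf(z)
    return p, ("reject H0" if p < alpha else "fail to reject H0")

print(z_test(1.96))                 # p ~ 0.050: right on the watershed
print(z_test(2.5))                  # p ~ 0.012: deep in the rejection zone
print(z_test(1.96, kind="right"))   # one-sided: p ~ 0.025, half the two-sided value
```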
- Step 1: Panel ①: z=1.96, α=0.05, two-sided → right on the boundary. p≈0.05. The watershed.
α (significance level) = the threshold for "too extreme to be coincidence." It sets the width of the rejection zone.
- Step 2: Drag the "Observed z" slider to 2.5 → deep in the rejection zone, p-value shrinks. "Strong evidence."
- Step 3: Switch the test-type dropdown to "Right" → same z=1.96 but rejection area is one-sided; p-value halves.
▶ ① Basics: z-statistic & rejection region
- Step 1: δ=2, α=0.05 → high power (most of the purple curve falls in the rejection region).
- Step 2: Lower δ to 0.5 → purple and blue nearly overlap; power drops sharply. Small effects are hard to detect.
- Step 3: Tighten α to 0.01 → rejection region shrinks, β (misses) increases. The trade-off in action.
- Step 4: Drag horizontally on the chart → move the critical boundary and feel the α vs β tug-of-war.
▶ ② Two kinds of errors: α, β, power
This panel is a hands-on playground for the α (Type I) / β (Type II) / power trade-off.
Come back whenever you want to feel the trade-off. For the deeper conceptual write-up (the 2×2 matrix, why α and β live in different worlds), see the "Type I & II in a 2×2 table" column.
Tip: drag horizontally on the chart to slide the critical boundary (α).
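The lab's exact parameterization isn't spelled out here, so this sketch uses the textbook right-sided z-test of a mean with known σ; the qualitative behaviour matches the steps above (power falls as the effect shrinks or α tightens):

```python
from scipy.stats import norm

def power_right_sided(delta, n=10, sigma=1.0, alpha=0.05):
    """Power of a right-sided z-test of H0: mu = 0 when the true mean is delta (sigma known)."""
    z_crit = norm.ppf(1 - alpha)                  # rejection boundary on the z scale
    shift = delta * n ** 0.5 / sigma              # how far the alternative pushes the z statistic
    return 1 - norm.cdf(z_crit - shift)

print(power_right_sided(2.0))                # large effect: power ~1.00
print(power_right_sided(0.5))                # small effect: power drops to ~0.47
print(power_right_sided(0.5, alpha=0.01))    # tighter alpha: power falls further (~0.23), beta grows
```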
Proportion Test & Estimation — from sample proportion to the truth
A world where every data point is just "success" or "failure." The sample proportion p̂ = x/n is the starting point, and when n is large enough the normal approximation kicks in (thanks to the Central Limit Theorem).
We'll work through ① interval estimation → ② one-sample test → ③ two-sample test. Compare each step with what you learned for means — spotting the similarities makes the differences easy to absorb.
- Step 1: n=15, p̂=0.50, 95% → the interval is really wide. Just 15 people isn't enough precision.
- Step 2: Push n to 100 → the interval tightens up fast. Feel the power of sample size.
- Step 3: Set p̂ to 0.90 → variance p(1-p) shrinks, so the CI narrows. p̂=0.50 gives the widest interval.
- Step 4: Keep p̂=0.90, drop n to 10 → if ⚠ appears, the normal approximation conditions aren't met.
▶ ① Confidence interval for a proportion
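The interval behind the steps above is presumably the standard Wald interval p̂ ± z·√(p̂(1−p̂)/n); a minimal sketch under that assumption:

```python
from math import sqrt
from scipy.stats import norm

def wald_ci(p_hat, n, conf=0.95):
    """Wald confidence interval for a proportion (normal approximation)."""
    z = norm.ppf(1 - (1 - conf) / 2)
    half = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

print(wald_ci(0.50, 15))    # n=15: roughly (0.25, 0.75), a very wide interval
print(wald_ci(0.50, 100))   # n=100: roughly (0.40, 0.60), much tighter
print(wald_ci(0.90, 100))   # p-hat=0.90: narrower still, since p(1-p) shrinks
```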
"What does a 95% CI actually mean?" — easy to lose the thread by chasing the words alone. Reframed: "if you repeated the same survey many times, about 95% of the intervals would capture the true proportion."
Run it 200 times and see whether that 95% claim holds up.
▶ ①-b CI simulation
- Step 1: n=100, p̂=0.60, p₀=0.50, α=0.05, two-sided → "Is 60% significantly different from 50%?"
- Step 2: Change p₀ to 0.55 → z shrinks and you can no longer reject. Small differences are hard to detect.
- Step 3: Increase n to 400 → same p̂=0.60, p₀=0.55 but now you reject. The power of sample size.
- Step 4: Switch test type to "Right" → one-sided test asking only "greater than 50%?" The p-value halves.
▶ ② One-sample z-test for a proportion
Same flow as testing a population mean. Set up H₀: p = p₀, compute z from the sample, and check whether it lands in the rejection region.
The only twist is that the standard error becomes √(p₀(1−p₀)/n). Nail that, and the rest is familiar.
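The whole test fits in a few lines; here is a sketch that reproduces the Step 1-3 numbers above (the helper name is mine):

```python
from math import sqrt
from scipy.stats import norm

def prop_z_test(p_hat, n, p0, two_sided=True):
    """One-sample z-test for a proportion; p_hat is the sample proportion."""
    se = sqrt(p0 * (1 - p0) / n)                 # SE under H0 uses p0, not p-hat
    z = (p_hat - p0) / se
    p = 2 * (1 - norm.cdf(abs(z))) if two_sided else 1 - norm.cdf(z)
    return z, p

print(prop_z_test(0.60, 100, 0.50))   # z = 2.0,  p ~ 0.046 -> reject at alpha = 0.05
print(prop_z_test(0.60, 100, 0.55))   # z ~ 1.0,  p ~ 0.31  -> cannot reject
print(prop_z_test(0.60, 400, 0.55))   # z ~ 2.0,  p ~ 0.044 -> larger n, now reject
```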
- Step 1: n₁=n₂=100, p̂₁=0.60, p̂₂=0.45 → is the difference significant?
- Step 2: Move p̂₂ toward 0.55 → z shrinks, rejection gets harder.
- Step 3: Increase both to n₁=n₂=400 → same gap, more power.
- Step 4: Try n₁=50, n₂=200 (asymmetric) → the smaller n is the bottleneck.
▶ ③ Two-proportion z-test
"Drug A vs. Drug B — which works better?" "Ad A vs. Ad B — is the click-through rate really different?" This test is for comparing two groups.
The key idea: under H₀: p₁ = p₂, we pool both samples into a single pooled proportion to build a shared SE. Get that, and the rest mirrors the one-sample test.
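A sketch of that pooled-SE computation, reproducing the Step 1 and Step 3 settings above:

```python
from math import sqrt
from scipy.stats import norm

def two_prop_z_test(p1, n1, p2, n2):
    """Two-proportion z-test using the pooled proportion under H0: p1 = p2."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)          # pool both samples into one proportion
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p = 2 * (1 - norm.cdf(abs(z)))                    # two-sided p-value
    return z, p

print(two_prop_z_test(0.60, 100, 0.45, 100))   # z ~ 2.1, p ~ 0.03 -> significant
print(two_prop_z_test(0.60, 400, 0.45, 400))   # same gap, four times the data -> p ~ 0.00002
```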
Three Test Distributions — Meet t, χ² & F
t, χ², F are all derived from the normal. Think of them as "the standard normal, scaled to reflect that we only ever see a sample".
Use them for: t — testing a mean when the population variance is unknown (i.e. nearly every real test of a mean);
χ² — testing a variance, independence, goodness-of-fit for categorical data;
F — ratios of variances (ANOVA, the overall F in regression).
Slide df: t converges to N(0,1) as df→∞, and χ²/F get more symmetric with more df. The Central Limit Theorem is quietly doing the work under the hood.
- Step 1: n=3 → df = n−1 = 2, extremely heavy tails. Compare with the normal (gray dashed line).
- Step 2: n=10 → still heavier tails than normal, but getting closer.
- Step 3: n=31 → nearly indistinguishable from N(0,1). The CI bars nearly overlap.
▶ t distribution
Use for: testing means with unknown variance, regression t-values.
Flavor: heavier tails than N(0,1); matches N(0,1) as df→∞.
With small samples, a normal-based CI is too narrow — overconfident. The t-distribution honestly reflects that extra uncertainty. As n grows, t converges to normal — that's what the two bars show.
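You can see that "honest widening" directly in the critical values; a quick comparison with scipy.stats, using df = n − 1 for a one-sample mean:

```python
from scipy.stats import norm, t

z95 = norm.ppf(0.975)                        # 1.96: the normal's 95% critical value
for n in (3, 10, 31):
    df = n - 1
    print(f"n={n:2d} (df={df:2d}): t crit = {t.ppf(0.975, df):.2f}  vs  z crit = {z95:.2f}")
# n=3 -> 4.30, n=10 -> 2.26, n=31 -> 2.04: the t interval is wider with small samples,
# reflecting the extra uncertainty from estimating sigma, and converges to 1.96 as n grows
```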
- Step 1: Keep "Fair die" selected and press Roll a few times → even a fair die varies each time. Watch the bar chart and χ² statistic change.
- Step 2: Switch to "Loaded" and roll → face 1 jumps out, χ² enters the rejection region.
- Step 3: Increase the rolls slider → more samples detect smaller biases (higher power).
- Step 4: Move the df slider to explore the shape. df=30 → nearly normal.
▶ χ² distribution
Use for: variance tests, chi-square tests of independence / goodness-of-fit.
Flavor: non-negative, right-skewed. Mean = df, variance = 2·df. Goes bell-shaped with large df.
The chi-squared distribution measures "how big is the gap between observed and expected." It tells you whether that gap is just random noise or a real bias.
- Step 1: Set A and B to the same SD (e.g. 10 and 10) → F≈1, not rejected.
- Step 2: Increase B only (e.g. A=5, B=15) → F grows and enters the rejection region.
- Step 3: Increase n → same SD gap but lower p-value (more power).
- Step 4: Swap A's and B's SDs back and forth to feel what "variance ratio" means.
▶ F distribution
Use for: ANOVA, overall F-test in regression.
Flavor: non-negative, right-skewed. Shape depends on both df.
The F-distribution evaluates the ratio of two groups' spread. ANOVA also uses this F-statistic to test whether group means differ.
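A minimal variance-ratio sketch (the group SDs of 5 and 15 are made up for illustration):

```python
import numpy as np
from scipy.stats import f

rng = np.random.default_rng(0)
n = 30
a = rng.normal(0, 5.0, n)      # group A: SD 5
b = rng.normal(0, 15.0, n)     # group B: SD 15, three times the spread

F = b.var(ddof=1) / a.var(ddof=1)          # ratio of sample variances (larger on top)
crit = f.ppf(0.95, n - 1, n - 1)           # right-tail critical value at alpha = 0.05
print(F, crit)                             # F around 9 (roughly 3^2), far beyond the ~1.86 cutoff
```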
Chi-Squared Test — Quantifying the Gap
Goodness-of-fit asks: "Does the observed category distribution match a theoretical one?" Classic example: is the die fair?
Test of independence asks: "Are two categorical variables independent?" Compute χ² = Σ (O−E)²/E across every cell of the contingency table.
Why divide by E? → A deviation of 2 from an expected 10 matters more than 2 from an expected 1,000. Dividing by E turns raw gaps into relative ones.
Both use a χ²-distributed statistic; the p-value is the right-tail area. df = k−1 for goodness-of-fit, (r−1)(c−1) for independence.
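Here is a goodness-of-fit sketch with scipy.stats.chisquare; the loaded-die probabilities are made up for illustration:

```python
import numpy as np
from scipy.stats import chisquare

rng = np.random.default_rng(0)
n_rolls = 600

fair   = rng.choice(6, size=n_rolls)                                            # fair die
loaded = rng.choice(6, size=n_rolls, p=[0.25, 0.15, 0.15, 0.15, 0.15, 0.15])    # first face favoured

for name, rolls in [("fair", fair), ("loaded", loaded)]:
    observed = np.bincount(rolls, minlength=6)
    chi2, p = chisquare(observed)            # expected defaults to uniform: 100 per face
    print(f"{name:6s}: chi2 = {chi2:5.1f}, p = {p:.4f}  (df = 5)")
# The fair die stays well below the rejection threshold; the loaded one is usually caught
```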
- Step 1: ① Goodness-of-fit: select "Fair die" and auto-roll → as n grows, χ² stays low, p-value stays high (fail to reject).
- Step 2: Switch to "Loaded" and auto-roll → χ² shoots up, p-value drops below α. The cheat is caught.
- Step 3: ② Independence: click only the top-left cell to create imbalance → χ² spikes, independence is rejected.
- Step 4: Reset and click cells evenly → χ² stays small. No imbalance = independence holds.
▶ ① Goodness-of-Fit — Is the Die Fair?
- Step 1: Click the top-left and bottom-right cells repeatedly → diagonal bias raises χ², verdict: "not independent."
- Step 2: Reset, then click all cells evenly → χ² stays small. No bias = independent.
- Step 3: Set α to 0.01 → higher bar for rejection. Same data might flip the verdict.
- Step 4: Flood a single cell → huge gap from expected frequencies, p-value plummets.
▶ ② Test of Independence — Are Two Variables Independent?
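For the test of independence, scipy.stats.chi2_contingency does the (O − E)²/E bookkeeping across a contingency table; the counts below are made up:

```python
import numpy as np
from scipy.stats import chi2_contingency

# A 2x2 contingency table: rows = group 1 / group 2, columns = outcome yes / no (made-up counts)
table = np.array([[30,  70],
                  [20, 180]])

chi2, p, df, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, df = {df}, p = {p:.4f}")   # small p -> reject independence
print(expected)                                        # the E in (O - E)^2 / E, cell by cell
```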
ANOVA — Feel What the F-Statistic Really Means
ANOVA compares "how different the groups are" vs. "how spread out each group is."
When the difference outweighs the spread, we conclude the groups really differ. Move the sliders to feel it.
- Step 1: Click ▶ 100 trials — runs 100 experiments where 3 identical groups are compared with 3 pairwise t-tests.
- Step 2: Check the red dot ratio — theory predicts 14.3%. What do you get?
- Step 3: Click ▶ 1000 trials — as trials accumulate, the rate converges toward 14.3%.
▶ Why You Can't Just Repeat t-Tests
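The simulation's premise is easy to reproduce offline with scipy.stats.ttest_ind; the trial count and group sizes below are mine, not the lab's:

```python
import numpy as np
from itertools import combinations
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
trials, n_per_group, false_alarms = 2_000, 20, 0

for _ in range(trials):
    groups = [rng.normal(0, 1, n_per_group) for _ in range(3)]        # three IDENTICAL populations
    pvals = [ttest_ind(a, b).pvalue for a, b in combinations(groups, 2)]
    false_alarms += min(pvals) < 0.05          # "found a difference" if ANY pairwise test fires

print(false_alarms / trials)   # far above the nominal 5%; compare with the ~14.3% figure quoted above
```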
The F-test (ANOVA) below is how we avoid the false-positive inflation you just saw in the simulation above.
Experiment Guide — Feel the F-Statistic
- Step 1: Set between-group difference to zero → F ≈ 1. All groups look like one population.
- Step 2: Increase between-group difference → F rises, p drops. Watch it cross the rejection threshold.
- Step 3: Increase within-group spread → same difference but F drops. "Real differences can hide in noise."
- Step 4: Increase sample size → F rises. Larger samples detect smaller effects (statistical power).
▶ Between vs. Within — Feel the F-Statistic
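As a rough stand-in for the lab's sliders, this sketch feeds scipy.stats.f_oneway three groups whose true means are either identical or spread apart:

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(0)
n = 30

def experiment(between):
    """Three groups with true means 0, between, 2*between; within-group sd fixed at 1."""
    groups = [rng.normal(mu, 1.0, n) for mu in (0, between, 2 * between)]
    return f_oneway(*groups)

print(experiment(0.0))   # no between-group difference: F near 1, large p
print(experiment(0.8))   # real differences: F rises well past the threshold, p plummets
```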
▼ What comes next — into Modeling
You can now test for differences and independence. Next: finding relationships and predicting.
Correlation measures the "link" between two variables, then simple regression turns it into prediction.
Then multiple regression controls for confounders to measure true effects —
tests, CIs, and the F-distribution all converge here. The full statistical toolkit comes together.
Modeling — finding relationships and making predictions
Correlation — Measuring the Link Between Two Variables
The correlation coefficient r measures whether two variables move together. +1 means a perfect positive linear relationship, −1 a perfect negative one, 0 means no linear relationship. Click the canvas to add points and watch r update in real time. The yellow dashed lines mark the means, splitting the space into four quadrants — more points in green quadrants means positive correlation, more in red means negative. This coloring is the "tug-of-war of signs" in the formula Σ(x−x̄)(y−ȳ).
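Both behaviours (a tight band vs. a pattern that r misses) are easy to reproduce; a small NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear relationship: r is high
x = rng.normal(size=300)
y = 0.8 * x + rng.normal(scale=0.6, size=300)
print("linear  r =", np.corrcoef(x, y)[0, 1])      # ~0.8

# U-shape: a clear pattern, yet r ~ 0 because r only measures linear association
u = np.linspace(-2, 2, 300)
v = u ** 2 + rng.normal(scale=0.2, size=300)
print("U-shape r =", np.corrcoef(u, v)[0, 1])      # ~0
```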
- Step 1: Set r = 0.80, click Generate → an upward-sloping band. Points cluster in the green quadrants.
- Step 2: Change to r = −0.60, Generate → downward slope. More points in the red quadrants.
- Step 3: Set r = 0.00, Generate → points spread evenly across all four quadrants. A "cloud," not a band.
- Step 4: CLEAR, then manually place a U-shape → r ≈ 0 yet there's an obvious pattern! r only captures linear relationships.
- Step 1: Watch the animation. Points appear one by one, patterns look totally different…
- Step 2: Check each plot's r ≈ 0.816. They're all nearly identical!
- Step 3: Regression lines appear → lines are nearly identical too. Yet II is curved, III has an outlier, IV is dominated by one point.
- Step 4: Click "Replay" to watch again. Numbers alone don't tell the whole story.
▶ The pitfall of r — Anscombe's Quartet
Simple Regression — Drawing the OLS Line
Regression with just one explanatory variable is simple regression. It assumes a linear relationship: when x increases by 1, y moves by β₁. Ordinary least squares (OLS) picks the line that minimizes the sum of squared vertical residuals. Click the canvas to add points and watch the line snap into place. Green bars are residuals. R² (between 0 and 1) measures how much of the variation in y the line explains.
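The whole OLS fit is a couple of formulas; here is a sketch that computes the slope, intercept, residuals, and R² by hand on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 30)
y = 2.0 + 1.5 * x + rng.normal(scale=2.0, size=30)      # true line: y = 2 + 1.5x + noise

# Ordinary least squares: minimize the sum of squared vertical residuals
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)
r2 = 1 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

print(f"intercept = {b0:.2f}, slope = {b1:.2f}, R^2 = {r2:.2f}")   # slope near 1.5, R^2 close to 1
```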
- Step 1: Hit "Random 20 pts" → a regression line and R² appear. Check the green bars (residuals).
- Step 2: Click far from the line to add one outlier → the line jerks toward it. Watch how far a single point can drag the fit.
- Step 3: CLEAR and place 5 points nearly in a line → R² ≈ 1.0. A perfect linear relationship.
- Step 4: CLEAR and arrange points in a circle → R² ≈ 0. A line can't capture this pattern.
Multiple Regression — Predict with multiple variables
One variable gives a line; two give a plane in 3D. But the point isn't geometry — it's removing confounding to isolate each variable's true effect. Start with the side-by-side comparison to see the moment β₁ shifts.
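Here is a sketch of that β₁ shift on synthetic data; the variable names mirror the lab's study/sleep sliders, but the numbers (true β₁ = 3, correlation = −0.5) are my own choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n, corr = 2_000, -0.5

# Two correlated predictors, stand-ins for the lab's "study" and "sleep" sliders
study, sleep = rng.multivariate_normal([0, 0], [[1, corr], [corr, 1]], size=n).T
score = 3.0 * study + 2.0 * sleep + rng.normal(scale=1.0, size=n)   # true beta1 = 3

b1_simple = np.polyfit(study, score, 1)[0]                # sleep omitted: confounded estimate
X = np.column_stack([np.ones(n), study, sleep])
b1_multi = np.linalg.lstsq(X, score, rcond=None)[0][1]    # sleep controlled for

print(f"simple: {b1_simple:.2f}   multiple: {b1_multi:.2f}   (true effect: 3.00)")
# With corr = -0.5 the simple slope is biased down to ~2.0; multiple regression recovers ~3.0
```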
- Step 1: At default (corr = −0.5), compare β₁ left vs. right. Simple regression is smaller — the study effect is underestimated.
- Step 2: Set correlation to 0 → both β₁ values nearly match. "No confounding, no bias."
- Step 3: Set correlation to +0.5 → now simple regression β₁ is too large. Confounding can bias in either direction.
- Step 4: Check R² too. The multiple regression's R² is always at least the simple regression's; here, adding a variable raised the explanatory power.
▶ Simple vs. Multiple Regression — Watch β₁ Shift
- Step 1: Drag the 3D plot to rotate. The translucent surface is the regression plane — data points align along it.
- Step 2: Move the study slider → the prediction dot slides along the x₁ direction. The tilt = β₁.
- Step 3: Move sleep too → it moves along x₂. The tilt = β₂. Each variable's contribution is visible.
- Step 4: Hit Resample a few times → β₁, β₂, R² shift slightly each time. Estimates have variability too.
▼ The Big Picture
From the standard normal through probability, distributions, inference, and regression — the core scope of introductory statistics, end to end.
Every page is a descendant of N(0,1):
t = "normal with unknown σ," χ² = "sum of squared normals," F = "ratio of χ²'s," regression tests use t and F.
Dive deeper into each topic's dedicated page for formulas and detailed explanations.