
STATPLAY

[ PROBABILITY · INFERENCE · MODELING ]

Statistics is way more fun than it looks.
Forget rote formulas — here the graphs move on their own, and the idea takes shape as you watch.

Come here when your textbook makes sense — but nothing becomes a picture.

▼ What this site is for
If you're learning statistics on your own and got stuck at "wait, what does this formula even mean?" — this is your first step back.
This is a place to rebuild intuition. Systematic study and calculation drills belong to your textbook and problem set. But the "classic misreading" of the 95% confidence interval, or why the t-distribution even exists — drag a slider and you'll see it.
Once "oh, that's what it means" lands, go back to your textbook with confidence. That's the whole point.
▼ Where did your textbook lose you?
You don't have to go through everything in order. Jump straight to the concept that made you stop. You're not the only one who got stuck there.
What even is a normal distribution? | What is standardization actually doing? | How do I actually run a test? | Bayes' theorem makes no sense | The Central Limit Theorem stays fuzzy
Once "oh, that's what it means" hits, go back to your textbook — it'll read differently.

Columns

Formulas alone rarely make the picture vivid. Start from everyday surprises and the math behind them becomes intuitive.
What Is Hensachi? — Japan's School Score Is Just a Rescaled z-Score →
A 10-point gap in hensachi holds 5× the rarity. See what the number really measures, interactively. Related: Standard Normal · Normal Distribution · Central Limit Theorem
The Birthday Paradox — 23 People, 50%+ Chance →
With just 23 people, a shared birthday is more likely than not. Experience the paradox interactively. Related: Probability Rules · Discrete & Exponential · Chi-Squared Test
What Is Standardization? — The Universal Translator for "Normal" →
182 cm tall vs. TOEIC 860 — which is further from "average"? Learn how Z=(X−μ)/σ lets you compare across units in 3 minutes. Related: Standard Normal · Normal Distribution · Central Limit Theorem · Law of Large Numbers · Hypothesis Testing · Confidence Interval
Income Prediction — How Far Can Statistics Go? →
Age, gender, prefecture — just three inputs reveal a range of income. A hands-on column on multiple regression. Related: Simple Regression · Multiple Regression · Correlation · Confidence Interval
Type I vs Type II Errors — One 2×2 Table Sorts Them Out →
α, β, power, and effect size all live on a single 2×2 table. Once you see it, the confusion disappears. Related: Hypothesis Testing · Confidence Interval · Bayes' Theorem · ANOVA · Proportion Test · Three Test Distributions
Standard Deviation vs Standard Error — Telling SD and SE Apart in One Picture →
Standard deviation (SD) measures the spread of individuals; standard error (SE) measures the precision of the mean. Same-looking formulas, opposite reactions to n. Related: Central Limit Theorem · Confidence Interval · Hypothesis Testing · Law of Large Numbers · Three Test Distributions

Tools

Interactive Distribution Tables — Normal · t · χ² · F →
Four distribution tables from the back of your textbook, synced with live graphs. Click any cell and see its position highlighted on the curve in real time.
PROBABILITY

Probability & Distributions — quantifying uncertainty

P1 / Standard Normal

Standard Normal — The Origin of Everything

Let's start with the curve that dominates all of statistics — the standard normal. The Central Limit Theorem, hypothesis testing, confidence intervals — everything circles back here. Touch the bell curve first.

Honestly — without this single curve, none of what follows (tests, confidence intervals, the t-distribution, regression) would work.
The standard normal N(0, 1) is a bell curve with mean 0 and standard deviation 1. The one-line trick z = (x − μ)/σ lets every normal distribution collapse onto this same curve — and that's how a single paper table can compute probabilities for the entire world.
In other words, it's not the final boss of statistics; it's the origin. Once you own this, the rest reads as "applications of the standard normal".

Experiment Guide — try these in order
  1. Step 1: Set k to 1.0 → area is ~68%. That's "±1σ covers about 70%."
  2. Step 2: Set k to 1.96 → area is ~95%. You'll see this number everywhere in testing and CIs.
  3. Step 3: Stretch k to 3.0 → nearly 100%. Almost nothing lies outside.

▶ "68 - 95 - 99.7" — no memorization, just see it

Slide the width k; the blue-filled area IS the probability. ±1σ already covers ~68%, ±2σ is ~95%, ±3σ is nearly everything.
That famous number z = 1.96? It's the two-tail 5% critical value — hypothesis tests and confidence intervals all start there.
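If you want to double-check the slider's readouts off-screen, here's a minimal sketch using only Python's standard library: for a standard normal, the central area P(|Z| ≤ k) equals erf(k/√2).

```python
import math

def central_area(k):
    """P(|Z| <= k) for a standard normal: erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))

for k in (1.0, 1.96, 3.0):
    print(f"P(|Z| <= {k}) = {central_area(k):.4f}")
# 0.6827, 0.9500, 0.9973 — the 68-95-99.7 rule, plus the famous 1.96
```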

P( |Z| ≤ k )
P( Z ≤ k )
Outside prob.
Experiment Guide — try these in order
  1. Step 1: Press ▶ Standardize → the μ=2, σ=1.5 curve morphs smoothly into N(0,1).
  2. Step 2: Change μ to −2, σ to 2.5, then ▶ again → a totally different curve snaps onto the same pink one.
  3. Step 3: Pause the progress slider midway → watch μ approach 0 and σ approach 1 in real time.
  4. Step 4: Set σ to 0.5 → a sharp peak flattens as it merges into the standard normal.

▶ Watch every normal collapse onto "that one curve"

Height, IQ, blood pressure readings, factory part errors — real-world normal-ish things all have different means and spreads. Yet apply z = (x − μ) / σ and they all snap onto that pink curve.
It auto-plays on scroll (▶ to replay). That's why every statistical formula needs only one standard-normal table.

Original N(2.0, 1.5²)
Transformed mean
Transformed σ
UP NEXT — the normal as a tool → P2 Normal Distribution
P2 / Normal & Standardization

Normal Distribution — Shaping Mean & Spread

The standard normal was fixed at μ=0, σ=1. Real data has any mean and spread. Move μ and σ to wield the normal as a tool. Standardization maps it back to Z, so any normal connects to the standard normal.

The general version of the standard normal is N(μ, σ²). μ sets the center, σ sets the spread. Slide the parameters and the curve glides; the probability of falling inside [a, b] (pink area) updates live.
That pink area IS the "percentage" you hear in the news. Say adult male heights follow N(170, 36) (mean 170cm, σ=6cm). What share falls in 165–175cm? Standardize and compute z-scores — you get ≈ 59.5%. Test scores, measurement errors, IQ — anything roughly normal gets its "X% of people in this range" from exactly this area. The sliders below use a standardized scale (μ=0, σ=1 range) so you can feel the same principle.
Tip: drag directly on the graph to move the a/b bounds — whichever handle is closest follows your finger.
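The height example works out the same way in a few lines of standard-library Python — the normal CDF is just a shifted, scaled error function:

```python
import math

def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma^2)."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

# Heights ~ N(170, 6^2): what share falls between 165 and 175 cm?
share = normal_cdf(175, 170, 6) - normal_cdf(165, 170, 6)
print(f"{share:.1%}")  # ≈ 59.5%
```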

Experiment Guide — try these in order
  1. Step 1: Keep μ=0, σ=1, set a=−1, b=1 → ~68.3%. That's "±1σ covers ~70%."
  2. Step 2: Shrink σ to 0.5 → the pink area for the same [−1, 1] explodes. Less spread = almost everyone is in range.
  3. Step 3: Slide μ to 2 → the whole curve shifts. Same a,b, but area changes dramatically.
  4. Step 4: Drag directly on the graph → the nearest a/b boundary follows your finger.
P(a ≤ X ≤ b)
z-score (a)
z-score (b)
UP NEXT — probability rules with Venn diagrams → P3 Probability Rules
P3 / Probability Rules

Probability Rules — Intuition with Venn Diagrams

You've got the shape of the normal. Now let's step back to the foundations — the probability rules that make all of it work. Addition, multiplication, and conditional probability visualized as overlapping areas.

Addition rule, multiplication rule, conditional probability — see them as areas before memorizing formulas.
Symbol cheat-sheet: ∪ = "or" (union), ∩ = "and" (intersection), P(A|B) = "probability of A given B happened"
P(A∪B) = P(A) + P(B) − P(A∩B) is just "area of two circles minus the overlap." Conditional probability P(A|B) is "the fraction of B's circle occupied by A."
Press the Independence button to snap P(A∩B) = P(A)·P(B) — that's what independence means.

Concrete example
Draw one card from a 52-card deck. A = heart (13/52 = 0.25), B = face card (12/52 ≈ 0.23).
A∩B = heart face card (3/52 ≈ 0.06). → P(A∪B) = 0.25 + 0.23 − 0.06 = 0.42.
Independence example: two dice. A = 1st is even, B = 2nd is ≥3. The 1st roll doesn't affect the 2nd, so independent. P(A∩B) = 1/2 × 2/3 = 1/3.
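The card example can be checked exactly with Python's `fractions` module — no rounding, just the addition rule and conditional probability as arithmetic:

```python
from fractions import Fraction as F

# One card from a 52-card deck
P_A  = F(13, 52)   # A = heart
P_B  = F(12, 52)   # B = face card (J, Q, K)
P_AB = F(3, 52)    # A ∩ B = heart face card

print(P_A + P_B - P_AB)   # addition rule → 11/26 (= 22/52 ≈ 0.42)
print(P_AB / P_B)         # P(A|B) → 1/4
print(P_AB == P_A * P_B)  # True — suit and rank are in fact independent
```

A bonus the rounded decimals above hide: 3/52 equals (13/52)·(12/52) exactly, so "heart" and "face card" satisfy the independence condition.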
Experiment Guide — try these in order
  1. Step 1: Set P(A)=0.4, P(B)=0.3, P(A∩B)=0.12 → check the P(A∪B) readout below the graph shows 0.58. That's 0.4+0.3−0.12 = 0.58, the addition rule.
  2. Step 2: Press "Set Independent" → P(A∩B) auto-adjusts to P(A)·P(B). That's what independence means.
  3. Step 3: Drag P(A∩B) near 0 → the circles separate. This is "mutually exclusive" (can't happen together).
  4. Step 4: Push P(A∩B) close to P(B) → the P(A|B) readout approaches 1. If B happens, A almost certainly happens too.

▶ Interactive Venn Diagram

P(A∪B)
P(A|B)
P(B|A)
Independent?
UP NEXT — flipping the conditional → P4 Bayes' Theorem
P4 / Bayes' Theorem

Bayes' Theorem — The Posterior Plot Twist

Distribution toolbox complete. Finally, Bayes' theorem flips the conditioning. You test positive — what's the chance you're actually sick? Intuition fails here; let's build it.

"The test has 99% sensitivity & 95% specificity, and you tested positive" — is there a 99% chance you're sick?
Answer: only 16.7%. More than half of doctors get this classic quiz wrong.
Walk through the numbers. In a town of 1,000, 10 people are sick and 990 are healthy. Test everyone and 60 come back positive — 10 truly sick (true positives) + 50 healthy but wrongly flagged (false positives). If you're one of those 60, your chance of actually being sick is 10 ÷ 60 = 16.7%. The larger the healthy majority, the more false positives dilute the real cases.
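The town-of-1,000 arithmetic is Bayes' theorem in three lines. A minimal sketch (the function name `ppv` — positive predictive value — is our label, not from the demo):

```python
def ppv(prevalence, sensitivity, specificity):
    """P(sick | positive) via Bayes' theorem."""
    tp = prevalence * sensitivity              # true-positive mass
    fp = (1 - prevalence) * (1 - specificity)  # false-positive mass
    return tp / (tp + fp)

print(f"{ppv(0.01, 0.99, 0.95):.1%}")  # the town of 1,000 → 16.7%
```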

Experiment Guide — try these in order
  1. Step 1: Sweep prevalence between 0.001 (general population) and 0.4 (high-risk group) → with the same 99% sensitivity and 95% specificity, PPV swings from about 1.9% to about 93%. Feel how strongly the prior dominates.
  2. Step 2: At prevalence 0.001, PPV is just ~1.9% — only ~2 of 100 positives are actually sick. Feel the gap from intuition.
  3. Step 3: Keep prevalence at 0.001 but raise specificity to 99.9% → false positives plummet, PPV improves dramatically.
  4. Step 4: Watch the "town of 1,000" diagram and compare TP (true positive) vs FP (false positive) counts.
True positives (sick & tested +)
False positives (healthy but tested +)
If you tested +, chance you're sick
If you tested NEG, chance you're healthy
UP NEXT — discrete and exponential distributions → P5 Discrete & Exponential
P5 / Binomial / Poisson / Exp

Discrete & Exponential — Counting Probability Models

The normal is continuous and symmetric. But real data isn't always — coin flips are discrete, arrivals follow Poisson, wait times are exponential. Meet the other distributions that round out the toolkit.
Three core distributions every stats learner runs into: binomial, Poisson, and exponential. Slide through success counts, event counts, and waiting times to feel the bridge between discrete and continuous.
Experiment Guide — try these in order
  1. Step 1: Binomial: set n=20, p=0.5 → symmetric bell. Change p to 0.1 → skews right.
  2. Step 2: Set binomial n=50, p=0.06 → np≈3. Compare with Poisson(λ=3) — nearly identical shapes.
  3. Step 3: Set Poisson λ=3 and check "E[X] = Var[X] = λ = 3.00" in the top-left of the graph. Raise λ to 20 → it approaches a bell.
When? → How many heads in 10 coin flips; how many defects in 100 items
Binomial B(n, p)
20
0.35
As n → ∞ with np → λ, the binomial approaches Poisson.
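Step 2's approximation is easy to verify numerically — a sketch comparing the two pmfs side by side with only the standard library:

```python
from math import comb, exp, factorial

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    return exp(-lam) * lam**k / factorial(k)

# Binomial(n=50, p=0.06) vs Poisson(lambda = np = 3), as in Step 2
for k in range(6):
    print(k, round(binom_pmf(k, 50, 0.06), 4), round(poisson_pmf(k, 3), 4))
```

Every pair of probabilities agrees to within about 0.007 — nearly identical shapes, exactly as the graph shows.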
Experiment Guide — Poisson
  1. Step 1: λ=3 → average of 3 events. Most mass sits between 0 and 6. Check "E[X] = Var[X] = λ = 3.00" at the top-left.
  2. Step 2: Lower λ to 1 → peak shifts to 0. Rare events dominate.
  3. Step 3: Raise λ to 10 → starts looking bell-shaped. The normal is emerging.
  4. Step 4: Push λ to 20 → nearly normal in shape. As λ grows, Poisson can be approximated by a normal distribution (Central Limit Theorem at work).
When? → Phone calls per hour at a call center; traffic accidents per day
Poisson(λ)
3
As λ grows, Poisson approaches the normal distribution.
Experiment Guide — Exponential
  1. Step 1: λ=1 → mean wait time = 1. The curve drops sharply.
  2. Step 2: Set λ to 0.2 → gentle decay. Rare events mean long waits.
  3. Step 3: Set λ to 3 → steep drop. Frequent events = short waits (mean 1/3).
  4. Step 4: Every λ gives the same "L" shape — and however long you've already waited, the remaining wait follows that same distribution. That is memorylessness: past waiting does not affect the future.
When? → Time until next phone call; how long until a light bulb burns out
Exponential(λ) — waiting time
1
Memoryless: past waiting time tells you nothing about the future.
Memoryless Demo — Does "already waited 10 min" matter?
Left: normal-like waiting. The longer you wait, the more likely arrival becomes. Right: exponential. Move t all you want — the curve never changes. That's memorylessness.
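Memorylessness is one line of algebra you can also confirm numerically: for an exponential, P(T > s+t | T > s) = P(T > t). A sketch (λ = 0.2 matches the demo's slider value):

```python
import math

lam = 0.2                       # one event every 5 time units on average

def surv(t):
    """P(T > t) for T ~ Exponential(lam)."""
    return math.exp(-lam * t)

# Already waited s=10 — the chance of waiting t=2 more is unchanged:
s, t = 10.0, 2.0
print(surv(s + t) / surv(s))    # P(T > s+t | T > s)
print(surv(t))                  # P(T > t) — the exact same number
```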
Normal (everyday intuition)
← as t grows, shifts left ("should come soon")
Exponential (memoryless)
← no matter how much t changes, same shape
0
0.20
One Phenomenon, Three Views — A Call Center Hour
One process — "λ calls per hour" — viewed through three distributions simultaneously. Move λ and all three update. Same phenomenon, different angles.
3.0
Binomial B(60, λ/60)
60 one-minute slots.
Each minute: call or no call.
Poisson Poi(λ)
How many calls total in one hour?
Exponential Exp(λ)
How long until next call?
(mean 60/λ min)
UP NEXT — averages become normal → I1 Central Limit Theorem

▼ What comes next — into Inference

Probability and distributions are locked in. Next up: inferring population truths from samples.
The Central Limit Theorem guarantees sample means go normal, the Law of Large Numbers says they converge to the truth. Then confidence intervals (how precise?) and hypothesis tests (is there a difference?), capped by the t, χ², F trio for the real world where σ is unknown.

INFERENCE

Statistical Inference — learning about populations from samples

I1 / Central Limit Theorem

Central Limit Theorem — Why Normal Is King

Normal distribution basics down. Now for statistical inference. What shape does the average of many samples take? Here comes the Central Limit Theorem: whatever you start with — dice, Poisson, anything — the average is pulled toward that same normal curve.

A fact worth pausing on — no matter how skewed the base distribution is, if you take n samples and average, then repeat, the distribution of those averages converges on its own to a bell (normal).
The lab below shows left = the raw skewed source side-by-side with right = the sample-mean distribution, so you can watch the bell emerge. Crank n up and the bell tightens (SE = σ/√n).
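The same experiment fits in a few lines of Python: draw n exponentials (heavily skewed), average them, repeat, and check that the spread of the means matches σ/√n. The seed and trial count are arbitrary choices for this sketch.

```python
import random, statistics

random.seed(1)
n, trials = 30, 5000
# Sample means of n draws from a skewed Exponential(1) source (mean 1, sigma 1)
means = [statistics.fmean(random.expovariate(1.0) for _ in range(n))
         for _ in range(trials)]

print(statistics.fmean(means))   # close to the true mean, 1.0
print(statistics.stdev(means))   # close to SE = 1/sqrt(30) ≈ 0.183
```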

Experiment Guide — try these in order
  1. Step 1: Use the dropdown above to choose "Exponential" and drag the n slider to 1, hit ▶ → still heavily skewed. Not a bell at all.
  2. Step 2: Drag the n slider to 5 and run → starting to look bell-ish, but still skewed.
  3. Step 3: Drag the n slider to 30 and run → nearly normal. "n≥30" is a practical rule of thumb, not a theorem — heavily skewed distributions may need more.
  4. Step 4: Switch the dropdown to "Bimodal" and repeat → even a two-peaked distribution morphs into a bell. Worth watching twice.
Trials 0
Mean of sample means
SD of sample means
Theoretical SE = σ/√n
UP NEXT — does the sample mean really converge? → I2 Law of Large Numbers
I2 / Law of Large Numbers

Law of Large Numbers — Converging to Truth

The Central Limit Theorem showed that averages become normal. But does the sample mean actually converge to the true mean as n grows? That guarantee is the Law of Large Numbers. The Central Limit Theorem describes the shape; the Law of Large Numbers says the center won't run away.

10 heads in a row at the start of a coin-flip? Not that weird. But flip it 10,000 times and the head-ratio locks onto almost exactly 0.5.
That's the Law of Large Numbers — the more samples you draw, the more observed values get pulled toward the truth. This is why statistics counts as evidence, not a vague hunch.
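You can reproduce the coin-flip experiment in a few lines — watch the running head-ratio wobble early and lock on as n grows (the seed is an arbitrary choice for this sketch):

```python
import random

random.seed(7)
heads, flips = 0, 0
for n in (10, 100, 1_000, 10_000, 100_000):
    while flips < n:
        heads += random.random() < 0.5   # one fair flip
        flips += 1
    print(n, heads / n)                  # ratio gets pulled toward p = 0.5
```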

Experiment Guide — try these in order
  1. Step 1: Set p=0.5, hit ▶ → the line wobbles wildly at first, then gets pulled toward 0.5.
  2. Step 2: RESET and run again → the early path is different every time, but it always converges.
  3. Step 3: Change p to 0.8 and simulate → the red line now converges to 0.8.
  4. Step 4: Set p to 0.05 (rare event) → it hugs zero early on, but still converges to p. The law holds.
Trials 0
Current mean
Theoretical 0.50
UP NEXT — how to quantify uncertainty with finite n → I3 Confidence Interval
I3 / Confidence Intervals

Confidence Interval — What 95% Really Means

LLN says "at infinity, you're right." But in practice we always have a finite sample. So instead of a single point, drape a net around it — that's a confidence interval. Wider net, easier to catch; narrower, more precise. Watch the trade-off play out.

The 95% confidence interval is famously misunderstood.
It does NOT mean "the true value is inside with 95% probability". The correct reading: "repeat this sampling many times, and ~95% of the resulting intervals will capture the true value".
The lab below brute-forces that intuition. Thin pink = the unlucky intervals that missed. Once the pink share settles around ~5%, you've got it.
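The lab's brute-force logic, as a sketch: build many known-σ 95% intervals around sample means and count how often they capture the true μ. The population values (μ=50, σ=10) and trial count are made up for illustration.

```python
import random, statistics

random.seed(0)
mu, sigma, n, z = 50.0, 10.0, 30, 1.96    # 1.96 = 95% two-sided critical value
trials, hits = 2000, 0
for _ in range(trials):
    xbar = statistics.fmean(random.gauss(mu, sigma) for _ in range(n))
    half = z * sigma / n ** 0.5           # known-sigma interval half-width
    hits += xbar - half <= mu <= xbar + half

print(hits / trials)   # settles around 0.95 — ~5% of intervals miss
```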

Experiment Guide — try these in order
  1. Step 1: At 95% confidence, n=30, hit ▶ → pink (missed) intervals should be ~5% of the total.
  2. Step 2: Drop confidence to 80% and regenerate → more pink. Narrower net = more misses.
  3. Step 3: Back to 95%, set n to 200 → intervals get much tighter. The power of large samples.
  4. Step 4: Set n to 5 → intervals are huge. With few samples, you need a wide net to catch the truth.
Intervals built 0
Coverage
Expected 95%
UP NEXT — from width to yes/no → I4 Hypothesis Testing
I4 / Hypothesis Testing

Hypothesis Testing — Reject or Fail to Reject

If a CI expresses uncertainty as a width, hypothesis testing turns it into a yes/no decision. Under the null world, could this data have happened? If it's too unlikely, reject. Same distribution, same σ/√n — just a different question.

Think of testing as a trial.
You start by assuming H₀ ("the drug has no effect" = "innocent"). Then if your computed test statistic z lands in the pre-chosen rejection region, you convict — that is, reject H₀.
Two panels below: ① geometry of z and rejection regions (two-sided, right, left), and ② false alarms (α) vs. misses (β).
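Panel ①'s p-value readout is just tail area under the standard normal — a minimal sketch of the z-test decision, stdlib only:

```python
import math

def p_value(z, tail="two-sided"):
    """p-value for an observed z under the standard normal."""
    upper = 1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))  # P(Z > |z|)
    return 2 * upper if tail == "two-sided" else upper

print(round(p_value(1.96), 3))           # ≈ 0.05 — right on the watershed
print(round(p_value(2.5), 3))            # deep in the rejection zone
print(round(p_value(1.96, "right"), 3))  # one-sided: the p-value halves
```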

Experiment Guide — try these in order
  1. Step 1: Panel ①: z=1.96, α=0.05, two-sided → right on the boundary. p≈0.05. The watershed.
    α (significance level) = the threshold for "too extreme to be coincidence." It sets the width of the rejection zone.
  2. Step 2: Drag the "Observed z" slider to 2.5 → deep in the rejection zone, p-value shrinks. "Strong evidence."
  3. Step 3: Switch the test-type dropdown to "Right" → same z=1.96 but rejection area is one-sided; p-value halves.

▶ ① Basics: z-statistic & rejection region

Test statistic z
Critical value
p-value
Decision
Experiment Guide — try these in order
  1. Step 1: δ=2, α=0.05 → high power (most of the purple curve falls in the rejection region).
  2. Step 2: Lower δ to 0.5 → purple and blue nearly overlap; power drops sharply. Small effects are hard to detect.
  3. Step 3: Tighten α to 0.01 → rejection region shrinks, β (misses) increases. The trade-off in action.
  4. Step 4: Drag horizontally on the chart → move the critical boundary and feel the α vs β tug-of-war.

▶ ② Two kinds of errors: α, β, power

This panel is a hands-on playground for the α (Type I) / β (Type II) / power trade-off.
Come back whenever you want to feel the trade-off. For the deeper conceptual write-up (the 2×2 matrix, why α and β live in different worlds), see the "Type I & II in a 2×2 table" column.
Tip: drag horizontally on the chart to slide the critical boundary (α).

α (Type I error)
Critical value
β (Type II error)
Power 1−β
UP NEXT — from means to proportions → I5 Proportion Test
I5 / Proportion Test & Estimation

Proportion Test & Estimation — from sample proportion to the truth

The logic is exactly the same as when we did confidence intervals and hypothesis tests for means. The only twist: data is now "success or failure" counts instead of continuous measurements, so the standard error formula changes shape. Play with the sliders here and the textbook formulas read more easily afterward.

A world where every data point is just "success" or "failure." The sample proportion p̂ = x/n is the starting point, and when n is large enough the normal approximation kicks in (thanks to the Central Limit Theorem).
We'll work through ① interval estimation, ② one-sample test, ③ two-sample test. Compare each step with what you learned for means — spotting the similarities makes the differences easy to absorb.
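The normal-approximation (Wald) interval the sliders use can be sketched directly; the three (p̂, n) pairs below mirror the guide's steps:

```python
import math

def prop_ci(p_hat, n, z=1.96):
    """Wald interval: p_hat ± z * sqrt(p_hat * (1 - p_hat) / n)."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

for p_hat, n in [(0.50, 15), (0.50, 100), (0.90, 100)]:
    lo, hi = prop_ci(p_hat, n)
    print(f"p̂={p_hat}, n={n}: ({lo:.3f}, {hi:.3f})")
# n=15 is really wide; n=100 tightens; p̂=0.90 narrows further (smaller p̂(1−p̂))
```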

Experiment Guide — try these in order
  1. Step 1: n=15, p̂=0.50, 95% → the interval is really wide. Just 15 people isn't enough precision.
  2. Step 2: Push n to 100 → the interval tightens up fast. Feel the power of sample size.
  3. Step 3: Set p̂ to 0.90 → the variance term p̂(1−p̂) shrinks, so the CI narrows. p̂=0.50 gives the widest interval.
  4. Step 4: Keep p̂=0.90, drop n to 10 → if ⚠ appears, the normal approximation conditions aren't met.

▶ ① Confidence interval for a proportion

Standard error SE
CI lower bound
CI upper bound
Margin of error E
Simulation — verify what "95%" really means

"What does a 95% CI actually mean?" — easy to lose the thread by chasing the words alone. Reframed: "if you repeated the same survey many times, about 95% of the intervals would capture the true proportion."
Run it 200 times and see whether that 95% claim holds up.

▶ ①-b CI simulation

Generated 0
Coverage rate
Experiment Guide — experience proportion testing
  1. Step 1: n=100, p̂=0.60, p₀=0.50, α=0.05, two-sided → "Is 60% significantly different from 50%?"
  2. Step 2: Change p₀ to 0.55 → z shrinks and you can no longer reject. Small differences are hard to detect.
  3. Step 3: Increase n to 400 → same p̂=0.60, p₀=0.55 but now you reject. The power of sample size.
  4. Step 4: Switch test type to "Right" → one-sided test asking only "greater than 50%?" The p-value halves.

▶ ② One-sample z-test for a proportion

Same flow as testing a population mean. Set up H₀: p = p₀, compute z from the sample, and check whether it lands in the rejection region.
The only twist is that the standard error becomes √(p₀(1−p₀)/n). Nail that, and the rest is familiar.

Test statistic z
Critical value
p-value
Decision
Experiment Guide — compare two groups
  1. Step 1: n₁=n₂=100, p̂₁=0.60, p̂₂=0.45 → is the difference significant?
  2. Step 2: Move p̂₂ toward 0.55 → z shrinks, rejection gets harder.
  3. Step 3: Increase both to n₁=n₂=400 → same gap, more power.
  4. Step 4: Try n₁=50, n₂=200 (asymmetric) → the smaller n is the bottleneck.

▶ ③ Two-proportion z-test

"Drug A vs. Drug B — which works better?" "Ad A vs. Ad B — is the click-through rate really different?" This test is for comparing two groups.
The key idea: under H₀: p₁ = p₂, we pool both samples into a single pooled proportion to build a shared SE. Get that, and the rest mirrors the one-sample test.
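The pooled-SE idea, sketched in code with the guide's Step 1 numbers (60% vs. 45%, n₁ = n₂ = 100):

```python
import math

def two_prop_z(x1, n1, x2, n2):
    """z statistic for H0: p1 = p2, using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)   # pool both samples under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

z = two_prop_z(60, 100, 45, 100)     # 60% vs 45%
print(round(z, 2))                   # beyond ±1.96 → reject at α = 0.05
```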

Pooled proportion p̂
Test statistic z
p-value
Decision
UP NEXT — into the world where σ is unknown → I6 t, χ², F
I6 / t · χ² · F Distributions

Three Test Distributions — Meet t, χ² & F

Up to now we've tested means assuming σ is known. In practice you must estimate σ too — and the moment you do, Z morphs into t. Test a variance directly: χ². Compare two variances: F. All descendants of N(0,1); the name changes based on what you don't know.

t, χ², F are all derived from the normal. Think of them as "the standard normal, scaled to reflect that we only ever see a sample".
Use them for: t — testing a mean when the population variance is unknown (i.e. nearly every real test of a mean); χ² — testing a variance, independence, goodness-of-fit for categorical data; F — ratios of variances (ANOVA, the overall F in regression).
Slide df: t converges to N(0,1) as df→∞, and χ² and F grow more symmetric with more df. Under the hood, the Central Limit Theorem (for the sums of squares inside χ² and F) and the Law of Large Numbers (the sample variance homing in on σ², turning t back into Z) are quietly doing the work.

Experiment Guide — try these in order
  1. Step 1: n=3 → df=1, extremely heavy tails. Compare with the normal (gray dashed line).
  2. Step 2: n=10 → still heavier tails than normal, but getting closer.
  3. Step 3: n=31 → nearly indistinguishable from N(0,1). The CI bars nearly overlap.

▶ t distribution

Built from: t = Z / √(χ²ₖ/k), with Z~N(0,1) independent of χ²ₖ.
Use for: testing means with unknown variance, regression t-values.
Flavor: heavier tails than N(0,1); matches N(0,1) as df→∞.
Confidence:
↔ Drag the graph horizontally to change n
Sample size n 5
df 4
Confidence 95%
t critical ±2.776
z critical ±1.960
t / z ratio 1.416

With small samples, a normal-based CI is too narrow — overconfident. The t-distribution honestly reflects that extra uncertainty. As n grows, t converges to normal — that's what the two bars show.

Experiment Guide — χ² distribution
  1. Step 1: Keep "Fair die" selected and press Roll a few times → even a fair die varies each time. Watch the bar chart and χ² statistic change.
  2. Step 2: Switch to "Loaded" and roll → face 1 jumps out, χ² enters the rejection region.
  3. Step 3: Increase the rolls slider → more samples detect smaller biases (higher power).
  4. Step 4: Move the df slider to explore the shape. df=30 → nearly normal.

▶ χ² distribution

Built from: χ²ₖ = Z₁² + Z₂² + ... + Zₖ² (sum of k squared standard normals).
Use for: variance tests, chi-square tests of independence / goodness-of-fit.
Flavor: non-negative, right-skewed. Mean = k, variance = 2k. Goes bell-shaped with large df.
↔ Drag the graph horizontally to change df
Rolls n 60
df 5
Test statistic χ²
χ² critical 11.070
p-value
Verdict

The chi-squared distribution measures "how big is the gap between observed and expected." It tells you whether that gap is just random noise or a real bias.

Experiment Guide — F distribution
  1. Step 1: Set A and B to the same SD (e.g. 10 and 10) → F≈1, not rejected.
  2. Step 2: Increase B only (e.g. A=5, B=15) → F grows and enters the rejection region.
  3. Step 3: Increase n → same SD gap but lower p-value (more power).
  4. Step 4: Alternate A and B's SD to feel what "variance ratio" means.

▶ F distribution

Built from: F = (χ²ₘ/m) / (χ²ₙ/n) (ratio of two independent χ² / df).
Use for: ANOVA, overall F-test in regression.
Flavor: non-negative, right-skewed. Shape depends on both df.
↔ Drag the graph horizontally to change n
Group A SD 8
Group B SD 15
df₁, df₂ 19, 19
F statistic 3.516
F critical 2.168
p-value 0.005

The F-distribution evaluates the ratio of two groups' spread. ANOVA also uses this F-statistic to test whether group means differ.

UP NEXT — quantifying "deviation" and testing it → I7 Chi-Squared Test
I7 / Chi-Squared Test

Chi-Squared Test — Quantifying the Gap

So far we've tested means. But some data is purely categorical — survey choices, dice outcomes, disease × smoking. The chi-squared test quantifies "deviation from expectation" for these counts. The bigger the mismatch, the brighter the χ² statistic glows.

Goodness-of-fit asks: "Does the observed category distribution match a theoretical one?" Classic example: is the die fair?
Test of independence asks: "Are two categorical variables independent?" Compute χ² = Σ (O−E)²/E across every cell of the contingency table.
Why divide by E? → A deviation of 2 from an expected 10 matters more than 2 from an expected 1,000. Dividing by E turns raw gaps into relative ones.
Both use a χ²-distributed statistic; the p-value is the right-tail area. df = k−1 for goodness-of-fit, (r−1)(c−1) for independence.
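The statistic itself is one line of arithmetic. A sketch with a hypothetical set of 60 die rolls (the counts are made up for illustration):

```python
observed = [8, 9, 10, 11, 12, 10]          # hypothetical counts for faces 1-6
expected = sum(observed) / len(observed)   # 10 per face if the die is fair

chi2 = sum((o - expected) ** 2 / expected for o in observed)
print(chi2)   # 1.0 — far below the df=5 critical value 11.070: fail to reject
```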

Experiment Guide — try these in order
  1. Step 1: ① Goodness-of-fit: select "Fair die" and auto-roll → as n grows, χ² stays low, p-value stays high (fail to reject).
  2. Step 2: Switch to "Loaded" and auto-roll → χ² shoots up, p-value drops below α. The cheat is caught.
  3. Step 3: ② Independence: click only the top-left cell to create imbalance → χ² spikes, independence is rejected.
  4. Step 4: Reset and click cells evenly → χ² stays small. No imbalance = independence holds.

▶ ① Goodness-of-Fit — Is the Die Fair?

🎲 Click a bar on the left to add one roll (Shift+click to subtract)
Rolls n 0
Test statistic χ²
df
p-value
Decision
Experiment Guide — try these in order
  1. Step 1: Click the top-left and bottom-right cells repeatedly → diagonal bias raises χ², verdict: "not independent."
  2. Step 2: Reset, then click all cells evenly → χ² stays small. No bias = independent.
  3. Step 3: Set α to 0.01 → higher bar for rejection. Same data might flip the verdict.
  4. Step 4: Flood a single cell → huge gap from expected frequencies, p-value plummets.

▶ ② Test of Independence — Are Two Variables Independent?

Click a cell on the left to add +1 (Shift+click for −1)
Total n 0
Test statistic χ²
df
p-value
Decision
UP NEXT — comparing three or more groups at once → I8 ANOVA
I8 / ANOVA

ANOVA — Feel What the F-Statistic Really Means

The chi-squared test quantified "category deviations." But how do we test whether means differ across three or more groups? Repeating t-tests inflates false positives — ANOVA compares all groups in a single test.

ANOVA compares "how different the groups are" vs. "how spread out each group is."
When the difference outweighs the spread, we conclude the groups really differ. Move the sliders to feel it.
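"Difference between groups vs. spread within groups" is literally the F formula: F = (SSB/(k−1)) / (SSW/(n−k)). A sketch with three small hypothetical groups whose means clearly differ:

```python
import statistics

# Hypothetical data: three groups, visibly different means, small spread
groups = [[5.1, 4.9, 5.3, 5.0],
          [5.6, 5.8, 5.5, 5.9],
          [4.6, 4.4, 4.7, 4.5]]
k = len(groups)
n = sum(len(g) for g in groups)
grand = statistics.fmean(x for g in groups for x in g)

ssb = sum(len(g) * (statistics.fmean(g) - grand) ** 2 for g in groups)  # between
ssw = sum((x - statistics.fmean(g)) ** 2 for g in groups for x in g)    # within
f = (ssb / (k - 1)) / (ssw / (n - k))
print(round(f, 1))   # a large F: the between-group gap dwarfs the noise
```

Replace the groups with three copies of the same numbers and F collapses toward 1 — exactly Step 1 of the guide.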

Experiment Guide — Multiple Comparisons Problem
  1. Step 1: Click ▶ 100 trials — runs 100 experiments where 3 identical groups are compared with 3 pairwise t-tests.
  2. Step 2: Check the red dot ratio — theory predicts 14.3%. What do you get?
  3. Step 3: Click ▶ 1000 trials — as trials accumulate, the rate converges toward 14.3%.

▶ Why You Can't Just Repeat t-Tests

First, experience why ANOVA is needed. With 3 identical groups (N(0,1), n=20), repeat 3 pairwise t-tests — how bad does the false-positive rate get?
Trials 0
False positives 0
FP rate 0.0%

The F-test (ANOVA) below is how we avoid the false-positive inflation you just saw in the simulation above.

Experiment Guide — Feel the F-Statistic
  1. Step 1: Set between-group difference to zero → F ≈ 1. All groups look like one population.
  2. Step 2: Increase between-group difference → F rises, p drops. Watch it cross the rejection threshold.
  3. Step 3: Increase within-group spread → same difference but F drops. "Real differences can hide in noise."
  4. Step 4: Increase sample size → F rises. Larger samples detect smaller effects (statistical power).

▶ Between vs. Within — Feel the F-Statistic

F
Between df
Within df
p
SSB
SSW
η²
Decision
UP NEXT — measuring the "link" between two variables → M0 Correlation

▼ What comes next — into Modeling

You can now test for differences and independence. Next: finding relationships and predicting.
Correlation measures the "link" between two variables, then simple regression turns it into prediction. Then multiple regression controls for confounders to measure true effects — tests, CIs, and the F-distribution all converge here. The full statistical toolkit comes together.

MODELING

Modeling — finding relationships and making predictions

M0 / Correlation

Correlation — Measuring the Link Between Two Variables

The chi-squared test quantified associations between categories. But how do we measure the relationship between continuous numbers — height vs. weight, study time vs. test scores? → The correlation coefficient r compresses "how much they move together" into a single number from −1 to +1.

The correlation coefficient r measures whether two variables move together. +1 means a perfect positive linear relationship, −1 a perfect negative one, 0 means no linear relationship. Click the canvas to add points and watch r update in real time. The yellow dashed lines mark the means, splitting the space into four quadrants — more points in green quadrants means positive correlation, more in red means negative. This coloring is the "tug-of-war of signs" in the formula Σ(x−x̄)(y−ȳ).
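That tug-of-war formula, written out: r = Σ(x−x̄)(y−ȳ) / √(Σ(x−x̄)² Σ(y−ȳ)²). A sketch that computes it on the real dataset I from Anscombe's quartet (featured later in this section):

```python
import math

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # the tug-of-war sum
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Anscombe's dataset I — famously r ≈ 0.816
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
print(round(pearson_r(x, y), 3))   # 0.816
```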

Experiment Guide — Feel the Correlation
  1. Step 1: Set r = 0.80, click Generate → an upward-sloping band. Points cluster in the green quadrants.
  2. Step 2: Change to r = −0.60, Generate → downward slope. More points in the red quadrants.
  3. Step 3: Set r = 0.00, Generate → points spread evenly across all four quadrants. A "cloud," not a band.
  4. Step 4: CLEAR, then manually place a U-shape → r ≈ 0 yet there's an obvious pattern! r only captures linear relationships.
[Interactive canvas — click to add points. Readouts: n, correlation r, covariance]
Experiment Guide — Peek Behind the Numbers
  1. Step 1: Watch the animation. Points appear one by one, patterns look totally different…
  2. Step 2: Check each plot's r ≈ 0.816. They're all nearly identical!
  3. Step 3: Regression lines appear → lines are nearly identical too. Yet II is curved, III has an outlier, IV is dominated by one point.
  4. Step 4: Click "Replay" to watch again. Numbers alone don't tell the whole story.

▶ The pitfall of r — Anscombe's Quartet

UP NEXT — turning relationships into predictions: M1 Simple Regression
M1 / Simple Regression

Simple Regression — Drawing the OLS Line

Correlation measured "how strong the link is." Next: "if x goes up by 1, how much does y move?" — simple regression turns the relationship into prediction. One line through two variables — and the t-tests and CIs you just learned power the inference on its slope β̂₁.

Regression with just one explanatory variable is simple regression. It assumes a linear relationship: when x increases by 1, y moves by β₁. Ordinary least squares (OLS) picks the line that minimizes the sum of squared vertical residuals. Click the canvas to add points and watch the line snap into place. Green bars are residuals. R² (between 0 and 1) measures how much of the variation in y the line explains.
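The OLS line itself takes only a few lines of arithmetic. A sketch with toy data of my own: the slope is Sxy/Sxx, the intercept anchors the line at the point of means, and R² is one minus the residual sum of squares over the total sum of squares.

```python
# OLS fit and R² by hand (toy data, for illustration only).
def ols(xs, ys):
    xb, yb = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - xb) * (y - yb) for x, y in zip(xs, ys))
    sxx = sum((x - xb) ** 2 for x in xs)
    b1 = sxy / sxx               # slope: y moves b1 per unit of x
    b0 = yb - b1 * xb            # intercept: line passes through (xbar, ybar)
    ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - yb) ** 2 for y in ys)
    return b1, b0, 1 - ss_res / ss_tot

print(ols([1, 2, 3, 4], [2, 4, 6, 8]))  # slope 2.0, intercept 0.0, R² = 1.0
```

Add one far-off point to the lists and rerun: the slope jerks toward it, the same single-outlier drag as Step 2 of the guide.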

Experiment Guide — try these in order
  1. Step 1: Hit "Random 20 pts" → a regression line and R² appear. Check the green bars (residuals).
  2. Step 2: Click far from the line to add one outlier → the line jerks toward it. Watch how far a single point can drag the fit.
  3. Step 3: CLEAR and place 5 points nearly in a line → R² ≈ 1.0. A perfect linear relationship.
  4. Step 4: CLEAR and arrange points in a circle → R² ≈ 0. A line can't capture this pattern.
[Interactive canvas — click to add points. Readouts: n, slope β₁, intercept β₀, correlation r]
UP NEXT — controlling for everything else: M2 Multiple Regression
M2 / Multiple Regression

Multiple Regression — Predict with multiple variables

Simple regression predicted scores from study hours alone. But what if heavy studiers sleep less — and lost sleep drags scores down? The true effect of studying gets masked by a hidden variable. Multiple regression controls for other factors to isolate each variable's real contribution.

One variable gives a line; two give a plane in 3D. But the point isn't geometry — it's removing confounding to isolate each variable's true effect. Start with the side-by-side comparison to see the moment β₁ shifts.

Experiment Guide — Feel Confounding
  1. Step 1: At default (corr = −0.5), compare β₁ left vs. right. Simple regression is smaller — the study effect is underestimated.
  2. Step 2: Set correlation to 0 → both β₁ values nearly match. "No confounding, no bias."
  3. Step 3: Set correlation to +0.5 → now simple regression β₁ is too large. Confounding can bias in either direction.
  4. Step 4: Check R² too. Multiple regression is always ≥ simple — adding a variable improved explanatory power.

▶ Simple vs. Multiple Regression — Watch β₁ Shift

Same data, two models: "study hours only" vs. "study + sleep." Adjust the correlation slider to change confounding strength.
[Readouts: simple β₁, multiple β₁, gap (confounding bias), simple R², multiple R²]
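The same comparison fits in a short script. This sketch uses hypothetical numbers of my own, built so that y = 2·study + 1·sleep exactly, with study and sleep negatively correlated; the two-predictor coefficients come from the closed-form normal equations for centered data.

```python
# Simple vs. multiple regression on the same toy data (my own numbers).
def center(v):
    m = sum(v) / len(v)
    return [x - m for x in v]

study = [0, 1, 2, 3, 4]
sleep = [4, 2, 3, 1, 0]                       # heavy studiers sleep less
y = [2 * s + t for s, t in zip(study, sleep)] # true effects: 2 and 1

x1, x2, yc = center(study), center(sleep), center(y)
s11 = sum(a * a for a in x1)
s22 = sum(a * a for a in x2)
s12 = sum(a * b for a, b in zip(x1, x2))
s1y = sum(a * b for a, b in zip(x1, yc))
s2y = sum(a * b for a, b in zip(x2, yc))

simple_b1 = s1y / s11                         # ignores sleep entirely
det = s11 * s22 - s12 ** 2
multi_b1 = (s22 * s1y - s12 * s2y) / det      # controls for sleep
multi_b2 = (s11 * s2y - s12 * s1y) / det

print(simple_b1, multi_b1, multi_b2)  # 1.1, 2.0, 1.0 — simple underestimates
```

Because the confounder correlates negatively with study while pushing y up, the simple slope lands at 1.1 instead of the true 2.0 — the underestimation from Step 1. Flip the sleep column so the correlation turns positive and the bias flips sign too.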
Experiment Guide — Feel the Regression Plane in 3D
  1. Step 1: Drag the 3D plot to rotate. The translucent surface is the regression plane — data points align along it.
  2. Step 2: Move the study slider → the prediction dot slides along the x₁ direction. The tilt = β₁.
  3. Step 3: Move sleep too → it moves along x₂. The tilt = β₂. Each variable's contribution is visible.
  4. Step 4: Hit Resample a few times → β₁, β₂, R² shift slightly each time. Estimates have variability too.
[Readouts: β̂₁ (per study hour), β̂₂ (per sleep hour)]

▼ The Big Picture

From the standard normal through probability, distributions, inference, and regression — the core scope of introductory statistics, end to end.
Every page is a descendant of N(0,1): t = "normal with unknown σ," χ² = "sum of squared normals," F = "ratio of χ²'s," regression tests use t and F. Dive deeper into each topic's dedicated page for formulas and detailed explanations.
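The family tree can even be checked by simulation. A quick Monte Carlo sketch of my own: square and sum k standard normals to build a χ²_k draw, then take a ratio of two scaled χ²'s to build an F draw, and confirm the known means (k for χ²_k, d₂/(d₂−2) for F).

```python
import random

# Monte Carlo sanity check of the N(0,1) family tree (illustrative sketch).
random.seed(0)

def chi2(k):
    # chi-squared with k df = sum of k squared standard normals
    return sum(random.gauss(0, 1) ** 2 for _ in range(k))

n = 50_000
chi2_3 = [chi2(3) for _ in range(n)]
f_3_10 = [(chi2(3) / 3) / (chi2(10) / 10) for _ in range(n)]  # F(3, 10)

print(sum(chi2_3) / n)  # ≈ 3 (mean of chi²_k is k)
print(sum(f_3_10) / n)  # ≈ 1.25 (mean of F(d1, d2) is d2 / (d2 - 2))
```

Swap in other degrees of freedom and the sample means track the textbook formulas — the same lineage the pages above walked through.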