
STATPLAY

[ PROBABILITY · INFERENCE · MODELING ]

Statistics is way more fun than it looks.
Forget rote formulas — here the graphs move on their own, and the idea takes shape as you watch.

Come here when your textbook makes sense — but nothing becomes a picture.

▼ What this site is for
If you're learning statistics on your own and got stuck at "wait, what does this formula even mean?" — this is your first step back.
This is a place to rebuild intuition. Systematic study and calculation drills belong to your textbook and problem set. But the "classic misreading" of the 95% confidence interval, or why the t-distribution even exists — drag a slider and you'll see it.
Once "oh, that's what it means" lands, go back to your textbook with confidence. That's the whole point.
▼ Where did your textbook lose you?
You don't have to go through everything in order. Jump straight to the concept that made you stop. You're not the only one who got stuck there.
What even is a normal distribution? | What is standardization actually doing? | How do I actually run a test? | Bayes' theorem makes no sense | The Central Limit Theorem stays fuzzy
Once "oh, that's what it means" hits, go back to your textbook — it'll read differently.

Columns

Formulas alone rarely make the picture vivid. Start from everyday surprises and the math behind them becomes intuitive.
What Is Hensachi? — Japan's School Score Is Just a Rescaled z-Score →
A 10-point gap in hensachi holds 5× the rarity. See what the number really measures, interactively. Related: Standard Normal · Normal Distribution · Central Limit Theorem
The Birthday Paradox — 23 People, 50%+ Chance →
With just 23 people, a shared birthday is more likely than not. Experience the paradox interactively. Related: Probability Rules · Discrete & Exponential · Chi-Squared Test
What Is Standardization? — The Universal Translator for "Normal" →
182 cm tall vs. TOEIC 860 — which is further from "average"? Learn how Z=(X−μ)/σ lets you compare across units in 3 minutes. Related: Standard Normal · Normal Distribution · Central Limit Theorem · Law of Large Numbers · Hypothesis Testing · Confidence Interval
Income Prediction — How Far Can Statistics Go? →
Age, gender, prefecture — just three inputs reveal a range of income. A hands-on column on multiple regression. Related: Simple Regression · Multiple Regression · Correlation · Confidence Interval
Type I vs Type II Errors — One 2×2 Table Sorts Them Out →
α, β, power, and effect size all live on a single 2×2 table. Once you see it, the confusion disappears. Related: Hypothesis Testing · Confidence Interval · Bayes' Theorem · ANOVA · Proportion Test · Three Test Distributions
Standard Deviation vs Standard Error — Telling SD and SE Apart in One Picture →
Standard deviation (SD) measures the spread of individuals; standard error (SE) measures the precision of the mean. Same-looking formulas, opposite reactions to n. Related: Central Limit Theorem · Confidence Interval · Hypothesis Testing · Law of Large Numbers · Three Test Distributions

Tools

Interactive Distribution Tables — Normal · t · χ² · F →
Four distribution tables from the back of your textbook, synced with live graphs. Click any cell and see its position highlighted on the curve in real time.
PROBABILITY

Probability & Distributions — quantifying uncertainty

P1 / Standard Normal

Standard Normal — The Origin of Everything

Let's start with the curve that dominates all of statistics — the standard normal. The Central Limit Theorem, hypothesis testing, confidence intervals — everything circles back here. Touch the bell curve first.

Honestly — without this single curve, none of what follows (tests, confidence intervals, the t-distribution, regression) would work.
The standard normal N(0, 1) is a bell curve with mean 0 and standard deviation 1. The one-line trick z = (x − μ)/σ lets every normal distribution collapse onto this same curve — and that's how a single paper table can compute probabilities for the entire world.
In other words, it's not the final boss of statistics; it's the origin. Once you own this, the rest reads as "applications of the standard normal".

Experiment Guide — try these in order
  1. Step 1: Set k to 1.0 → area is ~68%. That's "±1σ covers about 70%."
  2. Step 2: Set k to 1.96 → area is ~95%. You'll see this number everywhere in testing and CIs.
  3. Step 3: Stretch k to 3.0 → nearly 100%. Almost nothing lies outside.

▶ "68 - 95 - 99.7" — no memorization, just see it

Slide the width k; the blue-filled area IS the probability. ±1σ already covers ~68%, ±2σ is ~95%, ±3σ is nearly everything.
That famous number z = 1.96? It's the two-tail 5% critical value — hypothesis tests and confidence intervals all start there.
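If you want to double-check the slider's readouts off-screen, here's a minimal sketch using only Python's standard library: for a standard normal, the central area P(|Z| ≤ k) equals erf(k/√2).

```python
import math

def central_area(k):
    """P(|Z| <= k) for a standard normal: erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))

for k in (1.0, 1.96, 3.0):
    print(f"P(|Z| <= {k}) = {central_area(k):.4f}")
# 0.6827, 0.9500, 0.9973 — the 68-95-99.7 rule, plus the famous 1.96
```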

P( |Z| ≤ k )
P( Z ≤ k )
Outside prob.
Experiment Guide — try these in order
  1. Step 1: Press ▶ Standardize → the μ=2, σ=1.5 curve morphs smoothly into N(0,1).
  2. Step 2: Change μ to −2, σ to 2.5, then ▶ again → a totally different curve snaps onto the same pink one.
  3. Step 3: Pause the progress slider midway → watch μ approach 0 and σ approach 1 in real time.
  4. Step 4: Set σ to 0.5 → a sharp peak flattens as it merges into the standard normal.

▶ Watch every normal collapse onto "that one curve"

Height, IQ, blood pressure readings, factory part errors — real-world normal-ish things all have different means and spreads. Yet apply z = (x − μ) / σ and they all snap onto that pink curve.
It auto-plays on scroll (▶ to replay). That's why every statistical formula needs only one standard-normal table.

Original N(2.0, 1.5²)
Transformed mean
Transformed σ
UP NEXT — the normal as a tool → P2 Normal Distribution
P2 / Normal & Standardization

Normal Distribution — Shaping Mean & Spread

The standard normal was fixed at μ=0, σ=1. Real data has any mean and spread. Move μ and σ to wield the normal as a tool. Standardization maps it back to Z, so any normal connects to the standard normal.

The general version of the standard normal is N(μ, σ²). μ sets the center, σ sets the spread. Slide the parameters and the curve glides; the probability of falling inside [a, b] (pink area) updates live.
That pink area IS the "percentage" you hear in the news. Say adult male heights follow N(170, 36) (mean 170cm, σ=6cm). What share falls in 165–175cm? Standardize and compute z-scores — you get ≈ 59.5%. Test scores, measurement errors, IQ — anything roughly normal gets its "X% of people in this range" from exactly this area. The sliders below use a standardized scale (μ=0, σ=1 range) so you can feel the same principle.
Tip: drag directly on the graph to move the a/b bounds — whichever handle is closest follows your finger.
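The height example works out the same way in a few lines of standard-library Python — the normal CDF is just a shifted, scaled error function:

```python
import math

def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma^2)."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

# Heights ~ N(170, 6^2): what share falls between 165 and 175 cm?
share = normal_cdf(175, 170, 6) - normal_cdf(165, 170, 6)
print(f"{share:.1%}")  # ≈ 59.5%
```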

Experiment Guide — try these in order
  1. Step 1: Keep μ=0, σ=1, set a=−1, b=1 → ~68.3%. That's "±1σ covers ~70%."
  2. Step 2: Shrink σ to 0.5 → the pink area for the same [−1, 1] explodes. Less spread = almost everyone is in range.
  3. Step 3: Slide μ to 2 → the whole curve shifts. Same a,b, but area changes dramatically.
  4. Step 4: Drag directly on the graph → the nearest a/b boundary follows your finger.
P(a ≤ X ≤ b)
z-score (a)
z-score (b)
UP NEXT — probability rules with Venn diagrams → P3 Probability Rules
P3 / Probability Rules

Probability Rules — Intuition with Venn Diagrams

You've got the shape of the normal. Now let's step back to the foundations — the probability rules that make all of it work. Addition, multiplication, and conditional probability visualized as overlapping areas.

Addition rule, multiplication rule, conditional probability — see them as areas before memorizing formulas.
Symbol cheat-sheet: ∪ = "or" (union), ∩ = "and" (intersection), P(A|B) = "probability of A given B happened"
P(A∪B) = P(A) + P(B) − P(A∩B) is just "area of two circles minus the overlap." Conditional probability P(A|B) is "the fraction of B's circle occupied by A."
Press the Independence button to snap P(A∩B) = P(A)·P(B) — that's what independence means.

Concrete example
Draw one card from a 52-card deck. A = heart (13/52 = 0.25), B = face card (12/52 ≈ 0.23).
A∩B = heart face card (3/52 ≈ 0.06). → P(A∪B) = 0.25 + 0.23 − 0.06 = 0.42.
Independence example: two dice. A = 1st is even, B = 2nd is ≥3. The 1st roll doesn't affect the 2nd, so independent. P(A∩B) = 1/2 × 2/3 = 1/3.
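The card example can be checked exactly with Python's `fractions` module — no rounding, just the addition rule and conditional probability as arithmetic:

```python
from fractions import Fraction as F

# One card from a 52-card deck
P_A  = F(13, 52)   # A = heart
P_B  = F(12, 52)   # B = face card (J, Q, K)
P_AB = F(3, 52)    # A ∩ B = heart face card

print(P_A + P_B - P_AB)   # addition rule → 11/26 (= 22/52 ≈ 0.42)
print(P_AB / P_B)         # P(A|B) → 1/4
print(P_AB == P_A * P_B)  # True — suit and rank are in fact independent
```

A bonus the rounded decimals above hide: 3/52 equals (13/52)·(12/52) exactly, so "heart" and "face card" satisfy the independence condition.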
Experiment Guide — try these in order
  1. Step 1: Set P(A)=0.4, P(B)=0.3, P(A∩B)=0.12 → check the P(A∪B) readout below the graph shows 0.58. That's 0.4+0.3−0.12 = 0.58, the addition rule.
  2. Step 2: Press "Set Independent" → P(A∩B) auto-adjusts to P(A)·P(B). That's what independence means.
  3. Step 3: Drag P(A∩B) near 0 → the circles separate. This is "mutually exclusive" (can't happen together).
  4. Step 4: Push P(A∩B) close to P(B) → the P(A|B) readout approaches 1. If B happens, A almost certainly happens too.

▶ Interactive Venn Diagram

P(A∪B)
P(A|B)
P(B|A)
Independent?
UP NEXT — flipping the conditional → P4 Bayes' Theorem
P4 / Bayes' Theorem

Bayes' Theorem — The Posterior Plot Twist

Distribution toolbox complete. Finally, Bayes' theorem flips the conditioning. You test positive — what's the chance you're actually sick? Intuition fails here; let's build it.

"The test has 99% sensitivity & 95% specificity, and you tested positive" — is there a 99% chance you're sick?
Answer: only 16.7%. More than half of doctors get this classic quiz wrong.
Walk through the numbers. In a town of 1,000, 10 people are sick and 990 are healthy. Test everyone and 60 come back positive — 10 truly sick (true positives) + 50 healthy but wrongly flagged (false positives). If you're one of those 60, your chance of actually being sick is 10 ÷ 60 = 16.7%. The larger the healthy majority, the more false positives dilute the real cases.
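The town-of-1,000 arithmetic is Bayes' theorem in three lines. A minimal sketch (the function name `ppv` — positive predictive value — is our label, not from the demo):

```python
def ppv(prevalence, sensitivity, specificity):
    """P(sick | positive) via Bayes' theorem."""
    tp = prevalence * sensitivity              # true-positive mass
    fp = (1 - prevalence) * (1 - specificity)  # false-positive mass
    return tp / (tp + fp)

print(f"{ppv(0.01, 0.99, 0.95):.1%}")  # the town of 1,000 → 16.7%
```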

Experiment Guide — try these in order
  1. Step 1: Sweep prevalence between 0.001 (general population) and 0.4 (high-risk group) → with the same 99% sensitivity and 95% specificity, PPV swings from about 1.9% to about 93%. Feel how strongly the prior dominates.
  2. Step 2: At prevalence 0.001, PPV is just ~1.9% — only ~2 of 100 positives are actually sick. Feel the gap from intuition.
  3. Step 3: Keep prevalence at 0.001 but raise specificity to 99.9% → false positives plummet, PPV improves dramatically.
  4. Step 4: Watch the "town of 1,000" diagram and compare TP (true positive) vs FP (false positive) counts.
True positives (sick & tested +)
False positives (healthy but tested +)
If you tested +, chance you're sick
If you tested NEG, chance you're healthy
UP NEXT — discrete and exponential distributions → P5 Discrete & Exponential
P5 / Binomial / Poisson / Exp

Discrete & Exponential — Counting Probability Models

The normal is continuous and symmetric. But real data isn't always — coin flips are discrete, arrivals follow Poisson, wait times are exponential. Meet the other distributions that round out the toolkit.
Three core distributions every stats learner runs into: binomial, Poisson, and exponential. Slide through success counts, event counts, and waiting times to feel the bridge between discrete and continuous.
Experiment Guide — try these in order
  1. Step 1: Binomial: set n=20, p=0.5 → symmetric bell. Change p to 0.1 → skews right.
  2. Step 2: Set binomial n=50, p=0.06 → np≈3. Compare with Poisson(λ=3) — nearly identical shapes.
  3. Step 3: Set Poisson λ=3 and check "E[X] = Var[X] = λ = 3.00" in the top-left of the graph. Raise λ to 20 → it approaches a bell.
When? → How many heads in 10 coin flips; how many defects in 100 items
Binomial B(n, p)
20
0.35
As n → ∞ with np → λ, the binomial approaches Poisson.
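Step 2's approximation is easy to verify numerically — a sketch comparing the two pmfs side by side with only the standard library:

```python
from math import comb, exp, factorial

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    return exp(-lam) * lam**k / factorial(k)

# Binomial(n=50, p=0.06) vs Poisson(lambda = np = 3), as in Step 2
for k in range(6):
    print(k, round(binom_pmf(k, 50, 0.06), 4), round(poisson_pmf(k, 3), 4))
```

Every pair of probabilities agrees to within about 0.007 — nearly identical shapes, exactly as the graph shows.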
Experiment Guide — Poisson
  1. Step 1: λ=3 → average of 3 events. Most mass sits between 0 and 6. Check "E[X] = Var[X] = λ = 3.00" at the top-left.
  2. Step 2: Lower λ to 1 → peak shifts to 0. Rare events dominate.
  3. Step 3: Raise λ to 10 → starts looking bell-shaped. The normal is emerging.
  4. Step 4: Push λ to 20 → nearly normal in shape. As λ grows, Poisson can be approximated by a normal distribution (Central Limit Theorem at work).
When? → Phone calls per hour at a call center; traffic accidents per day
Poisson(λ)
3
As λ grows, Poisson approaches the normal distribution.
Experiment Guide — Exponential
  1. Step 1: λ=1 → mean wait time = 1. The curve drops sharply.
  2. Step 2: Set λ to 0.2 → gentle decay. Rare events mean long waits.
  3. Step 3: Set λ to 3 → steep drop. Frequent events = short waits (mean 1/3).
  4. Step 4: Every λ gives the same "L" shape — and however long you've already waited, the remaining wait follows that same distribution. That is memorylessness: past waiting does not affect the future.
When? → Time until next phone call; how long until a light bulb burns out
Exponential(λ) — waiting time
1
Memoryless: past waiting time tells you nothing about the future.
Memoryless Demo — Does "already waited 10 min" matter?
Left: normal-like waiting. The longer you wait, the more likely arrival becomes. Right: exponential. Move t all you want — the curve never changes. That's memorylessness.
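Memorylessness is one line of algebra you can also confirm numerically: for an exponential, P(T > s+t | T > s) = P(T > t). A sketch (λ = 0.2 matches the demo's slider value):

```python
import math

lam = 0.2                       # one event every 5 time units on average

def surv(t):
    """P(T > t) for T ~ Exponential(lam)."""
    return math.exp(-lam * t)

# Already waited s=10 — the chance of waiting t=2 more is unchanged:
s, t = 10.0, 2.0
print(surv(s + t) / surv(s))    # P(T > s+t | T > s)
print(surv(t))                  # P(T > t) — the exact same number
```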
Normal (everyday intuition)
← as t grows, shifts left ("should come soon")
Exponential (memoryless)
← no matter how much t changes, same shape
0
0.20
One Phenomenon, Three Views — A Call Center Hour
One process — "λ calls per hour" — viewed through three distributions simultaneously. Move λ and all three update. Same phenomenon, different angles.
3.0
Binomial B(60, λ/60)
60 one-minute slots.
Each minute: call or no call.
Poisson Poi(λ)
How many calls total in one hour?
Exponential Exp(λ)
How long until next call?
(mean 60/λ min)
UP NEXT — averages become normal → I1 Central Limit Theorem

▼ What comes next — into Inference

Probability and distributions are locked in. Next up: inferring population truths from samples.
The Central Limit Theorem guarantees sample means go normal, the Law of Large Numbers says they converge to the truth. Then confidence intervals (how precise?) and hypothesis tests (is there a difference?), capped by the t, χ², F trio for the real world where σ is unknown.

INFERENCE

Statistical Inference — learning about populations from samples

I1 / Central Limit Theorem

Central Limit Theorem — Why Normal Is King

Normal distribution basics down. Now for statistical inference. What shape does the average of many samples take? Here comes the Central Limit Theorem: whatever you start with — dice, Poisson, anything — the average is pulled toward that same normal curve.

A fact worth pausing on — no matter how skewed the base distribution is, if you take n samples and average, then repeat, the distribution of those averages converges on its own to a bell (normal).
The lab below shows left = the raw skewed source side-by-side with right = the sample-mean distribution, so you can watch the bell emerge. Crank n up and the bell tightens (SE = σ/√n).
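The same experiment fits in a few lines of Python: draw n exponentials (heavily skewed), average them, repeat, and check that the spread of the means matches σ/√n. The seed and trial count are arbitrary choices for this sketch.

```python
import random, statistics

random.seed(1)
n, trials = 30, 5000
# Sample means of n draws from a skewed Exponential(1) source (mean 1, sigma 1)
means = [statistics.fmean(random.expovariate(1.0) for _ in range(n))
         for _ in range(trials)]

print(statistics.fmean(means))   # close to the true mean, 1.0
print(statistics.stdev(means))   # close to SE = 1/sqrt(30) ≈ 0.183
```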

Experiment Guide — try these in order
  1. Step 1: Use the dropdown above to choose "Exponential" and drag the n slider to 1, hit ▶ → still heavily skewed. Not a bell at all.
  2. Step 2: Drag the n slider to 5 and run → starting to look bell-ish, but still skewed.
  3. Step 3: Drag the n slider to 30 and run → nearly normal. "n≥30" is a practical rule of thumb, not a theorem — heavily skewed distributions may need more.
  4. Step 4: Switch the dropdown to "Bimodal" and repeat → even a two-peaked distribution morphs into a bell. Worth watching twice.
Trials 0
Mean of sample means
SD of sample means
Theoretical SE = σ/√n
UP NEXT — does the sample mean really converge? → I2 Law of Large Numbers
I2 / Law of Large Numbers

Law of Large Numbers — Converging to Truth

The Central Limit Theorem showed that averages become normal. But does the sample mean actually converge to the true mean as n grows? That guarantee is the Law of Large Numbers. The Central Limit Theorem describes the shape; the Law of Large Numbers says the center won't run away.

10 heads in a row at the start of a coin-flip? Not that weird. But flip it 10,000 times and the head-ratio locks onto almost exactly 0.5.
That's the Law of Large Numbers — the more samples you draw, the more observed values get pulled toward the truth. This is why statistics counts as evidence, not a vague hunch.
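You can reproduce the coin-flip experiment in a few lines — watch the running head-ratio wobble early and lock on as n grows (the seed is an arbitrary choice for this sketch):

```python
import random

random.seed(7)
heads, flips = 0, 0
for n in (10, 100, 1_000, 10_000, 100_000):
    while flips < n:
        heads += random.random() < 0.5   # one fair flip
        flips += 1
    print(n, heads / n)                  # ratio gets pulled toward p = 0.5
```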

Experiment Guide — try these in order
  1. Step 1: Set p=0.5, hit ▶ → the line wobbles wildly at first, then gets pulled toward 0.5.
  2. Step 2: RESET and run again → the early path is different every time, but it always converges.
  3. Step 3: Change p to 0.8 and simulate → the red line now converges to 0.8.
  4. Step 4: Set p to 0.05 (rare event) → it hugs zero early on, but still converges to p. The law holds.
Trials 0
Current mean
Theoretical 0.50
UP NEXT — how to quantify uncertainty with finite n → I3 Confidence Interval
I3 / Confidence Intervals

Confidence Interval — What 95% Really Means

LLN says "at infinity, you're right." But in practice we always have a finite sample. So instead of a single point, drape a net around it — that's a confidence interval. Wider net, easier to catch; narrower, more precise. Watch the trade-off play out.

The 95% confidence interval is famously misunderstood.
It does NOT mean "the true value is inside with 95% probability". The correct reading: "repeat this sampling many times, and ~95% of the resulting intervals will capture the true value".
The lab below brute-forces that intuition. Thin pink = the unlucky intervals that missed. Once the pink share settles around ~5%, you've got it.
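The lab's brute-force logic, as a sketch: build many known-σ 95% intervals around sample means and count how often they capture the true μ. The population values (μ=50, σ=10) and trial count are made up for illustration.

```python
import random, statistics

random.seed(0)
mu, sigma, n, z = 50.0, 10.0, 30, 1.96    # 1.96 = 95% two-sided critical value
trials, hits = 2000, 0
for _ in range(trials):
    xbar = statistics.fmean(random.gauss(mu, sigma) for _ in range(n))
    half = z * sigma / n ** 0.5           # known-sigma interval half-width
    hits += xbar - half <= mu <= xbar + half

print(hits / trials)   # settles around 0.95 — ~5% of intervals miss
```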

Experiment Guide — try these in order
  1. Step 1: At 95% confidence, n=30, hit ▶ → pink (missed) intervals should be ~5% of the total.
  2. Step 2: Drop confidence to 80% and regenerate → more pink. Narrower net = more misses.
  3. Step 3: Back to 95%, set n to 200 → intervals get much tighter. The power of large samples.
  4. Step 4: Set n to 5 → intervals are huge. With few samples, you need a wide net to catch the truth.
Intervals built 0
Coverage
Expected 95%
UP NEXT — from width to yes/no → I4 Hypothesis Testing
I4 / Hypothesis Testing

Hypothesis Testing — Reject or Fail to Reject

If a CI expresses uncertainty as a width, hypothesis testing turns it into a yes/no decision. Under the null world, could this data have happened? If it's too unlikely, reject. Same distribution, same σ/√n — just a different question.

Think of testing as a trial.
You start by assuming H₀ ("the drug has no effect" = "innocent"). Then if your computed test statistic z lands in the pre-chosen rejection region, you convict — that is, reject H₀.
Two panels below: ① geometry of z and rejection regions (two-sided, right, left), and ② false alarms (α) vs. misses (β).
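Panel ①'s p-value readout is just tail area under the standard normal — a minimal sketch of the z-test decision, stdlib only:

```python
import math

def p_value(z, tail="two-sided"):
    """p-value for an observed z under the standard normal."""
    upper = 1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))  # P(Z > |z|)
    return 2 * upper if tail == "two-sided" else upper

print(round(p_value(1.96), 3))           # ≈ 0.05 — right on the watershed
print(round(p_value(2.5), 3))            # deep in the rejection zone
print(round(p_value(1.96, "right"), 3))  # one-sided: the p-value halves
```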

Experiment Guide — try these in order
  1. Step 1: Panel ①: z=1.96, α=0.05, two-sided → right on the boundary. p≈0.05. The watershed.
    α (significance level) = the threshold for "too extreme to be coincidence." It sets the width of the rejection zone.
  2. Step 2: Drag the "Observed z" slider to 2.5 → deep in the rejection zone, p-value shrinks. "Strong evidence."
  3. Step 3: Switch the test-type dropdown to "Right" → same z=1.96 but rejection area is one-sided; p-value halves.

▶ ① Basics: z-statistic & rejection region

Test statistic z
Critical value
p-value
Decision
Experiment Guide — try these in order
  1. Step 1: δ=2, α=0.05 → high power (most of the purple curve falls in the rejection region).
  2. Step 2: Lower δ to 0.5 → purple and blue nearly overlap; power drops sharply. Small effects are hard to detect.
  3. Step 3: Tighten α to 0.01 → rejection region shrinks, β (misses) increases. The trade-off in action.
  4. Step 4: Drag horizontally on the chart → move the critical boundary and feel the α vs β tug-of-war.

▶ ② Two kinds of errors: α, β, power

This panel is a hands-on playground for the α (Type I) / β (Type II) / power trade-off.
Come back whenever you want to feel the trade-off. For the deeper conceptual write-up (the 2×2 matrix, why α and β live in different worlds), see the "Type I & II in a 2×2 table" column.
Tip: drag horizontally on the chart to slide the critical boundary (α).

α (Type I error)
Critical value
β (Type II error)
Power 1−β
UP NEXT — from means to proportions → I5 Proportion Test
I5 / Proportion Test & Estimation

Proportion Test & Estimation — from sample proportion to the truth

The logic is exactly the same as when we did confidence intervals and hypothesis tests for means. The only twist: data is now "success or failure" counts instead of continuous measurements, so the standard error formula changes shape. Play with the sliders here and the textbook formulas read more easily afterward.

A world where every data point is just "success" or "failure." The sample proportion p̂ = x/n is the starting point, and when n is large enough the normal approximation kicks in (thanks to the Central Limit Theorem).
We'll work through ① interval estimation, ② one-sample test, ③ two-sample test. Compare each step with what you learned for means — spotting the similarities makes the differences easy to absorb.
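The normal-approximation (Wald) interval the sliders use can be sketched directly; the three (p̂, n) pairs below mirror the guide's steps:

```python
import math

def prop_ci(p_hat, n, z=1.96):
    """Wald interval: p_hat ± z * sqrt(p_hat * (1 - p_hat) / n)."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

for p_hat, n in [(0.50, 15), (0.50, 100), (0.90, 100)]:
    lo, hi = prop_ci(p_hat, n)
    print(f"p̂={p_hat}, n={n}: ({lo:.3f}, {hi:.3f})")
# n=15 is really wide; n=100 tightens; p̂=0.90 narrows further (smaller p̂(1−p̂))
```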

Experiment Guide — try these in order
  1. Step 1: n=15, p̂=0.50, 95% → the interval is really wide. Just 15 people isn't enough precision.
  2. Step 2: Push n to 100 → the interval tightens up fast. Feel the power of sample size.
  3. Step 3: Set p̂ to 0.90 → the variance term p̂(1−p̂) shrinks, so the CI narrows. p̂=0.50 gives the widest interval.
  4. Step 4: Keep p̂=0.90, drop n to 10 → if ⚠ appears, the normal approximation conditions aren't met.

▶ ① Confidence interval for a proportion

Standard error SE
CI lower bound
CI upper bound
Margin of error E
Simulation — verify what "95%" really means

"What does a 95% CI actually mean?" — easy to lose the thread by chasing the words alone. Reframed: "if you repeated the same survey many times, about 95% of the intervals would capture the true proportion."
Run it 200 times and see whether that 95% claim holds up.

▶ ①-b CI simulation

Generated 0
Coverage rate
Experiment Guide — experience proportion testing
  1. Step 1: n=100, p̂=0.60, p₀=0.50, α=0.05, two-sided → "Is 60% significantly different from 50%?"
  2. Step 2: Change p₀ to 0.55 → z shrinks and you can no longer reject. Small differences are hard to detect.
  3. Step 3: Increase n to 400 → same p̂=0.60, p₀=0.55 but now you reject. The power of sample size.
  4. Step 4: Switch test type to "Right" → one-sided test asking only "greater than 50%?" The p-value halves.

▶ ② One-sample z-test for a proportion

Same flow as testing a population mean. Set up H₀: p = p₀, compute z from the sample, and check whether it lands in the rejection region.
The only twist is that the standard error becomes √(p₀(1−p₀)/n). Nail that, and the rest is familiar.

Test statistic z
Critical value
p-value
Decision
Experiment Guide — compare two groups
  1. Step 1: n₁=n₂=100, p̂₁=0.60, p̂₂=0.45 → is the difference significant?
  2. Step 2: Move p̂₂ toward 0.55 → z shrinks, rejection gets harder.
  3. Step 3: Increase both to n₁=n₂=400 → same gap, more power.
  4. Step 4: Try n₁=50, n₂=200 (asymmetric) → the smaller n is the bottleneck.

▶ ③ Two-proportion z-test

"Drug A vs. Drug B — which works better?" "Ad A vs. Ad B — is the click-through rate really different?" This test is for comparing two groups.
The key idea: under H₀: p₁ = p₂, we pool both samples into a single pooled proportion to build a shared SE. Get that, and the rest mirrors the one-sample test.
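The pooled-SE idea, sketched in code with the guide's Step 1 numbers (60% vs. 45%, n₁ = n₂ = 100):

```python
import math

def two_prop_z(x1, n1, x2, n2):
    """z statistic for H0: p1 = p2, using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)   # pool both samples under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

z = two_prop_z(60, 100, 45, 100)     # 60% vs 45%
print(round(z, 2))                   # beyond ±1.96 → reject at α = 0.05
```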

Pooled proportion p̂
Test statistic z
p-value
Decision
UP NEXT — into the world where σ is unknown → I6 t, χ², F
I6 / t · χ² · F Distributions

Three Test Distributions — Meet t, χ² & F

Up to now we've tested means assuming σ is known. In practice you must estimate σ too — and the moment you do, Z morphs into t. Test a variance directly: χ². Compare two variances: F. All descendants of N(0,1); the name changes based on what you don't know.

t, χ², F are all derived from the normal. Think of them as "the standard normal, scaled to reflect that we only ever see a sample".
Use them for: t — testing a mean when the population variance is unknown (i.e. nearly every real test of a mean); χ² — testing a variance, independence, goodness-of-fit for categorical data; F — ratios of variances (ANOVA, the overall F in regression).
Slide df: t converges to N(0,1) as df→∞, and χ² and F grow more symmetric with more df. Under the hood, the Central Limit Theorem (for the sums of squares inside χ² and F) and the Law of Large Numbers (the sample variance homing in on σ², turning t back into Z) are quietly doing the work.

Experiment Guide — try these in order
  1. Step 1: n=3 → df=1, extremely heavy tails. Compare with the normal (gray dashed line).
  2. Step 2: n=10 → still heavier tails than normal, but getting closer.
  3. Step 3: n=31 → nearly indistinguishable from N(0,1). The CI bars nearly overlap.

▶ t distribution

Built from: t = Z / √(χ²ₖ/k), with Z~N(0,1) independent of χ²ₖ.
Use for: testing means with unknown variance, regression t-values.
Flavor: heavier tails than N(0,1); matches N(0,1) as df→∞.
Confidence:
↔ Drag the graph horizontally to change n
Sample size n 5
df 4
Confidence 95%
t critical ±2.776
z critical ±1.960
t / z ratio 1.416

With small samples, a normal-based CI is too narrow — overconfident. The t-distribution honestly reflects that extra uncertainty. As n grows, t converges to normal — that's what the two bars show.

Experiment Guide — χ² distribution
  1. Step 1: Keep "Fair die" selected and press Roll a few times → even a fair die varies each time. Watch the bar chart and χ² statistic change.
  2. Step 2: Switch to "Loaded" and roll → face 1 jumps out, χ² enters the rejection region.
  3. Step 3: Increase the rolls slider → more samples detect smaller biases (higher power).
  4. Step 4: Move the df slider to explore the shape. df=30 → nearly normal.

▶ χ² distribution

Built from: χ²ₖ = Z₁² + Z₂² + ... + Zₖ² (sum of k squared standard normals).
Use for: variance tests, chi-square tests of independence / goodness-of-fit.
Flavor: non-negative, right-skewed. Mean = k, variance = 2k. Goes bell-shaped with large df.
↔ Drag the graph horizontally to change df
Rolls n 60
df 5
Test statistic χ²
χ² critical 11.070
p-value
Verdict

The chi-squared distribution measures "how big is the gap between observed and expected." It tells you whether that gap is just random noise or a real bias.

Experiment Guide — F distribution
  1. Step 1: Set A and B to the same SD (e.g. 10 and 10) → F≈1, not rejected.
  2. Step 2: Increase B only (e.g. A=5, B=15) → F grows and enters the rejection region.
  3. Step 3: Increase n → same SD gap but lower p-value (more power).
  4. Step 4: Alternate A and B's SD to feel what "variance ratio" means.

▶ F distribution

Built from: F = (χ²ₘ/m) / (χ²ₙ/n) (ratio of two independent χ² / df).
Use for: ANOVA, overall F-test in regression.
Flavor: non-negative, right-skewed. Shape depends on both df.
↔ Drag the graph horizontally to change n
Group A SD 8
Group B SD 15
df₁, df₂ 19, 19
F statistic 3.516
F critical 2.168
p-value 0.005

The F-distribution evaluates the ratio of two groups' spread. ANOVA also uses this F-statistic to test whether group means differ.

UP NEXT — quantifying "deviation" and testing it → I7 Chi-Squared Test
I7 / Chi-Squared Test

Chi-Squared Test — Quantifying the Gap

So far we've tested means. But some data is purely categorical — survey choices, dice outcomes, disease × smoking. The chi-squared test quantifies "deviation from expectation" for these counts. The bigger the mismatch, the brighter the χ² statistic glows.

Goodness-of-fit asks: "Does the observed category distribution match a theoretical one?" Classic example: is the die fair?
Test of independence asks: "Are two categorical variables independent?" Compute χ² = Σ (O−E)²/E across every cell of the contingency table.
Why divide by E? → A deviation of 2 from an expected 10 matters more than 2 from an expected 1,000. Dividing by E turns raw gaps into relative ones.
Both use a χ²-distributed statistic; the p-value is the right-tail area. df = k−1 for goodness-of-fit, (r−1)(c−1) for independence.
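The statistic itself is one line of arithmetic. A sketch with a hypothetical set of 60 die rolls (the counts are made up for illustration):

```python
observed = [8, 9, 10, 11, 12, 10]          # hypothetical counts for faces 1-6
expected = sum(observed) / len(observed)   # 10 per face if the die is fair

chi2 = sum((o - expected) ** 2 / expected for o in observed)
print(chi2)   # 1.0 — far below the df=5 critical value 11.070: fail to reject
```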

Experiment Guide — try these in order
  1. Step 1: ① Goodness-of-fit: select "Fair die" and auto-roll → as n grows, χ² stays low, p-value stays high (fail to reject).
  2. Step 2: Switch to "Loaded" and auto-roll → χ² shoots up, p-value drops below α. The cheat is caught.
  3. Step 3: ② Independence: click only the top-left cell to create imbalance → χ² spikes, independence is rejected.
  4. Step 4: Reset and click cells evenly → χ² stays small. No imbalance = independence holds.

▶ ① Goodness-of-Fit — Is the Die Fair?

🎲 Click a bar on the left to add one roll (Shift+click to subtract)
Rolls n 0
Test statistic χ²
df
p-value
Decision
Experiment Guide — try these in order
  1. Step 1: Click the top-left and bottom-right cells repeatedly → diagonal bias raises χ², verdict: "not independent."
  2. Step 2: Reset, then click all cells evenly → χ² stays small. No bias = independent.
  3. Step 3: Set α to 0.01 → higher bar for rejection. Same data might flip the verdict.
  4. Step 4: Flood a single cell → huge gap from expected frequencies, p-value plummets.

▶ ② Test of Independence — Are Two Variables Independent?

Click a cell on the left to add +1 (Shift+click for −1)
Total n 0
Test statistic χ²
df
p-value
Decision
UP NEXT — comparing three or more groups at once → I8 ANOVA
I8 / ANOVA

ANOVA — Feel What the F-Statistic Really Means

The chi-squared test quantified "category deviations." But how do we test whether means differ across three or more groups? Repeating t-tests inflates false positives — ANOVA compares all groups in a single test.

ANOVA compares "how different the groups are" vs. "how spread out each group is."
When the difference outweighs the spread, we conclude the groups really differ. Move the sliders to feel it.
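"Difference between groups vs. spread within groups" is literally the F formula: F = (SSB/(k−1)) / (SSW/(n−k)). A sketch with three small hypothetical groups whose means clearly differ:

```python
import statistics

# Hypothetical data: three groups, visibly different means, small spread
groups = [[5.1, 4.9, 5.3, 5.0],
          [5.6, 5.8, 5.5, 5.9],
          [4.6, 4.4, 4.7, 4.5]]
k = len(groups)
n = sum(len(g) for g in groups)
grand = statistics.fmean(x for g in groups for x in g)

ssb = sum(len(g) * (statistics.fmean(g) - grand) ** 2 for g in groups)  # between
ssw = sum((x - statistics.fmean(g)) ** 2 for g in groups for x in g)    # within
f = (ssb / (k - 1)) / (ssw / (n - k))
print(round(f, 1))   # a large F: the between-group gap dwarfs the noise
```

Replace the groups with three copies of the same numbers and F collapses toward 1 — exactly Step 1 of the guide.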

Experiment Guide — Multiple Comparisons Problem
  1. Step 1: Click ▶ 100 trials — runs 100 experiments where 3 identical groups are compared with 3 pairwise t-tests.
  2. Step 2: Check the red dot ratio — theory predicts 14.3%. What do you get?
  3. Step 3: Click ▶ 1000 trials — as trials accumulate, the rate converges toward 14.3%.

▶ Why You Can't Just Repeat t-Tests

First, experience why ANOVA is needed. With 3 identical groups (N(0,1), n=20), repeat 3 pairwise t-tests — how bad does the false-positive rate get?
Trials 0
False positives 0
FP rate 0.0%

The F-test (ANOVA) below is how we avoid the false-positive inflation you just saw in the simulation above.

Experiment Guide — Feel the F-Statistic
  1. Step 1: Set between-group difference to zero → F ≈ 1. All groups look like one population.
  2. Step 2: Increase between-group difference → F rises, p drops. Watch it cross the rejection threshold.
  3. Step 3: Increase within-group spread → same difference but F drops. "Real differences can hide in noise."
  4. Step 4: Increase sample size → F rises. Larger samples detect smaller effects (statistical power).

▶ Between vs. Within — Feel the F-Statistic

F
Between df
Within df
p
SSB
SSW
η²
Decision
UP NEXT — measuring the "link" between two variables → M0 Correlation

▼ What comes next — into Modeling

You can now test for differences and independence. Next: finding relationships and predicting.
Correlation measures the "link" between two variables, then simple regression turns it into prediction. Then multiple regression controls for confounders to measure true effects — tests, CIs, and the F-distribution all converge here. The full statistical toolkit comes together.

MODELING

Modeling — finding relationships and making predictions

M0 / Correlation

Correlation — Measuring the Link Between Two Variables

The chi-squared test quantified associations between categories. But how do we measure the relationship between continuous numbers — height vs. weight, study time vs. test scores? → The correlation coefficient r compresses "how much they move together" into a single number from −1 to +1.

The correlation coefficient r measures whether two variables move together. +1 means a perfect positive linear relationship, −1 a perfect negative one, 0 means no linear relationship. Click the canvas to add points and watch r update in real time. The yellow dashed lines mark the means, splitting the space into four quadrants — more points in green quadrants means positive correlation, more in red means negative. This coloring is the "tug-of-war of signs" in the formula Σ(x−x̄)(y−ȳ).
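That tug-of-war formula, written out: r = Σ(x−x̄)(y−ȳ) / √(Σ(x−x̄)² Σ(y−ȳ)²). A sketch that computes it on the real dataset I from Anscombe's quartet (featured later in this section):

```python
import math

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # the tug-of-war sum
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Anscombe's dataset I — famously r ≈ 0.816
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
print(round(pearson_r(x, y), 3))   # 0.816
```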

Experiment Guide — Feel the Correlation
  1. Step 1: Set r = 0.80, click Generate → an upward-sloping band. Points cluster in the green quadrants.
  2. Step 2: Change to r = −0.60, Generate → downward slope. More points in the red quadrants.
  3. Step 3: Set r = 0.00, Generate → points spread evenly across all four quadrants. A "cloud," not a band.
  4. Step 4: CLEAR, then manually place a U-shape → r ≈ 0 yet there's an obvious pattern! r only captures linear relationships.
[Interactive canvas — click to add points. Readouts: n, correlation r, covariance]
Experiment Guide — Peek Behind the Numbers
  1. Step 1: Watch the animation. Points appear one by one, patterns look totally different…
  2. Step 2: Check each plot's r ≈ 0.816. They're all nearly identical!
  3. Step 3: Regression lines appear → lines are nearly identical too. Yet II is curved, III has an outlier, IV is dominated by one point.
  4. Step 4: Click "Replay" to watch again. Numbers alone don't tell the whole story.

▶ The pitfall of r — Anscombe's Quartet

UP NEXT — turning relationships into predictions: M1 Simple Regression
M1 / Simple Regression

Simple Regression — Drawing the OLS Line

Correlation measured "how strong the link is." Next: "if x goes up by 1, how much does y move?" — simple regression turns the relationship into prediction. One line through two variables — and the t-tests and CIs you just learned power the inference on its slope β̂₁.

Regression with just one explanatory variable is simple regression. It assumes a linear relationship: when x increases by 1, y moves by β₁. Ordinary least squares (OLS) picks the line that minimizes the sum of squared vertical residuals. Click the canvas to add points and watch the line snap into place. Green bars are residuals. R² (between 0 and 1) measures how much of the variation in y the line explains.
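The OLS line itself takes only a few lines of arithmetic. A sketch with toy data of my own: the slope is Sxy/Sxx, the intercept anchors the line at the point of means, and R² is one minus the residual sum of squares over the total sum of squares.

```python
# OLS fit and R² by hand (toy data, for illustration only).
def ols(xs, ys):
    xb, yb = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - xb) * (y - yb) for x, y in zip(xs, ys))
    sxx = sum((x - xb) ** 2 for x in xs)
    b1 = sxy / sxx               # slope: y moves b1 per unit of x
    b0 = yb - b1 * xb            # intercept: line passes through (xbar, ybar)
    ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - yb) ** 2 for y in ys)
    return b1, b0, 1 - ss_res / ss_tot

print(ols([1, 2, 3, 4], [2, 4, 6, 8]))  # slope 2.0, intercept 0.0, R² = 1.0
```

Add one far-off point to the lists and rerun: the slope jerks toward it, the same single-outlier drag as Step 2 of the guide.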

Experiment Guide — try these in order
  1. Step 1: Hit "Random 20 pts" → a regression line and R² appear. Check the green bars (residuals).
  2. Step 2: Click far from the line to add one outlier → the line jerks toward it. Watch how far a single point can drag the fit.
  3. Step 3: CLEAR and place 5 points nearly in a line → R² ≈ 1.0. A perfect linear relationship.
  4. Step 4: CLEAR and arrange points in a circle → R² ≈ 0. A line can't capture this pattern.
[Interactive canvas — click to add points. Readouts: n, slope β₁, intercept β₀, correlation r]
UP NEXT — controlling for everything else: M2 Multiple Regression
M2 / Multiple Regression

Multiple Regression — Predict with multiple variables

Simple regression predicted scores from study hours alone. But what if heavy studiers sleep less — and lost sleep drags scores down? The true effect of studying gets masked by a hidden variable. Multiple regression controls for other factors to isolate each variable's real contribution.

One variable gives a line; two give a plane in 3D. But the point isn't geometry — it's removing confounding to isolate each variable's true effect. Start with the side-by-side comparison to see the moment β₁ shifts.

Experiment Guide — Feel Confounding
  1. Step 1: At default (corr = −0.5), compare β₁ left vs. right. Simple regression is smaller — the study effect is underestimated.
  2. Step 2: Set correlation to 0 → both β₁ values nearly match. "No confounding, no bias."
  3. Step 3: Set correlation to +0.5 → now simple regression β₁ is too large. Confounding can bias in either direction.
  4. Step 4: Check R² too. Multiple regression is always ≥ simple — adding a variable improved explanatory power.

▶ Simple vs. Multiple Regression — Watch β₁ Shift

Same data, two models: "study hours only" vs. "study + sleep." Adjust the correlation slider to change confounding strength.
[Readouts: simple β₁, multiple β₁, gap (confounding bias), simple R², multiple R²]
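The same comparison fits in a short script. This sketch uses hypothetical numbers of my own, built so that y = 2·study + 1·sleep exactly, with study and sleep negatively correlated; the two-predictor coefficients come from the closed-form normal equations for centered data.

```python
# Simple vs. multiple regression on the same toy data (my own numbers).
def center(v):
    m = sum(v) / len(v)
    return [x - m for x in v]

study = [0, 1, 2, 3, 4]
sleep = [4, 2, 3, 1, 0]                       # heavy studiers sleep less
y = [2 * s + t for s, t in zip(study, sleep)] # true effects: 2 and 1

x1, x2, yc = center(study), center(sleep), center(y)
s11 = sum(a * a for a in x1)
s22 = sum(a * a for a in x2)
s12 = sum(a * b for a, b in zip(x1, x2))
s1y = sum(a * b for a, b in zip(x1, yc))
s2y = sum(a * b for a, b in zip(x2, yc))

simple_b1 = s1y / s11                         # ignores sleep entirely
det = s11 * s22 - s12 ** 2
multi_b1 = (s22 * s1y - s12 * s2y) / det      # controls for sleep
multi_b2 = (s11 * s2y - s12 * s1y) / det

print(simple_b1, multi_b1, multi_b2)  # 1.1, 2.0, 1.0 — simple underestimates
```

Because the confounder correlates negatively with study while pushing y up, the simple slope lands at 1.1 instead of the true 2.0 — the underestimation from Step 1. Flip the sleep column so the correlation turns positive and the bias flips sign too.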
Experiment Guide — Feel the Regression Plane in 3D
  1. Step 1: Drag the 3D plot to rotate. The translucent surface is the regression plane — data points align along it.
  2. Step 2: Move the study slider → the prediction dot slides along the x₁ direction. The tilt = β₁.
  3. Step 3: Move sleep too → it moves along x₂. The tilt = β₂. Each variable's contribution is visible.
  4. Step 4: Hit Resample a few times → β₁, β₂, R² shift slightly each time. Estimates have variability too.
[Readouts: β̂₁ (per study hour), β̂₂ (per sleep hour)]

▼ The Big Picture

From the standard normal through probability, distributions, inference, and regression — the core scope of introductory statistics, end to end.
Every page is a descendant of N(0,1): t = "normal with unknown σ," χ² = "sum of squared normals," F = "ratio of χ²'s," regression tests use t and F. Dive deeper into each topic's dedicated page for formulas and detailed explanations.
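The family tree can even be checked by simulation. A quick Monte Carlo sketch of my own: square and sum k standard normals to build a χ²_k draw, then take a ratio of two scaled χ²'s to build an F draw, and confirm the known means (k for χ²_k, d₂/(d₂−2) for F).

```python
import random

# Monte Carlo sanity check of the N(0,1) family tree (illustrative sketch).
random.seed(0)

def chi2(k):
    # chi-squared with k df = sum of k squared standard normals
    return sum(random.gauss(0, 1) ** 2 for _ in range(k))

n = 50_000
chi2_3 = [chi2(3) for _ in range(n)]
f_3_10 = [(chi2(3) / 3) / (chi2(10) / 10) for _ in range(n)]  # F(3, 10)

print(sum(chi2_3) / n)  # ≈ 3 (mean of chi²_k is k)
print(sum(f_3_10) / n)  # ≈ 1.25 (mean of F(d1, d2) is d2 / (d2 - 2))
```

Swap in other degrees of freedom and the sample means track the textbook formulas — the same lineage the pages above walked through.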