ANOVA — Feel What the F-Statistic Really Means
Run a hundred experiments and the reason you can't just repeat t-tests becomes visible: with three pairwise tests, the false-positive rate jumps from 5% to about 14.3%.
ANOVA compares "how different the groups are" vs. "how spread out each group is."
When the difference outweighs the spread, we conclude the groups really differ. Move the sliders to feel it.
- Step 1: Click ▶ 100 trials — runs 100 experiments where 3 identical groups are compared with 3 pairwise t-tests.
- Step 2: Check the red dot ratio. Treating the three tests as independent, theory predicts 1 − 0.95³ ≈ 14.3%. What do you get?
- Step 3: Click ▶ 1000 trials — as trials accumulate, the rate converges toward 14.3%.
▶ Why You Can't Just Repeat t-Tests
The F-test (ANOVA) below is how we avoid the false-positive inflation you just saw in the simulation above.
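For readers without the widget, the repeated-t-test experiment can be approximated offline. The sketch below is not the page's own simulation: it assumes 15 observations per group, normal data with a made-up mean of 70 and SD of 8, a pooled-variance t-test, and a hard-coded critical value of 2.048 (two-sided 5%, 28 degrees of freedom). Because the three pairwise tests share data, the simulated rate may land somewhat off the 14.3% figure, which treats the tests as independent.

```python
import random
import statistics

# Two-sided 5% critical value of t with 28 df (assumes n = 15 per group).
T_CRIT_DF28 = 2.048


def pooled_t(a, b):
    """Two-sample t statistic with pooled variance (equal group sizes assumed)."""
    n = len(a)
    pooled_var = (statistics.variance(a) + statistics.variance(b)) / 2
    return (statistics.mean(a) - statistics.mean(b)) / (pooled_var * 2 / n) ** 0.5


def false_positive_rate(trials, n=15, seed=0):
    """Share of trials in which at least one of the 3 pairwise t-tests rejects."""
    rng = random.Random(seed)
    pairs = [(0, 1), (0, 2), (1, 2)]
    hits = 0
    for _ in range(trials):
        # All three groups come from the SAME population,
        # so any rejection here is a false positive.
        groups = [[rng.gauss(70, 8) for _ in range(n)] for _ in range(3)]
        if any(abs(pooled_t(groups[i], groups[j])) > T_CRIT_DF28 for i, j in pairs):
            hits += 1
    return hits / trials


independent_bound = 1 - 0.95 ** 3  # the 14.3% figure, assuming independent tests
simulated = false_positive_rate(2000)
print(f"independence approximation: {independent_bound:.3f}")
print(f"simulated rate:             {simulated:.3f}")
```

Either way the rate sits well above the nominal 5%, which is the point of the exercise.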
Experiment Guide — Feel the F-Statistic
- Step 1: Set between-group difference to zero → F ≈ 1. All groups look like one population.
- Step 2: Increase between-group difference → F rises, p drops. Watch it cross the rejection threshold.
- Step 3: Increase within-group spread → same difference but F drops. "Real differences can hide in noise."
- Step 4: Increase sample size → F rises. Larger samples detect smaller effects (statistical power).
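Steps 2 to 4 can also be checked with a few lines of arithmetic. The sketch below is a deterministic stand-in for the sliders, not the widget's code: `f_from_summary` is a hypothetical helper that assumes equal group sizes and a common within-group SD, and all of the numbers passed to it are made up. It leaves out sampling noise in the means, so it does not reproduce Step 1's F ≈ 1 (with exactly equal means it simply returns 0).

```python
def f_from_summary(group_means, within_sd, n):
    """F for k equal-sized groups, given their means, a common within-group SD, and n per group."""
    k = len(group_means)
    grand_mean = sum(group_means) / k
    ssb = n * sum((m - grand_mean) ** 2 for m in group_means)
    msb = ssb / (k - 1)
    msw = within_sd ** 2  # the common within-group variance stands in for MSW
    return msb / msw


base = f_from_summary([65, 72, 73], within_sd=8, n=15)
wider_gap = f_from_summary([60, 72, 78], within_sd=8, n=15)   # Step 2: means further apart
noisier = f_from_summary([65, 72, 73], within_sd=16, n=15)    # Step 3: more within-group spread
bigger_n = f_from_summary([65, 72, 73], within_sd=8, n=60)    # Step 4: larger samples

for label, value in [("base", base), ("wider gap", wider_gap),
                     ("noisier", noisier), ("bigger n", bigger_n)]:
    print(f"{label:10s} F = {value:.2f}")
```

Doubling the SD divides F by four, and quadrupling n multiplies it by four, since F scales with n and with 1 ÷ variance in this setup.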
▶ Between vs. Within — Feel the F-Statistic
// ANOVA Table
| Source | SS | df | MS | F | p |
|---|---|---|---|---|---|
| Between (B) | — | — | — | — | — |
| Within (W) | — | — | — | | |
| Total | — | — | | | |
// What the F-statistic actually does
Suppose you want to compare test scores across three classes.
"The class averages are far apart" → maybe the teaching method matters.
"But the scores within each class also vary a lot" → could just be noise.
The F-statistic puts a number on that comparison:
between-group spread ÷ within-group spread — that's really all it is.
The bigger the numerator relative to the denominator, the more evidence that the groups genuinely differ.
Here's the formula. Don't worry about memorizing it right away; try dragging the graph above and watching how F reacts first.
F = MSB ÷ MSW = (SSB ÷ dfB) ÷ (SSW ÷ dfW)
· SSB (between-group) = how far each group mean is from the overall mean
· SSW (within-group) = how much data scatter inside each group
· Dividing by df gives MS — "variation per degree of freedom"
· F ≈ 1 → "no evidence of differences"; F large → "groups likely differ"
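The four bullets above translate almost line for line into code. This is a minimal sketch of a one-way ANOVA computed from raw data; the function name `one_way_anova` and the tiny data set are made up for illustration, and the helper returns only the table entries, not a p-value.

```python
def one_way_anova(groups):
    """Return SSB, SSW, degrees of freedom, mean squares, and F for a list of groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # SSB: how far each group mean is from the overall mean, weighted by group size.
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # SSW: how much the data scatter around their own group mean.
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_b, df_w = k - 1, n_total - k
    msb, msw = ssb / df_b, ssw / df_w
    return {"SSB": ssb, "SSW": ssw, "dfB": df_b, "dfW": df_w,
            "MSB": msb, "MSW": msw, "F": msb / msw}


# Made-up toy data: three groups of three observations each.
table = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
for name, value in table.items():
    print(f"{name}: {value:.3f}")
```

In this toy data the group means (2, 3, 6) are far apart relative to the scatter inside each group, so F comes out well above 1.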
How large is "large enough"? That depends on degrees of freedom and significance level α.
Check critical values in the Interactive Distribution Tables.
// Walk through it — test-score example
Three teaching methods A, B, C with 15 students each.
① Look at the means
Group A = 65, Group B = 72, Group C = 73, Grand mean = 70
② Between-group SSB
Sum of n × (group mean − grand mean)² over the three groups
= 15×(65−70)² + 15×(72−70)² + 15×(73−70)² = 15×(25+4+9) = 570
③ Within-group SSW
Total scatter of the data around their own group means; suppose it comes to 2520
④ Degrees of freedom
dfB = k−1 = 3−1 = 2
dfW = N−k = 45−3 = 42
⑤ Compute MS and F
MSB = 570÷2 = 285, MSW = 2520÷42 = 60
F = 285÷60 = 4.75
⑥ Verdict
The critical value of F(2, 42) at α = 0.05 is about 3.22.
4.75 > 3.22 → Reject H₀ (at least one group differs).
— But we still don't know which groups differ (→ post-hoc tests needed).
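The six steps can be replayed in code using the numbers from the example. One addition that is not in the walkthrough: when dfB = 2 (three groups), the upper tail of the F distribution has a simple closed form, P(F > f) = (1 + 2f ÷ dfW)^(−dfW ÷ 2), so the critical value and p-value can be computed without a table. The helper names below are hypothetical, and the shortcut holds only for dfB = 2.

```python
n = 15                    # students per group
means = [65, 72, 73]      # group means A, B, C
ssw = 2520                # within-group sum of squares, as assumed in step 3

k = len(means)
N = n * k
grand_mean = sum(means) / k  # a plain average is valid because group sizes are equal

ssb = n * sum((m - grand_mean) ** 2 for m in means)
df_b, df_w = k - 1, N - k
msb, msw = ssb / df_b, ssw / df_w
f_stat = msb / msw


def f_upper_tail_dfb2(f, df_w):
    """P(F > f) for an F(2, df_w) distribution. Valid ONLY when the numerator df is 2."""
    return (1 + 2 * f / df_w) ** (-df_w / 2)


def f_critical_dfb2(alpha, df_w):
    """Critical value of F(2, df_w) at level alpha, obtained by inverting the tail formula."""
    return (df_w / 2) * (alpha ** (-2 / df_w) - 1)


f_crit = f_critical_dfb2(0.05, df_w)
p_value = f_upper_tail_dfb2(f_stat, df_w)

print(f"SSB = {ssb:.0f}, MSB = {msb:.0f}, MSW = {msw:.0f}, F = {f_stat:.2f}")
print(f"critical F(2, {df_w}) at 0.05 = {f_crit:.2f}")
print(f"p = {p_value:.4f}, reject H0: {f_stat > f_crit}")
```

The computed critical value agrees with the 3.22 quoted in step 6, and the p-value comes out at roughly 0.014, below α = 0.05.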
// Easy-to-trip-on points
ANOVA only tells you "there's a difference somewhere." To pin down which pairs, you need post-hoc tests like Tukey's HSD or Bonferroni correction.
Check the simulation at the top of this page. With 3 groups and repeated t-tests, the false-positive rate jumps to about 14.3% even when no real difference exists. ANOVA keeps the overall α at 5%.
A large sample size can inflate F even for a tiny real difference. To measure how large the effect actually is, use an effect size such as η² = SSB ÷ (SSB + SSW), the share of total variation explained by group membership.
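To see the sample-size point in numbers, the sketch below computes η² for the test-score example (SSB = 570, SSW = 2520) and then uses the identity F = (η² ÷ (1 − η²)) × (dfW ÷ dfB) to show how the same tiny effect produces very different F values at different sample sizes. The helper `f_from_eta_sq` and the 1% effect with N = 3000 are made up for illustration.

```python
ssb, ssw = 570, 2520              # from the test-score example
eta_sq = ssb / (ssb + ssw)        # share of total variation explained by the groups


def f_from_eta_sq(eta_sq, k, n_total):
    """F implied by a given eta squared, k groups, and n_total observations."""
    df_b, df_w = k - 1, n_total - k
    return (eta_sq / (1 - eta_sq)) * (df_w / df_b)


small_sample = f_from_eta_sq(0.01, k=3, n_total=45)
large_sample = f_from_eta_sq(0.01, k=3, n_total=3000)

print(f"eta squared in the example: {eta_sq:.3f}")
print(f"1% effect, N = 45:   F = {small_sample:.2f}")
print(f"1% effect, N = 3000: F = {large_sample:.2f}")
```

An effect that explains only 1% of the variation gives an F far below 1 with 45 observations but a large F with 3000, which is why F alone says little about how big a difference is.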
// Shapes you'll meet again
The same structure shows up again and again whenever ANOVA is around.
- The shape of an ANOVA table — once SS and df are lined up, MS and F fall into the next columns automatically. The worked example above traces this exact shape
- How degrees of freedom slot in — between = k−1, within = N−k, total = N−1. These three always appear together
- Why repeated t-tests inflate — the multiple-comparisons story. The 14.3% from the simulation reappears in this same shape
- The three assumptions in the background — normality, homogeneity of variance, independence. The F-distribution picture above only takes shape when these three are in place