ANOVA — Feel What the F-Statistic Really Means
Why can't you just repeat t-tests for three or more groups? Move the sliders to see how the ratio of between-group to within-group variance becomes the F-statistic — and feel when it crosses the rejection threshold.
F = between-group variance ÷ within-group variance.
When groups differ a lot and individuals within each group are consistent, F grows large — suggesting real differences. Move the sliders to feel it.
- Step 1: Click ▶ 100 trials — runs 100 experiments where 3 identical groups are compared with 3 pairwise t-tests.
- Step 2: Check the ratio of red dots (trials with at least one false positive) — theory predicts 14.3%. What do you get?
- Step 3: Click ▶ 1000 trials — as trials accumulate, the rate converges toward 14.3%.
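The simulation behind those buttons can be sketched offline. A minimal version in Python — the per-group size of 30, the standard-normal population, and the hard-coded t critical value for df = 58 are assumptions for this sketch, not values taken from the page:

```python
import math
import random
import statistics

random.seed(42)

def t_stat(a, b):
    """Pooled two-sample t statistic."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))

T_CRIT = 2.0017  # two-sided critical value for alpha = 0.05, df = 58 (n = 30 per group)

n, trials, false_pos = 30, 1000, 0
for _ in range(trials):
    # Three groups drawn from the SAME population: any rejection is a false positive.
    g = [[random.gauss(0, 1) for _ in range(n)] for _ in range(3)]
    if any(abs(t_stat(g[i], g[j])) > T_CRIT for i, j in ((0, 1), (1, 2), (0, 2))):
        false_pos += 1

print(false_pos / trials)  # tends to land near the predicted ~14.3%
```

Because the three t-tests reuse the same groups, they are not independent, so the empirical rate can sit a little below 14.3% — which is still far above the 5% you signed up for.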
▶ Why You Can't Just Repeat t-Tests
// Why 14.3%?
Pairwise comparisons among 3 groups: m = 3 (A-B, B-C, A-C).
Each t-test has a 5% false-positive rate, so the probability that none of the three falsely rejects is 0.95³ ≈ 0.857.
So P(at least one false positive) = 1 − 0.857 ≈ 14.3%.
It gets worse as groups increase: 5 groups → m = 10 → 40%, 10 groups → m = 45 → over 90%.
ANOVA solves this by comparing all groups simultaneously in a single test.
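The numbers above all come from the same familywise-error formula, 1 − (1 − α)^m, with m = k(k−1)/2 pairwise tests for k groups. A quick check:

```python
from math import comb

alpha = 0.05
for k in (3, 5, 10):
    m = comb(k, 2)                 # number of pairwise comparisons
    fwer = 1 - (1 - alpha) ** m    # P(at least one false positive)
    print(f"{k} groups: m = {m}, familywise error ≈ {fwer:.1%}")
    # prints 14.3%, 40.1%, 90.1% for k = 3, 5, 10
```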
The F-test (ANOVA) below is how we avoid the false-positive inflation you just saw in the simulation above.
Experiment Guide — Feel the F-Statistic
- Step 1: Set between-group difference to zero → F ≈ 1. All groups look like one population.
- Step 2: Increase between-group difference → F rises, p drops. Watch it cross the rejection threshold.
- Step 3: Increase within-group spread → same difference but F drops. "Real differences can hide in noise."
- Step 4: Increase sample size → F rises. Larger samples detect smaller effects (statistical power).
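Steps 1–3 can also be reproduced numerically. A sketch under assumed conditions — normal data, 30 per group, and illustrative group means of 0, 1, 2 (none of these values come from the widget):

```python
import random
import statistics

random.seed(0)

def one_way_F(groups):
    """F = between-group mean square / within-group mean square."""
    k = len(groups)
    N = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / N
    ssb = sum(len(g) * (statistics.fmean(g) - grand) ** 2 for g in groups)
    ssw = sum(sum((x - statistics.fmean(g)) ** 2 for x in g) for g in groups)
    return (ssb / (k - 1)) / (ssw / (N - k))

# Step 1: zero between-group difference -> F hovers near 1
same = [[random.gauss(0, 1) for _ in range(30)] for _ in range(3)]
# Step 2: pull the group means apart -> F jumps
apart = [[random.gauss(mu, 1) for _ in range(30)] for mu in (0, 1, 2)]
# Step 3: same mean gap, but triple the within-group spread -> F falls back
noisy = [[random.gauss(mu, 3) for _ in range(30)] for mu in (0, 1, 2)]

print(one_way_F(same), one_way_F(apart), one_way_F(noisy))
```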
▶ Between vs. Within — Feel the F-Statistic
// ANOVA Table
| Source | SS | df | MS | F | p |
|---|---|---|---|---|---|
| Between (B) | — | — | — | — | — |
| Within (W) | — | — | — | | |
| Total | — | — | | | |
// What the F-statistic actually does
Suppose you want to compare test scores across three classes.
"The class averages are far apart" → maybe the teaching method matters.
"But the scores within each class also vary a lot" → could just be noise.
The F-statistic puts a number on that comparison:
between-group spread ÷ within-group spread — that's really all it is.
The bigger the numerator relative to the denominator, the more evidence that the groups genuinely differ.
Here's how the formula breaks down. Don't worry about memorizing it right away — try dragging the graph above and watching how F reacts first.
· SSB (between-group) = how far each group mean is from the overall mean
· SSW (within-group) = how much data scatter inside each group
· Dividing by df gives MS — "variation per degree of freedom"
· F ≈ 1 → "no evidence of differences"; F large → "groups likely differ"
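In symbols (standard one-way ANOVA notation: k groups, nᵢ observations in group i, N total):

```latex
\mathrm{SSB} = \sum_{i=1}^{k} n_i\,(\bar{x}_i - \bar{x})^2
\qquad
\mathrm{SSW} = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2
```

```latex
F \;=\; \frac{\mathrm{MSB}}{\mathrm{MSW}}
  \;=\; \frac{\mathrm{SSB}/(k-1)}{\mathrm{SSW}/(N-k)}
```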
How large is "large enough"? That depends on degrees of freedom and significance level α.
Check critical values in the Interactive Distribution Tables.
// Walk through it — test-score example
Three teaching methods A, B, C with 15 students each.
① Look at the means
Group A = 65, Group B = 72, Group C = 73, Grand mean = 70
② Between-group SSB
n × (group mean − grand mean)² summed up
= 15×(65−70)² + 15×(72−70)² + 15×(73−70)² = 15×(25+4+9) = 570
③ Within-group SSW
Total scatter of data around their own group means = say 2520
④ Degrees of freedom
dfB = k−1 = 3−1 = 2
dfW = N−k = 45−3 = 42
⑤ Compute MS and F
MSB = 570÷2 = 285, MSW = 2520÷42 = 60
F = 285÷60 = 4.75
⑥ Verdict
The critical value of F(2, 42) at α = 0.05 is about 3.22.
4.75 > 3.22 → Reject H₀ (at least one group differs).
— But we still don't know which groups differ (→ post-hoc tests needed).
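The arithmetic in steps ②–⑥ takes only a few lines to verify. Note the individual scores aren't needed — the group means and the given SSW = 2520 are enough:

```python
k, n = 3, 15
N = k * n
means = [65, 72, 73]
grand = sum(means) / k                            # 70 (groups are equal-sized)
ssb = sum(n * (m - grand) ** 2 for m in means)    # 15 * (25 + 4 + 9) = 570
ssw = 2520                                        # given in step ③
df_b, df_w = k - 1, N - k                         # 2 and 42
msb, msw = ssb / df_b, ssw / df_w                 # 285.0 and 60.0
f = msb / msw                                     # 4.75
print("reject H0" if f > 3.22 else "fail to reject")  # prints "reject H0"
```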
// Easy-to-trip-on points
ANOVA only tells you "there's a difference somewhere." To pin down which pairs, you need post-hoc tests like Tukey's HSD or Bonferroni correction.
Check the simulation at the top of this page. With 3 groups and repeated t-tests, the false-positive rate jumps to about 14.3% even when no real difference exists. ANOVA keeps the overall α at 5%.
A large sample size can inflate F even for a tiny real difference. To measure how large the effect actually is, use effect size (η²).
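For the worked example above, η² — the share of total variation explained by group membership — falls straight out of the sums of squares:

```python
ssb, ssw = 570, 2520     # from the test-score walkthrough
sst = ssb + ssw          # total sum of squares = 3090
eta_sq = ssb / sst       # eta squared = SSB / SST
print(round(eta_sq, 3))  # prints 0.184 — teaching method explains ~18% of the variance
```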
// Exam patterns
These come up frequently on stats exams and certification tests.
- Fill in an ANOVA table — given SS and df, compute MS → F → compare with critical value. The worked example above is the exact template
- Degrees of freedom — between = k−1, within = N−k, total = N−1
- "Why not repeat t-tests?" — explain the multiple-comparisons problem. The 14.3% from the simulation is a concrete talking point
- Assumptions — normality, homogeneity of variance, independence