// α, β, and power tend to tangle, don't they
Type I errors. Type II errors. Power. The formulas are not the problem. The moment a real problem shows up, α and β and "reject / fail to reject" and "true / false" all start dancing at once, and it gets hard to keep track of which is which. Most readers have probably had that moment.
StatPlay also has a section that handles these errors. But "α is here, β is here, watch them move" only labels the parts; it doesn't touch the wobble itself. A page like that is easy to scroll past.
This column revisits that wobble by sorting it into cases. Crossing two axes, "H₀ is true / false" and "reject / fail to reject", gives four cases. That single move may make the whole picture much easier to keep straight. Worth a try, perhaps.
// One axis isn't enough
The reason error questions stay slippery is simple: an error gets talked about as a single probability on one axis.
For instance, "the Type II error is the probability of failing to reject" is a tempting line. Not exactly wrong, but a critical piece is missing — whether the underlying world has H₀ true, or H₁ true.
Work through an actual problem and the answer turns out to come in pairs:
- Type I: in the world where H₀ is true, the probability of rejecting anyway.
- Type II: in the world where H₁ is true, the probability of failing to reject.
An error is therefore determined by two coordinates: the world (H₀ true / H₁ true) and the decision (reject / fail to reject). Drop the world axis, and the description is guaranteed to wobble somewhere.
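The two-coordinate reading can be checked with a small simulation. This is a minimal sketch under assumptions that are not in the column itself: a two-sided z-test, a true shift of 2.0 standard errors in the H₁ world, and a hypothetical helper named `rejection_rate`.

```python
import random
from statistics import NormalDist

def rejection_rate(true_shift, alpha=0.05, trials=20_000, seed=0):
    """Fraction of simulated two-sided z-tests that reject H0.

    true_shift is the mean of the z statistic in the simulated world:
    0.0 means H0 is true, any other value means H1 is true (assumed setup).
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # cutoff fixed by alpha alone
    rejections = sum(abs(rng.gauss(true_shift, 1.0)) > z_crit for _ in range(trials))
    return rejections / trials

# World: H0 true, decision: reject -> Type I rate
type_1 = rejection_rate(true_shift=0.0)
# World: H1 true, decision: fail to reject -> Type II rate
type_2 = 1 - rejection_rate(true_shift=2.0)

print(f"Type I rate  ~ {type_1:.3f}")
print(f"Type II rate ~ {type_2:.3f}")
```

Each rate only makes sense once `true_shift` has picked a world. The decision rule is identical in both calls; only the world changes.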
The next section pins both axes down inside one table.
// The 2×2 matrix — the spine of this column
The whole story collapses into this single table.
| | Fail to reject H₀ | Reject H₀ |
|---|---|---|
| H₀ is true (no real difference) | Correct: 1 − α (confidence) | Type I error: α (≈ false alarm) |
| H₀ is false (a real difference) | Type II error: β (≈ missed detection) | Correct: 1 − β (power) |
The vertical axis is which world is actually true. The horizontal axis is the decision being made. Two of the four cells are correct outcomes; the other two are errors.
Pause for a moment and study the table. The point to notice is that the four probabilities are not summed across the table. Reading along a row gives "in the H₀-true world, fail to reject + reject = 1"; the other row says "in the H₀-false world, fail to reject + reject = 1." In other words, each row sums to 1, and α and β live in different rows.
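The row-sum relation can be made concrete by filling in the four cells for one example. The sketch below assumes a two-sided z-test and an effect δ expressed as a shift in standard-error units; the function name `error_table` and the values α = 0.05, δ = 2.0 are illustrative choices, not taken from StatPlay.

```python
from statistics import NormalDist

def error_table(alpha, delta):
    """Return the 2x2 cell probabilities for a two-sided z-test (assumed setup)."""
    std = NormalDist()
    z_crit = std.inv_cdf(1 - alpha / 2)
    # beta: mass of the H1 distribution N(delta, 1) between the critical values
    beta = std.cdf(z_crit - delta) - std.cdf(-z_crit - delta)
    return {
        "H0 true": {"fail to reject": 1 - alpha, "reject": alpha},
        "H0 false": {"fail to reject": beta, "reject": 1 - beta},
    }

table = error_table(alpha=0.05, delta=2.0)
for world, cells in table.items():
    rounded = {decision: round(p, 3) for decision, p in cells.items()}
    print(world, rounded, "row sum =", round(sum(cells.values()), 3))
```

Each printed row sums to 1, while the column sums depend on δ and have no fixed value.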
That "different rows" fact is the next twist.
// Try it — where does each cell sit on the distributions?
The frame is in place: errors are 2×2. But the table alone doesn't show where on the distributions each cell lives. In the mini-visualization below, clicking any cell of the 2×2 lights up the matching region of the distribution. Predict first, then click. Calling the shot in advance — "α is probably here, β over there" — is what makes the rest of the picture readable.
| | Fail to reject | Reject |
|---|---|---|
| H₀ true | 1−α Correct | α False alarm |
| H₀ false | β Missed | 1−β Power |
Once the clicks have run, two facts should be visible on the picture:
- α lives in the tails of the blue mountain (the H₀-true world). The purple mountain isn't involved.
- β lives on the purple mountain (the H₁-true world): it is the part of that mountain that falls inside the fail-to-reject region. It's measured on a different mountain.
That picture — "α and β sit on different mountains" — is the topic of the next section.
// α and β are probabilities of different worlds
This is the key turn for the whole column — the point where the case-by-case sort really starts to pay off.
α and β are probabilities defined under different assumptions:
- α is the probability of rejecting in the world where H₀ is true.
- β is the probability of failing to reject in the world where H₁ is true.
So they don't live in the same probability space. Written out it sounds slight, but this carries serious consequences. For instance, the following question now sorts itself:
"What does α + β represent?"
Answer: nothing. α and β sit in different rows of the 2×2 table, and the table is not built to be summed across rows. It's built so that each row sums to 1: in the H₀-true world, (1−α) + α = 1; in the H₁-true world, β + (1−β) = 1. Adding numbers from two different worlds doesn't yield an interpretable quantity.
The familiar asymmetry between α and β also sorts naturally from here:
- α can be drawn the moment H₀ is fixed (the distribution shape is determined). That's why "α = 0.05" can be committed to before any data arrive.
- β can't be drawn until the position of H₁ is specified — that is, until an effect size δ is assumed.
α is set up front; β only becomes drawable once a δ has been committed to. The asymmetry follows directly from having only one of the two worlds fully specified — the same case-by-case point as before.
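The asymmetry can be written out for one concrete case. As an assumption not made in the column itself, take a two-sided z-test whose statistic Z is standard normal under H₀ and shifted to mean δ (in standard-error units) under H₁, with Φ the standard normal CDF and z_{α/2} the upper α/2 quantile:

```latex
\begin{aligned}
\alpha &= P\bigl(|Z| > z_{\alpha/2} \,\big|\, H_0\bigr)
        = 2\bigl(1 - \Phi(z_{\alpha/2})\bigr), \\
\beta(\delta) &= P\bigl(|Z| \le z_{\alpha/2} \,\big|\, H_1\bigr)
        = \Phi(z_{\alpha/2} - \delta) - \Phi(-z_{\alpha/2} - \delta).
\end{aligned}
```

The first line contains no δ at all, while the second cannot be evaluated until a δ is supplied.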
// Try it — β requires a δ to even exist
The claim above — "β can't be drawn until H₁ is placed" — is the kind of thing that's faster to feel than to argue. The control below moves α and the effect size δ independently. First move only α and watch what happens to β. Then freeze α and move only δ. The contrast between the two behaviors is what makes which quantity belongs to which world visible.
Pin down what just appeared on the canvas:
- Moving α alone changes the shaded area on the blue mountain and slides the critical values with it. α is the protagonist. The purple mountain itself stays put; β shifts only because the cutoff that slices it has moved.
- Moving δ alone slides the purple mountain. β (the unshaded area between the critical values, under purple) changes a lot. α doesn't budge.
So α is a tool of the H₀ world; β is a tool of the H₁ world. They share a screen, but they live in different worlds. The case-by-case sort is showing up directly on the canvas.
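For readers without the canvas at hand, the two sweeps can be approximated numerically. This is a sketch assuming a two-sided z-test with δ measured in standard-error units; the helper `beta` and the grid values are illustrative. Note that β depends on α only through the cutoff, since the H₁ distribution itself is untouched by α.

```python
from statistics import NormalDist

STD = NormalDist()

def beta(alpha, delta):
    """Type II error rate for a two-sided z-test (assumed setup)."""
    z_crit = STD.inv_cdf(1 - alpha / 2)  # determined by the H0 world only
    # Mass of the H1 distribution N(delta, 1) inside the fail-to-reject region
    return STD.cdf(z_crit - delta) - STD.cdf(-z_crit - delta)

print("delta sweep at alpha = 0.05 (the purple mountain slides)")
for delta in (0.5, 1.0, 2.0, 3.0):
    print(f"  delta={delta:.1f}  beta={beta(0.05, delta):.3f}")

print("alpha sweep at delta = 2.0 (only the cutoff moves)")
for alpha in (0.01, 0.05, 0.10):
    print(f"  alpha={alpha:.2f}  beta={beta(alpha, 2.0):.3f}")
```

In the first sweep α never appears as a variable, which is the numerical counterpart of "α doesn't budge."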
// Power is just β read backward — not a new concept
At first, "power" tends to look like a third character alongside α and β. Once the cases are sorted, it isn't.
Power = 1 − β. This is the probability of rejecting, still inside the H₁ world: the other side of β. It's the bottom-right cell of the 2×2. The same fact, just read as "detection rate" instead of "miss rate."
So why does power get its own name? It plugs straight into sample-size planning:
- "Keep β at most 0.2" feels less actionable than "secure power of at least 0.8." Power is a forward-facing number that drops into the formula solving for the required n.
- Sliding α and n inside StatPlay shows directly how power 1−β responds — the next interactive does this.
The takeaway: there is no need to memorize three concepts. The 2×2 is still just four cells. Power is β phrased the other way around.
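As an illustration of how power drops into sample-size planning, here is a sketch using the common normal-approximation formula n ≈ ((z_{1−α/2} + z_{power}) · σ / Δ)² for a two-sided one-sample z-test. The function name `required_n` and the parameters `effect` (raw mean difference Δ) and `sigma` are hypothetical, and the formula ignores the negligible far-tail rejection probability.

```python
import math
from statistics import NormalDist

def required_n(effect, sigma, alpha=0.05, power=0.80):
    """Approximate sample size for a two-sided one-sample z-test (assumed setup)."""
    std = NormalDist()
    z_alpha = std.inv_cdf(1 - alpha / 2)  # cutoff from the H0 world
    z_power = std.inv_cdf(power)          # target detection rate in the H1 world
    return math.ceil(((z_alpha + z_power) * sigma / effect) ** 2)

print(required_n(effect=0.5, sigma=1.0))              # power 0.80
print(required_n(effect=0.5, sigma=1.0, power=0.90))  # asking for more power
```

The target is stated as power rather than β, which is why "at least 0.8" is the form that usually appears in planning.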
// Try it — what shifts when you raise n?
One more quantity worth touching: the sample size n. "α and β trade off — push one down, the other goes up" is a common saying. That's only half the story. The full story is that increasing n loosens the trade-off itself. Below, vary only n. α stays fixed (0.05).
At n=30, the same effect size carries a high risk (β) of being judged "no effect." At n=300, that risk drops sharply. That is what "a larger n is needed" is really pointing at. With the effect size and the noise level held fixed, raising n is the lever that reduces misses (β) without sacrificing α.
Pin down what showed up on the canvas:
- α doesn't move. The blue area stays at 5%.
- β drops as the mountains thin out and the overlap shrinks.
- Therefore power 1−β rises.
The "α and β are a seesaw" metaphor is half right. Sorted by case, the complete picture is that the seesaw's pivot itself moves with n.
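The same behavior can be reproduced with a few lines of arithmetic. The sketch below assumes a two-sided one-sample z-test with known σ; the raw effect of 0.2 and σ = 1.0 are made-up values chosen for illustration, and `beta_for_n` is a hypothetical helper.

```python
import math
from statistics import NormalDist

def beta_for_n(n, effect, sigma=1.0, alpha=0.05):
    """Type II error rate at sample size n (assumed one-sample z-test)."""
    std = NormalDist()
    z_crit = std.inv_cdf(1 - alpha / 2)      # does not depend on n
    delta = effect / (sigma / math.sqrt(n))  # shift in standard-error units
    return std.cdf(z_crit - delta) - std.cdf(-z_crit - delta)

for n in (30, 100, 300):
    b = beta_for_n(n, effect=0.2)
    print(f"n={n:>3}  alpha=0.05  beta={b:.3f}  power={1 - b:.3f}")
```

Only the standard error σ/√n changes with n, so the cutoff stays where α put it while the two distributions pull apart.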
// When you want to feel one more layer
By this point, the 2×2, the world distinction between α and β, the relationship to power, and the role of n should all be lined up.
If you'd like to see how α, β, power, effect size, and n move simultaneously during an actual test, the testing topic has a heavier-duty canvas in its "Two Errors" section.
- Testing topic / Two Errors section →
- Preset effect-size links:
  - Open with a small effect (δ=0.5) →
  - Open with a moderate effect (δ=2.0) →
This column is the "putting it in order" room; the other visualization is the "playing with all knobs at once" room. Order first, play second — that sequence tends to leave the whole picture in order.
// KEY TAKEAWAY
- Errors are easier to keep straight as a 2×2. A one-axis description tends to lose the H₀-true / H₀-false coordinate.
- α and β are probabilities of different worlds. One belongs to the H₀ world, the other to the H₁ world. Their sum yields no interpretable quantity.
- Power 1−β is β read backward. Not a new concept; a forward-facing rename that drops directly into sample-size formulas.
- n reduces β without touching α. It is the knob that moves the trade-off's pivot itself.
// Frequently asked questions
"Can α and β be added together?"
Sorting by cases makes this easier to keep straight. α is the rejection rate in the H₀-true world; β is the non-rejection rate in the H₁-true world. They are probabilities of different worlds, defined under different assumptions, so adding them doesn't yield an interpretable quantity. Within each world, reject + fail to reject = 1, and that is the relation the table is showing.
"What changes when n is small?"
Sorting by cases helps. α does not move: it's a value committed to in advance. What moves is β: a small n means a large standard error σ/√n, which makes the H₀ and H₁ distributions overlap more, which raises β. Power 1−β drops correspondingly. That is what's underneath the line that "a real effect can fail to reach significance when n is small."
"Does 'not significant' mean there is no difference?"
Sorting by cases makes the trap visible. The output "not significant" arises both when there really is no difference and when a real difference exists but n was too small to detect it (β large = insufficient power). Two distinct causes produce the same "fail to reject" outcome. That is why claiming "no difference" requires an additional showing: adequate power, or a confidence interval narrow enough to rule out meaningful effects.
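The confidence-interval check mentioned above can be sketched as follows. Everything here is illustrative: the sample means, σ = 1.0, the "meaningful effect" threshold of 0.3, and the helper names `mean_ci` and `rules_out` are assumptions, and the interval is the usual known-σ normal interval.

```python
import math
from statistics import NormalDist

def mean_ci(sample_mean, sigma, n, level=0.95):
    """Known-sigma confidence interval for a mean (assumed setup)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width

def rules_out(ci, meaningful_effect):
    """True if the whole interval lies inside (-meaningful_effect, +meaningful_effect)."""
    low, high = ci
    return -meaningful_effect < low and high < meaningful_effect

small_study = mean_ci(sample_mean=0.10, sigma=1.0, n=30)
large_study = mean_ci(sample_mean=0.05, sigma=1.0, n=300)

for name, ci in (("n=30", small_study), ("n=300", large_study)):
    significant = not (ci[0] < 0 < ci[1])
    print(f"{name}: CI=({ci[0]:.3f}, {ci[1]:.3f})  significant={significant}  "
          f"rules out |effect| >= 0.3: {rules_out(ci, 0.3)}")
```

Both hypothetical studies are "not significant," but only the larger one has an interval narrow enough to support a "no meaningful difference" reading.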
Once the errors are sorted, take this back to a real test.
With α and β sorted as quantities of two different worlds, the next step is watching them move together inside an actual z- or t-test.
The link with confidence intervals also looks different once the 2×2 is in place.