// CASE FILE — TYPE I vs TYPE II

A false alarm and a missed detection.
They sound alike. They aren't the same.

"Type I error." "Type II error." The names are familiar; so are the formulas.
Yet the moment a real problem shows up, the labels start to swap.
This column tries to sort that tangle into four cases on a single 2×2 table.

StatPlay Columns Type I vs Type II Errors
01

// α, β, and power tend to tangle, don't they

Type I errors. Type II errors. Power. The formulas are not the problem; the moment a real problem shows up, α and β and "reject / fail to reject" and "true / false" all start dancing at once, and it gets hard to keep track of which is which. Most readers surely know that moment.

StatPlay also has a section that handles these errors. But "α is here, β is here, watch them move" only labels the parts; it doesn't touch the wobble itself. A page like that is easy to scroll past.

This column is an attempt to revisit that wobble in the direction of "sort it into cases." Crossing two axes — "H₀ is true / false" and "reject / fail to reject" — gives four cases. That single move alone may make the whole picture much easier to keep straight. Worth a try, perhaps.

02

// One axis isn't enough

The reason error questions stay slippery is simple: an error gets talked about as a single probability on one axis.

For instance, "the Type II error is the probability of failing to reject" is a tempting line. Not exactly wrong, but a critical piece is missing — whether the underlying world has H₀ true, or H₁ true.

Work through an actual problem and the answer turns out to come in pairs:

- In the world where H₀ is true: P(fail to reject) = 1 − α, and P(reject) = α.
- In the world where H₁ is true: P(fail to reject) = β, and P(reject) = 1 − β.
An error is therefore determined by two coordinates: the world (H₀ true / H₁ true) and the decision (reject / fail to reject). Drop the world axis, and the description is guaranteed to wobble somewhere.

The next section pins both axes down inside one table.

03

// The 2×2 matrix — the spine of this column

The whole story collapses into this single table.

                         │ Fail to reject H₀        │ Reject H₀
─────────────────────────┼──────────────────────────┼──────────────────────────
 H₀ is true              │ Correct                  │ Type I error
 (no real difference)    │ 1 − α  ("confidence")    │ α  (≈ false alarm)
─────────────────────────┼──────────────────────────┼──────────────────────────
 H₀ is false             │ Type II error            │ Correct
 (a real difference)     │ β  (≈ missed detection)  │ 1 − β  ("power")

The vertical axis is which world is actually true. The horizontal axis is the decision being made. Two of the four cells are correct outcomes; the other two are errors.

Pause for a moment and study the table. The point to notice is that the four probabilities do not sum to 1 across the whole table. Reading along the top row gives "in the H₀-true world, fail to reject + reject = 1"; the bottom row says "in the H₀-false world, fail to reject + reject = 1." In other words, each row sums to 1, and α and β live in different rows.
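Those two row-sums can also be checked by brute force. The sketch below is not part of StatPlay; it assumes a one-sided z-test with known σ = 1, and the values n = 30 and δ = 0.5 are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, n, delta = 0.05, 30, 0.5   # illustrative values, not prescriptions
z_crit = 1.6449                   # one-sided critical z for alpha = 0.05

def reject_rate(true_mean, trials=100_000):
    """Fraction of simulated one-sided z-tests (H0: mu = 0, sigma = 1) that reject."""
    means = rng.normal(true_mean, 1.0, size=(trials, n)).mean(axis=1)
    z = means / (1.0 / np.sqrt(n))
    return float((z > z_crit).mean())

p_h0 = reject_rate(0.0)    # H0-true row: reject rate should hover near alpha
p_h1 = reject_rate(delta)  # H1-true row: reject rate is the power, 1 - beta
print(f"H0 true : fail={1 - p_h0:.3f}  reject={p_h0:.3f}  (row sums to 1)")
print(f"H1 true : fail={1 - p_h1:.3f}  reject={p_h1:.3f}  (row sums to 1)")
```

Each simulated row sums to 1 by construction; nothing forces the column sums, or α + β, toward any particular value.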

That "different rows" fact is the next twist.

04 · INTERACTIVE

// Try it — where does each cell sit on the distributions?

The frame is in place: errors are 2×2. But the table alone doesn't show where on the distributions each cell lives. In the mini-visualization below, clicking any cell of the 2×2 lights up the matching region of the distribution. Predict first, then click. Calling the shot in advance — "α is probably here, β over there" — is what makes the rest of the picture readable.

[Interactive 2×2 grid: rows H₀ true / H₀ false, columns Fail to reject / Reject. Clicking a cell lights up the matching region: 1−α (correct), α (false alarm), β (missed), or 1−β (power).]

Once the clicks have run, two facts should be visible on the picture:

- α is a region of the H₀ distribution: the tail beyond the critical value.
- β is a region of the H₁ distribution: the area on the fail-to-reject side of that same cutoff.

That picture — "α and β sit on different mountains" — is the topic of the next section.

05

// α and β are probabilities of different worlds

This is the key turn for the whole column — the point where the case-by-case sort really starts to pay off.

α and β are probabilities defined under different assumptions:

- α = P(reject H₀ | H₀ is true), a probability of the H₀-true world.
- β = P(fail to reject H₀ | H₁ is true), a probability of the H₁-true world.

So they don't live in the same probability space. Written out it sounds slight, but this carries serious consequences. For instance, the following question now sorts itself:

"What does α + β represent?"

Answer: nothing. The 2×2 table is not built to be summed down its columns. It's built so that each row sums to 1: in the H₀-true world, (1−α) + α = 1; in the H₁-true world, β + (1−β) = 1. Adding numbers across two different worlds doesn't yield an interpretable quantity. Within each world, fail to reject + reject = 1, and that's the relation the table is showing.

The familiar asymmetry between α and β also sorts naturally from here:

- α needs nothing beyond H₀ to compute; it can be fixed before any alternative is even named.
- β cannot be computed at all until a specific alternative (an effect size δ) has been committed to.

α is set up front; β only becomes drawable once a δ has been committed to. The asymmetry follows directly from having only one of the two worlds fully specified — the same case-by-case point as before.
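The asymmetry can be made concrete in a few lines. This is a sketch under assumed conditions (one-sided z-test, σ = 1, n = 30); the three δ values are illustrative, not canonical:

```python
from statistics import NormalDist

Z = NormalDist()                      # standard normal
alpha, n = 0.05, 30
z_crit = Z.inv_cdf(1 - alpha)         # computable from alpha alone: an H0-world quantity

# beta does not exist yet: it needs a committed effect size delta
betas = {}
for delta in (0.2, 0.5, 0.8):
    beta = Z.cdf(z_crit - delta * n ** 0.5)   # an H1-world probability
    betas[delta] = beta
    print(f"delta={delta}: beta={beta:.3f}, power={1 - beta:.3f}")
```

One line suffices for z_crit; β needs a loop over candidate worlds, because each δ is a different H₁.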

06 · INTERACTIVE

// Try it — β requires a δ to even exist

The claim above — "β can't be drawn until H₁ is placed" — is the kind of thing that's faster to feel than to argue. The control below moves α and the effect size δ independently. First move only α and watch what happens to β. Then freeze α and move only δ. The contrast between the two behaviors is what makes which quantity belongs to which world visible.

[Interactive: sliders for α (significance, default 0.050) and δ (effect size); the canvas shows β, power 1−β, and the critical value.]

Pin down what just appeared on the canvas:

- Moving α slides the critical value; β changes only as a side effect of that cutoff moving.
- Moving δ slides the H₁ distribution itself; β reshapes while α does not budge.

So α is a tool of the H₀ world; β is a tool of the H₁ world. They share a screen, but they live in different worlds. The case-by-case sort is showing up directly on the canvas.

07

// Power is just β read backward — not a new concept

At first, "power" tends to look like a third character alongside α and β. Once the cases are sorted, it isn't.

1 − β = power

This is the probability of rejecting, still inside the H₁ world — the other side of β. It's the bottom-right cell of the 2×2. The same fact, just read as "detection rate" instead of "miss rate."

So why does power get its own name? It plugs straight into sample-size planning: fix α, decide the smallest effect δ worth detecting, pick a target power (0.8 is the common convention), and solve for the n that delivers it.

The takeaway: there is no need to memorize three concepts. The 2×2 is still just four cells. Power is β phrased the other way around.

08 · INTERACTIVE

// Try it — what shifts when you raise n?

One more quantity worth touching: the sample size n. "α and β trade off — push one down, the other goes up" is a common saying. That's only half the story. The full story is that increasing n loosens the trade-off itself. Below, vary only n. α stays fixed (0.05).

[Interactive: slider for n; α stays fixed at 0.050 while β falls and power 1−β rises.]

At n=30, the same effect size carries a high risk (β) of being judged "no effect." At n=300, that risk drops sharply. That is what "a larger n is needed" is really pointing at. It is the only legitimate way to reduce misses (β) without sacrificing α.
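The same contrast can be checked with the closed-form β of a one-sided z-test (a sketch; σ = 1 and δ = 0.3 are illustrative choices, and α stays pinned at 0.05):

```python
import math
from statistics import NormalDist

Z = NormalDist()
alpha, delta = 0.05, 0.3
z_crit = Z.inv_cdf(1 - alpha)          # alpha is pinned; the cutoff is set once
betas = {}
for n in (30, 300):
    betas[n] = Z.cdf(z_crit - delta * math.sqrt(n))
    print(f"n={n:>3}: beta={betas[n]:.4f}, power={1 - betas[n]:.4f}")
```

At n = 30 the miss risk sits near a coin flip for this δ; at n = 300 it all but vanishes, with α untouched throughout.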

Pin down what showed up on the canvas:

- α never moved: the critical value adjusts so the false-alarm rate stays at 0.05.
- As n grows, the standard error σ/√n shrinks, the two distributions overlap less, β falls, and power 1−β rises.

The "α and β are a seesaw" metaphor is half right; the complete picture is that the seesaw's pivot itself moves with n, once everything is sorted by case.

09

// When you want to feel one more layer

By this point, the 2×2, the world distinction between α and β, the relationship to power, and the role of n should all be lined up.

If you'd like to see how α, β, power, effect size, and n move simultaneously during an actual test, the testing topic has a heavier-duty canvas in its "Two Errors" section.

This column is the "putting it in order" room; the other visualization is the "playing with all knobs at once" room. Order first, play second — that sequence tends to leave the whole picture in order.

// KEY TAKEAWAY

Errors are four cells, not three concepts: α is the reject rate in the H₀-true world, β is the fail-to-reject rate in the H₁-true world, power is just 1 − β, and raising n is the one move that lowers β without touching α.

FAQ

// Frequently asked questions

"Why can't α and β just be added into one total error rate?"

Sorting by cases makes this easier to keep straight. α is the rejection rate in the H₀-true world; β is the non-rejection rate in the H₁-true world. They are probabilities of different worlds, defined under different assumptions, so adding them doesn't yield an interpretable quantity. Within each world, reject + fail to reject = 1, and that is the relation the table is showing.

"When n is small, does α get worse, or does β?"

Sorting by cases helps. α does not move; it's a value committed to in advance. What moves is β: a small n means a large standard error σ/√n, which makes the H₀ and H₁ distributions overlap more, which raises β. Power 1−β drops correspondingly. That is what's underneath the line that "a real effect can fail to reach significance when n is small."

"Does 'not significant' mean there is no difference?"

Sorting by cases makes the trap visible. The output "not significant" arises both when there really is no difference and when a real difference exists but n was too small to detect it (β large = insufficient power). Two distinct causes produce the same "fail to reject" outcome. That is why claiming "no difference" requires an additional showing: adequate power, or a confidence interval narrow enough to rule out meaningful effects.

Once the errors are sorted, take this back to a real test.

With α and β sorted as quantities of two different worlds, the next step is watching them move together inside an actual z- or t-test.
The link with confidence intervals also looks different once the 2×2 is in place.