// CASE FILE — TYPE I vs TYPE II

A false alarm and a missed detection.
They sound alike. They aren't the same.

"Type I error." "Type II error." The names are familiar; so are the formulas.
Yet the moment a real problem shows up, the labels start to swap.
This column tries to sort that tangle into four cases on a single 2×2 table.

StatPlay Columns Type I vs Type II Errors
01

// α, β, and power tend to tangle, don't they

Type I errors. Type II errors. Power. The formulas are not the problem; the moment a real problem shows up, α and β and "reject / fail to reject" and "true / false" all start dancing at once, and it gets hard to keep track of which is which. Most readers surely know that moment.

StatPlay also has a section that handles these errors. But "α is here, β is here, watch them move" only labels the parts; it doesn't touch the wobble itself. A page like that is easy to scroll past.

This column is an attempt to revisit that wobble in the direction of "sort it into cases." Crossing two axes — "H₀ is true / false" and "reject / fail to reject" — gives four cases. That single move alone may make the whole picture much easier to keep straight. Worth a try, perhaps.

02

// One axis isn't enough

The reason error questions stay slippery is simple: an error gets talked about as a single probability on one axis.

For instance, "the Type II error is the probability of failing to reject" is a tempting line. Not exactly wrong, but a critical piece is missing — whether the underlying world has H₀ true, or H₁ true.

Work through an actual problem and the answer turns out to come in pairs:

- In the world where H₀ is true: P(fail to reject) = 1 − α, and P(reject) = α.
- In the world where H₁ is true: P(fail to reject) = β, and P(reject) = 1 − β.
An error is therefore determined by two coordinates: the world (H₀ true / H₁ true) and the decision (reject / fail to reject). Drop the world axis, and the description is guaranteed to wobble somewhere.

The next section pins both axes down inside one table.

03

// The 2×2 matrix — the spine of this column

The whole story collapses into this single table.

                         │ Fail to reject H₀        │ Reject H₀
─────────────────────────┼──────────────────────────┼──────────────────────────
 H₀ is true              │ Correct                  │ Type I error
 (no real difference)    │ 1 − α  ("confidence")    │ α  (≈ false alarm)
─────────────────────────┼──────────────────────────┼──────────────────────────
 H₀ is false             │ Type II error            │ Correct
 (a real difference)     │ β  (≈ missed detection)  │ 1 − β  ("power")

The vertical axis is which world is actually true. The horizontal axis is the decision being made. Two of the four cells are correct outcomes; the other two are errors.

Pause for a moment and study the table. The point to notice is that the four probabilities do not sum to 1 across the whole table. Reading along the top row gives "in the H₀-true world, fail to reject + reject = 1"; the bottom row says "in the H₀-false world, fail to reject + reject = 1." In other words, each row sums to 1, and α and β live in different rows.
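Those two row-sums can also be checked by brute force. The sketch below is not part of StatPlay; it assumes a one-sided z-test with known σ = 1, and the values n = 30 and δ = 0.5 are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, n, delta = 0.05, 30, 0.5   # illustrative values, not prescriptions
z_crit = 1.6449                   # one-sided critical z for alpha = 0.05

def reject_rate(true_mean, trials=100_000):
    """Fraction of simulated one-sided z-tests (H0: mu = 0, sigma = 1) that reject."""
    means = rng.normal(true_mean, 1.0, size=(trials, n)).mean(axis=1)
    z = means / (1.0 / np.sqrt(n))
    return float((z > z_crit).mean())

p_h0 = reject_rate(0.0)    # H0-true row: reject rate should hover near alpha
p_h1 = reject_rate(delta)  # H1-true row: reject rate is the power, 1 - beta
print(f"H0 true : fail={1 - p_h0:.3f}  reject={p_h0:.3f}  (row sums to 1)")
print(f"H1 true : fail={1 - p_h1:.3f}  reject={p_h1:.3f}  (row sums to 1)")
```

Each simulated row sums to 1 by construction; nothing forces the column sums, or α + β, toward any particular value.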

That "different rows" fact is the next twist.

04 · INTERACTIVE

// Try it — where does each cell sit on the distributions?

The frame is in place: errors are 2×2. But the table alone doesn't show where on the distributions each cell lives. In the mini-visualization below, clicking any cell of the 2×2 lights up the matching region of the distribution. Predict first, then click. Calling the shot in advance — "α is probably here, β over there" — is what makes the rest of the picture readable.

[Interactive 2×2 grid: rows H₀ true / H₀ false, columns Fail to reject / Reject. Clicking a cell lights up the matching region: 1−α (correct), α (false alarm), β (missed), or 1−β (power).]

Once the clicks have run, two facts should be visible on the picture:

- α is a region of the H₀ distribution: the tail beyond the critical value.
- β is a region of the H₁ distribution: the area on the fail-to-reject side of that same cutoff.

That picture — "α and β sit on different mountains" — is the topic of the next section.

05

// α and β are probabilities of different worlds

This is the key turn for the whole column — the point where the case-by-case sort really starts to pay off.

α and β are probabilities defined under different assumptions:

- α = P(reject H₀ | H₀ is true), a probability of the H₀-true world.
- β = P(fail to reject H₀ | H₁ is true), a probability of the H₁-true world.

So they don't live in the same probability space. Written out it sounds slight, but this carries serious consequences. For instance, the following question now sorts itself:

"What does α + β represent?"

Answer: nothing. The 2×2 table is not built to be summed down its columns. It's built so that each row sums to 1: in the H₀-true world, (1−α) + α = 1; in the H₁-true world, β + (1−β) = 1. Adding numbers across two different worlds doesn't yield an interpretable quantity. Within each world, fail to reject + reject = 1, and that's the relation the table is showing.

The familiar asymmetry between α and β also sorts naturally from here:

- α needs nothing beyond H₀ to compute; it can be fixed before any alternative is even named.
- β cannot be computed at all until a specific alternative (an effect size δ) has been committed to.

α is set up front; β only becomes drawable once a δ has been committed to. The asymmetry follows directly from having only one of the two worlds fully specified — the same case-by-case point as before.
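The asymmetry can be made concrete in a few lines. This is a sketch under assumed conditions (one-sided z-test, σ = 1, n = 30); the three δ values are illustrative, not canonical:

```python
from statistics import NormalDist

Z = NormalDist()                      # standard normal
alpha, n = 0.05, 30
z_crit = Z.inv_cdf(1 - alpha)         # computable from alpha alone: an H0-world quantity

# beta does not exist yet: it needs a committed effect size delta
betas = {}
for delta in (0.2, 0.5, 0.8):
    beta = Z.cdf(z_crit - delta * n ** 0.5)   # an H1-world probability
    betas[delta] = beta
    print(f"delta={delta}: beta={beta:.3f}, power={1 - beta:.3f}")
```

One line suffices for z_crit; β needs a loop over candidate worlds, because each δ is a different H₁.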

06 · INTERACTIVE

// Try it — β requires a δ to even exist

The claim above — "β can't be drawn until H₁ is placed" — is the kind of thing that's faster to feel than to argue. The control below moves α and the effect size δ independently. First move only α and watch what happens to β. Then freeze α and move only δ. The contrast between the two behaviors is what makes which quantity belongs to which world visible.

[Interactive: sliders for α (significance, default 0.050) and δ (effect size); the canvas shows β, power 1−β, and the critical value.]

Pin down what just appeared on the canvas:

- Moving α slides the critical value; β changes only as a side effect of that cutoff moving.
- Moving δ slides the H₁ distribution itself; β reshapes while α does not budge.

So α is a tool of the H₀ world; β is a tool of the H₁ world. They share a screen, but they live in different worlds. The case-by-case sort is showing up directly on the canvas.

07

// Power is just β read backward — not a new concept

At first, "power" tends to look like a third character alongside α and β. Once the cases are sorted, it isn't.

1 − β = power

This is the probability of rejecting, still inside the H₁ world — the other side of β. It's the bottom-right cell of the 2×2. The same fact, just read as "detection rate" instead of "miss rate."

So why does power get its own name? It plugs straight into sample-size planning: fix α, decide the smallest effect δ worth detecting, pick a target power (0.8 is the common convention), and solve for the n that delivers it.

The takeaway: there is no need to memorize three concepts. The 2×2 is still just four cells. Power is β phrased the other way around.

08 · INTERACTIVE

// Try it — what shifts when you raise n?

One more quantity worth touching: the sample size n. "α and β trade off — push one down, the other goes up" is a common saying. That's only half the story. The full story is that increasing n loosens the trade-off itself. Below, vary only n. α stays fixed (0.05).

[Interactive: slider for n; α stays fixed at 0.050 while β falls and power 1−β rises.]

At n=30, the same effect size carries a high risk (β) of being judged "no effect." At n=300, that risk drops sharply. That is what "a larger n is needed" is really pointing at. It is the only legitimate way to reduce misses (β) without sacrificing α.
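The same contrast can be checked with the closed-form β of a one-sided z-test (a sketch; σ = 1 and δ = 0.3 are illustrative choices, and α stays pinned at 0.05):

```python
import math
from statistics import NormalDist

Z = NormalDist()
alpha, delta = 0.05, 0.3
z_crit = Z.inv_cdf(1 - alpha)          # alpha is pinned; the cutoff is set once
betas = {}
for n in (30, 300):
    betas[n] = Z.cdf(z_crit - delta * math.sqrt(n))
    print(f"n={n:>3}: beta={betas[n]:.4f}, power={1 - betas[n]:.4f}")
```

At n = 30 the miss risk sits near a coin flip for this δ; at n = 300 it all but vanishes, with α untouched throughout.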

Pin down what showed up on the canvas:

- α never moved: the critical value adjusts so the false-alarm rate stays at 0.05.
- As n grows, the standard error σ/√n shrinks, the two distributions overlap less, β falls, and power 1−β rises.

The "α and β are a seesaw" metaphor is half right; the complete picture is that the seesaw's pivot itself moves with n, once everything is sorted by case.

09

// When you want to feel one more layer

By this point, the 2×2, the world distinction between α and β, the relationship to power, and the role of n should all be lined up.

If you'd like to see how α, β, power, effect size, and n move simultaneously during an actual test, the testing topic has a heavier-duty canvas in its "Two Errors" section.

This column is the "putting it in order" room; the other visualization is the "playing with all knobs at once" room. Order first, play second — that sequence tends to leave the whole picture in order.

// KEY TAKEAWAY

Errors are four cells, not three concepts: α is the reject rate in the H₀-true world, β is the fail-to-reject rate in the H₁-true world, power is just 1 − β, and raising n is the one move that lowers β without touching α.

FAQ

// Frequently asked questions

"Why can't α and β just be added into one total error rate?"

Sorting by cases makes this easier to keep straight. α is the rejection rate in the H₀-true world; β is the non-rejection rate in the H₁-true world. They are probabilities of different worlds, defined under different assumptions, so adding them doesn't yield an interpretable quantity. Within each world, reject + fail to reject = 1, and that is the relation the table is showing.

"When n is small, does α get worse, or does β?"

Sorting by cases helps. α does not move; it's a value committed to in advance. What moves is β: a small n means a large standard error σ/√n, which makes the H₀ and H₁ distributions overlap more, which raises β. Power 1−β drops correspondingly. That is what's underneath the line that "a real effect can fail to reach significance when n is small."

"Does 'not significant' mean there is no difference?"

Sorting by cases makes the trap visible. The output "not significant" arises both when there really is no difference and when a real difference exists but n was too small to detect it (β large = insufficient power). Two distinct causes produce the same "fail to reject" outcome. That is why claiming "no difference" requires an additional showing: adequate power, or a confidence interval narrow enough to rule out meaningful effects.

Once the errors are sorted, take this back to a real test.

With α and β sorted as quantities of two different worlds, the next step is watching them move together inside an actual z- or t-test.
The link with confidence intervals also looks different once the 2×2 is in place.