Hypothesis Testing — Rejection Regions & p-values
Hypothesis testing has the structure of a courtroom. H₀ is presumed innocent; the rejection region is where conviction begins, and z shows which side of the boundary the evidence falls on.
Hypothesis Testing — Reject or Fail to Reject
Think of testing as a trial.
You start by assuming H₀ ("the drug has no effect" = "innocent"). Then if your computed test statistic z lands in the pre-chosen rejection region, you convict — that is, reject H₀.
Two panels below: ① geometry of z and rejection regions (two-sided, right, left), and ② false alarms (α) vs. misses (β).
▶ ① Basics: z-statistic & rejection region
α (significance level) = the threshold for "too extreme to be coincidence." It sets the width of the rejection zone.
- Step 1: z = 1.96, α = 0.05, two-sided → right on the boundary. p ≈ 0.05. The watershed.
- Step 2: Push z to 2.5 → deep in the rejection zone, the p-value shrinks. "Strong evidence."
- Step 3: Switch to "Right" test → same z = 1.96, but the rejection area is one-sided; the p-value halves.
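The step numbers above can be reproduced with the standard normal CDF; here is a minimal sketch using Python's stdlib `statistics.NormalDist` (not part of the interactive panel itself):

```python
from statistics import NormalDist

N = NormalDist()  # standard normal

def p_two_sided(z):
    """P(|Z| >= |z|) under H0 — the two-sided p-value."""
    return 2 * (1 - N.cdf(abs(z)))

p1 = p_two_sided(1.96)   # Step 1: right on the alpha = 0.05 boundary
p2 = p_two_sided(2.5)    # Step 2: deeper in the rejection zone -> smaller p
p3 = 1 - N.cdf(1.96)     # Step 3: right-tailed test keeps one tail -> p halves
print(round(p1, 3), round(p2, 3), round(p3, 3))
```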
▶ ② Two kinds of errors: α, β, power
This panel is a hands-on playground for the α (Type I) / β (Type II) / power trade-off.
- Step 1: Set effect size δ to 0 → H₁ overlaps H₀ completely and power collapses to its floor: the test rejects no more often than α. No difference = nothing to detect.
- Step 2: Slide δ from 2 to 3 → the purple (H₁) curve separates, power climbs.
- Step 3: Lower α to 0.01 → fewer false alarms, but more misses (β rises).
Come back whenever you want to feel the trade-off. For the deeper conceptual write-up (the 2×2 matrix, why α and β live in different worlds), see the "Type I & II in a 2×2 table" column.
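The same trade-off can be sketched numerically for a right-tailed z-test whose true mean sits δ standard errors above μ₀ — a simplified model of the panel, assuming known σ:

```python
from statistics import NormalDist

N = NormalDist()

def power(delta, alpha=0.05):
    """P(reject H0 | true mean is delta SEs above mu0), right-tailed z-test."""
    z_crit = N.inv_cdf(1 - alpha)      # rejection boundary
    return 1 - N.cdf(z_crit - delta)   # area of the H1 curve beyond the boundary

print(round(power(0), 3))                        # delta = 0: power collapses to alpha itself
print(round(power(2), 3), round(power(3), 3))    # curves separate: power climbs
print(round(power(2, alpha=0.01), 3))            # stricter alpha: fewer false alarms, more misses
```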
Tip: drag horizontally on the chart to slide the critical boundary (α).
// Formula used here
Left formula in plain English
• "If this drug truly has no effect (H₀: μ = μ₀), how many standard errors is our sample mean from μ₀?"
Each part
• X̄ − μ₀: the gap between observed and "no effect" — bigger means more suspicious
• ÷ σ/√n: converts to "how many SEs?" — 1 SE is normal, 3 SEs is extremely rare
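Putting the two parts together, the left formula being described is presumably the one-sample z statistic (σ assumed known):

```latex
z = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}
```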
Right formula: p-value (two-sided)
• "Assuming H₀ is true, the probability of seeing a z at least this extreme in either direction (|Z| ≥ |z|)" — that's what we want to know
• p-value ≤ α (significance level) → reject. "A result this extreme is too unlikely to be chance"
• p-value > α → fail to reject. "This could easily happen by chance"
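In symbols, the two-sided p-value compares the standard normal CDF Φ against the observed z:

```latex
p = P\left(|Z| \ge |z| \,\middle|\, H_0\right) = 2\left(1 - \Phi(|z|)\right)
```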
⚠️ p-value ≠ "probability that H₀ is true"
• The p-value is computed assuming H₀ is true. It cannot tell you whether H₀ is actually true or false
• This is a common point of confusion
// The 5-step testing recipe — use this order every time
- State hypotheses: H₀: "no difference" vs. H₁: "there is a difference"
- Choose α before collecting data (e.g., 0.05)
- Compute the test statistic: z, t, χ², etc. — a single number summarizing "how far the data is from H₀"
- Find the p-value: look up the probability from the statistic
- Decide: p ≤ α → reject; p > α → fail to reject
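The five steps can be wired into one function; a sketch for a two-sided one-sample z-test with σ assumed known (the numbers in the example call are hypothetical):

```python
from math import sqrt
from statistics import NormalDist

def z_test(x_bar, mu0, sigma, n, alpha=0.05):
    """Two-sided one-sample z-test, following the 5-step recipe.
    Steps 1-2 (hypotheses, alpha) are fixed before seeing the data."""
    z = (x_bar - mu0) / (sigma / sqrt(n))         # Step 3: test statistic
    p = 2 * (1 - NormalDist().cdf(abs(z)))        # Step 4: p-value
    decision = "reject H0" if p <= alpha else "fail to reject H0"  # Step 5
    return z, p, decision

# Hypothetical sample: mean 103 vs. mu0 = 100, sigma = 10, n = 50
print(z_test(103, 100, 10, 50))
```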
// The courtroom analogy — going deeper
- H₀ = "defendant is innocent"; H₁ = "defendant is guilty"
- Rejecting H₀ = guilty verdict: "it's implausible that an innocent person would produce this evidence"
- Failing to reject = insufficient evidence: not "proven innocent," just "not enough to convict"
- That's why we say "fail to reject H₀" rather than "accept H₀"
// Connection to confidence intervals
μ₀ inside the 95% CI ⟺ fail to reject at α = 0.05. Same math, different lens: the CI shows "width," the test gives "yes/no." The conclusions agree whenever the CI and the test are built from the same statistic at matching levels.
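The equivalence is easy to verify numerically (hypothetical numbers; σ assumed known):

```python
from math import sqrt
from statistics import NormalDist

N = NormalDist()
x_bar, sigma, n, mu0 = 102.0, 10.0, 25, 100.0    # hypothetical sample vs. H0 mean
se = sigma / sqrt(n)
z_975 = N.inv_cdf(0.975)                         # ~1.96

ci = (x_bar - z_975 * se, x_bar + z_975 * se)    # 95% CI for mu
inside = ci[0] <= mu0 <= ci[1]

p = 2 * (1 - N.cdf(abs((x_bar - mu0) / se)))     # two-sided test
print(inside, p > 0.05)                          # the two verdicts agree
```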
// The α-β trade-off
For a full breakdown of α, β, and power, see the "Type I & II in a 2×2 table" column.
// Common misconceptions
The p-value measures how surprising the data is, not how big the effect is. With n = 1,000,000 a tiny difference can yield p < 0.001. Always check effect size separately.
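A quick check of that claim, with illustrative numbers (a 0.01-SD effect at n = 1,000,000):

```python
from math import sqrt
from statistics import NormalDist

effect, sigma, n = 0.01, 1.0, 1_000_000   # tiny effect, huge sample
z = effect / (sigma / sqrt(n))            # = 10 standard errors
p = 2 * (1 - NormalDist().cdf(z))         # vanishingly small p-value
print(z, p < 0.001)
```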
Misconceptions about errors and power ("not significant = no difference," "smaller α is always better," etc.) are collected in the FAQ of the "Type I & II in a 2×2 table" column.
// Shapes you'll meet again
Across hypothesis tests, the same cast keeps returning: comparing p with α, choosing one-sided vs. two-sided, and the 2×2 of error types.
- The p-value-vs-α comparison: with p = 0.03 and α = 0.01, the relation 0.03 > 0.01 lands on "fail to reject." "Is the observed p below α or not?" is the recurring decision shape
- One-sided vs. two-sided sorting: "does the drug raise the mean?" calls for one-sided; "does it change in either direction?" calls for two-sided. The question's orientation forks into this shape
- The 2×2 of errors: rejecting a true H₀ is Type I, failing to reject a false H₀ is Type II. Wherever a test lives, this 2×2 table tags along
- Where power sits: 1 − β = "the probability of detecting a real difference when one exists." It appears in this same shape, as the flip side of Type II error
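As a memory aid, the 2×2 above can be written down as a small lookup table, keyed by whether H₀ is actually true and whether we rejected it (a mnemonic sketch, not library code):

```python
# Test outcomes keyed by (H0_is_true, rejected_H0)
outcomes = {
    (True,  True):  "Type I error: false alarm (probability alpha)",
    (True,  False): "correct: fail to reject a true H0",
    (False, True):  "correct detection (probability = power = 1 - beta)",
    (False, False): "Type II error: miss (probability beta)",
}
print(outcomes[(False, True)])
```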
Look up critical values in the Interactive Distribution Tables — z and t values with live graph sync