
Proportion Testing & Estimation — Visualize the z-test for Proportions

Estimation uses p̂; testing uses p₀. A single character swaps the standard-error formula — and the whole proportion test pivots on that one letter.

PROPORTION TEST & ESTIMATION

Proportion Test & Estimation — from sample proportion to the truth

The logic is exactly the same as when we did confidence intervals and hypothesis tests for means. The only twist: data is now "success or failure" counts instead of continuous measurements, so the standard error formula changes shape. Play with the sliders here and the textbook formulas start to read like a recipe you've already followed.

A world where every data point is just "success" or "failure." The sample proportion p̂ = x/n is the starting point, and when n is large enough the normal approximation kicks in (thanks to the Central Limit Theorem).
We'll work through ① interval estimation, ② one-sample test, and ③ two-sample test. Compare each step with what you learned for means — spotting the similarities makes the differences easy to absorb.

Experiment Guide — try these in order
  1. Step 1: n=15, p̂=0.50, 95% → the interval is really wide. Just 15 people doesn't buy much precision.
  2. Step 2: Push n to 100 → the interval tightens up fast. Feel the power of sample size.
  3. Step 3: Set p̂ to 0.90 → variance p(1−p) shrinks, so the CI narrows. p̂=0.50 gives the widest interval.
  4. Step 4: Keep p̂=0.90, drop n to 10 → if ⚠ appears, the normal approximation conditions aren't met.

▶ ① Confidence interval for a proportion

[Panel readouts: standard error SE, CI lower bound, CI upper bound, margin of error E]
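
To replay the guide's steps outside the app, here is a minimal Python sketch of the Wald interval (standard library only; the helper name wald_ci is just for illustration):

    from math import sqrt
    from statistics import NormalDist

    def wald_ci(p_hat, n, conf=0.95):
        """Wald CI for a proportion: p̂ ± z·SE, with SE = √(p̂(1−p̂)/n)."""
        z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # 1.96 for a 95% CI
        se = sqrt(p_hat * (1 - p_hat) / n)            # estimation SE, built from p̂
        e = z * se                                    # margin of error E
        return p_hat - e, p_hat + e

    print(wald_ci(0.50, 15))   # Step 1: wide, roughly (0.247, 0.753)
    print(wald_ci(0.50, 100))  # Step 2: much tighter, roughly (0.402, 0.598)
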
Simulation — verify what "95%" really means

"What does a 95% CI actually mean?" — one of the easier spots to stall on. The answer: "if you repeated the same survey many times, about 95% of the intervals would capture the true proportion."
Run it 200 times and see whether that 95% claim holds up.

▶ ①-b CI simulation

[Panel readouts: number of intervals generated, coverage rate]
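
The same experiment can be scripted. A sketch in plain Python, where the true proportion 0.5, n = 100, and 200 repetitions are illustrative defaults:

    import random
    from math import sqrt
    from statistics import NormalDist

    def coverage(true_p=0.5, n=100, trials=200, conf=0.95, seed=0):
        """Build a Wald CI from each simulated sample; count how many capture true_p."""
        rng = random.Random(seed)
        z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
        hits = 0
        for _ in range(trials):
            x = sum(rng.random() < true_p for _ in range(n))  # n Bernoulli(true_p) draws
            p_hat = x / n
            e = z * sqrt(p_hat * (1 - p_hat) / n)
            hits += (p_hat - e <= true_p <= p_hat + e)
        return hits / trials

    print(coverage())  # typically lands near 0.95 (often slightly under, a known Wald quirk)
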
Experiment Guide — experience proportion testing
  1. Step 1: n=100, p̂=0.60, p₀=0.50, α=0.05, two-sided → "Is 60% significantly different from 50%?"
    α (significance level) = the threshold for "too extreme to be coincidence." It determines the size of the rejection region.
  2. Step 2: Change p₀ to 0.55 → z shrinks and you can no longer reject. Small differences are hard to detect.
  3. Step 3: Increase n to 400 → same p̂=0.60, p₀=0.55 but now you reject. The power of sample size.
  4. Step 4: Switch test type to "Right" → one-sided test asking only "greater than 50%?" The p-value halves.

▶ ② One-sample z-test for a proportion

Same flow as testing a population mean. Set up H₀: p = p₀, compute z from the sample, and check whether it lands in the rejection region.
The only twist is that the standard error becomes √(p₀(1−p₀)/n). Nail that, and the rest is familiar.

[Panel readouts: test statistic z, critical value, p-value, decision]
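
A minimal sketch of that flow (standard library only; the numbers reproduce Step 1 of the guide, n=100, p̂=0.60, p₀=0.50):

    from math import sqrt
    from statistics import NormalDist

    def one_sample_prop_test(p_hat, p0, n, alternative="two-sided"):
        """z-test of H₀: p = p₀. The SE uses p₀, not p̂: we compute inside the H₀ world."""
        z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
        cdf = NormalDist().cdf
        if alternative == "two-sided":
            p_value = 2 * (1 - cdf(abs(z)))
        elif alternative == "right":
            p_value = 1 - cdf(z)  # one-sided: the p-value halves when z > 0
        else:  # "left"
            p_value = cdf(z)
        return z, p_value

    print(one_sample_prop_test(0.60, 0.50, 100))  # z = 2.0, p ≈ 0.0455 → reject at α = 0.05
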
Experiment Guide — compare two groups
  1. Step 1: n₁=n₂=100, p̂₁=0.60, p̂₂=0.45 → is the difference significant?
  2. Step 2: Move p̂₂ toward 0.55 → z shrinks, rejection gets harder.
  3. Step 3: Increase both to n₁=n₂=400 → same gap, more power.
  4. Step 4: Try n₁=50, n₂=200 (asymmetric) → the smaller n is the bottleneck.

▶ ③ Two-proportion z-test

"Drug A vs. Drug B — which works better?" "Ad A vs. Ad B — is the click-through rate really different?" This test is for comparing two groups.
The key idea: under H₀: p₁ = p₂, we pool both samples into a single pooled proportion to build a shared SE. Get that, and the rest mirrors the one-sample test.

[Panel readouts: pooled proportion p̂, test statistic z, p-value, decision]
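
As a sketch, with counts matching Step 1 of the guide (60 of 100 vs 45 of 100):

    from math import sqrt
    from statistics import NormalDist

    def two_prop_test(x1, n1, x2, n2):
        """Two-proportion z-test under H₀: p₁ = p₂, with the pooled proportion in the SE."""
        p1, p2 = x1 / n1, x2 / n2
        pooled = (x1 + x2) / (n1 + n2)                        # pool: H₀ says one shared p
        se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))  # shared SE of the difference
        z = (p1 - p2) / se
        p_value = 2 * (1 - NormalDist().cdf(abs(z)))          # two-sided
        return pooled, z, p_value

    print(two_prop_test(60, 100, 45, 100))  # pooled = 0.525, z ≈ 2.12, p ≈ 0.034 → reject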

// Formulas used here

★ Estimation and testing put different content inside SE
Estimation uses SE = √(p̂(1−p̂)/n); testing uses SE = √(p₀(1−p₀)/n). The label "SE of a proportion" is shared, but estimation grounds the SE in the sample world (your data is the reference) while testing grounds it in the H₀ world (the hypothesized proportion is the reference). Compare the denominators of the first two formulas below — that's the only line that changes.
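
A two-line numeric contrast, using panel ②'s Step 1 numbers (n = 100, p̂ = 0.60, p₀ = 0.50):

    from math import sqrt

    n, p_hat, p0 = 100, 0.60, 0.50
    print(sqrt(p_hat * (1 - p_hat) / n))  # estimation SE ≈ 0.0490 (sample world, uses p̂)
    print(sqrt(p0 * (1 - p0) / n))        # testing SE    = 0.0500 (H₀ world, uses p₀)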

Confidence interval for a proportion (Wald interval)

  p̂ ± zα/2 · √(p̂(1−p̂)/n)

· p̂ = x/n: sample proportion (x successes out of n trials)
· zα/2: upper α/2 quantile of the standard normal. For a 95% CI, α = 0.05 and z0.025 = 1.96
· √(p̂(1−p̂)/n): standard error of the sample proportion. Similar to SE = σ/√n for means, but the variance is p(1−p) — that's what makes proportions special

One-sample z-test for a proportion

  z = (p̂ − p₀) / √(p₀(1−p₀)/n)

· p₀: the proportion assumed under the null hypothesis (e.g., p₀ = 0.5 for "50%")
· The denominator uses p₀, not p̂ — because in a test we work inside the H₀ world
· Under H₀ this z follows a standard normal, which lets us compute the p-value

Two-proportion z-test

  z = (p̂₁ − p̂₂) / √(p̂(1−p̂)(1/n₁ + 1/n₂)),  with pooled p̂ = (x₁ + x₂)/(n₁ + n₂)

· p̂: the pooled proportion — both groups combined. Under H₀: p₁ = p₂ the true rates are the same, so pooling gives a better estimate
· The denominator is the SE of the difference. The 1/n₁ + 1/n₂ term means the smaller sample is the bottleneck

// Normal approximation conditions — when can you use this method?

The z-test and CI for proportions rely on the sample proportion following a normal distribution.
The rule of thumb for this approximation to hold:

  • np ≥ 5 and n(1−p) ≥ 5 (use p₀ for tests)
  • Intuitively: "both successes and failures should occur at least 5 times"
  • When p is close to 0 or 1, the distribution becomes skewed and the normal approximation breaks down

If the conditions aren't met, use an exact binomial test or go back and collect more data.
In panel ① above, try shrinking n or pushing p̂ toward 0.01 — the ⚠ warning appears for exactly this reason.
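
To run the same checkpoint in code, here is a small sketch (the threshold of 5 is the rule of thumb above; some textbooks use 10):

    def normal_approx_ok(n, p, threshold=5):
        """Rule of thumb: expected successes and expected failures both at least `threshold`."""
        return n * p >= threshold and n * (1 - p) >= threshold

    print(normal_approx_ok(15, 0.50))     # True:  np = 7.5, n(1−p) = 7.5
    print(normal_approx_ok(10, 0.90))     # False: n(1−p) = 1, the ⚠ case from Step 4
    print(normal_approx_ok(1000, 0.001))  # False: np = 1 even though n is large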

// Estimation vs. testing — when to use which

  • Interval estimation: "Roughly where is the true proportion?" → present a confidence interval
  • Hypothesis test: "Is the true proportion different from p₀? Yes or no?" → decide via the p-value
  • They are two sides of the same coin: p₀ outside the 95% CI ⟺ reject H₀ at α = 0.05. (For proportions the match is close but not exact, because the CI's SE is built from p̂ while the test's is built from p₀.)

Flip between panels ① and ② with the same n and p̂ to see this relationship in action. If p₀ sits outside the CI in ①, panel ② will reject.
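
The flip can be scripted too. A self-contained sketch with panel ②'s Step 1 numbers; the agreement is near-exact rather than perfect because the two SEs differ slightly:

    from math import sqrt
    from statistics import NormalDist

    n, p_hat, p0, alpha = 100, 0.60, 0.50, 0.05
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96

    # Panel ①: does the 95% CI contain p₀?
    e = z_crit * sqrt(p_hat * (1 - p_hat) / n)
    inside = p_hat - e <= p0 <= p_hat + e
    # Panel ②: does the test reject H₀?
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    reject = abs(z) > z_crit

    print(inside, reject)  # False True: p₀ falls outside the CI and H₀ is rejected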

// Common misconceptions

❌ "The SE for estimation and the SE for testing are the same"

For estimation, SE = √(p̂(1−p̂)/n) uses the sample proportion; for testing, SE = √(p₀(1−p₀)/n) uses the null value. The difference comes down to "which world are we basing our calculation on?"

❌ "Large n means the normal approximation always works"

If p is as extreme as 0.001, even n = 1,000 gives np = 1, which fails the condition. The "rule of 5" checks both p and n together.

❌ "For the two-sample test, compute each group's SE separately and add them"

Under H₀: p₁ = p₂, the true proportions are the same, so we pool them into a single estimate to build a shared SE. You compute separate SEs only when constructing a confidence interval for the difference.
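
The numeric gap is small, but the logic differs. A sketch with the 60/100 vs 45/100 example:

    from math import sqrt

    x1, n1, x2, n2 = 60, 100, 45, 100
    p1, p2, pooled = x1 / n1, x2 / n2, (x1 + x2) / (n1 + n2)

    se_test = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))  # test: one shared p under H₀
    se_ci = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)      # CI for p₁ − p₂: separate p's
    print(se_test, se_ci)  # ≈ 0.0706 vs ≈ 0.0698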

// Shapes you'll meet again

Across proportion tests and intervals, the swap between the estimation SE and the testing SE — plus the normal-approximation check — keeps reappearing.

  • The CI assembly shape: with 120 of 200 in favor, p̂ = 0.6, SE = √(0.6 × 0.4 / 200), and the 95% CI lands at 0.6 ± 1.96 × SE. The "center ± multiplier × standard error" three-layer structure carries straight over to proportions
  • The test-statistic shape: with 35 defects in 500, p̂ = 0.07, p₀ = 0.05, z = (0.07 − 0.05) / √(0.05 × 0.95 / 500). The "H₀-world SE in the denominator" is the distinguishing detail against the estimation case
  • The normal-approximation condition: np ≥ 5 and n(1−p) ≥ 5. With n = 20 and p₀ = 0.1, np₀ = 2 falls below the threshold — that's the shape this checkpoint takes
  • The two-sample difference shape: scenarios like "Group A: 60/100 improved, Group B: 45/100 improved" carry a distinctive structure — pooled proportion sitting inside the SE in the denominator
UP NEXT — into the world where σ is unknown: t, χ², F