Multiple Regression — Control Confounders to Find True Effects
"More study time means lower scores" — a result that flips the moment you add one more variable. That moment is what confounding looks like.
Multiple Regression — Predict with multiple variables
One variable gives a line; two give a plane in 3D. But the point isn't geometry — it's removing confounding to isolate each variable's true effect. Start with the side-by-side comparison to see the moment β₁ shifts.
- Step 1: At default (corr = −0.5), compare β₁ left vs. right. Simple regression is smaller — the study effect is underestimated.
- Step 2: Set correlation to 0 → both β₁ values nearly match. "No confounding, no bias."
- Step 3: Set correlation to +0.5 → now simple regression β₁ is too large. Confounding can bias in either direction.
- Step 4: Check R² too. Multiple regression R² is always ≥ simple regression R², because adding a variable can never lower in-sample R². Here the gain reflects sleep's real contribution to scores.
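The comparison panel's behavior can be reproduced offline. Below is a minimal pure-Python sketch that assumes made-up parameters (true β₁ = 3, β₂ = 4, study SD = 2 h, sleep SD = 1 h, noise SD = 5). These are not the simulator's actual settings, so the numbers will not match the panel, but the pattern should: simple β₁ is too small at corr = −0.5, matches at 0, and is too large at +0.5, while multiple β₁ stays near 3 and multiple R² never falls below simple R².

```python
import math
import random

def mean(v):
    return sum(v) / len(v)

def sxy(a, b):
    """Centered cross-product sum: sum of (a - mean a)(b - mean b)."""
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b))

def simulate(corr, n, seed, b0=30.0, b1=3.0, b2=4.0, noise_sd=5.0):
    """Hypothetical data: study (mean 5, SD 2) and sleep (mean 7, SD 1) with correlation `corr`."""
    rng = random.Random(seed)
    study, sleep, score = [], [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        u2 = corr * z1 + math.sqrt(1 - corr**2) * z2  # standard normal correlated with z1
        s, z = 5.0 + 2.0 * z1, 7.0 + 1.0 * u2
        study.append(s)
        sleep.append(z)
        score.append(b0 + b1 * s + b2 * z + rng.gauss(0, noise_sd))
    return study, sleep, score

def simple_fit(x, y):
    """OLS for y ~ x: returns (intercept, slope)."""
    b1 = sxy(x, y) / sxy(x, x)
    return mean(y) - b1 * mean(x), b1

def multiple_fit(x1, x2, y):
    """Closed-form OLS for y ~ x1 + x2 (solves the 2x2 normal equations)."""
    s11, s22, s12 = sxy(x1, x1), sxy(x2, x2), sxy(x1, x2)
    s1y, s2y = sxy(x1, y), sxy(x2, y)
    det = s11 * s22 - s12**2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return mean(y) - b1 * mean(x1) - b2 * mean(x2), b1, b2

def r_squared(y, yhat):
    my = mean(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - my) ** 2 for a in y)
    return 1 - ss_res / ss_tot

results = {}
for corr in (-0.5, 0.0, 0.5):
    x1, x2, y = simulate(corr, n=20_000, seed=1)
    s0, s1 = simple_fit(x1, y)
    m0, m1, m2 = multiple_fit(x1, x2, y)
    r2_s = r_squared(y, [s0 + s1 * a for a in x1])
    r2_m = r_squared(y, [m0 + m1 * a + m2 * b for a, b in zip(x1, x2)])
    results[corr] = {"simple_b1": s1, "multiple_b1": m1, "r2_simple": r2_s, "r2_multiple": r2_m}
    print(f"corr={corr:+.1f}  simple b1={s1:.2f}  multiple b1={m1:.2f}  R2 {r2_s:.3f} -> {r2_m:.3f}")
```

Under these assumed parameters the simple slope should land near β₁ + β₂ × Cov(x₁, x₂) / Var(x₁) = 3 + 2 × corr, i.e. about 2, 3 and 4 for the three settings. A large n is used only so the pattern is not hidden by sampling noise.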
Simple vs. Multiple Regression: Watch β₁ Shift
- Step 1: Drag the 3D plot to rotate. The translucent surface is the regression plane — data points align along it.
- Step 2: Move the study slider → the prediction dot slides along the x₁ direction. The tilt = β₁.
- Step 3: Move sleep too → it moves along x₂. The tilt = β₂. Each variable's contribution is visible.
- Step 4: Hit Resample a few times → β₁, β₂, R² shift slightly each time. Estimates have variability too.
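Pressing Resample many times amounts to drawing a fresh class of students each time and refitting. The sketch below does that 1,000 times with n = 30, again under assumed, hypothetical parameters (true study effect 3 pts/h, sleep effect 4 pts/h, corr = −0.5, noise SD 5), and summarizes how much β₁ wobbles from sample to sample.

```python
import math
import random
import statistics

def sxy(a, b):
    """Centered cross-product sum."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b))

def multiple_b1(x1, x2, y):
    """Study coefficient from closed-form two-predictor OLS."""
    s11, s22, s12 = sxy(x1, x1), sxy(x2, x2), sxy(x1, x2)
    return (s22 * sxy(x1, y) - s12 * sxy(x2, y)) / (s11 * s22 - s12**2)

def sample(rng, n=30, corr=-0.5):
    """One hypothetical class of n students (all parameters are assumptions)."""
    study, sleep, score = [], [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        s = 5.0 + 2.0 * z1
        z = 7.0 + corr * z1 + math.sqrt(1 - corr**2) * z2
        study.append(s)
        sleep.append(z)
        score.append(30.0 + 3.0 * s + 4.0 * z + rng.gauss(0, 5.0))
    return study, sleep, score

rng = random.Random(7)
estimates = [multiple_b1(*sample(rng)) for _ in range(1000)]  # 1,000 "Resample" clicks
print(f"mean b1 = {statistics.fmean(estimates):.2f}, "
      f"SD across resamples = {statistics.stdev(estimates):.2f}")
```

The average stays close to the true value, but any single sample of 30 can miss it by half a point or more, which is exactly the jitter the Resample button shows.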
// What's happening with confounding?
In the comparison panel above, simple and multiple regression gave different β₁ values for the same data.
Why does adding one variable change the slope?
Here's what's happening behind the scenes:
More study → less sleep → lower scores (indirect negative effect)
Simple regression β₁ mixes "direct effect + indirect effect."
The true direct effect is +3 pts/h, but lost sleep drags scores down, so simple regression underestimates at ~+2.2.
Multiple regression β₁ holds sleep constant and extracts only studying's direct effect. That is what "partial regression coefficient" means.
Set correlation to 0 in the panel above and the indirect path vanishes — β₁ values converge. That's the "no confounding" state.
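One way to see what "holds sleep constant" means computationally is the Frisch–Waugh–Lovell result: the multiple-regression β₁ equals the simple slope of score on the part of study hours that sleep cannot linearly explain. The sketch below checks this on a hypothetical 30-student dataset; the coefficients and noise levels are invented for illustration.

```python
import random

def mean(v):
    return sum(v) / len(v)

def sxy(a, b):
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b))

def slope(x, y):
    """Simple-regression slope of y on x."""
    return sxy(x, y) / sxy(x, x)

# Hypothetical data: heavier studiers sleep less.
rng = random.Random(42)
study = [rng.uniform(2, 8) for _ in range(30)]
sleep = [9.5 - 0.4 * s + rng.gauss(0, 0.6) for s in study]
score = [30 + 3 * s + 2 * z + rng.gauss(0, 4) for s, z in zip(study, sleep)]

# Step 1: regress study on sleep and keep the residuals
# (the variation in study hours that sleep does not linearly explain).
a = slope(sleep, study)
c = mean(study) - a * mean(sleep)
study_resid = [s - (c + a * z) for s, z in zip(study, sleep)]

# Step 2: the slope of score on those residuals is the partial coefficient.
b1_fwl = slope(study_resid, score)

# Reference: closed-form two-predictor OLS coefficient for study.
s11, s22, s12 = sxy(study, study), sxy(sleep, sleep), sxy(study, sleep)
b1_ols = (s22 * sxy(study, score) - s12 * sxy(sleep, score)) / (s11 * s22 - s12**2)

print(f"FWL b1 = {b1_fwl:.4f}, multiple-regression b1 = {b1_ols:.4f}")
```

The two numbers agree up to floating-point rounding: "holding sleep fixed" means using only the study-hour variation that is unrelated to sleep.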
// Formula used here
ŷ = β₀ + β₁x₁ + β₂x₂
Each part
• β₀ (intercept): baseline prediction when all variables = 0
• β₁x₁: "holding x₂ fixed, how much does y change per unit of x₁" × the value of x₁
• β₂x₂: same idea — the isolated contribution of x₂
How it differs from simple regression
• Simple regression β₁ = direct effect of x₁ + indirect effect via x₂ (mixed together)
• Multiple regression β₁ = direct effect of x₁ only (x₂ is statistically "held constant")
• Set correlation to 0 in the panel above and the indirect path vanishes — that's why both β₁ values converge
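The "direct + indirect" split in the bullets above is an exact in-sample identity for OLS: simple β₁ = multiple β₁ + multiple β₂ × δ, where δ is the slope from regressing x₂ on x₁. The sketch below verifies it on hypothetical data (all generating values are assumptions). When x₁ and x₂ are uncorrelated, δ = 0 and the indirect term disappears.

```python
import random

def mean(v):
    return sum(v) / len(v)

def sxy(a, b):
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b))

rng = random.Random(3)
x1 = [rng.uniform(2, 8) for _ in range(30)]           # study hours (hypothetical)
x2 = [9.5 - 0.4 * s + rng.gauss(0, 0.6) for s in x1]  # sleep hours, falling with study
y = [30 + 3 * s + 2 * z + rng.gauss(0, 4) for s, z in zip(x1, x2)]

s11, s22, s12 = sxy(x1, x1), sxy(x2, x2), sxy(x1, x2)
s1y, s2y = sxy(x1, y), sxy(x2, y)
det = s11 * s22 - s12**2
b1 = (s22 * s1y - s12 * s2y) / det  # direct effect of x1 (multiple regression)
b2 = (s11 * s2y - s12 * s1y) / det  # effect of x2 holding x1 fixed
delta = s12 / s11                   # slope of x2 on x1: how sleep moves with study
simple_b1 = s1y / s11               # simple regression y ~ x1

indirect = b2 * delta
print(f"simple {simple_b1:.3f} = direct {b1:.3f} + indirect {indirect:.3f}")
```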
Geometric picture
• With 2 predictors, the fit is a plane in 3D. Least squares picks the tilt that minimizes the total squared vertical distance (measured along y) from the points to the plane, not the perpendicular distance
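Least squares leaves a checkable signature on the fitted plane: the vertical residuals sum to zero and are orthogonal to each predictor (the normal equations), and nudging any coefficient can only increase the sum of squared residuals. A small sketch on hypothetical data:

```python
import random

def mean(v):
    return sum(v) / len(v)

def sxy(a, b):
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b))

def sse(b0, b1, b2, x1, x2, y):
    """Sum of squared vertical distances (along y) from the points to the plane."""
    return sum((yi - (b0 + b1 * a + b2 * b)) ** 2 for a, b, yi in zip(x1, x2, y))

# Hypothetical data.
rng = random.Random(11)
x1 = [rng.uniform(0, 10) for _ in range(30)]
x2 = [rng.uniform(4, 10) for _ in range(30)]
y = [35 + 3 * a + 2 * b + rng.gauss(0, 5) for a, b in zip(x1, x2)]

s11, s22, s12 = sxy(x1, x1), sxy(x2, x2), sxy(x1, x2)
s1y, s2y = sxy(x1, y), sxy(x2, y)
det = s11 * s22 - s12**2
b1 = (s22 * s1y - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det
b0 = mean(y) - b1 * mean(x1) - b2 * mean(x2)

resid = [yi - (b0 + b1 * a + b2 * b) for a, b, yi in zip(x1, x2, y)]
# Normal equations: residuals are orthogonal to the intercept column and to each predictor.
checks = (sum(resid),
          sum(e * a for e, a in zip(resid, x1)),
          sum(e * b for e, b in zip(resid, x2)))
print(checks)

best = sse(b0, b1, b2, x1, x2, y)
nudged = [sse(b0 + d0, b1 + d1, b2 + d2, x1, x2, y)
          for d0, d1, d2 in [(0.1, 0, 0), (0, 0.1, 0), (0, 0, -0.1)]]
print(best, nudged)
```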
// Worked example — try it by hand
Predicting test scores for 30 students using study hours and sleep hours.
① Check the averages
Mean study = 5.0h, mean sleep = 7.0h, mean score = 65 pts
Correlation between study and sleep: r = −0.45 (heavy studiers sleep less)
② Simple regression y ~ x₁ (study hours only)
β₁ = +2.4 pts/h, R² = 0.32
→ Each study hour adds 2.4 pts... but this underestimates the true effect
③ Multiple regression y ~ x₁ + x₂ (add sleep hours)
β₁ = +3.1 pts/h, β₂ = +2.0 pts/h, R² = 0.57
→ Controlling for sleep, study effect rises to +3.1 pts/h
④ Why the change?
In this data, study↑ → sleep↓ (r = −0.45).
Simple regression blamed studying for the negative impact of lost sleep.
Multiple regression separated sleep out, revealing studying's true effect.
⑤ Make a prediction
β₀ = ȳ − β₁x̄₁ − β₂x̄₂ = 65 − 3.1×5 − 2.0×7 = 35.5 (OLS always passes through the means).
Student with 6h study, 7h sleep → ŷ = 35.5 + 3.1×6 + 2.0×7 = 35.5 + 18.6 + 14.0 = 68.1 pts
β₀ is the "score at 0h study, 0h sleep" — a hypothetical baseline with no intuitive meaning (don't extrapolate).
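The arithmetic in step ⑤ can be checked directly from the summary numbers in the worked example. The centered form (start at the mean score and add each slope times the deviation from its mean) gives the same prediction without ever touching the hard-to-interpret β₀.

```python
# Summary numbers from the worked example above (30 students).
mean_study, mean_sleep, mean_score = 5.0, 7.0, 65.0
b1, b2 = 3.1, 2.0  # multiple-regression slopes (pts per hour)

# OLS passes through the point of means, which pins down the intercept.
b0 = mean_score - b1 * mean_study - b2 * mean_sleep

def predict(study_h, sleep_h):
    return b0 + b1 * study_h + b2 * sleep_h

yhat = predict(6, 7)
# Same answer from the centered form: mean score plus slope x deviation from the mean.
yhat_centered = mean_score + b1 * (6 - mean_study) + b2 * (7 - mean_sleep)
print(f"b0 = {b0:.1f}, prediction = {yhat:.1f}, centered = {yhat_centered:.1f}")
# -> b0 = 35.5, prediction = 68.1, centered = 68.1
```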
// Common misconceptions
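"Study hours (0–10)" and "sleep hours (4–10)" are both measured in hours but spread over very different ranges, so one extra hour of each is not a comparable change. Comparing raw coefficients to judge importance is misleading. Use standardized coefficients (predictors and outcome rescaled to SD = 1) to compare importance.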
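A standardized coefficient can be computed from the raw one as β_j × SD(x_j) / SD(y), which is equivalent to refitting after converting every variable to z-scores. The worked example does not report SDs, so the sketch below uses hypothetical data and checks that the two routes agree.

```python
import random
import statistics

def sxy(a, b):
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b))

def fit2(x1, x2, y):
    """Slopes (b1, b2) from closed-form two-predictor OLS."""
    s11, s22, s12 = sxy(x1, x1), sxy(x2, x2), sxy(x1, x2)
    s1y, s2y = sxy(x1, y), sxy(x2, y)
    det = s11 * s22 - s12**2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

def zscores(v):
    m, s = statistics.fmean(v), statistics.stdev(v)
    return [(a - m) / s for a in v]

# Hypothetical data: study spans a wider range than sleep.
rng = random.Random(5)
study = [rng.uniform(0, 10) for _ in range(30)]
sleep = [rng.uniform(4, 10) for _ in range(30)]
score = [35 + 3 * a + 2 * b + rng.gauss(0, 5) for a, b in zip(study, sleep)]

b1, b2 = fit2(study, sleep, score)
sd_y = statistics.stdev(score)
std_b1 = b1 * statistics.stdev(study) / sd_y  # route 1: rescale the raw coefficients
std_b2 = b2 * statistics.stdev(sleep) / sd_y

zb1, zb2 = fit2(zscores(study), zscores(sleep), zscores(score))  # route 2: refit on z-scores
print(f"raw: {b1:.2f}, {b2:.2f}  standardized: {std_b1:.3f}, {std_b2:.3f}")
```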
R² never decreases when you add a variable, but adjusted R² can. Irrelevant variables add noise and destabilize estimates. In the simulator above, a coefficient near zero hints that the variable may be unnecessary.
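Adjusted R² charges for each predictor: adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1), with p counting predictors but not the intercept. Applied to the worked example (n = 30), plus one hypothetical third predictor that nudges R² up only slightly:

```python
def adjusted_r2(r2, n, p):
    """Adjusted R² for n observations and p predictors (intercept not counted in p)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

n = 30
adj_simple = adjusted_r2(0.32, n, 1)     # study only
adj_multiple = adjusted_r2(0.57, n, 2)   # study + sleep
adj_extra = adjusted_r2(0.575, n, 3)     # hypothetical weak third predictor
print(f"{adj_simple:.4f} {adj_multiple:.4f} {adj_extra:.4f}")
# -> 0.2957 0.5381 0.5260
```

Adding sleep raises both R² and adjusted R²; the hypothetical third variable raises R² but lowers adjusted R², the signal that it is not pulling its weight.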
When two predictors are highly correlated — like "study hours" and "library hours" — individual coefficients become erratic (signs can even flip). Watch for VIF > 10 as a warning sign.
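In general VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing predictor j on all the other predictors. With exactly two predictors, R_j² is just their squared correlation, so the worked example can be checked by hand. The sketch also shows how strong the correlation must be before the rule-of-thumb threshold of 10 is reached.

```python
import math

def vif_two_predictors(r):
    """VIF when there are exactly two predictors: 1 / (1 - r²), r = their correlation."""
    return 1 / (1 - r**2)

vif_example = vif_two_predictors(-0.45)  # worked example: study vs. sleep
r_at_10 = math.sqrt(1 - 1 / 10)          # |r| at which VIF reaches 10
print(f"{vif_example:.2f} {r_at_10:.3f}")
# -> 1.25 0.949
```

So the study/sleep correlation of −0.45 is far from problematic; trouble starts only when |r| approaches about 0.95.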
// Shapes you'll meet again
Multiple regression keeps returning to the same few patterns for interpreting coefficients and comparing tests.
- The partial-coefficient reading: with β₁ = 3, read it as "holding x₂ constant, raising x₁ by 1 raises y by 3 on average." The conditional clause "holding the others fixed" always travels with this reading
- When β₁ shifts between simple and multiple regression: with correlated predictors, simple regression fuses direct and indirect effects into one number. Setting correlation to 0 in the panel above brings the two estimates back into alignment
- R² vs. adjusted R² as a pair: adding a variable never lowers R², but adjusted R² can fall. If adjusted R² drops, the new variable was not worth the degree of freedom it cost
- How F-tests and t-tests divide the labor: the F-test checks the joint hypothesis "all slope coefficients = 0" in one go; the t-test asks "is this one coefficient = 0?" The first answers the model-wide question, the second the per-coefficient one
- The multicollinearity signal: VIF > 10 is the usual rule-of-thumb threshold, an early warning that individual coefficients may turn erratic
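Both test statistics can be computed from the worked example's R² values alone. The overall F tests "β₁ = β₂ = 0"; the partial F for adding sleep tests "β₂ = 0", and because only one coefficient is added, it equals the square of sleep's t statistic in the full model. The sketch stops at the statistics; turning them into p-values would need the F distribution with (2, 27) and (1, 27) degrees of freedom.

```python
def overall_f(r2, n, p):
    """F statistic for H0: all p slope coefficients are 0."""
    return (r2 / p) / ((1 - r2) / (n - p - 1))

def partial_f(r2_full, r2_reduced, n, p_full, q):
    """F statistic for H0: the q coefficients dropped from the full model are 0."""
    return ((r2_full - r2_reduced) / q) / ((1 - r2_full) / (n - p_full - 1))

n = 30
f_model = overall_f(0.57, n, 2)           # study + sleep vs. intercept only
f_sleep = partial_f(0.57, 0.32, n, 2, 1)  # does adding sleep help?
t_sleep = f_sleep ** 0.5                  # with q = 1, partial F = t²; sign follows β₂
print(f"F = {f_model:.2f}, partial F = {f_sleep:.2f}, |t| = {t_sleep:.2f}")
# -> F = 17.90, partial F = 15.70, |t| = 3.96
```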