Correlation — Feel r Through Scatter Plots
Anscombe's quartet — four scatterplots that look nothing alike yet share the exact same r. A single number can't see what your eyes can.
Correlation — Measuring the Link Between Two Variables
The correlation coefficient r measures how strongly two variables move together along a straight line. +1 means a perfect positive linear relationship, −1 a perfect negative one, and 0 no linear relationship. Click the canvas to add points and watch r update in real time. The yellow dashed lines mark the means x̄ and ȳ, splitting the plane into four quadrants. Points in the green quadrants (upper right and lower left, where x and y deviate from their means in the same direction) push r up; points in the red quadrants (upper left and lower right) push it down. This coloring visualizes the "tug-of-war of signs" in the sum Σ(x−x̄)(y−ȳ).
- Step 1: Set r = 0.80, click Generate → an upward-sloping band. Points cluster in the green quadrants.
- Step 2: Change to r = −0.60, Generate → downward slope. More points in the red quadrants.
- Step 3: Set r = 0.00, Generate → points spread evenly across all four quadrants. A "cloud," not a band.
- Step 4: CLEAR, then manually place a U-shape → r ≈ 0 yet there's an obvious pattern! r only captures linear relationships.
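Step 4 can also be checked numerically. The sketch below is a minimal illustration, not the page's own code: it assumes a symmetric U-shape built from y = x² on hand-picked x values, and `pearson_r` is a helper name introduced here.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r from the sum-of-products formula."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Illustrative U-shape: y is fully determined by x, but not linearly.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]
r_u = pearson_r(xs, ys)

# Only the right arm of the U: here the trend is close to linear.
r_right = pearson_r(xs[3:], ys[3:])

print(f"full U:    r = {r_u:.3f}")
print(f"right arm: r = {r_right:.3f}")
```

The left arm contributes negative products and the right arm positive ones, and on symmetric data they cancel. Each arm on its own still shows a strong linear trend.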
// Formula used here
r = Σ(xᵢ − x̄)(yᵢ − ȳ) / √{Σ(xᵢ − x̄)² · Σ(yᵢ − ȳ)²}
Numerator — the four-quadrant story
• (xᵢ − x̄): how far each point's x is from the mean
• (yᵢ − ȳ): how far each point's y is from the mean
• Multiply: same direction → positive (green quadrants), opposite → negative (red quadrants)
• Sum: a tug-of-war between positive and negative contributions. Positive wins → r > 0, negative wins → r < 0
Denominator
• Divides by the combined spread of both variables, √{Σ(xᵢ−x̄)² · Σ(yᵢ−ȳ)²} (equivalent to dividing the covariance by both standard deviations), which confines r to −1 ≤ r ≤ 1
• Unit-free: centimeters or inches give the same r
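The numerator and denominator described above can be sketched in a few lines of Python. This is an illustrative sketch, not the page's implementation: the height and weight values are made up for the example, and `pearson_r` and `tug_of_war` are names introduced here.

```python
from math import sqrt

def pearson_r(xs, ys):
    """r = sum of products / sqrt(Sxx * Syy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return numerator / sqrt(sxx * syy)

def tug_of_war(xs, ys):
    """Split the numerator into its positive (green) and negative (red) parts."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    products = [(x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)]
    pos = sum(p for p in products if p > 0)
    neg = sum(p for p in products if p < 0)
    return pos, neg

# Made-up illustrative data (not from the page).
heights_cm = [150, 160, 165, 172, 180, 185]
weights_kg = [52, 58, 70, 63, 72, 85]
heights_in = [h / 2.54 for h in heights_cm]  # same heights in inches

pos, neg = tug_of_war(heights_cm, weights_kg)
r_cm = pearson_r(heights_cm, weights_kg)
r_in = pearson_r(heights_in, weights_kg)

print(f"green total = {pos:.1f}, red total = {neg:.1f}")
print(f"r (cm) = {r_cm:.4f}, r (inches) = {r_in:.4f}")
```

Rescaling x multiplies the numerator and the denominator by the same factor, which is why the two r values agree up to floating-point rounding.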
What R² means
• "Fraction of y's variation explained by a linear relationship with x"
• R² = 0.64 → 64% of y's variance is accounted for by a straight-line fit on x
• The remaining 36% comes from other factors or randomness
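The "fraction explained" reading can be verified directly. The sketch below assumes a simple least-squares line with one predictor and uses made-up data: it computes R² as 1 − SS_res / SS_tot and compares it with r². All names are illustrative.

```python
from math import sqrt

# Made-up illustrative data.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 2.9, 4.2, 4.8, 6.3, 6.7]

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)  # total variation of y (SS_tot)

# Least-squares line y = intercept + slope * x
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Variation left over after the line has done its work (SS_res)
ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

r = sxy / sqrt(sxx * syy)
r_squared = 1 - ss_res / syy

print(f"r = {r:.4f}, r^2 = {r * r:.4f}, R^2 = {r_squared:.4f}")
```

With a single predictor the two quantities coincide, so squaring r gives the explained share of variance. With several predictors R² is no longer the square of any one pairwise r.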
- Step 1: Watch the animation. Points appear one by one, patterns look totally different…
- Step 2: Check each plot's r ≈ 0.816. They're all nearly identical!
- Step 3: Regression lines appear → lines are nearly identical too. Yet II is curved, III has an outlier, IV is dominated by one point.
- Step 4: Click "Replay" to watch again. Numbers alone don't tell the whole story.
▶ The pitfall of r — Anscombe's Quartet
// Anscombe's Quartet — what it teaches
In 1973, statistician Francis Anscombe created this dataset to show that identical summary statistics can hide completely different data patterns.
- I: ideal linear relationship. r accurately reflects what's going on
- II: a quadratic relationship. r only measures linear correlation, so it misses the curve
- III: an almost perfect line except for one outlier, which drags r down from nearly 1 to about 0.816. r is fragile against outliers
- IV: a single distant x-value creates the entire correlation. High leverage from one point
Takeaway: always plot your data before computing r or a regression equation. This is a foundational principle of statistical analysis.
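The quartet is small enough to check by hand. The sketch below uses the values as they are commonly tabulated for Anscombe's quartet; `pearson_r` and the variable names are introduced here for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r from the sum-of-products formula."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Datasets I-III share the same x values; IV has its own.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

r_values = {name: pearson_r(xs, ys) for name, (xs, ys) in quartet.items()}
for name, r in r_values.items():
    print(f"{name}: r = {r:.3f}")

# Dataset III with its single outlier (x = 13, y = 12.74) removed.
xs3, ys3 = quartet["III"]
kept = [(x, y) for x, y in zip(xs3, ys3) if y != 12.74]
r_iii_trimmed = pearson_r([x for x, _ in kept], [y for _, y in kept])
print(f"III without outlier: r = {r_iii_trimmed:.4f}")
```

All four r values come out close to 0.816. Dropping one point from III changes the picture completely, and nothing in the summary number alone would have warned you.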
// Common misconceptions
Ice cream sales and drowning deaths are strongly correlated. But ice cream doesn't cause drowning — temperature drives both (confounding variable). Correlation says "they move together," not "one causes the other."
U-shaped data can have r ≈ 0 despite a strong relationship. r measures only linear association. Non-linear patterns go undetected.
As Anscombe's quartet shows, identical r values can arise from wildly different data structures. "Plot first, compute second" is the golden rule.
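The ice cream example can be imitated with a small simulation. Everything in this sketch is synthetic and assumed for illustration: temperature is drawn at random, and both ice cream sales and drownings are generated from temperature plus independent noise, with no direct link between them. The coefficients and the seed are arbitrary.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r from the sum-of-products formula."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def residuals(ys, ts):
    """What is left of ys after removing a least-squares line fitted on ts."""
    n = len(ts)
    t_bar, y_bar = sum(ts) / n, sum(ys) / n
    slope = (sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, ys))
             / sum((t - t_bar) ** 2 for t in ts))
    return [(y - y_bar) - slope * (t - t_bar) for t, y in zip(ts, ys)]

rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic world: temperature is the only common cause.
temps = [rng.gauss(25, 5) for _ in range(2000)]
ice_cream = [2.0 * t + rng.gauss(0, 5) for t in temps]
drownings = [0.5 * t + rng.gauss(0, 2) for t in temps]

r_raw = pearson_r(ice_cream, drownings)
# Correlation after temperature has been removed from both variables.
r_adjusted = pearson_r(residuals(ice_cream, temps), residuals(drownings, temps))

print(f"raw correlation:                 {r_raw:.3f}")
print(f"after adjusting for temperature: {r_adjusted:.3f}")
```

The raw correlation is clearly positive even though neither variable influences the other. Once temperature is accounted for, what remains is close to zero.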
// Shapes you'll meet again
A handful of recurring shapes surround the correlation coefficient.
- How r is assembled: r = Σ(xᵢ−x̄)(yᵢ−ȳ) / √{Σ(xᵢ−x̄)² · Σ(yᵢ−ȳ)²}. "Co-variation divided by the geometric mean of the two variables' own variation"; the numerator is the same sum of products that sets the direction of the scatter's slope
- r and R² correspondence: in simple linear regression R² = r², so when R² = 0.49, r = ±0.7. The sign matches the direction of the scatter's slope
- The shape of the zero-correlation test: t = r√(n−2) / √(1−r²), df = n−2. "Larger r and larger n both push t up" is built into the structure
- How covariance becomes correlation: r = Cov(X,Y) / (sx·sy). Dividing covariance by the two standard deviations is the shape that strips out units
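Two of these shapes are easy to try out in code. The sketch below is illustrative only: `t_statistic` and `r_from_covariance` are names introduced here, the sample (n−1) versions of covariance and standard deviation are assumed, and the inputs are arbitrary example values.

```python
from math import sqrt

def t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

def r_from_covariance(xs, ys):
    """r = Cov(X, Y) / (sx * sy), using sample (n - 1) denominators."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    sx = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    sy = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return cov / (sx * sy)

# Same r, different sample sizes: t grows with n.
t_small = t_statistic(0.816, 11)
t_large = t_statistic(0.816, 30)

# Same n, larger r: t grows with r.
t_stronger = t_statistic(0.9, 11)

# A perfectly linear toy example, y = 2x + 1.
r_line = r_from_covariance([1, 2, 3, 4], [3, 5, 7, 9])

print(f"t(r=0.816, n=11) = {t_small:.3f}")
print(f"t(r=0.816, n=30) = {t_large:.3f}")
print(f"t(r=0.900, n=11) = {t_stronger:.3f}")
print(f"r for y = 2x + 1: {r_line:.3f}")
```

The (n−1) factors cancel between the covariance and the two standard deviations, so this route gives the same r as the sum-of-products formula.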