How Far Can Statistics Predict Your Income?

Predicting income with statistics might sound like fortune-telling. But with just three pieces of information — age, gender, and where you live — you can get pretty close.

Statistics doesn't predict pinpoint values. It shows a range. That's its honest way of dealing with uncertainty.

Try it below.

Note: The data used here is from Japan's Basic Survey on Wage Structure (Ministry of Health, Labour and Welfare). The numbers reflect Japan's labor market and may not generalize to other countries.

It's a case study in how statistics handles uncertainty — using one country's data, but the methodology applies anywhere.

// Income Prediction Simulator

Select your attributes. The graph and numbers update in real time.

[Interactive simulator. Inputs: age (default 35), gender, prefecture, education, company size. Outputs: predicted average income (×¥10K), 95% prediction interval, 95% confidence interval.]

Prediction interval (wide band): pick one person with your attributes at random, and there is a 95% chance their income falls in this range.
Confidence interval (narrow band): the range that, with 95% confidence, contains the true average income of everyone with your attributes.
Estimating a group average is easier than predicting one individual's outcome. That's why the prediction interval is always wider than the confidence interval.

// How the Prediction Works

This uses multiple regression analysis. Multiple explanatory variables (age, gender, prefecture, education, company size) predict a single outcome variable (income).

ŷ = f(age) + β_gender + β_prefecture + β_education + β_{company size}

It's a simple linear model that sums up each variable's "effect."
Below are all the parameters used in this simulator.

Base Income by Age (¥10K)

Age	22	25	30	35	40	45	50	55	60	65
Income	280	340	420	480	540	590	620	640	580	480

Ages in between are linearly interpolated.
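
As a rough illustration of that interpolation step, here is a minimal Python sketch. The ages and incomes are copied from the table above; the function name `base_income` and the choice to clamp ages outside 22 to 65 to the nearest table value are assumptions made for this example, not necessarily what the simulator does.

```python
from bisect import bisect_right

# Knots copied from the "Base Income by Age" table (income in units of ¥10K).
AGES = [22, 25, 30, 35, 40, 45, 50, 55, 60, 65]
INCOMES = [280, 340, 420, 480, 540, 590, 620, 640, 580, 480]


def base_income(age: float) -> float:
    """Linearly interpolate base income (¥10K) between the two nearest table ages."""
    # Assumption: ages outside the table are clamped to the nearest table value.
    if age <= AGES[0]:
        return float(INCOMES[0])
    if age >= AGES[-1]:
        return float(INCOMES[-1])
    hi = bisect_right(AGES, age)  # index of the first table age strictly above `age`
    lo = hi - 1
    t = (age - AGES[lo]) / (AGES[hi] - AGES[lo])  # fraction of the way from lo to hi
    return INCOMES[lo] + t * (INCOMES[hi] - INCOMES[lo])


print(base_income(35))  # an age that is in the table
print(base_income(32))  # two fifths of the way from age 30 to age 35
```

Age 32 sits two fifths of the way from 30 to 35, so its base income is 420 plus two fifths of the 60-point gap, which comes to 444.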

Gender Effect (¥10K)

Gender	Effect
Male (baseline)	±0
Female	−120

Education Effect (¥10K)

Education	Effect
Graduate school	+80
University (baseline)	±0
Vocational/Junior college	−40
High school	−80

Company Size Effect (¥10K)

Company Size	Effect
Enterprise (1000+)	+60
Large (100–999, baseline)	±0
Medium (30–99)	−20
Small (10–29)	−60

Prefecture Effect (¥10K, excerpt)

Prefecture	Effect	Prefecture	Effect	Prefecture	Effect
Tokyo	+130	Kanagawa	+80	Osaka	+50
Aichi	+40	Chiba	+30	Fukuoka	+5
Hokkaido	−30	Miyazaki	−60	Aomori	−70
Okinawa	−80	…all 47 prefectures supported
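
To make the "sum of effects" idea concrete, here is a small Python sketch that adds the tabulated effects to a base income. The dictionary keys and the function name `predict_income` are made up for this example, and only the excerpted prefectures are included, so treat it as an illustration rather than the simulator's actual code.

```python
# Effects copied from the tables above, in units of ¥10K.
GENDER = {"male": 0, "female": -120}
EDUCATION = {"graduate": 80, "university": 0, "vocational": -40, "high_school": -80}
COMPANY_SIZE = {"enterprise": 60, "large": 0, "medium": -20, "small": -60}
# Only the excerpted prefectures appear here; the remaining ones are not reproduced.
PREFECTURE = {
    "Tokyo": 130, "Kanagawa": 80, "Osaka": 50, "Aichi": 40, "Chiba": 30,
    "Fukuoka": 5, "Hokkaido": -30, "Miyazaki": -60, "Aomori": -70, "Okinawa": -80,
}


def predict_income(base, gender, prefecture, education, company_size):
    """Return the predicted average income (¥10K): base income plus each effect."""
    return (
        base
        + GENDER[gender]
        + PREFECTURE[prefecture]
        + EDUCATION[education]
        + COMPANY_SIZE[company_size]
    )


# Age 35 has a base income of 480 in the age table.
print(predict_income(480, "male", "Tokyo", "university", "large"))
print(predict_income(480, "female", "Osaka", "graduate", "enterprise"))
```

The first call is 480 + 130 = 610 (about ¥6.1M), since every other attribute is a baseline. The second is 480 − 120 + 50 + 80 + 60 = 550 (about ¥5.5M).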

Want to learn more about multiple regression? Explore the multiple regression topic page for an interactive deep dive.

This column is based on Japan's "Basic Survey on Wage Structure 2023" (Ministry of Health, Labour and Welfare).
Source: https://www.mhlw.go.jp/toukei/itiran/roudou/chingin/kouzou/z2023/

Representative values from this public dataset are hardcoded — not fetched from an API in real time. Data reflects the survey year.

These numbers describe Japan's labor market specifically. The underlying statistical reasoning, however, generalizes to any country's data.

// "Average" vs. "You"

Look at the prediction again.

"Average income of people with the same attributes: ¥X.XM"
"95% fall between ¥X.XM and ¥X.XM"

The key point: that range describes the spread among people with the same attributes, not where you specifically are.

Your own income could be anywhere in that range. Statistics alone can't tell you. Occupation, years of experience, luck, effort — there are countless factors the model doesn't include.

So what statistics actually says is this:

If you gather 100 people with the same attributes as you, about 95 of them can be expected to fall somewhere between ¥X.XM and ¥X.XM.

It's not a pinpoint prediction. But isn't it more honest than fortune-telling?
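
The "about 95 out of 100" reading can be checked with a quick simulation. The sketch below assumes incomes within a group follow a normal distribution, with a hypothetical mean of 600 and standard deviation of 100 (both in ¥10K). These values are chosen only for illustration and do not come from the survey.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

MEAN = 600.0   # hypothetical group average, ¥10K
SIGMA = 100.0  # hypothetical spread among individuals, ¥10K
Z = 1.96       # normal quantile for a central 95% range

low, high = MEAN - Z * SIGMA, MEAN + Z * SIGMA

# Simulate many groups of 100 people and count how many land inside the range.
counts = []
for _ in range(2000):
    group = [random.gauss(MEAN, SIGMA) for _ in range(100)]
    counts.append(sum(low <= income <= high for income in group))

average_inside = sum(counts) / len(counts)
print(f"range: {low:.0f} to {high:.0f} (¥10K)")
print(f"average number inside per 100 people: {average_inside:.1f}")
print(f"fewest / most inside in any one group: {min(counts)} / {max(counts)}")
```

On average close to 95 people land inside the range, but any single group of 100 can come in a little above or below that. The 95% figure describes the long run, not every group.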

// Confidence Interval vs. Prediction Interval

The wide band and the narrow band on the graph answer different questions.

Prediction interval asks:
"If you pick one person with these attributes at random, where will their income be?"

Confidence interval asks:
"If you could gather everyone with these attributes, what would their true average income be?"

Picture this: imagine 10,000 people who share your exact attributes.

Estimating their average income → probably between ¥5.8M and ¥6.2M (confidence interval, narrow)
Pointing at one person and guessing their income → between ¥4.04M and ¥7.96M (prediction interval, much wider)

Averages stabilize as data grows, but individuals vary widely. That's why the range for "guessing one person" is always wider than the range for "estimating the average."
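
Here is a minimal Python sketch of why the two ranges behave so differently. It makes several simplifying assumptions that are not from the simulator: a single group with no regression terms, a known individual standard deviation of 100 (¥10K), a normal approximation with z = 1.96, and a hypothetical mean of 600. With a sample of about 100 people, these inputs land close to the illustrative ranges quoted above.

```python
import math

Z = 1.96  # normal quantile for a 95% interval


def intervals(mean, sigma, n):
    """Return (confidence_interval, prediction_interval) for a sample of size n."""
    ci_half = Z * sigma / math.sqrt(n)          # uncertainty in the estimated mean only
    pi_half = Z * sigma * math.sqrt(1 + 1 / n)  # individual spread plus mean uncertainty
    return (mean - ci_half, mean + ci_half), (mean - pi_half, mean + pi_half)


# Hypothetical inputs in ¥10K: mean 600, individual standard deviation 100.
for n in (25, 100, 400, 10_000):
    ci, pi = intervals(600.0, 100.0, n)
    print(f"n={n:>6}: CI {ci[0]:.1f} to {ci[1]:.1f}   PI {pi[0]:.1f} to {pi[1]:.1f}")
```

As n grows, the confidence interval keeps shrinking because the 1/√n term goes to zero. The prediction interval levels off near ±1.96 × 100, because more data does nothing to reduce the variation between individuals.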

Many explanations conflate the two and say "95% confident" without distinguishing them. Getting this distinction right is a genuine level-up in statistical literacy.

// Open Source Calculations

All calculations are open source on GitHub.
github.com/sasai-lab/statplay-opensource

Data points used:

Average income by age (22–65)
Gender adjustment coefficient
Prefecture adjustment coefficients (47 prefectures)
Education level coefficients
Company size coefficients

These are hardcoded and combined as a simple linear model. Full coefficient details are in the "How the Prediction Works" section.

A detailed walkthrough of the derivation process, the full prediction function, and how the prediction/confidence intervals are calculated is available on Qiita (Japanese only).
Building an Income Prediction Model from Public Statistics (Qiita)

// KEY TAKEAWAY

Statistics doesn't "nail it" — it honestly shows a range
The "average" for a group and "your" prediction have different ranges
Confidence intervals (narrow) estimate the true mean; prediction intervals (wide) estimate individuals
Being honest about uncertainty is more useful than pretending to predict pinpoints

FAQ

// About These Calculations

Source: Japan's Basic Survey on Wage Structure 2023 (Ministry of Health, Labour and Welfare). Representative values are hardcoded and may differ from current data. These numbers describe Japan's labor market specifically.

Adding occupation, years of experience, industry, and similar variables would improve the model. But this column's purpose is to help you feel the difference between confidence intervals and prediction intervals, and the current five variables (age, gender, prefecture, education, company size) provide enough complexity for that lesson.

The result is not a prediction of your own income. It shows the statistical distribution of people with similar attributes, not a guarantee of individual income. Occupation, experience, and many other factors the model omits significantly affect actual income.

All calculation logic is available on GitHub: coefficients, methodology, and full source code.

Statistics isn't fortune-telling.

It's a tool that honestly shows "there's roughly an X% chance you fall in this range."
Being honest about uncertainty is, in the end, more useful than pretending to predict pinpoints.
That's the main thing this column wants to convey.

// Try it live

Slice income variability across predictors with multiple regression
Move age, gender, and prefecture together: partial coefficients and confounder control turn the column's argument into a visible prediction interval.

// Further reading

Type I vs Type II Errors: One 2×2 Table Sorts It Out
The sister column that shares the decision logic: the Type I/II framework kicks into motion every time regression coefficient significance is judged.
