Predicting income with statistics might sound like fortune-telling. But with just a few pieces of information, such as age, gender, and where you live, you can get pretty close.
Statistics doesn't predict pinpoint values. It shows a range. That's its honest way of dealing with uncertainty.
Try it below.
The simulator is a case study in how statistics handles uncertainty. It uses one country's data, but the methodology applies anywhere.
// Income Prediction Simulator
Select your attributes. The graph and numbers update in real time.
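Confidence interval (narrow band): the range that, with 95% confidence, contains the true average income of everyone with your attributes.
Prediction interval (wide band): the range in which about 95% of individuals with your attributes fall.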
Estimating a group average is easier than predicting one individual's outcome. That's why the prediction interval is always wider.
// How the Prediction Works
This uses multiple regression analysis. Multiple explanatory variables (age, gender, prefecture, education, company size) predict a single outcome variable (income).
It's a simple linear model that sums up each variable's "effect."
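Written out, the additive model looks roughly like this. The symbols are labels chosen here for illustration, not notation taken from the simulator's source: f(age) stands for the age-based base income and each β for the effect of the selected category.

```latex
\hat{y} = f(\text{age}) + \beta_{\text{gender}} + \beta_{\text{education}} + \beta_{\text{company size}} + \beta_{\text{prefecture}}
```

Every baseline category contributes 0, so only the non-baseline choices shift the prediction away from the base income.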
Below are all the parameters used in this simulator.
Base Income by Age (¥10K)
| Age | 22 | 25 | 30 | 35 | 40 | 45 | 50 | 55 | 60 | 65 |
|---|---|---|---|---|---|---|---|---|---|---|
| Income | 280 | 340 | 420 | 480 | 540 | 590 | 620 | 640 | 580 | 480 |
Ages in between are linearly interpolated.
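As a minimal sketch of that interpolation, the table can be turned into a lookup function. The name `base_income` and the choice to reject ages outside 22 to 65 are assumptions made here; the simulator's own handling may differ.

```python
from bisect import bisect_right

# Base income by age, in units of ¥10K (values copied from the table above).
AGES = [22, 25, 30, 35, 40, 45, 50, 55, 60, 65]
INCOMES = [280, 340, 420, 480, 540, 590, 620, 640, 580, 480]


def base_income(age: float) -> float:
    """Linearly interpolate base income (¥10K) between the tabulated ages."""
    if not AGES[0] <= age <= AGES[-1]:
        # Assumption: reject ages outside the table rather than extrapolate.
        raise ValueError(f"age must be between {AGES[0]} and {AGES[-1]}")
    if age == AGES[-1]:
        return float(INCOMES[-1])
    i = bisect_right(AGES, age) - 1  # index of the table age at or below `age`
    frac = (age - AGES[i]) / (AGES[i + 1] - AGES[i])
    return INCOMES[i] + frac * (INCOMES[i + 1] - INCOMES[i])


print(base_income(27))  # 340 + (2/5) * (420 - 340) = 372.0
```

Age 27 sits two fifths of the way from 25 to 30, so it picks up two fifths of the 80-point gap between those two table entries.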
Gender Effect (¥10K)
| Gender | Effect |
|---|---|
| Male (baseline) | ±0 |
| Female | −120 |
Education Effect (¥10K)
| Education | Effect |
|---|---|
| Graduate school | +80 |
| University (baseline) | ±0 |
| Vocational/Junior college | −40 |
| High school | −80 |
Company Size Effect (¥10K)
| Company Size | Effect |
|---|---|
| Enterprise (1000+) | +60 |
| Large (100–999, baseline) | ±0 |
| Medium (30–99) | −20 |
| Small (10–29) | −60 |
Prefecture Effect (¥10K, excerpt)
| Prefecture | Effect | Prefecture | Effect | Prefecture | Effect |
|---|---|---|---|---|---|
| Tokyo | +130 | Kanagawa | +80 | Osaka | +50 |
| Aichi | +40 | Chiba | +30 | Fukuoka | +5 |
| Hokkaido | −30 | Miyazaki | −60 | Aomori | −70 |
| Okinawa | −80 | …all 47 prefectures supported | | | |
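Putting the tables together, a prediction is just a sum. The sketch below is illustrative only: the function name and dictionary keys are made up for this example, and the prefecture dictionary covers only the ten prefectures shown in the excerpt. The base income is passed in as a number read from the age table.

```python
# Effects in units of ¥10K, copied from the tables above.
GENDER = {"male": 0, "female": -120}
EDUCATION = {"graduate": 80, "university": 0, "vocational": -40, "high_school": -80}
COMPANY_SIZE = {"enterprise": 60, "large": 0, "medium": -20, "small": -60}
# Only the prefectures listed in the excerpt; the other 37 are not shown in the text.
PREFECTURE = {
    "Tokyo": 130, "Kanagawa": 80, "Osaka": 50, "Aichi": 40, "Chiba": 30,
    "Fukuoka": 5, "Hokkaido": -30, "Miyazaki": -60, "Aomori": -70, "Okinawa": -80,
}


def predict_income(base, gender, education, company_size, prefecture):
    """Sum the age-based base income and the four categorical effects (¥10K)."""
    return (base + GENDER[gender] + EDUCATION[education]
            + COMPANY_SIZE[company_size] + PREFECTURE[prefecture])


# Age 35 has a base income of 480 in the age table.
print(predict_income(480, "female", "graduate", "enterprise", "Tokyo"))  # 630
```

The example works out as 480 − 120 + 80 + 60 + 130 = 630, or about ¥6.3M.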
Want to learn more about multiple regression? Explore the multiple regression topic page for an interactive deep dive.
Source: Wage Structure Basic Survey 2023, Ministry of Health, Labour and Welfare (https://www.mhlw.go.jp/toukei/itiran/roudou/chingin/kouzou/z2023/)
Representative values from this public dataset are hardcoded, not fetched from an API in real time. The data reflects the survey year, 2023.
These numbers describe Japan's labor market specifically. The underlying statistical reasoning, however, generalizes to any country's data.
// "Average" vs. "You"
Look at the prediction again.
"Average income of people with the same attributes: ¥X.XM"
"95% fall between ¥X.XM and ¥X.XM"
The key point: that range describes the spread among people with the same attributes, not where you specifically are.
Your own income could be anywhere in that range, and statistics alone can't tell you where. Occupation, industry, years of experience, luck, effort: countless factors are not among the simulator's inputs.
So what statistics actually tells us is this:
If you gather 100 people with the same attributes as you, about 95 of them will be somewhere between ¥X.XM and ¥X.XM.
It's not a pinpoint prediction. But isn't it more honest than fortune-telling?
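The "95 out of 100" reading can be checked with a small simulation. This is a sketch under stated assumptions: incomes within the group are taken to be normally distributed, and the mean (600) and spread (100), both in ¥10K, are hypothetical values rather than simulator output.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

MEAN = 600.0  # hypothetical group average, ¥10K
SD = 100.0    # hypothetical spread among individuals, ¥10K
low, high = MEAN - 1.96 * SD, MEAN + 1.96 * SD


def count_inside(n):
    """Draw n incomes from the assumed normal distribution; count those in range."""
    return sum(low <= random.gauss(MEAN, SD) <= high for _ in range(n))


print(count_inside(100))                # near 95, but varies from group to group
print(count_inside(100_000) / 100_000)  # close to 0.95
```

Any single group of 100 will not hit 95 exactly. The 95% figure describes the long-run share, which is why the larger sample lands closer to it.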
// Confidence Interval vs. Prediction Interval
The wide band and the narrow band on the graph answer different questions.
Prediction interval asks:
"If I pick one person with these attributes at random, where will their income be?"
Confidence interval asks:
"If I could gather everyone with these attributes, what would their true average income be?"
Picture this: imagine 10,000 people who share your exact attributes.
Estimating their average income → probably between ¥5.8M and ¥6.2M (confidence interval, narrow)
Pointing at one person and guessing their income → between ¥4.04M and ¥7.96M (prediction interval, much wider)
Averages stabilize as data grows, but individuals vary widely. That's why the range for "guessing one person" is always wider than the range for "estimating the average."
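To see why, here is a sketch of the textbook formulas for the simplest case: estimating one group's mean from a sample of n people, with normally distributed incomes and a known spread. It is not the simulator's regression code. The spread of 100 (¥1.0M) is the value implied by the example above, since 6.0 ± 1.96 × 1.0 gives 4.04 to 7.96; the sample sizes are hypothetical.

```python
import math


def intervals(mean, sigma, n, z=1.96):
    """Return (confidence_interval, prediction_interval) as (low, high) pairs.

    Assumes normally distributed incomes with known spread `sigma`
    and a sample of `n` people used to estimate the mean.
    """
    ci_half = z * sigma / math.sqrt(n)          # uncertainty of the average
    pi_half = z * sigma * math.sqrt(1 + 1 / n)  # that uncertainty plus individual spread
    return (mean - ci_half, mean + ci_half), (mean - pi_half, mean + pi_half)


# Units are ¥10K: mean 600 is ¥6.0M, sigma 100 is ¥1.0M.
for n in (10, 100, 10_000):
    ci, pi = intervals(mean=600, sigma=100, n=n)
    print(n, [round(v, 1) for v in ci], [round(v, 1) for v in pi])
```

As n grows, the confidence interval shrinks toward the mean, while the prediction interval never gets narrower than 600 ± 196. With n = 100 the confidence interval is about 580 to 620, in line with the ¥5.8M to ¥6.2M figure above.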
Many explanations conflate the two and say "95% confident" without distinguishing them. Getting this distinction right is a genuine level-up in statistical literacy.
// Open Source Calculations
All calculations are open source on GitHub.
github.com/sasai-lab/statplay-opensource
Data points used:
- Average income by age (22–65)
- Gender adjustment coefficient
- Prefecture adjustment coefficients (47 prefectures)
- Education level coefficients
- Company size coefficients
These are hardcoded and combined as a simple linear model. Full coefficient details are in Section 02, "How the Prediction Works."
A detailed walkthrough of the derivation process, the full prediction function, and how the prediction and confidence intervals are calculated is available on Qiita (Japanese only).
公開統計から年収予測の回帰式をたてる実験 (An experiment in building an income-prediction regression equation from public statistics), Qiita
// KEY TAKEAWAY
- Statistics doesn't "nail it" — it honestly shows a range
- The "average" for a group and "your" prediction have different ranges
- Confidence intervals (narrow) estimate the true mean; prediction intervals (wide) estimate individuals
- Being honest about uncertainty is more useful than pretending to predict pinpoints
// About These Calculations
Source: Japan's Wage Structure Basic Survey 2023 (Ministry of Health, Labour and Welfare). Representative values are hardcoded and may differ from current data. These numbers describe Japan's labor market specifically.
Adding occupation, years of experience, industry, and similar variables would improve the model. But this column's purpose is to help you feel the difference between confidence intervals and prediction intervals, and the current five variables (age, gender, prefecture, education, company size) provide enough complexity for that lesson.
The result is not a guarantee of individual income. It shows the statistical distribution of people with similar attributes. Occupation, years of experience, and many other factors outside the model significantly affect actual income.
All calculation logic is available on GitHub: coefficients, methodology, and full source code.
Statistics isn't fortune-telling.
It's a tool that honestly shows "there's roughly an X% chance you fall in this range."
Being honest about uncertainty is, in the end, more useful than pretending to predict pinpoints.
That's the main thing this column wants to convey.