Methodology
indielist publishes white-box sales estimates: every number on every game page
can be expanded into the formula that produced it and the inputs that fed it. This page documents
that algorithm. The canonical source is src/lib/sales-estimate.ts in our open-source
worker, with a Python mirror in python_collector/sales/algorithm.py that produces byte-for-byte equivalent output.
1. Why white-box?
The dominant industry tool (Gamalytic) gives black-box numbers: you see "this game sold ~80K" with no breakdown. We disagree with that approach. Buyers of these numbers (publishers, investors) deserve to see the math so they can argue with it. Our promise:
Every estimate must return a breakdown array showing exactly how the
number was assembled.
2. The core formula (Boxleiter-derived)
Steam review count, used as a proxy for units sold, is the foundational signal. The relationship was established empirically by Mike Boxleiter (Triple Town) circa 2014:
units ≈ review_count × NB
where NB (the "Negative Binomial multiplier") is the average ratio of units sold to verified reviews, fit from large samples of games. We use:
- NB median = 50 (modern indie baseline ~2024)
- NB lower = 35 (heavily reviewed / discount-driven)
- NB upper = 75 (cult / overperforming categories)
So a 10K-review game has a base estimate of 350K-750K units (median 500K).
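A minimal Python sketch of the unadjusted base estimate, using the three NB values above. The function name base_units is hypothetical and not taken from src/lib/sales-estimate.ts; it only reproduces the 10K-review arithmetic.

```python
# Hypothetical sketch of the §2 base estimate; not the canonical implementation.
NB_LOWER, NB_MEDIAN, NB_UPPER = 35, 50, 75  # NB values documented in §2


def base_units(review_count: int) -> dict[str, int]:
    """Return lower / median / upper unit estimates before any §3 adjustment."""
    if review_count < 0:
        raise ValueError("review_count must be non-negative")
    return {
        "lower": review_count * NB_LOWER,
        "median": review_count * NB_MEDIAN,
        "upper": review_count * NB_UPPER,
    }


print(base_units(10_000))
# {'lower': 350000, 'median': 500000, 'upper': 750000}
```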
3. Adjustments (applied multiplicatively)
3.1 Year of release
The review/sale ratio has shifted over time as Steam matured. Older games have proportionally more reviews per unit sold. Our adjustment:
year_mult = clamp(1.0 + (release_year - 2020) × 0.04, 0.70, 1.10)
So a 2014 game's NB is multiplied by 0.76 (lower units estimate), while a 2024 game's raw value of 1.16 is capped at 1.10 (higher units estimate). The clamp prevents extreme effects on outliers.
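The year adjustment is a linear ramp with a clamp. This sketch (the helper names clamp and year_mult are hypothetical) shows that for very old and very recent games the bounds, not the slope, decide the multiplier.

```python
def clamp(x: float, lo: float, hi: float) -> float:
    """Restrict x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def year_mult(release_year: int) -> float:
    """NB multiplier for release year: +4% per year relative to 2020, capped to [0.70, 1.10]."""
    return clamp(1.0 + (release_year - 2020) * 0.04, 0.70, 1.10)


for year in (2010, 2014, 2020, 2022, 2024):
    print(year, round(year_mult(year), 2))
# 2010 0.7   (raw 0.60, raised to the 0.70 floor)
# 2014 0.76
# 2020 1.0
# 2022 1.08
# 2024 1.1   (raw 1.16, lowered to the 1.10 cap)
```

With these bounds, every release from 2023 onward gets 1.10 and every release from 2012 or earlier gets 0.70.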
3.2 Price tier
Higher-priced games tend to have lower review-to-sale ratios (each review is "rarer"):
- $0-9.99 → NB × 0.85
- $10-19.99 → NB × 1.00 (baseline)
- $20-29.99 → NB × 1.15
- $30-49.99 → NB × 1.30
- $50+ → NB × 1.45
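A lookup sketch for the price tiers above. Treating each tier's upper bound as exclusive (so $9.99 falls in the first tier and $10.00 in the baseline) is our assumption about edge handling; the names PRICE_TIERS, TOP_TIER_MULT and price_mult are hypothetical.

```python
# (exclusive upper bound in USD, NB multiplier), in ascending order
PRICE_TIERS = [
    (10.00, 0.85),  # $0-9.99
    (20.00, 1.00),  # $10-19.99 (baseline)
    (30.00, 1.15),  # $20-29.99
    (50.00, 1.30),  # $30-49.99
]
TOP_TIER_MULT = 1.45  # $50+


def price_mult(price_usd: float) -> float:
    """NB multiplier for a game's USD list price (§3.2)."""
    if price_usd < 0:
        raise ValueError("price_usd must be non-negative")
    for upper, mult in PRICE_TIERS:
        if price_usd < upper:
            return mult
    return TOP_TIER_MULT


print(price_mult(9.99), price_mult(24.99), price_mult(59.99))  # 0.85 1.15 1.45
```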
3.3 Studio size
Solo / micro studios have higher review propensity (more loyal early audience). Larger studios reach more casual buyers who don't review:
- Solo (1) → NB × 0.85
- Micro (2-5) → NB × 0.90
- Small (6-20) → NB × 1.00
- Medium (21-50) → NB × 1.10
- Large (51+) → NB × 1.20
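The studio-size bands map headcount to a multiplier in the same way. This sketch assumes headcount is a known positive integer; studio_mult is a hypothetical name.

```python
def studio_mult(team_size: int) -> float:
    """NB multiplier by studio headcount (§3.3)."""
    if team_size < 1:
        raise ValueError("team_size must be at least 1")
    if team_size == 1:
        return 0.85  # solo
    if team_size <= 5:
        return 0.90  # micro
    if team_size <= 20:
        return 1.00  # small (baseline)
    if team_size <= 50:
        return 1.10  # medium
    return 1.20      # large


print(studio_mult(1), studio_mult(12), studio_mult(80))  # 0.85 1.0 1.2
```

Because all three adjustments multiply NB, a solo developer's $4.99 game released in 2014 would get a median NB of 50 × 0.76 × 0.85 × 0.85 ≈ 27.5.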
4. Revenue from units
Once units are estimated, revenue is straightforward, but with two corrections:
gross_revenue = units × initial_price_usd
adjusted_revenue = gross_revenue × discount_factor × steam_cut
where:
- discount_factor = 0.65 (a rough average of realized price relative to list price, reflecting purchases made during sales; derived from SteamSpy and Gamalytic public data, to be refined by analysis of our price_history table)
- steam_cut = 0.70 (Valve takes 30%; any downstream publisher cut is ignored)
So adjusted (net) revenue simplifies to units × initial_price_usd × 0.455.
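The two revenue corrections collapse into a single constant (0.65 × 0.70 = 0.455). A sketch with the provisional constants above; revenue is a hypothetical helper name, and a 500K-unit game at $19.99 is only an illustrative input.

```python
DISCOUNT_FACTOR = 0.65  # provisional: realized price as a share of list price
STEAM_CUT = 0.70        # developer share after Valve's 30%


def revenue(units: float, initial_price_usd: float) -> dict[str, float]:
    """Gross and adjusted (net) revenue as defined in §4."""
    gross = units * initial_price_usd
    adjusted = gross * DISCOUNT_FACTOR * STEAM_CUT
    return {"gross_revenue": gross, "adjusted_revenue": adjusted}


r = revenue(500_000, 19.99)
print(round(r["gross_revenue"]), round(r["adjusted_revenue"]))  # 9995000 4547725
```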
5. Confidence ranges
We always show lower / median / upper values. They derive from the NB range (§2) plus uncertainty in the adjustments. The range widens for:
- Games with < 100 reviews (small sample, random noise dominates);
- Free-to-play games (NB doesn't apply, separate model TBD);
- Early Access games (revenue accumulates unevenly);
- Games with bundled DLC or season passes (Steam reviews attach to the base game).
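One way to compute the three estimates while keeping the breakdown-array promise from §1 is sketched below. It is a hypothetical structure (Estimate, estimate_units and the flag names are ours, not from src/lib/sales-estimate.ts). It takes the combined §3 adjustment as a single number, and it only records which widening conditions apply, because this page does not specify how much each one widens the range.

```python
from dataclasses import dataclass, field

NB_RANGE = {"lower": 35, "median": 50, "upper": 75}  # §2


@dataclass
class Estimate:
    units: dict[str, float]
    breakdown: list[str] = field(default_factory=list)      # human-readable steps
    widen_reasons: list[str] = field(default_factory=list)  # §5 conditions that apply


def estimate_units(
    review_count: int,
    adj_mult: float,
    *,
    free_to_play: bool = False,
    early_access: bool = False,
    has_dlc_or_season_pass: bool = False,
) -> Estimate:
    """adj_mult is the product of the §3 multipliers (year × price × studio)."""
    est = Estimate(units={})
    for label, nb in NB_RANGE.items():
        units = review_count * nb * adj_mult
        est.units[label] = units
        est.breakdown.append(
            f"{label}: {review_count} reviews × NB {nb} × adjustments {adj_mult:.3f} = {units:,.0f} units"
        )
    # Record (but do not quantify) the conditions under which §5 says the range widens.
    if review_count < 100:
        est.widen_reasons.append("fewer than 100 reviews")
    if free_to_play:
        est.widen_reasons.append("free-to-play (NB does not apply)")
    if early_access:
        est.widen_reasons.append("Early Access")
    if has_dlc_or_season_pass:
        est.widen_reasons.append("DLC / season pass (reviews are on the base game)")
    return est


e = estimate_units(10_000, 1.0)
print(e.breakdown[1])  # median: 10000 reviews × NB 50 × adjustments 1.000 = 500,000 units
```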
6. What we explicitly do NOT model
- Geographic price differences: we use USD list price only; regional pricing (~30-50% discount in some markets) is not currently factored in.
- Refunds: Steam refund rate is non-trivial (~5-15%); our estimate is "units bought", not "units kept".
- Wishlist-to-purchase conversion: not modeled.
- Outside-Steam sales: Epic, GOG, console and mobile sales are not modeled (Steam typically accounts for roughly 60-90% of PC indie revenue).
- Game Pass / subscription revenue: tracked separately in our ITAD data, not in main estimate.
7. Validation
We cross-check our estimates against known sales numbers (publisher announcements, Steam250
historical data, GamesIndustry.biz coverage). For games where authoritative sales are known,
our median estimate is within ±25% in more than 70% of cases (sample size ~50;
see tests/test_sales_algorithm.py fixtures).
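The ±25% headline can be computed as a simple hit rate over games with known sales. A sketch, assuming the tolerance is measured relative to the known (actual) figure; within_tolerance_rate is a hypothetical name, and the pairs in the example are toy numbers, not fixtures from tests/test_sales_algorithm.py.

```python
def within_tolerance_rate(pairs: list[tuple[float, float]], tolerance: float = 0.25) -> float:
    """Share of (estimate, actual) pairs with |estimate - actual| <= tolerance * actual."""
    if not pairs:
        raise ValueError("need at least one (estimate, actual) pair")
    hits = sum(1 for est, actual in pairs if abs(est - actual) <= tolerance * actual)
    return hits / len(pairs)


# Toy data: two of these four estimates fall within ±25% of the actual figure.
toy = [(100, 110), (200, 150), (50, 48), (300, 420)]
print(within_tolerance_rate(toy))  # 0.5
```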
Cases where we're systematically off: heavily discounted bundle games (overestimate), mobile-port games (underestimate, because casual buyers rarely review), and VR-only games (sample bias).
8. Comparison to Gamalytic / VGInsights
| Aspect | indielist | Gamalytic |
|---|---|---|
| Methodology | White-box, public formula | Black-box, proprietary |
| Breakdown shown | Yes, every estimate | No |
| Pricing (starter) | $19/mo | $25/mo |
| Open-source algorithm | Yes (Apache-2.0) | No |
| Adjustments documented | Yes, this page | Partial blog posts |
9. Source code
Algorithm: src/lib/sales-estimate.ts (TypeScript, canonical), with a Python mirror at python_collector/sales/algorithm.py.
Test fixtures and the cross-language equivalence test: tests/test_sales_algorithm.py.
10. Disagree?
If our estimate for a game is materially off from known sales figures, please contact us
using the "General inquiry" type and prefix your message with [sales-estimate]. We log these reports and adjust the
formula in monthly releases.