indielist (beta)

Methodology

indielist publishes white-box sales estimates. Every number on every game page can be expanded into the formula that produced it and the inputs that fed it. This page documents the algorithm. The canonical source is src/lib/sales-estimate.ts in our open-source worker, with a Python mirror in python_collector/sales/algorithm.py that is tested to produce byte-for-byte equivalent output.

1. Why white-box?

The dominant industry tool (Gamalytic) gives black-box numbers: you see "this game sold ~80K" with no breakdown. We disagree with that approach. The buyers of these numbers (publishers and investors) deserve to see the math so they can argue with it. Our promise:

Every estimate must return a breakdown array showing exactly how the number was assembled.
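
As an illustration of what such a breakdown could look like, here is a minimal Python sketch. The names Step and estimate_with_breakdown and the field layout are hypothetical, chosen for this example only; they are not taken from src/lib/sales-estimate.ts, whose actual breakdown shape may differ.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One line of the breakdown (hypothetical shape, for illustration)."""
    label: str      # what was applied at this step
    factor: float   # the multiplier used at this step
    running: float  # the estimate after applying the factor


def estimate_with_breakdown(review_count, base_multiplier, adjustments):
    """Return (units, breakdown).

    adjustments is a list of (label, factor) pairs applied multiplicatively,
    in order, after the base multiplier.
    """
    running = float(review_count)
    breakdown = [Step("review_count", 1.0, running)]
    for label, factor in [("base_multiplier", base_multiplier), *adjustments]:
        running *= factor
        breakdown.append(Step(label, factor, running))
    return running, breakdown
```

Because every step records both the factor and the running total, a reader can multiply the factors back together and confirm that they reproduce the final number.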

2. The core formula (Boxleiter-derived)

Steam review count → sales is the foundational signal. Empirically established by Mike Boxleiter (Triple Town) circa 2014:

units ≈ review_count × NB

where NB (the "Negative Binomial multiplier") is the average ratio of units sold to verified reviews, fit from large game samples. We use a lower / median / upper range for NB.

So a 10K-review game has a base estimate of 350K-750K units (median 500K), which corresponds to an NB of 35 to 75 with a median of 50.
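
The base estimate can be sketched in a few lines of Python. The NB values 35 / 50 / 75 are inferred from the 10K-review example above (350K / 500K / 750K divided by 10K), not read from the source code, and the function name base_estimate is hypothetical.

```python
# NB values inferred from the 10K-review example on this page (assumption).
NB_LOWER = 35
NB_MEDIAN = 50
NB_UPPER = 75


def base_estimate(review_count):
    """Apply units ≈ review_count × NB for each point of the NB range."""
    return {
        "lower": review_count * NB_LOWER,
        "median": review_count * NB_MEDIAN,
        "upper": review_count * NB_UPPER,
    }
```

Calling base_estimate(10_000) gives 350,000 / 500,000 / 750,000 units, matching the example above. These are base figures only, before any of the adjustments in section 3.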

3. Adjustments (applied multiplicatively)

3.1 Year of release

The review/sale ratio has shifted over time as Steam matured. Older games have proportionally more reviews per unit sold. Our adjustment:

year_mult = clamp(1.0 + (release_year - 2020) × 0.04, 0.70, 1.10)

So a 2014 game's NB is multiplied by 0.76 (units estimate lower). A 2024 game's raw value is 1.16, which the clamp caps at 1.10 (units estimate higher). The bounds prevent extreme effects on outliers.
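
A direct Python transcription of the year formula as written on this page is shown below, as a sketch rather than a copy of the canonical code. The helper clamp is defined here because Python has no built-in of that name.

```python
def clamp(value, low, high):
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def year_mult(release_year):
    """Year-of-release multiplier, using the constants stated on this page."""
    raw = 1.0 + (release_year - 2020) * 0.04
    return clamp(raw, 0.70, 1.10)
```

With these constants, release years of 2012 or earlier sit on the 0.70 floor and release years of 2023 or later sit on the 1.10 cap, so the linear term only has an effect for 2013 through 2022.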

3.2 Price tier

Higher-priced games tend to have lower review-to-sale ratios (each review is "rarer"):

3.3 Studio size

Solo / micro studios have higher review propensity (more loyal early audience). Larger studios reach more casual buyers who don't review:

4. Revenue from units

Once units are estimated, revenue is straightforward, but with two corrections:

gross_revenue = units × initial_price_usd
adjusted_revenue = gross_revenue × discount_factor × steam_cut

So the formula simplifies to net_revenue ≈ units × price × 0.455, where net_revenue is the adjusted_revenue above and 0.455 is the product discount_factor × steam_cut.
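
The revenue step can be sketched in Python as follows. The individual values are assumptions: steam_cut = 0.70 reflects the developer's share after Steam's standard 30% cut, and discount_factor = 0.65 is picked only because 0.65 × 0.70 reproduces the 0.455 stated above. The function name revenue is hypothetical.

```python
# Assumed split of the combined 0.455 factor stated on this page.
STEAM_CUT = 0.70        # assumption: share kept after Steam's standard 30% cut
DISCOUNT_FACTOR = 0.65  # assumption: chosen so that 0.65 * 0.70 = 0.455


def revenue(units, initial_price_usd,
            discount_factor=DISCOUNT_FACTOR, steam_cut=STEAM_CUT):
    """Return (gross_revenue, adjusted_revenue) in USD."""
    gross_revenue = units * initial_price_usd
    adjusted_revenue = gross_revenue * discount_factor * steam_cut
    return gross_revenue, adjusted_revenue
```

For example, 500,000 units at an initial price of $20 gives a gross revenue of $10,000,000 and an adjusted revenue of about $4,550,000 under these assumed factors.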

5. Confidence ranges

We always show lower / median / upper. They derive from the NB range (§2) plus uncertainty in adjustments. The range widens for:

6. What we explicitly do NOT model

7. Validation

We cross-check our estimates against known sales numbers (publisher announcements, Steam250 historical data, GamesIndustry.biz coverage). For games where authoritative sales are known, our median estimate is within ±25% of the known figure in more than 70% of cases (sample size ~50, see tests/test_sales_algorithm.py fixtures).
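
The accuracy figure above is a hit rate: the share of games whose median estimate falls within a relative tolerance of the known number. A minimal Python sketch of that metric follows; the function name within_tolerance_rate is hypothetical and the code is not taken from tests/test_sales_algorithm.py.

```python
def within_tolerance_rate(pairs, tolerance=0.25):
    """Share of (estimate, actual) pairs where the estimate is within
    ±tolerance of the actual figure, measured relative to the actual."""
    if not pairs:
        raise ValueError("need at least one (estimate, actual) pair")
    hits = sum(
        1 for estimate, actual in pairs
        if abs(estimate - actual) <= tolerance * actual
    )
    return hits / len(pairs)
```

With made-up pairs (100, 100), (130, 100), (80, 100) and (120, 100), three of the four estimates fall within ±25% of the actual figure, so the rate is 0.75.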

Cases where we're systematically off: heavily-discounted bundle games (overestimate), mobile-port games (underestimate, because casual buyers rarely review, so reviews undercount sales), VR-only games (sample bias).

8. Comparison to Gamalytic / VGInsights

Aspect | indielist | Gamalytic
Methodology | White-box, public formula | Black-box, proprietary
Breakdown shown | Yes, every estimate | No
Pricing (starter) | $19/mo | $25/mo
Open-source algorithm | Yes (Apache-2.0) | No
Adjustments documented | Yes, this page | Partial blog posts

9. Source code

Algorithm: src/lib/sales-estimate.ts (TS, canonical) + Python mirror at python_collector/sales/algorithm.py.

Test fixtures + cross-language equivalence test: tests/test_sales_algorithm.py.

10. Disagree?

If our estimate for a game is materially off vs known truth, please contact us with type "General inquiry" and prefix [sales-estimate]. We log these and adjust the formula in monthly releases.
