How indie game sales estimates actually work — the white-box Boxleiter method

2026-05-05 · 7 min read · for investors

Tags: sales-data, methodology, boxleiter, algorithm

Every IndieList game page that has a paid-sales estimate includes an expandable calculation. You can see the Steam review count, the starting reviews-to-units multiplier, every fixed adjustment, the final multiplier and the formula version. That visibility is useful—but it must not be confused with proof that the estimate is accurate.

The useful idea behind a Boxleiter-style estimate

Steam reviews are public while unit sales usually are not. A Boxleiter-style estimate treats review count as an indirect signal and multiplies it by an assumed number of units per review:

estimated units = Steam review count × NB

The calculation is easy to reproduce. Its weakness is equally important: the units-per-review relationship is not constant. Age, price history, keys, bundles, audience behaviour, genre, review prompts and launch conditions can all move it. Review count is evidence, but it is not a sales ledger.
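A minimal Python sketch of the basic relationship makes the weakness concrete. It holds the review count fixed and varies only NB. `estimate_units` is our illustrative name, not an IndieList function. The NB values 15 and 150 are the v1.0 clamp bounds, and 60 is the NB from the worked example later in this post.

```python
# Boxleiter-style estimate: estimated units = Steam review count × NB.
# NB (units per review) is an assumed ratio, not something Steam publishes.


def estimate_units(review_count: int, nb: float) -> float:
    """Convert a public review count into an indirect unit-sales estimate."""
    if review_count < 0:
        raise ValueError("review_count must be non-negative")
    return review_count * nb


# Same evidence (1,000 reviews), three different assumptions about NB.
for nb in (15, 60, 150):
    print(f"NB={nb:>3}: {estimate_units(1_000, nb):,.0f} units")
```

Within v1.0's own clamp range, the same 1,000 reviews map to anything from 15,000 to 150,000 units. The spread comes entirely from the assumed ratio, which is why NB deserves as much scrutiny as the review count.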

What IndieList v1.0 actually does

Version v1.0 is a hand-calibrated, additive heuristic. It starts from NB = 50, adds or subtracts fixed points for release year, initial USD price, positive-review percentage, studio-size tier and a simple primary-genre keyword rule, then clamps the final result to [15, 150].

NB = clamp(50 + year + price + rating + team + genre, 15, 150)
median units = review count × NB
display range = median × [0.6, 1.4]

Those adjustments are points, not percentages and not multipliers. The exact table lives on the methodology page, and the deployed runtime exports the same data at /methodology/sales-v1.json.
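The three formula lines above translate directly into code. In this sketch, `estimate_v1` and `SalesEstimate` are hypothetical names. The adjustment points are passed in as a label-to-points mapping rather than looked up, because the real rules live in the published adjustment table and are not reproduced here.

```python
from dataclasses import dataclass

BASE_NB = 50
NB_MIN, NB_MAX = 15, 150
RANGE_LOW, RANGE_HIGH = 0.6, 1.4  # fixed heuristic bounds, not a confidence interval


@dataclass(frozen=True)
class SalesEstimate:
    nb: float
    median_units: float
    low_units: float
    high_units: float


def clamp(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


def estimate_v1(review_count: int, adjustments: dict[str, float]) -> SalesEstimate:
    """Additive v1.0-style heuristic.

    `adjustments` maps breakdown labels (e.g. "positive_90%") to points.
    Points are added to the base NB; they are not percentages or multipliers.
    """
    # Clamp only after every point has been summed.
    nb = clamp(BASE_NB + sum(adjustments.values()), NB_MIN, NB_MAX)
    median = review_count * nb
    return SalesEstimate(nb, median, median * RANGE_LOW, median * RANGE_HIGH)
```

Passing the worked example's breakdown (`{"year_2024": 0, "price_$20": 0, "positive_90%": 10, "team_small": 0}`) with 1,000 reviews yields NB 60 and a median of 60,000 units. Because clamping happens after all points are summed, a large negative total bottoms out at NB 15 rather than reaching zero.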

A reproducible example

Consider a hypothetical paid game with these inputs:

1,000 Steam reviews;
90% positive;
released in 2024;
$19.99 initial USD price;
a small studio-size tier;
Action as its primary genre.

The v1.0 breakdown is:

base: +50;
year_2024: +0;
price_$20: +0 (the runtime uses the unrounded $19.99 value for the rule);
positive_90%: +10;
team_small: +0;
no genre adjustment because “Action” matches neither keyword group.

Final NB is 60. The median is therefore 60,000 units, with a fixed display range of 36,000 to 84,000. Anyone with the same inputs and formula version can reproduce those numbers.

Reproducible does not mean validated. The repository fixtures verify that the code returns the intended v1.0 output and that the TypeScript and Python implementations agree. They are not a backtest against official unit-sales truth.
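A golden-fixture check of the kind described above can be sketched in a few lines. The fixture format, `compute` and `check_fixtures` are hypothetical. The first fixture encodes the worked example; the second is a synthetic input chosen only to exercise the lower clamp, not a row from the real adjustment table. Rounding to whole units is our choice for comparison.

```python
# Each fixture pins an expected v1.0 output. Passing fixtures show the code
# matches its own specification; they say nothing about real unit sales.
GOLDEN_FIXTURES = [
    {
        "name": "worked example: 1,000 reviews, 90% positive",
        "review_count": 1_000,
        "points": [0, 0, 10, 0],  # year, price, rating, team; no genre rule matched
        "expected": {"nb": 60, "median": 60_000, "low": 36_000, "high": 84_000},
    },
    {
        "name": "synthetic: points total below the lower clamp",
        "review_count": 100,
        "points": [-40],
        "expected": {"nb": 15, "median": 1_500, "low": 900, "high": 2_100},
    },
]


def compute(review_count: int, points: list[int]) -> dict[str, int]:
    nb = max(15, min(150, 50 + sum(points)))
    median = review_count * nb
    return {"nb": nb, "median": median, "low": round(median * 0.6), "high": round(median * 1.4)}


def check_fixtures(fixtures: list[dict]) -> list[tuple]:
    """Return (name, got, expected) for every fixture whose output differs."""
    failures = []
    for fx in fixtures:
        got = compute(fx["review_count"], fx["points"])
        if got != fx["expected"]:
            failures.append((fx["name"], got, fx["expected"]))
    return failures
```

An empty list from `check_fixtures(GOLDEN_FIXTURES)` is the whole claim such a test supports. Running the same fixture file through a second implementation could serve as a cross-language agreement check, but two implementations of one formula agreeing is still not evidence about accuracy.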

Why the lower and upper numbers are not a confidence interval

IndieList currently displays median × 0.6 and median × 1.4. These are fixed heuristic bounds. We do not have a published, versioned ground-truth dataset that proves a stated percentage of real sales falls inside them, so the site does not call them statistically calibrated confidence bounds. Actual sales can fall outside the displayed range.
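One way to see that the range is a fixed heuristic is that its relative width never changes. The sketch below (hypothetical helper names) computes the v1.0 display range and its width as a fraction of the median for a game with 10 reviews and one with 100,000 reviews.

```python
LOW, HIGH = 0.6, 1.4  # v1.0 display multipliers


def display_range(median_units: float) -> tuple[float, float]:
    """Fixed multiples of the median; not fitted to any error data."""
    return median_units * LOW, median_units * HIGH


def relative_width(median_units: float) -> float:
    if median_units <= 0:
        raise ValueError("median_units must be positive")
    low, high = display_range(median_units)
    return (high - low) / median_units


for reviews in (10, 100_000):
    median = reviews * 60  # same NB for both, to isolate the effect of sample size
    print(reviews, display_range(median), round(relative_width(median), 6))
```

Both cases report a relative width of 0.8 (from 0.6× to 1.4× the median). The band does not respond to how much review evidence stands behind an estimate, which is one reason it should not be read as a statistical interval.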

A future statistical interval would require source policies for official disclosures, explicit sample exclusions, a train/test split, error distributions and out-of-sample coverage. Until those artifacts exist, adding an accuracy percentage would be marketing copy rather than evidence.
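Two of those artifacts, error distributions and out-of-sample coverage, are straightforward to compute once a held-out set exists. This is a minimal sketch assuming a hypothetical list of (low, median, high, actual) rows from the test split of a source-audited disclosure dataset; no such dataset is implied to exist. The absolute log ratio is our choice of error measure because sales errors are multiplicative, and the 90th percentile uses the nearest-rank method.

```python
import math
from statistics import median


def evaluate(held_out: list[tuple[float, float, float, float]]) -> dict[str, float]:
    """Coverage and error summary for (low, median, high, actual_units) rows."""
    if not held_out:
        raise ValueError("need at least one held-out example")
    if any(med <= 0 or actual <= 0 for _, med, _, actual in held_out):
        raise ValueError("log error needs positive estimates and actuals")

    # Share of real outcomes that landed inside the displayed range.
    coverage = sum(lo <= actual <= hi for lo, _, hi, actual in held_out) / len(held_out)

    # |ln(estimate / actual)|: 0 is exact, ln(2) ≈ 0.69 means off by a factor of two.
    errors = sorted(abs(math.log(med / actual)) for _, med, _, actual in held_out)
    p90 = errors[math.ceil(0.9 * len(errors)) - 1]  # nearest-rank percentile

    return {"coverage": coverage, "median_abs_log_error": median(errors), "p90_abs_log_error": p90}
```

A published coverage figure would only mean something if the held-out rows played no part in choosing the coefficients.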

How the revenue number is derived

The revenue figure is another transparent simplification. Version v1.0 applies three fixed model assumptions to the initial USD list price:

net per unit = list price × 0.85 × (1 − 0.04) × (1 − 0.30)
             = list price × 0.5712
modeled net revenue = estimated units × net per unit
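The revenue formula is equally easy to reproduce. The three numbers below come from the formula above; the per-term comments are our reading, not IndieList's labels. The post mentions a fixed discount factor elsewhere, and 30% matches Steam's standard headline revenue share, but the 4% term is left unlabelled. Function names are hypothetical.

```python
from math import prod

# The three fixed v1.0 factors applied to the initial USD list price.
V1_FACTORS = (
    0.85,      # scale list price by 0.85 (plausibly the fixed discount factor)
    1 - 0.04,  # remove 4% (not labelled in this post)
    1 - 0.30,  # remove 30% (matches Steam's standard headline revenue share)
)


def net_per_unit(list_price_usd: float) -> float:
    """list price × 0.85 × (1 − 0.04) × (1 − 0.30), i.e. list price × 0.5712."""
    return list_price_usd * prod(V1_FACTORS)


def modeled_net_revenue(estimated_units: float, list_price_usd: float) -> float:
    return estimated_units * net_per_unit(list_price_usd)
```

For the worked example, `modeled_net_revenue(60_000, 19.99)` comes to about $685,097. Applying the same function to the 36,000 and 84,000 unit bounds gives a modeled revenue range that inherits all of the unit estimate's uncertainty.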

This does not reconstruct every historical discount, regional sale, refund, bundle, key, tax, tiered store term, publisher share, subscription deal or non-Steam purchase. “Modeled net revenue” is the honest label; it is not a studio or publisher financial statement.

Known failure modes

Free-to-play: reviews cannot be converted into paid copies, so IndieList suppresses the paid-sales estimate.
Bundles and keys: people who acquire a copy outside an ordinary store purchase may review at a different rate.
Deep discounting: initial list price and one fixed discount factor cannot reconstruct a long price history.
Early Access and recent launches: review accumulation and sales timing can be uneven; v1.0 does not model that trajectory.
Unknown inputs: missing list price falls back to $15.00 and missing positive percentage is supplied as 80 by the game-service boundary. Those are model fallbacks, not observations.
Non-game Steam apps: DLC, demos, soundtracks and tools require an app-type gate and must not be treated as standalone paid games.
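The suppression and fallback rules in this list can be sketched as one input-preparation step. Field names such as `app_type` and `is_free`, and the `"game"` label, are hypothetical stand-ins rather than IndieList or Steam API fields. In IndieList the 80% fallback is applied at the game-service boundary; this sketch collapses that into a single function. Recording which values were filled in keeps fallbacks visible instead of letting them pass as observations.

```python
from dataclasses import dataclass, field
from typing import Optional

# Model fallbacks stated in the post; they are assumptions, not observations.
FALLBACK_PRICE_USD = 15.00
FALLBACK_POSITIVE_PCT = 80.0


@dataclass
class PreparedInputs:
    price_usd: float
    positive_pct: float
    fallbacks: list[str] = field(default_factory=list)  # which inputs were filled in


def prepare_inputs(
    app_type: str,
    is_free: bool,
    price_usd: Optional[float],
    positive_pct: Optional[float],
) -> Optional[PreparedInputs]:
    """Return None when a paid-sales estimate should be suppressed."""
    # App-type gate: DLC, demos, soundtracks and tools are not standalone paid games.
    if app_type != "game":
        return None
    # Free-to-play: reviews cannot be converted into paid copies.
    if is_free:
        return None

    fallbacks = []
    if price_usd is None:
        price_usd = FALLBACK_PRICE_USD
        fallbacks.append("price_usd")
    if positive_pct is None:
        positive_pct = FALLBACK_POSITIVE_PCT
        fallbacks.append("positive_pct")
    return PreparedInputs(price_usd, positive_pct, fallbacks)
```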

What white-box transparency is good for

Transparency lets a reader identify why a result looks wrong, compare two estimates using the same assumptions, and reproduce a historical version. It also makes bugs visible: an incorrect release year, app identity or price can be traced to a specific input instead of disappearing inside a proprietary score.
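Comparing two breakdowns label by label is the simplest way to put that transparency to work. This hypothetical helper treats each breakdown as a mapping from labels (in the post's style, such as `year_2024` or `positive_90%`) to points. It reports every label that appears in only one breakdown or carries different points in each.

```python
from typing import Optional


def diff_breakdowns(
    a: dict[str, float], b: dict[str, float]
) -> dict[str, tuple[Optional[float], Optional[float]]]:
    """Labels present in only one breakdown (None marks the missing side),
    or present in both with different points."""
    return {
        label: (a.get(label), b.get(label))
        for label in sorted(set(a) | set(b))
        if a.get(label) != b.get(label)
    }
```

Because labels encode the input value, a suspicious estimate compared against a similar game shows an unexpected label (for example a different `year_…` entry) even when both labels are worth the same points, which points straight at the input to check.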

It does not make the model authoritative. For investment, contracts, tax, valuation or public reporting, use official disclosures and primary financial records. Treat the IndieList value as a research hypothesis whose assumptions are open for inspection.

How the model can earn stronger claims

A later formula version should be trained and evaluated against a source-audited dataset of official sales disclosures. It should publish sample construction, time alignment, platform coverage, exclusions, median and tail error, and interval coverage. Any changed coefficients or behaviour must get a new formulaVersion; old estimates stay reproducible.
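One way to honour the versioning rule in code is an append-only registry keyed by formulaVersion, with read-only entries. This is a hypothetical sketch: only the v1.0 numbers come from this post, and a real registry would also need to version the full adjustment table, which is published separately and not reproduced here.

```python
from types import MappingProxyType

_REGISTRY: dict[str, MappingProxyType] = {}


def register(version: str, params: dict) -> None:
    """Add a formula version. Existing versions are never overwritten."""
    if version in _REGISTRY:
        raise ValueError(f"{version} is frozen; changed coefficients need a new formulaVersion")
    _REGISTRY[version] = MappingProxyType(dict(params))  # read-only copy


def params_for(version: str) -> MappingProxyType:
    """Look up the exact parameters an old estimate was computed with."""
    try:
        return _REGISTRY[version]
    except KeyError:
        raise KeyError(f"unknown formulaVersion: {version!r}") from None


register("v1.0", {
    "base_nb": 50,
    "nb_bounds": (15, 150),
    "display_range": (0.6, 1.4),
    "net_factors": (0.85, 1 - 0.04, 1 - 0.30),
    # The v1.0 adjustment table would be stored here too (omitted in this sketch).
})
```

Calling `register("v1.0", ...)` a second time raises an error, so recalibrated coefficients have to ship under a new version while v1.0 estimates stay reproducible.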

Inspect it yourself

Read the exact adjustment table and validation status, download the machine-readable v1.0 method, or open a paid game and expand “How is this calculated?” to see that page’s inputs and breakdown.