
How indie game sales estimates actually work — the white-box Boxleiter method

7 min read for investors

sales-data, methodology, boxleiter, algorithm

Every game page on indielist shows a sales estimate. Click "How is this calculated?" and you get the formula expanded: the base number, every adjustment factor, and the final result. We call this the white-box Boxleiter method. This article is the long-form explainer of what's behind that expansion and why we built it that way.

The original Boxleiter idea

In 2014, indie developer Mike Boxleiter posted a back-of-the-envelope rule: multiply a Steam game's review count by ~50 and you get a usable estimate of unit sales. The "NB number" was 50.

The intuition is simple: on Steam, a roughly stable fraction of buyers eventually leave a review. If you assume that fraction is constant, review count is a proxy for sales.
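
As a minimal sketch, the single-multiplier rule fits in one function. The names `NB_CLASSIC` and `boxleiterEstimate` are illustrative assumptions, not identifiers from the indielist source.

```typescript
// Classic single-multiplier Boxleiter rule: units ≈ reviews × NB.
// NB_CLASSIC and boxleiterEstimate are hypothetical names for illustration.
const NB_CLASSIC = 50; // the multiplier quoted in the original rule

function boxleiterEstimate(reviewCount: number, nb: number = NB_CLASSIC): number {
  if (!Number.isFinite(reviewCount) || reviewCount < 0) {
    throw new RangeError("reviewCount must be a non-negative finite number");
  }
  // Round to whole units, since fractional copies make no sense.
  return Math.round(reviewCount * nb);
}

console.log(boxleiterEstimate(1_000)); // 50000
```

The multiplier is a parameter here on purpose: everything that follows in this article is about replacing the constant 50 with something that depends on the game.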

Why a single NB number is misleading

The fraction isn't constant. It varies systematically by:

  • Year. Steam's review prompt changed in 2018 and again in 2022; older games have higher review-per-sale ratios because users had more time to review.
  • Price. Cheap games get more impulse buys without reviews. Expensive games get more deliberate reviewers.
  • Quality / sentiment. Games with very high or very low positive ratings provoke more reviews per buyer than middling ones.
  • Studio scale. Solo-dev games tend to under-review (smaller audience overlap with reviewers); larger studios with marketing budgets tend to over-review.
  • Genre. Hyper-casual games barely get reviewed. Deep RPGs get reviewed heavily.

Treating these as a single multiplier gives you the famous ±60% error band SteamSpy used to publish. That's not useful for any decision worth making.

What we do instead: multi-factor NB

We start from NB_base = 50 and add or subtract per-factor adjustments. The adjustments are public and version-controlled; see src/lib/sales-estimate.ts in the indielist source.

For example, here's how Hades (~240,000 reviews, $25 launch, 2020 release, medium studio, action-RPG genre) gets computed:

  • base: +50
  • year_2020: +15 (review-per-sale was higher pre-2022)
  • price_$25: +5 (mid-priced games review steadily)
  • positive_98%: +10 (high enthusiasm = more reviews)
  • team_medium: +10 (Supergiant has marketing reach)
  • genre_RPG/Adventure: +10 (deep games get reviewed)

Final NB = 100. Median estimate = 240K × 100 = 24M units, with a confidence range of [median × 0.6, median × 1.4] = 14.4M to 33.6M.
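
The arithmetic of the Hades example can be reproduced with a short sketch. This is not the contents of src/lib/sales-estimate.ts: the `Adjustment` type and the `finalNb` and `medianEstimate` functions are hypothetical names, and only the factor labels and numbers come from the worked example above.

```typescript
// A sketch of the additive multi-factor NB calculation.
// Type and function names are assumptions made for illustration only.
interface Adjustment {
  factor: string; // label shown in the expanded formula
  delta: number;  // points added to (or subtracted from) the base NB
}

const NB_BASE = 50;

// Sum the base and every per-factor adjustment.
function finalNb(adjustments: readonly Adjustment[], base: number = NB_BASE): number {
  return adjustments.reduce((nb, a) => nb + a.delta, base);
}

// Median unit estimate: review count times the final NB.
function medianEstimate(reviewCount: number, nb: number): number {
  return Math.round(reviewCount * nb);
}

// Adjustment values copied from the Hades example.
const hadesAdjustments: Adjustment[] = [
  { factor: "year_2020", delta: 15 },
  { factor: "price_$25", delta: 5 },
  { factor: "positive_98%", delta: 10 },
  { factor: "team_medium", delta: 10 },
  { factor: "genre_RPG/Adventure", delta: 10 },
];

const hadesNb = finalNb(hadesAdjustments);
const hadesMedian = medianEstimate(240_000, hadesNb);

console.log(hadesNb);     // 100
console.log(hadesMedian); // 24000000
```

Keeping each adjustment as a labelled record rather than a bare number is what makes the expanded view possible: the same list that drives the calculation can be printed line by line.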

Public sources put Hades at ~6M units across all platforms. So our Steam-only estimate of 24M is an over-estimate, roughly four times that figure (other platforms drove ~half of total sales). This is exactly the kind of failure mode the white-box exposes: the formula's not wrong, the inputs need a multi-platform correction. That's the v1.1 work.

Confidence ranges, not point estimates

Every estimate ships as a triple: [lower, median, upper]. The lower and upper are median × 0.6 and median × 1.4. These bounds were calibrated against a basket of ~30 games where developers have publicly disclosed actual sales. For that basket, 80% of true values fell inside our range.
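
A minimal sketch of the triple and of the coverage check used in that kind of calibration follows. The names `EstimateRange`, `toRange` and `coverage` are assumptions, rounding the bounds to whole units is a choice made for this sketch, and the basket passed to `coverage` below is synthetic data, not the real disclosed-sales basket.

```typescript
// Hypothetical names; only the 0.6 and 1.4 factors come from the article.
interface EstimateRange {
  lower: number;
  median: number;
  upper: number;
}

// Build the [lower, median, upper] triple, rounded to whole units.
function toRange(median: number): EstimateRange {
  return {
    lower: Math.round(median * 0.6),
    median,
    upper: Math.round(median * 1.4),
  };
}

interface BasketEntry {
  median: number; // our median estimate for the game
  actual: number; // sales figure disclosed by the developer
}

// Fraction of basket entries whose actual sales fall inside the range.
function coverage(basket: readonly BasketEntry[]): number {
  if (basket.length === 0) {
    throw new RangeError("basket must not be empty");
  }
  const hits = basket.filter((entry) => {
    const range = toRange(entry.median);
    return entry.actual >= range.lower && entry.actual <= range.upper;
  }).length;
  return hits / basket.length;
}

console.log(toRange(24_000_000));
// { lower: 14400000, median: 24000000, upper: 33600000 }

// Synthetic basket for illustration only: two of four fall inside the range.
console.log(coverage([
  { median: 100, actual: 70 },
  { median: 100, actual: 150 },
  { median: 100, actual: 100 },
  { median: 100, actual: 59 },
])); // 0.5
```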

Other tools (Gamalytic, VG Insights) ship a single number. We don't. A single number with no confidence interval is statistical malpractice.

What's still wrong

  • Free-to-play is broken. Reviews-per-buyer breaks down for F2P. We flag F2P games in the data and don't ship an estimate.
  • Bundles distort badly. A game heavily distributed via Humble Bundle has artificially low review counts because bundle buyers don't review at the same rate. We can't detect this from public data yet.
  • Single-platform. The estimate is Steam-only. For multi-platform titles you have to mentally adjust upward.
  • Pre-release games. Demos and very-recent releases have noisy review counts. We won't show an estimate for games < 30 days old.
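
The two limitations that can be detected from public data (free-to-play and games under 30 days old) amount to a gate in front of the estimator. The sketch below shows one way such a gate could look. `GameInfo`, `Eligibility` and `checkEligibility` are hypothetical names, and treating a not-yet-released game as ineligible is an assumption of this sketch.

```typescript
// Hypothetical gate deciding whether an estimate should be shown at all.
interface GameInfo {
  isFreeToPlay: boolean;
  releaseDate: Date;
}

type Eligibility =
  | { eligible: true }
  | { eligible: false; reason: string };

const MIN_AGE_DAYS = 30;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

function checkEligibility(game: GameInfo, now: Date = new Date()): Eligibility {
  // Reviews-per-buyer has no meaning when there are no buyers.
  if (game.isFreeToPlay) {
    return { eligible: false, reason: "free-to-play" };
  }
  // Negative ages (unreleased games) also fail this check.
  const ageDays = (now.getTime() - game.releaseDate.getTime()) / MS_PER_DAY;
  if (ageDays < MIN_AGE_DAYS) {
    return { eligible: false, reason: "released less than 30 days ago" };
  }
  return { eligible: true };
}

const today = new Date("2025-01-31T00:00:00Z");
console.log(checkEligibility(
  { isFreeToPlay: false, releaseDate: new Date("2025-01-15T00:00:00Z") },
  today,
)); // { eligible: false, reason: 'released less than 30 days ago' }
```

Bundle distortion and multi-platform sales are absent from the gate because, as noted above, they cannot yet be detected from public data.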

How this compares to the alternatives

Gamalytic uses a similar Boxleiter base layered with a proprietary regression. They publish point estimates with no formula. Their backtest claims ~30% accuracy. Our advantage is transparency: you can see the formula and disagree with our adjustments.

VG Insights (now Sensor Tower) doesn't disclose its method. It is used heavily by enterprise customers but is inaccessible to indies.

SteamSpy uses public-profile sampling. After Steam made profiles private by default in 2018, accuracy collapsed.

What's next for the algorithm

v1.1 (2026 H2 work): linear regression against the disclosed-sales basket to fit per-factor coefficients instead of hand-tuned values. v2.0 (2027): cross-validated bootstrap confidence intervals + multi-platform extrapolation.

Every version gets a new formula_version string and old versions are kept in sales_estimates_history. The white-box promise extends to history: you can always reproduce what an estimate looked like at any past point.
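
To illustrate what reproducing a past estimate could mean in practice, here is a sketch of an append-only history with an "as of" lookup. The record shape, the field names other than the formula version, and the `estimateAsOf` function are all assumptions. The actual schema of sales_estimates_history is not described in this article, and the sample rows are invented.

```typescript
// Hypothetical record shape for one row of estimate history.
interface EstimateRecord {
  gameId: string;
  formulaVersion: string; // version of the formula that produced this row
  computedAt: Date;
  lower: number;
  median: number;
  upper: number;
}

// Return the most recent record for a game computed at or before `at`.
function estimateAsOf(
  history: readonly EstimateRecord[],
  gameId: string,
  at: Date,
): EstimateRecord | undefined {
  let latest: EstimateRecord | undefined;
  for (const rec of history) {
    if (rec.gameId !== gameId) continue;
    if (rec.computedAt.getTime() > at.getTime()) continue;
    if (latest === undefined || rec.computedAt.getTime() > latest.computedAt.getTime()) {
      latest = rec;
    }
  }
  return latest;
}

// Invented sample rows, for illustration only.
const sampleHistory: EstimateRecord[] = [
  { gameId: "example-game", formulaVersion: "v1.0", computedAt: new Date("2026-01-01T00:00:00Z"), lower: 600, median: 1000, upper: 1400 },
  { gameId: "example-game", formulaVersion: "v1.1", computedAt: new Date("2026-09-01T00:00:00Z"), lower: 540, median: 900, upper: 1260 },
];

const past = estimateAsOf(sampleHistory, "example-game", new Date("2026-06-01T00:00:00Z"));
console.log(past?.formulaVersion); // v1.0
```

Because rows are only ever appended, a lookup like this keeps returning the same answer for a past date even after newer formula versions ship.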

See it in action

Pick any game and click "How is this calculated?": Stardew Valley, Hades, Manor Lords.