Back to Decoded

May 17, 2026

What we learned shipping Spark Analyzer for $5/user → cents

AICase study

By James Farmer · Founder, Stratus Creative

Spark Analyzer is an AI-powered diagnostics tool for Minecraft server administrators. It takes a Spark profiler report — a detailed snapshot of server performance — and explains what's wrong in plain language, why it matters, and what to fix. We built it, shipped it, and it now has 500+ registered users and has processed 400+ diagnostic reports.

The first version was embarrassingly expensive to run.

How we started: $5–$7 per analysis

Early Spark Analyzer sent the raw Spark report file directly to GPT-4 for analysis. Spark reports are detailed. A typical report includes thread traces, method timings, entity counts, plugin call stacks, tick rate graphs — a moderately complex server generates a file that runs 50,000–150,000 tokens when serialized naively.

At GPT-4 pricing at the time, a single analysis cost $5–$7 in API spend. At 400 analyses, that's a $2,000–$2,800 API bill just to get to where we are now. That math doesn't work at scale. At 10,000 analyses/month — a realistic growth target — it's a $50,000–$70,000/month model bill. The product would be unshippable.

We knew we had to fix this before growth made it worse.

What we actually changed

Three things, in order of impact:

1. Pre-processing pipeline (90% token reduction)

Instead of sending the raw report, we built a pre-processing layer that extracts the signal before the LLM call. The key metrics for a performance diagnosis — worst-offending methods, entity counts above threshold, tick rate drops, plugin call frequency, memory pressure indicators — can be extracted with deterministic code. A rule-based parser pulls the top 20 signals from any Spark report in milliseconds.

The LLM now receives a structured summary: ~2,000–4,000 tokens instead of 50,000–150,000. The analysis quality held. The cost dropped by roughly 90%.

2. Model routing (simple vs. complex)

Not all server problems require the same reasoning depth. A server running 40 plugins with a single obvious offender (a poorly optimized world generator consuming 70% of tick time) is a simple analysis — the answer is unambiguous. A server with diffuse performance problems across 15 plugins and unusual entity behavior requires more nuanced reasoning.

We built a lightweight classifier that runs before the main LLM call. Simple reports route to Claude Haiku ($0.80/M input). Complex reports route to Claude Sonnet ($3/M input). Roughly 60–70% of reports are simple. The cost difference between routing and sending everything to Sonnet is substantial at volume.

3. Prompt caching for the system prompt

The system prompt for Spark Analyzer is detailed — it includes Minecraft-specific performance context, interpretation guidelines, output formatting instructions, and examples. It's the same across every analysis. We implemented prompt caching so the system prompt is only billed at full price on the first call in each context window; subsequent calls within the cache TTL are billed at 90% discount.

At 400+ analyses, this alone saves a meaningful percentage of the input token bill.

What it costs now

The per-analysis cost today is in the range of $0.02–$0.12 depending on report complexity and whether the analysis hits a warm cache. The $5–$7 number is gone. The business is sustainable through meaningful scale.

What this means for clients

Every AI workflow we see built by agencies — support bots, lead-qualification agents, document processors, email triagers — has this optimization problem lurking in it.

The agencies that don't know about it quote a model bill based on naive token counting. They send full documents to the LLM. They route everything to the most capable (most expensive) model. They don't implement caching. They don't build a pre-processing layer. The workflow works in demo, and the cost is only visible at production volume.

The model bill isn't a fixed cost. It's an engineering problem. The gap between an unoptimized AI workflow and an optimized one is often 80–95% in API spend, which at any real volume is the difference between a sustainable product and one that's quietly underwater.

Any AI workflow we build at Stratus includes this optimization layer as part of the build. Our Care tier includes ongoing monitoring and optimization as usage patterns evolve — because caching hit rates, model routing thresholds, and pre-processing rules all need tuning as real usage data comes in.

If you've been quoted a monthly API cost for an AI workflow and it was based on "X tokens per call × Y calls per month," run that number through our cost estimator and then ask the agency what their optimization plan is. If they don't have one, the real number is worse.

Decoded by email

One decoded piece a month. No pitch.