How do you ship AI features without burning budget on tokens?

— AI & agents

Single-digit dollars per thousand MAU on AI features.

Three cost-cutting habits: cache deterministic outputs forever, route to the cheapest model that meets the quality bar, and use prompt caching for stable system prompts.

3 min readUpdated 2026-06-02By Aryan Singh Pokharia, Founding Member & Lead Developer

AptixLabs · 2026-04-30

AI features are designed around three cost-cutting habits. The studio refuses to ship an AI feature that's economically broken — a feature that costs more per user than it earns is a liability, not an asset.

Habit 1 — Cache deterministic outputs

The same prompt with the same input produces the same output. The studio caches every deterministic call — workout summaries, programme rewrites, content classifications — in Firestore keyed by an input hash. Hit rate on the largest cache is roughly 60%, which means 60% of "AI calls" never touch a model.

Habit 2 — Route to the cheapest model that meets the bar

A short classification call goes to Gemini Flash or Claude Haiku at ~$0.25 per million tokens. A long-form coaching response goes to Sonnet or GPT-4-class only when the quality difference shows in evaluation. The studio runs a quality eval on every model swap before promoting it to production.

Habit 3 — Prompt caching for stable system prompts

The studio uses prompt caching on the Anthropic API for any prompt with a stable system message, which drops the cost of repeated calls by up to 90%. The same approach works on OpenAI with their cached-input pricing.

The result

The mistake that makes AI features uneconomic

The classic failure is calling a frontier model for everything. It feels simplest, and it works in the demo — then the bill arrives and the feature is a liability. The fix is not exotic: most calls do not need the smartest model. A classification, a tag, a short summary runs fine on a small fast model at a fraction of the cost. Reserve the expensive model for the genuinely hard generation, and the per-user cost drops by an order of magnitude with no quality loss anyone notices.

Caching is the highest-leverage habit

The single biggest saving is not picking a cheaper model — it is not calling the model at all. Any output that is deterministic for a given input gets cached forever after the first computation. On our largest cache the hit rate sits around 60%, which means more than half of what looks like "AI usage" is a free database read. Pair that with prompt caching for stable system prompts and the economics stop being scary.

#llm-cost#prompt-caching#claude#gemini#cost-discipline

Have a project like this?

The studio is taking on a small number of partners. Tell us what you're building — we reply within a working day.

Start a conversation