— AI & agents

Single-digit dollars per thousand MAU on AI features.

Three cost-cutting habits: cache deterministic outputs forever, route to the cheapest model that meets the quality bar, and use prompt caching for stable system prompts.

3 min readPublished 2026-04-30By AptixLabs studio
Glowing futuristic chip, representing AI compute cost disciplineAptixLabs · 2026-04-30

AI features are designed around three cost-cutting habits. The studio refuses to ship an AI feature that's economically broken — a feature that costs more per user than it earns is a liability, not an asset.

Habit 1 — Cache deterministic outputs

The same prompt with the same input produces the same output. The studio caches every deterministic call — workout summaries, programme rewrites, content classifications — in Firestore keyed by an input hash. Hit rate on the largest cache is roughly 60%, which means 60% of "AI calls" never touch a model.

Habit 2 — Route to the cheapest model that meets the bar

A short classification call goes to Gemini Flash or Claude Haiku at ~$0.25 per million tokens. A long-form coaching response goes to Sonnet or GPT-4-class only when the quality difference shows in evaluation. The studio runs a quality eval on every model swap before promoting it to production.

Habit 3 — Prompt caching for stable system prompts

The studio uses prompt caching on the Anthropic API for any prompt with a stable system message, which drops the cost of repeated calls by up to 90%. The same approach works on OpenAI with their cached-input pricing.

The result

Have a project like this?

The studio is taking on a small number of partners. Tell us what you're building — we reply within a working day.

Start a conversation