How does AptixLabs write prompts that hold up in production?

— AI & agents

Prompts that don't break in production.

A clever prompt that works in a playground and a prompt that survives ten thousand real users are different things. Here is how we write the second kind.

4 min readUpdated 2026-06-02By Aryan Singh Pokharia, Founding Member & Lead Developer

AptixLabs · 2026-05-29

A prompt that works once in a playground is not a feature. A prompt that handles ten thousand unpredictable users without producing garbage, leaking instructions, or quietly costing a fortune — that is a feature. The studio treats prompts like code: versioned, tested, and constrained.

Structure beats cleverness

Every production prompt has the same shape: a stable system message that defines the role and the hard rules, a clearly delimited block for the user-supplied content, and an explicit output contract. The stable system message is the same on every call, which means it can be prompt-cached — dropping cost on repeated calls by up to 90 percent.

Constrain the output

Free-text output is hard to use downstream and easy to break. Wherever the result feeds back into the app, the studio forces structured output — a JSON schema the model must fill — so the response is parseable and validated at the boundary. If the model returns something off-schema, the call retries rather than shipping bad data into the UI.

Guard against injection

User content can contain text that looks like instructions ("ignore the above and…"). The studio keeps user content inside a delimited block, never concatenated into the instruction section, and adds an explicit rule telling the model to treat that block as data, not commands. Anything touching a tool call gets a second guardrail pass.

Evaluate every change

A fixed eval set of representative inputs for every prompt
Re-run the eval set on every prompt edit and every model upgrade
Log production samples for the first week after any change
Roll back instantly if the eval score drops — prompts are versioned like code

Treat a prompt like code, because it is

The difference between a playground prompt and a production prompt is everything that happens after the happy path. A production prompt is versioned, has an evaluation set, enforces an output schema, and rolls back instantly when a model update degrades it. A prompt that lives only in someone's head and gets edited live is an outage waiting to happen. We store prompts in the codebase and review changes the way we review any other code.

Structured output is what makes it safe

Free-text from a model is hard to use and easy to break. Wherever a result feeds back into the app, the model is forced to fill a schema, and the response is validated at the boundary — if it comes back off-schema, the call retries rather than shipping garbage into the UI. That single constraint turns an unpredictable text generator into a dependable component you can build on.

#prompt-engineering#llm#structured-output#prompt-caching#ai

Have a project like this?

The studio is taking on a small number of partners. Tell us what you're building — we reply within a working day.

Start a conversation