A prompt that works once in a playground is not a feature. A prompt that handles ten thousand unpredictable users without producing garbage, leaking instructions, or quietly costing a fortune — that is a feature. The studio treats prompts like code: versioned, tested, and constrained.
Structure beats cleverness
Every production prompt has the same shape: a stable system message that defines the role and the hard rules, a clearly delimited block for the user-supplied content, and an explicit output contract. The stable system message is the same on every call, which means it can be prompt-cached — dropping cost on repeated calls by up to 90 percent.
Constrain the output
Free-text output is hard to use downstream and easy to break. Wherever the result feeds back into the app, the studio forces structured output — a JSON schema the model must fill — so the response is parseable and validated at the boundary. If the model returns something off-schema, the call retries rather than shipping bad data into the UI.
Guard against injection
User content can contain text that looks like instructions ("ignore the above and…"). The studio keeps user content inside a delimited block, never concatenated into the instruction section, and adds an explicit rule telling the model to treat that block as data, not commands. Anything touching a tool call gets a second guardrail pass.
Evaluate every change
- A fixed eval set of representative inputs for every prompt
- Re-run the eval set on every prompt edit and every model upgrade
- Log production samples for the first week after any change
- Roll back instantly if the eval score drops — prompts are versioned like code
