Steven's Knowledge

Structured Output

Treat the model as a function that returns typed data, not a chatbot that returns prose

The single biggest reliability win for production LLM systems is forcing structured output. A response you can parse is a response you can validate, route, retry, and integrate. A blob of prose is none of those things.

Why Structure Wins

  • Programmatic — you can branch on fields, route on enums, store in a database.
  • Validatable — you can reject malformed responses immediately and retry.
  • Composable — structured output from one call becomes input to another.
  • Cheaper to test — you can assert on fields, not just on substrings.
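
As a minimal sketch of the first and last points, assume a hypothetical ticket-triage call whose response carries `category` and `urgency` fields. The field names and queue names below are invented for illustration and do not come from any particular API.

```python
import json

# Hypothetical model response for a support-ticket triage call.
raw = '{"category": "billing", "urgency": "high", "summary": "Charged twice"}'

# Hypothetical mapping from category to a downstream queue.
ROUTES = {
    "billing": "finance-queue",
    "bug": "engineering-queue",
    "other": "general-queue",
}

def route(response_text: str) -> str:
    """Parse the response and pick a queue from its fields."""
    data = json.loads(response_text)
    queue = ROUTES[data["category"]]  # route on an enum-like field
    if data["urgency"] == "high":     # branch on a field
        queue += ":priority"
    return queue

print(route(raw))
```

A test for this call can assert on the returned queue name directly instead of searching the response for a substring.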

Mechanisms

Four mechanisms, the first three offered by the API and the last done purely in the prompt:

  1. JSON mode — the model is constrained to produce syntactically valid JSON, with no guarantee about which fields it contains. Available on most major APIs.
  2. JSON schema / structured outputs — the model is constrained to produce JSON matching a specific schema. The strongest guarantee; available on OpenAI, Gemini, and increasingly elsewhere.
  3. Tool/function calling — declare a function with a typed schema; the model returns arguments matching it. Functionally equivalent to schema-constrained output and often the most reliable form.
  4. Prompt-and-parse — ask for JSON in the prompt, parse the response. Works, but not safe without validation.
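
For the prompt-and-parse case, the response often arrives wrapped in prose. The sketch below pulls the first JSON object out of such a reply using only the standard library. The helper name `extract_json` and the sample reply are illustrative assumptions, and the result still needs validating before use.

```python
import json

def extract_json(text: str) -> dict:
    """Return the first JSON object found in text, ignoring surrounding prose."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start`
            # and ignores whatever follows it.
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # This "{" did not open a valid object; try the next one.
            start = text.find("{", start + 1)
    raise ValueError("no JSON object found in response")

# Hypothetical reply where the model added prose around the JSON.
reply = 'Sure! Here is the result: {"sentiment": "negative", "score": 0.2} Anything else?'
print(extract_json(reply))
```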

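Always validate the output against your schema, even when the API claims to enforce it: a response cut off at the token limit can still arrive as incomplete JSON, and a schema cannot check that the values make sense. Defense in depth.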
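
One way to combine validation with retries is sketched below. It assumes a hypothetical `call_model` function that takes a prompt and returns the raw response text; here it is replaced by a stub so the example runs without any API. The sentiment schema is likewise invented for illustration.

```python
import json
from typing import Callable, Optional

ALLOWED_SENTIMENTS = {"positive", "neutral", "negative"}

def validate(data: object) -> dict:
    """Check shape, types, and allowed values; raise ValueError on any problem."""
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    sentiment = data.get("sentiment")
    if not isinstance(sentiment, str) or sentiment not in ALLOWED_SENTIMENTS:
        raise ValueError(f"bad sentiment: {sentiment!r}")
    score = data.get("score")
    if isinstance(score, bool) or not isinstance(score, (int, float)):
        raise ValueError("score must be a number")
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    return data

def call_with_retry(call_model: Callable[[str], str], prompt: str, attempts: int = 3) -> dict:
    """Call the model until a response parses and validates, up to `attempts` times."""
    last_error: Optional[Exception] = None
    for _ in range(attempts):
        try:
            return validate(json.loads(call_model(prompt)))
        except ValueError as err:  # json.JSONDecodeError is a ValueError subclass
            last_error = err
    raise RuntimeError(f"no valid response after {attempts} attempts") from last_error

# Stub standing in for a real API call: truncated JSON, then a bad enum, then a good reply.
replies = iter([
    '{"sentiment": "neg',
    '{"sentiment": "angry", "score": 0.9}',
    '{"sentiment": "negative", "score": 0.9}',
])
result = call_with_retry(lambda prompt: next(replies), "Classify: 'I want a refund.'")
print(result)
```

A production version would usually also feed the validation error back into the retry prompt, which this sketch leaves out.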

Schema Design Tips

  • Keep schemas small. Each new field is a new chance for the model to get it wrong.
  • Use enums for finite choices. Don't ask for free text when you mean one of five values.
  • Make optional fields truly optional. A required field the model cannot always fill forces it to invent a value.
  • Name fields like a human would. Models follow naming conventions that look like real code.
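
These tips can be sketched with a small, hypothetical `Ticket` type: three fields, an enum for the finite choice, and one optional field. The names are illustrative only. The JSON Schema at the end uses standard keywords (`enum`, `required`); some providers' strict modes may expect optional values to be expressed differently, for example as nullable, so check the relevant documentation before reusing it.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Priority(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"

@dataclass
class Ticket:
    title: str
    priority: Priority               # one of three values, not free text
    due_date: Optional[str] = None   # left out when the source gives no date

def parse_ticket(data: dict) -> Ticket:
    """Build a Ticket from parsed JSON; unknown priorities raise ValueError."""
    return Ticket(
        title=data["title"],
        priority=Priority(data["priority"]),
        due_date=data.get("due_date"),
    )

# The same shape as a JSON Schema: due_date is absent from "required".
TICKET_SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "priority": {"type": "string", "enum": [p.value for p in Priority]},
        "due_date": {"type": "string"},
    },
    "required": ["title", "priority"],
}

ticket = parse_ticket({"title": "Fix login page", "priority": "high"})
print(ticket)
```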

XML Tags for Long Inputs

For tasks where you're passing in long context (documents, transcripts, code), XML-style tags work well:

<document>
{the document}
</document>

<question>
{the question}
</question>

The tags do not need to be valid XML; they are just delimiters that models reliably follow.
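
A small helper can assemble such a prompt. This is a sketch: the function name `build_prompt` and the instruction sentence at the top of the prompt are assumptions, not part of any API.

```python
def build_prompt(document: str, question: str) -> str:
    """Wrap each input in its own tag so the model can tell them apart."""
    return (
        "Answer the question using only the document.\n\n"
        f"<document>\n{document.strip()}\n</document>\n\n"
        f"<question>\n{question.strip()}\n</question>"
    )

prompt = build_prompt(
    "The invoice was issued on 3 March and is due within 30 days.",
    "When is the invoice due?",
)
print(prompt)
```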

Where Structure Hurts

Heavy schemas can constrain the model into worse answers. If you're asking for nuanced judgment, give it room to write its reasoning in a free-text field, then capture the decision in a structured one. The shape of the schema is itself a prompt.
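
The reasoning-then-decision pattern might look like the sketch below. The schema and field names are hypothetical. The `reasoning` field is listed before `decision` on the assumption that the model emits fields in that order, so the decision is generated after the reasoning rather than before it.

```python
import json

# Hypothetical schema: free-text reasoning first, then the constrained decision.
REVIEW_SCHEMA = {
    "type": "object",
    "properties": {
        "reasoning": {"type": "string"},
        "decision": {"type": "string", "enum": ["approve", "reject", "escalate"]},
    },
    "required": ["reasoning", "decision"],
}

def read_decision(response_text: str) -> str:
    """Return the structured decision; the reasoning stays available for logging."""
    data = json.loads(response_text)
    decision = data["decision"]
    if decision not in REVIEW_SCHEMA["properties"]["decision"]["enum"]:
        raise ValueError(f"unexpected decision: {decision!r}")
    return decision

# Hypothetical model response.
raw = json.dumps({
    "reasoning": "The refund exceeds the agent limit, but the customer has a clean history.",
    "decision": "escalate",
})
print(read_decision(raw))
```

Downstream code branches only on `decision`; the free-text field exists to give the model room to think and to leave an audit trail.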
