Tool Calling Standards
How models invoke functions — schemas, conventions, and the surface that's slowly converging
Tool calling — also called function calling — is the mechanism by which a model produces structured invocations of code rather than free-text responses. Every major provider supports it, and the formats are similar enough to feel like a standard, even though no formal one exists. Understanding the conventions matters for building portable, reliable agentic systems.
What Tool Calling Is
The model is given:
- A list of tools, each with a name, description, and parameter schema (typically JSON Schema).
- A conversation or task.
The model decides whether to respond directly, or to produce one or more tool calls — structured JSON specifying which tool to invoke with what arguments. The application runs the tool, returns the result, and the loop continues.
This is the foundation of every modern agent. The protocols that wrap it (MCP, agent frameworks) all sit on top of this primitive.
The Common Shape
Across providers, tool definitions look roughly like:
{
"name": "search_customers",
"description": "Search the customer database by name or email.",
"parameters": {
"type": "object",
"properties": {
"query": { "type": "string" },
"limit": { "type": "integer", "minimum": 1, "maximum": 50 }
},
"required": ["query"]
}
}Tool calls in the response look like:
{
"tool_calls": [{
"id": "call_abc",
"name": "search_customers",
"arguments": "{\"query\":\"acme\",\"limit\":10}"
}]
}Provider differences are in the wrapper structure, not the core idea.
Provider-Specific Variations
- OpenAI.
toolsarray of function definitions,tool_choiceto constrain selection,tool_callsin responses,toolrole messages for results. - Anthropic.
toolsarray, results returned astool_resultcontent blocks within user messages. Slightly more flexible content model. - Google Gemini.
function_declarations,function_callin responses,function_responsefor results. - Mistral, Cohere, others. Similar shapes, varying field names.
Multi-provider libraries (LiteLLM, LangChain, AI SDK) abstract the differences. Building directly against one provider is often simpler when you're not multi-vendor.
Single Calls vs Parallel Calls
Modern providers support multiple parallel tool calls per turn — the model can request search_customers and get_inventory in the same response, with the application running both before continuing. Not all providers support this equally; a few still default to serial.
Parallel calls are a real latency win for agentic flows. When supported, design the tool surface to encourage them: make tools side-effect-free where possible, name them so the model recognizes opportunities for parallelism.
Tool Choice Modes
Three common modes:
- Auto — the model decides whether to call a tool. The default.
- Required / any — the model must call some tool. Useful when intent is clear and you don't want a free-text response.
- Specific tool — force a particular call. Good for structured extraction where you know exactly what you want.
Most providers support all three; the parameter names vary.
Structured Output via Tool Calling
Tool calling is also the most reliable way to get structured output. Define a "tool" whose only purpose is to return the structured result, force the model to call it, parse the arguments. The schema-enforcement guarantees you get valid JSON conforming to the schema.
This pattern is so common that some providers have folded it into a separate "structured output" feature; the underlying mechanism is the same.
Schema Best Practices
The same schema can produce reliable tool calls or constant failures depending on how it's written:
- Keep it small. Each new field is a new chance to get something wrong.
- Use enums. Don't ask for free text when you mean one of five values.
- Name fields like a human. Models follow conventional naming patterns.
- Make required things really required. Required fields the model can't always produce force hallucination.
- Constrain numeric ranges. Use
minimum,maximum,multipleOfwhere they apply. - Describe each field. The descriptions are part of the prompt; treat them like instructions.
A clean schema with good descriptions outperforms prompt instructions for the same task, by a wide margin.
Tool Description as Prompt
The description field on a tool is one of the most underused parts of the API. The model uses it to decide when to call the tool and how to use it. Good descriptions:
- State purpose clearly. What the tool does in one sentence.
- Note key constraints. "Returns max 50 results." "Read-only — does not modify data."
- Distinguish from similar tools. If you have
search_usersandsearch_customers, make the difference unmistakable. - Mention when not to use it. "Don't use this for searching products — use
search_productsinstead."
The description is part of every tool-using prompt; brevity and precision compound at scale.
Error Handling Across the Boundary
Tool calls fail. The pattern that works:
- Run the tool. Catch errors.
- Return the error to the model. As a tool result, not as an exception bubbling up. "The
search_customerstool failed with:connection timeout." - Let the model decide. Often it tries again, picks a different approach, or surfaces the failure to the user.
Don't silently retry tool calls without telling the model. Don't crash the whole agent loop on a transient tool failure. The tool boundary is the right place to convert exceptions into structured errors the model can reason about.
Idempotency and Side Effects
Models call tools more than once. Design accordingly:
- Idempotent tools. Same arguments, same effect. Easier to reason about.
- Confirmation gates. For irreversible actions (send_email, charge_card, delete_file), require human confirmation before the tool executes.
- Audit trails. Every tool call logged with arguments, result, timing.
- Rate limits. A model in a loop can call a tool many times in seconds; rate limits prevent runaway behavior.
The defensive posture: assume the model will sometimes call the wrong tool with the wrong arguments. The cost of that mistake should be bounded.
Where Tool Calling Is Going
A few trends to watch:
- Stronger schema enforcement. Models that produce strictly valid JSON for their tool calls are increasingly common; the rate of malformed outputs is dropping.
- Composite tool calls. Single requests that produce a small program — multiple calls with data flow between them — rather than one call at a time.
- Streaming tool calls. Some providers stream tool call arguments as they're generated; others only emit the complete call. Streaming will continue to spread.
- Native MCP integration. Tools defined in MCP servers exposed automatically as model tools, without per-application wiring.
The primitive — describe a tool, let the model invoke it, run it, return results — has been stable for two years. The surface around it keeps polishing.