Steven's Knowledge

Tool Calling Standards

How models invoke functions — schemas, conventions, and the surface that's slowly converging

Tool calling — also called function calling — is the mechanism by which a model produces structured invocations of code rather than free-text responses. Every major provider supports it, and the formats are similar enough to feel like a standard, even though no formal one exists. Understanding the conventions matters for building portable, reliable agentic systems.

What Tool Calling Is

The model is given:

  • A list of tools, each with a name, description, and parameter schema (typically JSON Schema).
  • A conversation or task.

The model decides whether to respond directly, or to produce one or more tool calls — structured JSON specifying which tool to invoke with what arguments. The application runs the tool, returns the result, and the loop continues.

This is the foundation of every modern agent. The protocols that wrap it (MCP, agent frameworks) all sit on top of this primitive.

The Common Shape

Across providers, tool definitions look roughly like:

{
  "name": "search_customers",
  "description": "Search the customer database by name or email.",
  "parameters": {
    "type": "object",
    "properties": {
      "query": { "type": "string" },
      "limit": { "type": "integer", "minimum": 1, "maximum": 50 }
    },
    "required": ["query"]
  }
}

Tool calls in the response look like:

{
  "tool_calls": [{
    "id": "call_abc",
    "name": "search_customers",
    "arguments": "{\"query\":\"acme\",\"limit\":10}"
  }]
}

Provider differences are in the wrapper structure, not the core idea.

Provider-Specific Variations

  • OpenAI. tools array of function definitions, tool_choice to constrain selection, tool_calls in responses, tool role messages for results.
  • Anthropic. tools array, results returned as tool_result content blocks within user messages. Slightly more flexible content model.
  • Google Gemini. function_declarations, function_call in responses, function_response for results.
  • Mistral, Cohere, others. Similar shapes, varying field names.

Multi-provider libraries (LiteLLM, LangChain, AI SDK) abstract the differences. Building directly against one provider is often simpler when you're not multi-vendor.

Single Calls vs Parallel Calls

Modern providers support multiple parallel tool calls per turn — the model can request search_customers and get_inventory in the same response, with the application running both before continuing. Not all providers support this equally; a few still default to serial.

Parallel calls are a real latency win for agentic flows. When supported, design the tool surface to encourage them: make tools side-effect-free where possible, name them so the model recognizes opportunities for parallelism.

Tool Choice Modes

Three common modes:

  • Auto — the model decides whether to call a tool. The default.
  • Required / any — the model must call some tool. Useful when intent is clear and you don't want a free-text response.
  • Specific tool — force a particular call. Good for structured extraction where you know exactly what you want.

Most providers support all three; the parameter names vary.

Structured Output via Tool Calling

Tool calling is also the most reliable way to get structured output. Define a "tool" whose only purpose is to return the structured result, force the model to call it, parse the arguments. The schema-enforcement guarantees you get valid JSON conforming to the schema.

This pattern is so common that some providers have folded it into a separate "structured output" feature; the underlying mechanism is the same.

Schema Best Practices

The same schema can produce reliable tool calls or constant failures depending on how it's written:

  • Keep it small. Each new field is a new chance to get something wrong.
  • Use enums. Don't ask for free text when you mean one of five values.
  • Name fields like a human. Models follow conventional naming patterns.
  • Make required things really required. Required fields the model can't always produce force hallucination.
  • Constrain numeric ranges. Use minimum, maximum, multipleOf where they apply.
  • Describe each field. The descriptions are part of the prompt; treat them like instructions.

A clean schema with good descriptions outperforms prompt instructions for the same task, by a wide margin.

Tool Description as Prompt

The description field on a tool is one of the most underused parts of the API. The model uses it to decide when to call the tool and how to use it. Good descriptions:

  • State purpose clearly. What the tool does in one sentence.
  • Note key constraints. "Returns max 50 results." "Read-only — does not modify data."
  • Distinguish from similar tools. If you have search_users and search_customers, make the difference unmistakable.
  • Mention when not to use it. "Don't use this for searching products — use search_products instead."

The description is part of every tool-using prompt; brevity and precision compound at scale.

Error Handling Across the Boundary

Tool calls fail. The pattern that works:

  • Run the tool. Catch errors.
  • Return the error to the model. As a tool result, not as an exception bubbling up. "The search_customers tool failed with: connection timeout."
  • Let the model decide. Often it tries again, picks a different approach, or surfaces the failure to the user.

Don't silently retry tool calls without telling the model. Don't crash the whole agent loop on a transient tool failure. The tool boundary is the right place to convert exceptions into structured errors the model can reason about.

Idempotency and Side Effects

Models call tools more than once. Design accordingly:

  • Idempotent tools. Same arguments, same effect. Easier to reason about.
  • Confirmation gates. For irreversible actions (send_email, charge_card, delete_file), require human confirmation before the tool executes.
  • Audit trails. Every tool call logged with arguments, result, timing.
  • Rate limits. A model in a loop can call a tool many times in seconds; rate limits prevent runaway behavior.

The defensive posture: assume the model will sometimes call the wrong tool with the wrong arguments. The cost of that mistake should be bounded.

Where Tool Calling Is Going

A few trends to watch:

  • Stronger schema enforcement. Models that produce strictly valid JSON for their tool calls are increasingly common; the rate of malformed outputs is dropping.
  • Composite tool calls. Single requests that produce a small program — multiple calls with data flow between them — rather than one call at a time.
  • Streaming tool calls. Some providers stream tool call arguments as they're generated; others only emit the complete call. Streaming will continue to spread.
  • Native MCP integration. Tools defined in MCP servers exposed automatically as model tools, without per-application wiring.

The primitive — describe a tool, let the model invoke it, run it, return results — has been stable for two years. The surface around it keeps polishing.

On this page