Steven's Knowledge

OpenAI-Compatible APIs

The de facto wire format for talking to LLMs — and why it became one

OpenAI's Chat Completions API became the closest thing the LLM world has to a portable standard. Today, almost every model provider, inference engine, and local runtime exposes an "OpenAI-compatible" endpoint — same JSON shapes, same paths, same auth model. It's not a formal standard, but it functions as one, and building against it is the path of least resistance for staying portable.

What "OpenAI-Compatible" Means

A service exposes endpoints that match OpenAI's API surface closely enough that an OpenAI SDK pointed at it just works. At minimum:

  • POST /v1/chat/completions — the chat endpoint, with messages, model, temperature, max_tokens, stream, tools, etc.
  • POST /v1/embeddings — embedding requests in the same format.
  • Streaming via Server-Sent Events with the same chunk format.
  • Tool calling with the function-calling JSON shape.

The depth of compatibility varies — some providers cover 90% of edge cases, some 60%. The headline-grabbing part is: change the base URL, change the API key, your code keeps working.

Why It Won

OpenAI got there first with a clean, simple API design at exactly the moment the entire industry was figuring out what an LLM API should look like. By the time competitors arrived, application code had standardized around that shape. Building a different API meant fighting the inertia of every framework, every tutorial, every existing app.

It also helped that the API design is genuinely good — the chat-message format generalizes well, tool calls are clean, streaming is straightforward.

What Speaks It

  • Provider APIs. Anthropic, Google, Mistral, Cohere all offer OpenAI-compatible modes alongside their native APIs.
  • Model platforms. Together, Fireworks, Groq, OpenRouter, Anyscale, Replicate.
  • Inference engines. vLLM, SGLang, TGI, llama.cpp, Ollama, LM Studio — all expose OpenAI-compatible servers.
  • Cloud APIs. Bedrock, Vertex, Azure OpenAI all have compatibility layers (often imperfect).

The pragmatic implication: write your code with the OpenAI SDK, point it at any of these. Most things work.

Where Compatibility Breaks Down

"Compatible" doesn't mean "identical." Common gaps:

  • Tool calling differences. Schema shape, multi-tool parallelism, structured output adherence vary between implementations.
  • Streaming subtleties. Chunk boundaries, finish reasons, function-call streaming behave slightly differently.
  • Parameter coverage. logprobs, seed, response_format, n may or may not be supported.
  • Token counting. Tokenizers differ by model; quoted token counts are not directly comparable.
  • Model-specific features. Anthropic's prompt caching, OpenAI's logit bias, Google's safety settings — only available on native APIs.
  • Error formats. Compatible-mode errors are sometimes wrapped or simplified.

For simple chat use, compatibility is excellent. For anything advanced, native SDKs of the actual provider are usually a better fit.

Native SDKs vs Compatible Mode

When to use a provider's native SDK over the OpenAI-compatible mode:

  • You want provider-specific features. Prompt caching, computer use, batch APIs, citations, tool use customization.
  • You're locked in deliberately. No real reason to keep portability for one specific feature.
  • Better error and retry handling. Native SDKs often have richer ergonomics.

When to use OpenAI-compatible:

  • Multi-provider support is a real product requirement.
  • Local development with Ollama / LM Studio, then production with a frontier API. Same code, different base URL.
  • Hot-swapping models based on cost, latency, capacity.
  • Defensive portability — you don't want to be locked into one provider's API forever.

Many teams use both: OpenAI-compatible at the call site for portability, native SDKs in spots where they need depth.

Provider Abstraction Libraries

If you're going multi-provider, several libraries wrap the differences:

  • LiteLLM — broad coverage, OpenAI-compatible facade over many providers, including ones that don't natively offer compatibility. Popular in production.
  • AI SDK (Vercel) — provider-agnostic, mostly TypeScript-first.
  • LangChain and LlamaIndex — provider abstraction is part of their broader feature set.
  • Provider-native multi-model support — Anthropic Bedrock, Vertex Anthropic, Azure OpenAI all give you one provider's models behind a different cloud's API.

LiteLLM specifically has become a de facto pattern for "I want OpenAI-shaped code that hits whatever model I configure."

The Limits of "Compatible" as a Standard

OpenAI-compatible is a convention, not a specification. Implications:

  • No conformance tests. Implementations claim compatibility; the only way to know is to try.
  • Drift over time. OpenAI adds features; compatible providers lag behind.
  • No governance. OpenAI decides what the API does, on their schedule, for their reasons. The rest of the ecosystem reacts.

This is fine when the API surface is stable; it's painful when OpenAI introduces breaking changes (as has happened) or when an emerging feature isn't supported in your compatible provider.

What's Coming

Several efforts to formalize an actual standard:

  • OpenAI's Responses API — the company's own attempt at a more capable, agent-friendly successor. Less broadly adopted than chat completions today.
  • Anthropic Messages, Google GenAI — clean native APIs gaining their own ecosystems alongside compatibility modes.
  • OpenTelemetry GenAI semantic conventions — standardizing the observability side, not the request format.

It's possible the industry converges on something better than OpenAI-compatible. It's also possible it doesn't, and we live with this convention for years.

A Pragmatic Default

For a new application in 2025:

  • Build against an OpenAI-compatible interface using a multi-provider library (LiteLLM or similar).
  • Reach for the native SDK when you specifically need a feature it offers.
  • Treat the choice of model as configuration, not as code.
  • Plan for compatibility imperfections; have an integration test that pins behavior across providers.

That posture lets you move with the ecosystem rather than against it.

On this page