Error Handling & Fallbacks

LLM-powered features have richer failure modes than typical web services. The model can timeout, the provider can rate-limit, the output can be malformed, the safety filter can refuse, the tool call can fail, the answer can be wrong. The user-visible experience of these failures is what separates a feature people trust from one they abandon.

The Failure Taxonomy

A non-exhaustive list of what can go wrong:

Network errors. Provider unreachable, request times out.
Rate limits. 429s from the provider.
Provider outages. API up but degraded.
Cost / quota errors. Internal budget cap hit.
Malformed output. JSON didn't parse, schema validation failed.
Empty output. Model returned nothing usable.
Refusal. Safety filter or model refused to answer.
Tool call failures. The function the model wanted to call errored.
Context too long. Prompt exceeded the model's window.
Wrong answer. No error from any layer; the answer is just wrong.

Each one wants a different response.

Retries

Retries are the first line of defense — but only for the right errors:

Transient errors (network hiccups, 5xx, rate limits with backoff): retry with exponential backoff.
Deterministic errors (validation failures, model refused): don't retry blindly; the same input will fail the same way.
Cost-sensitive retries. Cap the number of retries; expensive calls retried infinitely become a billing incident.

Retry budgets at the user level matter: a single user request that fans out into 50 retries because of a transient provider issue is its own failure mode.

Output Validation Failures

The model returned something that didn't parse or validate:

Re-prompt with the error. "Your previous response wasn't valid JSON — here's the parser error. Try again." Often works on the second attempt.
Stricter constraints. Move from prompted JSON to schema-enforced JSON or tool calling, which has structural guarantees.
Fall back to extraction. Run a small parser that extracts what it can from a partial response.
Fail loudly to the user if all retries fail. Don't silently render a broken state.

Provider Outages

API providers go down. Plan for it before it happens:

Multi-provider fallback. Define a primary, fall through to a secondary on persistent failures. Same prompt shape; routing logic decides which to use.
Multi-region routing. When a single region is impaired, route to others.
Degraded mode. When all providers fail, fall back to a non-AI experience or cached responses rather than a hard error.
Circuit breakers. Stop hammering a failing provider; fail fast and let downstream code handle it.

A single-provider stack is a single point of failure. For non-critical features that's fine; for core flows it isn't.

Refusals and Safety Filters

The model itself refused to answer. Categorize:

Legitimate refusals (asked for harmful content): show a clear policy explanation. Don't retry.
False positives (refused a benign question): unfortunate. Provide a feedback path; route to a different prompt or model variant if persistent.
Provider-side filtering (request blocked before reaching the model): often opaque error codes. Build classifiers to recognize them and surface meaningful messages.

The pattern to avoid: silently retrying until something works, with no signal to the user. They notice. They lose trust.

Wrong Answers

The system technically succeeded but the answer is wrong. The hardest failure to handle:

Don't pretend confidence. If you have any signal that the answer might be wrong (low retrieval scores, model self-report), surface it.
Make verification easy. Citations, links, "view source" — give the user a path to check.
Feedback affordances. Thumbs down, "this is wrong" buttons, free-text reports. Capture the failure for the eval set.
Recovery options. "Try again," "explain differently," "search the docs directly" let users escape a bad answer.

The product takeaway: design every AI feature with the assumption that some answers will be wrong, and make the path forward from a wrong answer good.

Empty or Truncated Responses

The model produced nothing, or stopped mid-output:

Detect zero-output cases. Don't show an empty bubble. "I wasn't able to produce a response — please try again."
Truncation detection. If finish_reason is length, surface that. Offer to continue, or automatically continue with a follow-up call.
Partial salvage. If the truncated output is useful (e.g., 80% of a code block), show it with a notice.

Tool Call Failures

In agentic flows, tool calls fail constantly:

Pass the error back to the model. "The search tool failed with: connection timeout. Try again or take a different approach." Often the model recovers.
Bound the loop. If a tool fails three times in a row, stop trying and surface the failure to the user.
Distinguish error types. Permission denied, validation error, transient outage — each may want a different behavior.

User-Facing Error Copy

The principles:

Specific over generic. "I couldn't reach the search service" beats "Something went wrong."
Action-oriented. Tell the user what they can do. Retry, rephrase, contact support, switch context.
Honest. Don't paper over problems. Users figure it out.
Branded gracefully. Errors are part of the product; design them, don't bolt them on.

Logging and Recovery Loops

Every error is data:

Log the full context (input, prompt, model, parameters, output).
Tag with error type and severity.
Sample for human review.
Feed recurring failures back into the eval set or prompt updates.

Errors that don't reach a queue get fixed slowly or never. Errors that do drive the next round of improvement.

A Healthy Default

A well-engineered AI feature handles errors like:

Try the primary provider. Retry on transient errors with backoff.
On persistent failure, fall through to a secondary path.
On output validation failure, re-prompt with the error.
On user-visible failure, show a specific message with an action.
Log everything. Sample for review. Fold learnings into the next iteration.

It's not glamorous, but the features users come back to are the ones that recover well from the things that go wrong.

On this page