Steven's Knowledge

Workflow Automation & Agents

Multi-step business processes carried out by an LLM-driven agent that operates tools, APIs, and UIs

Scenario Abstraction

A typical business workflow follows a recurring pattern: read an inbound signal → look up a few things in different systems → make a small judgment → take an action in another system → notify a person. It spans several apps and is roughly 90% routine logic and 10% real judgment. Classical RPA can automate the deterministic parts but breaks the moment the shape varies.

The LLM-agent scenario uses a model as the decision and orchestration layer: it reads the inbound, reasons about state, calls tools (APIs, internal services, search, sometimes browser UIs), and produces the action — within a clearly defined toolbox and policy.

This is the broadest scenario in the catalog because nearly every back-office workflow has a shape like this; it differs from document-to-action in that the trigger isn't a document and the work crosses multiple systems.

Solution Shape

  1. Define the workflow — written description, in/out signals, decision points, success criteria.
  2. Build the toolbox — typed tools (functions / APIs) the agent may call. Each has a clear input/output contract and side-effect class (read-only, write, external).
  3. Constrain the loop — system prompt with policy, allowed actions, escalation criteria, hard limits (max tool calls, max cost, time bound).
  4. Plan + act loop — LLM reads state, picks the next tool, sees the result, repeats until done or blocked.
  5. Human handoff — when blocked, summarize state to a human queue with a single-click resume.
  6. Approval before mutation — for irreversible actions, require explicit human approval until the agent earns autonomy.
  7. Persistent run record — every step, tool call, and decision is logged for audit and debugging.
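
The loop in steps 3 to 7 can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: `Tool`, `RunRecord`, `run_agent`, and the `decide` callback (a stand-in for the LLM call) are hypothetical names, and the demo tools are toy lambdas. It shows a hard step cap, an approval gate on any tool that is not read-only, and a run record that logs every step.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Tool:
    name: str
    side_effect: str  # "read", "write", or "external"
    fn: Callable[[dict], dict]


@dataclass
class RunRecord:
    steps: List[dict] = field(default_factory=list)
    status: str = "running"  # becomes "done" or "needs_human"


def run_agent(decide, tools: Dict[str, Tool], max_steps: int = 10,
              approve: Callable[[str, dict], bool] = lambda name, args: False) -> RunRecord:
    """Plan + act loop. `decide` stands in for the LLM: it sees the steps so far
    and returns {"tool": name, "args": {...}} or {"tool": "done"}."""
    record = RunRecord()
    for _ in range(max_steps):
        action = decide(record.steps)
        if action["tool"] == "done":
            record.status = "done"
            return record
        args = action.get("args", {})
        tool = tools.get(action["tool"])
        if tool is None:
            # The model asked for something outside the toolbox: stop and escalate.
            record.steps.append({"tool": action["tool"], "args": args,
                                 "result": None, "note": "unknown tool"})
            record.status = "needs_human"
            return record
        if tool.side_effect != "read" and not approve(tool.name, args):
            # Mutating call without approval: log it and hand off.
            record.steps.append({"tool": tool.name, "args": args,
                                 "result": None, "note": "approval required"})
            record.status = "needs_human"
            return record
        result = tool.fn(args)
        record.steps.append({"tool": tool.name, "args": args, "result": result})
    # Step cap exceeded: treated as a failure that escalates to a human.
    record.status = "needs_human"
    return record


# Toy toolbox and a scripted policy in place of a real model (assumptions).
TOOLS = {
    "get_order": Tool("get_order", "read",
                      lambda a: {"id": a["order_id"], "status": "shipped"}),
    "issue_refund": Tool("issue_refund", "write",
                         lambda a: {"refunded": a["order_id"]}),
}


def scripted_decide(steps):
    if not steps:
        return {"tool": "get_order", "args": {"order_id": "A-100"}}
    return {"tool": "done"}


record = run_agent(scripted_decide, TOOLS)
print(record.status, len(record.steps))
```

In this sketch the default `approve` callback denies everything, so a write tool always ends the run in `needs_human` unless the caller passes an approver explicitly. A real system would also cap cost and wall-clock time, which are omitted here for brevity.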

Agents fail less in reasoning than in tooling: tools that lie about success, vague error messages, missing search-by-ID endpoints. Tool design is most of the work.
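
One way to make a tool honest about failure is to return a structured result instead of a bare status. The sketch below is illustrative only: `ToolResult`, its fields, the error codes, and the in-memory `ORDERS` store are assumptions, not part of any particular framework. It shows a lookup-by-ID tool that distinguishes bad input from a missing record and tells the agent what to try next.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToolResult:
    ok: bool
    data: Optional[dict] = None
    error_code: Optional[str] = None  # machine-readable, e.g. "not_found"
    message: str = ""                 # human- and model-readable explanation
    retryable: bool = False           # whether calling again could help


# Hypothetical backing store standing in for a real order system.
ORDERS = {"A-100": {"id": "A-100", "status": "shipped"}}


def get_order_by_id(order_id: str) -> ToolResult:
    """Read-only lookup with explicit, specific failure modes."""
    if not order_id.strip():
        return ToolResult(ok=False, error_code="invalid_argument",
                          message="order_id must be a non-empty string")
    order = ORDERS.get(order_id)
    if order is None:
        return ToolResult(ok=False, error_code="not_found",
                          message=f"No order with id {order_id!r}. "
                                  "Check the id or look the order up by customer.")
    return ToolResult(ok=True, data=order)


print(get_order_by_id("A-100"))
print(get_order_by_id("Z-999"))
```

The point of the design is that the agent never has to guess: a `not_found` with a concrete hint leads to a different next step than a retryable timeout would.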

Key Building Blocks

  • Tool registry with strong typing and versioning.
  • State management — short-term scratchpad + persistent memory for the run.
  • Policy / system prompt — the agent's job description, scope, and refusal rules.
  • Cost & step limiter — prevents runaway loops.
  • Approval & escalation UI.
  • Replay tool — given a run trace, re-run with a new prompt or model.
  • Eval suite — replayable scenarios with expected outcomes.
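
As a rough sketch of the eval-suite building block, the code below replays scenarios against an agent and scores each one on the final state of the world rather than on individual tool results. Everything here is hypothetical: `Scenario`, `run_suite`, the toy agent, and the 30-day refund window are made up for illustration.

```python
import copy
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Scenario:
    name: str
    inbound: str                   # the triggering message
    initial_world: dict            # simulated system state before the run
    check: Callable[[dict], bool]  # inspects the final state, not tool output


def run_suite(agent: Callable[[str, dict], None],
              scenarios: List[Scenario]) -> Dict[str, bool]:
    """Run each scenario on a fresh copy of its world; a crash counts as a failure."""
    results = {}
    for s in scenarios:
        world = copy.deepcopy(s.initial_world)
        try:
            agent(s.inbound, world)
            results[s.name] = bool(s.check(world))
        except Exception:
            results[s.name] = False
    return results


def toy_refund_agent(inbound: str, world: dict) -> None:
    """Stand-in for a real agent: refunds orders bought within 30 days (assumed policy)."""
    order_id = inbound.split()[-1]
    order = world["orders"].get(order_id)
    if order and order["days_since_purchase"] <= 30:
        order["refunded"] = True


SCENARIOS = [
    Scenario("eligible_refund", "please refund order A-1",
             {"orders": {"A-1": {"days_since_purchase": 5, "refunded": False}}},
             lambda w: w["orders"]["A-1"]["refunded"] is True),
    Scenario("too_late", "please refund order B-2",
             {"orders": {"B-2": {"days_since_purchase": 90, "refunded": False}}},
             lambda w: w["orders"]["B-2"]["refunded"] is False),
]

results = run_suite(toy_refund_agent, SCENARIOS)
pass_rate = sum(results.values()) / len(results)
print(results, pass_rate)
```

Because each scenario carries its own initial state and check, the same suite can be re-run unchanged after a prompt or model change, which is what makes it useful alongside the replay tool.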

Concrete Cases

  • Inbound email triage and response. Read inbound mail, classify, look up customer / order info, draft a reply, send (or queue for review).
  • Support-ticket resolution agent. For a defined set of issue classes: read ticket, look up, take fix action (reissue receipt, push config, restart resource), reply to customer.
  • Sales-development outreach. Given a target list, research each account, draft a personalized first email, schedule follow-ups based on responses.
  • IT helpdesk operations. Reset password, add to group, provision app access, with verifications and policy checks.
  • Refund / cancellation processing. Verify eligibility, apply policy, process the refund, notify the customer, log the case.
  • Procurement intake. New request → classify, look up preferred vendors, draft PO, route for approval.
  • Recruiting coordination. Schedule interviews across calendars, send confirmations, reschedule on conflict, update ATS.
  • Browser-based RPA replacement. Where there's no API, an agent operates a vendor portal UI to perform claim status checks, posting updates, etc.
  • DevOps "runbook agent". When an alert fires, the agent runs the documented diagnostic steps and posts a structured summary; humans still take destructive actions.
  • Marketing campaign assembly. Brief → assets, copy variants, segments, schedule across channels.

Similar Scenarios

  • Vertical-specific copilots — legal drafting assistant, ops engineer copilot, finance close copilot. Same shape, narrower toolbox.
  • Multi-step research agents — see Research & Synthesis; same loop, the toolbox is mostly retrieval.
  • Code-modification agents — same loop, toolbox is editor + tests + git.
  • Event-driven automations — webhook-triggered agents that respond to system events (file uploaded, status changed) rather than human inboxes.

Pitfalls & Evaluation

  • Tools that lie. A create_invoice call that returns 200 OK without actually creating the invoice is fatal. Verify side effects post-hoc; don't trust tool return values blindly for critical writes.
  • Unbounded loops. Hard-cap steps, cost, and wall-clock time. Treat exceeding the cap as a failure that escalates to a human.
  • Quiet partial failure. The agent declares success because the last tool call succeeded, even though the user's goal wasn't met. Score on end-to-end outcomes, not tool successes.
  • Permission sprawl. Agents accumulate too many credentials. Scope credentials per workflow, rotate them, and apply least privilege.
  • Adversarial input. Inbound emails / tickets can contain prompt injection. Treat all external content as untrusted; never let it override system policy.
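
The "tools that lie" pitfall can be guarded against with a read-after-write check. The sketch below is a minimal illustration under stated assumptions: `verified_write`, `VerificationError`, the in-memory `INVOICES` store, and both `create_invoice` variants are hypothetical. It shows a critical write being confirmed by an independent read instead of by the write's own return value.

```python
class VerificationError(Exception):
    """Raised when a write claims success but the read-back disagrees."""


def verified_write(write, read_back, is_expected):
    """Perform a write, then confirm the side effect through a separate read."""
    response = write()
    observed = read_back()
    if not is_expected(observed):
        raise VerificationError(
            f"write reported {response!r} but read-back returned {observed!r}")
    return observed


# Hypothetical in-memory invoice system for the demo.
INVOICES = {}


def honest_create_invoice(invoice_id, amount):
    INVOICES[invoice_id] = {"id": invoice_id, "amount": amount}
    return {"status": 200}


def lying_create_invoice(invoice_id, amount):
    # Returns success without storing anything.
    return {"status": 200}


def get_invoice(invoice_id):
    return INVOICES.get(invoice_id)


invoice = verified_write(
    write=lambda: honest_create_invoice("INV-1", 120),
    read_back=lambda: get_invoice("INV-1"),
    is_expected=lambda inv: inv is not None and inv["amount"] == 120,
)
print(invoice)
```

Passing `lying_create_invoice` through the same wrapper raises `VerificationError`, which the surrounding loop can treat as a reason to escalate rather than report success.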

Useful metrics: end-to-end success rate per workflow type, average tool calls per run (a proxy for cost and brittleness), human-handoff rate, mean time saved per processed item, incident rate (anything that should not have happened).
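
These metrics can be computed directly from the persistent run records. The sketch below assumes a hypothetical record shape (`workflow`, `succeeded`, `tool_calls`, `handed_off`, `minutes_saved`, `incident`) and made-up sample data; real run records would need to be mapped into something similar.

```python
from collections import defaultdict
from statistics import mean


def summarize_runs(runs):
    """Aggregate per-workflow metrics from a list of run-record dicts."""
    by_workflow = defaultdict(list)
    for run in runs:
        by_workflow[run["workflow"]].append(run)

    summary = {}
    for workflow, items in by_workflow.items():
        summary[workflow] = {
            "runs": len(items),
            "success_rate": mean(1.0 if r["succeeded"] else 0.0 for r in items),
            "avg_tool_calls": mean(r["tool_calls"] for r in items),
            "handoff_rate": mean(1.0 if r["handed_off"] else 0.0 for r in items),
            "avg_minutes_saved": mean(r["minutes_saved"] for r in items),
            "incident_rate": mean(1.0 if r["incident"] else 0.0 for r in items),
        }
    return summary


# Illustrative data only; not measurements from any real system.
RUNS = [
    {"workflow": "refund", "succeeded": True, "tool_calls": 3,
     "handed_off": False, "minutes_saved": 10, "incident": False},
    {"workflow": "refund", "succeeded": True, "tool_calls": 5,
     "handed_off": False, "minutes_saved": 12, "incident": False},
    {"workflow": "refund", "succeeded": False, "tool_calls": 4,
     "handed_off": True, "minutes_saved": 0, "incident": False},
    {"workflow": "refund", "succeeded": True, "tool_calls": 8,
     "handed_off": False, "minutes_saved": 14, "incident": False},
]

summary = summarize_runs(RUNS)
print(summary["refund"])
```

Here `succeeded` is meant to record the end-to-end outcome for the user, not whether the last tool call returned without error.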
