Document-Driven Action

Reading documents, applying business rules, and performing the resulting transaction in a target system

Scenario Abstraction

A document or batch of documents arrives — invoices, bank statements, contracts, lab reports, customs forms. A human currently:

Reads the documents,
Cross-checks them against other records (ledger, ERP, prior contracts),
Flags anything anomalous,
Submits the resulting transaction in a target system (Xero, SAP, an EHR, a customs portal).

The work is high-volume, rule-bound, and rarely creative, but it requires reading messy real-world documents and reasoning about them with domain context. This is the gap where OCR- and RPA-only solutions broke down before LLMs.

The pattern is document → understanding → reconciliation → transaction in target system → human sign-off — with the LLM doing the understanding and reconciliation layers.

Solution Shape

Ingest — receive the document (email, upload, portal scrape, EDI feed).
Parse layout — turn the PDF/image into structured text with positions (tables stay tables).
Extract — LLM produces typed fields and line items, with bounding-box citations.
Match & reconcile — for each line, find the matching record in the target system (transaction, PO, patient, contract).
Apply business rules — detect mismatches: amount differs, tax code wrong, duplicate, out-of-policy, missing approval.
Explain — for each flag, the LLM writes a short reason a human can verify in seconds.
Draft the action — pre-fill the form / API call to the target system; do not submit yet.
Human review (where needed) — a reviewer accepts / edits / rejects each item; their corrections feed back into evaluation.
Submit — call the target system's API; capture the receipt; close the loop.
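
The steps above can be sketched as one function that stops at a draft, plus a separate submit step. This is a minimal sketch under stated assumptions, not a reference implementation: Draft, process_document, submit, and the stage callables are all hypothetical names, and the stages are passed in as plain functions so a real parser, LLM call, or target-system client can be swapped in. As a simplification, each rule returns its own human-readable reason instead of asking an LLM to write one.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Draft:
    """A pre-filled action that has NOT been submitted yet."""
    payload: dict
    flags: list[str] = field(default_factory=list)
    approved: bool = False  # set to True by a human reviewer


# A rule looks at the extracted fields and the matched record (or None)
# and returns a short reason string when it fires, otherwise None.
Rule = Callable[[dict, Optional[dict]], Optional[str]]


def process_document(
    raw: bytes,
    parse: Callable[[bytes], str],            # "Parse layout"
    extract: Callable[[str], dict],           # "Extract" (the LLM call)
    match: Callable[[dict], Optional[dict]],  # "Match & reconcile"
    rules: list[Rule],                        # "Apply business rules" + "Explain"
) -> Draft:
    """Run parsing through drafting; review and submit happen elsewhere."""
    text = parse(raw)
    fields = extract(text)
    record = match(fields)
    flags = [reason for rule in rules if (reason := rule(fields, record))]
    return Draft(payload={"fields": fields, "matched": record}, flags=flags)


def submit(draft: Draft, send: Callable[[dict], str]) -> str:
    """Refuse to send a flagged draft that no reviewer has approved."""
    if draft.flags and not draft.approved:
        raise PermissionError("flagged draft needs human approval first")
    return send(draft.payload)  # returns the target system's receipt
```

Keeping submit separate from process_document means the default path can never write to the target system by accident; the write only happens through an explicit second call.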

How far to turn the autonomy dial depends on the cost of an error. Many teams ship "draft + 1-click confirm" before they consider full auto-submit; some classes of items (low value, high confidence, repeat pattern) graduate to auto-submit over time.
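
One way to make the autonomy dial concrete is a routing function that only allows auto-submit when an item is low value, high confidence, and a repeat pattern at the same time. The thresholds, the Item fields, and the route function are illustrative assumptions; real values would have to come from the measured cost of a wrong submission.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    amount: float         # value of the transaction
    confidence: float     # extraction/matching confidence in [0, 1]
    prior_approvals: int  # times this pattern was approved without edits
    flags: int            # number of rule flags raised


# Hypothetical thresholds, not recommendations.
MAX_AUTO_AMOUNT = 200.0
MIN_CONFIDENCE = 0.98
MIN_PRIOR_APPROVALS = 5


def route(item: Item) -> str:
    """Pick a review path; anything flagged always goes to a human."""
    if item.flags > 0:
        return "human_review"
    if (
        item.amount <= MAX_AUTO_AMOUNT
        and item.confidence >= MIN_CONFIDENCE
        and item.prior_approvals >= MIN_PRIOR_APPROVALS
    ):
        return "auto_submit"
    return "one_click_confirm"
```

Starting with MIN_PRIOR_APPROVALS set very high effectively disables auto-submit, so the same code can ship in draft-only mode and be loosened later.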

Key Building Blocks

Document parsing layer — Textract / Azure DI / Unstructured / Llama Parse, or a vision-LLM directly.
Structured extraction with citations — typed JSON + bounding boxes.
Reference data access — connectors / read APIs for the target system.
Rule engine — declarative checks (deterministic) alongside LLM judgments (probabilistic).
Target-system writer — the actual API call; idempotency keys; rollback for partial failures.
Review UI — a single screen that shows source doc, extracted fields, matched record, flags, and a confirm button.
Audit log — what the model saw, what it proposed, who approved, what was submitted.
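
Two of these blocks, structured extraction with citations and the deterministic half of the rule engine, can be sketched in a few lines. The shapes below (BBox, Field, RULES, and the two helper functions) are hypothetical and not the output format of any particular parsing service; the rule names are examples only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BBox:
    """A rectangle on a page of the source document."""
    page: int
    x0: float
    y0: float
    x1: float
    y1: float


@dataclass(frozen=True)
class Field:
    name: str
    value: str
    source: Optional[BBox]  # None means the model gave no citation


def ungrounded(fields: list[Field]) -> list[str]:
    """Names of fields that cannot be traced back to a region of the source."""
    return [f.name for f in fields if f.source is None]


# Deterministic checks expressed as data: (flag name, predicate that is
# True when the rule is violated). LLM judgments would sit alongside these.
RULES = [
    ("missing_po", lambda v: not v.get("po_number")),
    ("negative_total", lambda v: float(v.get("total", "0")) < 0),
]


def run_rules(fields: list[Field]) -> list[str]:
    """Return the names of all deterministic rules the document violates."""
    values = {f.name: f.value for f in fields}
    return [name for name, violated in RULES if violated(values)]
```

Keeping the rules in a list rather than in nested if statements makes it straightforward to log which rule fired, which is what the audit log and review UI need.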

Concrete Cases

Accounts-payable automation. Vendor invoices in → match against PO and receipt → detect duplicates, price/quantity variance, missing approval → post to ERP.
Bank reconciliation to Xero / QuickBooks. Bank statements + invoices in → match transactions to ledger entries → flag unexplained items → submit reconciliation. (The "replace the bookkeeper" case.)
Expense report auditing. Employee submits receipts → extract amount/category/policy → check against travel policy → approve or push back with reason.
Insurance claims intake. Claim form + supporting docs → extract fields → check policy coverage → propose payout amount → adjudicator confirms.
Customs / trade docs. Commercial invoice + packing list + HS classification → fill customs declaration → flag inconsistencies in weights / origin.
Loan application packaging. Borrower-uploaded bank statements, tax returns, pay stubs → extract income → fill underwriting worksheet.
Clinical lab review. Lab PDF + EHR record → reconcile patient identity → flag critical values → propose chart update.
Procurement contract intake. New contract PDF → extract obligations and renewal dates → create records in CLM and tasks in compliance system.
Property management invoicing. Vendor invoices for each unit → allocate to the right property → post to property accounting system.
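
To illustrate the accounts-payable case: matching an invoice against the PO and the goods receipt is commonly called a three-way match. The sketch below assumes a simplified, hypothetical Line record and a hypothetical 2% price tolerance, and it uses Decimal so that money comparisons are exact. Duplicate detection and approval checks are left out.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Line:
    sku: str
    qty: int
    unit_price: Decimal


PRICE_TOLERANCE = Decimal("0.02")  # hypothetical: allow 2% over the PO price


def three_way_match(invoice: Line, po: Line, received_qty: int) -> list[str]:
    """Compare one invoice line with its PO line and the received quantity."""
    if invoice.sku != po.sku:
        return ["sku_mismatch"]  # nothing else is comparable
    flags = []
    if invoice.qty > received_qty:
        flags.append(f"billed {invoice.qty} but only {received_qty} received")
    allowed = po.unit_price * (1 + PRICE_TOLERANCE)
    if invoice.unit_price > allowed:
        flags.append(
            f"unit price {invoice.unit_price} exceeds PO price {po.unit_price}"
        )
    return flags
```

Each flag is already a sentence a reviewer can check against the source document, which keeps the explanation step cheap for the deterministic cases.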

Similar Scenarios

The same shape with different inputs/outputs:

Document-driven KYC onboarding — ID + utility bill + selfie → fill onboarding record + compliance checks.
Email-to-ticket — inbound vendor email → extract fields → create ticket in the right queue with metadata pre-filled.
Form-to-API migration — paper / PDF forms from external partners → call internal API.
Sensor reading → maintenance record — telemetry export → propose work order.

Pitfalls & Evaluation

Extraction looks right, math is wrong. Always re-compute totals from extracted line items deterministically; never trust the LLM's "total" field.
No grounding = no audit. Every extracted field must point back to a region of the source. Without that, you can't defend an audit, and reviewers can't trust the system.
Silent format drift. A vendor changes its invoice template and accuracy quietly drops. Monitor extraction confidence and per-template metrics.
Idempotency. A retried submission must not double-post. Use a deterministic key derived from the source document.
Stop at the boundary you can defend. It is fine — often correct — to ship "drafts everything, human clicks submit." Don't auto-submit until you have months of measured agreement.
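
Two of the pitfalls above, unverified totals and double-posting, have simple deterministic guards. The following is a minimal sketch with hypothetical function names and a hypothetical one-cent tolerance. It assumes line items arrive as strings or integers (never floats) and that tax, if any, appears as its own line item.

```python
import hashlib
from decimal import Decimal


def recomputed_total(lines: list[dict]) -> Decimal:
    """Sum qty * unit_price over the extracted lines, ignoring any 'total' field."""
    return sum(
        (Decimal(line["qty"]) * Decimal(line["unit_price"]) for line in lines),
        Decimal("0"),
    )


def total_mismatch(
    lines: list[dict], stated_total: str, tolerance: Decimal = Decimal("0.01")
) -> bool:
    """True when the stated total disagrees with the line items."""
    return abs(recomputed_total(lines) - Decimal(stated_total)) > tolerance


def idempotency_key(document_bytes: bytes, action: str) -> str:
    """Same document and same action always give the same key."""
    digest = hashlib.sha256()
    digest.update(action.encode("utf-8"))
    digest.update(b"\x00")  # separator between action and document content
    digest.update(document_bytes)
    return digest.hexdigest()
```

Hashing the raw bytes means a re-scanned copy of the same invoice gets a different key. If that matters, a key built from extracted identifiers (for example vendor plus invoice number) is a possible alternative, at the cost of depending on extraction being correct.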

Useful metrics: field-level accuracy against a labeled set, per-document straight-through-processing rate, reviewer edit rate, time saved per document, false-positive rate of flags (the number reviewers hate most).
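
Most of these metrics reduce to simple ratios over review outcomes. The sketch below assumes a hypothetical per-document outcome record, and it approximates the flag false-positive rate as the share of raised flags that reviewers dismissed. Time saved is omitted because it needs a measured manual baseline.

```python
def field_accuracy(predicted: dict, labeled: dict) -> float:
    """Share of labeled fields whose predicted value matches exactly."""
    if not labeled:
        return 0.0
    hits = sum(1 for key, value in labeled.items() if predicted.get(key) == value)
    return hits / len(labeled)


def review_metrics(outcomes: list[dict]) -> dict:
    """Aggregate per-document review outcomes.

    Each outcome is assumed to look like:
    {"auto": bool, "edited": bool, "flags": int, "flags_dismissed": int}
    """
    n = len(outcomes)
    flags = sum(o["flags"] for o in outcomes)
    dismissed = sum(o["flags_dismissed"] for o in outcomes)
    return {
        # documents that went through without any human touch
        "stp_rate": sum(o["auto"] for o in outcomes) / n if n else 0.0,
        # documents where the reviewer changed at least one field
        "edit_rate": sum(o["edited"] for o in outcomes) / n if n else 0.0,
        # flags the reviewer judged to be noise
        "flag_false_positive_rate": dismissed / flags if flags else 0.0,
    }
```

Computing the same numbers per vendor or per template, rather than only globally, is what makes silent format drift visible.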