Document-Driven Action
Reading documents, applying business rules, and performing the resulting transaction in a target system
Scenario Abstraction
A document or batch of documents arrives — invoices, bank statements, contracts, lab reports, customs forms. A human currently:
- Reads the documents,
- Cross-checks them against other records (ledger, ERP, prior contracts),
- Flags anything anomalous,
- Submits the resulting transaction in a target system (Xero, SAP, an EHR, a customs portal).
The work is high-volume, rule-bound, and rarely creative, but it requires reading messy real-world documents and reasoning about them with domain context. This is the gap where document-only OCR/RPA solutions broke down before LLMs.
The pattern is document → understanding → reconciliation → drafted transaction → human sign-off → submission to the target system, with the LLM handling the understanding and reconciliation layers.
Solution Shape
- Ingest — receive the document (email, upload, portal scrape, EDI feed).
- Parse layout — turn the PDF/image into structured text with positions (tables stay tables).
- Extract — LLM produces typed fields and line items, with bounding-box citations.
- Match & reconcile — for each line, find the matching record in the target system (transaction, PO, patient, contract).
- Apply business rules — detect mismatches: amount differs, tax code wrong, duplicate, out-of-policy, missing approval.
- Explain — for each flag, the LLM writes a short reason a human can verify in seconds.
- Draft the action — pre-fill the form / API call to the target system; do not submit yet.
- Human review (where needed) — a reviewer accepts / edits / rejects each item; their corrections feed back into evaluation.
- Submit — call the target system's API; capture the receipt; close the loop.
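As a rough illustration of the extract and match steps, the sketch below models an extracted line item that carries a bounding-box citation and reconciles it against reference records. All names here (`BBox`, `ExtractedLine`, `LedgerRecord`, `reconcile`, the `po_line_id` join key, the flag strings) are hypothetical, and matching on a single exact key with an amount tolerance is a simplifying assumption; real matching is usually fuzzier.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class BBox:
    """Region of the source document a value was read from (the citation)."""
    page: int
    x0: float
    y0: float
    x1: float
    y1: float


@dataclass(frozen=True)
class ExtractedLine:
    description: str
    po_line_id: str      # hypothetical join key into the target system
    amount: Decimal
    source: BBox         # where on the page this line was found


@dataclass(frozen=True)
class LedgerRecord:
    po_line_id: str
    amount: Decimal


@dataclass(frozen=True)
class MatchResult:
    line: ExtractedLine
    record: Optional[LedgerRecord]
    flags: Tuple[str, ...]


def reconcile(
    lines: Sequence[ExtractedLine],
    records: Sequence[LedgerRecord],
    tolerance: Decimal = Decimal("0.01"),
) -> List[MatchResult]:
    """Pair each extracted line with a reference record and flag mismatches."""
    by_id = {record.po_line_id: record for record in records}
    results = []
    for line in lines:
        record = by_id.get(line.po_line_id)
        if record is None:
            flags: Tuple[str, ...] = ("no_matching_record",)
        elif abs(record.amount - line.amount) > tolerance:
            flags = ("amount_differs",)
        else:
            flags = ()
        results.append(MatchResult(line, record, flags))
    return results
```

Keeping the `BBox` on every line means a flag such as `amount_differs` can be shown next to the exact region of the document it came from.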
The autonomy dial is decided by error cost. Many teams ship "draft + 1-click confirm" before they consider full auto-submit; some classes of items (low value, high confidence, repeat pattern) graduate to auto-submit over time.
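One way to make the autonomy dial explicit is a small routing function. The sketch below is only an assumption about how such a policy could look: the names (`DraftItem`, `AutonomyPolicy`, `route`) and every threshold value are hypothetical placeholders that a team would set from its own measured error costs.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class DraftItem:
    amount: Decimal
    confidence: float            # extraction/match confidence in [0, 1]
    prior_clean_approvals: int   # times this pattern was approved unedited
    has_flags: bool


@dataclass(frozen=True)
class AutonomyPolicy:
    # Illustrative thresholds only; not recommendations.
    max_auto_amount: Decimal = Decimal("500")
    min_confidence: float = 0.98
    min_prior_approvals: int = 20


def route(item: DraftItem, policy: AutonomyPolicy = AutonomyPolicy()) -> str:
    """Decide how much human involvement a drafted item needs."""
    if item.has_flags:
        return "human_review"
    low_value = item.amount <= policy.max_auto_amount
    high_confidence = item.confidence >= policy.min_confidence
    repeat_pattern = item.prior_clean_approvals >= policy.min_prior_approvals
    if low_value and high_confidence and repeat_pattern:
        return "auto_submit"
    return "one_click_confirm"
```

Starting with `max_auto_amount` at zero makes every unflagged item "draft + 1-click confirm"; raising it later is how a class of items graduates to auto-submit.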
Key Building Blocks
- Document parsing layer — Textract / Azure DI / Unstructured / Llama Parse, or a vision-LLM directly.
- Structured extraction with citations — typed JSON + bounding boxes.
- Reference data access — connectors / read APIs for the target system.
- Rule engine — declarative checks (deterministic) alongside LLM judgments (probabilistic).
- Target-system writer — the actual API call; idempotency keys; rollback for partial failures.
- Review UI — a single screen that shows source doc, extracted fields, matched record, flags, and a confirm button.
- Audit log — what the model saw, what it proposed, who approved, what was submitted.
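The deterministic half of the rule engine can be sketched as a list of declarative checks evaluated against each document. Everything named below (`Invoice`, `Context`, `Rule`, `RULES`, `evaluate`, the rule codes) is hypothetical and chosen for illustration; LLM judgments would sit alongside these checks as a separate, probabilistic signal rather than inside them.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Callable, FrozenSet, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Invoice:
    invoice_number: str
    vendor_id: str
    total: Decimal
    po_total: Decimal
    approved_by: Optional[str]


@dataclass(frozen=True)
class Context:
    """Reference data the rules need, read from the target system."""
    posted_invoice_keys: FrozenSet[Tuple[str, str]]  # (vendor_id, invoice_number)
    approval_threshold: Decimal


@dataclass(frozen=True)
class Rule:
    code: str
    description: str
    violated: Callable[[Invoice, Context], bool]


RULES: List[Rule] = [
    Rule(
        "duplicate",
        "Invoice number already posted for this vendor",
        lambda inv, ctx: (inv.vendor_id, inv.invoice_number) in ctx.posted_invoice_keys,
    ),
    Rule(
        "amount_differs",
        "Invoice total does not equal the PO total",
        lambda inv, ctx: inv.total != inv.po_total,
    ),
    Rule(
        "missing_approval",
        "Total exceeds the approval threshold and no approver is recorded",
        lambda inv, ctx: inv.total > ctx.approval_threshold and inv.approved_by is None,
    ),
]


def evaluate(inv: Invoice, ctx: Context, rules: Sequence[Rule] = RULES) -> List[str]:
    """Return the codes of all violated rules, in rule order."""
    return [rule.code for rule in rules if rule.violated(inv, ctx)]
```

Because each rule is data (a code, a description, a predicate), the same list can drive the flags shown in the review UI and the entries written to the audit log.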
Concrete Cases
- Accounts-payable automation. Vendor invoices in → match against PO and receipt → detect duplicates, price/quantity variance, missing approval → post to ERP.
- Bank reconciliation to Xero / QuickBooks. Bank statements + invoices in → match transactions to ledger entries → flag unexplained items → submit reconciliation. (The "replace the bookkeeper" case.)
- Expense report auditing. Employee submits receipts → extract amount/category/policy → check against travel policy → approve or push back with reason.
- Insurance claims intake. Claim form + supporting docs → extract fields → check policy coverage → propose payout amount → adjudicator confirms.
- Customs / trade docs. Commercial invoice + packing list + HS classification → fill customs declaration → flag inconsistencies in weights / origin.
- Loan application packaging. Borrower-uploaded bank statements, tax returns, pay stubs → extract income → fill underwriting worksheet.
- Clinical lab review. Lab PDF + EHR record → reconcile patient identity → flag critical values → propose chart update.
- Procurement contract intake. New contract PDF → extract obligations and renewal dates → create records in CLM and tasks in compliance system.
- Property management invoicing. Vendor invoices for each unit → allocate to the right property → post to property accounting system.
Similar Scenarios
The same shape with different inputs/outputs:
- Document-driven KYC onboarding — ID + utility bill + selfie → fill onboarding record + compliance checks.
- Email-to-ticket — inbound vendor email → extract fields → create ticket in the right queue with metadata pre-filled.
- Form-to-API migration — paper / PDF forms from external partners → call internal API.
- Sensor reading → maintenance record — telemetry export → propose work order.
Pitfalls & Evaluation
- Extraction looks right, math is wrong. Always re-compute totals from extracted line items deterministically; never trust the LLM's "total" field.
- No grounding = no audit. Every extracted field must point back to a region of the source. Without that, you can't defend an audit, and reviewers can't trust the system.
- Silent format drift. A vendor changes its invoice template and accuracy quietly drops. Monitor extraction confidence and per-template metrics.
- Idempotency. A retried submission must not double-post. Use a deterministic key derived from the source document.
- Stop at the boundary you can defend. It is fine, and often correct, to ship "drafts everything, human clicks submit." Don't auto-submit until you have months of measured agreement between the system's drafts and what reviewers actually approve.
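Two of the pitfalls above have simple deterministic guards, sketched here under stated assumptions. The function names are hypothetical; rounding to two decimal places assumes a currency with cents; and deriving the key from the document bytes plus the target system and action name is one possible reading of "a deterministic key derived from the source document".

```python
import hashlib
from decimal import Decimal
from typing import Sequence, Tuple

CENT = Decimal("0.01")


def recompute_total(
    lines: Sequence[Tuple[Decimal, Decimal]],  # (quantity, unit_price) pairs
    tax: Decimal = Decimal("0"),
) -> Decimal:
    """Re-derive the document total from its extracted line items."""
    subtotal = sum((qty * price for qty, price in lines), Decimal("0"))
    return (subtotal + tax).quantize(CENT)


def total_matches(
    extracted_total: Decimal,
    lines: Sequence[Tuple[Decimal, Decimal]],
    tax: Decimal = Decimal("0"),
) -> bool:
    """True only if the LLM-extracted total agrees with the recomputed one."""
    return extracted_total.quantize(CENT) == recompute_total(lines, tax)


def idempotency_key(document_bytes: bytes, target_system: str, action: str) -> str:
    """Same document + same target + same action always yields the same key."""
    digest = hashlib.sha256()
    for part in (target_system.encode("utf-8"), action.encode("utf-8"), document_bytes):
        # Length-prefix each part so different splits cannot collide.
        digest.update(len(part).to_bytes(8, "big"))
        digest.update(part)
    return digest.hexdigest()
```

A retried submission would send the same key, so a writer (or a target API that supports idempotency keys) can recognise the repeat and skip the second post.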
Useful metrics: field-level accuracy against a labeled set, per-document straight-through-processing rate, reviewer edit rate, time saved per document, and the false-positive rate of flags (the number reviewers hate most).
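Several of these metrics can be computed from a labeled set and the review log. The sketch below uses hypothetical names (`field_accuracy`, `ReviewOutcome`, `review_metrics`) and makes its own assumptions about definitions: straight-through means no human touched the document, edit rate is measured over reviewed documents only, and a flag counts as a false positive when the reviewer dismissed it.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


def field_accuracy(
    predicted: Sequence[Dict[str, str]],
    labeled: Sequence[Dict[str, str]],
) -> float:
    """Share of labeled fields whose extracted value matches exactly."""
    correct = 0
    total = 0
    for pred, gold in zip(predicted, labeled):
        for name, value in gold.items():
            total += 1
            if pred.get(name) == value:
                correct += 1
    return correct / total if total else 0.0


@dataclass(frozen=True)
class ReviewOutcome:
    touched_by_human: bool
    edited: bool
    flags_raised: int
    flags_dismissed: int


def review_metrics(outcomes: List[ReviewOutcome]) -> Dict[str, float]:
    """Aggregate per-document review outcomes into headline rates."""
    docs = len(outcomes)
    reviewed = [o for o in outcomes if o.touched_by_human]
    raised = sum(o.flags_raised for o in outcomes)
    dismissed = sum(o.flags_dismissed for o in outcomes)
    return {
        "stp_rate": (docs - len(reviewed)) / docs if docs else 0.0,
        "edit_rate": sum(o.edited for o in reviewed) / len(reviewed) if reviewed else 0.0,
        "flag_false_positive_rate": dismissed / raised if raised else 0.0,
    }
```

Computing the same numbers per vendor or per template, rather than only globally, is what makes silent format drift visible.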