Steven's Knowledge

Document-Driven Action

Reading documents, applying business rules, and performing the resulting transaction in a target system

Scenario Abstraction

A document or batch of documents arrives — invoices, bank statements, contracts, lab reports, customs forms. A human currently:

  1. Reads the documents,
  2. Cross-checks them against other records (ledger, ERP, prior contracts),
  3. Flags anything that looks wrong,
  4. Submits the resulting transaction in a target system (Xero, SAP, an EHR, a customs portal).

The work is high-volume, rule-bound, and rarely creative, but it requires reading messy real-world documents and reasoning about them with domain context. This is exactly where pre-LLM OCR/RPA solutions broke down: they could capture the text on the page, but not reason about it.

The pattern is document → understanding → reconciliation → transaction in target system → human sign-off — with the LLM doing the understanding and reconciliation layers.

Solution Shape

  1. Ingest — receive the document (email, upload, portal scrape, EDI feed).
  2. Parse layout — turn the PDF/image into structured text with positions (tables stay tables).
  3. Extract — LLM produces typed fields and line items, with bounding-box citations.
  4. Match & reconcile — for each line, find the matching record in the target system (transaction, PO, patient, contract).
  5. Apply business rules — detect mismatches: amount differs, tax code wrong, duplicate, out-of-policy, missing approval.
  6. Explain — for each flag, the LLM writes a short reason a human can verify in seconds.
  7. Draft the action — pre-fill the form / API call to the target system; do not submit yet.
  8. Human review (where needed) — a reviewer accepts / edits / rejects each item; their corrections feed back into evaluation.
  9. Submit — call the target system's API; capture the receipt; close the loop.
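
As a rough illustration of steps 3 and 4, the sketch below shows one way an extracted line item might carry a bounding-box citation and be matched against a target-system record. All type and field names (BoundingBox, LineItem, PurchaseOrderLine, match_line) are hypothetical, and the exact PO-number lookup is an assumption chosen for brevity; real matching usually needs fuzzier keys such as dates, amounts, or vendor names.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class BoundingBox:
    """Region of the source page a field was read from (hypothetical schema)."""
    page: int
    x0: float
    y0: float
    x1: float
    y1: float


@dataclass(frozen=True)
class LineItem:
    """One extracted line, as the LLM extraction step might return it."""
    description: str
    po_number: str
    amount: Decimal
    source: BoundingBox  # citation back to the document region


@dataclass(frozen=True)
class PurchaseOrderLine:
    """A record read from the target system (illustrative fields only)."""
    po_number: str
    amount: Decimal


def match_line(
    item: LineItem, po_lines: list[PurchaseOrderLine]
) -> Optional[PurchaseOrderLine]:
    """Return the first record with the same PO number, or None if unmatched."""
    for po in po_lines:
        if po.po_number == item.po_number:
            return po
    return None
```

An unmatched line (a None result) is itself a useful signal: it can be passed to the rules step as a flag rather than silently dropped.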

The autonomy dial (how much the system does without a human in the loop) is set by error cost. Many teams ship "draft + 1-click confirm" before they consider full auto-submit; some classes of items (low value, high confidence, repeat pattern) graduate to auto-submit over time.
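
One way to make that graduation explicit is a small routing function. This is a minimal sketch: the DraftItem fields, the Route names, and all three thresholds are illustrative assumptions, not recommended values.

```python
from dataclasses import dataclass
from decimal import Decimal
from enum import Enum


class Route(Enum):
    AUTO_SUBMIT = "auto_submit"
    ONE_CLICK_CONFIRM = "one_click_confirm"
    FULL_REVIEW = "full_review"


@dataclass(frozen=True)
class DraftItem:
    amount: Decimal
    confidence: float          # extraction confidence in [0, 1]
    prior_clean_matches: int   # times this pattern was accepted unedited
    flags: tuple[str, ...] = ()


# Illustrative thresholds only; tune them against measured error cost.
MAX_AUTO_AMOUNT = Decimal("200.00")
MIN_AUTO_CONFIDENCE = 0.98
MIN_REPEAT_COUNT = 20


def route(item: DraftItem) -> Route:
    """Pick the least autonomous route the item's risk profile requires."""
    if item.flags:
        # Any rule or LLM flag forces a full human review.
        return Route.FULL_REVIEW
    low_value = item.amount <= MAX_AUTO_AMOUNT
    confident = item.confidence >= MIN_AUTO_CONFIDENCE
    repeat = item.prior_clean_matches >= MIN_REPEAT_COUNT
    if low_value and confident and repeat:
        return Route.AUTO_SUBMIT
    return Route.ONE_CLICK_CONFIRM
```

Starting with thresholds so strict that nothing auto-submits gives the "draft + 1-click confirm" behaviour by default; loosening them later is a configuration change rather than a redesign.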

Key Building Blocks

  • Document parsing layer — Textract / Azure DI / Unstructured / Llama Parse, or a vision-LLM directly.
  • Structured extraction with citations — typed JSON + bounding boxes.
  • Reference data access — connectors / read APIs for the target system.
  • Rule engine — declarative checks (deterministic) alongside LLM judgments (probabilistic).
  • Target-system writer — the actual API call; idempotency keys; rollback for partial failures.
  • Review UI — a single screen that shows source doc, extracted fields, matched record, flags, and a confirm button.
  • Audit log — what the model saw, what it proposed, who approved, what was submitted.
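
The deterministic half of the rule engine can be as simple as a list of named predicates. The sketch below assumes a hypothetical Candidate record and Context object, and covers three of the mismatch types mentioned earlier (duplicate, amount variance, missing approval); LLM judgments would sit alongside these checks, not replace them.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Callable, Optional


@dataclass(frozen=True)
class Candidate:
    """An extracted invoice paired with its matched PO (hypothetical fields)."""
    invoice_number: str
    invoice_amount: Decimal
    po_amount: Decimal
    approved_by: Optional[str]


@dataclass(frozen=True)
class Context:
    """Reference data the rules need (illustrative)."""
    seen_invoice_numbers: frozenset[str]
    amount_tolerance: Decimal


# Each rule is (name, predicate); the predicate returns True when violated.
Rule = tuple[str, Callable[[Candidate, Context], bool]]

RULES: list[Rule] = [
    ("duplicate_invoice",
     lambda c, ctx: c.invoice_number in ctx.seen_invoice_numbers),
    ("amount_variance",
     lambda c, ctx: abs(c.invoice_amount - c.po_amount) > ctx.amount_tolerance),
    ("missing_approval",
     lambda c, ctx: not c.approved_by),
]


def evaluate(candidate: Candidate, ctx: Context) -> list[str]:
    """Return the names of all violated rules, in declaration order."""
    return [name for name, violated in RULES if violated(candidate, ctx)]
```

Because each flag has a stable name, the explanation step and the audit log can refer to exactly which check fired.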

Concrete Cases

  • Accounts-payable automation. Vendor invoices in → match against PO and receipt → detect duplicates, price/quantity variance, missing approval → post to ERP.
  • Bank reconciliation to Xero / QuickBooks. Bank statements + invoices in → match transactions to ledger entries → flag unexplained items → submit reconciliation. (The "replace the bookkeeper" case.)
  • Expense report auditing. Employee submits receipts → extract amount/category/policy → check against travel policy → approve or push back with reason.
  • Insurance claims intake. Claim form + supporting docs → extract fields → check policy coverage → propose payout amount → adjudicator confirms.
  • Customs / trade docs. Commercial invoice + packing list + HS classification → fill customs declaration → flag inconsistencies in weights / origin.
  • Loan application packaging. Borrower-uploaded bank statements, tax returns, pay stubs → extract income → fill underwriting worksheet.
  • Clinical lab review. Lab PDF + EHR record → reconcile patient identity → flag critical values → propose chart update.
  • Procurement contract intake. New contract PDF → extract obligations and renewal dates → create records in CLM and tasks in compliance system.
  • Property management invoicing. Vendor invoices for each unit → allocate to the right property → post to property accounting system.

Similar Scenarios

The same shape with different inputs/outputs:

  • Document-driven KYC onboarding — ID + utility bill + selfie → fill onboarding record + compliance checks.
  • Email-to-ticket — inbound vendor email → extract fields → create ticket in the right queue with metadata pre-filled.
  • Form-to-API migration — paper / PDF forms from external partners → call internal API.
  • Sensor reading → maintenance record — telemetry export → propose work order.

Pitfalls & Evaluation

  • Extraction looks right, math is wrong. Always re-compute totals from extracted line items deterministically; never trust the LLM's "total" field.
  • No grounding = no audit. Every extracted field must point back to a region of the source. Without that, you can't defend an audit, and reviewers can't trust the system.
  • Silent format drift. A vendor changes its invoice template and accuracy quietly drops. Monitor extraction confidence and per-template metrics.
  • Idempotency. A retried submission must not double-post. Use a deterministic key derived from the source document.
  • Stop at the boundary you can defend. It is fine, and often correct, to ship "drafts everything, human clicks submit." Don't auto-submit until you have months of measured agreement between what the system drafted and what reviewers approved.
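
Two of the pitfalls above, the untrusted total and the retried submission, lend themselves to small deterministic helpers. This is a sketch under stated assumptions: the function names are hypothetical, amounts are taken to be Decimal values already parsed from the extraction, and the key is derived from the raw document bytes, which means a re-scanned copy of the same invoice would get a different key.

```python
import hashlib
from decimal import Decimal


def recomputed_total(line_amounts: list[Decimal], tax: Decimal) -> Decimal:
    """Sum the extracted line items and tax deterministically."""
    return sum(line_amounts, Decimal("0")) + tax


def total_matches(
    extracted_total: Decimal,
    line_amounts: list[Decimal],
    tax: Decimal,
    tolerance: Decimal = Decimal("0.01"),
) -> bool:
    """Check the LLM-reported total against the recomputed one."""
    difference = abs(recomputed_total(line_amounts, tax) - extracted_total)
    return difference <= tolerance


def idempotency_key(document_bytes: bytes, target_system: str) -> str:
    """Derive a stable key from the source document and its destination."""
    digest = hashlib.sha256()
    digest.update(target_system.encode("utf-8"))
    digest.update(b"\x00")  # separator between the two inputs
    digest.update(document_bytes)
    return digest.hexdigest()
```

Using Decimal rather than float avoids rounding noise in currency sums. If duplicates can arrive as different files, a key built from normalized business fields (for example vendor, invoice number, and total) may be a better fit than a byte hash.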

Useful metrics: field-level accuracy against a labeled set, per-document straight-through-processing rate, reviewer edit rate, time saved per document, false-positive rate of flags (the number reviewers hate most).
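
Most of these metrics reduce to simple ratios over review records. The sketch below assumes a hypothetical ReviewedDocument shape and makes two definitional choices that may differ from yours: a document counts as straight-through only if it had no edits and no flags, and the flag false-positive rate is the share of raised flags that reviewers dismissed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewedDocument:
    """Outcome of one document after human review (hypothetical record)."""
    extracted: dict[str, str]   # field values proposed by the system
    final: dict[str, str]       # field values after the reviewer's edits
    flags_raised: int
    flags_dismissed: int        # flags the reviewer judged spurious


def field_accuracy(predictions: list[dict], labels: list[dict]) -> float:
    """Share of labeled fields whose predicted value matches exactly."""
    total = sum(len(label) for label in labels)
    correct = sum(
        pred.get(name) == value
        for pred, label in zip(predictions, labels)
        for name, value in label.items()
    )
    return correct / total if total else 0.0


def reviewer_edit_rate(docs: list[ReviewedDocument]) -> float:
    """Share of final fields that differ from what the system proposed."""
    total = sum(len(d.final) for d in docs)
    edited = sum(
        d.extracted.get(name) != value
        for d in docs
        for name, value in d.final.items()
    )
    return edited / total if total else 0.0


def straight_through_rate(docs: list[ReviewedDocument]) -> float:
    """Share of documents that needed no edits and raised no flags."""
    clean = sum(d.extracted == d.final and d.flags_raised == 0 for d in docs)
    return clean / len(docs) if docs else 0.0


def flag_false_positive_rate(docs: list[ReviewedDocument]) -> float:
    """Share of raised flags that reviewers dismissed as spurious."""
    raised = sum(d.flags_raised for d in docs)
    dismissed = sum(d.flags_dismissed for d in docs)
    return dismissed / raised if raised else 0.0
```

Tracking these per vendor or per template, rather than only in aggregate, is what makes silent format drift visible.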
