Steven's Knowledge

Knowledge Assistant

Natural-language access to a defined corpus of internal or product knowledge

Scenario Abstraction

There is a corpus — internal wiki, product docs, policy library, codebase, support history — and there is a population of users who today should be reading it but in practice ask a human (HR, IT, support, a senior engineer) instead. The work being replaced is "find the answer in our stuff and explain it in context."

The job of the system is to answer questions grounded in the corpus, cite its sources, say so when the corpus doesn't cover the question, and respect the asker's permissions.

This is the most well-trodden LLM scenario; the failure modes are also well-trodden.

Solution Shape

A RAG-shaped pipeline tuned for Q&A:

  1. Corpus ingestion — connectors to source systems (Confluence, Notion, Slack, Drive, Git, Zendesk), incremental sync, ACL preservation.
  2. Pre-processing — strip boilerplate, expand tables/images, split on document structure (not character count).
  3. Indexing — embeddings + lexical (BM25), per-tenant or per-permission partitioning.
  4. Query understanding — rewrite vague questions, expand acronyms, sometimes decompose multi-part questions.
  5. Retrieval + rerank — hybrid retrieval, rerank top-N with a cross-encoder.
  6. Answer generation — model answers using only retrieved chunks, attaches inline citations, abstains on insufficient context.
  7. Feedback loop — thumbs up/down, "this answer was wrong" form, periodic eval set refresh.
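
Step 2's "split on document structure" can be illustrated with a minimal sketch. It assumes the documents have already been converted to Markdown-style text with `#` headings; that format, the function name `split_on_headings`, and the sample runbook are hypothetical and only meant to show the idea of one chunk per section rather than one chunk per N characters.

```python
import re

# Matches Markdown-style headings such as "# Title" or "## Subsection".
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def split_on_headings(doc: str) -> list[dict[str, str]]:
    """Split a Markdown-style document into one chunk per heading section."""
    chunks: list[dict[str, str]] = []
    title, lines = "(preamble)", []
    for line in doc.splitlines():
        match = HEADING.match(line)
        if match:
            # A new heading closes the previous section (if it had any text).
            if any(l.strip() for l in lines):
                chunks.append({"title": title, "text": "\n".join(lines).strip()})
            title, lines = match.group(2).strip(), []
        else:
            lines.append(line)
    # Flush the final section.
    if any(l.strip() for l in lines):
        chunks.append({"title": title, "text": "\n".join(lines).strip()})
    return chunks


# Hypothetical runbook used only for illustration.
runbook = """# VPN access
Request access through the IT portal.

## Troubleshooting
Restart the client, then re-authenticate.
"""

chunks = split_on_headings(runbook)
for chunk in chunks:
    print(chunk["title"], "->", chunk["text"])
```

Keeping the section title with each chunk gives the retriever and the citation list something readable to show. Code, contracts, and other document types would likely need their own splitters.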

The hard parts are rarely the LLM call itself — they are corpus hygiene, permissions, and freshness.
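
For step 5, the page does not say how the lexical and embedding result lists are merged. One common option is reciprocal rank fusion (RRF), sketched below under that assumption. The function name `rrf_fuse`, the constant `k = 60`, and the chunk IDs are illustrative, not taken from the source.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60, top_n: int = 5) -> list[str]:
    """Merge ranked lists of chunk IDs with reciprocal rank fusion.

    Each list contributes 1 / (k + rank) per chunk (rank starts at 1), so a
    chunk ranked well by both retrievers beats one ranked well by only one.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Sort by fused score; break ties by ID so the output is deterministic.
    ordered = sorted(scores, key=lambda cid: (-scores[cid], cid))
    return ordered[:top_n]


# Hypothetical outputs of the two retrievers for one query.
bm25_hits = ["vpn-runbook#2", "vpn-policy#1", "sso-faq#4"]
vector_hits = ["vpn-policy#1", "remote-access#3", "vpn-runbook#2"]

candidates = rrf_fuse([bm25_hits, vector_hits], top_n=3)
print(candidates)
```

The fused candidates would then be passed to the cross-encoder reranker, which is not shown here.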

Key Building Blocks

  • Connectors with permission propagation (so the assistant never reveals what the asker can't read).
  • Chunking strategy appropriate to the document type (code differs from contracts differs from runbooks).
  • Hybrid retrieval + reranker.
  • Citation-enforced prompt — answers without supported sources are rejected and retried.
  • Eval set — a few hundred labeled question/answer pairs from real history.
  • Conversation memory — short window of prior turns, plus an explicit "based on what was just said" anchor.
  • UI — answer + collapsible source list + "report wrong" + suggested follow-ups.
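
The citation-enforced block above could look roughly like the following sketch. It assumes the model marks citations as `[1]`, `[2]`, ... referring to the retrieved chunks in order, and that `generate` is a stand-in for the actual model call; both are assumptions for illustration. The check only verifies that every sentence carries a valid citation marker. Whether the cited chunk really supports the claim is a separate, harder check that is not shown.

```python
import re
from typing import Callable

CITATION = re.compile(r"\[(\d+)\]")
ABSTAIN = "I don't see this covered in the available sources."


def unsupported_sentences(answer: str, num_sources: int) -> list[str]:
    """Return sentences that cite nothing or cite a source that was not retrieved."""
    bad = []
    # Naive sentence split on terminal punctuation; abbreviations will confuse it.
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        if not sentence:
            continue
        cited = [int(n) for n in CITATION.findall(sentence)]
        if not cited or any(n < 1 or n > num_sources for n in cited):
            bad.append(sentence)
    return bad


def answer_with_citations(
    generate: Callable[[str], str],  # hypothetical wrapper around the model call
    question: str,
    sources: list[str],
    max_attempts: int = 2,
) -> str:
    """Retry generation until every sentence is cited, else abstain."""
    for _ in range(max_attempts):
        draft = generate(question)
        if draft == ABSTAIN:
            return draft  # abstaining is always an acceptable answer
        if draft.strip() and not unsupported_sentences(draft, len(sources)):
            return draft
    return ABSTAIN


# Fake generator: the first draft has no citations, the second one does.
drafts = iter([
    "Go to Settings and click Export.",
    "Go to Settings and click Export [1]. Choose CSV as the format [2].",
])
result = answer_with_citations(
    lambda q: next(drafts), "How do I export to CSV?", ["chunk A", "chunk B"]
)
print(result)
```

Falling back to the abstain message after the retry budget is spent keeps an uncited draft from ever reaching the user.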

Concrete Cases

  • Employee HR / IT helpdesk bot. "How many sick days do I have left in this country?" "How do I request VPN access?" Grounded in HR policy + IT runbooks, scoped by employee region.
  • Customer-facing product help. A chat that answers "how do I export to CSV" with a snippet from the docs and a link. Reduces ticket volume on already-documented questions.
  • Developer assistant on internal codebase. "Where is the auth middleware defined?" "What's the convention for adding a feature flag?" Grounded in code + ADRs + runbooks.
  • Sales rep enablement. "What's our differentiation vs Competitor X?" "Has anyone closed a deal in healthcare like this?" Grounded in battlecards + past CRM notes.
  • Field-technician copilot. Mobile interface for service techs to ask repair questions; corpus is product manuals + past tickets.
  • Compliance / legal Q&A. "Can we send marketing to this jurisdiction?" Grounded in internal policy + retained legal opinions; abstains on novel asks.
  • Onboarding companion. First-30-days assistant pre-loaded with team docs, glossary, "who owns what" map.
  • Patient education portal. Answers questions grounded in trusted medical content + the patient's care plan.

Similar Scenarios

These are knowledge assistants that look different on the surface but share the pipeline:

  • Search replacement — the same retrieval stack, with a result list instead of a generative answer.
  • Document-grounded chat over a single uploaded doc — same shape, corpus of size one.
  • Compliance lookup tool for auditors — same shape, the "answer" is the located clause itself.
  • Knowledge base auto-curation — turn the assistant inward: it flags stale or contradictory articles.

Pitfalls & Evaluation

  • The corpus is the product. A clean, current, well-structured corpus + a mediocre RAG implementation beats a brilliant RAG implementation on a messy wiki. Most time goes here.
  • Permissions leak. Test that the system never returns content the asker could not open themselves in the source system. Enforce ACLs in the index and at retrieval time; never filter only in the prompt.
  • Confident hallucinations on out-of-corpus questions. The model must say "I don't know" or "I don't see this covered." Train this with refusal examples and reject unsupported claims.
  • Stale answers. The wiki said one thing in 2022; the new policy lives in a different doc. Add freshness signals to ranking and surface "last updated" to the user.
  • Conversational drift. Multi-turn chat slowly loses the original question. Re-anchor to the first user message every few turns.
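
The permissions pitfall lends itself to an automated leak test. The sketch below assumes each chunk carries the set of groups allowed to read it, copied from the source system at ingestion. The `Chunk` type, group names, and sample texts are hypothetical. The key point is that restricted chunks are dropped before the prompt is built, so the model never sees them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    text: str
    allowed_groups: frozenset[str]  # ACL propagated from the source system


def visible_chunks(chunks: list[Chunk], user_groups: set[str]) -> list[Chunk]:
    """Keep only chunks whose ACL shares at least one group with the user."""
    return [c for c in chunks if c.allowed_groups & user_groups]


def build_context(chunks: list[Chunk], user_groups: set[str]) -> str:
    """Build the prompt context from permitted chunks only."""
    permitted = visible_chunks(chunks, user_groups)
    return "\n\n".join(f"[{i}] {c.text}" for i, c in enumerate(permitted, start=1))


# Hypothetical index contents.
index = [
    Chunk("hr-1", "Sick leave is requested through the HR portal.",
          frozenset({"all-employees"})),
    Chunk("hr-9", "Executive compensation bands for the current year.",
          frozenset({"hr-admins"})),
]

employee_context = build_context(index, {"all-employees"})
admin_context = build_context(index, {"all-employees", "hr-admins"})

# Leak test: a regular employee's context must not contain restricted text.
assert "Executive compensation" not in employee_context
print(employee_context)
```

In a real deployment the same filter would usually be pushed down into the search index query, so restricted chunks are never retrieved at all.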

Useful metrics: answer correctness on a labeled eval set, citation precision (do citations actually support the claim?), refusal rate on out-of-corpus questions, deflection rate (tickets that would otherwise have been opened), and thumbs-up rate tracked as a weekly trend.
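
Three of these metrics can be computed directly from a graded eval run. The sketch below assumes one record per eval question with human or automated grading already applied. The `EvalResult` fields and the exact metric definitions (for example, measuring correctness only over in-corpus questions) are assumptions, and the sample records are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    in_corpus: bool            # does the corpus cover this question?
    refused: bool              # did the assistant abstain?
    correct: bool              # graded against the labeled answer
    citations_total: int       # citations emitted in the answer
    citations_supporting: int  # citations a grader judged to support the claim


def ratio(numerator: int, denominator: int) -> float:
    """Safe division; returns 0.0 when there is nothing to measure."""
    return numerator / denominator if denominator else 0.0


def summarize(results: list[EvalResult]) -> dict[str, float]:
    in_corpus = [r for r in results if r.in_corpus]
    out_of_corpus = [r for r in results if not r.in_corpus]
    return {
        # Share of answerable questions that were answered correctly.
        "answer_correctness": ratio(sum(r.correct for r in in_corpus), len(in_corpus)),
        # Share of emitted citations that actually support the claim.
        "citation_precision": ratio(
            sum(r.citations_supporting for r in results),
            sum(r.citations_total for r in results),
        ),
        # Share of unanswerable questions where the assistant abstained.
        "refusal_rate_out_of_corpus": ratio(
            sum(r.refused for r in out_of_corpus), len(out_of_corpus)
        ),
    }


# Made-up graded records, for illustration only.
results = [
    EvalResult(in_corpus=True, refused=False, correct=True,
               citations_total=2, citations_supporting=2),
    EvalResult(in_corpus=True, refused=False, correct=False,
               citations_total=2, citations_supporting=1),
    EvalResult(in_corpus=False, refused=True, correct=False,
               citations_total=0, citations_supporting=0),
    EvalResult(in_corpus=False, refused=False, correct=False,
               citations_total=1, citations_supporting=0),
]

summary = summarize(results)
print(summary)
```

Deflection rate and thumbs-up rate come from production telemetry rather than the eval set, so they are left out of this sketch.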
