Steven's Knowledge

Citations & Source Attribution

Show your work — citations are where AI features earn user trust

Citations turn a confident-sounding paragraph into a verifiable claim. For any AI feature that summarizes, retrieves, or answers questions over real data, surfacing sources isn't a nice-to-have — it's the difference between a tool people trust and one they second-guess.

Why Citations Matter

  • Trust. Users who can verify an answer can rely on it.
  • Debugging. When the model gets something wrong, the citation tells you whether the retrieval was bad or the generation was.
  • Compliance. Many regulated domains require traceable answers.
  • Accountability. Citations shift "the AI said it" to "this specific document said it" — clearer for the user, clearer for the team.
  • Attribution. When you're using third-party content, crediting the source is the bare minimum you owe its authors.

A RAG system without visible sources is doing half its job.

What to Cite

The right granularity depends on the data:

  • Documents. "Source: Q3 financial report" — coarse but useful.
  • Sections. "Source: Q3 financial report, page 14, 'Cash flow.'" — better.
  • Sentences or spans. "Source: this exact passage." — best, but harder to generate reliably.

Sentence-level attribution is the gold standard. It's also where models hallucinate citations most often, so verification matters.
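
The three granularities can be represented with one record type. The sketch below is only an illustration: the `Citation` class, its field names, and the use of character offsets for spans are all assumptions, not something the text above prescribes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Citation:
    """Hypothetical citation record; optional fields add precision."""
    doc_id: str                              # always required: which document
    section: Optional[str] = None            # e.g. "page 14, 'Cash flow'"
    span: Optional[tuple[int, int]] = None   # assumed (start, end) character offsets

    @property
    def granularity(self) -> str:
        # Report the finest level of detail this citation carries.
        if self.span is not None:
            return "span"
        if self.section is not None:
            return "section"
        return "document"
```

A UI can then degrade gracefully: highlight the passage when `granularity` is "span", and fall back to a section or document link otherwise.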

Citation UI Patterns

Several common shapes:

Inline reference markers. Small numbered markers [1] inside the answer, with a sources list at the bottom or in a sidebar. The Wikipedia / academic paper pattern. Familiar, works on any output.

Hover previews. Hovering a citation surfaces the source excerpt. Best of both: doesn't clutter the answer, surfaces detail on demand.

Inline expansion. Clicking a citation expands it inline, showing the source span. Good for short documents.

Sidebar pinning. A persistent panel showing currently-cited sources alongside the answer. Best for analytical tools where users compare across sources.

Highlighting in source view. Clicking a citation jumps to the source document with the relevant span highlighted. Great when users want to read the surrounding context.

For most products, inline markers + hover previews is the right default.
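
As a rough sketch of the inline-marker pattern, the function below turns source tags into numbered markers and builds the matching sources list. The `[doc:ID]` tag format and the `titles` lookup are hypothetical choices made for this example.

```python
import re

# Hypothetical tag format the model is asked to emit, e.g. "[doc:q3-report]".
TAG = re.compile(r"\[doc:([\w-]+)\]")


def render_markers(answer: str, titles: dict[str, str]) -> tuple[str, list[str]]:
    """Replace [doc:ID] tags with numbered markers and build a sources list."""
    order: list[str] = []  # document IDs in order of first appearance

    def number(match: re.Match) -> str:
        doc_id = match.group(1)
        if doc_id not in order:
            order.append(doc_id)
        # Reuse the same number every time the same document is cited.
        return f"[{order.index(doc_id) + 1}]"

    body = TAG.sub(number, answer)
    # Fall back to the raw ID when no display title is known.
    sources = [f"[{i}] {titles.get(d, d)}" for i, d in enumerate(order, start=1)]
    return body, sources
```

The sources list can then feed a footer, a sidebar, or the hover preview, whichever the product uses.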

Generating Reliable Citations

Models can and do hallucinate citations. Mitigations:

  • Cite from the prompt only. Tell the model to attribute claims only to documents present in the prompt; refuse if the information isn't there.
  • Use IDs, not free text. Have the model emit [doc_id: 42, span: 17-43], not [Source: the company report]. IDs are checkable; free text isn't.
  • Validate post-hoc. Parse citations out of the response, look up the cited spans, verify they exist and contain something related to the claim.
  • Faithfulness scoring. Run an LLM-as-judge over (claim, cited span) pairs to flag mismatches.

A simple validator that drops or flags malformed citations catches a surprising amount of hallucination.
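
A minimal sketch of such a validator follows, using the `[doc_id: 42, span: 17-43]` format from the list above. It assumes that spans are character offsets with an exclusive end, and that `docs` maps each ID to the document text that was placed in the prompt; both are assumptions for illustration. It checks only that a cited span exists, not that it supports the claim, which still needs a separate relevance or faithfulness check.

```python
import re
from dataclasses import dataclass

# Matches citations such as "[doc_id: 42, span: 17-43]".
CITATION = re.compile(r"\[doc_id:\s*(\d+),\s*span:\s*(\d+)-(\d+)\]")


@dataclass
class CheckedCitation:
    doc_id: int
    start: int
    end: int
    ok: bool
    reason: str


def validate_citations(response: str, docs: dict[int, str]) -> list[CheckedCitation]:
    """Check that every citation points at a real span of a known document."""
    results = []
    for m in CITATION.finditer(response):
        doc_id, start, end = (int(g) for g in m.groups())
        if doc_id not in docs:
            results.append(CheckedCitation(doc_id, start, end, False, "unknown doc_id"))
        elif not (0 <= start < end <= len(docs[doc_id])):
            results.append(CheckedCitation(doc_id, start, end, False, "span out of range"))
        else:
            results.append(CheckedCitation(doc_id, start, end, True, "ok"))
    return results
```

Citations with `ok=False` can be dropped from the answer or flagged for review before anything reaches the user.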

When to Refuse Instead of Citing

If retrieval returns nothing relevant, the right answer is "I don't have information on that," not "let me make something up and cite it." This is hard to enforce in the prompt alone, so layer several safeguards:

  • Threshold check — if retrieval scores are below a quality bar, route to a refusal.
  • Explicit instruction — "If the documents don't answer the question, say so."
  • Output validator — a final check that the response is grounded in retrieved content.

A confident wrong answer with confident wrong citations is the worst possible failure mode, and naive RAG pipelines are especially prone to it.
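
The threshold check can be sketched as a small routing function. Everything here is an assumption for illustration: the `generate` callable stands in for whatever produces the answer, scores are assumed to be higher-is-better, and the 0.5 cutoff is arbitrary and would need tuning against real retrieval scores.

```python
from typing import Callable

REFUSAL = "I don't have information on that."


def answer_or_refuse(
    question: str,
    hits: list[tuple[str, float]],             # (passage text, retrieval score)
    generate: Callable[[str, list[str]], str],  # hypothetical answer generator
    min_score: float = 0.5,                     # assumed quality bar
) -> str:
    """Route to a refusal unless at least one passage clears the quality bar."""
    passages = [text for text, score in hits if score >= min_score]
    if not passages:
        # Nothing relevant was retrieved, so the generator is never called.
        return REFUSAL
    return generate(question, passages)
```

Because the refusal happens before generation, the model never gets the chance to invent an answer from weak context.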

Showing Confidence in Citations

Not all citations are equal:

  • A direct quote from the document is solid evidence.
  • An inferred conclusion from the document is weaker.
  • A citation supporting only part of a multi-part claim is partial.

Surfacing this in UI is hard but valuable: different visual treatment for direct vs inferred citations, clear "partial" indicators, color or weight cues for citation strength.
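
One cheap signal for the direct-versus-inferred distinction is whether the claim quotes the source verbatim. The heuristic below is a sketch under that assumption. It only recognizes straight double quotes, and it does not detect partial support, which would need a claim-by-claim check.

```python
import re


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences don't matter."""
    return re.sub(r"\s+", " ", text).strip().lower()


def citation_strength(claim: str, source_span: str) -> str:
    """Return "direct" if every quoted phrase in the claim appears in the span, else "inferred"."""
    quotes = re.findall(r'"([^"]+)"', claim)
    span = _normalize(source_span)
    if quotes and all(_normalize(q) in span for q in quotes):
        return "direct"
    return "inferred"
```

The returned label can drive the visual treatment, for example a solid marker for "direct" and a lighter one for "inferred".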

When Sources Conflict

Real corpora contain contradictions: documents written at different times, by different teams, with different policies. The model often smooths over these contradictions, picking one source and ignoring the other. Better:

  • Surface both. "Document A says X; document B says Y."
  • Note the conflict. Don't pretend agreement where there isn't.
  • Date-aware retrieval. Weight recent documents higher when policies might have changed.
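
Date-aware weighting can be as simple as decaying each relevance score by document age. The sketch below assumes exponential decay with a one-year half-life. Both the decay shape and the half-life are illustrative choices, and the function name is hypothetical.

```python
from datetime import date


def recency_weighted(
    score: float,
    published: date,
    today: date,
    half_life_days: float = 365.0,  # assumed: relevance halves per year of age
) -> float:
    """Halve a document's relevance score for every half_life_days of age."""
    # Clamp to zero so documents dated in the future are not boosted.
    age_days = max((today - published).days, 0)
    return score * 0.5 ** (age_days / half_life_days)
```

Re-sorting retrieval hits by the weighted score favors recent documents, while an old document with a much stronger match can still rank first.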

Citation Density

Too few citations and users can't verify. Too many and the answer becomes unreadable noise. Calibrate to:

  • Per claim. Each substantive factual claim should have at least one citation.
  • Skip the obvious. Citing every word of a paraphrase isn't helpful.
  • Group when possible. Multiple consecutive claims from the same source can share a marker.

Two to four citations per paragraph is usually right. More feels academic; fewer feels unverifiable.
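
Grouping can be sketched with `itertools.groupby`, which collapses consecutive items that share a key. The input shape here, a list of (sentence, marker number) pairs, is an assumption made for the example.

```python
from itertools import groupby


def group_markers(claims: list[tuple[str, int]]) -> str:
    """Join (sentence, marker_number) pairs, emitting one marker per consecutive run."""
    parts = []
    # groupby only merges adjacent items, so a source cited again later
    # gets its marker repeated rather than silently merged.
    for marker, run in groupby(claims, key=lambda c: c[1]):
        sentences = " ".join(sentence for sentence, _ in run)
        parts.append(f"{sentences} [{marker}]")
    return " ".join(parts)
```

For example, `group_markers([("A.", 1), ("B.", 1), ("C.", 2), ("D.", 1)])` yields `"A. B. [1] C. [2] D. [1]"`. A shared marker is more compact but less precise about which sentence it covers, so long runs may still deserve their own markers.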

Linking Out vs Embedding

Sources can be presented in three ways:

  • Embedded in the response. Excerpts shown directly. Best for short, decisive sources.
  • Linked. A click opens the source document. Best when the document is long or interactive.
  • Inline link with preview. The middle ground — link out, but show a preview on hover.

Mix as needed. Internal documents usually warrant inline previews; external links usually warrant click-throughs.

A Realistic Bar

A high-quality cited answer:

  • Cites every substantive claim.
  • Uses verifiable identifiers, not free-text source names.
  • Validates citations programmatically before showing them.
  • Shows source previews on demand, without cluttering the main answer.
  • Acknowledges when sources disagree, when sources are missing, or when confidence is low.

Few products meet this bar today. The ones that do are the ones users come back to.
