Analytics & BI Narrative

Turning numbers into prose — automatic explanations, anomaly attribution, and human-readable reports from structured data

Scenario Abstraction

Every organization has dashboards, metrics, and reports that somebody has to explain. The numbers are easy to compute; the prose around them — what changed, why it probably changed, what to do about it, what to tell whom — is what consumes analyst, ops, and exec time.

The scenario isn't "let the LLM do analytics." It's narrower and more honest: the analytics already exist; the LLM writes the narrative layer around them, grounded in those numbers, with traceability back to the queries that produced them.

This differs from Research & Synthesis (which pulls many external sources) and from Content Generation (which has no factual grounding requirement). Here the inputs are internal structured data, the facts are non-negotiable, and the value is interpretation.

Solution Shape

Semantic layer / metric catalog — agreed definitions of metrics, dimensions, and segments. Without this, the LLM and the analyst will disagree on what "revenue" means.
Trigger — schedule (weekly business review), event (KPI threshold breach), or request ("explain last week").
Structured retrieval — query the warehouse / metric store for the relevant slices; pre-compute deltas, segments, anomalies.
Candidate causes — for an anomaly, run a small set of decomposition queries (which segment, which channel, which cohort drives the change).
LLM narrative pass — given structured deltas + decomposition + business context, write prose: what happened, the most likely contributors, what's not explained, recommended actions.
Citation back to queries — every number quoted carries the query / dashboard link that produced it.
Distribution — Slack digest, email, embedded in the BI tool, briefing doc for a meeting.
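
As a rough illustration of the candidate-causes and citation steps above, the sketch below attributes a week-over-week change to segments and attaches a query reference to every figure. The `Fact` class, the `decompose_delta` helper, the channel names, the figures, and the `q/weekly_revenue_by_channel` reference are all hypothetical. In a real pipeline the per-segment values would be returned by warehouse queries, not held in Python dicts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """One pre-computed number plus the query that produced it (hypothetical shape)."""
    label: str
    value: float
    query_ref: str  # assumed to be a query id or dashboard link


def decompose_delta(prev: dict, curr: dict, query_ref: str) -> list:
    """Return per-segment deltas, largest absolute contribution first."""
    segments = sorted(set(prev) | set(curr))
    facts = [
        Fact(f"delta:{seg}", curr.get(seg, 0.0) - prev.get(seg, 0.0), query_ref)
        for seg in segments
    ]
    return sorted(facts, key=lambda f: abs(f.value), reverse=True)


# Illustrative numbers only; these stand in for query results.
prev_week = {"paid_search": 120.0, "organic": 300.0, "email": 80.0}
curr_week = {"paid_search": 90.0, "organic": 310.0, "email": 85.0}

drivers = decompose_delta(prev_week, curr_week, "q/weekly_revenue_by_channel")
total_delta = sum(f.value for f in drivers)

for fact in drivers:
    print(f"{fact.label}: {fact.value:+.1f} (source: {fact.query_ref})")
print(f"total: {total_delta:+.1f}")
```

The ranked `drivers` list, not the raw tables, is what would be handed to the narrative pass.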

The LLM does not invent numbers, does not re-aggregate, does not explore freely. It writes about the already-computed structured input. That constraint is the entire reason this works in production.
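
One way to make that constraint concrete is to build the prompt from the structured facts alone. The sketch below is a minimal example under stated assumptions: the rule wording, the fact fields (`id`, `label`, `value`, `query_ref`), and the `build_prompt` helper are hypothetical, and no particular model API is implied.

```python
import json

# Hypothetical rule text; tune the wording for your own model and review process.
RULES = (
    "Write a short narrative about the facts below.\n"
    "- Quote numbers only from FACTS, verbatim, followed by the fact id in brackets.\n"
    "- Do not add, subtract, average, or otherwise compute new numbers.\n"
    "- Do not state a cause unless a decomposition fact supports it.\n"
    "- List anything the facts do not explain under 'Not explained'.\n"
)


def build_prompt(facts: list, business_context: str) -> str:
    """Serialize pre-computed facts into a prompt; the model sees nothing else."""
    payload = json.dumps(facts, indent=2, sort_keys=True)
    return f"{RULES}\nCONTEXT:\n{business_context}\n\nFACTS:\n{payload}\n"


# Illustrative facts; in practice these come from the retrieval and decomposition steps.
facts = [
    {"id": "f1", "label": "revenue_delta_wow", "value": -15.0,
     "query_ref": "q/weekly_revenue_total"},
    {"id": "f2", "label": "revenue_delta_paid_search", "value": -30.0,
     "query_ref": "q/weekly_revenue_by_channel"},
]
prompt = build_prompt(facts, "Weekly business review, week over week.")
print(prompt)
```

Because every number reaches the model with an id and a query reference, a later check can verify that the narrative quotes only those values.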

Key Building Blocks

Metric store / semantic layer (LookML, dbt metrics, Cube, etc.) so every reference is canonical.
Decomposition engine — classical attribution / driver analysis to feed structured candidates.
Anomaly detector — statistical, not LLM-based; the LLM explains what the detector finds.
Strict prompt template — what's allowed in narrative, what's forbidden (no speculation without a query).
Citation enforcement — every quoted number is hyperlinked / cited.
Distribution channel adapter — Slack, email, Notion, slide template.
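
For the anomaly detector, a plain statistical rule is often enough to start with. The sketch below flags a KPI value whose z-score against a trailing window exceeds a threshold. The z-score approach, the `is_anomalous` name, the threshold of 3.0, and the sample values are assumptions for illustration; a production detector would usually also account for seasonality and trend.

```python
from statistics import mean, stdev


def is_anomalous(history: list, latest: float, z_threshold: float = 3.0) -> tuple:
    """Return (flag, z) for `latest` measured against the trailing `history` window."""
    if len(history) < 2:
        raise ValueError("need at least two historical points to estimate spread")
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation
    if sigma == 0:
        # Flat history: any deviation at all counts as a breach.
        differs = latest != mu
        return differs, float("inf") if differs else 0.0
    z = (latest - mu) / sigma
    return abs(z) >= z_threshold, z


# Illustrative daily values for one KPI.
trailing = [100.0, 102.0, 98.0, 101.0, 99.0]

breach, breach_z = is_anomalous(trailing, 110.0)
normal, normal_z = is_anomalous(trailing, 101.0)
print(breach, round(breach_z, 2))
print(normal, round(normal_z, 2))
```

Only the flagged value and its z-score are passed downstream; the LLM explains the breach and never decides whether one occurred.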

Concrete Cases

Automated weekly business review. Each Monday morning, generate the WBR narrative: KPI movements, segment contributors, callouts, open questions for the team to discuss.
Anomaly-triggered briefings. A KPI breaches its band → produce a short brief: what happened, top three likely segments / events, links to dashboards.
Cohort / experiment readout drafts. Given an A/B test or cohort comparison, draft the readout including caveats (low sample, novelty, seasonality).
Customer-success exec summaries. Per-account health summary written from usage, support, billing, and pipeline data.
Sales pipeline call-outs. A pipeline narrative per rep or region for forecast calls: largest movements, slip risks, healthy late-stage deals.
Financial close commentary. Variance explanations against budget — pulled from GL data plus structured prior-period comparisons.
Marketing campaign post-mortems. Channel-by-channel narrative from spend and conversion data; explicitly marks results below the statistical-significance threshold as inconclusive.
Ops daily standup briefings. Last 24h: incidents, throughput, top alerts, queue health, with recommended focus items.
Product-analytics insight digests. From event analytics, surface candidate insights weekly; analyst confirms / rejects each before publishing.
HR / people analytics narratives. Hiring funnel velocity, attrition by tenure / team, with appropriate privacy aggregation.

Similar Scenarios

BI question answering ("text-to-SQL") — adjacent but riskier; the LLM authors the query, so verification matters more. Consider keeping the LLM as a narrative layer over canonical queries before letting it write queries freely.
Report templating without structured analytics — closer to Content Generation.
Forecast commentary — analytics-narrative on top of forecast model outputs.
Earnings-call / shareholder-letter drafting — same shape on top of audited financials.

Pitfalls & Evaluation

Fabricated numbers. Treat any quoted number not traceable to a query as a defect. Build the pipeline so the only way to mention a figure is to retrieve it.
Spurious causality. "Revenue rose because of the campaign" — based on what evidence? Either show the attribution decomposition or don't make the claim.
Re-aggregation drift. If the LLM does any arithmetic, its figures can drift from the analyst's numbers. Forbid math in the prompt and do all aggregation in SQL.
Definition mismatch. "Active user" means different things to different teams. Make the metric definition part of the narrative output ("we counted active as 7-day").
Confidence theater. Generated prose sounds confident regardless of underlying data quality. Explicitly carry "low sample / inconclusive / seasonality unclear" warnings through to the output.
PII leakage in narratives. Per-row commentary on sensitive data slips into wider distribution. Aggregate before generating.
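
The fabricated-numbers check can be automated as a post-generation gate. The sketch below is one possible approach, not a complete one: it strips bracketed citation markers, extracts numeric tokens with a regular expression, and reports any token that is not in the set of values the pipeline actually retrieved. The `untraceable_numbers` name, the bracketed citation format, and the sample narrative are assumptions. The pattern ignores signs and spelled-out numbers, and values must be compared in the same rendered form that the narrative uses.

```python
import re

# Digits with optional thousands separators, decimals, and a trailing percent sign.
NUMBER = re.compile(r"\d+(?:,\d{3})*(?:\.\d+)?%?")
# Assumed citation format: anything in square brackets, e.g. [q1].
CITATION = re.compile(r"\[[^\]]*\]")


def untraceable_numbers(narrative: str, allowed: set) -> list:
    """Return numeric tokens in the narrative that no retrieved fact accounts for."""
    text = CITATION.sub("", narrative)  # citation ids may contain digits
    return [tok for tok in NUMBER.findall(text) if tok not in allowed]


# Values exactly as rendered from query results (illustrative).
allowed = {"4.2%", "1,250,000"}

draft = "Revenue fell 4.2% [q1] to 1,250,000 [q2], roughly 3 times the usual swing."
clean = "Revenue fell 4.2% [q1] to 1,250,000 [q2]."

print(untraceable_numbers(draft, allowed))
print(untraceable_numbers(clean, allowed))
```

A non-empty result would block publication or route the draft to review, which turns "no invented figures" from a prompt instruction into a pipeline property.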

Useful metrics: analyst time saved per recurring report, narrative accuracy on a held-out set (cross-checked against ground-truth queries), reader-rated usefulness, false-attribution rate (caught in review), and the proportion of generated claims that carry a citation.
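
Two of these metrics are simple ratios over reviewed claims. The sketch below assumes each generated claim has been logged with its citation (if any) and a reviewer verdict; the `ReviewedClaim` fields and both function names are hypothetical, and the sample claims are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReviewedClaim:
    text: str
    query_ref: Optional[str]  # None when the claim carries no citation
    is_causal: bool           # the claim asserts a cause
    reviewer_rejected: bool   # a reviewer marked the claim as wrong


def citation_coverage(claims: list) -> float:
    """Share of claims that link to a query or dashboard."""
    if not claims:
        raise ValueError("no claims to score")
    return sum(c.query_ref is not None for c in claims) / len(claims)


def false_attribution_rate(claims: list) -> float:
    """Share of causal claims that reviewers rejected."""
    causal = [c for c in claims if c.is_causal]
    if not causal:
        return 0.0
    return sum(c.reviewer_rejected for c in causal) / len(causal)


claims = [
    ReviewedClaim("Revenue fell week over week.", "q/revenue_total", False, False),
    ReviewedClaim("Paid search drove the decline.", "q/revenue_by_channel", True, False),
    ReviewedClaim("The campaign pause caused the drop.", None, True, True),
    ReviewedClaim("Email revenue rose slightly.", "q/revenue_by_channel", False, False),
]

coverage = citation_coverage(claims)
false_rate = false_attribution_rate(claims)
print(coverage, false_rate)
```

Tracked per report over time, the two ratios show whether prompt or pipeline changes are improving grounding or only changing the wording.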