Research & Synthesis

Surveying many sources across the web or an internal corpus and producing a defensible synthesized brief

Scenario Abstraction

An analyst, strategist, or knowledge worker is asked: "What's going on with X? Pull together the relevant material and give me a brief I can act on." The work is survey + synthesis: gather many sources of varying quality, extract the relevant claims, weigh them, and produce a structured output (a memo, a market map, a competitive teardown, a literature review, a daily news brief).

Unlike a knowledge assistant, which answers from a fixed corpus, this scenario works over an open (or at least dynamic) corpus: sources are retrieved fresh per question. And unlike RAG over a fixed corpus, it treats source credibility, recency, and conflict as first-class concerns.

This is the area where "deep research" agent products live: long-running, multi-step, browse-the-web-and-write.

Solution Shape

Question decomposition — break the ask into sub-questions; identify what kind of source answers each.
Search planning — choose source types (web search, academic search, internal docs, specific databases, social signals).
Iterative retrieval — search, read snippets, refine queries, follow citations, deepen on the promising leads.
Per-source extraction — pull the relevant claims with provenance (URL + access date + quote).
Conflict & quality handling — when sources disagree, surface the disagreement rather than averaging; prefer primary over secondary.
Outline → draft — agree on the structure of the deliverable, then fill it with cited claims.
Self-critique — gap analysis: which sub-questions are weakly sourced? Re-search those.
Deliverable — structured document with inline citations, an "open questions" section, and a reproducible log of what was searched.
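
The per-source extraction and conflict-handling steps above can be sketched as follows. This is a minimal illustration, not a prescribed design: the `Claim` record, its field names, and the `summarize` function are hypothetical, and the rule "if any primary source speaks to a sub-question, only primary sources decide its status" is one possible reading of "prefer primary over secondary".

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Claim:
    sub_question: str  # which sub-question this claim answers
    value: str         # the normalized assertion, e.g. "$4B"
    url: str           # provenance: where the claim was found
    accessed: date     # provenance: when the page was fetched
    quote: str         # provenance: verbatim supporting text
    primary: bool      # True for primary sources (assumed label)


def summarize(claims):
    """Return one entry per sub-question, flagging disagreement instead of averaging."""
    grouped = defaultdict(lambda: defaultdict(list))
    for c in claims:
        grouped[c.sub_question][c.value].append(c)

    report = {}
    for sq, by_value in grouped.items():
        # Values backed by at least one primary source, if there are any.
        primary_values = {v for v, cs in by_value.items() if any(c.primary for c in cs)}
        candidates = primary_values or set(by_value)
        report[sq] = {
            "status": "agreed" if len(candidates) == 1 else "contested",
            "values": sorted(candidates),
            "sources": sorted({c.url for v in candidates for c in by_value[v]}),
        }
    return report
```

A "contested" entry would be written into the deliverable with every competing value and its sources. A stricter variant could also list secondary sources that dissent from the primary ones rather than leaving them out.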

This is one of the most agentic scenarios in the catalog. It often needs many tool calls (search, fetch, parse PDF, look up internal records) and tens of minutes of run time.
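
Because runs involve many tool calls, the retrieve-then-critique loop usually needs an explicit budget and a stopping rule. The sketch below is one way to express that, under stated assumptions: `search` is a hypothetical caller-supplied tool taking a sub-question and a round number and returning URLs, and "weakly sourced" is taken to mean fewer than `min_sources` distinct URLs.

```python
def research_loop(sub_questions, search, min_sources=2, max_calls=20):
    """Search each sub-question, then re-search weakly sourced ones until
    coverage is met, searches stop yielding anything new, or the budget runs out."""
    notes = {sq: set() for sq in sub_questions}  # working scratchpad
    log = []                                     # record of every search issued
    calls = 0
    round_no = 0
    while calls < max_calls:
        # Gap analysis: which sub-questions are still weakly sourced?
        gaps = [sq for sq in sub_questions if len(notes[sq]) < min_sources]
        if not gaps:
            break
        progressed = False
        for sq in gaps:
            if calls >= max_calls:
                break
            urls = search(sq, round_no)  # hypothetical tool call
            calls += 1
            new = set(urls) - notes[sq]
            notes[sq] |= new
            log.append({"query": sq, "round": round_no, "new_sources": sorted(new)})
            progressed = progressed or bool(new)
        if not progressed:
            break  # nothing new this round: report the gap rather than loop
        round_no += 1
    open_questions = [sq for sq in sub_questions if len(notes[sq]) < min_sources]
    return notes, open_questions, log
```

Whatever remains in `open_questions` when the loop ends is a natural input to the deliverable's "open questions" section.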

Key Building Blocks

Search tools — general web search, scholarly search, social search, specialized databases.
Browser / fetcher with parsing — handle paywalls, JS-rendered content, PDFs, sometimes login.
Notes layer — a working scratchpad that survives between steps; explicit memory.
Citation infrastructure — every claim has source(s); links are checked.
Outline-then-fill writer.
Quality / coverage critic — a separate prompt that scores whether the deliverable is actually grounded.
Run log — searches issued, sources accessed, why each was kept/dropped.
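
As an illustration of the run log, here is a minimal append-only sketch that stores one JSON object per line. The `RunLog` class, the `kind` field, and the `decision` / `reason` keys are assumptions made for this example, not a required schema.

```python
import json
from datetime import datetime, timezone


class RunLog:
    """Append-only log of searches and source decisions, stored as JSON Lines."""

    def __init__(self, path):
        self.path = path

    def record(self, kind, **fields):
        """Append one timestamped entry, e.g. kind='search' or kind='source'."""
        entry = {"ts": datetime.now(timezone.utc).isoformat(), "kind": kind, **fields}
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(entry) + "\n")
        return entry

    def entries(self, kind=None):
        """Read entries back, optionally filtered by kind (file must already exist)."""
        with open(self.path, encoding="utf-8") as f:
            rows = [json.loads(line) for line in f if line.strip()]
        return [r for r in rows if kind is None or r["kind"] == kind]
```

Usage might look like `log.record("source", url=..., decision="dropped", reason="content farm")`. An append-only file keeps the log auditable even if a long run is interrupted partway through.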

Concrete Cases

Competitive teardown. Given a competitor name, produce positioning, pricing, target market, recent news, hiring signals, customer reviews, with citations.
Market sizing memo. Pull TAM/SAM/SOM signals from public sources, reconcile conflicting numbers, write the memo.
Daily executive brief. Each morning, summarize overnight news relevant to a watchlist of companies / topics, with novelty filtering.
Investment due-diligence pre-pack. Public-source DD on a target: financials, leadership, press, regulatory filings, key risks.
Literature review for a research question. Pull and summarize papers from PubMed / arXiv, group by claim, surface contradictions.
Patent landscape scan. For a tech area, retrieve relevant patents, cluster by topic, identify whitespace.
Sales account research. Given a target account, produce a one-pager with org structure, recent initiatives, key contacts, talking points.
Regulatory horizon scan. Across jurisdictions, identify upcoming rules relevant to a product line.
Hiring intelligence. For a competitor, infer team structure and growth from public hiring posts and LinkedIn signals.
OSINT investigation (defensive). Given an indicator, assemble what's publicly known with provenance.
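
For the daily executive brief, "novelty filtering" could be approximated by comparing each new headline against what has already been reported. The sketch below uses Jaccard similarity over word sets. The function names and the 0.6 threshold are illustrative assumptions, and a production system might use embeddings or entity matching instead.

```python
import re


def tokens(text):
    """Lowercased alphanumeric word set of a headline."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Overlap between two token sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def novel_items(headlines, seen, threshold=0.6):
    """Keep headlines not too similar to anything in `seen` or earlier in the batch."""
    history = [tokens(s) for s in seen]
    fresh = []
    for h in headlines:
        t = tokens(h)
        if all(jaccard(t, old) < threshold for old in history):
            fresh.append(h)
        history.append(t)  # later near-duplicates are compared against this too
    return fresh
```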

Similar Scenarios

Internal-only research — same shape, restricted to internal sources; overlaps with knowledge assistant when scope shrinks.
Trip / event briefing — pre-meeting brief on attendees, their company, recent moves.
"Catch me up after vacation" — synthesize what changed in a defined area during a date range.
Literature-grounded answer to a single hard question — same engine, single-question scope.

Pitfalls & Evaluation

The bibliography that doesn't exist. Hallucinated URLs, made-up paper titles. Re-fetch every citation before delivery and drop any that does not resolve.
Source quality collapse. SEO spam, content farms, and LLM-generated text dominate easy searches. Curate allowlists of trusted sources per domain.
Premature convergence. The agent stops after a few sources because the first ones agreed. Enforce a minimum level of source diversity per topic.
Confident on contested facts. When sources disagree, the report must surface the disagreement, not pick a side silently.
No reproducibility. Without the search log, you can't audit the reasoning later. Treat the log as part of the deliverable.
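
Two of the mitigations above, dropping citations that do not resolve and enforcing source diversity, can be sketched as simple checks. This is a hedged example: `resolves`, `verify_citations`, and `meets_diversity` are hypothetical helper names, "diversity" is approximated here as the number of distinct domains, and a resolving URL only shows that the page exists, not that it supports the claim.

```python
import urllib.request
from urllib.parse import urlparse


def resolves(url, timeout=10):
    """True if the URL answers a HEAD request with a 2xx/3xx status.
    Some sites reject HEAD, so a real checker may need a GET fallback."""
    try:
        req = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (OSError, ValueError):  # URLError and timeouts are OSError subclasses
        return False


def verify_citations(citations, check=resolves):
    """Split (claim_text, url) pairs into kept and dropped lists."""
    kept, dropped = [], []
    for text, url in citations:
        (kept if check(url) else dropped).append((text, url))
    return kept, dropped


def distinct_domains(urls):
    """Set of host names with a leading 'www.' removed (Python 3.9+)."""
    return {urlparse(u).netloc.lower().removeprefix("www.") for u in urls}


def meets_diversity(urls, min_domains=3):
    """True if the sources span at least `min_domains` distinct domains."""
    return len(distinct_domains(urls)) >= min_domains
```

Passing `check` as a parameter lets the same code be exercised offline with a stub, and lets a stricter checker (for example, one that also confirms the quoted text appears on the page) be swapped in later.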

Useful metrics: citation precision (do citations support the claim?), citation recall (are key claims sourced?), coverage of seeded sub-questions, expert-rated quality on a held-out brief set, and time-to-deliverable versus a human baseline.
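
The two citation metrics can be computed from reviewer labels along these lines. This sketch assumes each citation has already been judged as supporting or not supporting its claim, by a human or a critic prompt, and it counts a key claim as "sourced" only when at least one supporting citation exists. Both choices are assumptions of the example.

```python
def citation_precision(citations):
    """Share of citations judged to support their claim.
    `citations` is a list of (claim_id, url, supports) triples."""
    if not citations:
        return 0.0
    return sum(1 for _, _, ok in citations if ok) / len(citations)


def citation_recall(key_claims, citations):
    """Share of key claims that have at least one supporting citation."""
    wanted = set(key_claims)
    if not wanted:
        return 0.0
    sourced = {claim_id for claim_id, _, ok in citations if ok}
    return len(wanted & sourced) / len(wanted)
```

For example, with three citations of which two are supporting, and four key claims of which two are sourced, precision is 2/3 and recall is 0.5.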