Steven's Knowledge

AI-Powered Search

Beyond keyword match — search systems that understand intent

Classical search ranks documents by lexical match. AI-powered search adds semantic understanding, query rewriting, and generative summaries. Done well, it's the most common AI feature users actually feel.

The Building Blocks

  • Lexical search — BM25, exact match. Fast, cheap, indispensable for names, codes, and quoted phrases.
  • Semantic search — embed query and documents, rank by vector similarity. Catches paraphrases and intent.
  • Hybrid search — combine lexical and semantic scores. The right default; each catches what the other misses.
  • Reranking — re-score the top-N from initial retrieval with a stronger (cross-encoder or LLM) model.
  • Generative answer — compose a direct answer with citations on top of the search results.
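As a rough illustration of the semantic search block, the sketch below ranks documents by cosine similarity. The three-dimensional vectors and the document ids are made up for the example; a real system would get its vectors from an embedding model and usually store them in a vector index.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_rank(query_vec, doc_vecs):
    """Return (doc_id, similarity) pairs, most similar first."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real embeddings (assumption, not real model output).
toy_docs = {
    "cancel-guide": [0.9, 0.1, 0.0],
    "pricing": [0.1, 0.9, 0.2],
    "hiking": [0.0, 0.2, 0.9],
}
ranking = semantic_rank([1.0, 0.2, 0.0], toy_docs)
print([doc_id for doc_id, _ in ranking])
```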

Hybrid Is the Right Default

Pure semantic search underperforms classical search on:

  • Exact identifiers — order numbers, error codes, model names.
  • Negation — embeddings place "no nuts" and "with nuts" close together, so both queries return much the same results.
  • Rare terms — unusual vocabulary the embedding model didn't see.

Pure lexical search misses:

  • Paraphrasing — "how do I cancel" vs "subscription termination."
  • Conceptual queries — "fast outdoor activities for kids."

Combining them — typically Reciprocal Rank Fusion or weighted score blending — is more robust than either alone.
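Reciprocal Rank Fusion can be sketched in a few lines. It needs only the rank positions from each retriever, so lexical and semantic scores never have to be put on the same scale. The document ids below are invented, and k=60 is the commonly used default constant rather than a tuned value.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids into one list.

    Each document earns 1 / (k + rank) from every list it appears in;
    ties are broken alphabetically so the output is deterministic.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical outputs of a lexical and a semantic retriever for one query.
lexical = ["sku-4411", "returns-faq", "cancel-guide"]
semantic = ["cancel-guide", "sku-4411", "billing-help"]

fused = reciprocal_rank_fusion([lexical, semantic])
print(fused)
```

Documents that both retrievers return rise to the top, while a document found by only one of them still makes the fused list.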

Query Understanding

A surprising amount of search quality comes from rewriting the query before retrieval:

  • Spelling correction.
  • Expansion — adding synonyms or related terms.
  • Decomposition — splitting compound questions into sub-queries.
  • Filter extraction — pulling structured filters ("under $50", "last week") out of free text.

LLMs are very good at this; a small fast model in the query path is usually worth the latency.
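Filter extraction is the easiest of these steps to picture. The sketch below uses two regular expressions in place of a model, purely to show the shape of the output: cleaned query text plus structured filters. The filter names (max_price, max_age_days) and the reading of "last week" as seven days are assumptions for the example.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ParsedQuery:
    text: str                      # what is left to send to retrieval
    filters: dict = field(default_factory=dict)


PRICE_CAP = re.compile(r"\bunder\s+\$(\d+(?:\.\d+)?)", re.IGNORECASE)
LAST_WEEK = re.compile(r"\blast\s+week\b", re.IGNORECASE)


def extract_filters(query):
    """Pull simple price and recency filters out of free text."""
    filters = {}
    match = PRICE_CAP.search(query)
    if match:
        filters["max_price"] = float(match.group(1))
        query = PRICE_CAP.sub(" ", query, count=1)
    if LAST_WEEK.search(query):
        filters["max_age_days"] = 7  # assumed interpretation of "last week"
        query = LAST_WEEK.sub(" ", query, count=1)
    return ParsedQuery(text=" ".join(query.split()), filters=filters)


parsed = extract_filters("wireless headphones under $50")
print(parsed.text, parsed.filters)
```

A model in the query path would cover far more phrasings than two patterns can, but the contract with the retrieval layer can stay the same.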

Reranking

Initial retrieval optimizes for recall. Reranking re-scores the top candidates with a model that can look at the query and document together (cross-encoder) or reason about them in prose (LLM). The lift over raw vector ranking is usually large.
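The reranking step itself is small; the scoring model does the work. In the sketch below, score_fn is a placeholder for whatever cross-encoder or LLM scorer is in use, and overlap_score is a deliberately crude stand-in so the example runs without a model.

```python
def rerank(query, candidates, score_fn, top_n=3):
    """Re-score retrieved candidates against the query and keep the best top_n.

    The sort is stable, so candidates with equal scores keep their
    original retrieval order.
    """
    scored = [(score_fn(query, doc), doc) for doc in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_n]]


def overlap_score(query, doc):
    """Toy scorer: fraction of query words found in the document.

    A stand-in only; a real reranker would score the (query, document)
    pair with a trained model.
    """
    query_words = set(query.lower().split())
    doc_words = set(doc.lower().split())
    return len(query_words & doc_words) / len(query_words) if query_words else 0.0


candidates = [
    "How to pause a subscription",
    "Cancel your subscription in settings",
    "Gift card terms",
]
reranked = rerank("cancel subscription", candidates, overlap_score, top_n=2)
print(reranked)
```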

Generative Answers

Composing an answer over search results is RAG by another name. The same rules apply: ground in retrieved sources, cite them, refuse when nothing relevant was found.
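Those three rules can be enforced before the model is ever called. The sketch below assumes each search hit carries a relevance score and that generate is some callable wrapping an LLM client; both are hypothetical, as are the 0.5 threshold and the prompt wording.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    doc_id: str
    text: str
    score: float  # relevance score from retrieval or reranking


REFUSAL = "I could not find anything relevant to that question."


def build_grounded_prompt(question, hits, min_score=0.5):
    """Return a prompt restricted to relevant sources, or None if there are none."""
    relevant = [hit for hit in hits if hit.score >= min_score]
    if not relevant:
        return None
    sources = "\n".join(
        f"[{i}] ({hit.doc_id}) {hit.text}" for i, hit in enumerate(relevant, start=1)
    )
    return (
        "Answer using only the sources below. Cite sources as [n]. "
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )


def answer(question, hits, generate):
    """Refuse without calling the model when nothing relevant was retrieved."""
    prompt = build_grounded_prompt(question, hits)
    if prompt is None:
        return REFUSAL
    return generate(prompt)
```

Refusing in code rather than in the prompt means a query with no relevant hits costs no generation call and cannot produce an ungrounded answer.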

What Search Quality Actually Means

Three measures, all worth tracking:

  • Click-through rate — users finding something to click.
  • Result-to-action rate — users taking the next step (purchase, save, copy).
  • Zero-result rate — searches that returned nothing useful.

The first is easy to game; the others are what you should actually optimize.
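All three measures fall out of a simple search log. The sketch below assumes one record per search with the fields shown; the field names are hypothetical. It also counts only empty result sets as zero-result searches, which understates the "nothing useful" case and would need a relevance signal to capture fully.

```python
from dataclasses import dataclass


@dataclass
class SearchEvent:
    query: str
    results_shown: int
    clicked: bool   # user clicked at least one result
    acted: bool     # user took the next step (purchase, save, copy)


def search_metrics(events):
    """Compute the three rates as fractions of all searches."""
    total = len(events)
    if total == 0:
        return {"ctr": 0.0, "result_to_action": 0.0, "zero_result": 0.0}
    return {
        "ctr": sum(e.clicked for e in events) / total,
        "result_to_action": sum(e.acted for e in events) / total,
        "zero_result": sum(e.results_shown == 0 for e in events) / total,
    }


log = [
    SearchEvent("red shoes", 10, True, True),
    SearchEvent("err 0x80070057", 0, False, False),
    SearchEvent("cancel plan", 5, True, False),
    SearchEvent("gift ideas", 8, False, False),
]
print(search_metrics(log))
```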

Latency Matters More Than People Admit

Search has to feel instant. A "smart" pipeline that takes two seconds will lose to a "dumber" one that takes 200 ms. Optimize the cheap path first; reserve the expensive pipeline for queries that warrant it.
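One way to reserve the expensive pipeline is to run the cheap path first and escalate only when its results look weak and there is still time left. The sketch below assumes the cheap path reports a confidence value; that value, the 0.6 threshold, and the 200 ms budget are all illustrative choices, not recommendations.

```python
import time


def search(query, cheap_search, expensive_search,
           min_confidence=0.6, budget_s=0.2, clock=time.monotonic):
    """Run the cheap path; escalate only for low-confidence queries within budget.

    cheap_search(query) -> (results, confidence)
    expensive_search(query, results) -> results
    Both are hypothetical callables supplied by the caller.
    """
    start = clock()
    results, confidence = cheap_search(query)
    elapsed = clock() - start
    # Good enough, or the budget is already spent: return what we have.
    if confidence >= min_confidence or elapsed >= budget_s:
        return results, "cheap"
    return expensive_search(query, results), "expensive"
```

The clock parameter exists so the budget check can be driven by a fake clock in tests instead of real elapsed time.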
