Steven's Knowledge

Best Practices

Production search - indexing strategy, relevance tuning, security, multi-tenancy, observability, pitfalls

Patterns that apply whether you're on Algolia, Meilisearch, Typesense, or Elasticsearch. Tuning details differ; principles don't.

Index Design

The most important decision after picking an engine.

One Document Per Search Result

Build the index around the unit users search for. A product detail page → one document per product. An article → one document per article. Don't search a normalized join — pre-denormalize at index time.

// Product document for search
{
  id: "prod-42",
  name: "Espresso Maker XL",
  brand: "Acme",
  brand_id: "acme",                        // for filtering
  category: "Kitchen > Coffee Machines",   // hierarchical for facets
  price: 199.99,
  currency: "USD",
  in_stock: true,
  popularity: 832,                          // for custom ranking
  description: "...",
  tags: ["espresso", "coffee", "kitchen"],
  thumbnail: "https://...",
  url: "/products/prod-42",
}

Include what you need to filter, rank, and render — not the full database row.
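
A minimal sketch of denormalizing at index time, assuming hypothetical `product`, `brand`, and `categoryPath` inputs loaded from a relational schema. The input field names (`stock_count`, `view_count`, `thumbnail_url`) are illustrative assumptions, not taken from a real schema:

```javascript
// Build one flat search document per product from normalized rows.
// All input field names are hypothetical; adapt them to your schema.
function toSearchDocument(product, brand, categoryPath) {
  return {
    id: product.id,
    name: product.name,
    brand: brand.name,                    // copied in, so no join at query time
    brand_id: brand.id,                   // for filtering
    category: categoryPath.join(" > "),   // hierarchical for facets
    price: product.price,
    currency: product.currency,
    in_stock: product.stock_count > 0,    // derived flag instead of raw inventory
    popularity: product.view_count,       // for custom ranking
    description: product.description,
    tags: product.tags,
    thumbnail: product.thumbnail_url,
    url: `/products/${product.id}`,
  };
}
```

Running this in the indexer means the search engine never has to join anything at query time, and internal columns stay out of the index.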

Searchable vs Filterable vs Displayable

Most engines distinguish four roles for an attribute:

| Role | Used for | Example |
| --- | --- | --- |
| Searchable | Full-text matching | name, description, tags |
| Filterable | Equality / range filters | category, price, in_stock |
| Sortable | Order results | price, popularity, created_at |
| Displayable | Render in UI | thumbnail, url |

Attributes can have multiple roles. Mark them appropriately — engines use this to optimize storage and queries.

Searchable Attribute Order

Multiple searchable attributes are usually ranked: a match in name outweighs a match in description. Algolia and Meilisearch let you set the order:

searchableAttributes: [
  "name",          // highest weight
  "brand",
  "category",
  "tags",
  "description",   // lowest weight
]

A bad order ruins relevance: if description outranks name, "espresso" matches anything that mentions espresso anywhere, not espresso machines first.
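
To see why the order matters, here is a toy ranker that sorts each match by the first attribute in the list that contains the term. It is not how any real engine scores documents; `bestAttributeRank` and `rankByAttribute` are hypothetical helpers for illustration:

```javascript
// Position of the earliest attribute (in `order`) that contains the term,
// or Infinity if no attribute matches.
function bestAttributeRank(doc, term, order) {
  const needle = term.toLowerCase();
  for (let i = 0; i < order.length; i++) {
    const value = doc[order[i]];
    const text = Array.isArray(value) ? value.join(" ") : String(value ?? "");
    if (text.toLowerCase().includes(needle)) return i;
  }
  return Infinity;
}

// Keep only matching docs and sort them by their best attribute position.
function rankByAttribute(docs, term, order) {
  return docs
    .map((doc) => ({ doc, rank: bestAttributeRank(doc, term, order) }))
    .filter((entry) => entry.rank !== Infinity)
    .sort((a, b) => a.rank - b.rank)
    .map((entry) => entry.doc);
}
```

With `["name", "description"]`, a product named "Espresso Maker XL" comes before a mug whose description mentions espresso; reversing the order flips the two.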

Relevance Tuning

Default relevance is good. Custom ranking signals are how you make it great.

Built-in Ranking Rules

Most engines rank by:

  1. Typos — fewer typos rank higher
  2. Proximity — query words close together rank higher
  3. Attribute — match in higher-weight attribute ranks higher
  4. Exactness — exact match outranks prefix match
  5. Custom signals (your turn)

Custom Ranking

Add signals that reflect business value:

  • Popularity — items with more views/clicks rank higher
  • Stock — in-stock items above out-of-stock
  • Recency — newer items boosted
  • Featured / promoted — manual boosts

// Meilisearch: rank tied results by these attributes
"rankingRules": [
  "words", "typo", "proximity", "attribute", "exactness",
  "popularity:desc",          // custom: most popular wins ties
  "in_stock:desc",
]
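
Conceptually, custom ranking attributes act as tie-breakers after the built-in rules. A rough sketch of that ordering, where `textScore` is a made-up stand-in for the engine's built-in relevance (real engines do not expose it in this form):

```javascript
// Comparator: built-in relevance first, then popularity, then stock status.
// `textScore` is a hypothetical placeholder for the engine's own ranking.
function compareResults(a, b) {
  if (a.textScore !== b.textScore) return b.textScore - a.textScore;
  if (a.popularity !== b.popularity) return b.popularity - a.popularity;
  return Number(b.in_stock) - Number(a.in_stock); // true sorts before false
}
```

Popularity never overrides a better text match here; it only decides between results the built-in rules consider equal.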

Synonyms

// Meilisearch-style synonyms map (Algolia stores synonyms as separate synonym records)
{
  "couch": ["sofa", "loveseat"],
  "sneakers": ["trainers", "tennis shoes"],
}

Mine your search analytics for queries that should return the same products but don't. If users search for both "sneakers" and "trainers", both queries should return the same results.
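
Some engines treat each synonym entry as one-directional (Meilisearch documents this behavior), so mapping "sneakers" to "trainers" does not imply the reverse. A small sketch that expands groups of equivalent terms into a map with an entry for every term; `expandSynonymGroups` is a hypothetical helper, not an engine API:

```javascript
// Turn groups of interchangeable terms into a map where every term
// points at all the other terms in its group.
function expandSynonymGroups(groups) {
  const synonyms = {};
  for (const group of groups) {
    for (const term of group) {
      synonyms[term] = group.filter((other) => other !== term);
    }
  }
  return synonyms;
}
```

Keeping the groups in version control and generating the map from them makes it harder to forget one direction when the list grows.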

Click-Through Tuning (the dream goal)

Top-tier search:

  1. Log every query + click.
  2. Build a model: which queries / contexts make which results valuable.
  3. Re-rank: rank up things that get clicked, down things that don't.

Algolia bakes click-tuning into their product. With Meilisearch/Typesense you can roll your own using a custom ranking attribute that you periodically recompute from click data.
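
A minimal sketch of the roll-your-own variant: compute a smoothed click-through score per document from logged events, then write it into the custom ranking attribute on a schedule. The event shape and the prior values (1 click per 20 impressions) are assumptions chosen for illustration:

```javascript
// events: [{ objectId, type: "impression" | "click" }] (hypothetical log shape)
// Returns { objectId: score }, where score is a smoothed CTR scaled to an integer.
function computeClickScores(events, priorClicks = 1, priorImpressions = 20) {
  const stats = new Map();
  for (const { objectId, type } of events) {
    const s = stats.get(objectId) ?? { clicks: 0, impressions: 0 };
    if (type === "click") s.clicks += 1;
    if (type === "impression") s.impressions += 1;
    stats.set(objectId, s);
  }
  const scores = {};
  for (const [objectId, s] of stats) {
    // The prior keeps items with very few impressions from jumping to the top.
    const ctr = (s.clicks + priorClicks) / (s.impressions + priorImpressions);
    scores[objectId] = Math.round(ctr * 1000);
  }
  return scores;
}
```

The resulting numbers can be pushed as partial document updates, so the engine treats them like any other numeric ranking attribute.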

Indexing Strategy

Batch on Initial Load, Stream on Updates

For the initial load of millions of records, send batches (for example, 5,000 docs per request, with several requests in parallel). Once in steady state, push individual document updates as they happen (CDC stream, message queue, app-level hooks).

// Bulk indexing
const BATCH = 5000;
for (let i = 0; i < docs.length; i += BATCH) {
  await index.addDocuments(docs.slice(i, i + BATCH));
}

// Streaming updates
await index.updateDocuments([changedDoc]);   // single doc

Async indexing is the norm — addDocuments returns immediately with a task ID; the engine indexes in the background. Poll the task or just trust it for non-critical paths.
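
The loop above sends batches one at a time. A sketch of the parallel variant with a bounded number of in-flight requests; `sendBatch` is a placeholder for whatever call your client exposes (for example, a wrapper around `index.addDocuments`), and the default concurrency of 4 is an arbitrary assumption:

```javascript
// Split an array into consecutive slices of at most `size` items.
function chunk(items, size) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Send batches with at most `concurrency` requests in flight.
// `sendBatch` is a hypothetical async function supplied by the caller.
async function indexInParallel(docs, sendBatch, { batchSize = 5000, concurrency = 4 } = {}) {
  const queue = chunk(docs, batchSize);
  const results = [];
  async function worker() {
    while (queue.length > 0) {
      const batch = queue.shift();
      results.push(await sendBatch(batch));
    }
  }
  await Promise.all(Array.from({ length: concurrency }, () => worker()));
  return results; // e.g. task IDs, in completion order
}
```

If any batch fails, the returned promise rejects; a production indexer would add retries around `sendBatch`.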

CDC From Your Database

The reliable pattern: change-data-capture from Postgres / MySQL → message queue → indexer → search engine.

Postgres ──► Debezium ──► Kafka ──► indexer service ──► Meilisearch
            (logical replication or wal2json)

This guarantees the search index converges with the database, handles backfills, and survives transient failures.

At smaller scale, use the outbox pattern in your app: write to the DB and to an outbox table in the same transaction; a worker reads the outbox and pushes the changes to search.
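
A minimal in-memory sketch of the outbox flow. The `db` object (a `Map` of products plus an `outbox` array) stands in for real tables, and `pushToSearch` is a hypothetical callback that would call the search client; in a real system the two writes in `saveProductWithOutbox` would share one database transaction:

```javascript
// Write the product and its outbox entry together.
// With a real database, wrap both writes in a single transaction.
function saveProductWithOutbox(db, product) {
  db.products.set(product.id, product);
  db.outbox.push({ seq: db.outbox.length + 1, productId: product.id, processed: false });
}

// Worker step: push every unprocessed entry to search, in order.
// If pushToSearch throws, the entry stays unprocessed and is retried next run.
function drainOutbox(db, pushToSearch) {
  let pushed = 0;
  for (const entry of db.outbox) {
    if (entry.processed) continue;
    pushToSearch(db.products.get(entry.productId));
    entry.processed = true;
    pushed += 1;
  }
  return pushed;
}
```

Because an entry is only marked processed after a successful push, delivery is at-least-once, so the push to search should be idempotent (an upsert by document ID).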

Re-indexing

Sometimes you have to wipe and reindex (schema change, ranking config that requires it). Strategies:

| Strategy | Notes |
| --- | --- |
| Atomic swap | Index into products_v2; switch app to point to it; delete products. Zero downtime. |
| In-place re-index | Engine handles; brief period of split state |
| Reindex while writes continue | Pause writes (small downtime) OR dual-write to old and new |

Atomic swap (also called "blue-green index") is the safe default.
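
The atomic-swap steps can be sketched against a hypothetical `engine` adapter. The method names (`createIndex`, `addDocuments`, `countDocuments`, `pointAppAt`, `deleteIndex`) are assumptions for illustration, not any client's real API; map them to your engine's alias or swap feature:

```javascript
// Build the new index fully, verify it, then switch traffic and drop the old one.
// `engine` is a hypothetical adapter; every method on it is an assumed name.
async function reindexWithSwap(engine, oldName, newName, docs) {
  await engine.createIndex(newName);
  await engine.addDocuments(newName, docs);

  // Refuse to switch if the new index is incomplete.
  const indexed = await engine.countDocuments(newName);
  if (indexed !== docs.length) {
    throw new Error(`${newName} has ${indexed} docs, expected ${docs.length}`);
  }

  await engine.pointAppAt(newName);  // the only step users can observe
  await engine.deleteIndex(oldName); // safe once nothing reads the old index
}
```

The verification step is the point of the pattern: the old index keeps serving traffic until the new one is known to be complete.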

Security

Public Keys for Frontend

Never put admin / master keys in the browser. Issue search-only keys scoped to specific indexes:

// Meilisearch
{
  description: "Public search key",
  actions: ["search"],
  indexes: ["products"],
  expiresAt: null,
}

Per-user scoped keys with attribute filters are useful for multi-tenant apps:

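// Algolia secured API key: signs the user's tenant filter into the key.
// Assumes the v4 JavaScript client, running on the server; other major
// versions shape this call differently. appId and apiKey are placeholders.
const algoliasearch = require("algoliasearch");
const client = algoliasearch(appId, apiKey);
const secureKey = client.generateSecuredApiKey(
  searchKey,
  { filters: 'tenant_id:42' }
);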

The key embeds the filter — the user can search, but only within their tenant's data. They can't tamper with the filter (HMAC-signed).
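
To illustrate why the filter cannot be tampered with, here is a conceptual sketch of HMAC-signing restrictions into a key using Node's built-in `crypto` module. This is not Algolia's actual key format or verification code; `signRestrictions` and `verifyScopedKey` are hypothetical helpers showing the general idea:

```javascript
const { createHmac, timingSafeEqual } = require("node:crypto");

// Sign the restrictions with the parent key and pack signature + payload together.
function signRestrictions(parentKey, restrictions) {
  const payload = new URLSearchParams(restrictions).toString();
  const signature = createHmac("sha256", parentKey).update(payload).digest("hex");
  return Buffer.from(signature + payload).toString("base64");
}

// Server side: recompute the signature; return the restrictions only if it matches.
function verifyScopedKey(parentKey, scopedKey) {
  const decoded = Buffer.from(scopedKey, "base64").toString("utf8");
  const signature = Buffer.from(decoded.slice(0, 64)); // hex SHA-256 is 64 chars
  const payload = decoded.slice(64);
  const expected = Buffer.from(
    createHmac("sha256", parentKey).update(payload).digest("hex")
  );
  const valid = signature.length === expected.length && timingSafeEqual(signature, expected);
  return valid ? Object.fromEntries(new URLSearchParams(payload)) : null;
}
```

Changing `tenant_id:42` to `tenant_id:43` inside the key changes the payload, so the recomputed signature no longer matches and the key is rejected. Only a holder of the parent key can produce a valid signature.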

Don't Index Sensitive Fields

Search indexes are less locked down than your DB. Don't index:

  • Passwords, hashes, tokens
  • PII you don't need for search (SSN, full credit card)
  • Internal fields users shouldn't see

If you must index user-identifying data (email, phone), at least make it filterable but not displayable.

Rate Limiting

Add gateway-level rate limits on search endpoints (see API Gateway). Search is cheap per request, but it is also a bot magnet: a script can hammer the endpoint to map your entire catalog.
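
Gateways commonly implement such limits with a token bucket. A minimal in-process sketch of that algorithm, for illustration only: a real gateway would keep one bucket per API key or IP, usually in shared storage. The `TokenBucket` class and its parameters are assumptions:

```javascript
// Token bucket: allows bursts up to `capacity`, refilled at `refillPerSecond`.
// Timestamps are in milliseconds and can be injected for testing.
class TokenBucket {
  constructor(capacity, refillPerSecond, now = Date.now()) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity;
    this.lastRefill = now;
  }

  // Returns true if the request may proceed, false if it should get a 429.
  tryRemove(now = Date.now()) {
    const elapsedSeconds = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefill = now;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}
```

A burst capacity of a few requests with a modest refill rate fits search-as-you-type traffic, where one user legitimately sends several queries per second.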

Multi-Tenancy

Two approaches:

One Index per Tenant

products-tenant-1
products-tenant-2
products-tenant-3

Pros: clean isolation; per-tenant ranking config; easy deletion. Cons: scales poorly past hundreds of tenants; cross-tenant queries are awkward.

One Index, Tenant ID Filter

products (single index)
   - tenant_id is filterable

Each tenant's search key embeds the filter tenant_id:N, so users can only see their own tenant's data.

Pros: scales to millions of tenants; smaller infrastructure footprint. Cons: shared ranking; if you index sensitive data, a misconfigured key leaks it.

For most SaaS products, one index with filtered keys wins. When tenants need their own ranking configuration, use one index per tenant.

Observability

| Signal | Why |
| --- | --- |
| Query latency p99 | User-facing; alert above SLO |
| Indexing lag | Time from DB change to searchable document (normally seconds to minutes); alert if growing |
| Zero-result rate | Queries returning nothing; an opportunity for synonyms and query understanding |
| Top queries | What users actually look for; informs taxonomy and ranking |
| CTR per query | Which queries deliver value; tune low-CTR queries |
| Index size growth | Capacity planning |

Most of these engines can export metrics to Prometheus, directly or through an exporter, or provide managed dashboards (Algolia, in its UI). Wire them into your observability stack; see Prometheus & Grafana.
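
If your engine does not report some of these signals, several can be derived from your own query log. A sketch assuming a hypothetical log entry shape of `{ query, latencyMs, hits }`:

```javascript
// Derive p99 latency, zero-result rate, and top queries from logged searches.
// The entry shape { query, latencyMs, hits } is an assumption for illustration.
function summarizeQueryLog(entries) {
  const latencies = entries.map((e) => e.latencyMs).sort((a, b) => a - b);
  // Nearest-rank percentile: smallest value with at least 99% of samples at or below it.
  const p99Index = Math.max(0, Math.ceil(latencies.length * 0.99) - 1);

  const zeroResults = entries.filter((e) => e.hits === 0).length;

  const counts = new Map();
  for (const { query } of entries) {
    const key = query.trim().toLowerCase();
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  const topQueries = [...counts.entries()].sort((a, b) => b[1] - a[1]).slice(0, 10);

  return {
    p99LatencyMs: latencies.length ? latencies[p99Index] : null,
    zeroResultRate: entries.length ? zeroResults / entries.length : 0,
    topQueries, // [query, count] pairs, most frequent first
  };
}
```

Queries are normalized by trimming and lowercasing before counting, so "Espresso" and "espresso " land in the same bucket.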

Common Pitfalls

| Pitfall | Symptom | Fix |
| --- | --- | --- |
| Searching a normalized join | Slow, irrelevant | Denormalize at index time |
| Master key in browser | Hijacked account | Use search-only keys |
| All attributes searchable | Noisy results | Configure searchable attributes |
| No custom ranking | Defaults decide everything | Use popularity / business-value signals |
| Re-indexing in place during traffic | Half-state visible | Atomic swap pattern |
| Putting raw HTML in indexable text | Tags in results | Strip at index time |
| Logging full search queries with PII | Privacy issue | Hash or strip personal queries |
| One huge "everything" index | Slow, complex tuning | One index per searchable entity type |
| Indexing the full database row | Index larger than DB | Index only the fields needed to search, filter, rank, and render |
| No fallback for engine down | Site partially broken | Graceful degradation; cache top queries |
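
The last row, graceful degradation, can be sketched as a wrapper that remembers recent successful results and serves them when the engine call fails. `createResilientSearch` and the injected `search` function are hypothetical; the cache size of 100 is an arbitrary assumption:

```javascript
// Wrap an async search function with a small cache used only as a fallback.
// `search` is any caller-supplied async function (query) => results.
function createResilientSearch(search, maxEntries = 100) {
  const cache = new Map(); // insertion order doubles as recency order

  return async function resilientSearch(query) {
    const key = query.trim().toLowerCase();
    try {
      const results = await search(query);
      cache.delete(key);           // re-insert so this key becomes the most recent
      cache.set(key, results);
      if (cache.size > maxEntries) {
        cache.delete(cache.keys().next().value); // evict the least recent entry
      }
      return { results, stale: false };
    } catch (err) {
      // Engine is down: serve the last known results, or an empty list.
      return { results: cache.get(key) ?? [], stale: true };
    }
  };
}
```

The `stale` flag lets the UI show a notice such as "results may be out of date" instead of an error page.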

Search UX

Beyond the engine, the UI makes or breaks search:

  • Search as you type (instant search): the modern baseline.
  • Highlight matches in results.
  • "Did you mean..." for low-result queries.
  • Filters as facets — counts visible (Color: Red (43)).
  • Empty state — suggest popular searches, recent searches, categories.
  • Mobile-first — search is huge on mobile.
  • Track which results users click; feed that back to ranking.

InstantSearch.js (Algolia's UI library, usable with Meilisearch and Typesense through their adapters) gives you most of this out of the box.

Checklist

Production search checklist

  • Search-only API keys for the browser; admin keys backend-only
  • Searchable / filterable / sortable / displayable roles configured per attribute
  • Custom ranking attribute (popularity / stock / recency)
  • Synonyms list maintained from search analytics
  • Initial bulk index + streaming updates (CDC or outbox)
  • Atomic-swap re-indexing for breaking schema changes
  • Per-tenant scoped keys (HMAC-signed filters) for multi-tenant apps
  • PII / secrets not indexed
  • Rate limits on the search endpoint at the gateway
  • Query latency, zero-result rate, top queries monitored
  • InstantSearch.js or equivalent UI library
  • Empty-state UX with suggestions
  • Graceful degradation if search is down (cached top results)
  • Backups (self-host) or trust the SaaS SLA (Algolia)
  • Capacity planned for index growth
