Multi-Tenancy
Tenancy models (silo/pool/bridge), data isolation strategies, tenant context propagation, noisy neighbors, and per-tenant configuration and limits
A multi-tenant system serves many customers (tenants) from one deployment. Each tenant believes they have the application to themselves; in reality they share code, and often share databases and servers. The central engineering problem is isolation: tenant A must never see, touch, or starve tenant B — and the cost of guaranteeing that scales with how much you share.
This is an application-code concern first. The hard part is not provisioning servers; it is making sure every query, every cache key, and every log line carries the right tenant scope, automatically, so that one forgotten WHERE tenant_id = ? does not leak another customer's data.
Tenancy Models
There is a spectrum from "one stack per tenant" to "everything shared," usually named silo, bridge, and pool.
SILO                    BRIDGE                         POOL
(isolated)              (hybrid)                       (shared)

Tenant A → Stack A      Tenant A ┐                     Tenant A ┐
Tenant B → Stack B      Tenant B ┼→ Shared app         Tenant B ┼→ Shared app
Tenant C → Stack C      Tenant C ┘  + separate         Tenant C ┘  + shared DB
                                    DB per tenant                  (tenant_id column)

most isolation ─────────────────────────────────────→ most efficiency
most cost      ←───────────────────────────────────── least cost

| Model | Isolation | Cost per tenant | Blast radius | Best for |
|---|---|---|---|---|
| Silo | Strongest — separate everything | Highest | One tenant | Regulated/enterprise; few large tenants |
| Bridge | Strong data isolation, shared compute | Medium | One tenant's data | Mixed customer sizes |
| Pool | Logical only (enforced in code) | Lowest | All tenants (if a bug leaks) | Many small tenants; SaaS at scale |
Most SaaS products start pool (cheapest, fastest to ship) and peel high-value or compliance-bound tenants out into silo later. You can run different tiers simultaneously — a "tiered" or bridge approach where the same code path serves both.
Data Isolation Strategies
This is where the model becomes concrete. Three implementations, increasing in sharing:
1. Database per Tenant (Silo)
Each tenant gets a physically separate database. Routing picks the connection based on tenant.
function dbForTenant(tenantId: string): Pool {
  const connString = tenantRegistry.lookup(tenantId).databaseUrl;
  return getPool(connString); // pooled per tenant
}

- Pro: Total isolation. A bad query cannot cross tenants. Easy per-tenant backup, restore, and even region placement.
- Con: Hundreds of databases to migrate, monitor, and connection-pool. Migrations become a fan-out job. Connection count explodes.
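
One way to keep the connection count bounded is to cap how many tenant pools stay open at once. The sketch below is an illustration under assumptions, not the implementation behind getPool: `ClosablePool`, `TenantPoolCache`, and the `createPool` factory are hypothetical names, and a real version would drain in-flight queries before closing an evicted pool.

```typescript
// Hypothetical stand-in for a real driver's connection pool object.
interface ClosablePool {
  readonly url: string;
  close(): void;
}

// Keeps at most `maxPools` tenant pools open, evicting the least recently used.
class TenantPoolCache {
  // A Map iterates in insertion order, so its first key is the least recently used.
  private pools = new Map<string, ClosablePool>();
  private maxPools: number;
  private createPool: (url: string) => ClosablePool;

  constructor(maxPools: number, createPool: (url: string) => ClosablePool) {
    if (maxPools < 1) throw new Error("maxPools must be at least 1");
    this.maxPools = maxPools;
    this.createPool = createPool;
  }

  get(url: string): ClosablePool {
    const existing = this.pools.get(url);
    if (existing) {
      // Re-insert to mark this pool as most recently used.
      this.pools.delete(url);
      this.pools.set(url, existing);
      return existing;
    }
    if (this.pools.size >= this.maxPools) {
      const oldestKey = this.pools.keys().next().value as string;
      this.pools.get(oldestKey)!.close(); // release that tenant's connections
      this.pools.delete(oldestKey);
    }
    const created = this.createPool(url);
    this.pools.set(url, created);
    return created;
  }

  get size(): number {
    return this.pools.size;
  }
}
```

The trade-off is latency for cold tenants: a tenant whose pool was evicted pays the connection setup cost again on its next request.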
2. Schema per Tenant (Bridge)
One database, one schema (namespace) per tenant. Set the search path per request.
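-- Postgres: switch the active schema for this connection
SET search_path TO tenant_8f2a;
SELECT * FROM orders; -- resolves to tenant_8f2a.orders

- Pro: Strong logical isolation, one database to operate. Cheaper than database-per-tenant.
- Con: Migrations still fan out across schemas. Thousands of schemas strain the catalog. The SET search_path must be bulletproof: forget it and you query the wrong (or public) schema. On pooled connections a session-level SET also outlives the request, so the next tenant to borrow that connection inherits the previous tenant's schema unless the path is reset or set with SET LOCAL inside a transaction.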
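
Because the schema name ends up inside the SQL text, it is worth validating before it gets there. The following is a minimal sketch, assuming a naming scheme of "tenant_" plus a lowercase hex id (as in tenant_8f2a); `schemaForTenant` and `searchPathStatement` are hypothetical helpers, not part of any library.

```typescript
// Assumed naming scheme: schema is "tenant_" + lowercase hex id, e.g. tenant_8f2a.
const TENANT_ID_PATTERN = /^[0-9a-f]{1,32}$/;

function schemaForTenant(tenantId: string): string {
  // Fail closed: anything that does not look like a tenant id is rejected.
  if (!TENANT_ID_PATTERN.test(tenantId)) {
    throw new Error(`Invalid tenant id: ${JSON.stringify(tenantId)}`);
  }
  return `tenant_${tenantId}`;
}

// The identifier is interpolated into the statement text, so the validation
// above is what keeps this safe. SET LOCAL scopes the change to the current
// transaction, and leaving "public" off the path means a table missing from
// the tenant schema raises an error instead of silently resolving elsewhere.
function searchPathStatement(tenantId: string): string {
  return `SET LOCAL search_path TO "${schemaForTenant(tenantId)}"`;
}
```

For example, `searchPathStatement("8f2a")` yields `SET LOCAL search_path TO "tenant_8f2a"`, while an id containing quotes or semicolons throws before any SQL is built.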
3. Shared Table + tenant_id (Pool)
All tenants share tables; every row carries a tenant_id. Cheapest, and the one with the sharpest knife.
CREATE TABLE orders (
  id          uuid PRIMARY KEY,
  tenant_id   uuid NOT NULL,
  customer_id uuid NOT NULL,
  total_cents bigint NOT NULL,
  created_at  timestamptz NOT NULL DEFAULT now()
);
CREATE INDEX ON orders (tenant_id, created_at); -- tenant_id leads EVERY index

The danger is obvious: a single query missing WHERE tenant_id = ? leaks or corrupts across tenants. You cannot rely on developers remembering. Two layers of defense:
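- Row-Level Security (RLS): let the database enforce the filter so application bugs cannot bypass it:

ALTER TABLE orders ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON orders
  USING (tenant_id = current_setting('app.tenant_id')::uuid);
-- Per request, inside a transaction (SET LOCAL has no effect outside one), set the current tenant:
SET LOCAL app.tenant_id = '8f2a...';
-- Now EVERY query on orders is automatically scoped. A forgotten WHERE is safe.

- A repository/ORM layer that injects the filter, so no raw query escapes without scoping.

RLS is the stronger guarantee because it holds even when someone bypasses the ORM with raw SQL. In Postgres it only binds roles that are subject to policies: superusers and roles with BYPASSRLS skip them, and so does the table owner unless the table is set to FORCE ROW LEVEL SECURITY, so the application should connect as a separate, unprivileged role.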
If you choose shared-table pooling, treat RLS (or an equivalent unbypassable filter) as non-optional. The convenience of pooling is exactly what makes a single missing predicate catastrophic.
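
On the application side, the per-transaction tenant setting can be wrapped once so no handler has to remember it. This is a sketch under assumptions: `SqlClient` is a hypothetical minimal interface (real drivers expose something similar), `withTenant` is an illustrative name, and the Postgres function `set_config(name, value, is_local)` is used in place of SET LOCAL because it accepts a bind parameter.

```typescript
// Minimal client shape assumed for this sketch.
interface SqlClient {
  query(sql: string, params?: unknown[]): Promise<unknown>;
}

// Runs `work` inside a transaction whose app.tenant_id is bound to `tenantId`.
async function withTenant<T>(
  client: SqlClient,
  tenantId: string,
  work: (client: SqlClient) => Promise<T>,
): Promise<T> {
  if (!tenantId) throw new Error("No tenant id: refusing to open a transaction");
  await client.query("BEGIN");
  try {
    // is_local = true limits the setting to this transaction, like SET LOCAL.
    await client.query("SELECT set_config('app.tenant_id', $1, true)", [tenantId]);
    const result = await work(client);
    await client.query("COMMIT");
    return result;
  } catch (err) {
    // The setting disappears with the transaction, so nothing leaks to the next user.
    await client.query("ROLLBACK");
    throw err;
  }
}
```

A handler would then call something like `withTenant(client, currentTenant(), (c) => c.query("SELECT * FROM orders"))`, and the policy does the scoping.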
Tenant Context Propagation
Every request runs in the context of one tenant. That context has to flow from the edge all the way to the query — without being passed as an explicit argument through forty function calls.
Establish It at the Edge
Resolve the tenant once, from a trustworthy source, in middleware:
function tenantMiddleware(req, res, next) {
  // Source of truth, in order of trust:
  //   1. A claim in the validated JWT  (best: signed, can't be forged)
  //   2. A subdomain (acme.app.com)    (ok, but verify the user belongs to it)
  //   3. A header / path param         (only if independently authorized)
  // Optional chaining keeps an unauthenticated request on the 403 path instead of throwing.
  const tenantId = req.auth?.claims?.tenant_id;
  if (!tenantId) return res.status(403).json({ error: { code: 'NO_TENANT' } });
  tenantContext.run({ tenantId }, () => next()); // bind for the rest of the request
}

Derive the tenant from the authenticated identity, not from client-supplied input you haven't checked. An X-Tenant-Id header the client sets freely is an IDOR (insecure direct object reference) waiting to happen: a user from tenant A sends tenant B's id and reads tenant B's data.
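
For the subdomain case, "verify the user belongs to it" might look like the sketch below. It assumes the validated token carries a list of tenant memberships; `AuthClaims`, `resolveTenant`, and the `TENANT_MISMATCH` code are hypothetical names introduced for illustration.

```typescript
// Hypothetical shape of claims taken from an already-validated token.
interface AuthClaims {
  userId: string;
  tenantIds: string[]; // tenants this user is a member of
}

type TenantResolution =
  | { ok: true; tenantId: string }
  | { ok: false; code: "NO_TENANT" | "TENANT_MISMATCH" };

// Reads the tenant from the subdomain, but accepts it only when the
// authenticated user's claims list that tenant.
function resolveTenant(claims: AuthClaims, hostname: string, baseDomain: string): TenantResolution {
  const suffix = "." + baseDomain;
  if (!hostname.endsWith(suffix)) return { ok: false, code: "NO_TENANT" };

  const requested = hostname.slice(0, -suffix.length);
  // Reject empty or nested subdomains rather than guessing.
  if (requested === "" || requested.includes(".")) return { ok: false, code: "NO_TENANT" };

  if (!claims.tenantIds.includes(requested)) return { ok: false, code: "TENANT_MISMATCH" };
  return { ok: true, tenantId: requested };
}
```

The hostname only nominates a tenant; the signed claims decide whether the request may act as it.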
Carry It Implicitly
Use the runtime's request-scoped storage so deep code can read the tenant without it being threaded through every signature:
import { AsyncLocalStorage } from 'node:async_hooks';

export const tenantContext = new AsyncLocalStorage<{ tenantId: string }>();

// Deep in a repository, no tenantId parameter needed:
function currentTenant(): string {
  const ctx = tenantContext.getStore();
  if (!ctx) throw new Error('No tenant in context'); // fail closed, never default to "all"
  return ctx.tenantId;
}

Equivalents: Python contextvars, Go context.Context (passed explicitly, which is Go's idiom), Java ThreadLocal / Micrometer context. The rule everywhere: fail closed. Missing tenant context is an error, never "show everything."
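
To see why this is safe under concurrency, here is a small self-contained demonstration. It repeats the definitions so it runs on its own; the `handle` function and its delays are made up to simulate two overlapping requests.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

const tenantContext = new AsyncLocalStorage<{ tenantId: string }>();

function currentTenant(): string {
  const ctx = tenantContext.getStore();
  if (!ctx) throw new Error("No tenant in context"); // fail closed
  return ctx.tenantId;
}

// Simulates a request handler that awaits I/O before reading the tenant.
async function handle(tenantId: string, delayMs: number): Promise<string> {
  return tenantContext.run({ tenantId }, async () => {
    await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    return currentTenant(); // still bound to this request after the await
  });
}
```

Running `handle("tenant-a", 20)` and `handle("tenant-b", 5)` concurrently returns each request's own tenant even though their awaits interleave, and calling `currentTenant()` outside any `run` throws instead of falling back to a default.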
Don't Forget the Other Caches
Tenant scope leaks through everything that holds data, not just the primary database:
- Cache keys must include the tenant: cache.get(tenantId + ":user:" + id). A shared key serves tenant A's data to tenant B.
- Background jobs lose request context: pass tenantId in the job payload and re-establish context in the worker.
- Logs and traces should carry tenant_id as a structured field for debugging and per-tenant analysis.
- Search indexes / object storage need a tenant prefix or filter, same as the database.
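
The cache-key rule is easiest to keep when callers cannot build keys by hand. A minimal in-memory sketch follows; `TenantScopedCache` is a hypothetical wrapper, and the length prefix on the tenant id is an added precaution for ids that might contain the separator.

```typescript
// Cache wrapper that refuses access without a tenant and namespaces every key.
class TenantScopedCache<V> {
  private store = new Map<string, V>();

  private prefix(tenantId: string): string {
    if (!tenantId) throw new Error("tenantId is required for cache access");
    // Length-prefix the tenant id so ("a", "b:c") and ("a:b", "c") cannot collide.
    return `${tenantId.length}:${tenantId}:`;
  }

  get(tenantId: string, key: string): V | undefined {
    return this.store.get(this.prefix(tenantId) + key);
  }

  set(tenantId: string, key: string, value: V): void {
    this.store.set(this.prefix(tenantId) + key, value);
  }

  // Drops every entry for one tenant (for example on offboarding); returns the count removed.
  invalidateTenant(tenantId: string): number {
    const prefix = this.prefix(tenantId);
    let removed = 0;
    for (const key of Array.from(this.store.keys())) {
      if (key.startsWith(prefix)) {
        this.store.delete(key);
        removed++;
      }
    }
    return removed;
  }
}
```

The same idea carries over to job payloads and object-storage paths: the tenant is a required argument of the API, not something a caller may optionally concatenate.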
Noisy Neighbors
In pooled models, tenants share finite resources: CPU, connection pool slots, queue capacity, rate-limit budget. One heavy tenant — a bulk import, a runaway report, an abusive API client — can degrade everyone else. This is the noisy-neighbor problem.
Defenses, applied per tenant rather than globally:
| Mechanism | What it bounds | Notes |
|---|---|---|
| Per-tenant rate limits | Requests/sec per tenant | Reuse the token-bucket from Resilience, keyed by tenantId |
| Per-tenant concurrency caps | In-flight expensive ops | A bulkhead per tenant stops one from eating the pool |
| Query cost / row limits | Runaway reads | Cap result-set size; statement timeouts |
| Fair queuing | Background job hogging | Round-robin or weighted scheduling across tenants, not pure FIFO |
| Tier-based quotas | Plan-level fairness | Free tier gets less; enterprise gets reserved capacity |
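
As an illustration of the fair-queuing row, the sketch below serves one job per tenant in rotation instead of pure FIFO. `Job` and `RoundRobinQueue` are hypothetical names; weighting by plan tier is left out to keep it short.

```typescript
interface Job {
  tenantId: string;
  name: string;
}

// One FIFO per tenant, served in rotation so a large backlog from one
// tenant cannot delay everyone queued behind it.
class RoundRobinQueue {
  private queues = new Map<string, Job[]>();
  private order: string[] = []; // tenants with pending jobs, in service order

  enqueue(job: Job): void {
    let queue = this.queues.get(job.tenantId);
    if (!queue) {
      queue = [];
      this.queues.set(job.tenantId, queue);
      this.order.push(job.tenantId);
    }
    queue.push(job);
  }

  dequeue(): Job | undefined {
    const tenantId = this.order.shift();
    if (tenantId === undefined) return undefined;

    const queue = this.queues.get(tenantId)!;
    const job = queue.shift()!;
    if (queue.length > 0) {
      this.order.push(tenantId); // more work pending: go to the back of the rotation
    } else {
      this.queues.delete(tenantId);
    }
    return job;
  }
}
```

If tenant A enqueues three jobs and tenant B then enqueues one, B's job is served second rather than fourth.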
// Per-tenant rate limiting: each tenant gets an independent bucket
const tenantLimiter = new Map<string, TokenBucket>();

function limiterFor(tenantId: string): TokenBucket {
  if (!tenantLimiter.has(tenantId)) {
    tenantLimiter.set(tenantId, new TokenBucket(/* capacity */ 100, /* refill */ 10));
  }
  return tenantLimiter.get(tenantId)!;
}

The structural fix for chronic noisy neighbors is to promote the heavy tenant to a silo: give them their own database or stack so their load cannot touch anyone else. That is the whole point of the silo/pool spectrum: it is a dial you turn per tenant as their value and demands grow.
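
For readers without the Resilience chapter at hand, a stand-in `TokenBucket` with the same `(capacity, refill)` constructor shape could look like this. It is an assumption-laden sketch, not that chapter's implementation: the refill rate is taken to be tokens per second, `tryRemove` is a hypothetical method name, and the clock is injectable so the behavior can be tested.

```typescript
class TokenBucket {
  private capacity: number;
  private refillPerSec: number;
  private now: () => number;
  private tokens: number;
  private lastRefillMs: number;

  // `now` returns milliseconds; it defaults to the wall clock.
  constructor(capacity: number, refillPerSec: number, now: () => number = Date.now) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.now = now;
    this.tokens = capacity; // start full so a quiet tenant can burst
    this.lastRefillMs = now();
  }

  // Returns true and consumes tokens if the request fits the budget.
  tryRemove(count: number = 1): boolean {
    const t = this.now();
    const elapsedSec = (t - this.lastRefillMs) / 1000;
    // Refill lazily based on elapsed time, never above capacity.
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefillMs = t;

    if (this.tokens < count) return false;
    this.tokens -= count;
    return true;
  }
}
```

Because each tenant owns its bucket, one tenant exhausting its tokens has no effect on another tenant's budget. A long-running process would also want to evict idle buckets so the map does not grow with every tenant ever seen.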
Per-Tenant Configuration & Limits
Tenants are not identical. They have different feature flags, branding, quotas, and integrations. Resolve this configuration through the same tenant context, with a sensible default-and-override layering:
interface TenantConfig {
  features: { advancedReports: boolean; sso: boolean };
  limits: { maxUsers: number; maxStorageGb: number; apiRateLimit: number };
  branding: { logoUrl?: string; primaryColor?: string };
}

function resolveConfig(tenantId: string): TenantConfig {
  // Look up the tenant record to learn its plan (assumes the registry record carries it)
  const tenant = tenantRegistry.lookup(tenantId);
  // Layer: global defaults < plan tier defaults < per-tenant overrides
  return merge(GLOBAL_DEFAULTS, planDefaults(tenant.plan), tenantOverrides(tenantId));
}

Enforce limits where the resource is consumed, and return a clear, tenant-aware error when a tenant hits a quota (403 with a code like PLAN_LIMIT_EXCEEDED, not a generic failure). Cache resolved config, but remember to scope and invalidate the cache per tenant.
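
The layering only works if nested objects are merged key by key rather than replaced wholesale. Below is one possible sketch of that merge plus a limit check; `deepMerge`, `resolveLayers`, `checkCanAddUser`, and the `DeepPartial` type are hypothetical helpers, and the interface is repeated so the block runs on its own.

```typescript
interface TenantConfig {
  features: { advancedReports: boolean; sso: boolean };
  limits: { maxUsers: number; maxStorageGb: number; apiRateLimit: number };
  branding: { logoUrl?: string; primaryColor?: string };
}

// Every field optional at every depth, so a layer states only what it changes.
type DeepPartial<T> = { [K in keyof T]?: T[K] extends object ? DeepPartial<T[K]> : T[K] };
type Dict = Record<string, unknown>;

function isPlainObject(value: unknown): value is Dict {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Later values win; nested objects are merged recursively; undefined never erases a lower layer.
function deepMerge(base: Dict, override: Dict): Dict {
  const out: Dict = { ...base };
  for (const key of Object.keys(override)) {
    const next = override[key];
    if (next === undefined) continue;
    const prev = out[key];
    out[key] = isPlainObject(prev) && isPlainObject(next) ? deepMerge(prev, next) : next;
  }
  return out;
}

// Applies layers left to right: defaults < plan tier < tenant overrides.
function resolveLayers(defaults: TenantConfig, ...layers: DeepPartial<TenantConfig>[]): TenantConfig {
  let merged = defaults as unknown as Dict;
  for (const layer of layers) {
    merged = deepMerge(merged, layer as unknown as Dict);
  }
  return merged as unknown as TenantConfig;
}

type LimitCheck =
  | { ok: true }
  | { ok: false; status: 403; code: "PLAN_LIMIT_EXCEEDED"; message: string };

// Checked at the point of consumption: right before a user would be created.
function checkCanAddUser(config: TenantConfig, currentUsers: number): LimitCheck {
  if (currentUsers < config.limits.maxUsers) return { ok: true };
  return {
    ok: false,
    status: 403,
    code: "PLAN_LIMIT_EXCEEDED",
    message: `User limit of ${config.limits.maxUsers} reached for this plan`,
  };
}
```

A plan layer of `{ limits: { maxUsers: 50 } }` raises only that one limit and leaves maxStorageGb and apiRateLimit at their defaults, which a shallow spread would not do.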
Decision Tree
Regulated data, or a few large enterprise tenants?
→ Silo (database/stack per tenant). Pay for isolation.
Many small tenants, cost-sensitive, shipping fast?
→ Pool (shared tables + tenant_id) WITH row-level security enforced in the DB.
Mixed: most tenants small, a few demand isolation?
→ Bridge / tiered. Pool the small ones, silo the big ones. Same codebase.
Resolving which tenant a request belongs to?
→ From the signed JWT/identity, never an unverified client header. Fail closed.
One tenant degrading others?
→ Per-tenant rate limits + bulkheads first; promote chronic offenders to a silo.

Checklist
- The isolation model (silo/bridge/pool) is a deliberate choice, not an accident of the first table you created.
- If pooled, the database enforces tenant scope (RLS or an equivalent unbypassable filter) — not just application code.
- tenant_id leads every index on shared tables.
- Tenant is resolved from authenticated identity, never unverified client input.
- Tenant context propagates implicitly (request-scoped storage) and fails closed when missing.
- Cache keys, background-job payloads, logs, and search indexes all carry tenant scope.
- Per-tenant rate limits and concurrency caps protect against noisy neighbors.
- Per-tenant config layers global defaults → plan tier → tenant overrides, and limit violations return a clear, tenant-aware error.