Schema design, continuous aggregates, retention, downsampling, hot/cold tiers, joining with relational data

Patterns

The patterns that separate "TSDB barely working" from "TSDB humming."

Schema Design

The first decision: what's a tag (indexed, low-cardinality, queryable in WHERE) vs. a field (the actual value, not indexed)?

Element	TSDB term	Example
`service` (5 values)	Tag	`service="checkout"`
`region` (3 values)	Tag	`region="us-east-1"`
`endpoint` (~100 values)	Tag	`endpoint="/api/cart"`
`latency_ms`	Field (value)	`latency_ms=42`
`request_size_bytes`	Field	`request_size_bytes=1024`
`user_id` (1M values)	Avoid as tag — too high cardinality	(log it instead)
`request_id` (every request)	Never a tag	(trace it instead)

A wide table with a few low-cardinality tags + several numeric fields is the sweet spot. Adding tags is cheap (compression hides similar values); adding high-cardinality tags is expensive (each unique combination is a new series).

TimescaleDB

In Timescale you have full Postgres typing. Use it:

CREATE TABLE http_metrics (
  time          TIMESTAMPTZ NOT NULL,
  service       TEXT NOT NULL,
  endpoint      TEXT NOT NULL,
  region        TEXT NOT NULL,
  status_class  SMALLINT NOT NULL,   -- 2, 3, 4, 5
  latency_ms    DOUBLE PRECISION,
  bytes_sent    BIGINT
);

SELECT create_hypertable('http_metrics', 'time', chunk_time_interval => INTERVAL '1 day');

-- Composite index for typical queries
CREATE INDEX ON http_metrics (service, time DESC);
CREATE INDEX ON http_metrics (service, endpoint, time DESC);

Don't index columns you never query on; each index costs writes.

InfluxDB

http_metrics,service=checkout,endpoint=/cart,region=us-east-1,status_class=2 latency_ms=42,bytes_sent=1024

Tags are everything before the space; fields after. Same cardinality rules.

Continuous Aggregates / Downsampling

The single most important optimization: pre-compute the queries you'll run on big ranges.

Raw data at 1-second resolution × 1 year = ~31M points per series. A dashboard query over a year is slow no matter how good the storage is. The answer: maintain a 1-minute (or 1-hour) rollup automatically.

TimescaleDB

CREATE MATERIALIZED VIEW http_metrics_1m
WITH (timescaledb.continuous) AS
SELECT
  time_bucket('1 minute', time) AS bucket,
  service,
  endpoint,
  region,
  status_class,
  COUNT(*) AS requests,
  AVG(latency_ms) AS avg_latency,
  approx_percentile(0.99, percentile_agg(latency_ms)) AS p99_latency,
  SUM(bytes_sent) AS bytes
FROM http_metrics
GROUP BY bucket, service, endpoint, region, status_class;

SELECT add_continuous_aggregate_policy('http_metrics_1m',
  start_offset => INTERVAL '2 hours',
  end_offset => INTERVAL '1 minute',
  schedule_interval => INTERVAL '1 minute');

Now a "last 30 days" dashboard reads 43k points per series instead of 2.6B. Sub-second response on the same hardware.

For "last 1 year" you'd typically stack another aggregate:

CREATE MATERIALIZED VIEW http_metrics_1h WITH (timescaledb.continuous) AS
SELECT time_bucket('1 hour', bucket) AS bucket, ...
FROM http_metrics_1m
GROUP BY ...;

Grafana queries pick the resolution matching the time range.

Prometheus

Use recording rules:

groups:
- name: http_metrics_aggregations
  interval: 30s
  rules:
  - record: http:request_rate:5m
    expr: rate(http_requests_total[5m])
  - record: http:request_latency_p99:5m
    expr: histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (le, service))

Alerts and dashboards then query the pre-computed series.

Retention and Tiering

Old data shouldn't be paid for like new data. Common tiers:

0-7d:    1s raw     (hot, expensive storage)
7-30d:   1min agg   (warm)
30-365d: 1h agg     (cool)
> 1yr:   archived or dropped

TimescaleDB

-- Retention on raw
SELECT add_retention_policy('http_metrics', INTERVAL '7 days');

-- Compression on chunks older than 1 day
ALTER TABLE http_metrics SET (
  timescaledb.compress,
  timescaledb.compress_segmentby = 'service, endpoint',
  timescaledb.compress_orderby = 'time DESC'
);

SELECT add_compression_policy('http_metrics', INTERVAL '1 day');

Compression gives 10-30× space savings. Compressed chunks are still queryable, just slightly slower.

S3 tiered storage

Timescale's commercial tier and InfluxDB v3 both support tiered storage: hot chunks on local NVMe, warm on cheaper instance storage, cold on S3. Queries are transparent across tiers.

-- TimescaleDB Cloud: tier old chunks to S3
SELECT add_tier_policy('http_metrics', INTERVAL '30 days');

Joining Time-Series with Relational Data

A common need: a service incident report that joins per-second metrics with the user accounts affected.

In TimescaleDB this is free — it's Postgres:

SELECT
  m.time, m.endpoint, m.latency_ms, u.tier, u.region
FROM http_metrics_1m m
JOIN users u ON u.id = m.user_id
WHERE m.time > NOW() - INTERVAL '1 hour'
  AND u.tier = 'enterprise';

In InfluxDB / Prometheus this is painful: you'd export to a relational store first. If your queries routinely join business data with metrics, TimescaleDB's SQL is a big win.

Gap Filling and Interpolation

A device misses a heartbeat; you still want a smooth graph. TimescaleDB:

SELECT
  time_bucket_gapfill('5 minutes', time) AS bucket,
  device_id,
  AVG(temperature) AS temp,
  interpolate(AVG(temperature)) AS temp_interpolated,
  locf(AVG(temperature)) AS temp_last_observed
FROM sensors
WHERE time BETWEEN NOW() - INTERVAL '24 hours' AND NOW()
GROUP BY bucket, device_id;

InfluxDB Flux has aggregateWindow(createEmpty: true) and fill().

Late Arrivals

Sensors disconnect, batch dumps arrive hours later. The TSDB needs to handle out-of-order writes:

TimescaleDB: supports out-of-order writes within the current uncompressed chunk. If you write into a compressed chunk you'll get an error unless you allow decompression-on-write.
InfluxDB v3: handles late arrivals; reorganization happens in background.
Prometheus: by design, doesn't accept out-of-order. For late data use remote write or push via Pushgateway with care.

If late writes are frequent, keep chunks/blocks uncompressed for the late window (e.g., 6 hours), compress after.

Multi-Tenant / Per-Customer Series

A B2B SaaS that wants per-customer metrics has a cardinality problem. Options:

Tag with customer_id — works up to thousands of customers; degrades past tens of thousands.
Per-customer database / bucket — InfluxDB orgs / buckets; physical isolation.
Aggregate per-customer in app, send only aggregates — best at huge scale.
Use logs / traces for per-customer events, time-series only for global metrics.

The right answer depends on customer count and metric volume per customer.

Alerting from TSDB Data

Common pattern: continuous aggregate → Grafana alert → PagerDuty.

# Grafana alert rule on TimescaleDB
- alert: P99 latency above 500ms
  expr: |
    SELECT p99_latency FROM http_metrics_1m
    WHERE service = 'checkout' AND time > now() - INTERVAL '5 minutes'
    GROUP BY service
    HAVING max(p99_latency) > 500
  for: 5m

For Prometheus the same idea is in alertmanager:

- alert: HighP99Latency
  expr: histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (le, service)) > 0.5
  for: 5m

Tip: alert on aggregates, not raw data. Raw is noisy; 5-min P99 is the actual SLO question.

Anti-Patterns

Schema with unbounded tag cardinality. user_id, request_id, session_id as tags will explode your TSDB. Put them in logs or traces.

One big table forever. No retention, no compression, no aggregates. You'll be migrating in 18 months under emergency.

Querying raw data on 1-year dashboards. No matter how good the database, scanning 2B points is slow. Build the aggregate.

No backup strategy. Time-series data is often the operational record. Lose it, lose the incident reconstruction.

Treating time-series like OLTP. Updates and deletes on time-series data are slow and break compression. Append-only is the model.

What's Next

Best Practices — capacity, HA, backups, monitoring, common pitfalls

Patterns

On this page