Instrumentation
Metric types, what each is for, and how to instrument a real application
A scraping stack is only as useful as the metrics your apps emit. This page covers the four metric types, when to use which, and a complete instrumentation example.
The Four Metric Types
| Type | Behavior | Example | Use for |
|---|---|---|---|
| Counter | Up only (resets on restart) | http_requests_total, errors_total | Counting events |
| Gauge | Up and down | temperature, active_connections, memory_usage_bytes | Current value of something |
| Histogram | Buckets + count + sum | http_request_duration_seconds | Distributions, percentiles |
| Summary | Quantiles pre-computed client-side | request_duration_quantile | Exact per-instance quantiles when you don't need to aggregate across instances |
A quick way to choose:
- Counting things — Counter.
- Measuring a current level — Gauge.
- "How long did each request take" / "How big was each response" — Histogram.
Prefer Histogram over Summary in almost every case — histograms aggregate across instances cleanly; summaries don't.
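To illustrate why histograms aggregate cleanly, here is a minimal sketch: cumulative bucket counts from several instances can be summed per `le` boundary, and the result is still a valid histogram for the whole fleet. The instance data and the `mergeBuckets` helper are hypothetical, made up for this example.

```javascript
// Hypothetical cumulative bucket counts from two instances of one service.
// Keys are `le` upper bounds; values are cumulative observation counts.
const instanceA = { '0.1': 90, '0.5': 99, '+Inf': 100 };
const instanceB = { '0.1': 40, '0.5': 80, '+Inf': 100 };

// Sum the counts per bucket boundary. This is only meaningful when every
// instance uses the same bucket layout.
function mergeBuckets(...histograms) {
  const merged = {};
  for (const histogram of histograms) {
    for (const [le, count] of Object.entries(histogram)) {
      merged[le] = (merged[le] ?? 0) + count;
    }
  }
  return merged;
}

const fleet = mergeBuckets(instanceA, instanceB);
console.log(fleet['0.1'], fleet['0.5'], fleet['+Inf']); // 130 179 200
```

A summary has no equivalent operation: each instance publishes its own quantile values, and two per-instance p99 values cannot be combined into a correct fleet-wide p99.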
What to Measure: USE and RED
Two mental models worth keeping:
USE (for every resource):
- Utilization — % time the resource was busy
- Saturation — degree of overload / queue depth
- Errors — error count
RED (for every request-driven service):
- Rate — requests/sec
- Errors — failed requests/sec
- Duration — latency distribution
Plus Google SRE's Four Golden Signals: Latency, Traffic, Errors, Saturation.
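Rate and Errors in RED are both derived from counters, which only go up and reset on restart. The sketch below shows the basic idea of turning raw counter samples into a per-second rate while tolerating a reset. It is a simplified illustration with hypothetical sample data and helper names, not the exact algorithm Prometheus uses for `rate()`.

```javascript
// Hypothetical scraped samples of http_requests_total: [unixSeconds, value].
// The drop from 160 to 10 represents a process restart (counter reset).
const samples = [
  [0, 100],
  [15, 130],
  [30, 160],
  [45, 10],
  [60, 40],
];

// Total increase over the window, treating any drop as a reset to zero.
function increase(points) {
  let total = 0;
  for (let i = 1; i < points.length; i++) {
    const prev = points[i - 1][1];
    const curr = points[i][1];
    total += curr >= prev ? curr - prev : curr;
  }
  return total;
}

// Average per-second rate across the sampled window.
function perSecondRate(points) {
  if (points.length < 2) return 0;
  const elapsed = points[points.length - 1][0] - points[0][0];
  return elapsed > 0 ? increase(points) / elapsed : 0;
}

console.log(increase(samples)); // 100
console.log(perSecondRate(samples).toFixed(2)); // 1.67
```

Applying the same calculation to an error counter and dividing by the request rate gives an error ratio, which is usually more useful on a dashboard than a raw error count.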
Instrumenting a Node.js App
Using the prom-client library:
const express = require('express');
const client = require('prom-client');
const app = express();
// Default metrics: CPU, memory, event loop, GC, fd count
client.collectDefaultMetrics({ prefix: 'app_' });
// Counter: total HTTP requests
const httpRequestsTotal = new client.Counter({
  name: 'http_requests_total',
  help: 'Total number of HTTP requests',
  labelNames: ['method', 'path', 'status'],
});
// Histogram: request duration with sensible buckets
const httpRequestDuration = new client.Histogram({
  name: 'http_request_duration_seconds',
  help: 'HTTP request duration in seconds',
  labelNames: ['method', 'path'],
  buckets: [0.01, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10],
});
// Gauge: requests currently in flight
const activeConnections = new client.Gauge({
  name: 'active_connections',
  help: 'Number of active connections',
});
// Middleware records every request
app.use((req, res, next) => {
  // Start timing now, but attach labels at the end: req.route is only
  // populated after Express has matched a route, so it is undefined here.
  const endTimer = httpRequestDuration.startTimer();
  activeConnections.inc();
  res.on('finish', () => {
    // Use the route template, not the raw URL, to keep cardinality bounded.
    // Requests that matched no route share a single 'unmatched' value.
    const path = req.route?.path ?? 'unmatched';
    httpRequestsTotal.inc({ method: req.method, path, status: res.statusCode });
    activeConnections.dec();
    endTimer({ method: req.method, path });
  });
  next();
});
// The endpoint Prometheus scrapes
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', client.register.contentType);
  res.end(await client.register.metrics());
});
app.listen(3000);
Hit /metrics and you'll see:
# HELP http_requests_total Total number of HTTP requests
# TYPE http_requests_total counter
http_requests_total{method="GET",path="/health",status="200"} 1432
http_requests_total{method="POST",path="/users",status="201"} 87
# HELP http_request_duration_seconds HTTP request duration in seconds
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{method="GET",path="/health",le="0.01"} 1420
http_request_duration_seconds_bucket{method="GET",path="/health",le="0.05"} 1432
http_request_duration_seconds_bucket{method="GET",path="/health",le="+Inf"} 1432
http_request_duration_seconds_sum{method="GET",path="/health"} 12.3
http_request_duration_seconds_count{method="GET",path="/health"} 1432
Notice how the histogram becomes three kinds of series: _bucket (cumulative counts at each le boundary), _sum, and _count. PromQL's histogram_quantile reads the _bucket series; _sum and _count give you the average duration and the request count.
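To make the cumulative buckets concrete, here is a simplified sketch of the kind of estimate that can be made from the `_bucket` series: find the bucket containing the target rank, then interpolate linearly inside it. The bucket counts are copied from the sample output; `estimateQuantile` is a hypothetical helper written for this page, and it only approximates what `histogram_quantile` does.

```javascript
// Cumulative buckets from the sample /metrics output, sorted by upper
// bound with +Inf last.
const buckets = [
  { le: 0.01, count: 1420 },
  { le: 0.05, count: 1432 },
  { le: Infinity, count: 1432 },
];

// Estimate quantile q (between 0 and 1) by linear interpolation inside
// the first bucket whose cumulative count reaches the target rank.
function estimateQuantile(q, sorted) {
  const total = sorted[sorted.length - 1].count;
  if (total === 0) return NaN;
  const rank = q * total;
  const i = sorted.findIndex((b) => b.count >= rank);
  // Rank falls in the +Inf bucket: report the highest finite bound.
  if (sorted[i].le === Infinity) return i > 0 ? sorted[i - 1].le : NaN;
  // The lowest bucket is assumed to start at zero.
  const lower = i === 0 ? 0 : sorted[i - 1].le;
  const below = i === 0 ? 0 : sorted[i - 1].count;
  const inBucket = sorted[i].count - below;
  if (inBucket === 0) return sorted[i].le;
  return lower + (sorted[i].le - lower) * ((rank - below) / inBucket);
}

console.log(estimateQuantile(0.5, buckets).toFixed(4)); // 0.0050
console.log(estimateQuantile(0.99, buckets).toFixed(4)); // 0.0100

// Mean duration comes from _sum / _count, no buckets needed.
console.log((12.3 / 1432).toFixed(4)); // 0.0086
```

The estimate can never be more precise than the bucket boundaries: here, almost every request lands in the first bucket, so any quantile below about 0.99 is interpolated between 0 and 10 ms.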
Histogram Buckets — Pick Carefully
Buckets are fixed when the metric is registered. You can change them in a later deploy, but data recorded before the change keeps the old boundaries, so queries that span the change mix two layouts. Pick buckets that span the latency range you care about:
// Good for typical web APIs: 10ms ... 10s
buckets: [0.01, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10]
// Good for microservices RPCs: 1ms ... 1s
buckets: [0.001, 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1]
// Good for batch jobs: seconds to minutes
buckets: [1, 5, 10, 30, 60, 300, 600, 1800]
A budget of 10-12 buckets is plenty. Every bucket multiplied by every label combination is its own time series, so the total grows quickly as you add buckets or labels.
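The arithmetic is worth doing before you ship a new histogram. A minimal sketch, assuming the worst case where every label combination actually occurs; `histogramSeries` and the label counts are hypothetical and exist only for this example.

```javascript
// Upper bound on the series produced by one histogram: one per explicit
// bucket, one for the implicit +Inf bucket, plus _sum and _count, for
// every label combination.
function histogramSeries(bucketCount, labelCardinalities) {
  const combos = labelCardinalities.reduce((acc, n) => acc * n, 1);
  return (bucketCount + 1 + 2) * combos;
}

// Hypothetical service: 9 explicit buckets, 5 methods, 20 route templates.
console.log(histogramSeries(9, [5, 20])); // 1200

// Adding a hypothetical status label with 10 values multiplies it again.
console.log(histogramSeries(9, [5, 20, 10])); // 12000
```

Real numbers are usually lower, because not every method is used with every route, but the multiplication shows which label or bucket choice dominates the total.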
Labels: Power and Pitfall
Labels turn one metric into a multi-dimensional dataset. They also explode storage if misused.
Use labels for dimensions with a small, bounded set of values:
| Good labels | Bad labels |
|---|---|
| method (GET, POST, ...) | user_id (millions of users) |
| status (200, 404, 500, ...) | request_id (one per request) |
| path (route templates, not raw URLs) | Full URL with query params |
| region, environment, pod | Timestamps, error messages |
Rule of thumb: keep the total number of active series for a single metric under roughly 10,000. Beyond that you're heading for cardinality trouble.
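When a route template is not available (for example in a proxy or a generic handler), raw URLs can be collapsed into a bounded set of values before they are used as a label. This is a minimal sketch under assumed URL conventions: `normalizePath` is a hypothetical helper, and the numeric-ID and UUID patterns are illustrative and should be adapted to your own URL scheme.

```javascript
const UUID_PATTERN =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Collapse a raw request URL into a low-cardinality label value by
// dropping the query string and replacing ID-like segments.
function normalizePath(rawUrl) {
  const path = rawUrl.split('?')[0].split('#')[0];
  const normalized = path
    .split('/')
    .map((segment) => {
      if (/^\d+$/.test(segment)) return ':id';
      if (UUID_PATTERN.test(segment)) return ':uuid';
      return segment;
    })
    .join('/');
  return normalized || '/';
}

console.log(normalizePath('/users/42?tab=posts')); // /users/:id
console.log(normalizePath('/orders/3f2b8c1e-9a4d-4e7b-8f21-0c5d6a7b8e9f/items')); // /orders/:uuid/items
console.log(normalizePath('/health')); // /health
```

In an Express app, the matched route template is usually the better source for this label; a helper like this is a fallback for code that only sees the raw URL.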
Library Choices by Language
| Language | Library |
|---|---|
| Go | prometheus/client_golang |
| Python | prometheus_client |
| Node.js | prom-client |
| Java | simpleclient / Micrometer |
| Ruby | prometheus-client |
| .NET | prometheus-net |
| Rust | prometheus |
All follow the same conventions: register metrics, expose /metrics, let Prometheus scrape.
Already-Built Exporters
You don't have to instrument everything yourself. There's an exporter for almost everything:
| Exporter | What it exports |
|---|---|
| node-exporter | Linux host metrics (CPU, memory, disk, network) |
| cadvisor | Container metrics |
| postgres_exporter | PostgreSQL stats |
| mysqld_exporter | MySQL stats |
| redis_exporter | Redis stats |
| blackbox_exporter | Probes external HTTP/TCP/DNS endpoints |
| kube-state-metrics | Kubernetes object state |
| nginx-exporter | nginx stub_status / NGINX Plus |
Run them alongside your app and add an entry for each under scrape_configs.
What's Next
You can emit metrics from anything. Next, learn how to ask questions of them → PromQL.