Steven's Knowledge

Instrumentation

Metric types, what each is for, and how to instrument a real application

A scraping stack is only as useful as the metrics your apps emit. This page covers the four metric types, when to use which, and a complete instrumentation example.

The Four Metric Types

| Type | Behavior | Example | Use for |
|---|---|---|---|
| Counter | Up only (resets on restart) | http_requests_total, errors_total | Counting events |
| Gauge | Up and down | temperature, active_connections, memory_usage_bytes | Current value of something |
| Histogram | Buckets + count + sum | http_request_duration_seconds | Distributions, percentiles |
| Summary | Quantiles pre-computed client-side | request_duration_seconds{quantile="0.99"} | Accurate per-instance quantiles when you don't need to aggregate across instances |
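Because a counter falls back to zero when the process restarts, anything that reads it has to treat a decrease as a reset rather than as a negative change. The sketch below illustrates that idea in simplified form; Prometheus's own rate() and increase() also extrapolate to the edges of the query window, which is left out here. The function name counterIncrease and the sample values are hypothetical.

```javascript
// Total increase of a counter across time-ordered samples, tolerating resets.
// Simplified: no extrapolation to the window boundaries.
function counterIncrease(samples) {
  let total = 0;
  for (let i = 1; i < samples.length; i++) {
    const prev = samples[i - 1];
    const curr = samples[i];
    // A drop means the process restarted and the counter began again at 0,
    // so everything counted since the restart is new.
    total += curr >= prev ? curr - prev : curr;
  }
  return total;
}

// Scraped values of http_requests_total; the app restarted after 130.
const scraped = [100, 120, 130, 5, 25];
console.log(counterIncrease(scraped)); // 55
```

This is why counters should be queried through rate-style functions instead of being read as raw values.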

A quick way to choose:

  • Counting things — Counter.
  • Measuring a current level — Gauge.
  • "How long did each request take" / "How big was each response" — Histogram.

Prefer Histogram over Summary in almost every case — histograms aggregate across instances cleanly; summaries don't.
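To make the aggregation point concrete, here is a minimal sketch with made-up numbers for two instances that share the same bucket boundaries. The helper mergeBuckets is hypothetical; in practice PromQL's sum() does this addition for you.

```javascript
// Cumulative bucket counts (le -> count) from two hypothetical instances.
const instanceA = { '0.1': 90, '0.5': 99, '+Inf': 100 };
const instanceB = { '0.1': 10, '0.5': 50, '+Inf': 100 };

// Histograms with identical boundaries merge by adding counts per bucket.
function mergeBuckets(...histograms) {
  const merged = {};
  for (const h of histograms) {
    for (const [le, count] of Object.entries(h)) {
      merged[le] = (merged[le] || 0) + count;
    }
  }
  return merged;
}

const fleet = mergeBuckets(instanceA, instanceB);
console.log(fleet); // { '0.1': 100, '0.5': 149, '+Inf': 200 }
```

The merged histogram is still a valid histogram, so a fleet-wide quantile can be estimated from it. A summary exports only each instance's pre-computed quantiles, and no arithmetic combines two p99 values into the fleet-wide p99.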

What to Measure: USE and RED

Two mental models worth keeping:

USE (for every resource):

  • Utilization — % time the resource was busy
  • Saturation — degree of overload / queue depth
  • Errors — error count

RED (for every request-driven service):

  • Rate — requests/sec
  • Errors — failed requests/sec
  • Duration — latency distribution

Google's SRE book adds the Four Golden Signals: Latency, Traffic, Errors, and Saturation.
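As a rough illustration of how RED falls out of plain counters, the sketch below derives the three signals from two snapshots taken 60 seconds apart. The snapshot shape, the numbers, and the function redSignals are all hypothetical, and the code assumes no counter reset happened between the snapshots.

```javascript
// Two hypothetical snapshots of the same counters, taken 60 seconds apart.
const earlier = { requests: 12000, errors: 30, durationSum: 960 };
const later = { requests: 12600, errors: 42, durationSum: 1020 };

function redSignals(a, b, seconds) {
  const requests = b.requests - a.requests;
  const errors = b.errors - a.errors;
  return {
    rate: requests / seconds, // Rate: requests per second
    errorRate: errors / seconds, // Errors: failed requests per second
    // Duration: mean seconds per request (a histogram adds percentiles)
    meanDuration: requests ? (b.durationSum - a.durationSum) / requests : 0,
  };
}

console.log(redSignals(earlier, later, 60));
// { rate: 10, errorRate: 0.2, meanDuration: 0.1 }
```

A mean hides the tail, which is why Duration is best recorded as a histogram and not only as a running sum.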

Instrumenting a Node.js App

Using the prom-client library:

const express = require('express');
const client = require('prom-client');

const app = express();

// Default metrics: CPU, memory, event loop, GC, fd count
client.collectDefaultMetrics({ prefix: 'app_' });

// Counter — total HTTP requests
const httpRequestsTotal = new client.Counter({
  name: 'http_requests_total',
  help: 'Total number of HTTP requests',
  labelNames: ['method', 'path', 'status'],
});

// Histogram — request duration with sensible buckets
const httpRequestDuration = new client.Histogram({
  name: 'http_request_duration_seconds',
  help: 'HTTP request duration in seconds',
  labelNames: ['method', 'path'],
  buckets: [0.01, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10],
});

// Gauge — current open connections
const activeConnections = new client.Gauge({
  name: 'active_connections',
  help: 'Number of active connections',
});

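// Middleware records every request
app.use((req, res, next) => {
  // Start timing now; labels are attached when the timer ends, because
  // req.route is only populated after Express has matched a route.
  const endTimer = httpRequestDuration.startTimer();
  activeConnections.inc();

  // 'close' fires when the response completes or the client disconnects
  // early, so the gauge is decremented in both cases.
  res.on('close', () => {
    // Use the route template, never the raw URL; unmatched requests
    // share one label value to keep cardinality bounded.
    const path = req.route?.path ?? 'unmatched';
    httpRequestsTotal.inc({ method: req.method, path, status: res.statusCode });
    activeConnections.dec();
    endTimer({ method: req.method, path });
  });

  next();
});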

// The endpoint Prometheus scrapes
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', client.register.contentType);
  res.end(await client.register.metrics());
});

app.listen(3000);

Request /metrics and, alongside the default app_ metrics, you'll see output like this (abbreviated):

# HELP http_requests_total Total number of HTTP requests
# TYPE http_requests_total counter
http_requests_total{method="GET",path="/health",status="200"} 1432
http_requests_total{method="POST",path="/users",status="201"} 87

# HELP http_request_duration_seconds HTTP request duration in seconds
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{method="GET",path="/health",le="0.01"} 1420
http_request_duration_seconds_bucket{method="GET",path="/health",le="0.05"} 1432
http_request_duration_seconds_bucket{method="GET",path="/health",le="+Inf"} 1432
http_request_duration_seconds_sum{method="GET",path="/health"} 12.3
http_request_duration_seconds_count{method="GET",path="/health"} 1432

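Notice that the histogram is exposed as three families of series: _bucket (cumulative counts at each le boundary), _sum, and _count. PromQL's histogram_quantile() works from the _bucket series alone; dividing the rate of _sum by the rate of _count gives the average duration.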
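To show what can be recovered from those series, here is a minimal sketch of quantile estimation by linear interpolation inside a bucket, run against the /health numbers above. It follows the same general approach as histogram_quantile() but is not Prometheus's implementation; the function estimateQuantile is hypothetical and assumes at least one finite bucket and a quantile greater than 0.

```javascript
// Buckets as [upperBound, cumulativeCount], sorted, ending with +Inf.
const buckets = [
  [0.01, 1420],
  [0.05, 1432],
  [Infinity, 1432],
];

// Estimate quantile q (0 < q <= 1) by interpolating inside the bucket
// that contains the target rank.
function estimateQuantile(q, buckets) {
  const total = buckets[buckets.length - 1][1];
  if (total === 0) return NaN;
  const rank = q * total;
  const i = buckets.findIndex(([, count]) => count >= rank);
  const [upper, count] = buckets[i];
  // The +Inf bucket has no upper bound; fall back to the highest finite one.
  if (upper === Infinity) return buckets[i - 1][0];
  const lower = i === 0 ? 0 : buckets[i - 1][0];
  const below = i === 0 ? 0 : buckets[i - 1][1];
  return lower + (upper - lower) * ((rank - below) / (count - below));
}

console.log(estimateQuantile(0.5, buckets).toFixed(4)); // 0.0050
console.log(estimateQuantile(0.995, buckets).toFixed(4)); // 0.0261
console.log((12.3 / 1432).toFixed(4)); // 0.0086 (mean, from _sum / _count)
```

The estimate can never be more precise than the bucket it lands in, which is the reason bucket boundaries deserve attention.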

Histogram Buckets — Pick Carefully

Buckets are fixed when the metric is registered. You can change them in a later release, but data already recorded keeps the old boundaries, so queries that span the change mix two bucket layouts. Pick buckets that span the latency range you care about:

// Good for typical web APIs: 10ms ... 10s
buckets: [0.01, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10]

// Good for microservice RPCs: 1ms ... 1s
buckets: [0.001, 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1]

// Good for batch jobs: seconds to minutes
buckets: [1, 5, 10, 30, 60, 300, 600, 1800]
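When the range spans several orders of magnitude, generating the boundaries can be less error-prone than typing them. The function below is a stand-alone sketch written for illustration; prom-client ships its own exponentialBuckets and linearBuckets helpers that take arguments in the same order (start, factor or width, count), and those are the ones to use in a real app.

```javascript
// Generate `count` bucket upper bounds: start, start*factor, start*factor^2, ...
function exponentialBuckets(start, factor, count) {
  if (start <= 0 || factor <= 1 || count < 1) {
    throw new RangeError('need start > 0, factor > 1, count >= 1');
  }
  return Array.from({ length: count }, (_, i) => start * factor ** i);
}

// Six buckets doubling from 1 second, e.g. for a batch job.
console.log(exponentialBuckets(1, 2, 6)); // [ 1, 2, 4, 8, 16, 32 ]
```

Exponential spacing keeps the relative error roughly constant across the range, at the cost of less "round" boundary values than the hand-picked lists above.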

Ten to twelve buckets is usually plenty. Every bucket is a separate time series for every label combination, so a histogram with 10 buckets and 100 label combinations already stores over 1,000 series.
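The arithmetic is worth doing before shipping a new histogram. The sketch below computes an upper bound, assuming every label combination actually occurs; the function histogramSeries and the label counts are hypothetical.

```javascript
// Series one histogram can produce: each label combination gets one series
// per finite bucket, one for +Inf, plus _sum and _count.
function histogramSeries(finiteBuckets, labelCardinalities) {
  const combinations = labelCardinalities.reduce((n, c) => n * c, 1);
  return (finiteBuckets + 1 + 2) * combinations;
}

// 9 buckets, labelled by method (5 values) and path (20 route templates).
console.log(histogramSeries(9, [5, 20])); // 1200

// Adding one label with 1,000 distinct values multiplies everything.
console.log(histogramSeries(9, [5, 20, 1000])); // 1200000
```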

Labels: Power and Pitfall

Labels turn one metric into a multi-dimensional dataset. They also explode storage if misused.

Use labels for dimensions with a small, bounded set of values:

| Good labels | Bad labels |
|---|---|
| method (GET, POST, ...) | user_id (millions of users) |
| status (200, 404, 500, ...) | request_id (one per request) |
| path (route templates, not raw URLs) | Full URL with query params |
| region, environment, pod | Timestamps, error messages |
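Express exposes the route template for matched routes, but where only a raw URL is available the path has to be normalized before it becomes a label value. The helper below is a hypothetical sketch of one way to do that: it drops the query string and replaces numeric and UUID segments with placeholders. The placeholder names are arbitrary.

```javascript
// Collapse high-cardinality path segments into placeholders.
const UUID = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function normalizePath(rawUrl) {
  // Drop the query string and fragment; they are unbounded.
  const pathname = rawUrl.split('?')[0].split('#')[0];
  return pathname
    .split('/')
    .map((segment) => {
      if (/^\d+$/.test(segment)) return ':id';
      if (UUID.test(segment)) return ':uuid';
      return segment;
    })
    .join('/');
}

console.log(normalizePath('/users/42/orders?page=3')); // /users/:id/orders
```

Pattern-based normalization is a fallback; it cannot catch identifiers such as usernames or slugs, so a router-provided template is preferable when one exists.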

Rule of thumb: keep the total number of active series for any one metric below roughly 10,000. Beyond that you're heading for cardinality trouble.
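One way to check an app against that budget is to count the series in its own /metrics output. The sketch below parses text exposition output and tallies sample lines per metric name; the function seriesPerMetric is hypothetical, and note that a histogram's _bucket, _sum, and _count series are tallied under their own names.

```javascript
// Count series per metric name in Prometheus text exposition output.
function seriesPerMetric(expositionText) {
  const counts = new Map();
  for (const line of expositionText.split('\n')) {
    if (line === '' || line.startsWith('#')) continue; // skip blanks, HELP/TYPE
    const match = line.match(/^[a-zA-Z_:][a-zA-Z0-9_:]*/);
    if (!match) continue; // ignore anything that is not a sample line
    counts.set(match[0], (counts.get(match[0]) || 0) + 1);
  }
  return counts;
}

const sample = [
  '# HELP http_requests_total Total number of HTTP requests',
  '# TYPE http_requests_total counter',
  'http_requests_total{method="GET",path="/health",status="200"} 1432',
  'http_requests_total{method="POST",path="/users",status="201"} 87',
].join('\n');

console.log(seriesPerMetric(sample)); // Map(1) { 'http_requests_total' => 2 }
```

This counts a single instance only; the total in Prometheus is the sum across every scraped target.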

Library Choices by Language

| Language | Library |
|---|---|
| Go | prometheus/client_golang |
| Python | prometheus_client |
| Node.js | prom-client |
| Java | simpleclient / Micrometer |
| Ruby | prometheus-client |
| .NET | prometheus-net |
| Rust | prometheus |

All follow the same conventions: register metrics, expose /metrics, let Prometheus scrape.
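What these libraries have in common is the text format they serve at /metrics. To show how little is involved, the sketch below hand-renders one counter in that format. It is an illustration only: renderCounter and escapeLabelValue are hypothetical helpers, escaping of the HELP text is omitted, and a real app should let the client library do this.

```javascript
// Escape a label value: backslash, double quote, and newline.
function escapeLabelValue(value) {
  return String(value)
    .replace(/\\/g, '\\\\')
    .replace(/"/g, '\\"')
    .replace(/\n/g, '\\n');
}

// Render one counter family in the Prometheus text exposition format.
function renderCounter(name, help, samples) {
  const lines = [`# HELP ${name} ${help}`, `# TYPE ${name} counter`];
  for (const { labels, value } of samples) {
    const pairs = Object.entries(labels)
      .map(([key, v]) => `${key}="${escapeLabelValue(v)}"`)
      .join(',');
    lines.push(pairs ? `${name}{${pairs}} ${value}` : `${name} ${value}`);
  }
  return lines.join('\n') + '\n';
}

console.log(renderCounter('http_requests_total', 'Total number of HTTP requests', [
  { labels: { method: 'GET', path: '/health', status: 200 }, value: 1432 },
]));
```

The output matches the first three lines of the sample /metrics response shown earlier on this page, which is all Prometheus needs from any language.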

Already-Built Exporters

You don't have to instrument everything yourself. There's an exporter for almost everything:

| Exporter | What it exports |
|---|---|
| node-exporter | Linux host metrics (CPU, memory, disk, network) |
| cadvisor | Container metrics |
| postgres_exporter | PostgreSQL stats |
| mysqld_exporter | MySQL stats |
| redis_exporter | Redis stats |
| blackbox_exporter | Probes external HTTP/TCP/DNS endpoints |
| kube-state-metrics | Kubernetes object state |
| nginx-exporter | nginx stub_status / NGINX Plus |

Run them alongside your app and add an entry for each under scrape_configs in your Prometheus configuration.

What's Next

You can emit metrics from anything. Next, learn how to ask questions of them → PromQL.
