Steven's Knowledge

Getting Started

Stand up Prometheus, Grafana, and Alertmanager with Docker Compose, and scrape your first target

The fastest way to see the stack in action is Docker Compose. You'll get Prometheus scraping itself and node-exporter, Alertmanager wired in to receive alerts, and Grafana with Prometheus pre-provisioned as a data source.

A Working Stack

# docker-compose.yml
services:
  prometheus:
    image: prom/prometheus:v2.51.0
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
      - ./prometheus/alerts:/etc/prometheus/alerts
      - prometheus-data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.retention.time=30d'
      - '--web.enable-lifecycle'                  # /-/reload endpoint

  grafana:
    image: grafana/grafana:10.4.0
    ports:
      - "3001:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin          # local demo only; change for anything shared
      - GF_USERS_ALLOW_SIGN_UP=false
    volumes:
      - ./grafana/provisioning:/etc/grafana/provisioning
      - grafana-data:/var/lib/grafana
    depends_on:
      - prometheus

  alertmanager:
    image: prom/alertmanager:v0.27.0
    ports:
      - "9093:9093"
    volumes:
      - ./alertmanager/alertmanager.yml:/etc/alertmanager/alertmanager.yml

  node-exporter:
    image: prom/node-exporter:v1.7.0
    ports:
      - "9100:9100"
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'
      - '--path.rootfs=/rootfs'

volumes:
  prometheus-data:
  grafana-data:
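
The Compose file mounts ./alertmanager/alertmanager.yml, but the page does not show that file. If it is missing, Docker typically creates a directory at that path and Alertmanager fails to start. A minimal sketch that is enough to boot follows; the receiver name and the grouping and timing values are illustrative assumptions, not settings from this guide:

```yaml
# alertmanager/alertmanager.yml
route:
  receiver: default                 # every alert goes to the single receiver below
  group_by: ["alertname", "job"]    # batch alerts that share these labels
  group_wait: 30s                   # wait before sending the first notification for a group
  group_interval: 5m                # wait before notifying about new alerts in a group
  repeat_interval: 4h               # re-send while an alert is still firing

receivers:
  # No integrations configured: alerts appear in the Alertmanager UI
  # but are not delivered anywhere until you add email, Slack, a webhook, etc.
  - name: default
```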

Prometheus Config

# prometheus/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  scrape_timeout: 10s

rule_files:
  - "alerts/*.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets: ["alertmanager:9093"]

scrape_configs:
  # Prometheus scrapes itself
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

  # System metrics from the node-exporter container
  - job_name: "node-exporter"
    static_configs:
      - targets: ["node-exporter:9100"]

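  # Your application (placeholder: the Compose file above defines no api-server
  # service, so this target reports up == 0 until you add one)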
  - job_name: "api-server"
    metrics_path: /metrics
    static_configs:
      - targets: ["api-server:3000"]
        labels:
          environment: "production"

Grafana Auto-Provisioning

Tell Grafana about Prometheus on startup, with no clicking required. Grafana reads every file under /etc/grafana/provisioning/datasources when it boots:

# grafana/provisioning/datasources/prometheus.yml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
    jsonData:
      timeInterval: "15s"

Bring It Up

docker compose up -d

# Prometheus UI (open is the macOS command; on Linux use xdg-open or paste the URL into a browser)
open http://localhost:9090

# Grafana
open http://localhost:3001          # admin / admin

# Alertmanager
open http://localhost:9093

# Node Exporter raw metrics (-s hides curl's progress meter when piping)
curl -s http://localhost:9100/metrics | head -40

Your First Queries

In the Prometheus UI's Graph tab, paste these:

# Scrape health per target (1 = up, 0 = down). To browse metric names, start typing and let the autocomplete show you.
up

# CPU usage % per node
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# Memory used %
(node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100

# Disk used % at /
(node_filesystem_size_bytes{mountpoint="/"} - node_filesystem_avail_bytes{mountpoint="/"})
/ node_filesystem_size_bytes{mountpoint="/"} * 100

up is a synthetic metric that Prometheus records for every target on every scrape: 1 means the scrape succeeded, 0 means it failed. Alert on up == 0 for every job.
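
The stack already loads rule files from alerts/*.yml, so the up == 0 alert can be sketched as a single rule. The file name, group name, the 2m hold period, and the severity label are assumptions chosen for illustration:

```yaml
# prometheus/alerts/targets.yml (hypothetical file name; any *.yml under alerts/ is loaded)
groups:
  - name: target-health
    rules:
      - alert: TargetDown
        expr: up == 0          # matches every job, one alert per failing target
        for: 2m                # must stay down this long before firing, to ride out blips
        labels:
          severity: critical
        annotations:
          summary: "{{ $labels.job }} target {{ $labels.instance }} is down"
          description: "Prometheus has failed to scrape {{ $labels.instance }} for more than 2 minutes."
```

Once the file is in place and Prometheus has reloaded its config, the rule should appear on the Alerts page of the Prometheus UI.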

Three Things Worth Knowing Now

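Scrape interval and resolution. A 15s scrape_interval means a new sample arrives every 15s, so querying at a finer step shows no extra detail. Shorter intervals produce proportionally more samples, so don't drop the interval below what your storage budget can hold.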
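
If storage is the constraint, one option is to override scrape_interval per job rather than changing the global value. A sketch, assuming host metrics can tolerate coarser resolution than the rest; the 60s figure is illustrative:

```yaml
# prometheus/prometheus.yml (excerpt)
scrape_configs:
  - job_name: "node-exporter"
    scrape_interval: 60s   # overrides the global 15s for this job only
    scrape_timeout: 10s    # must not exceed scrape_interval
    static_configs:
      - targets: ["node-exporter:9100"]
```

Keep rate() windows comfortably wider than the interval. The [5m] window in the CPU query above still covers several samples at 60s.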

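Service discovery. Static targets are fine for a handful of services. For dozens, use service discovery: Kubernetes (kubernetes_sd_configs), Consul (consul_sd_configs), EC2 (ec2_sd_configs), DNS (dns_sd_configs), or file-based (file_sd_configs). Prometheus then discovers what to scrape and updates the target list as services come and go.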
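
File-based discovery is the easiest mechanism to try in this Compose setup. The sketch below assumes an extra volume mount, ./prometheus/targets:/etc/prometheus/targets, which the Compose file above does not define; the job name and file names are hypothetical:

```yaml
# prometheus/prometheus.yml (excerpt)
scrape_configs:
  - job_name: "file-discovered"
    file_sd_configs:
      - files:
          - /etc/prometheus/targets/*.yml   # every matching file contributes targets
        refresh_interval: 1m                # periodic re-read, in addition to file-change detection
```

Each targets file is a list of target groups, in the same shape as a static_configs entry:

```yaml
# prometheus/targets/api.yml
- targets: ["api-server:3000"]
  labels:
    environment: "production"
```

Prometheus picks up edits to these files without a reload of the main config.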

/-/reload. Hot-reload config without restarting:

curl -X POST http://localhost:9090/-/reload

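This works because we passed --web.enable-lifecycle above. Without that flag the endpoint is disabled, and you'd have to send SIGHUP to the Prometheus process (for example, docker compose kill -s SIGHUP prometheus) or restart the container.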

What's Next

You have a working stack scraping system metrics. Next: get your application's metrics into it → Instrumentation.
