The default open-source monitoring stack - from first scrape to production-grade alerting and dashboards

Prometheus & Grafana

Prometheus is an open-source time-series database and monitoring system. Grafana is a visualization platform that connects to Prometheus (and other data sources) to build dashboards and alerts. Together they're the default open-source monitoring stack.

Why This Stack

Without metrics	With Prometheus + Grafana
Reactive: users tell you it's broken	Proactive: alerts fire before users notice
`top` and `tail -f` on a box	One dashboard for every service
No baseline for "normal"	Histograms and percentiles for every endpoint
Capacity planning by guess	Capacity planning from actual usage trends

The Architecture

┌─────────────┐  ┌─────────────┐  ┌─────────────┐
│ App+/metrics│  │Node Exporter│  │  cAdvisor   │
└──────┬──────┘  └──────┬──────┘  └──────┬──────┘
       │                │                │
       └────────────────┼────────────────┘
                        │ pull (scrape)
                 ┌──────▼──────┐
                 │  Prometheus │  (TSDB + PromQL)
                 └──────┬──────┘
                        │ query
              ┌─────────┼─────────┐
              ▼                   ▼
       ┌────────────┐     ┌────────────┐
       │   Grafana  │     │Alertmanager│
       │(Dashboards)│     │  (Alerts)  │
       └────────────┘     └────────────┘

Two ideas to internalize:

Pull, not push. Prometheus scrapes GET /metrics on a schedule. Apps don't push.
Labels are dimensions. A metric http_requests_total with labels {method, path, status} becomes a multi-dimensional dataset you can slice with PromQL.
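
To make both ideas concrete, here is a minimal sketch in Python (standard library only) of what an app might hand back when Prometheus scrapes GET /metrics. The helper names observe and render_metrics and the sample paths are hypothetical, chosen for illustration; a real app would normally use an official client library rather than rendering the text format by hand.

```python
from collections import Counter

# Hypothetical in-memory counter: one entry per unique label combination.
requests_total = Counter()


def observe(method: str, path: str, status: int) -> None:
    """Count one handled request under its (method, path, status) label set."""
    requests_total[(method, path, str(status))] += 1


def render_metrics() -> str:
    """Render the counter in the Prometheus text exposition format."""
    lines = [
        "# HELP http_requests_total Total HTTP requests handled.",
        "# TYPE http_requests_total counter",
    ]
    for (method, path, status), value in sorted(requests_total.items()):
        lines.append(
            f'http_requests_total{{method="{method}",path="{path}",status="{status}"}} {value}'
        )
    return "\n".join(lines) + "\n"


# Simulate a few requests, then print what a scrape would receive.
observe("GET", "/api/users", 200)
observe("GET", "/api/users", 200)
observe("POST", "/api/users", 500)
print(render_metrics(), end="")
```

The app only keeps running totals and exposes them on request; Prometheus decides when to collect. Each distinct label combination appears as its own line in the response, and Prometheus stores it as its own time series.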

Learning Path

Read in this order if you're new — each page builds on the previous one.

1. Getting Started

Stand up the stack with Docker Compose; scrape your first target

2. Instrumentation

Metric types and instrumenting a real app

3. PromQL

The query language - rates, percentiles, and the patterns you'll use daily

4. Alerting

Alert rules and Alertmanager routing

5. Grafana Dashboards

Data sources, panels, dashboard design, USE/RED/Golden Signals

6. Best Practices

Cardinality, recording rules, retention, federation, long-term storage

When NOT to Use Prometheus

It's the right default, but not the only option:

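Massive cardinality / per-user metrics? Every unique label combination is its own time series, so labels such as a user ID make memory use and query cost climb quickly. Look at ClickHouse-based stacks (such as SigNoz) or commercial APM (Datadog, New Relic).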
You want logs + traces + metrics in one place? Look at Grafana's broader stack (Loki for logs, Tempo for traces) or an APM SaaS.
You can't run servers? Managed Prometheus exists: Amazon Managed Service for Prometheus, Grafana Cloud, Chronosphere.
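
The cardinality concern in the first item comes down to simple arithmetic, sketched here in Python. The label names and the number of distinct values per label are made-up assumptions, and the product is an upper bound: it is reached only if every combination actually occurs.

```python
from math import prod

# Hypothetical distinct-value counts per label on http_requests_total.
label_values = {"method": 5, "path": 50, "status": 10}


def series_count(cardinalities: dict[str, int]) -> int:
    """Upper bound on time series: product of distinct values per label."""
    return prod(cardinalities.values())


base = series_count(label_values)
# Adding a per-user label multiplies the bound by the number of users.
with_user = series_count({**label_values, "user_id": 100_000})

print(base)       # 2500
print(with_user)  # 250000000
```

A single high-cardinality label multiplies the series count for that metric, which is why per-user dimensions usually belong in a different kind of store.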

Prometheus collects metrics: numeric samples recorded over time. For logs (timestamped text events) use Loki, Elasticsearch, or a SaaS; for traces (request paths through services) use Tempo or Jaeger. Metrics, logs, and traces together form the "three pillars of observability."
