Prometheus & Grafana
The default open-source monitoring stack - from first scrape to production-grade alerting and dashboards
Prometheus is an open-source time-series database and monitoring system. Grafana is a visualization platform that connects to Prometheus (and other data sources) to build dashboards and alerts. Together they're the default open-source monitoring stack.
Why This Stack
| Without metrics | With Prometheus + Grafana |
|---|---|
| Reactive: users tell you it's broken | Proactive: alerts fire before users notice |
| `top` and `tail -f` on a box | One dashboard for every service |
| No baseline for "normal" | Histograms and percentiles for every endpoint |
| Capacity planning by guess | Capacity planning from actual usage trends |
The Architecture
    ┌─────────────┐  ┌─────────────┐  ┌─────────────┐
    │ App+/metrics│  │Node Exporter│  │  cAdvisor   │
    └──────┬──────┘  └──────┬──────┘  └──────┬──────┘
           │                │                │
           └────────────────┼────────────────┘
                            │ pull (scrape)
                     ┌──────▼──────┐
                     │ Prometheus  │ (TSDB + PromQL)
                     └──────┬──────┘
                            │ query
                  ┌─────────┼─────────┐
                  ▼                   ▼
            ┌────────────┐      ┌────────────┐
            │  Grafana   │      │Alertmanager│
            │(Dashboards)│      │  (Alerts)  │
            └────────────┘      └────────────┘

Two ideas to internalize:
- Pull, not push. Prometheus scrapes `GET /metrics` on a schedule. Apps don't push.
- Labels are dimensions. A metric `http_requests_total` with labels `{method, path, status}` becomes a multi-dimensional dataset you can slice with PromQL.
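
To make both ideas concrete, here is a minimal sketch of an app that keeps a labeled counter and serves it at `/metrics`, plus a one-line "scrape" that fetches the page. It uses only the Python standard library so it stays self-contained; a real service would normally use an official Prometheus client library instead. The helper names (`observe`, `render_metrics`, `scrape`, `demo`) and the sample paths are assumptions made up for illustration.

```python
import threading
import urllib.request
from collections import Counter
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counter: one value per unique label combination.
REQUESTS = Counter()


def observe(method, path, status):
    """Count one request under its (method, path, status) label set."""
    REQUESTS[(method, path, status)] += 1


def render_metrics():
    """Render the counter in the Prometheus text exposition format."""
    lines = [
        "# HELP http_requests_total Total HTTP requests handled.",
        "# TYPE http_requests_total counter",
    ]
    for (method, path, status), value in sorted(REQUESTS.items()):
        lines.append(
            f'http_requests_total{{method="{method}",path="{path}",status="{status}"}} {value}'
        )
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def scrape(url):
    """A scrape at its simplest: one HTTP GET of the metrics page."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")


def demo():
    """Record a few requests, serve them on a free local port, and scrape once."""
    observe("GET", "/users", "200")
    observe("GET", "/users", "200")
    observe("POST", "/users", "500")
    server = HTTPServer(("127.0.0.1", 0), MetricsHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return scrape(f"http://127.0.0.1:{server.server_port}/metrics")
    finally:
        server.shutdown()
        server.server_close()
```

Calling `print(demo())` should show one sample line per label combination, for example `http_requests_total{method="GET",path="/users",status="200"} 2`. The app never contacts a monitoring server; it only answers when asked, which is the pull model in miniature.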
Learning Path
Read in this order if you're new — each page builds on the previous one.
1. Getting Started
Stand up the stack with Docker Compose; scrape your first target
2. Instrumentation
Metric types and instrumenting a real app
3. PromQL
The query language - rates, percentiles, and the patterns you'll use daily
4. Alerting
Alert rules and Alertmanager routing
5. Grafana Dashboards
Data sources, panels, dashboard design, USE/RED/Golden Signals
6. Best Practices
Cardinality, recording rules, retention, federation, long-term storage
When NOT to Use Prometheus
It's the right default, but not the only option:
- Massive cardinality / per-user metrics? Prometheus struggles, because every unique combination of label values becomes its own time series. Look at ClickHouse-based stacks (Cube, Aperture), or commercial APM (Datadog, New Relic).
- You want logs + traces + metrics in one place? Use Grafana's broader stack (Loki for logs, Tempo for traces) or an APM SaaS.
- You can't run servers? Managed Prometheus exists: Amazon Managed Service for Prometheus, Grafana Cloud, Chronosphere.
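
The cardinality concern in the first bullet can be put into rough numbers. The sketch below computes a worst-case series count for one metric as the product of distinct values per label; the label counts are hypothetical figures chosen only for illustration, and real counts are usually lower because not every combination occurs.

```python
from math import prod


def series_count(label_values):
    """Upper bound on time series for one metric:
    the product of the number of distinct values of each label."""
    return prod(label_values.values())


# Hypothetical label value counts, for illustration only.
bounded = {"method": 5, "path": 50, "status": 10}
per_user = {**bounded, "user_id": 100_000}

print(series_count(bounded))   # 2500
print(series_count(per_user))  # 250000000
```

Adding a single unbounded label such as a user ID multiplies the total by the number of users, which is why per-user metrics are usually a poor fit for Prometheus.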
Prometheus collects metrics: numeric time series sampled over time. For logs (string events), use Loki, Elasticsearch, or a SaaS; for traces (request paths through services), use Tempo or Jaeger. Metrics, logs, and traces together form the "three pillars of observability."