Getting Started
Tag everything, enable Cost Explorer & CUR, set up Kubecost on a cluster, build your first cost dashboard
This page walks through the Inform phase of FinOps: getting visibility into where the money goes. Without it, every "optimization" is a guess.
Step 1: Tag Everything
Tagging is the single highest-leverage FinOps practice. Without tags, your bill is a mystery.
Minimum useful tag set:
| Tag | Example values | Why |
|---|---|---|
| Environment | prod, staging, dev | Separate the surprise from the safe |
| Team | payments, growth, platform | Whose budget |
| Service | checkout, web, auth | What costs what |
| CostCenter | eng-platform, data | Maps to finance ledger |
| Owner | team@company.com | Who to talk to |
AWS tagging policy
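```bash
# Enable tag policies on the org root, create the policy, then attach it (AWS Organizations).
# r-xxxx and p-xxxx are placeholders for your root ID and the new policy's ID.
aws organizations enable-policy-type --root-id r-xxxx --policy-type TAG_POLICY
aws organizations create-policy \
  --name require-cost-tags \
  --description "Standard cost-allocation tags" \
  --type TAG_POLICY \
  --content file://tag-policy.json
aws organizations attach-policy --policy-id p-xxxx --target-id r-xxxx
```
tag-policy.json:
```json
{
  "tags": {
    "Environment": {
      "tag_key": { "@@assign": "Environment" },
      "tag_value": { "@@assign": ["prod", "staging", "dev"] },
      "enforced_for": { "@@assign": ["ec2:instance", "rds:db", "s3:bucket"] }
    },
    "Team": {
      "tag_key": { "@@assign": "Team" }
    }
  }
}
```
The tag policy standardizes keys and values for future tagging, and `enforced_for` rejects noncompliant values on the listed resource types. It does not, by itself, stop untagged resources from being created, so catch those in IaC review or with SCPs. Backfill existing resources with a script (boto3 + describe-* + create-tags).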
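Here is a minimal sketch of that backfill for EC2. It assumes you pass in a boto3 EC2 client. `plan_backfill` only works out which required tags are missing and fills in the ones you supplied defaults for. `backfill_ec2` calls `create_tags` only when `dry_run=False`. The function names and the per-key defaults are illustrative, not an AWS-provided tool.

```python
# Sketch: plan (and optionally apply) a tag backfill for EC2 instances.
# Assumes a boto3 EC2 client is passed in; the defaults dict is hypothetical input you supply.
REQUIRED_TAGS = ("Environment", "Team", "Service", "CostCenter", "Owner")


def missing_tags(tags):
    """Return required keys absent from an AWS-style [{"Key": ..., "Value": ...}] list."""
    present = {t["Key"] for t in tags or []}
    return [key for key in REQUIRED_TAGS if key not in present]


def plan_backfill(reservations, defaults):
    """Map InstanceId -> {key: value} for missing tags that have a default value."""
    plan = {}
    for reservation in reservations:
        for instance in reservation.get("Instances", []):
            to_add = {k: defaults[k] for k in missing_tags(instance.get("Tags")) if k in defaults}
            if to_add:
                plan[instance["InstanceId"]] = to_add
    return plan


def backfill_ec2(ec2, defaults, dry_run=True):
    """Scan all instances page by page; only write tags when dry_run is False."""
    reservations = []
    for page in ec2.get_paginator("describe_instances").paginate():
        reservations.extend(page["Reservations"])
    plan = plan_backfill(reservations, defaults)
    if not dry_run:
        for instance_id, tags in plan.items():
            ec2.create_tags(
                Resources=[instance_id],
                Tags=[{"Key": k, "Value": v} for k, v in tags.items()],
            )
    return plan
```

For example, `backfill_ec2(boto3.client("ec2"), {"Environment": "dev"})` returns the plan without writing anything. RDS and S3 need their own list and tag calls. Dry run is the default because guessed tag values deserve a human look before they end up on the bill.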
Activate cost allocation tags
In AWS Billing console → Cost Allocation Tags → activate the tags you want to filter on. Newly activated tags can take up to 24 hours to show up in Cost Explorer and the CUR.
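You can also script activation. The Cost Explorer API has an `UpdateCostAllocationTagsStatus` operation, exposed in boto3 as `update_cost_allocation_tags_status`. The sketch below assumes a boto3 `ce` client. A key generally has to exist on some resource before it can be activated, so brand-new keys may come back in the returned errors instead of being activated.

```python
# Sketch: activate user-defined cost allocation tags via the Cost Explorer API.
# Assumes a boto3 "ce" client is passed in; COST_TAGS mirrors the tag table in Step 1.
COST_TAGS = ("Environment", "Team", "Service", "CostCenter", "Owner")


def activation_request(tag_keys):
    """Build the CostAllocationTagsStatus list for update_cost_allocation_tags_status."""
    return [{"TagKey": key, "Status": "Active"} for key in tag_keys]


def activate_cost_allocation_tags(ce, tag_keys=COST_TAGS):
    """Request activation; returns the per-tag Errors list (empty when every key was accepted)."""
    response = ce.update_cost_allocation_tags_status(
        CostAllocationTagsStatus=activation_request(tag_keys)
    )
    return response.get("Errors", [])
```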
Step 2: Enable the Billing Data Pipeline
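The console UI is for spot checks. For real analysis you want the raw data, which lives in the Cost & Usage Report (CUR):
```bash
# AWS Cost & Usage Report (CUR): hourly, resource-level, delivered to S3 as Parquet.
# The CUR API is served from us-east-1; Athena integration expects OVERWRITE_REPORT versioning.
aws cur put-report-definition --region us-east-1 --report-definition '{
  "ReportName": "main-cur",
  "TimeUnit": "HOURLY",
  "Format": "Parquet",
  "Compression": "Parquet",
  "AdditionalSchemaElements": ["RESOURCES"],
  "S3Bucket": "company-billing",
  "S3Prefix": "cur/",
  "S3Region": "us-east-1",
  "AdditionalArtifacts": ["ATHENA"],
  "ReportVersioning": "OVERWRITE_REPORT"
}'
```
The bucket needs a bucket policy that lets billingreports.amazonaws.com write to it, and the first report can take up to 24 hours to arrive. Then query it via Athena: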
```sql
SELECT
  line_item_product_code,
  resource_tags_user_team,
  SUM(line_item_unblended_cost) AS cost
FROM main_cur
WHERE year = '2026' AND month = '5'
GROUP BY 1, 2
ORDER BY cost DESC
LIMIT 50;
```
For GCP: enable BigQuery billing export. For Azure: enable Cost Management exports to a storage account.
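To pull the Athena query results into a script (for a report or a Slack digest), here is a sketch using the boto3 Athena client. It starts the query, polls its state, then reads the results. The database name and S3 output location are placeholders for whatever the CUR Athena integration created in your account. Only the first page of results is read (up to 1,000 rows), which is enough for a `LIMIT 50` query.

```python
import time

# Sketch: run an Athena query over the CUR table and return rows as dicts.
# Assumes a boto3 "athena" client; database and output_location are placeholders you supply.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def rows_to_dicts(result_set):
    """Convert an Athena ResultSet (first row = column headers) into a list of dicts."""
    rows = result_set.get("Rows", [])
    if not rows:
        return []
    header = [col.get("VarCharValue") for col in rows[0]["Data"]]
    # NULL values have no VarCharValue key, so they come back as None.
    return [dict(zip(header, (col.get("VarCharValue") for col in row["Data"]))) for row in rows[1:]]


def run_athena_query(athena, sql, database, output_location, poll_seconds=2.0):
    """Start the query, poll until it finishes, and return the first page of results."""
    query_id = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]
        if status["State"] in TERMINAL_STATES:
            break
        time.sleep(poll_seconds)
    if status["State"] != "SUCCEEDED":
        reason = status.get("StateChangeReason", "")
        raise RuntimeError(f"Athena query {query_id} ended in {status['State']}: {reason}")
    results = athena.get_query_results(QueryExecutionId=query_id)
    return rows_to_dicts(results["ResultSet"])
```

All values come back as strings, so convert `cost` with `float()` before doing arithmetic on it.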
Step 3: Set Up Kubecost / OpenCost on Kubernetes
If you run K8s, plain cloud billing tools can't tell you what each namespace / team costs — the cluster is one big EC2 line item. OpenCost (CNCF) and Kubecost (commercial, built on OpenCost) solve this:
```bash
helm repo add kubecost https://kubecost.github.io/cost-analyzer/
helm repo update
helm install kubecost kubecost/cost-analyzer \
  --namespace kubecost --create-namespace \
  --set kubecostToken="your-token" \
  --set kubecostProductConfigs.clusterName="prod-us-east-1"

# Port-forward the UI (this blocks, so run it in its own terminal)
kubectl port-forward -n kubecost svc/kubecost-cost-analyzer 9090
open http://localhost:9090   # macOS; use xdg-open on Linux
```
After 24 hours of metrics collection you can see:
- Cost per namespace, deployment, label
- Idle vs. used CPU/memory
- Recommended rightsizing per workload
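The same breakdown is available over HTTP. The sketch below assumes three things you should check against the docs for your Kubecost version:

- the port-forward above is running;
- Kubecost's Allocation API lives at `/model/allocation` and takes `window`, `aggregate`, and `accumulate` parameters;
- the response is a `data` list of name → allocation maps, each with a `totalCost`.

```python
import json
import urllib.parse
import urllib.request

# Sketch: per-namespace cost from Kubecost's Allocation API via the port-forward above.
# Endpoint path, parameters, and response shape are assumptions; verify for your version.
KUBECOST_ALLOCATION_URL = "http://localhost:9090/model/allocation"


def fetch_allocations(window="7d", aggregate="namespace", base_url=KUBECOST_ALLOCATION_URL):
    """GET allocation data, accumulated into a single set covering the whole window."""
    query = urllib.parse.urlencode({"window": window, "aggregate": aggregate, "accumulate": "true"})
    with urllib.request.urlopen(f"{base_url}?{query}", timeout=30) as response:
        return json.load(response)


def cost_by_key(payload):
    """Sum totalCost per aggregate key across all returned sets, most expensive first."""
    totals = {}
    for allocation_set in payload.get("data") or []:
        for name, allocation in (allocation_set or {}).items():
            totals[name] = totals.get(name, 0.0) + float(allocation.get("totalCost", 0.0))
    return dict(sorted(totals.items(), key=lambda item: item[1], reverse=True))
```

`cost_by_key(fetch_allocations())` then gives namespaces ordered from most to least expensive over the last seven days.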
Pure OpenCost (no commercial bits):
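```bash
# OpenCost reads metrics from Prometheus, so the cluster needs one it can query.
helm repo add opencost https://opencost.github.io/opencost-helm-chart
helm repo update
helm install opencost opencost/opencost \
  --namespace opencost --create-namespace
```
Step 4: Build a First Dashboard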
The dashboard everyone needs: monthly cost trend, by team, with anomalies highlighted.
In Athena + QuickSight (or Grafana on Athena):
```sql
SELECT
  date_trunc('month', line_item_usage_start_date) AS month,
  resource_tags_user_team AS team,
  SUM(line_item_unblended_cost) AS cost
FROM main_cur
WHERE year = '2026'
GROUP BY 1, 2
ORDER BY 1, 3 DESC;
```
For a quick win without building from scratch:
- Vantage (free tier): point it at your AWS account and get a usable dashboard in about 10 minutes.
- Infracost (open source): shows the cost diff of Terraform changes in PRs.
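For the "anomalies highlighted" part of the dashboard, a simple rule catches the obvious surprises: flag any team whose monthly cost jumped past both a relative and an absolute threshold compared with its previous month. Here is a sketch over `(month, team, cost)` rows from the monthly query. The 30% and $500 thresholds are arbitrary placeholders. Untagged spend (an empty or NULL team) is grouped as `untagged` so that it gets flagged too.

```python
from collections import defaultdict

# Sketch: flag month-over-month cost jumps per team from (month, team, cost) rows.
# The 30% / $500 thresholds are placeholders; tune them to the size of your bill.


def flag_anomalies(rows, pct_threshold=0.30, min_delta=500.0):
    """Return sorted (team, month, previous_cost, cost) tuples for suspicious jumps."""
    by_team = defaultdict(dict)
    for month, team, cost in rows:
        key = team or "untagged"  # empty/NULL tag values are untagged spend
        by_team[key][month] = by_team[key].get(month, 0.0) + float(cost)
    flagged = []
    for team, series in by_team.items():
        months = sorted(series)  # ISO date strings sort chronologically
        for prev_month, month in zip(months, months[1:]):
            prev, cur = series[prev_month], series[month]
            delta = cur - prev
            # Require both an absolute and a relative jump so small teams don't spam alerts.
            if delta >= min_delta and (prev == 0 or delta / prev >= pct_threshold):
                flagged.append((team, month, prev, cur))
    return sorted(flagged)
```

Each month is compared with the previous month for which the team has data, so a gap in the data is treated as if the months were consecutive.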
Step 5: Find One Easy Win
Before optimizing systematically, find one big obvious leak. Common candidates:
```bash
# Unattached EBS volumes (billed every month, used by nothing)
aws ec2 describe-volumes --filters Name=status,Values=available \
  --query "Volumes[*].[VolumeId, Size, CreateTime, Tags]" --output table

# Old snapshots (created before 2025-01-01)
aws ec2 describe-snapshots --owner-ids self \
  --query "Snapshots[?StartTime<='2025-01-01'].[SnapshotId, VolumeSize, StartTime]" \
  --output table

# Stopped instances: no compute charge, but their attached EBS volumes still bill
aws ec2 describe-instances --filters Name=instance-state-name,Values=stopped \
  --query "Reservations[*].Instances[*].[InstanceId, LaunchTime, Tags]"

# Previous-generation instances (usually worse price/performance than current gen)
aws ec2 describe-instances --filters "Name=instance-type,Values=m4.*,c4.*,r4.*" \
  --query "Reservations[*].Instances[*].[InstanceId, InstanceType, Tags]"

# NAT Gateway data processing (a quietly huge bill line): check in Cost Explorer,
# filter Service = "EC2 - Other", usage types ending in NatGateway-Bytes (e.g. USE1-NatGateway-Bytes)
```
For a two-year-old AWS account, these checks often surface $5-50k/yr of pure waste.
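To put a dollar figure on the unattached-volume finding, here is a sketch that prices the `describe_volumes` output. The per-GB-month prices are assumed us-east-1 list prices for gp3 and gp2; replace them with your region's rates. Other volume types fall back to a flat default, and provisioned IOPS or throughput charges are ignored, so treat the total as a rough estimate.

```python
# Sketch: estimate the monthly cost of unattached ("available") EBS volumes.
# PRICE_PER_GB_MONTH holds assumed USD prices; replace them with your region's EBS pricing.
PRICE_PER_GB_MONTH = {"gp3": 0.08, "gp2": 0.10}


def unattached_volume_cost(volumes, prices=PRICE_PER_GB_MONTH, default_price=0.10):
    """volumes: the "Volumes" list from describe_volumes. Returns (total, per-volume dict)."""
    per_volume = {}
    for vol in volumes:
        if vol.get("State") != "available":
            continue  # attached (in-use) volumes are not waste by this definition
        price = prices.get(vol.get("VolumeType"), default_price)
        per_volume[vol["VolumeId"]] = vol["Size"] * price  # Size is in GiB
    return sum(per_volume.values()), per_volume


def list_unattached_volumes(ec2):
    """Collect unattached volumes across all pages with a boto3 EC2 client."""
    volumes = []
    paginator = ec2.get_paginator("describe_volumes")
    for page in paginator.paginate(Filters=[{"Name": "status", "Values": ["available"]}]):
        volumes.extend(page["Volumes"])
    return volumes
```

Usage: `total, per_volume = unattached_volume_cost(list_unattached_volumes(boto3.client("ec2")))`. Multiply `total` by 12 to compare it with the yearly figure above.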
Step 6: Set Up Budgets & Alerts
You want a Slack ping before the bill arrives, not after.
```bash
# AWS Budget with an SNS alert on forecasted spend
aws budgets create-budget --account-id 123456789012 \
  --budget '{
    "BudgetName": "monthly-eng-budget",
    "BudgetLimit": { "Amount": "50000", "Unit": "USD" },
    "TimeUnit": "MONTHLY",
    "BudgetType": "COST"
  }' \
  --notifications-with-subscribers '[{
    "Notification": {
      "NotificationType": "FORECASTED",
      "ComparisonOperator": "GREATER_THAN",
      "Threshold": 90,
      "ThresholdType": "PERCENTAGE"
    },
    "Subscribers": [{ "SubscriptionType": "SNS", "Address": "arn:aws:sns:..." }]
  }]'
```
Wire SNS → Lambda → Slack webhook. The SNS topic's access policy must allow budgets.amazonaws.com to publish. The alert that matters is the forecasted one: here it fires when projected month-end spend exceeds 90% of the budget, so you catch an anomaly before the month closes.
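The Lambda in the middle can be very small. Below is a sketch of an SNS-triggered handler that forwards each budget notification to a Slack incoming webhook. `SLACK_WEBHOOK_URL` is a hypothetical environment variable name. The message is posted as plain `text`, with the SNS subject in bold.

```python
import json
import os
import urllib.request

# Sketch: SNS-triggered Lambda forwarding AWS Budgets alerts to a Slack incoming webhook.
# SLACK_WEBHOOK_URL is a hypothetical environment variable holding the webhook URL.


def slack_payload(event):
    """Build one Slack message from all SNS records in the Lambda event."""
    lines = []
    for record in event.get("Records", []):
        sns = record.get("Sns", {})
        subject = sns.get("Subject") or "AWS Budgets alert"
        lines.append(f"*{subject}*\n{sns.get('Message', '')}")
    return {"text": "\n\n".join(lines)}


def handler(event, context):
    payload = slack_payload(event)
    if not payload["text"]:
        return {"posted": False}  # nothing to forward
    request = urllib.request.Request(
        os.environ["SLACK_WEBHOOK_URL"],
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return {"posted": True, "status": response.status}
```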
Step 7: First Optimization Pass
With one month of tagged data and Kubecost installed, look for:
- Unused EBS / EIPs / load balancers. Pure waste; delete.
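- Dev/staging running nights & weekends. Schedule auto-stop: running only 10 hours a day on weekdays is 50 of the week's 168 hours, which cuts those environments' compute cost by roughly 70%.
- No Reserved Instance (RI) or Savings Plans (SP) coverage. Buy 1-year no-upfront Savings Plans for your baseline compute. Expect roughly 27% off with low commitment risk, though the exact discount varies by plan type, instance family, and region.
- Right-size the 10 most expensive workloads, starting from Kubecost or AWS Compute Optimizer recommendations.
- S3 lifecycle policies. Move old logs to Standard-IA after 30 days and to Glacier after 90.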
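Here is a sketch of the log-tiering rule from the last item. It assumes a boto3 S3 client and a hypothetical `logs/` prefix. S3 does not allow a lifecycle transition to Standard-IA before an object is 30 days old, so the builder rejects smaller values. Note that `put_bucket_lifecycle_configuration` replaces any lifecycle rules already on the bucket, so merge in existing rules first.

```python
# Sketch: lifecycle rule moving objects under a prefix to STANDARD_IA, then GLACIER.
# The "logs/" prefix and any bucket name you pass are hypothetical.


def log_lifecycle_config(prefix="logs/", ia_days=30, glacier_days=90, expire_days=None):
    """Build the LifecycleConfiguration dict for put_bucket_lifecycle_configuration."""
    if not 30 <= ia_days < glacier_days:
        raise ValueError("need 30 <= ia_days < glacier_days")
    rule = {
        "ID": f"tier-{prefix.strip('/') or 'all'}",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            {"Days": ia_days, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_days, "StorageClass": "GLACIER"},
        ],
    }
    if expire_days is not None:
        if expire_days <= glacier_days:  # sanity check: expire only after the last transition
            raise ValueError("expire_days must be greater than glacier_days")
        rule["Expiration"] = {"Days": expire_days}
    return {"Rules": [rule]}


def apply_log_lifecycle(s3, bucket, **kwargs):
    """Apply the rule; this REPLACES the bucket's existing lifecycle configuration."""
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=log_lifecycle_config(**kwargs)
    )
```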
Each of these is a one-day project; together they often shave 15-30% off the bill in the first month of practice.
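"Baseline compute" in the Savings Plans item can be made concrete. Take hourly on-demand compute spend (for example from the CUR, grouped by hour) and commit to a low percentile of it, so the commitment is used nearly every hour. The sketch below makes two stated assumptions: a single flat discount (the ~27% figure above, though real rates vary) and the 10th percentile as the baseline. A Savings Plans commitment is expressed in dollars per hour at Savings Plans rates, so the on-demand baseline is multiplied by (1 - discount).

```python
import math

# Sketch: size an hourly Savings Plans commitment from hourly on-demand compute spend.
# Assumptions: one flat discount rate for all usage, and a low percentile as the "baseline".
HOURS_PER_MONTH = 730  # average hours in a month


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty sequence (pct in 0-100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(len(ordered) * pct / 100))
    return ordered[rank - 1]


def recommend_commitment(hourly_on_demand, discount=0.27, pct=10):
    """Return (commitment $/hr at SP rates, covered on-demand $/hr, est. monthly savings)."""
    if not hourly_on_demand:
        raise ValueError("need at least one hour of spend data")
    baseline = percentile(hourly_on_demand, pct)
    commitment = baseline * (1 - discount)
    # If the commitment is fully used every hour, each hour saves baseline - commitment.
    monthly_savings = (baseline - commitment) * HOURS_PER_MONTH
    return commitment, baseline, monthly_savings
```

A low percentile trades some unrealized savings for almost no risk of paying for unused commitment. Pick a higher `pct` only if usage is very steady.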
What's Next
You can see and act on cost. Next:
- Patterns — showback, commitment strategy, Spot, anomaly detection, unit economics
- Best Practices — tagging hygiene, team structure, eng incentives, common pitfalls