Steven's Knowledge

Getting Started

Tag everything, enable Cost Explorer & CUR, set up Kubecost on a cluster, build your first cost dashboard

This page walks the Inform phase of FinOps: getting visibility into where the money goes. Without this, every "optimization" is a guess.

Step 1: Tag Everything

Tagging is the single highest-leverage FinOps practice. Without tags, your bill is a mystery.

Minimum useful tag set:

| Tag | Example values | Why |
|---|---|---|
| Environment | prod, staging, dev | Separate the surprise from the safe |
| Team | payments, growth, platform | Whose budget |
| Service | checkout, web, auth | What costs what |
| CostCenter | eng-platform, data | Maps to finance ledger |
| Owner | team@company.com | Who to talk to |

AWS tagging policy

# Create tag policy at the org level (AWS Organizations)
aws organizations create-policy \
  --name require-cost-tags \
  --type TAG_POLICY \
  --content file://tag-policy.json

tag-policy.json:

{
  "tags": {
    "Environment": {
      "tag_key": { "@@assign": "Environment" },
      "tag_value": { "@@assign": ["prod", "staging", "dev"] },
      "enforced_for": { "@@assign": ["ec2:instance", "rds:db", "s3:bucket"] }
    },
    "Team": {
      "tag_key": { "@@assign": "Team" }
    }
  }
}

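Creating the policy does nothing until you attach it to the organization root, an OU, or an account (`aws organizations attach-policy`). A tag policy governs the keys and values of tags that get applied; it generally does not block untagged resources on its own, so pair it with an SCP or an AWS Config `required-tags` rule if you need hard enforcement. Backfill existing resources with a script (boto3 + describe-* + create-tags).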
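The backfill script can be sketched in Python with boto3. This is a minimal sketch for EC2 instances only; `REQUIRED_TAGS`, `missing_tags`, `backfill_instances`, and the `defaults` mapping are illustrative names, not part of any AWS API. Other resource types need their own describe and tag calls.

```python
REQUIRED_TAGS = ("Environment", "Team", "Service", "CostCenter", "Owner")


def missing_tags(tags, required=REQUIRED_TAGS):
    """Return the required tag keys absent from an AWS-style tag list."""
    present = {t["Key"] for t in (tags or [])}
    return [key for key in required if key not in present]


def backfill_instances(defaults, dry_run=True):
    """Apply default values for missing tags on every EC2 instance in the region."""
    import boto3  # imported lazily so missing_tags() works without AWS dependencies

    ec2 = boto3.client("ec2")
    for page in ec2.get_paginator("describe_instances").paginate():
        for reservation in page["Reservations"]:
            for instance in reservation["Instances"]:
                gaps = missing_tags(instance.get("Tags"))
                # Only fill keys we have a default for; never overwrite existing tags.
                new_tags = [{"Key": k, "Value": defaults[k]} for k in gaps if k in defaults]
                if not new_tags:
                    continue
                print(instance["InstanceId"], "->", new_tags)
                if not dry_run:
                    ec2.create_tags(Resources=[instance["InstanceId"]], Tags=new_tags)
```

Run it with `dry_run=True` first and review the printed list; a wrong default `Team` value is worse than a missing one, because it silently misattributes cost.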

Activate cost allocation tags

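In AWS Billing console → Cost Allocation Tags → activate the ones you want to filter on. A tag only shows up in that list after it has been applied to at least one resource, and activation can take up to 24 hours to appear in Cost Explorer. Activated tags apply to new billing data going forward, not to past usage.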
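If you would rather script the activation than click through the console, the Cost Explorer API has an `UpdateCostAllocationTagsStatus` operation. A minimal sketch, assuming it runs with credentials for the management (payer) account; the helper names are illustrative:

```python
def activation_payload(tag_keys):
    """Build the CostAllocationTagsStatus entries that mark each key as Active."""
    return [{"TagKey": key, "Status": "Active"} for key in tag_keys]


def activate_cost_allocation_tags(tag_keys):
    """Activate user-defined cost allocation tags; returns any per-tag errors."""
    import boto3  # imported lazily so activation_payload() has no AWS dependency

    ce = boto3.client("ce")
    response = ce.update_cost_allocation_tags_status(
        CostAllocationTagsStatus=activation_payload(tag_keys)
    )
    return response.get("Errors", [])
```

Calling `activate_cost_allocation_tags(["Environment", "Team", "Service", "CostCenter", "Owner"])` should return an empty list when every key was accepted. Keys that have never been applied to a resource are expected to come back as errors.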

Step 2: Enable the Billing Data Pipeline

The console UI is for spot checks. The line-item data you will actually query lives in the Cost & Usage Report (CUR), delivered to S3:

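# AWS Cost & Usage Report (CUR) - hourly, granular
# The CUR API is only served from us-east-1, and the bucket needs a policy
# that lets the billing service write to it.
aws cur put-report-definition --region us-east-1 --report-definition '{
  "ReportName": "main-cur",
  "TimeUnit": "HOURLY",
  "Format": "Parquet",
  "Compression": "Parquet",
  "AdditionalSchemaElements": ["RESOURCES"],
  "S3Bucket": "company-billing",
  "S3Prefix": "cur/",
  "S3Region": "us-east-1",
  "AdditionalArtifacts": ["ATHENA"],
  "ReportVersioning": "OVERWRITE_REPORT"
}'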

Then query via Athena:

SELECT
  line_item_product_code,
  resource_tags_user_team,
  SUM(line_item_unblended_cost) AS cost
FROM main_cur
WHERE year = '2026' AND month = '5'
GROUP BY 1, 2
ORDER BY cost DESC
LIMIT 50;
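To run the same query on a schedule instead of in the Athena console, a boto3 wrapper is enough. This is a sketch, not a hardened client: `build_cost_query` and `run_query` are illustrative names, and the database name and results location you pass in depend on how your CUR/Athena integration was set up.

```python
import time


def build_cost_query(year, month, table="main_cur", limit=50):
    """Return the per-service, per-team cost query for one billing month."""
    return (
        "SELECT line_item_product_code, resource_tags_user_team, "
        "SUM(line_item_unblended_cost) AS cost "
        f"FROM {table} "
        f"WHERE year = '{int(year)}' AND month = '{int(month)}' "
        "GROUP BY 1, 2 ORDER BY cost DESC "
        f"LIMIT {int(limit)}"
    )


def run_query(sql, database, output_location, poll_seconds=2):
    """Run a query in Athena, wait for it, and return rows as lists of strings."""
    import boto3  # imported lazily so build_cost_query() has no AWS dependency

    athena = boto3.client("athena")
    query_id = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)
    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query {query_id} ended in state {state}")
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    # The first row holds the column headers; each cell is {"VarCharValue": "..."}.
    return [[cell.get("VarCharValue") for cell in row["Data"]] for row in rows[1:]]
```

Usage would look like `run_query(build_cost_query(2026, 5), "my_cur_database", "s3://my-athena-results/")`, where both arguments are placeholders for your own database and results bucket.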

For GCP: enable BigQuery billing export. For Azure: enable Cost Management exports to a storage account.

Step 3: Set Up Kubecost / OpenCost on Kubernetes

If you run Kubernetes, cloud billing tools alone can't tell you what each namespace or team costs: the whole cluster shows up as one big EC2 line item. OpenCost (CNCF) and Kubecost (commercial, built on OpenCost) solve this:

helm repo add kubecost https://kubecost.github.io/cost-analyzer/
helm install kubecost kubecost/cost-analyzer \
  --namespace kubecost --create-namespace \
  --set kubecostToken="your-token" \
  --set kubecostProductConfigs.clusterName="prod-us-east-1"

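# Port-forward the UI (this command blocks, so open the URL from a second terminal or a browser)
kubectl port-forward -n kubecost svc/kubecost-cost-analyzer 9090
open http://localhost:9090   # macOS; use xdg-open on Linux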

After 24 hours of metrics collection you can see:

  • Cost per namespace, deployment, label
  • Idle vs. used CPU/memory
  • Recommended rightsizing per workload
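The same numbers are available over HTTP, which is handy for reports. The sketch below assumes the port-forward above is running and that the Kubecost Allocation API is reachable at `/model/allocation` and returns a `data` array of windows, each mapping an allocation name to an object with a `totalCost` field. Check the response shape against your Kubecost version; the function names are illustrative.

```python
import json
import urllib.parse
import urllib.request


def costs_by_name(payload):
    """Sum totalCost per allocation name across all windows, highest cost first."""
    totals = {}
    for window in payload.get("data", []):
        for name, allocation in (window or {}).items():
            totals[name] = totals.get(name, 0.0) + float(allocation.get("totalCost", 0.0))
    return dict(sorted(totals.items(), key=lambda item: item[1], reverse=True))


def fetch_allocation(base_url="http://localhost:9090", window="7d", aggregate="namespace"):
    """Fetch allocation data from a (port-forwarded) Kubecost instance."""
    query = urllib.parse.urlencode({"window": window, "aggregate": aggregate})
    with urllib.request.urlopen(f"{base_url}/model/allocation?{query}", timeout=30) as response:
        return json.load(response)
```

`costs_by_name(fetch_allocation())` then gives a namespace-to-cost mapping for the last seven days that you can paste into a weekly report.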

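Pure OpenCost (no commercial bits; it expects a Prometheus instance to read metrics from):

helm repo add opencost https://opencost.github.io/opencost-helm-chart
helm install opencost opencost/opencost \
  --namespace opencost --create-namespace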

Step 4: Build a First Dashboard

The dashboard everyone needs: monthly cost trend, by team, with anomalies highlighted.

In Athena + QuickSight (or Grafana on Athena):

SELECT
  date_trunc('month', line_item_usage_start_date) AS month,
  resource_tags_user_team AS team,
  SUM(line_item_unblended_cost) AS cost
FROM main_cur
WHERE year = '2026'
GROUP BY 1, 2
ORDER BY 1, 3 DESC;
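The query gives you the trend; "anomalies highlighted" still needs a rule. One simple option, shown here as a sketch rather than a real anomaly detector, is to flag any team whose cost jumped by more than a threshold versus its previous month. The function name, the 25% threshold, and the $100 floor are all illustrative assumptions.

```python
def flag_anomalies(rows, threshold=0.25, min_cost=100.0):
    """Flag month-over-month cost jumps per team.

    rows: iterable of (month, team, cost), with month as a sortable 'YYYY-MM' string.
    Returns sorted (month, team, previous_cost, cost, fractional_change) tuples.
    """
    by_team = {}
    for month, team, cost in rows:
        by_team.setdefault(team, {})[month] = float(cost)

    flagged = []
    for team, series in by_team.items():
        months = sorted(series)
        # Compares each month with the previous month that has data for the team.
        for previous, current in zip(months, months[1:]):
            before, after = series[previous], series[current]
            if before <= 0 or after < min_cost:
                continue  # skip brand-new or negligible spend
            change = (after - before) / before
            if change > threshold:
                flagged.append((current, team, before, after, round(change, 3)))
    return sorted(flagged)


rows = [
    ("2026-01", "payments", 1000), ("2026-02", "payments", 1400),
    ("2026-01", "growth", 500), ("2026-02", "growth", 520),
]
print(flag_anomalies(rows))  # payments rose 40% and is flagged; growth rose 4% and is not
```

Feed it the rows from the query above (with the month formatted as `YYYY-MM`) and surface the flagged tuples in the dashboard or in chat.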

For a quick win without building from scratch:

  • Vantage (free tier): point at AWS account, get a usable dashboard in 10 minutes.
  • Infracost (open source): show cost diff in Terraform PRs.

Step 5: Find One Easy Win

Before optimizing systematically, find one big obvious leak. Common candidates:

# Unattached EBS volumes (paying for nothing)
aws ec2 describe-volumes --filters Name=status,Values=available \
  --query "Volumes[*].[VolumeId, Size, CreateTime, Tags]" --output table

# Old snapshots
aws ec2 describe-snapshots --owner-ids self \
  --query "Snapshots[?StartTime<='2025-01-01'].[SnapshotId, VolumeSize, StartTime]" \
  --output table

# Stopped EC2 instances (no compute charge, but attached EBS volumes are still billed)
aws ec2 describe-instances --filters Name=instance-state-name,Values=stopped \
  --query "Reservations[*].Instances[*].[InstanceId, LaunchTime, Tags]"

# Previous-generation instances (usually worse price-performance than current gen)
aws ec2 describe-instances --filters Name=instance-type,Values=m4.*,c4.*,r4.* \
  --query "Reservations[*].Instances[*].[InstanceId, InstanceType, Tags]"

# NAT Gateway data processed (a quietly huge bill line)
# Check in Cost Explorer: filter Service = "EC2 - Other", Usage type = NatGateway-Bytes

For a 2-year-old AWS account, this typically surfaces $5k-50k/yr of pure waste.
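To turn the unattached-volume list into a dollar figure, multiply the total size by a per-GB price. The sketch below uses a flat $0.08 per GB-month as a placeholder assumption; real prices depend on volume type and region, and provisioned IOPS are billed separately. Function names are illustrative.

```python
def monthly_waste(volumes, price_per_gb_month=0.08):
    """Estimate the monthly cost of volumes from their size in GiB at a flat assumed rate."""
    return round(sum(volume["Size"] for volume in volumes) * price_per_gb_month, 2)


def unattached_volumes():
    """Return all EBS volumes in the 'available' (unattached) state in the current region."""
    import boto3  # imported lazily so monthly_waste() has no AWS dependency

    ec2 = boto3.client("ec2")
    pages = ec2.get_paginator("describe_volumes").paginate(
        Filters=[{"Name": "status", "Values": ["available"]}]
    )
    return [volume for page in pages for volume in page["Volumes"]]
```

`monthly_waste(unattached_volumes())` gives a rough monthly number for one region; loop over regions if the account uses more than one.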

Step 6: Set Up Budgets & Alerts

You want a Slack ping before the bill arrives, not after.

# AWS Budget with SNS alert
aws budgets create-budget --account-id 123456789012 \
  --budget '{
    "BudgetName": "monthly-eng-budget",
    "BudgetLimit": { "Amount": "50000", "Unit": "USD" },
    "TimeUnit": "MONTHLY",
    "BudgetType": "COST"
  }' \
  --notifications-with-subscribers '[{
    "Notification": {
      "NotificationType": "FORECASTED",
      "ComparisonOperator": "GREATER_THAN",
      "Threshold": 90
    },
    "Subscribers": [{ "SubscriptionType": "SNS", "Address": "arn:aws:sns:..." }]
  }]'

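Wire SNS → Lambda → Slack webhook, and make sure the SNS topic's access policy lets budgets.amazonaws.com publish to it. The alert that matters is the forecasted one: with the threshold at 90, it fires when the forecasted month-end spend exceeds 90% of the budget, which catches anomalies before month-end.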
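The Lambda in the middle can be very small. A minimal sketch, assuming a Slack incoming webhook whose URL is stored in an environment variable named `SLACK_WEBHOOK_URL` (a hypothetical name) and a Lambda subscribed to the budget's SNS topic:

```python
import json
import os
import urllib.request


def slack_payload(sns_event):
    """Turn an SNS-triggered Lambda event into a Slack incoming-webhook payload."""
    blocks = []
    for record in sns_event.get("Records", []):
        sns = record["Sns"]
        subject = sns.get("Subject") or "AWS Budgets alert"
        blocks.append(f"*{subject}*\n{sns['Message']}")
    return {"text": "\n\n".join(blocks)}


def handler(event, context):
    """Lambda entry point: forward the budget notification to Slack."""
    request = urllib.request.Request(
        os.environ["SLACK_WEBHOOK_URL"],
        data=json.dumps(slack_payload(event)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return {"status": response.status}
```

It uses only the standard library, so there is nothing to package; set the handler to `<module>.handler` and add the webhook URL as an environment variable.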

Step 7: First Optimization Pass

With one month of tagged data and Kubecost installed, look for:

  1. Unused EBS / EIPs / load balancers. Pure waste; delete.
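  2. Dev/staging running nights & weekends. Schedule auto-stop: running about 50 of the 168 hours in a week cuts compute hours by roughly 70%.
  3. No RI / Savings Plan coverage. Buy 1-year no-upfront Savings Plans for your baseline compute. Roughly 27% off on-demand for many instance families (the exact discount varies by family, region, and plan type), with low commitment risk.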
  4. Right-size top 10 most expensive workloads. Kubecost/Compute Optimizer recommendations.
  5. S3 lifecycle policies. Move old logs to IA / Glacier after 30/90 days.

Each of these is a one-day project; together they often shave 15-30% off the bill in the first month of practice.
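As one concrete instance, the S3 lifecycle item in the list above can be expressed in a few lines of boto3. This is a sketch under assumptions: the `logs/` prefix, the rule ID format, and the function names are illustrative, and the 30/90-day thresholds are the ones from the list.

```python
def log_lifecycle_rule(prefix="logs/", ia_days=30, glacier_days=90):
    """Build one lifecycle rule: Standard-IA after ia_days, Glacier after glacier_days."""
    return {
        "ID": f"tier-{prefix.strip('/') or 'all'}",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            {"Days": ia_days, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_days, "StorageClass": "GLACIER"},
        ],
    }


def apply_lifecycle(bucket, rules):
    """Replace the bucket's lifecycle configuration with the given rules."""
    import boto3  # imported lazily so log_lifecycle_rule() has no AWS dependency

    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration={"Rules": rules}
    )
```

Note that `put_bucket_lifecycle_configuration` overwrites whatever lifecycle configuration the bucket already has, so read the existing rules first and merge them if the bucket is not new.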

What's Next

You can see and act on cost. Next:

  • Patterns — showback, commitment strategy, Spot, anomaly detection, unit economics
  • Best Practices — tagging hygiene, team structure, eng incentives, common pitfalls
