Steven's Knowledge

OpenCost, Kubecost, Vantage, Cloudability - bringing engineering, finance, and product together to spend cloud dollars wisely

FinOps & Cloud Cost Management

FinOps is the practice of bringing financial accountability to the variable spend model of cloud. The cloud unlocks speed; it also unlocks the ability to spend a fortune by accident. FinOps is how mature orgs keep those two in productive tension.

The discipline isn't "save money" — it's "spend with intent." Sometimes the right call is to spend more; sometimes much less. FinOps gives the org the data and the loops to decide deliberately.

Why FinOps

| Without FinOps | With FinOps |
| --- | --- |
| Cloud bill is a surprise each month | Bill is forecasted, owned, tracked |
| Engineers don't know what their service costs | Cost attribution per team, per service |
| Idle / oversized resources accumulate | Continuous rightsizing |
| Reserved Instances / Savings Plans unused | Commitment management runs as a function |
| Cost vs. value invisible | Unit economics ($ per request, per user) tracked |
| Finance reactive ("why is it so high?") | Finance partners with eng on budgets and plans |

If your cloud bill has crossed ~$100k/month, FinOps stops being optional.

The FinOps Framework

The FinOps Foundation defines three phases that run continuously:

┌──────────┐      ┌──────────┐      ┌──────────┐
│  Inform  │ ───> │ Optimize │ ───> │ Operate  │
└──────────┘      └──────────┘      └──────────┘
     ^                                    │
     └────────────────────────────────────┘
  1. Inform: visibility, allocation, benchmarking, budgeting & forecasting
  2. Optimize: rightsizing, commitments, workload optimization, architecture
  3. Operate: continuous improvement, automation, organizational alignment

Most orgs spend years just on Inform because tagging discipline is hard.
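To make the tagging point concrete, here is a minimal sketch of tag-based allocation. The line items, the `team` tag key, and the `unallocated` bucket name are all hypothetical; real input would come from a billing export such as the CUR.

```python
from collections import defaultdict

# Hypothetical billing line items; values are made up for illustration.
line_items = [
    {"resource": "i-0a1", "cost": 120.0, "tags": {"team": "payments"}},
    {"resource": "i-0b2", "cost": 80.0, "tags": {"team": "search"}},
    {"resource": "vol-0c3", "cost": 50.0, "tags": {}},
    {"resource": "i-0d4", "cost": 150.0, "tags": {"team": "payments"}},
]


def allocate_by_tag(items, tag_key="team"):
    """Sum cost per tag value; untagged spend lands in 'unallocated'."""
    totals = defaultdict(float)
    for item in items:
        owner = item["tags"].get(tag_key, "unallocated")
        totals[owner] += item["cost"]
    return dict(totals)


def tag_coverage(items, tag_key="team"):
    """Fraction of total cost that carries the tag (0.0 to 1.0)."""
    total = sum(item["cost"] for item in items)
    tagged = sum(item["cost"] for item in items if tag_key in item["tags"])
    return tagged / total if total else 0.0


by_team = allocate_by_tag(line_items)
coverage = tag_coverage(line_items)
print(by_team)
print(f"tag coverage: {coverage:.1%}")
```

Tracking the coverage figure over time is one way to see whether tagging discipline is improving: anything in the unallocated bucket is spend nobody owns yet.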

The Players

Cloud-native tools

| Tool | Best for |
| --- | --- |
| AWS Cost Explorer / Cost & Usage Report (CUR) | AWS billing data; the source of truth |
| Azure Cost Management | Azure equivalent |
| GCP Billing Reports + BigQuery export | GCP equivalent |
| AWS Compute Optimizer | Native rightsizing recommendations |
| AWS Savings Plans / RIs | Commitment-based discounts |

These are free with the cloud, but most teams find them insufficient — slow UI, hard to slice by team, no Kubernetes attribution.

Third-party FinOps platforms

| Tool | Where it shines |
| --- | --- |
| Vantage | Multi-cloud + SaaS; clean UI; affordable for small teams |
| Cloudability (IBM) | Enterprise; deep AWS + Azure + GCP; commitment management |
| CloudHealth (VMware/Broadcom) | Enterprise; mature governance features |
| CloudZero | Unit economics focus ($ per customer, $ per feature) |
| Densify | ML-driven rightsizing |
| Spot.io (NetApp) | Auto-running on Spot / preemptible safely |

Kubernetes cost

| Tool | Notes |
| --- | --- |
| OpenCost | CNCF; open standard for K8s cost attribution |
| Kubecost | Commercial product built on OpenCost; team/namespace/label allocation |
| Karpenter (AWS-native) | Just-in-time node provisioning; uses Spot effectively |
| Goldilocks (Fairwinds) | Vertical Pod Autoscaler recommendations for K8s rightsizing |
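The idea behind namespace-level attribution can be sketched in a few lines. This is a deliberately simplified model, not the algorithm OpenCost or Kubecost actually use: it splits one node's price by CPU requests only, while real tools also account for memory, storage, and network. The node price, core count, and pod figures are assumptions.

```python
# Hypothetical inputs: one node's monthly price and per-pod CPU figures.
NODE_MONTHLY_COST = 280.0  # assumed node price, USD/month
NODE_CPU_CORES = 8.0       # assumed allocatable cores on that node

pods = [
    {"namespace": "checkout", "cpu_request": 4.0, "cpu_used": 0.1},
    {"namespace": "search", "cpu_request": 2.0, "cpu_used": 1.5},
]


def allocate_node_cost(pods, node_cost, node_cores):
    """Charge each namespace for the CPU it requested, not the CPU it used.

    Returns (per-namespace report, cost of cores nobody requested).
    """
    cost_per_core = node_cost / node_cores
    report = {}
    for pod in pods:
        ns = report.setdefault(
            pod["namespace"], {"cost": 0.0, "requested": 0.0, "used": 0.0}
        )
        ns["cost"] += pod["cpu_request"] * cost_per_core
        ns["requested"] += pod["cpu_request"]
        ns["used"] += pod["cpu_used"]
    for ns in report.values():
        # Efficiency = used / requested; low values suggest overprovisioning.
        ns["efficiency"] = ns["used"] / ns["requested"] if ns["requested"] else 0.0
    requested_total = sum(ns["requested"] for ns in report.values())
    idle_cost = (node_cores - requested_total) * cost_per_core
    return report, idle_cost


report, idle_cost = allocate_node_cost(pods, NODE_MONTHLY_COST, NODE_CPU_CORES)
for name, ns in report.items():
    print(f"{name}: ${ns['cost']:.2f}/month, efficiency {ns['efficiency']:.1%}")
print(f"unrequested (idle) capacity: ${idle_cost:.2f}/month")
```

Charging by request rather than usage is a design choice: a pod that reserves 4 CPU blocks that capacity for everyone else, so it pays for all 4 even while using 0.1.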

FOCUS Specification

FOCUS (FinOps Open Cost & Usage Specification) is a vendor-neutral schema for billing data. AWS, Azure, GCP, and Oracle all now export FOCUS-compliant data. Multi-cloud cost analysis stops being a custom-ETL nightmare.
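A rough sketch of why a shared schema helps: once every provider's export uses the same column names, one aggregation function covers all of them. The column names below (`ProviderName`, `ServiceCategory`, `BilledCost`, `BillingCurrency`) follow the FOCUS specification as I understand it; the rows themselves are invented sample data, and a real export carries many more columns.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows in a FOCUS-style layout (values are not real billing data).
focus_csv = """ProviderName,ServiceCategory,BilledCost,BillingCurrency
AWS,Compute,1200.50,USD
AWS,Storage,300.25,USD
Microsoft,Compute,800.00,USD
Google Cloud,Compute,450.75,USD
"""


def cost_by(rows, column):
    """Total BilledCost grouped by any column, regardless of provider."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[column]] += float(row["BilledCost"])
    return dict(totals)


rows = list(csv.DictReader(io.StringIO(focus_csv)))
by_provider = cost_by(rows, "ProviderName")
by_category = cost_by(rows, "ServiceCategory")
print(by_provider)
print(by_category)
```

The sketch assumes a single billing currency and uses floats for brevity; production code would check `BillingCurrency` and likely use `decimal.Decimal` for money.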

Where Cloud Cost Hides

The bill grows in places that aren't obvious. Common culprits:

| Category | Typical waste |
| --- | --- |
| Idle resources | Stopped-but-not-terminated EC2; unattached EBS volumes; old snapshots |
| Oversized resources | r5.4xlarge running at 8% CPU |
| Dev/staging running 24/7 | Could shut down nights/weekends (70% reduction) |
| No commitment coverage | On-demand prices for steady workloads (RIs / SP save 30-70%) |
| Egress | Data leaving the cloud, especially cross-region or cross-AZ |
| NAT Gateway | $0.045/GB processed + $0.045/hr; surprisingly large |
| Old gen instances | m4 / c4 cost more than m5 / c5 with worse performance |
| Storage tiering | Cold S3 data on Standard instead of IA / Glacier |
| Forgotten environments | The POC from 2 years ago, still spinning |
| Logging volume | CloudWatch Logs / Datadog ingestion at unbounded rates |
| K8s overprovisioning | Pods reserving 4 CPU, using 0.1 CPU |

A common rule of thumb: in unoptimized orgs, 30-50% of cloud spend is waste. Even well-run orgs typically find 10-20% on a fresh sweep.
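Several of the figures in the table above reduce to simple arithmetic. The sketch below works through three of them: the off-hours shutdown saving, the NAT Gateway bill, and the utilization at which a commitment breaks even. The NAT rates are the ones quoted in the table; the 10-hour weekday schedule, the 730-hour month, and the 10 TB traffic figure are assumptions chosen for illustration.

```python
HOURS_PER_WEEK = 24 * 7  # 168


def off_hours_savings(hours_on_per_day, days_on_per_week):
    """Fraction of always-on cost avoided by running only on a schedule."""
    hours_on = hours_on_per_day * days_on_per_week
    return 1 - hours_on / HOURS_PER_WEEK


def nat_gateway_monthly_cost(gb_processed, hourly_rate=0.045,
                             per_gb_rate=0.045, hours=730):
    """Hourly charge for the gateway itself plus a per-GB processing charge."""
    return hours * hourly_rate + gb_processed * per_gb_rate


def commitment_break_even_utilization(discount):
    """Minimum utilization at which a commitment beats on-demand.

    A commitment is paid for whether it is used or not, so at discount d
    it only wins when more than (1 - d) of it is actually consumed.
    """
    return 1 - discount


# Dev/staging running 10 hours a day, 5 days a week.
print(f"off-hours saving: {off_hours_savings(10, 5):.0%}")
# One NAT Gateway processing an assumed 10 TB in a month.
print(f"NAT Gateway: ${nat_gateway_monthly_cost(10_000):.2f}/month")
# A commitment with a 30% discount.
print(f"break-even utilization: {commitment_break_even_utilization(0.30):.0%}")
```

The NAT example shows why the line item surprises people: the hourly charge is small, and the per-GB processing charge dominates once traffic grows.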

Unit Economics

The deeper FinOps question: what does each unit of business value (a request, a user, a checkout) cost?

Examples:

  • $0.001 per API request
  • $0.30 per active user per month
  • $0.15 per customer
  • $1.20 per checkout

When you track these, you connect cloud cost to business value. A 10x traffic spike is fine if cost per request stays flat; a 30% margin erosion is alarming even if the total bill is flat.

Tools like CloudZero are built specifically for this. You can also derive it manually: total cost / business metric, per month, by service.
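The manual derivation might look like the sketch below: divide cost by the business metric for each month and service, then flag months where the unit cost rose noticeably. The records, the `requests` metric, and the 10% threshold are hypothetical.

```python
# Hypothetical monthly records: cost from billing, requests from product analytics.
monthly = [
    {"month": "2024-01", "service": "api", "cost": 10_000.0, "requests": 10_000_000},
    {"month": "2024-02", "service": "api", "cost": 30_000.0, "requests": 30_000_000},
    {"month": "2024-03", "service": "api", "cost": 30_000.0, "requests": 20_000_000},
]


def unit_costs(records, metric):
    """Cost per unit of the business metric, keyed by (month, service)."""
    return {(r["month"], r["service"]): r["cost"] / r[metric] for r in records}


def unit_cost_regressions(records, metric, threshold=0.10):
    """Months where a service's unit cost rose more than `threshold`
    relative to that service's previous month."""
    previous = {}
    flagged = []
    for r in sorted(records, key=lambda rec: rec["month"]):
        current = r["cost"] / r[metric]
        last = previous.get(r["service"])
        if last is not None and current > last * (1 + threshold):
            flagged.append((r["month"], r["service"]))
        previous[r["service"]] = current
    return flagged


costs = unit_costs(monthly, "requests")
flagged = unit_cost_regressions(monthly, "requests")
for key, value in costs.items():
    print(key, f"${value:.4f} per request")
print("regressions:", flagged)
```

In this sample, February's bill triples but the cost per request is unchanged, so nothing is flagged. March's bill is flat while traffic drops, so the unit cost rises and the month is flagged, which is the case a total-bill dashboard would miss.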

When NOT to Obsess

FinOps is about maximizing value, not minimizing cost. Don't:

  • Spend $200k of engineering time to save $20k/yr.
  • Block a launch over cost projections — ship, then optimize.
  • Rightsize so aggressively that you have no headroom for a burst.
  • Force teams to use Spot for workloads that need stability.
  • Treat cost as the only metric — reliability and velocity often matter more.

The line: optimize the boring waste, accept the cost of strategic value, and forecast deliberately.
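The first bullet above is a payback calculation, and it can be made explicit before committing engineering time. This is a minimal sketch; the one-year payback threshold is an assumption, and the right value depends on how an org prices engineering time and opportunity cost.

```python
def payback_years(one_time_cost, annual_savings):
    """Years until cumulative savings cover the up-front engineering cost."""
    if annual_savings <= 0:
        return float("inf")
    return one_time_cost / annual_savings


def worth_doing(one_time_cost, annual_savings, max_payback_years=1.0):
    """True if the optimization pays for itself within the threshold."""
    return payback_years(one_time_cost, annual_savings) <= max_payback_years


# The example from the list: $200k of engineering time to save $20k/yr.
print(payback_years(200_000, 20_000))  # 10.0
print(worth_doing(200_000, 20_000))    # False
# A hypothetical cheaper fix: $5k of effort saving $20k/yr.
print(worth_doing(5_000, 20_000))      # True
```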

The FinOps maturity stages (per FinOps Foundation): Crawl (basic visibility, ad-hoc), Walk (defined processes, regular optimization), Run (continuous, automated, embedded in eng workflow). Most orgs are in Crawl; the gap to Walk is mostly tagging discipline and one dedicated FinOps person.
