FinOps & Cloud Cost Management
OpenCost, Kubecost, Vantage, Cloudability: bringing engineering, finance, and product together to spend cloud dollars wisely
FinOps is the practice of bringing financial accountability to the variable spend model of cloud. The cloud unlocks speed; it also unlocks the ability to spend a fortune by accident. FinOps is how mature orgs keep those two in tension productively.
The discipline isn't "save money" — it's "spend with intent." Sometimes the right call is to spend more; sometimes much less. FinOps gives the org the data and the loops to decide deliberately.
Why FinOps
| Without FinOps | With FinOps |
|---|---|
| Cloud bill is a surprise each month | Bill is forecasted, owned, tracked |
| Engineers don't know what their service costs | Cost attribution per team, per service |
| Idle / oversized resources accumulate | Continuous rightsizing |
| Reserved Instances / Savings Plans unused | Commitment management runs as a function |
| Cost vs. value invisible | Unit economics ($ per request, per user) tracked |
| Finance reactive ("why is it so high?") | Finance partners with eng on budgets and plans |
If your cloud bill has crossed ~$100k/month, FinOps stops being optional.
The FinOps Framework
The FinOps Foundation defines three phases that run continuously:
┌──────────┐      ┌──────────┐      ┌──────────┐
│  Inform  │ ───> │ Optimize │ ───> │ Operate  │
└──────────┘      └──────────┘      └──────────┘
      ^                                   │
      └───────────────────────────────────┘

- Inform: visibility, allocation, benchmarking, budgeting & forecasting
- Optimize: rightsizing, commitments, workload optimization, architecture
- Operate: continuous improvement, automation, organizational alignment
Most orgs spend years just on Inform because tagging discipline is hard.
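Because the Inform phase stands or falls on tagging, a quick health check is to measure how much spend can be attributed to an owner at all. The following is a minimal sketch, assuming billing line items have already been loaded as dictionaries; the `cost` and `tags` fields and the `team` tag key are hypothetical names, not any provider's export format.

```python
from collections import defaultdict

def allocate_by_tag(line_items, tag_key="team"):
    """Sum cost per tag value; spend without the tag lands in 'untagged'."""
    totals = defaultdict(float)
    for item in line_items:
        owner = item.get("tags", {}).get(tag_key) or "untagged"
        totals[owner] += item["cost"]
    return dict(totals)

def untagged_share(totals):
    """Fraction of total spend that cannot be attributed to an owner."""
    total = sum(totals.values())
    return totals.get("untagged", 0.0) / total if total else 0.0

# Illustrative data only, not real billing figures.
line_items = [
    {"cost": 1200.0, "tags": {"team": "payments"}},
    {"cost": 800.0, "tags": {"team": "search"}},
    {"cost": 500.0, "tags": {}},
    {"cost": 500.0, "tags": {"env": "dev"}},  # tagged, but not with an owner
]

totals = allocate_by_tag(line_items)
print(totals)
print(f"untagged share: {untagged_share(totals):.0%}")
```

Tracking the untagged share over time gives a single number for tagging discipline: in this made-up data a third of the spend has no owner.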
The Players
Cloud-native tools
| Tool | Best for |
|---|---|
| AWS Cost Explorer / Cost & Usage Report (CUR) | AWS billing data; the source of truth |
| Azure Cost Management | Azure equivalent |
| GCP Billing Reports + BigQuery export | GCP equivalent |
| AWS Compute Optimizer | Native rightsizing recommendations |
| AWS Savings Plans / RIs | Commitment-based discounts |
These are free with the cloud, but most teams find them insufficient — slow UI, hard to slice by team, no Kubernetes attribution.
Third-party FinOps platforms
| Tool | Where it shines |
|---|---|
| Vantage | Multi-cloud + SaaS; clean UI; affordable for small teams |
| Cloudability (IBM) | Enterprise; deep AWS + Azure + GCP; commitment management |
| CloudHealth (VMware/Broadcom) | Enterprise; mature governance features |
| CloudZero | Unit economics focus ($ per customer, $ per feature) |
| Densify | ML-driven rightsizing |
| Spot.io (NetApp) | Automates running workloads safely on Spot / preemptible instances |
Kubernetes cost
| Tool | Notes |
|---|---|
| OpenCost | CNCF; open standard for K8s cost attribution |
| Kubecost | Commercial product built on OpenCost; team/namespace/label allocation |
| Karpenter (open source, originated at AWS) | Just-in-time node provisioning; uses Spot effectively |
| Goldilocks (Fairwinds) | Vertical Pod Autoscaler recommendations for K8s rightsizing |
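To make "K8s cost attribution" concrete, here is a deliberately simplified sketch that splits one node's hourly price across namespaces in proportion to requested CPU and reports unrequested capacity as idle. It is not the OpenCost or Kubecost model, which also account for memory, GPU, storage, network, and actual usage; the function name, field names, and prices are assumptions for illustration.

```python
def allocate_node_cost(node_hourly_cost, node_cpu, pods):
    """Split one node's hourly cost across namespaces by requested CPU.

    Capacity that no pod requested is reported under 'idle'.
    CPU-only on purpose; real tools weigh more resource types.
    """
    cost_per_cpu = node_hourly_cost / node_cpu
    by_namespace = {}
    requested = 0.0
    for pod in pods:
        requested += pod["cpu_request"]
        ns = pod["namespace"]
        by_namespace[ns] = by_namespace.get(ns, 0.0) + pod["cpu_request"] * cost_per_cpu
    by_namespace["idle"] = max(node_cpu - requested, 0.0) * cost_per_cpu
    return by_namespace

# Hypothetical 16-vCPU node at $0.80/hr with three pods scheduled on it.
pods = [
    {"namespace": "checkout", "cpu_request": 4.0},
    {"namespace": "search", "cpu_request": 2.0},
    {"namespace": "checkout", "cpu_request": 1.0},
]
costs = allocate_node_cost(node_hourly_cost=0.80, node_cpu=16, pods=pods)
for namespace, cost in costs.items():
    print(f"{namespace}: ${cost:.2f}/hr")
```

Allocating by requests rather than usage means a pod that reserves far more CPU than it uses is charged for the reservation, which is usually the behavior you want when the goal is to encourage rightsizing.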
FOCUS Specification
FOCUS (FinOps Open Cost & Usage Specification) is a vendor-neutral schema for billing data. AWS, Azure, GCP, and Oracle all now offer FOCUS-formatted exports, so multi-cloud cost analysis no longer has to be a custom-ETL nightmare.
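A shared schema means one aggregation function can serve every provider. The sketch below uses the FOCUS column names `ProviderName`, `ServiceCategory`, `BilledCost`, and `BillingCurrency`; the inline CSV is a made-up stand-in for real exports, and it assumes every row is billed in the same currency.

```python
import csv
import io
from collections import defaultdict

# Tiny inline stand-in for FOCUS exports from two providers (illustrative values).
focus_csv = """ProviderName,ServiceCategory,BilledCost,BillingCurrency
AWS,Compute,1200.50,USD
AWS,Storage,300.00,USD
Google Cloud,Compute,450.25,USD
Google Cloud,Storage,49.25,USD
"""

def cost_by(rows, column):
    """Sum BilledCost grouped by any FOCUS column."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[column]] += float(row["BilledCost"])
    return dict(totals)

rows = list(csv.DictReader(io.StringIO(focus_csv)))
by_category = cost_by(rows, "ServiceCategory")
by_provider = cost_by(rows, "ProviderName")
print(by_category)
print(by_provider)
```

With real data you would read each provider's export file instead of the inline string; the grouping code stays the same because the column names do.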
Where Cloud Cost Hides
The bill grows in places that aren't obvious. Common culprits:
| Category | Typical waste |
|---|---|
| Idle resources | Stopped-but-not-terminated EC2; unattached EBS volumes; old snapshots |
| Oversized resources | r5.4xlarge running at 8% CPU |
| Dev/staging running 24/7 | Could shut down nights/weekends (~70% fewer instance-hours) |
| No commitment coverage | On-demand prices for steady workloads (RIs / Savings Plans save roughly 30-70%) |
| Egress | Data leaving the cloud, especially cross-region or cross-AZ |
| NAT Gateway | $0.045/GB processed + $0.045/hr (us-east-1 list price); surprisingly large |
| Old gen instances | m4 / c4 cost more than m5 / c5 with worse performance |
| Storage tiering | Cold S3 data on Standard instead of IA / Glacier |
| Forgotten environments | The POC from 2 years ago, still spinning |
| Logging volume | CloudWatch Logs / Datadog ingestion at unbounded rates |
| K8s overprovisioning | Pods reserving 4 CPU, using 0.1 CPU |
A common pattern: 30-50% of cloud spend is waste in unoptimized orgs. Even good orgs find 10-20% on a fresh sweep.
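Two of the figures in the table are easy to reproduce with back-of-the-envelope arithmetic. This sketch assumes a 10-hour, 5-day schedule for dev/staging and a 730-hour month, and reuses the NAT Gateway rates quoted in the table; the function names and the 10 TB traffic figure are illustrative.

```python
HOURS_PER_WEEK = 24 * 7  # 168

def schedule_savings(hours_per_day, days_per_week):
    """Fraction of instance-hours avoided versus running 24/7."""
    running = hours_per_day * days_per_week
    return 1 - running / HOURS_PER_WEEK

def nat_gateway_monthly_cost(gb_processed, hourly_rate=0.045, per_gb_rate=0.045, hours=730):
    """Monthly cost of one NAT Gateway: fixed hourly charge plus per-GB processing."""
    return hours * hourly_rate + gb_processed * per_gb_rate

print(f"{schedule_savings(10, 5):.0%}")             # 70%
print(f"${nat_gateway_monthly_cost(10_000):,.2f}")  # $482.85
```

The schedule figure covers compute hours only (attached storage keeps billing while instances are stopped), and the NAT example shows why the per-GB charge, not the hourly fee, is what grows with traffic.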
Unit Economics
The deeper FinOps question: what does each unit of business output (a request, a user, a checkout) cost?
Examples:
- $0.001 per API request
- $0.30 per active user per month
- $0.15 per customer
- $1.20 per checkout
When you track these, you connect cloud cost to business value. A 10x traffic spike is fine if cost per request stays flat; a 30% margin erosion is alarming even if the total bill is flat.
Tools like CloudZero are built specifically for this. You can also derive it manually: total cost / business metric, per month, by service.
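The manual derivation is a single division per service per month. A minimal sketch with hypothetical monthly figures for one service (the field names and numbers are invented for illustration):

```python
def unit_cost(total_cost, units):
    """Cost per unit of business output; None when there is nothing to divide by."""
    return total_cost / units if units else None

# Hypothetical monthly figures for one service.
months = [
    {"month": "2024-01", "cost": 10_000.0, "requests": 10_000_000},
    {"month": "2024-02", "cost": 100_000.0, "requests": 100_000_000},  # 10x traffic
    {"month": "2024-03", "cost": 100_000.0, "requests": 70_000_000},   # flat bill, less traffic
]

for m in months:
    m["cost_per_request"] = unit_cost(m["cost"], m["requests"])
    print(f'{m["month"]}: ${m["cost_per_request"]:.5f} per request')
```

In this made-up series the bill grows tenfold in February while cost per request stays at $0.001, which is healthy. In March the bill is unchanged but cost per request rises to roughly $0.0014, which is the signal worth investigating.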
Learning Path
1. Getting Started
Tag everything; enable Cost Explorer & CUR; set up Kubecost on a cluster; build first cost dashboard
2. Patterns
Showback/chargeback, rightsizing loops, commitment strategy, Spot usage, anomaly detection, budgets
3. Best Practices
Tagging policy, FinOps team structure, eng incentives, common pitfalls, when to optimize vs. when not to
When NOT to Obsess
FinOps is about maximizing value, not minimizing cost. Don't:
- Spend $200k of engineering time to save $20k/yr.
- Block a launch over cost projections — ship, then optimize.
- Rightsize so aggressively that you have no headroom for a burst.
- Force teams to use Spot for workloads that need stability.
- Treat cost as the only metric — reliability and velocity often matter more.
The line: optimize the boring waste, accept the cost of strategic value, and forecast deliberately.
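A simple payback calculation is often enough to decide whether an optimization is worth doing. This sketch uses the $200k and $20k/yr figures from the first bullet above; the function name is illustrative, and it ignores discounting and the opportunity cost of what the engineers would otherwise have built.

```python
def payback_years(one_time_effort_cost, annual_savings):
    """Years until cumulative savings cover the one-time engineering cost."""
    if annual_savings <= 0:
        return float("inf")  # the effort never pays for itself
    return one_time_effort_cost / annual_savings

print(payback_years(200_000, 20_000))  # 10.0
print(payback_years(20_000, 200_000))  # 0.1
```

A ten-year payback on infrastructure that may not exist in three years is a clear "don't"; the reverse case pays back in about five weeks.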
The FinOps maturity stages (per FinOps Foundation): Crawl (basic visibility, ad-hoc), Walk (defined processes, regular optimization), Run (continuous, automated, embedded in eng workflow). Most orgs are in Crawl; the gap to Walk is mostly tagging discipline and one dedicated FinOps person.