Patterns
Showback/chargeback, rightsizing loops, commitment strategy, Spot/preemptible, anomaly detection, unit economics
Patterns that move FinOps from "look at the bill" to "spend with intent."
Showback and Chargeback
| Model | Definition | Effect |
|---|---|---|
| Showback | Each team sees their cost; no internal billing | Awareness; weak incentive |
| Chargeback | Each team's cost deducts from their budget | Strong incentive; can create friction |
| Hybrid | Showback by default; chargeback above a threshold | Most common in mid-size orgs |
Start with showback. The first time a team sees "your service costs $40k/month", it will usually find optimizations on its own. Chargeback creates a real incentive but also invites gaming ("we'll just use the central platform team's resources"), so introduce it only when the org is mature enough to handle that.
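One way to read the hybrid row is sketched below: every team gets a showback figure, and only the portion above a threshold is charged back. The threshold, team names, and dollar figures are hypothetical, and charging only the excess (rather than the full amount once the threshold is crossed) is an assumption, since the table does not specify it.

```python
from dataclasses import dataclass


@dataclass
class Allocation:
    team: str
    showback: float    # reported to the team, no budget impact
    chargeback: float  # actually deducted from the team's budget


def allocate(costs: dict[str, float], threshold: float) -> list[Allocation]:
    """Hybrid model: show every team its cost, charge back only the excess.

    `threshold` is a hypothetical per-team monthly allowance in dollars.
    """
    result = []
    for team, cost in sorted(costs.items()):
        excess = max(0.0, cost - threshold)
        result.append(Allocation(team=team, showback=cost, chargeback=excess))
    return result


# Hypothetical monthly costs per team.
monthly = {"checkout": 40_000.0, "search": 12_000.0}
for a in allocate(monthly, threshold=25_000.0):
    print(f"{a.team}: showback ${a.showback:,.0f}, chargeback ${a.chargeback:,.0f}")
```

Keeping both numbers on the same report lets a team see its full cost even when nothing is deducted.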
Rightsizing Loops
Rightsizing isn't a one-time event; it's a loop:
┌─> Measure utilization (CPU, mem, IOPS, network)
│          │
│          v
│   Identify oversized resources (utilization < 40%)
│          │
│          v
│   Propose change (downsize / consolidate)
│          │
│          v
│   Apply (off-peak, with rollback ready)
│          │
│          v
└── Verify (no perf regression, no on-call wakeups)
For VMs: AWS Compute Optimizer, Azure Advisor, GCP Recommender. For K8s: Goldilocks (VPA recommendations) or Kubecost rightsizing.
Cadence: top 10 most expensive workloads, every quarter. Trying to rightsize everything burns engineer time on micro-savings.
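The "identify" step and the top-10 cadence can be combined into one filter, sketched below. The `Workload` type, the sample fleet, and the choice of a single utilization figure per workload are assumptions for illustration; in practice the numbers would come from one of the tools named above.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    monthly_cost: float  # dollars
    utilization: float   # 0.0-1.0 over the review window (assumed single figure)


def rightsizing_candidates(
    workloads: list[Workload],
    top_n: int = 10,
    threshold: float = 0.40,
) -> list[Workload]:
    """Return oversized workloads among the top_n most expensive ones."""
    most_expensive = sorted(
        workloads, key=lambda w: w.monthly_cost, reverse=True
    )[:top_n]
    return [w for w in most_expensive if w.utilization < threshold]


# Hypothetical fleet; "cron" is badly oversized but too cheap to be worth the time.
fleet = [
    Workload("api", 18_000.0, 0.22),
    Workload("batch", 9_500.0, 0.71),
    Workload("cache", 4_200.0, 0.35),
    Workload("cron", 300.0, 0.05),
]
for w in rightsizing_candidates(fleet, top_n=3):
    print(f"{w.name}: ${w.monthly_cost:,.0f}/month at {w.utilization:.0%} utilization")
```

Filtering by cost before filtering by utilization is what keeps the loop focused: the cheapest workload here has the lowest utilization but never reaches the review list.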
Commitment Strategy
Cloud providers give big discounts for commitment:
| AWS Construct | Discount | Commitment | Best for |
|---|---|---|---|
| Compute Savings Plan | up to 66% | 1 or 3 years, any instance family/region | Steady compute baseline |
| EC2 Instance Savings Plan | up to 72% | 1 or 3 years, specific family in one region | Predictable workloads |
| Reserved Instances (RI) | up to 72% | 1 or 3 years, specific config | Legacy; SP usually better |
| Spot Instances | up to 90% | None; can be reclaimed with 2 minutes' notice | Fault-tolerant workloads |
A practical strategy:
- Baseline coverage: Compute Savings Plan for ~70% of steady-state usage (1-year, no upfront).
- Burst on-demand: Above baseline runs at full price; that's fine, it's bursty.
- Spot for batch & stateless: Karpenter on AWS, Spot.io across clouds.
- Re-evaluate quarterly as baseline shifts.
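Don't over-commit. A 3-year RI for a service that gets deprecated in 18 months is still paid for all 36 months, so unless the discount exceeds 50% or the commitment can be reused elsewhere, it costs more than on-demand.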
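The over-commitment risk comes down to a break-even calculation, sketched below. The $1.00/hour rate and the 40% discount are hypothetical, 730 is the average number of hours in a month, and the sketch assumes the commitment cannot be resold or applied to other usage once the workload is gone.

```python
HOURS_PER_MONTH = 730  # average hours in a month


def commitment_vs_on_demand(
    hourly_on_demand: float,
    discount: float,
    term_months: int,
    months_used: int,
) -> tuple[float, float]:
    """Return (commitment_cost, on_demand_cost) in dollars.

    The commitment is paid for the full term; on-demand is paid only
    for the months the workload actually runs.
    """
    committed = hourly_on_demand * (1 - discount) * HOURS_PER_MONTH * term_months
    on_demand = hourly_on_demand * HOURS_PER_MONTH * months_used
    return committed, on_demand


def break_even_months(discount: float, term_months: int) -> float:
    """Months of use after which the commitment beats on-demand."""
    return (1 - discount) * term_months


committed, on_demand = commitment_vs_on_demand(
    hourly_on_demand=1.00, discount=0.40, term_months=36, months_used=18
)
print(f"3-year commitment:    ${committed:,.0f}")
print(f"on-demand for 18 mo:  ${on_demand:,.0f}")
print(f"break-even after {break_even_months(0.40, 36):.1f} months of use")
```

Under these assumptions the workload would need to run for more than 21.6 of the 36 months before the commitment pays off, which is why shorter terms are the safer default for services with an uncertain lifespan.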
Spot / Preemptible Usage
On AWS, Spot instances can be reclaimed with two minutes' notice but cost 60-90% less than on-demand (other clouds give shorter notice). Good targets:
- Batch jobs (data pipelines, ML training, CI runners)
- Stateless web tiers with replicas
- Dev/staging environments
- Anything that can checkpoint or retry
Bad targets:
- Stateful single-instance DBs
- Long-lived workloads that can't tolerate interruption
- Anything where 2-min eviction breaks SLO
Tooling that makes Spot safe:
- AWS Karpenter auto-replaces interrupted nodes within seconds
- Spot.io orchestrates Spot + on-demand across clouds
- Cluster Autoscaler with mixed instance policy falls back to on-demand on capacity loss
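"Anything that can checkpoint or retry" is the property that makes a workload Spot-friendly. A minimal sketch of checkpoint-and-resume follows; the interruption is simulated with an exception, the work is a toy sum, and the checkpoint is a local JSON file, whereas a real job would write to storage that survives the node (an assumption about your setup).

```python
import json
import os
import tempfile
from typing import Optional


def process_batch(
    items: list[int],
    checkpoint_path: str,
    interrupt_after: Optional[int] = None,
) -> int:
    """Sum `items`, saving progress after each one so a restart can resume.

    `interrupt_after` simulates a Spot reclaim after that many items
    have been processed in the current run.
    """
    state = {"next_index": 0, "total": 0}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)

    processed_this_run = 0
    for i in range(state["next_index"], len(items)):
        if interrupt_after is not None and processed_this_run >= interrupt_after:
            raise InterruptedError("simulated Spot interruption")
        state["total"] += items[i]
        state["next_index"] = i + 1
        # Write to a temp file and rename, so a reclaim mid-write
        # cannot leave a corrupt checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, checkpoint_path)
        processed_this_run += 1
    return state["total"]


checkpoint = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
work = [1, 2, 3, 4, 5]
try:
    process_batch(work, checkpoint, interrupt_after=2)  # first node is reclaimed
except InterruptedError:
    pass
total = process_batch(work, checkpoint)  # replacement node resumes at the third item
print(total)
```

The second run picks up from the saved index rather than starting over, so an interruption costs at most one item of repeated work.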
Anomaly Detection
You want to know when spend suddenly changes — not at month-end review.
- AWS Cost Anomaly Detection (free) — ML-based; alerts via SNS
- Vantage / CloudZero anomaly alerts — daily reports
- Custom: Athena query on CUR; alert if any service's daily cost > 2× previous 7-day avg
-- Flag services whose cost yesterday exceeded 2x their average over the 7 days before it.
WITH daily AS (
  SELECT
    date(line_item_usage_start_date) AS d,
    line_item_product_code AS svc,
    SUM(line_item_unblended_cost) AS cost
  FROM main_cur
  WHERE year = '2026'  -- widen this filter when the window crosses a year boundary
  GROUP BY 1, 2
),
baseline AS (
  SELECT
    svc,
    AVG(cost) AS avg_cost
  FROM daily
  -- Days -8 through -2: the 7 days before yesterday. Yesterday itself is
  -- excluded so a spike does not inflate its own baseline.
  WHERE d BETWEEN date_add('day', -8, current_date) AND date_add('day', -2, current_date)
  GROUP BY svc
)
SELECT d.svc, d.cost, b.avg_cost, d.cost / NULLIF(b.avg_cost, 0) AS ratio
FROM daily d
JOIN baseline b ON d.svc = b.svc
WHERE d.d = current_date - INTERVAL '1' DAY
  AND d.cost > 100  -- ignore services too small to matter
  AND d.cost > 2 * b.avg_cost
ORDER BY ratio DESC;
Common anomaly causes: a test left running, a recursive Lambda, a CloudWatch metric stream sending to S3 in a loop, a new feature ingesting more data than expected.
Unit Economics
The most strategic FinOps practice. Pick metrics that connect cost to business value:
| Business metric | Cost ratio |
|---|---|
| Request | $ per million requests |
| Customer | $ per active user per month |
| Transaction | $ per checkout / per order |
| Build | $ per CI build |
| Tenant (B2B SaaS) | $ per customer per month, by plan tier |
Build a dashboard with cost ÷ business metric over time:
$ per active user per month
0.40 ┤                                  ╭─── alert!
0.30 ┤                          ╭───────╯
0.20 ┤──────────────────────────╯
0.10 ┤
     └─────────────────────────────────────────
       Jan  Feb  Mar  Apr  May  Jun  Jul  Aug
If the ratio is falling: scaling efficiently. If rising: investigate before total cost gets dramatic.
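The ratio behind such a dashboard is a single division per period, plus a rule for when a rise deserves attention. The sketch below assumes a month-over-month tolerance of 10% and uses hypothetical cost and user figures; both the tolerance and the function names are illustrative.

```python
def unit_cost(total_cost: float, units: float) -> float:
    """Cost per business unit, e.g. dollars per active user."""
    if units <= 0:
        raise ValueError("units must be positive")
    return total_cost / units


def rising_months(
    series: list[tuple[str, float, float]],
    tolerance: float = 0.10,
) -> list[str]:
    """Return months whose unit cost rose more than `tolerance` over the prior month.

    `series` holds (month, total_cost, active_users) tuples in time order.
    """
    flagged = []
    previous = None
    for month, cost, users in series:
        current = unit_cost(cost, users)
        if previous is not None and current > previous * (1 + tolerance):
            flagged.append(month)
        previous = current
    return flagged


# Hypothetical figures: from June onward, cost grows faster than users.
history = [
    ("Apr", 20_000.0, 100_000),
    ("May", 21_000.0, 105_000),
    ("Jun", 33_000.0, 110_000),
    ("Jul", 46_000.0, 115_000),
]
print(rising_months(history))
```

May is not flagged even though total cost went up, because users grew in step with it; that distinction is the point of tracking the ratio instead of the bill.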
Architecture-Level Patterns
Some savings only come from changing the architecture:
| Pattern | Savings |
|---|---|
| Reduce egress: cache in CDN; keep traffic in-region | Egress is often 5-15% of bill |
| S3 → S3 Intelligent-Tiering | Auto-moves cold data to cheaper access tiers; saves 40-95% on storage for that data |
| Multi-AZ only where needed: dev doesn't need it | Cross-AZ traffic at $0.01/GB in each direction adds up |
| Lambda for spiky workloads, ECS/EKS for steady | Avoid Lambda for 24/7 work; avoid containers for 1% duty cycle |
| Route S3/DynamoDB traffic through VPC gateway endpoints instead of a NAT Gateway | NAT is $0.045/hr + per-GB processing; gateway endpoints for S3 and DynamoDB are free |
| Aurora Serverless for spiky DB workloads | Pay per ACU-hour vs. 24/7 provisioned |
| Move logs to Loki / OpenSearch from CloudWatch | CloudWatch Logs at $0.50/GB ingest is brutal at scale |
Pick one per quarter. Don't try them all.
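To decide which pattern is worth a quarter, it helps to estimate the saving first. The sketch below does this for the NAT Gateway row. The $0.045/hr figure comes from the table; the $0.045 per-GB processing rate and the traffic volumes are assumptions, so substitute your region's pricing and your own measurements.

```python
HOURS_PER_MONTH = 730  # average hours in a month


def nat_gateway_monthly_cost(
    gb_processed: float,
    hourly_rate: float = 0.045,   # from the table above
    per_gb_rate: float = 0.045,   # assumed; check your region's pricing
) -> float:
    """Monthly cost of one NAT Gateway: fixed hourly charge plus data processing."""
    return hourly_rate * HOURS_PER_MONTH + per_gb_rate * gb_processed


def gateway_endpoint_savings(total_gb: float, s3_dynamodb_gb: float) -> float:
    """Monthly saving from moving S3/DynamoDB traffic off the NAT Gateway.

    The NAT Gateway stays in place for the remaining traffic, so only the
    per-GB processing charge on the redirected traffic goes away.
    """
    before = nat_gateway_monthly_cost(total_gb)
    after = nat_gateway_monthly_cost(total_gb - s3_dynamodb_gb)
    return before - after


# Hypothetical: 10 TB/month through the NAT, 8 TB of it bound for S3.
saving = gateway_endpoint_savings(total_gb=10_000, s3_dynamodb_gb=8_000)
print(f"estimated saving: ${saving:,.2f}/month")
```

The fixed hourly charge is small next to the per-GB charge at this volume, which is why redirecting traffic matters more than removing the gateway itself.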
FinOps in CI/CD
Cost as a deploy-time concern:
- Infracost in Terraform PRs: "this PR adds $342/month"
- Cost guardrails: block PRs that exceed N% cost increase without approval
- Pre-merge K8s rightsizing checks: warn if requested resources >> historical usage
# Infracost in GitHub Actions
- name: Set up Infracost
  uses: infracost/actions/setup@v3
  with:
    api-key: ${{ secrets.INFRACOST_API_KEY }}
- name: Generate cost diff
  run: infracost diff --path . --format json --out-file /tmp/infracost.json
# A block scalar (|) keeps the backslash line continuations intact for the shell.
- name: Post PR comment
  run: |
    infracost comment github --path /tmp/infracost.json \
      --repo $GITHUB_REPOSITORY --pull-request ${{ github.event.pull_request.number }} \
      --github-token ${{ github.token }}
A reviewer who sees "this PR adds $1.2k/month" makes a different decision than one who only sees the diff.
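The "block PRs that exceed N% cost increase" guardrail can be a small script that reads the Infracost JSON and returns a pass/fail decision. The sketch below assumes the report exposes `pastTotalMonthlyCost` and `diffTotalMonthlyCost` as top-level string fields; verify the field names against the output of your Infracost version. The 10% limit and the sample numbers are hypothetical.

```python
import json


def exceeds_guardrail(infracost_json: str, max_increase_pct: float) -> bool:
    """Return True if the PR's monthly cost increase is above the allowed percentage.

    Field names are assumed from Infracost's JSON output format.
    """
    report = json.loads(infracost_json)
    past = float(report.get("pastTotalMonthlyCost") or 0)
    diff = float(report.get("diffTotalMonthlyCost") or 0)
    if diff <= 0:
        return False  # cost is flat or falling
    if past == 0:
        return True   # new spend on a zero baseline always needs approval
    return diff / past * 100 > max_increase_pct


# Hypothetical report: a $342/month increase on a $3,000/month baseline (11.4%).
sample = json.dumps(
    {
        "pastTotalMonthlyCost": "3000",
        "diffTotalMonthlyCost": "342",
        "totalMonthlyCost": "3342",
    }
)
print(exceeds_guardrail(sample, max_increase_pct=10.0))
```

In a pipeline, a `True` result would typically fail the check or require an extra approval rather than block the merge outright, so the guardrail prompts a conversation instead of stopping work.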
Anti-Patterns
Optimizing what's easy to measure. Egress is easy to see and easy to obsess over; meanwhile the $200k/month idle Redshift cluster sits ignored. Always sort by total cost.
One-time cleanups. A 3-day FinOps sprint that drops the bill 20%, then nothing for a year. Without a loop, waste regrows.
Cost-cutting at the expense of velocity. "We can't ship that, it'll add cost" is the wrong default. Cost matters; velocity matters more. Optimize what's already shipped.
RI sprawl. Buying RIs for workloads that change every quarter. Use Savings Plans (more flexible) instead.
Chargeback before showback. Forces teams to play accounting games before they even understand their cost.
What's Next
- Best Practices — tagging policy, team structure, eng incentives, common pitfalls