Showback/chargeback, rightsizing loops, commitment strategy, Spot/preemptible, anomaly detection, unit economics

Patterns

Patterns that move FinOps from "look at the bill" to "spend with intent."

Showback and Chargeback

Model	Definition	Effect
Showback	Each team sees their cost; no internal billing	Awareness; weak incentive
Chargeback	Each team's cost is deducted from its budget	Strong incentive; can create friction
Hybrid	Showback by default; chargeback above a threshold	Most common in mid-size orgs

Start with showback. The first time a team sees "your service costs $40k/month", it will often find optimizations on its own. Chargeback creates a real incentive, but it also invites gaming ("we'll just use the central platform team's resources"), so introduce it only once the org is mature enough.
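The hybrid model from the table can be sketched as a small report over cost line items. This is a minimal sketch, assuming each line item has already been attributed to a team (for example through a `team` cost-allocation tag); the `CostItem` type, the `"untagged"` bucket and the threshold are illustrative names, not part of any billing API.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class CostItem:
    team: Optional[str]  # value of a hypothetical "team" cost-allocation tag, None if untagged
    cost: float          # USD for the billing period


def showback_report(items, chargeback_threshold):
    """Hybrid model: every team gets showback; only spend above the threshold is charged back."""
    totals = defaultdict(float)
    for item in items:
        # Surface untagged spend explicitly instead of letting it disappear
        totals[item.team or "untagged"] += item.cost

    report = []
    for team, total in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        charged = max(total - chargeback_threshold, 0.0)  # only the excess hits the budget
        report.append({"team": team, "showback": round(total, 2), "chargeback": round(charged, 2)})
    return report
```

With a $25k threshold, a team spending $42k sees its full $42k as showback and has $17k charged back; teams under the threshold only see their numbers.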

Rightsizing Loops

Rightsizing isn't a one-time event; it's a loop:

   ┌─> Measure utilization (CPU, mem, IOPS, network)
   │       │
   │       v
   │   Identify oversized resources (utilization < 40%)
   │       │
   │       v
   │   Propose change (downsize / consolidate)
   │       │
   │       v
   │   Apply (off-peak, with rollback ready)
   │       │
   │       v
   └── Verify (no perf regression, no on-call wakeups)

For VMs: AWS Compute Optimizer, Azure Advisor, GCP Recommender. For K8s: Goldilocks (VPA recommendations) or Kubecost rightsizing.

Cadence: review the top 10 most expensive workloads every quarter. Trying to rightsize everything burns engineering time on micro-savings.
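The "identify" step of the loop, restricted to the quarterly top-10 pass, might look like the sketch below. It assumes p95 CPU and memory utilization have already been exported from your monitoring stack; using p95 rather than the mean is an assumption of this sketch (it keeps workloads with short, real peaks off the list), and IOPS and network are omitted for brevity. `Workload` and `rightsizing_candidates` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    monthly_cost: float
    cpu_p95: float  # 95th-percentile CPU utilization over the window, 0-100
    mem_p95: float  # 95th-percentile memory utilization over the window, 0-100


def rightsizing_candidates(workloads, top_n=10, threshold=40.0):
    """Look only at the top_n most expensive workloads and flag those whose
    utilization is below the threshold on every measured dimension."""
    most_expensive = sorted(workloads, key=lambda w: w.monthly_cost, reverse=True)[:top_n]
    return [
        w.name
        for w in most_expensive
        # A workload busy on either CPU or memory is not oversized
        if max(w.cpu_p95, w.mem_p95) < threshold
    ]
```

Sorting by cost first keeps the engineering effort on the workloads where a downsize actually moves the bill.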

Commitment Strategy

Cloud providers offer large discounts in exchange for commitment:

AWS Construct	Discount	Commitment	Best for
Compute Savings Plan	up to 66%	1 or 3 years; any instance family, size, or region (also covers Fargate and Lambda)	Steady compute baseline
EC2 Instance Savings Plan	up to 72%	1 or 3 years; one instance family in one region	Predictable workloads
Reserved Instances (RI)	up to 75%	1 or 3 years; specific instance configuration	Legacy; SP usually better
Spot Instances	up to 90%	None; can be reclaimed with 2 minutes' notice	Fault-tolerant workloads

A practical strategy:

Baseline coverage: Compute Savings Plan for ~70% of steady-state usage (1-year, no upfront).
Burst on-demand: usage above the baseline runs at the full on-demand price; that's fine, because it's bursty.
Spot for batch & stateless: Karpenter on AWS, Spot.io across clouds.
Re-evaluate quarterly as baseline shifts.
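Sizing the baseline commitment can be sketched from an hourly usage series. This is a minimal sketch under stated assumptions: usage is expressed as on-demand-equivalent compute dollars per hour, the discount is a single flat rate (real Savings Plan rates vary by instance family, region and term), and "steady state" is taken to be the 10th percentile of hourly usage. `plan_commitment` and its parameters are hypothetical.

```python
import statistics


def plan_commitment(hourly_od_usd, coverage=0.70, discount=0.30, percentile=10):
    """Estimate cost with a Savings Plan-style commitment covering `coverage` of baseline.

    hourly_od_usd: on-demand-equivalent compute spend per hour (at least 2 samples).
    discount: ASSUMED flat discount vs on-demand.
    percentile: low percentile of usage treated as the steady-state baseline (assumption).
    """
    cuts = statistics.quantiles(hourly_od_usd, n=100)  # 99 cut points
    baseline = cuts[percentile - 1]
    covered_od = coverage * baseline                 # on-demand $/hr absorbed by the plan
    commit_per_hour = covered_od * (1 - discount)    # billed every hour, used or not

    total = 0.0
    for usage in hourly_od_usd:
        overflow = max(usage - covered_od, 0.0)      # burst above coverage runs on-demand
        total += commit_per_hour + overflow

    on_demand_total = sum(hourly_od_usd)
    return {
        "baseline": baseline,
        "commit_per_hour": commit_per_hour,
        "total_with_plan": total,
        "total_on_demand": on_demand_total,
        "savings_pct": 100 * (1 - total / on_demand_total),
    }
```

For a perfectly flat workload the savings come out to coverage × discount (70% × 30% = 21%); re-running it each quarter on fresh usage is the "re-evaluate as baseline shifts" step.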

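Don't over-commit. A 3-year RI for a service that gets deprecated in 18 months can cost more than staying on-demand: you keep paying for the rest of the term unless another workload can absorb the commitment.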
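The over-commitment risk is simple arithmetic. A minimal sketch, assuming a single flat effective discount and the worst case in which nothing else absorbs the commitment after the service is gone; the function names are illustrative.

```python
def commitment_vs_on_demand(monthly_on_demand, discount, term_months, months_used):
    """Worst case: the commitment is billed for the whole term, while on-demand
    spend stops when the service is deprecated."""
    committed = monthly_on_demand * (1 - discount) * term_months
    on_demand = monthly_on_demand * months_used
    return committed, on_demand


def break_even_discount(term_months, months_used):
    # committed == on_demand  <=>  (1 - d) * term = used  <=>  d = 1 - used / term
    return 1 - months_used / term_months
```

For a 36-month term used for 18 months, the break-even discount is 50%: at a 40% discount a $1,000/month workload costs $21,600 committed versus $18,000 on-demand, so the commitment loses whenever its effective discount is below that line.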

Spot / Preemptible Usage

AWS Spot Instances can be reclaimed with a 2-minute warning (GCP Spot/preemptible VMs and Azure Spot VMs give roughly 30 seconds), but they cost 60-90% less than on-demand. Good targets:

Batch jobs (data pipelines, ML training, CI runners)
Stateless web tiers with replicas
Dev/staging environments
Anything that can checkpoint or retry

Bad targets:

Stateful single-instance DBs
Long-lived workloads that can't tolerate interruption
Anything where a short-notice eviction would break an SLO
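"Anything that can checkpoint or retry" is the key property. The sketch below shows one way a batch job can persist progress so a replacement node resumes where the reclaimed one stopped. It is a minimal sketch, assuming a local or shared checkpoint path and an idempotent `handle` function (an item finished just before the kill may be processed again); `process_batch` and the `should_stop` hook are hypothetical names.

```python
import json
import os


def process_batch(items, checkpoint_path, handle, should_stop=lambda: False):
    """Process items in order, persisting progress so a reclaimed node can resume.

    should_stop: hypothetical hook, e.g. wired to an interruption-notice check.
    Returns True when all items are done, False if it stopped early.
    """
    start = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            start = json.load(f)["next_index"]

    for i in range(start, len(items)):
        if should_stop():
            return False  # exit cleanly; the next run resumes from the checkpoint
        handle(items[i])
        # Write to a temp file, then atomically rename, so a kill mid-write
        # cannot leave a corrupt checkpoint behind
        tmp = checkpoint_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"next_index": i + 1}, f)
        os.replace(tmp, checkpoint_path)
    return True
```

Checkpointing after every item is the simplest choice; for cheap items you would checkpoint every N items instead, trading a little rework for fewer writes.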

Tooling that makes Spot safe:

Karpenter (open-source node autoscaler, originally from AWS) handles Spot interruption notices by draining affected nodes and provisioning replacements
Spot.io orchestrates Spot + on-demand across clouds
Cluster Autoscaler with mixed-instance node groups spreads Spot across instance pools; pair it with an on-demand node group as a fallback on capacity loss
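Tools like these act on the provider's interruption notice (Karpenter reads it from an SQS interruption queue). A process running outside such tooling on EC2 can poll the instance metadata service itself. This is a minimal sketch using IMDSv2 and only the standard library; it assumes it runs on an EC2 instance with IMDSv2 reachable, and the function names are illustrative. The `spot/instance-action` path returns 404 until an interruption is scheduled.

```python
import json
import urllib.error
import urllib.request

IMDS = "http://169.254.169.254/latest"


def parse_instance_action(body):
    """Return (action, time) from the spot/instance-action JSON document."""
    doc = json.loads(body)
    return doc["action"], doc["time"]


def spot_interruption_pending(timeout=1.0):
    """Return (action, time) if a Spot interruption is scheduled, else None."""
    # IMDSv2: obtain a session token first
    token_req = urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    with urllib.request.urlopen(token_req, timeout=timeout) as resp:
        token = resp.read().decode()

    req = urllib.request.Request(
        f"{IMDS}/meta-data/spot/instance-action",
        headers={"X-aws-ec2-metadata-token": token},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return parse_instance_action(resp.read().decode())
    except urllib.error.HTTPError as e:
        if e.code == 404:  # no interruption scheduled
            return None
        raise
```

Polling every few seconds leaves most of the 2-minute window for draining or checkpointing.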

Anomaly Detection

You want to know when spend suddenly changes — not at month-end review.

AWS Cost Anomaly Detection (free) — ML-based; alerts via SNS
Vantage / CloudZero anomaly alerts — daily reports
Custom: Athena query on CUR; alert if any service's daily cost > 2× previous 7-day avg

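-- Flag services whose cost yesterday exceeded 2x their average over the 7 days before it.
-- main_cur is the Athena table created for your CUR; adjust the name to your setup.
WITH daily AS (
  SELECT
    date(line_item_usage_start_date) AS usage_date,
    line_item_product_code AS svc,
    SUM(line_item_unblended_cost) AS cost
  FROM main_cur
  -- Prune partitions without hardcoding the year, so early-January baselines still work
  WHERE year IN (CAST(year(current_date) AS varchar), CAST(year(current_date) - 1 AS varchar))
    AND date(line_item_usage_start_date) >= date_add('day', -8, current_date)
  GROUP BY 1, 2
),
baseline AS (
  -- The 7 days before yesterday; yesterday is excluded so a spike can't inflate its own baseline
  SELECT
    svc,
    AVG(cost) AS avg_cost
  FROM daily
  WHERE usage_date BETWEEN date_add('day', -8, current_date) AND date_add('day', -2, current_date)
  GROUP BY svc
)
SELECT y.svc, y.cost, b.avg_cost, y.cost / NULLIF(b.avg_cost, 0) AS ratio
FROM daily y JOIN baseline b ON y.svc = b.svc  -- services with no history are not flagged
WHERE y.usage_date = date_add('day', -1, current_date)
  AND y.cost > 100  -- ignore noise from small services
  AND y.cost > 2 * b.avg_cost
ORDER BY ratio DESC;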
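Before wiring the rule to an alert, it can help to replay it over history to see how often it would have fired and tune the 2× factor and the $100 floor. A minimal sketch, assuming you have already exported one service's daily costs (oldest first), for example from the `daily` CTE; `backtest_anomalies` is a hypothetical name.

```python
from statistics import mean


def backtest_anomalies(daily_cost, window=7, factor=2.0, min_cost=100.0):
    """Apply the 'cost > factor x average of the previous `window` days' rule
    to every day of a daily cost history. Returns (day_index, cost, ratio)."""
    flagged = []
    for i in range(window, len(daily_cost)):
        baseline = mean(daily_cost[i - window:i])  # the `window` days before day i
        cost = daily_cost[i]
        if cost > min_cost and cost > factor * baseline:
            ratio = cost / baseline if baseline else float("inf")
            flagged.append((i, cost, ratio))
    return flagged
```

In a history of flat $100 days with one $150 day followed by a $400 day, only the $400 day is flagged: the $150 day stays under 2×, and once the spike enters the window it raises the baseline for the following days.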

Common anomaly causes: a test left running, a recursive Lambda, a CloudWatch metric stream sending to S3 in a loop, a new feature ingesting more data than expected.

Unit Economics

This is the most strategic FinOps practice. Pick metrics that connect cost to business value:

Business metric	Cost ratio
Request	$ per million requests
Customer	$ per active user per month
Transaction	$ per checkout / per order
Build	$ per CI build
Tenant (B2B SaaS)	$ per customer per month, by plan tier

Build a dashboard with cost ÷ business metric over time:

$ per active user per month
0.40 ┤                                  ╭─── alert!
0.30 ┤                          ╭───────╯
0.20 ┤──────────────────────────╯
0.10 ┤
     └─────────────────────────────────────────
       Jan  Feb  Mar  Apr  May  Jun  Jul  Aug

If the ratio is falling, you're scaling efficiently. If it's rising, investigate before the total cost becomes dramatic.
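The dashboard metric and its alert reduce to a ratio and a month-over-month check. A minimal sketch, assuming monthly cost and monthly active users are already available as aligned lists with non-zero user counts; the 10% alert threshold and the function name are illustrative choices, not a standard.

```python
def unit_cost_trend(monthly_cost, monthly_active_users, rise_alert_pct=10.0):
    """Compute $ per active user per month and flag months where it rose by more
    than rise_alert_pct versus the previous month.
    Returns (ratios, alerts) with alerts as (month_index, ratio, change_pct)."""
    ratios = [c / u for c, u in zip(monthly_cost, monthly_active_users)]
    alerts = []
    for i in range(1, len(ratios)):
        change_pct = 100 * (ratios[i] - ratios[i - 1]) / ratios[i - 1]
        if change_pct > rise_alert_pct:
            alerts.append((i, round(ratios[i], 4), round(change_pct, 1)))
    return ratios, alerts
```

A month where cost grows 10% alongside a 10% user increase produces no alert; a month where cost grows while users stay flat does, even if total spend still looks modest.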

Architecture-Level Patterns

Some savings only come from changing the architecture:

Pattern	Savings
Reduce egress: cache in CDN; keep traffic in-region	Egress is often 5-15% of bill
S3 Standard → S3 Intelligent-Tiering	Auto-moves data that goes cold to infrequent/archive access tiers; 40-95% lower storage cost on that data
Multi-AZ only where needed (dev doesn't need it)	Cross-AZ traffic at $0.01/GB in each direction adds up
Lambda for spiky workloads, ECS/EKS for steady	Avoid Lambda for 24/7 work; avoid containers for 1% duty cycle
Replace NAT Gateway with VPC gateway endpoints for S3/DynamoDB	NAT is $0.045/hr plus $0.045/GB processed (us-east-1); gateway endpoints are free
Aurora Serverless for spiky DB workloads	Pay per ACU-hour vs. 24/7 provisioned
Move logs to Loki / OpenSearch from CloudWatch	CloudWatch Logs at $0.50/GB ingest is brutal at scale
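The NAT Gateway row is a good candidate for a quick estimate before committing a quarter to it. A minimal sketch, assuming us-east-1 list prices as defaults (check current pricing for your region) and 730 hours per month; the function names are illustrative.

```python
HOURS_PER_MONTH = 730  # common monthly billing approximation


def nat_monthly_cost(gb_processed, hourly_rate=0.045, per_gb_rate=0.045, gateways=1):
    """NAT Gateway cost = hourly charge per gateway + per-GB data processing.
    Default rates are ASSUMED us-east-1 list prices."""
    return gateways * HOURS_PER_MONTH * hourly_rate + gb_processed * per_gb_rate


def endpoint_savings(s3_dynamo_gb, other_gb, **rates):
    """Monthly savings from sending S3/DynamoDB traffic through free gateway endpoints.
    The NAT's hourly charge remains while other traffic still needs it; if nothing
    else does, the gateway can be deleted and its hourly charge saved too."""
    before = nat_monthly_cost(s3_dynamo_gb + other_gb, **rates)
    after = nat_monthly_cost(other_gb, **rates) if other_gb > 0 else 0.0
    return before - after
```

At these rates, moving 10,000 GB/month of S3 traffic off a NAT that still serves other traffic saves about $450/month; if that was the NAT's only traffic, removing it saves about $483.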

Pick one per quarter. Don't try them all.

FinOps in CI/CD

Cost as a deploy-time concern:

Infracost in Terraform PRs: "this PR adds $342/month"
Cost guardrails: block PRs that exceed N% cost increase without approval
Pre-merge K8s rightsizing checks: warn if requested resources >> historical usage

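# Infracost in GitHub Actions (runs on pull_request)
- name: Setup Infracost
  uses: infracost/actions/setup@v3
  with:
    api-key: ${{ secrets.INFRACOST_API_KEY }}
# Baseline: estimate the target branch first, so the diff shows only this PR's change
- uses: actions/checkout@v4
  with:
    ref: ${{ github.event.pull_request.base.ref }}
- run: infracost breakdown --path . --format json --out-file /tmp/infracost-base.json
- uses: actions/checkout@v4
- run: infracost diff --path . --compare-to /tmp/infracost-base.json --format json --out-file /tmp/infracost.json
- run: |
    infracost comment github --path /tmp/infracost.json \
      --repo $GITHUB_REPOSITORY --pull-request ${{ github.event.pull_request.number }} \
      --github-token ${{ github.token }} --behavior update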

A reviewer who sees "this PR adds $1.2k/month" makes a different decision than one who only sees the diff.
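The "block PRs that exceed N% cost increase without approval" guardrail can be a small check over the diff report. A minimal sketch, assuming the report's top-level `pastTotalMonthlyCost` and `totalMonthlyCost` fields (strings, possibly null) as found in Infracost's JSON output; verify the field names against your Infracost version. How `approved` is decided (for example, a PR label) is left to your workflow, and `check_cost_guardrail` is a hypothetical name.

```python
def check_cost_guardrail(report, max_increase_pct=10.0, approved=False):
    """Return (ok, message) for a parsed Infracost diff report."""
    past = float(report.get("pastTotalMonthlyCost") or 0)
    new = float(report.get("totalMonthlyCost") or 0)
    delta = new - past
    if past == 0:
        # Any cost on a previously free stack counts as an unbounded increase
        pct = float("inf") if delta > 0 else 0.0
    else:
        pct = 100 * delta / past
    msg = f"monthly cost {past:.2f} -> {new:.2f} ({delta:+.2f}, {pct:+.1f}%)"
    if pct > max_increase_pct and not approved:
        return False, f"BLOCKED: {msg} exceeds {max_increase_pct}% without approval"
    return True, msg
```

A CI step can load `/tmp/infracost.json`, call this, print the message, and exit non-zero when `ok` is false, so the PR check fails until someone approves the increase.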

Anti-Patterns

Optimizing what's easy to measure. Egress is easy to see and easy to obsess over; meanwhile the $200k/month idle Redshift cluster sits ignored. Always sort by total cost.

One-time cleanups. A 3-day FinOps sprint that drops the bill 20%, then nothing for a year. Without a loop, waste regrows.

Cost-cutting at the expense of velocity. "We can't ship that, it'll add cost" is the wrong default. Cost matters; velocity matters more. Optimize what's already shipped.

RI sprawl. Buying RIs for workloads that change every quarter. Use Savings Plans (more flexible) instead.

Chargeback before showback. Forces teams to play accounting games before they even understand their cost.

What's Next

Best Practices — tagging policy, team structure, eng incentives, common pitfalls
