Steven's Knowledge

Best Practices

Production CDN: multi-CDN, security (WAF, bot mitigation), observability, cost control, pitfalls


The CDN sits in the critical path of every external request. A CDN outage is a site outage. These patterns keep that risk small and the bill smaller.

Hash Your Asset Filenames

This is the single most impactful pattern: embed a content hash in each filename, so the filename changes whenever the content does:

/assets/main.a4f9e21c.css      # immutable; ship a year-long TTL
/assets/main.b3d72ed8.css      # next deploy

Set Cache-Control: public, max-age=31536000, immutable. No purging, ever — the URL of the new version is different, so it's a new cache entry.

Vite, Next.js, and Rails Sprockets do this out of the box; Webpack does it once you put [contenthash] in output.filename. The only thing you must purge is the HTML that references these assets.
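A minimal sketch of how an origin (or an edge rule) might choose Cache-Control by path, assuming hashed filenames follow the name.<8+ hex chars>.ext pattern shown above. The regex, the extension list, the HTML values (s-maxage=60, stale-while-revalidate=300), and the no-cache fallback are illustrative assumptions; some bundlers use other hash alphabets or separators, so adjust the pattern to your build output.

```python
import re

# Assumption: hashed assets look like name.<8+ lowercase hex>.ext, as in
# /assets/main.a4f9e21c.css. Adjust to match your bundler's naming scheme.
HASHED_ASSET = re.compile(
    r"\.[0-9a-f]{8,}\.(?:css|js|mjs|woff2|png|jpe?g|gif|svg|webp|avif)$"
)

IMMUTABLE = "public, max-age=31536000, immutable"
# Illustrative values: browsers revalidate HTML on every use (max-age=0),
# while the CDN keeps it for 60 s and may serve it stale for 5 more minutes.
HTML = "public, max-age=0, s-maxage=60, stale-while-revalidate=300"


def cache_control_for(path: str) -> str:
    """Pick a Cache-Control header value for a request path."""
    if HASHED_ASSET.search(path):
        # The URL changes on every deploy, so this entry never needs purging.
        return IMMUTABLE
    if path.endswith("/") or path.endswith(".html"):
        # HTML points at the current hashed assets, so keep it short-lived.
        return HTML
    # Anything else: caches may store it but must revalidate before reuse.
    return "no-cache"


print(cache_control_for("/assets/main.a4f9e21c.css"))  # the year-long policy
print(cache_control_for("/index.html"))                 # the short HTML policy
```

Keeping the decision in one function makes it easy to unit-test the policy before it reaches the CDN.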

Origin Protection

Your origin should not be reachable directly from the internet. Otherwise:

  • Attackers bypass the CDN by hitting origin IPs, defeating WAF and DDoS protection.
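  • A single misconfigured or leftover DNS record can expose the origin's address.
  • Cost protection is undermined: requests that bypass the CDN are never served from cache, so each one costs origin egress.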

Lock it down:

| Strategy | How |
|---|---|
| CDN tunnel | Cloudflare Tunnel (cloudflared): origin has no public IP at all |
| IP allowlist | Origin firewall accepts only the CDN's published IP ranges |
| Shared secret header | Origin requires an X-CDN-Secret: ... header that only the CDN sends |
| mTLS between CDN and origin | CDN presents a client certificate that the origin trusts |

At least one of these is non-negotiable in production.
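A minimal sketch combining two of the strategies above (IP allowlist plus shared secret header), assuming the allowlist is built from a document shaped like AWS's published ip-ranges.json (a "prefixes" list of ip_prefix/service entries, plus "ipv6_prefixes"). The function names, the CLOUDFRONT service filter, and the sample prefixes are illustrative assumptions. In production you would refresh the ranges on a schedule and let your web framework supply the peer address and headers; header lookup here is case-sensitive, whereas most frameworks normalize header names for you.

```python
import hmac
import ipaddress
from typing import List, Mapping, Sequence, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def load_cdn_networks(ip_ranges: dict, service: str = "CLOUDFRONT") -> List[Network]:
    """Collect one service's prefixes from a document shaped like AWS ip-ranges.json."""
    nets: List[Network] = [
        ipaddress.ip_network(p["ip_prefix"])
        for p in ip_ranges.get("prefixes", []) if p.get("service") == service
    ]
    nets += [
        ipaddress.ip_network(p["ipv6_prefix"])
        for p in ip_ranges.get("ipv6_prefixes", []) if p.get("service") == service
    ]
    return nets


def is_from_cdn(peer_ip: str, headers: Mapping[str, str], networks: Sequence[Network],
                secret: str, header: str = "X-CDN-Secret") -> bool:
    """Accept a request only if it comes from a CDN range AND carries the shared secret."""
    # Use the TCP peer address, never X-Forwarded-For: clients can forge that header.
    addr = ipaddress.ip_address(peer_ip)
    # An IPv4 address tested against an IPv6 network (or vice versa) is simply False.
    if not any(addr in net for net in networks):
        return False
    supplied = headers.get(header, "")
    # Constant-time comparison, so timing does not leak how much of the secret matched.
    return hmac.compare_digest(supplied.encode(), secret.encode())


# Documentation-range prefixes standing in for a real download.
sample = {
    "prefixes": [
        {"ip_prefix": "192.0.2.0/24", "service": "CLOUDFRONT"},
        {"ip_prefix": "198.51.100.0/24", "service": "EC2"},
    ],
    "ipv6_prefixes": [{"ipv6_prefix": "2001:db8::/32", "service": "CLOUDFRONT"}],
}
cdn_nets = load_cdn_networks(sample)
print(is_from_cdn("192.0.2.10", {"X-CDN-Secret": "s3cret"}, cdn_nets, "s3cret"))  # True
```

Requiring both checks means a leaked secret alone, or a spoofed route alone, is not enough to reach the origin.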

WAF and Bot Mitigation

CDNs ship with WAF rule sets. Turn them on:

  • OWASP Core Rule Set (CRS): SQL injection, XSS, common exploits.
  • Provider managed rulesets — Cloudflare Managed Rules, Fastly Next-Gen WAF.
  • Rate limiting per IP — defaults are too generous; tune per endpoint.
  • Bot management — score requests; challenge or block suspicious ones.

Start in detect-only mode, review the dashboard for a week, then flip to enforce. Real users get caught by overzealous WAF rules; tune before enforcing.
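Per-IP, per-endpoint tuning and the detect-then-enforce rollout can be sketched as a token bucket. This only illustrates the logic a CDN's rate-limiting feature applies at the edge; it is not any provider's API. The class name, the limits, and the in-memory storage are assumptions (a real edge limiter shares counters across servers).

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class _Bucket:
    tokens: float
    updated: float


class RateLimiter:
    """Token bucket keyed by (client IP, endpoint), with detect-only and enforce modes."""

    def __init__(self, limits: Dict[str, Tuple[float, float]],
                 default: Tuple[float, float] = (10.0, 20.0),
                 enforce: bool = False,
                 clock: Callable[[], float] = time.monotonic):
        self.limits = limits      # endpoint -> (refill tokens per second, burst size)
        self.default = default    # used for endpoints without an explicit limit
        self.enforce = enforce    # False = detect-only: record violations but allow
        self.clock = clock
        self.buckets: Dict[Tuple[str, str], _Bucket] = {}
        self.violations: List[Tuple[str, str]] = []  # review these before enforcing

    def allow(self, ip: str, endpoint: str) -> bool:
        rate, burst = self.limits.get(endpoint, self.default)
        now = self.clock()
        bucket = self.buckets.setdefault((ip, endpoint), _Bucket(burst, now))
        # Refill in proportion to elapsed time, capped at the burst size.
        bucket.tokens = min(burst, bucket.tokens + (now - bucket.updated) * rate)
        bucket.updated = now
        if bucket.tokens >= 1:
            bucket.tokens -= 1
            return True
        self.violations.append((ip, endpoint))  # recorded in both modes
        return not self.enforce


# A fake clock makes the behaviour deterministic.
fake_now = [0.0]
limiter = RateLimiter({"/login": (0.5, 3)}, enforce=True, clock=lambda: fake_now[0])
print([limiter.allow("203.0.113.7", "/login") for _ in range(4)])  # [True, True, True, False]
fake_now[0] = 2.0  # two seconds later, one token has refilled
print(limiter.allow("203.0.113.7", "/login"))  # True
```

Running with enforce=False for a week and reviewing `violations` is the code-level analogue of the detect-only phase.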

DDoS Considerations

All major CDNs absorb L3/L4 DDoS as a baseline service. For L7 (HTTP) DDoS:

  • Cache as much as possible: a 200 GB/s flood against cached content is absorbed at the edge and never reaches your origin.
  • Rate limiting on expensive, uncacheable endpoints such as /login.
  • Bot challenges for routes that shouldn't see bots.
  • Anomaly detection — sudden 100× traffic to one path is suspicious.

Test your protections occasionally with a controlled load test from outside your network.
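The 100× heuristic can be sketched as a per-path baseline kept as an exponentially weighted moving average (EWMA): a path is flagged when its request count in the latest window exceeds `factor` times its baseline. The class name, the smoothing constant `alpha`, and the `min_baseline` floor are illustrative assumptions; the per-path counts would come from your CDN's logs or analytics.

```python
from typing import Dict, List


class PathSpikeDetector:
    """Flag paths whose per-window request count jumps far above their own baseline."""

    def __init__(self, factor: float = 100.0, alpha: float = 0.1, min_baseline: float = 1.0):
        self.factor = factor              # 100x, per the rule of thumb above
        self.alpha = alpha                # EWMA smoothing: higher reacts faster
        self.min_baseline = min_baseline  # floor so near-silent paths need a real surge to trip
        self.baseline: Dict[str, float] = {}

    def observe(self, counts: Dict[str, int]) -> List[str]:
        """Feed one window of per-path request counts; return the paths that spiked."""
        spikes = []
        for path, n in counts.items():
            base = self.baseline.get(path)
            if base is None:
                # First sighting only seeds the baseline, so a brand-new path is not flagged.
                self.baseline[path] = float(n)
            elif n > self.factor * max(base, self.min_baseline):
                spikes.append(path)  # keep the spike out of the baseline
            else:
                self.baseline[path] = (1 - self.alpha) * base + self.alpha * n
        return sorted(spikes)


detector = PathSpikeDetector()
print(detector.observe({"/": 50, "/search": 10}))    # []
print(detector.observe({"/": 55, "/search": 1500}))  # ['/search']
```

Excluding flagged windows from the baseline keeps a sustained attack from teaching the detector that the attack volume is normal.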

Multi-CDN

When one CDN's outage is unacceptable:

| Pattern | Notes |
|---|---|
| DNS-level failover | Active/passive; DNS provider health-checks; slow failover (TTL-bound) |
| DNS-level weighted | Active/active; split traffic; managed via Route 53, NS1, Cloudflare Load Balancer |
| Header-driven | App layer picks CDN per request (cookie/header); complex |

Multi-CDN doubles cost and operational complexity. Reserve it for situations where seconds of downtime cost serious money. For most teams, one CDN configured to fail over to an alternate origin is enough.
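Weighted active/active routing with health-check failover comes down to the decision below, which a weighted DNS policy makes on each resolution. This sketches the logic only, not a DNS provider's API; the hostnames and weights are hypothetical.

```python
import random
from typing import Mapping, Optional, Set


def pick_cdn(weights: Mapping[str, float], healthy: Set[str],
             rng: Optional[random.Random] = None) -> str:
    """Choose a CDN hostname in proportion to its weight, skipping unhealthy ones."""
    candidates = [(host, w) for host, w in weights.items() if host in healthy and w > 0]
    if not candidates:
        # Nothing passed health checks: surface it loudly rather than guess.
        raise RuntimeError("no healthy CDN available")
    hosts = [host for host, _ in candidates]
    host_weights = [w for _, w in candidates]
    return (rng or random.Random()).choices(hosts, weights=host_weights, k=1)[0]


# Hypothetical 70/30 split between two CDN hostnames.
weights = {"cdn-a.example.net": 70, "cdn-b.example.net": 30}
print(pick_cdn(weights, healthy={"cdn-b.example.net"}))  # cdn-b.example.net (only healthy one)
```

With both CDNs healthy, roughly 70% of picks go to cdn-a; when it fails its health check, all traffic shifts to cdn-b without changing the weights.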

Observability

| Signal | What to watch |
|---|---|
| Hit ratio (per route, overall) | < 90% on public static content = misconfiguration |
| Origin bandwidth | A spike when traffic is flat = cache miss event |
| 5xx rate from origin (visible at CDN) | Origin in trouble; stale-if-error saves you |
| 5xx rate from CDN | The CDN itself is in trouble; consider failover |
| Latency p50/p99 per region | One bad POP hurts users there |
| Purge frequency | Spike = something is wrong with cache strategy |

CDNs export logs and metrics; pipe them into your observability stack. Cloudflare Logpush, Fastly real-time log streaming, and CloudFront access logs can deliver to destinations such as S3, Splunk, or a SaaS log platform (see ELK).
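Once logs land in your stack, the per-route hit-ratio check from the table is a small aggregation. A sketch under assumptions: each record already carries a "route" label and a Cloudflare-style cache status ("HIT", "MISS", ...); field names and status values differ by CDN and log format, and the 100-request floor is an arbitrary noise filter.

```python
from collections import Counter
from typing import Iterable, List, Mapping, Tuple

# Assumption: only "HIT" counts as served from cache. Decide for your CDN whether
# statuses such as STALE or REVALIDATED should count too.
HIT_STATUSES = {"HIT"}


def low_hit_routes(records: Iterable[Mapping[str, str]], threshold: float = 0.90,
                   min_requests: int = 100) -> List[Tuple[str, float]]:
    """Return (route, hit ratio) pairs below threshold, ignoring low-traffic routes."""
    totals: Counter = Counter()
    hits: Counter = Counter()
    for rec in records:
        totals[rec["route"]] += 1
        if rec["cache_status"] in HIT_STATUSES:
            hits[rec["route"]] += 1
    return sorted(
        (route, hits[route] / n)
        for route, n in totals.items()
        if n >= min_requests and hits[route] / n < threshold
    )


# Hypothetical log sample: static assets are healthy, images are mostly missing.
logs = (
    [{"route": "/assets/*", "cache_status": "HIT"}] * 95
    + [{"route": "/assets/*", "cache_status": "MISS"}] * 5
    + [{"route": "/images/*", "cache_status": "HIT"}] * 50
    + [{"route": "/images/*", "cache_status": "MISS"}] * 100
)
for route, ratio in low_hit_routes(logs):
    print(f"ALERT {route}: hit ratio {ratio:.0%}")
```

The same aggregation over a rolling window gives the "alert on sudden drops" signal from the checklist.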

Cost Control

CDN bills can surprise. Things to watch:

  • Bandwidth out: usually the biggest line item. Smaller responses shrink it directly; a higher cache hit rate shrinks origin egress and origin load.
  • Cache misses in expensive regions: origin fetches for Australian and South American traffic are pricey.
  • Image transformations — per-transformation cost; cache the transforms.
  • Edge functions — per-invocation cost; expensive on cache-miss-heavy traffic.
  • Free tier limits — Cloudflare's "free" tier has tight limits for paid features (Workers, Cache Reserve).

Best practices:

| Knob | Effect |
|---|---|
| Aggressive s-maxage | Origin bandwidth way down |
| Compress text responses (Brotli, gzip fallback) | 30%+ bandwidth reduction |
| Image optimization | Often 50-80% size reduction |
| Cache Reserve (Cloudflare) / Origin Shield (CloudFront) | Cache-miss traffic to origin drops sharply |
| Block bots early | Don't pay to serve them |
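The s-maxage and shielding knobs pay off by raising the byte hit ratio, and the resulting origin egress is simple arithmetic. A sketch assuming a hypothetical 10 TB month delivered to users; note it uses the byte hit ratio (bytes served from cache), which can differ from the request hit ratio.

```python
def origin_egress_gb(delivered_gb: float, byte_hit_ratio: float) -> float:
    """GB the CDN must pull from origin: every byte it could not serve from cache."""
    if not 0.0 <= byte_hit_ratio <= 1.0:
        raise ValueError("byte_hit_ratio must be between 0 and 1")
    return delivered_gb * (1.0 - byte_hit_ratio)


# Hypothetical month: 10,000 GB delivered to users.
for ratio in (0.90, 0.99):
    print(f"hit ratio {ratio:.0%}: ~{origin_egress_gb(10_000, ratio):,.0f} GB from origin")
```

Going from 90% to 99% cuts origin egress from roughly 1,000 GB to roughly 100 GB, a 10× drop, while the bytes delivered to users stay the same.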

Common Pitfalls

| Pitfall | Symptom | Fix |
|---|---|---|
| Set-Cookie on a cacheable response | Every request goes to origin (cache status DYNAMIC) | Make the response cookie-free, or strip Set-Cookie at the edge |
| Vary: User-Agent | Hit rate near zero | Normalize User-Agent into a small set of buckets; vary on the bucket |
| Caching auth-required URLs | Logged-in users see other users' content | Mark responses private; bypass the cache when a session cookie is present |
| Purge-everything on every deploy | Cache constantly cold | Purge specific URLs/tags; hash filenames |
| Long TTL with no purge plan | Stale content for hours | Either purge on update, or use a short TTL + stale-while-revalidate |
| Trusting the CDN with private URLs | Logged-in URLs (/admin/...) pass through, and may be cached by, the CDN | Serve them from a different domain, or enforce Cache-Control: private |
| Stale CDN IPs in the origin/WAF allowlist | The CDN itself gets blocked when its ranges change | Build the allowlist from the CDN's published IP-range list; refresh it automatically |
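The Vary: User-Agent fix and the cookie/query-string normalization item in the checklist below both come down to building a small, canonical cache key. A sketch under assumptions: the tracking-parameter list and the two-bucket device split (a common heuristic checks for "Mobi" in the User-Agent) are illustrative, and in production this logic would live in an edge rule or edge function rather than in Python.

```python
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Illustrative lists: parameters that never change the response body.
TRACKING_PREFIXES = ("utm_",)
TRACKING_PARAMS = {"fbclid", "gclid"}


def normalize_cache_key(url: str, user_agent: Optional[str]) -> str:
    """Canonical key: lowercase host, tracking params dropped, params sorted, device bucket."""
    parts = urlsplit(url)
    params = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if not k.startswith(TRACKING_PREFIXES) and k not in TRACKING_PARAMS
    ]
    params.sort()  # ?b=2&a=1 and ?a=1&b=2 share one cache entry
    # Two buckets instead of thousands of distinct User-Agent strings.
    device = "mobile" if "Mobi" in (user_agent or "") else "desktop"
    # The fragment is never sent to the server, so it is dropped.
    base = urlunsplit((parts.scheme, parts.netloc.lower(), parts.path, urlencode(params), ""))
    return f"{base}|{device}"


print(normalize_cache_key(
    "https://Example.com/p?b=2&utm_source=x&a=1",
    "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) Mobile/15E148",
))  # https://example.com/p?a=1&b=2|mobile
```

Fewer distinct keys means fewer cold entries, which is exactly what lifts the hit rate back from near zero.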

Region-Specific Considerations

  • China: most Western CDNs work poorly inside the GFW; you need China-specific services (Cloudflare China Network, Alibaba Cloud CDN, ChinaCache), which require an ICP license.
  • Russia: similar constraints, plus sanctions and local regulations to navigate.
  • South America / Africa: fewer POPs; prefer CDNs with explicit presence there.
  • Latency-sensitive APIs: measure with real user monitoring (RUM) data; the CDN that's fastest globally may not be fastest in your audience's region.

Testing CDN Changes

Three safety nets:

  1. Preview / staging environment with the same CDN config as production.
  2. Synthetic checks that hit a few representative URLs from multiple regions and alert on cache misses, slow loads, or content drift.
  3. Real user monitoring (RUM) — actual visitor latency tracked over time.
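A synthetic check has two halves: fetching a URL from several regions (usually done by a monitoring service) and judging the result. A sketch of the judging half, assuming the probe has already recorded a Cloudflare-style cache status, the latency, and the body; the ProbeResult type, the 500 ms budget, and the SHA-256 drift check are hypothetical choices.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProbeResult:
    url: str
    region: str          # where the probe ran, e.g. "syd"
    cache_status: str    # assumed Cloudflare-style value such as "HIT" or "MISS"
    latency_ms: float
    body: bytes


def evaluate_probe(result: ProbeResult, expected_sha256: Optional[str] = None,
                   max_latency_ms: float = 500.0) -> List[str]:
    """Return a list of problems; an empty list means the probe passed."""
    problems = []
    if result.cache_status != "HIT":
        problems.append(f"cache {result.cache_status}")
    if result.latency_ms > max_latency_ms:
        problems.append(f"slow: {result.latency_ms:.0f} ms")
    # Content drift: the edge is serving something other than the known-good body.
    if expected_sha256 and hashlib.sha256(result.body).hexdigest() != expected_sha256:
        problems.append("content drift")
    return problems


good = hashlib.sha256(b"hello").hexdigest()
probe = ProbeResult("https://example.com/", "syd", "MISS", 812.4, b"bye")
print(evaluate_probe(probe, expected_sha256=good))  # ['cache MISS', 'slow: 812 ms', 'content drift']
```

Alerting on a non-empty list per region catches cache-rule regressions within minutes rather than at the next bill.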

A common mistake: change cache rules on Friday, then discover the damage in Monday's bill.

Checklist

Production CDN checklist

  • Hashed filenames for all static assets; year-long immutable cache
  • HTML cached with short s-maxage + stale-while-revalidate
  • Origin not reachable from public internet (tunnel / IP allowlist / shared secret)
  • WAF / managed ruleset enabled (start in detect mode, then enforce)
  • Rate limiting on /login, /signup, password-reset endpoints
  • Image optimization on; appropriate <img srcset> markup
  • Surrogate keys / cache tags used for fine-grained purges
  • Cookie/query-string normalization rules in place
  • Cache hit ratio monitored; alert on sudden drops
  • Origin bandwidth alerted on (spike = cache breakage)
  • Logs streamed off-CDN to your observability stack
  • DDoS protection enabled at L3/L4; L7 rate limits configured
  • CDN failover path documented and tested
  • Compression (Brotli + gzip fallback) enabled
  • Negative caching for 404s on public paths
