Steven's Knowledge

Best Practices

Tuning false positives, runbook for active attacks, origin protection, compliance, common pitfalls, scaling

The operational realities of running WAF, DDoS, and bot management without losing legitimate customers.

Tune Before You Block

The cardinal rule: start in log/detection-only mode. Every WAF you adopt, every managed rule set, and every custom rule should first run as "would-have-blocked", not "blocks."

Process:

  1. Enable in Log/Count mode.
  2. Watch for 1-2 weeks during normal traffic.
  3. Identify false positives — review every "would have blocked" against legitimate user traffic.
  4. Add exceptions for the false positives.
  5. Switch to Block mode.
  6. Monitor — false positive reports from users come in for weeks; iterate.

Skipping step 1 is how you accidentally block your CEO from logging in on a Friday night.
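Step 3 of the process is easier when count-mode matches are grouped by rule before review. The following is a minimal sketch, assuming log events have already been parsed into dictionaries; the field names `rule_id`, `action`, and `path` and the rule names are hypothetical and will differ per provider.

```python
from collections import Counter, defaultdict


def summarize_would_block(events):
    """Group count-mode matches by rule, busiest rule first.

    Returns a list of (rule_id, hit_count, sample_paths) tuples.
    """
    hits = Counter()
    samples = defaultdict(list)
    for event in events:
        # Only "would have blocked" entries matter for this review.
        if event["action"] != "count":
            continue
        rule = event["rule_id"]
        hits[rule] += 1
        # Keep a few sample paths per rule for manual inspection.
        if len(samples[rule]) < 3:
            samples[rule].append(event["path"])
    return [(rule, n, samples[rule]) for rule, n in hits.most_common()]


# Hypothetical events; real ones would come from the WAF log export.
events = [
    {"rule_id": "sqli-body", "action": "count", "path": "/api/comments"},
    {"rule_id": "sqli-body", "action": "count", "path": "/api/comments"},
    {"rule_id": "xss-query", "action": "count", "path": "/search"},
    {"rule_id": "sqli-body", "action": "allow", "path": "/health"},
]

for rule, count, paths in summarize_would_block(events):
    print(rule, count, paths)
```

Rules at the top of the list with samples that look like ordinary user traffic are the first candidates for an exception.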

Runbook: Active DDoS

When the alarm goes off:

# DDoS Response Runbook

## Detection signals
- Origin bandwidth saturated
- Edge provider notifies of ongoing attack
- Spike in 5xx errors with normal app metrics
- Sudden spike in unique IPs

## Step 1: Confirm and triage
- Check edge provider dashboard (Cloudflare Analytics, AWS Shield)
- Classify: L3/L4 volumetric? L7 application? Mixed?
- Is the origin affected (yes = current defenses inadequate)?

## Step 2: Mitigate
- L3/L4: edge provider should be absorbing — confirm scrubbing engaged
- L7: enable "I'm Under Attack" mode (Cloudflare) or tighten rate limits
- Geo-block if attack pattern is clearly from specific countries
- Engage provider's incident response (AWS Shield Response Team via Shield Advanced, Cloudflare Enterprise support)

## Step 3: Communicate
- Status page update
- Internal Slack #incident channel
- If customer-impacting: customer-facing comm via support / status

## Step 4: Adjust + sustain
- Add custom rules targeting attack signature
- Increase capacity at origin (autoscale)
- Monitor for evolution of attack pattern

## Step 5: Post-attack
- Postmortem
- Were our defenses adequate?
- Add lasting rules for future similar attacks
- Update threat model

The first time you face an attack, you'll appreciate having pre-written this. The on-call's job is to follow steps, not invent procedures at 3 AM.
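Two of the detection signals in the runbook can be checked against a baseline automatically. This is only a sketch: the `Snapshot` fields and the multipliers (`ip_factor`, `error_factor`, `cpu_margin`) are assumptions chosen for illustration, not recommended thresholds.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    unique_ips: int         # distinct client IPs in the sampling window
    error_5xx_rate: float   # fraction of responses that were 5xx (0.0-1.0)
    app_cpu: float          # application CPU utilisation (0.0-1.0)


def ddos_signals(baseline, current, ip_factor=5.0, error_factor=10.0, cpu_margin=0.2):
    """Return the runbook detection signals that fire for `current`."""
    signals = []
    if current.unique_ips > baseline.unique_ips * ip_factor:
        signals.append("unique-ip spike")
    # 5xx errors rising while the app itself looks healthy suggests the
    # problem is in front of the application, not inside it.
    app_normal = current.app_cpu <= baseline.app_cpu + cpu_margin
    if current.error_5xx_rate > baseline.error_5xx_rate * error_factor and app_normal:
        signals.append("5xx spike with normal app metrics")
    return signals


baseline = Snapshot(unique_ips=2_000, error_5xx_rate=0.001, app_cpu=0.35)
under_attack = Snapshot(unique_ips=48_000, error_5xx_rate=0.09, app_cpu=0.40)
print(ddos_signals(baseline, under_attack))
```

A check like this only raises the alarm; classification and mitigation still follow the runbook steps.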

Origin Protection: Verify It

Test that attackers can't bypass your edge:

# What does the origin look like from outside?
dig +short app.your-domain.com
# Should resolve to Cloudflare/CloudFront IPs

# Look for origin hostnames exposed in certificate transparency logs
# (crt.sh is a web service, not a CLI; query its JSON endpoint)
curl -s "https://crt.sh/?q=%25.your-domain.com&output=json"

# Try common patterns for origin hostnames
for sub in origin direct backend api-direct; do
  dig +short "$sub.your-domain.com"
done

# Scan for the origin IP in your AWS / GCP / Azure ranges
# (your security team likely has tooling for this)

Common leaks:

  • DNS records for origin.example.com resolve directly to the origin
  • Email server (MX records) on same IP as web origin
  • Old TLS certs from before edge protection include the origin's hostname
  • Subdomain takeovers expose origin

Audit quarterly. Origin protection bypassed is WAF defeated.
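The "should resolve to edge IPs" check can be automated once the resolved addresses are collected. A minimal sketch, assuming the edge provider's published ranges have been loaded into a list; the CIDRs below are documentation-only placeholders, not real provider ranges.

```python
import ipaddress


def ips_outside_edge(resolved_ips, edge_cidrs):
    """Return the resolved IPs that do not belong to any edge range.

    Anything returned here is a candidate origin leak.
    """
    networks = [ipaddress.ip_network(cidr) for cidr in edge_cidrs]
    leaked = []
    for ip in resolved_ips:
        addr = ipaddress.ip_address(ip)
        if not any(addr in net for net in networks):
            leaked.append(ip)
    return leaked


# Placeholder ranges (RFC 5737 / RFC 3849); load the real list from your provider.
EDGE_CIDRS = ["203.0.113.0/24", "2001:db8::/32"]

# Addresses as they might come back from `dig +short` for each hostname.
resolved = ["203.0.113.10", "198.51.100.7"]
print(ips_outside_edge(resolved, EDGE_CIDRS))
```

Feeding this the output of the hostname loop above turns the quarterly audit into a repeatable check.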

False Positive Management

Every WAF generates them. Process:

  • User reports: support ticket says "I can't submit the form." First check: WAF logs. Was the request blocked?
  • Block-rate alarm: sudden spike in blocks for a specific rule = either an attack or a new false positive
  • Per-rule monitoring: rules that block more than X% of traffic get reviewed weekly
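Per-rule monitoring reduces to a small calculation once block counts are available. The sketch below assumes a dictionary of block counts per rule for the review period; the 1% default threshold stands in for the "X%" above and is an arbitrary illustration.

```python
def rules_to_review(blocks_by_rule, total_requests, threshold=0.01):
    """Return (rule, block_share) pairs whose share of all traffic exceeds threshold."""
    if total_requests <= 0:
        return []
    flagged = [
        (rule, blocks / total_requests)
        for rule, blocks in blocks_by_rule.items()
        if blocks / total_requests > threshold
    ]
    # Largest share first, so the noisiest rule is reviewed first.
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Hypothetical weekly counts.
weekly_blocks = {"sqli-body": 1_200, "xss-query": 30, "geo-block": 5_000}
for rule, share in rules_to_review(weekly_blocks, total_requests=100_000):
    print(f"{rule}: {share:.1%} of traffic blocked")
```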

Add allowlist exceptions surgically:

# Bad: too broad
Allow if user_agent contains "Mozilla"

# Better: scope to known good traffic
Allow if ip.src in $known_partner_ips AND http.request.uri.path eq "/api/webhook"

Document every exception with a ticket / rationale. Review quarterly — old exceptions get stale and become risks.
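One way to keep exceptions both scoped and reviewable is to store each one as a record with its ticket and last review date. This is a sketch under assumptions: the `WafException` structure, the ticket ID, and the 90-day review window are hypothetical, and the matching logic mirrors the "Better" rule above (source IP range plus exact path).

```python
import ipaddress
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class WafException:
    ticket: str          # ticket or rationale reference
    source_cidrs: list   # known partner ranges
    path: str            # exact request path the exception covers
    last_reviewed: date

    def matches(self, client_ip, request_path):
        """True only when both the source range and the path match."""
        addr = ipaddress.ip_address(client_ip)
        in_scope = any(addr in ipaddress.ip_network(c) for c in self.source_cidrs)
        return in_scope and request_path == self.path


def stale_exceptions(exceptions, today, max_age_days=90):
    """Return tickets of exceptions not reviewed within max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    return [e.ticket for e in exceptions if e.last_reviewed < cutoff]


webhook = WafException(
    ticket="SEC-101",                      # hypothetical ticket ID
    source_cidrs=["198.51.100.0/24"],      # placeholder partner range
    path="/api/webhook",
    last_reviewed=date(2024, 1, 15),
)
print(webhook.matches("198.51.100.20", "/api/webhook"))
print(stale_exceptions([webhook], today=date(2024, 6, 1)))
```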

Compliance Considerations

  • PCI DSS: requires a WAF or equivalent automated protection in front of public-facing web applications in scope
  • HIPAA: WAF logs (with PHI care) help audit
  • SOC 2: WAF + DDoS protection evidences operational security controls
  • GDPR: WAF logs often contain personal data; retention + access controls apply

Document:

  • WAF configuration in version control (Terraform / IaC)
  • Change management process
  • Incident response runbooks
  • Retention policy for WAF logs (often 1-7 years depending on regulator)
  • Data processing addendum (DPA) for the WAF provider if it processes user data

API Risk Management

APIs are an increasingly hot attack surface. OWASP API Security Top 10 (2023):

  1. Broken Object Level Authorization (BOLA)
  2. Broken Authentication
  3. Broken Object Property Level Authorization
  4. Unrestricted Resource Consumption
  5. Broken Function Level Authorization
  6. Unrestricted Access to Sensitive Business Flows
  7. SSRF
  8. Security Misconfiguration
  9. Improper Inventory Management
  10. Unsafe Consumption of APIs

WAFs help with #4 (rate limits), #7 (SSRF patterns), some of #8. The rest require code-level fixes. WAF + API gateway + secure code is the trio.

Specialized API security tools (Salt, Noname, 42Crunch) go deeper — schema validation, behavioral baselines per endpoint, anomaly detection on API usage.
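For #4, the rate limits a WAF or gateway applies can be pictured as a per-client sliding window. The class below is an illustrative in-memory sketch, not how any particular provider implements it; the key format and limits are assumptions.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per client within `window_seconds`."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self._hits = defaultdict(deque)  # client key -> timestamps of allowed requests

    def allow(self, client_key, now):
        hits = self._hits[client_key]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


limiter = SlidingWindowLimiter(limit=3, window_seconds=60)
# Four requests from the same API key within a few seconds.
decisions = [limiter.allow("api-key-123", now=t) for t in (0, 1, 2, 3)]
print(decisions)
```

Keying on an API key or authenticated user rather than on IP alone tends to fit API traffic better, since many clients can share one egress address.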

Customer Impact Awareness

The WAF blocks things customers do. Discipline:

  • Notify customers if you tighten rules in a way that might affect them (e.g., new geo restrictions)
  • Public IP allowlist mechanism for B2B customers — they tell you their corporate egress, you allowlist
  • Self-service Captcha re-verification if their device fingerprint trips bot management
  • Support team has WAF visibility — when a customer reports issues, support can check WAF logs

The worst WAF outcome is a customer who can't reach your product and has no way to communicate that fact (because the support form is also blocked).

Cost Optimization

WAF / DDoS / bot management costs add up:

  • Free tier maximization: Cloudflare's free plan covers a lot; use it where appropriate
  • Per-request pricing awareness: AWS WAF charges per million requests + per rule. Optimize rule count.
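  • Cache aggressively: when the WAF sits behind the cache, every cache hit is a request it never evaluates → cheaper. A WAF at the edge usually inspects requests before the cache lookup, so there caching mainly saves origin cost.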
  • DDoS protection levels: Shield Advanced costs ~$3k/month plus data transfer fees, with a one-year commitment; reserve it for high-risk workloads
  • Bot management premium tiers: only on the routes that matter (auth, checkout, scraping targets)

Trade-off: a quieter origin (more aggressive WAF) costs less in origin compute but more in WAF. Find your equilibrium.
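The per-request and per-rule pricing model can be turned into a quick estimate. The function below is a sketch; the prices passed in are placeholders for illustration, so check the provider's current pricing page before relying on any figure.

```python
def monthly_waf_cost(requests, rules, price_per_million, price_per_rule, base_fee=0.0):
    """Estimate a monthly bill for a WAF priced per request and per rule."""
    request_cost = (requests / 1_000_000) * price_per_million
    return base_fee + rules * price_per_rule + request_cost


# Placeholder prices, not a quote.
PRICES = {"price_per_million": 0.60, "price_per_rule": 1.00, "base_fee": 5.00}

current = monthly_waf_cost(requests=250_000_000, rules=12, **PRICES)
trimmed = monthly_waf_cost(requests=250_000_000, rules=8, **PRICES)
print(f"12 rules: ${current:.2f}/month, 8 rules: ${trimmed:.2f}/month")
```

At this volume the request charge dominates, which suggests that reducing evaluated requests usually matters more than trimming a handful of rules.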

Scaling Considerations

As you grow:

  • Multi-account / multi-region WAFs: rule management gets harder; consolidate via Terraform
  • Per-tenant rules in multi-tenant SaaS: orchestrate carefully
  • Custom rules can hit limits: AWS WAF has rule capacity units (WCU) — complex rules consume more
  • Compliance scope can grow: more regions = more residency requirements
  • Bot management premium scales by request volume: budget accordingly
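Capacity limits are easier to manage when tracked before a deployment fails. A minimal sketch of a WCU budget check follows; the per-rule WCU figures and the capacity limit are hypothetical inputs, and real values should come from the provider's capacity-check tooling and documented quotas.

```python
def wcu_report(rule_wcus, capacity_limit):
    """Summarise capacity usage and list the three heaviest rules."""
    used = sum(rule_wcus.values())
    heaviest = sorted(rule_wcus.items(), key=lambda item: item[1], reverse=True)
    return {
        "used": used,
        "remaining": capacity_limit - used,
        "heaviest": heaviest[:3],
    }


# Hypothetical WCU cost per rule or rule group.
rules = {"managed-core": 700, "rate-limit-login": 2, "regex-bad-bots": 25, "geo-block": 1}
report = wcu_report(rules, capacity_limit=1500)
print(report)
```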

Common Pitfalls

Blocking without observing. Block mode without prior log analysis. Real users hit edge cases; you don't see them; complaints arrive instead.

Origin reachable directly. The WAF lulls you into a false sense of security while attackers bypass it by hitting the origin IP.

Cache breakage from WAF rules. A custom rule that depends on header X often means forwarding X or adding it to the cache key, so the cache now varies by X and the hit rate collapses.

No alerting on rules. A rule blocks every request from one IP for a week and nobody notices; that IP was your CEO at a conference. Alert on blocks that hit authenticated users.

Set-and-forget. WAF rules deployed in 2023 still running in 2026 unchanged. Threat landscape moves; review quarterly.

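Trusting client-provided headers. Rules like "if X-Forwarded-For is X, allow" are trivially spoofable, because any client can send that header. Edge providers know the real client IP; trust the header your provider sets, and only on connections that actually come from the edge, not arbitrary ones.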
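At the origin, the same principle can be sketched as: use the edge's client-IP header only when the TCP peer is itself an edge address. `CF-Connecting-IP` is the header Cloudflare sets; the CIDR below is a placeholder, and the plain-dict header lookup (case-sensitive) is a simplification.

```python
import ipaddress


def client_ip(peer_ip, headers, edge_cidrs, edge_header="CF-Connecting-IP"):
    """Return the client IP, trusting the edge header only for edge peers."""
    peer = ipaddress.ip_address(peer_ip)
    from_edge = any(peer in ipaddress.ip_network(cidr) for cidr in edge_cidrs)
    if from_edge and edge_header in headers:
        return headers[edge_header]
    # Connection did not come through the edge: ignore the header entirely.
    return peer_ip


EDGE_CIDRS = ["203.0.113.0/24"]  # placeholder; load the provider's published ranges

via_edge = client_ip("203.0.113.9", {"CF-Connecting-IP": "192.0.2.44"}, EDGE_CIDRS)
spoofed = client_ip("198.51.100.7", {"CF-Connecting-IP": "10.0.0.1"}, EDGE_CIDRS)
print(via_edge, spoofed)
```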

Bot challenge in API. JS challenge breaks programmatic clients. Use API-aware patterns (JWT verification, API keys) instead of browser challenges for API traffic.

One-way edge. WAF rules live in Git and deploy one way (CI → WAF), but nothing detects drift when someone changes a rule in the console. Reconcile the declared and live configuration on a schedule.
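Reconciliation can start as a plain comparison between the rules declared in Git and the rules fetched from the provider. The sketch assumes both sides have already been loaded into dictionaries keyed by rule name; how they are fetched (Terraform state, provider API) is left out, and the rule names are hypothetical.

```python
def detect_drift(declared, live):
    """Compare declared (Git) rules against live (provider) rules."""
    declared_names, live_names = set(declared), set(live)
    return {
        "added_in_console": sorted(live_names - declared_names),
        "missing_from_live": sorted(declared_names - live_names),
        "changed": sorted(
            name for name in declared_names & live_names
            if declared[name] != live[name]
        ),
    }


declared = {
    "rate-limit-login": {"action": "block", "limit": 100},
    "geo-block": {"action": "block", "countries": ["XX"]},
}
live = {
    "rate-limit-login": {"action": "count", "limit": 100},  # edited by hand
    "temp-allow-partner": {"action": "allow"},               # created in console
}
print(detect_drift(declared, live))
```

Any non-empty result is worth an alert: either the change belongs in Git or it should be reverted.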

Forgetting non-HTTP. A WAF doesn't help if you also expose SSH, FTP, SMB, etc. Those need their own protection.

Checklist

WAF / DDoS / Bot Management production readiness:

  • Edge provider (Cloudflare / AWS / Akamai) protecting all public web traffic
  • Origin IPs locked down to edge provider only (or Tunnel/PrivateLink)
  • WAF managed rule sets enabled
  • WAF custom rules in Git, deployed via IaC
  • Rate limiting on login and sensitive endpoints
  • Bot management with allowlist for verified good bots
  • Geo controls if business is region-specific
  • WAF logs ingested into SIEM / observability
  • Alerts on block rate spikes, blocked authenticated users, origin direct access
  • DDoS response runbook tested
  • Customer support has WAF visibility and exception process
  • Quarterly review of WAF rules, exceptions, false positives
  • Cost tracked; tier matches business risk
  • Origin protection verified quarterly (try to bypass)
  • CVE response process includes virtual patching via WAF
  • Multi-tenant rules / per-tenant limits if applicable

What's Next

You have edge security. Connect it to:

  • API Gateway — the WAF protects the gateway; the gateway handles auth and routing
  • CDN — same edge often delivers cache, WAF, DDoS
  • VPN & Zero Trust — Cloudflare Tunnel hides origin entirely
  • Monitoring — WAF metrics feed alerts and dashboards
  • Policy as Code — WAF rules as code, reviewed via PR
  • Identity — bot management complements user identity
