Best Practices

Tuning false positives, runbook for active attacks, origin protection, compliance, common pitfalls, scaling


The operational realities of running a WAF, DDoS protection, and bot management without losing legitimate customers.

Tune Before You Block

The cardinal rule: start in log/detection-only mode. Every WAF you adopt, every managed rule set, and every custom rule first runs in "would have blocked" mode, not "block" mode.

Process:

1. Enable in Log/Count mode.
2. Watch for 1-2 weeks of normal traffic.
3. Identify false positives: review every "would have blocked" event against legitimate user traffic.
4. Add exceptions for the false positives.
5. Switch to Block mode.
6. Monitor. False-positive reports from users keep arriving for weeks; iterate.

Skipping step 1 is how you accidentally block your CEO from logging in on a Friday night.
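Step 3 is easier with a per-rule summary of the log-mode data. The sketch below assumes a hypothetical CSV export with the header `timestamp,rule_id,action,client_ip,path`, where `action` is `count` for a would-have-blocked event. Real exports, such as Cloudflare Logpush or AWS WAF logs, are JSON with different field names, so the field positions need adapting.

```bash
# ASSUMPTION: hypothetical CSV export of log/count-mode events with header
#   timestamp,rule_id,action,client_ip,path
# where action == "count" marks a request the rule would have blocked.

# summarize_count_hits: read the CSV on stdin and print
# "rule_id would_have_blocked_hits distinct_client_ips", busiest rule first.
summarize_count_hits() {
  awk -F',' '
    NR == 1 { next }                          # skip the header row
    $3 == "count" {
      hits[$2]++                              # would-have-blocked hits per rule
      if (!seen[$2 SUBSEP $4]++) ips[$2]++    # count each client IP once per rule
    }
    END { for (r in hits) printf "%s %d %d\n", r, hits[r], ips[r] }
  ' | sort -k2,2nr -k1,1
}
```

Usage: `summarize_count_hits < waf-count-export.csv`. Before you switch a rule to Block, pull its matched requests if it fires on many distinct client IPs during ordinary traffic.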

Runbook: Active DDoS

When the alarm goes off:

# DDoS Response Runbook

## Detection signals
- Origin bandwidth saturated
- Edge provider notifies of ongoing attack
- Spike in 5xx errors with no app-side cause (no deploy, no dependency failure)
- Sudden spike in unique IPs

## Step 1: Confirm and triage
- Check edge provider dashboard (Cloudflare Analytics, AWS Shield)
- Classify: L3/L4 volumetric? L7 application? Mixed?
- Is the origin affected (yes = current defenses inadequate)?

## Step 2: Mitigate
- L3/L4: edge provider should be absorbing — confirm scrubbing engaged
- L7: enable "I'm Under Attack" mode (Cloudflare) or tighten rate limits
- Geo-block if attack pattern is clearly from specific countries
- Engage provider's incident response (AWS Shield Response Team via Shield Advanced, Cloudflare Enterprise support)

## Step 3: Communicate
- Status page update
- Internal Slack #incident channel
- If customer-impacting: customer-facing comm via support / status

## Step 4: Adjust + sustain
- Add custom rules targeting attack signature
- Increase capacity at origin (autoscale)
- Monitor for evolution of attack pattern

## Step 5: Post-attack
- Postmortem
- Were our defenses adequate?
- Add lasting rules for future similar attacks
- Update threat model

The first time you face an attack, you'll appreciate having pre-written this. The on-call's job is to follow steps, not invent procedures at 3 AM.
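Step 4 requires you to know the attack signature first. One quick way to find it is to rank the busiest client IPs and paths in origin logs. The sketch assumes common/combined log format with the real client IP already restored into field 1. Behind a CDN, field 1 otherwise holds the edge's IP. In that format, field 7 is the request path.

```bash
# ASSUMPTION: common/combined log format on stdin, real client IP in field 1,
# request path in field 7.

# top_field FIELD N: print the N most frequent values of whitespace-separated
# FIELD as "count value", most frequent first (ties broken alphabetically).
top_field() {
  local field="$1" n="$2"
  awk -v f="$field" '{ c[$f]++ } END { for (k in c) print c[k], k }' \
    | sort -k1,1nr -k2,2 \
    | head -n "$n"
}
```

For example, `top_field 1 20 < access.log` lists the top client IPs and `top_field 7 20 < access.log` the top paths. A handful of IPs or a single expensive path dominating the list is a candidate for a targeted custom rule.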

Origin Protection: Verify It

Test that attackers can't bypass your edge:

# What does the public hostname resolve to?
dig +short app.your-domain.com
# Should resolve only to Cloudflare/CloudFront IPs

# Search certificate transparency logs (crt.sh) for hostnames that may
# point at the origin; %25 is a URL-encoded "%" wildcard
curl -s "https://crt.sh/?q=%25.your-domain.com&output=json" \
  | jq -r '.[].name_value' | sort -u

# Try common patterns for origin hostnames
for sub in origin direct backend api-direct; do
  echo "$sub: $(dig +short "$sub.your-domain.com")"
done

# Scan for the origin IP in your AWS / GCP / Azure ranges
# (your security team likely has tooling for this)

Common leaks:

DNS records for origin.example.com resolve directly to the origin
Email server (MX records) on same IP as web origin
Old TLS certs from before edge protection include the origin's hostname
Subdomain takeovers expose origin

Audit quarterly. If attackers can reach the origin directly, the WAF is defeated.
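Part of that audit can be automated: check every address your public hostnames resolve to against the edge provider's published ranges. Cloudflare publishes its IPv4 list at https://www.cloudflare.com/ips-v4. The function names below are illustrative, and the sketch handles IPv4 only.

```bash
# ip_to_int: convert a dotted-quad IPv4 address to a 32-bit integer.
# (Assumes octets without leading zeros, as dig prints them.)
ip_to_int() {
  local IFS=.
  local -a o
  read -r -a o <<< "$1"
  echo $(( (o[0] << 24) | (o[1] << 16) | (o[2] << 8) | o[3] ))
}

# ip_in_cidrs IP: succeed if IP falls inside any IPv4 CIDR read from stdin.
ip_in_cidrs() {
  local ip_int cidr net bits mask
  local re='^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/[0-9]{1,2}$'
  ip_int=$(ip_to_int "$1")
  while read -r cidr || [[ -n "$cidr" ]]; do    # also read a final line without newline
    [[ "$cidr" =~ $re ]] || continue             # skip blanks, comments, IPv6 ranges
    bits=${cidr#*/}
    (( bits <= 32 )) || continue
    net=$(ip_to_int "${cidr%/*}")
    mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
    if (( (ip_int & mask) == (net & mask) )); then
      return 0
    fi
  done
  return 1
}
```

Usage: for each address printed by `dig +short app.your-domain.com` (skip any CNAME lines), run `curl -s https://www.cloudflare.com/ips-v4 | ip_in_cidrs "$ip"`. Any address that fails the check is reachable outside the edge and needs investigating. CloudFront users can build the list from AWS's `ip-ranges.json` instead.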

False Positive Management

Every WAF generates them. Process:

User reports: support ticket says "I can't submit the form." First check: WAF logs. Was the request blocked?
Block-rate alarm: a sudden spike in blocks for a specific rule means either an attack or a new false positive
Per-rule monitoring: rules that block more than X% of traffic get reviewed weekly
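The block-rate alarm reduces to a per-rule comparison against a baseline. This sketch assumes you have already joined per-rule block counts for two equal-length windows into `rule_id baseline_blocks current_blocks` lines. The two windows might be the last hour and the same hour last week. The factor and floor values are yours to tune.

```bash
# ASSUMPTION: stdin lines are "rule_id baseline_blocks current_blocks" for
# two equal-length windows.

# block_spikes FACTOR MIN_BLOCKS: print rules whose current count is at least
# MIN_BLOCKS and more than FACTOR times the baseline. The MIN_BLOCKS floor
# stops a jump from 0 to 3 blocks from paging anyone.
block_spikes() {
  awk -v factor="$1" -v min="$2" '
    $3 >= min && $3 > factor * $2 { print $1, $2, $3 }
  '
}
```

Every rule this prints needs a human decision. It is either an attack, which should stay blocked, or a false positive, which needs an exception.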

Add allowlist exceptions surgically:

# Bad: too broad (nearly every browser, and plenty of bots, send "Mozilla")
Allow if http.user_agent contains "Mozilla"

# Better: scope to known good traffic (partner IPs, one endpoint)
Allow if ip.src in $known_partner_ips and http.request.uri.path eq "/api/webhook"

Document every exception with a ticket and rationale. Review quarterly: old exceptions go stale and become risks.
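A short script can drive the quarterly review. It assumes a hypothetical exception register kept alongside the WAF config, as CSV with the header `exception_id,ticket,last_reviewed` and dates in `YYYY-MM-DD` form. The script lists exceptions that lack a ticket or are overdue for review.

```bash
# ASSUMPTION: hypothetical register CSV on stdin with header
#   exception_id,ticket,last_reviewed   (last_reviewed as YYYY-MM-DD)

# stale_exceptions CUTOFF: print exceptions with no ticket, or whose last
# review is older than CUTOFF (YYYY-MM-DD). ISO dates compare correctly as strings.
stale_exceptions() {
  awk -F',' -v cutoff="$1" '
    NR == 1 { next }                                # skip the header row
    $2 == "" { print $1, "missing-ticket"; next }
    $3 == "" || $3 < cutoff { print $1, "stale" }
  '
}
```

Usage with GNU date: `stale_exceptions "$(date -d '90 days ago' +%F)" < waf-exceptions.csv`. On macOS, use `date -v-90d +%F` instead.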

Compliance Considerations

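PCI DSS: requires a WAF or an equivalent automated solution in front of in-scope public-facing web applications (Requirement 6.4.2 in v4.0)
HIPAA: WAF logs support audit controls, but they may capture PHI, so handle them accordingly
SOC 2: WAF and DDoS protection provide evidence of operational security controls
GDPR: WAF logs often contain personal data; retention + access controls apply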

Document:

WAF configuration in version control (Terraform / IaC)
Change management process
Incident response runbooks
Retention policy for WAF logs (often 1-7 years depending on regulator)
Data processing addendum (DPA) for the WAF provider if it processes user data

API Risk Management

APIs are an increasingly hot attack surface. OWASP API Security Top 10 (2023):

1. Broken Object Level Authorization (BOLA)
2. Broken Authentication
3. Broken Object Property Level Authorization
4. Unrestricted Resource Consumption
5. Broken Function Level Authorization
6. Unrestricted Access to Sensitive Business Flows
7. Server Side Request Forgery (SSRF)
8. Security Misconfiguration
9. Improper Inventory Management
10. Unsafe Consumption of APIs

WAFs help with #4 (rate limits), #7 (SSRF patterns), some of #8. The rest require code-level fixes. WAF + API gateway + secure code is the trio.
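For #4, choose a rate-limit threshold by replaying historical traffic against it before enabling it. That is the same tune-before-block idea applied to rate limits. The sketch uses simple fixed-window counting per client key. Edge providers use their own counting windows, so treat the result as an approximation. The input format is an assumption.

```bash
# over_rate_limit LIMIT WINDOW_SECONDS
# stdin: "epoch_seconds client_key" lines (client_key: IP, API key, ...).
# Prints "client_key window_start count" for every fixed window in which a
# client sent more than LIMIT requests.
over_rate_limit() {
  awk -v limit="$1" -v win="$2" '
    { count[$2 SUBSEP ($1 - $1 % win)]++ }        # bucket by client and window start
    END {
      for (k in count) {
        if (count[k] > limit) {
          split(k, part, SUBSEP)
          print part[1], part[2], count[k]
        }
      }
    }
  ' | sort
}
```

If a candidate limit flags known-good API clients, raise it or key the limit on API key rather than IP before you turn it on.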

Specialized API security tools (Salt, Noname, 42Crunch) go deeper: schema validation, behavioral baselines per endpoint, and anomaly detection on API usage.

Customer Impact Awareness

Your WAF will sometimes block things real customers do. Build in these habits:

Notify customers if you tighten rules in a way that might affect them (e.g., new geo restrictions)
Public IP allowlist mechanism for B2B customers: they tell you their corporate egress IPs, and you allowlist them
Self-service CAPTCHA re-verification if their device fingerprint trips bot management
Support team has WAF visibility, so when a customer reports issues, support can check the WAF logs

The worst WAF outcome is a customer who can't reach your product and has no way to communicate that fact (because the support form is also blocked).
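The B2B allowlist mechanism needs a guard. Otherwise a typo or an overly generous request ends up allowlisting a whole /8. This sketch validates a submitted IPv4 CIDR and rejects anything broader than a /24. The /24 floor is an assumed policy choice, not a standard.

```bash
# valid_customer_cidr CIDR [MIN_PREFIX]
# Succeeds if CIDR is a well-formed IPv4 CIDR no broader than /MIN_PREFIX
# (default /24, an assumed policy threshold); fails otherwise.
valid_customer_cidr() {
  local cidr="$1" min="${2:-24}" ip bits octet
  local re='^([0-9]{1,3}(\.[0-9]{1,3}){3})/([0-9]{1,2})$'
  [[ "$cidr" =~ $re ]] || return 1
  ip="${BASH_REMATCH[1]}"
  bits="${BASH_REMATCH[3]}"
  (( 10#$bits >= min && 10#$bits <= 32 )) || return 1   # reject /8, /16, ... and /33+
  local IFS=.
  for octet in $ip; do
    (( 10#$octet <= 255 )) || return 1                   # reject 300.1.1.1 etc.
  done
  return 0
}
```

Entries that fail go back to the customer for correction rather than into the WAF.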

Cost Optimization

WAF / DDoS / bot management costs add up:

Free tier maximization: Cloudflare's free plan covers a lot; use it where appropriate
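Per-request pricing awareness: AWS WAF charges per web ACL, per rule, and per million requests inspected. Optimize rule count.
Cache aggressively: every cache hit spares origin compute and egress. Don't count on it to cut WAF charges, though: edge WAFs such as Cloudflare's and AWS WAF on CloudFront generally inspect requests before the cache lookup, so cache hits are still evaluated.
DDoS protection levels: AWS Shield Advanced costs about $3,000/month with a one-year commitment; reserve it for high-risk workloads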
Bot management premium tiers: only on the routes that matter (auth, checkout, scraping targets)

Trade-off: a quieter origin (more aggressive WAF) costs less in origin compute but more in WAF. Find your equilibrium.
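A back-of-the-envelope estimate helps when finding that equilibrium. The defaults below are assumed AWS WAF list prices of $5 per web ACL, $1 per rule, and $0.60 per million requests per month. Check them against current AWS pricing, and note that paid add-ons such as Bot Control are billed separately.

```bash
# waf_monthly_cost ACLS RULES MILLION_REQUESTS [ACL_PRICE RULE_PRICE REQ_PRICE]
# Prints an estimated monthly AWS WAF bill in USD. Default prices are
# ASSUMED list prices; pass current prices as optional arguments.
waf_monthly_cost() {
  awk -v acls="$1" -v rules="$2" -v mreq="$3" \
      -v pa="${4:-5.00}" -v pr="${5:-1.00}" -v pq="${6:-0.60}" \
      'BEGIN { printf "%.2f\n", acls * pa + rules * pr + mreq * pq }'
}
```

For example, `waf_monthly_cost 1 10 500` (one web ACL, 10 rules, 500 million requests) prints `315.00`. At that volume the request charge dominates, so trimming rules saves far less than reducing the traffic that reaches the WAF.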

Scaling Considerations

As you grow:

Multi-account / multi-region WAFs: rule management gets harder; consolidate via Terraform
Per-tenant rules in multi-tenant SaaS: orchestrate carefully
Custom rules can hit limits: AWS WAF meters rule complexity in web ACL capacity units (WCUs), and complex rules consume more of each web ACL's capacity
Compliance scope can grow: more regions = more residency requirements
Bot management premium scales by request volume: budget accordingly

Common Pitfalls

Blocking without observing. Turning on Block mode without prior log analysis. Real users hit edge cases you never reviewed, and you learn about them from complaints instead.

Origin reachable directly. The WAF lulls you into a false sense of security while attackers go around it.

Cache breakage from WAF rules. If a custom rule depends on header X and you add X to the cache key (or the origin starts sending Vary: X) to make it work, the cache now varies by X and the hit rate crashes.

No alerting on rules. A rule blocks 100% of requests from one IP for a week; that IP was your CEO at a conference. Alert on blocks for authenticated users.

Set-and-forget. WAF rules deployed in 2023 still running in 2026 unchanged. Threat landscape moves; review quarterly.

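Trusting client-provided headers. Rules like "if X-Forwarded-For is X, allow" are trivially spoofable. Your edge provider knows the real client IP: trust the header it sets (e.g., CF-Connecting-IP on Cloudflare), and only on requests that actually arrive from the edge, not arbitrary client-supplied headers.

Bot challenges on API routes. A JS challenge breaks programmatic clients. Use API-aware patterns (JWT verification, API keys) instead of browser challenges for API traffic.

One-way edge. WAF rules live in Git and deploy one way (CI → WAF), but nothing detects drift when someone edits rules in the console. Run a scheduled reconciliation that compares the live configuration with Git; for Terraform-managed rules, terraform plan -detailed-exitcode exits with code 2 when they differ.

Forgetting non-HTTP services. A WAF doesn't help if you also expose SSH, FTP, SMB, etc. Those need their own protection.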
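The "no alerting on rules" pitfall has a cheap fix: surface blocks that hit authenticated users. This sketch assumes a hypothetical CSV of WAF events with the header `timestamp,rule_id,action,client_ip,user_id`. Here `user_id` is populated for authenticated requests (for example, from a session or JWT claim logged at the edge) and empty otherwise.

```bash
# ASSUMPTION: hypothetical event CSV on stdin with header
#   timestamp,rule_id,action,client_ip,user_id
# user_id is empty for anonymous requests.

# blocked_authenticated MIN: print "user_id rule_id blocks" for each
# authenticated user blocked at least MIN times by the same rule.
blocked_authenticated() {
  awk -F',' -v min="$1" '
    NR == 1 { next }                                  # skip the header row
    $3 == "block" && $5 != "" { n[$5 SUBSEP $2]++ }   # count blocks per user and rule
    END {
      for (k in n) if (n[k] >= min) {
        split(k, p, SUBSEP)
        print p[1], p[2], n[k]
      }
    }
  ' | sort
}
```

Any output line is worth an alert. An authenticated user being blocked repeatedly is far more likely to be a false positive than an attack.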

Checklist

What's Next

You have edge security. Connect it to:

API Gateway — the WAF protects the gateway; the gateway handles auth and routing
CDN — same edge often delivers cache, WAF, DDoS
VPN & Zero Trust — Cloudflare Tunnel hides origin entirely
Monitoring — WAF metrics feed alerts and dashboards
Policy as Code — WAF rules as code, reviewed via PR
Identity — bot management complements user identity
