Steven's Knowledge

Patterns

Rule tuning, SIEM integration, MITRE ATT&CK mapping, blocking with Tetragon, anomaly detection, multi-cluster


The patterns that turn a runtime security install into a useful signal.

Rule Tuning

Default rulesets generate noise. Tune for your environment:

  • Identify "normal" exec activity: which containers legitimately run shells? CI/debug pods often do.
  • Tag known-good IPs and domains: monitoring systems, partner APIs, public CDNs.
  • Add image allowlists: production images shouldn't run /usr/bin/python ad-hoc; tooling images may.
  • Per-namespace rule subsets: production strict; sandbox permissive.
# Falco: override a macro to add exceptions
- macro: trusted_containers
  condition: >
    (container.image.repository in ("our-registry/internal/debug-tools",
                                    "our-registry/internal/sre-tooling"))

The goal isn't zero alerts — it's "every alert is worth investigating." A 100-alert-per-day Falco that nobody reads has negative value.
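The exceptions listed above amount to a filter that runs before anything is routed. Below is a minimal Python sketch. It assumes each event carries a rule name, a namespace, and an image repository. The allowlist contents, the `sandbox` namespace, and the `Event` shape are all hypothetical. In Falco itself this logic belongs in macros and lists like `trusted_containers`, not in downstream code.

```python
from dataclasses import dataclass

# Hypothetical allowlist of tooling images that legitimately run shells.
TRUSTED_IMAGES = {
    "our-registry/internal/debug-tools",
    "our-registry/internal/sre-tooling",
}

# Hypothetical permissive namespaces: shells are tolerated there, other rules still fire.
PERMISSIVE_NAMESPACES = {"sandbox"}
SHELL_RULE = "Terminal shell in container"


@dataclass(frozen=True)
class Event:
    rule: str
    namespace: str
    image_repository: str


def should_alert(event: Event) -> bool:
    """Return True if the event survives the tuning exceptions."""
    if event.image_repository in TRUSTED_IMAGES:
        return False  # known tooling image
    if event.namespace in PERMISSIVE_NAMESPACES and event.rule == SHELL_RULE:
        return False  # sandbox: shells allowed, but only this rule is relaxed
    return True


events = [
    Event(SHELL_RULE, "production", "our-registry/checkout"),
    Event(SHELL_RULE, "production", "our-registry/internal/debug-tools"),
    Event(SHELL_RULE, "sandbox", "our-registry/checkout"),
    Event("Read sensitive file untrusted", "sandbox", "our-registry/checkout"),
]
kept = [e for e in events if should_alert(e)]
print(len(kept))  # 2
```

The exceptions are deliberately narrow. The sandbox relaxes one rule rather than all of them, so sensitive-file reads there still alert.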

Severity-Driven Routing

Don't send all events to one channel:

# Routing tiers by Falco priority
DEBUG / INFORMATIONAL → archive (Loki / S3)
NOTICE                → Slack #security-noise channel (review weekly)
WARNING               → Slack #security-alerts (review within hours)
ERROR                 → PagerDuty / Opsgenie (page on-call)
CRITICAL and above    → PagerDuty + incident-response Slack room (mobilize)

Each severity tier has a defined response. The on-call doesn't see WARNING noise; the security analyst doesn't get paged for NOTICE.
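Below is a minimal Python sketch of the tiering, with one exclusive tier per level as listed. In falcosidekick itself, each output is configured with a `minimumpriority`, so an output receives its own tier and everything above it. The destination names here are hypothetical placeholders.

```python
from enum import IntEnum


class Priority(IntEnum):
    """Falco's eight priority levels, ordered from least to most severe."""
    DEBUG = 0
    INFORMATIONAL = 1
    NOTICE = 2
    WARNING = 3
    ERROR = 4
    CRITICAL = 5
    ALERT = 6
    EMERGENCY = 7


def route(priority: Priority) -> list[str]:
    """Map a priority to its destinations (hypothetical names)."""
    if priority >= Priority.CRITICAL:
        return ["pagerduty", "slack:#incident-response"]
    if priority == Priority.ERROR:
        return ["pagerduty"]
    if priority == Priority.WARNING:
        return ["slack:#security-alerts"]
    if priority == Priority.NOTICE:
        return ["slack:#security-noise"]
    return ["archive"]  # DEBUG and INFORMATIONAL


def route_name(name: str) -> list[str]:
    """Route by priority name, e.g. "Warning", case-insensitively."""
    return route(Priority[name.upper()])
```

For example, `route_name("Warning")` returns `["slack:#security-alerts"]`, and `route(Priority.EMERGENCY)` pages and opens the incident room.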

SIEM and Forensics

Falco and Tetragon detect. Their events should go into a system where you can search and correlate them:

  • OpenSearch / Elasticsearch (with Wazuh, OpenSearch Security Analytics): events queryable; alerts derived from queries
  • Splunk: enterprise SIEM with extensive runtime container detection content
  • Sumo Logic, Datadog Cloud SIEM, Microsoft Sentinel: SaaS options
  • Self-hosted with ClickHouse + Grafana: cost-effective for high-volume

The flow:

Falco / Tetragon → falcosidekick → Kafka → Vector → SIEM
                                                  → S3 archive (long-term)

This ties into Observability Pipelines: the same pipeline that handles logs also handles security events, with different routes and different retention.

Long-term archive matters: when you find an incident from 6 months ago, you need the events from then. Compliance often mandates 6-12 month retention; security investigation may need years.
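One way to make "different routes, different retention" concrete is a lookup from event type to sinks. Below is a minimal Python sketch. The `producer` field, the sink names, and the retention periods are all hypothetical; set real values from your compliance and investigation requirements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    sink: str
    retention_days: int


# Hypothetical sinks and retention periods.
ROUTES = {
    "security": [Route("siem", 90), Route("s3-archive", 3 * 365)],
    "log": [Route("loki", 14)],
}

# Hypothetical: the pipeline tags each event with the producer that emitted it.
SECURITY_PRODUCERS = {"falco", "tetragon"}


def routes_for(event: dict) -> list[Route]:
    """Security events get the SIEM plus long-term archive; everything else is a log."""
    kind = "security" if event.get("producer") in SECURITY_PRODUCERS else "log"
    return ROUTES[kind]


def max_retention_days(event: dict) -> int:
    """Longest period any copy of this event is kept."""
    return max(r.retention_days for r in routes_for(event))
```

The design point is that the archive route, not the SIEM, sets how far back an investigation can reach.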

MITRE ATT&CK Mapping

The MITRE ATT&CK framework categorizes attacker behaviors. Map your detections:

Tactic                 Falco/Tetragon rules
Initial Access         Web shell deployed
Execution              Shell in container, scripting language abuse
Persistence            New cron entry, modified binary, kernel module load
Privilege Escalation   cap_set use, setuid binary execution
Defense Evasion        Disable security tools, indicator removal
Credential Access      Read /etc/shadow, service account token access
Discovery              Network scanning, cluster API exploration
Lateral Movement       SSH from container, K8s API calls from non-admin pods
Collection             Mass file reads, database dump commands
Command and Control    Outbound to unknown IPs, DNS tunneling
Exfiltration           Large egress, unusual destination
Impact                 Cryptomining processes, ransomware-like patterns

Falco rules carry tags: [..., mitre_credential_access, T1552]. Searching SIEM by MITRE technique lets you ask "show me all credential-access events" across the fleet.
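Once tagged events land in a searchable store, a tactic query is just a tag filter. Below is a minimal Python sketch over events shaped loosely like Falco JSON output. The sample events and their tags are illustrative, not copied from Falco's shipped rules.

```python
from collections import Counter

# Illustrative events: rule name, host, and tags only.
events = [
    {"rule": "Read sensitive file untrusted", "hostname": "node-a",
     "tags": ["filesystem", "mitre_credential_access", "T1552"]},
    {"rule": "Terminal shell in container", "hostname": "node-b",
     "tags": ["container", "shell", "mitre_execution", "T1059"]},
    {"rule": "Read sensitive file untrusted", "hostname": "node-c",
     "tags": ["filesystem", "mitre_credential_access", "T1552"]},
]


def by_tactic(events: list[dict], tactic: str) -> list[dict]:
    """Events tagged with a tactic, e.g. "credential_access" -> mitre_credential_access."""
    tag = f"mitre_{tactic}"
    return [e for e in events if tag in e.get("tags", [])]


def is_technique_id(tag: str) -> bool:
    """True for ATT&CK technique IDs such as T1552 or T1552.001 (not tactic IDs like TA0006)."""
    return tag.startswith("T") and tag[1:].split(".")[0].isdigit()


def count_by_technique(events: list[dict]) -> Counter:
    """Fleet-wide event counts per ATT&CK technique."""
    return Counter(t for e in events for t in e.get("tags", []) if is_technique_id(t))
```

Here `by_tactic(events, "credential_access")` returns the node-a and node-c events, and `count_by_technique` counts two T1552 hits and one T1059.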

Tetragon Prevention Patterns

Tetragon (and KubeArmor, Cilium with TC) can block, not just detect. Patterns:

Block dangerous binaries in production

apiVersion: cilium.io/v1alpha1
kind: TracingPolicyNamespaced
metadata: { name: block-dangerous, namespace: production }
spec:
  kprobes:
    - call: "sys_execve"
      syscall: true
      args: [{ index: 0, type: "string" }]   # arg 0: path of the binary being executed
      selectors:
        - matchArgs:
            # "Equal" with several values matches any one of them
            - { index: 0, operator: "Equal", values: ["/usr/bin/nc", "/usr/bin/socat", "/bin/sh"] }
          matchActions:
            - { action: Sigkill }   # kill the process attempting the exec

Block egress from non-system pods

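spec:
  podSelector:   # pod scoping goes in spec.podSelector, not inside a selector
    matchExpressions:
      - { key: app.kubernetes.io/component, operator: NotIn, values: ["proxy", "egress-gateway"] }
  kprobes:
    - call: "tcp_connect"
      syscall: false
      args: [{ index: 0, type: "sock" }]
      selectors:
        - matchArgs:
            # kill connects to anything outside cluster-internal and loopback ranges
            - { index: 0, operator: "NotDAddr", values: ["10.0.0.0/8", "127.0.0.0/8"] }
          matchActions:
            - { action: Sigkill }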

Allow-listed file paths

Block reads of sensitive files from any container that's not authorized to read them.

Warning: prevention can break production. Always test in detect-only mode first; only after weeks of clean signal, enable prevention.
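An allow-listed path policy and the detect-first advice fit together well: run the same policy in two modes. Below is a minimal Python sketch of the decision such a policy makes. The path prefixes are real Linux/Kubernetes locations, but the reader allowlist, the image names, and the `Mode` switch are hypothetical.

```python
from enum import Enum


class Mode(Enum):
    DETECT = "detect"    # alert only
    ENFORCE = "enforce"  # block, e.g. via a Tetragon Sigkill action


# Hypothetical policy: sensitive path prefixes and the images allowed to read them.
ALLOWED_READERS = {
    "/etc/shadow": set(),
    "/var/run/secrets/kubernetes.io/serviceaccount/": {"our-registry/internal/operator"},
}


def decide(image: str, path: str, mode: Mode) -> str:
    """Return "allow", "alert", or "block" for a file read."""
    for prefix, readers in ALLOWED_READERS.items():
        if path.startswith(prefix):
            if image in readers:
                return "allow"
            return "block" if mode is Mode.ENFORCE else "alert"
    return "allow"  # not a sensitive path
```

In detect mode, an unauthorized read of `/etc/shadow` returns `"alert"`. Flipping the same policy to `Mode.ENFORCE` after weeks of clean signal turns that into `"block"` without changing the allowlist.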

Behavioral Detection / Anomaly

Static rules catch known patterns. Behavioral detection notes "this container normally doesn't do X" — and alerts when X happens.

Approaches:

  • Baseline + deviation: record each workload's normal syscalls/processes/network in a learning period; alert on deviations
  • Profile per image: trust profiles ship with images; runtime enforces
  • ML-based anomaly detection: Sysdig, Aqua, commercial offerings
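The baseline + deviation approach can be sketched in a few lines of Python. This version covers only process executions. The workload key (`namespace/deployment`) and the manual `freeze()` call that ends the learning period are hypothetical simplifications; real tools use a time-based learning window and also profile syscalls and network activity.

```python
class Baseline:
    """Baseline + deviation over process executions, keyed by workload."""

    def __init__(self) -> None:
        self.learning = True
        self.seen: dict[str, set[str]] = {}

    def observe(self, workload: str, process: str) -> bool:
        """Record an exec while learning; afterwards return True on a deviation."""
        if self.learning:
            self.seen.setdefault(workload, set()).add(process)
            return False
        # Workloads never seen during learning have an empty baseline,
        # so every process they run counts as a deviation.
        return process not in self.seen.get(workload, set())

    def freeze(self) -> None:
        """End the learning period and start flagging deviations."""
        self.learning = False
```

Usage: call `observe("production/checkout", "/usr/bin/checkout")` during learning and then `freeze()`. After that, the same call returns `False`, while `observe("production/checkout", "/bin/bash")` returns `True`.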

On the OSS side: Hubble (Cilium's network observability layer) plus Tetragon with custom rules, or Tracee with behavioral signatures.

Behavioral detection accepts more false positives in exchange for catching unknown patterns. In practice a combination works: broad static rules plus targeted behavioral checks on high-value workloads.

Cloud-Native Specifics

K8s introduces specific patterns:

  • Service account abuse: a pod uses its mounted SA token to call kubernetes.default.svc when not expected
  • Privileged container: a pod with privileged: true is high-trust; runtime detection alerts on dangerous behavior
  • Cross-namespace lateral: a pod in webapp namespace connecting to kube-system services
  • Container escape: writing to /proc/sys/, mounting host paths, exploiting kernel CVEs

Modern rulesets cover these. Kubernetes-aware detection in Tetragon and in commercial tools (Sysdig, Aqua) makes the alerts richer: "pod=checkout-7f8b9c, ServiceAccount=default, image=checkout:v1.2.3, namespace=production."
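Two of the patterns above reduce to checks on enriched connection events. Below is a minimal Python sketch, assuming connections have already been annotated with source and destination namespace and Service. The `Connection` shape and both allowlists are hypothetical. Cluster DNS is exempted because almost every pod talks to it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    src_namespace: str
    src_pod: str
    dst_namespace: str
    dst_service: str


# Hypothetical allowlists; tune these to your cluster.
API_CLIENT_NAMESPACES = {"kube-system", "monitoring"}
SHARED_KUBE_SYSTEM_SERVICES = {"kube-dns"}  # cluster DNS, commonly named kube-dns


def findings(conn: Connection) -> list[str]:
    out = []
    # kubernetes.default.svc is the "kubernetes" Service in the "default" namespace
    if ((conn.dst_namespace, conn.dst_service) == ("default", "kubernetes")
            and conn.src_namespace not in API_CLIENT_NAMESPACES):
        out.append(f"unexpected API server call from {conn.src_namespace}/{conn.src_pod}")
    if (conn.dst_namespace == "kube-system"
            and conn.src_namespace != "kube-system"
            and conn.dst_service not in SHARED_KUBE_SYSTEM_SERVICES):
        out.append(f"cross-namespace connection into kube-system/{conn.dst_service}")
    return out
```

A `webapp` pod calling the API server produces a finding, while the same pod resolving DNS through `kube-dns` produces none.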

Multi-Cluster

For organizations with many clusters:

  • Centralized SIEM: every cluster's runtime events flow to one place
  • Cross-cluster correlation: an attacker probing one cluster, then another, should be visible
  • Per-cluster rule policies: dev clusters more permissive; prod stricter
  • Centralized rule deployment: GitOps manages rulesets across clusters

Architecture:

Cluster A → Falco → falcosidekick → Kafka ┐
Cluster B → Falco → falcosidekick → Kafka ├→ Central SIEM
Cluster C → Falco → falcosidekick → Kafka ┘
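Cross-cluster correlation runs in the central SIEM once all clusters report there. Below is a minimal Python sketch that flags a source IP appearing in alerts from several clusters within a sliding time window. The `(timestamp, cluster, source_ip)` event shape, the choice of source IP as the correlation key, and the one-hour default are all assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def multi_cluster_sources(events, window=timedelta(hours=1), min_clusters=2):
    """Map source IP -> clusters it hit within one sliding time window.

    events: iterable of (timestamp, cluster, source_ip) tuples.
    """
    by_ip = defaultdict(list)
    for ts, cluster, ip in events:
        by_ip[ip].append((ts, cluster))

    hits = {}
    for ip, items in by_ip.items():
        items.sort()
        start = 0
        for end in range(len(items)):
            # Advance the left edge until the window spans at most `window`.
            while items[end][0] - items[start][0] > window:
                start += 1
            clusters = {c for _, c in items[start:end + 1]}
            if len(clusters) >= min_clusters:
                hits[ip] = sorted(clusters)
                break  # report the first window that crosses the threshold
    return hits


t0 = datetime(2024, 1, 1, 12, 0)
alerts = [
    (t0, "cluster-a", "203.0.113.7"),
    (t0 + timedelta(minutes=20), "cluster-b", "203.0.113.7"),
    (t0, "cluster-a", "198.51.100.2"),
    (t0 + timedelta(hours=3), "cluster-c", "198.51.100.2"),
]
print(multi_cluster_sources(alerts))  # {'203.0.113.7': ['cluster-a', 'cluster-b']}
```

The second IP is not flagged because its two alerts are three hours apart. Widening `window` to four hours would flag it too.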

Container Image Drift

A useful runtime check: did this container's processes diverge from what its image contained? If the image ships only /usr/bin/checkout but a bash process appears at runtime, something changed after the container started. In-memory injection is one cause.

Tools: Tracee's behavioral signatures, Sysdig's image-process drift detection, custom rules.
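At its simplest, image drift is a set difference between the executables the image shipped and the ones that actually run. Below is a minimal Python sketch. It assumes the image's executable list was captured at build time, which is a hypothetical step. It only catches newly executed binaries; code injected into an already-running process never shows up as a new exec path.

```python
def drift(image_executables: set[str], runtime_execs: list[str]) -> list[str]:
    """Executables run at runtime that were not in the image, in first-seen order."""
    seen: set[str] = set()
    drifted = []
    for path in runtime_execs:
        if path not in image_executables and path not in seen:
            seen.add(path)
            drifted.append(path)
    return drifted


image = {"/usr/bin/checkout"}
execs = ["/usr/bin/checkout", "/bin/bash", "/bin/bash", "/tmp/.x"]
print(drift(image, execs))  # ['/bin/bash', '/tmp/.x']
```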

Compliance Use Cases

Specific frameworks driving specific patterns:

  • PCI DSS 11.5.2 (v4.0): change detection such as file integrity monitoring (FIM). Alert on changes to critical files; runtime detection catches the change in real time, not on the next scheduled scan.
  • SOC 2 CC7.2: monitoring of system operations.
  • HIPAA 164.308(a)(1)(ii)(D) (information system activity review) and 164.312(b) (audit controls): ongoing monitoring of access to PHI.
  • NIST SP 800-171 / 800-53: continuous monitoring (e.g., 800-53 CA-7 and SI-4); runtime detection is one component.

This ties into Policy as Code: admission policies enforce before deploy; runtime detection covers what happens after. Combined, they cover the lifecycle.

Anti-Patterns

Alerts to email: a folder no one reads. Use SIEM + on-call escalation by severity.

No tuning: out-of-the-box rules generate too much noise, and the team learns to ignore Falco. Tune in the first month to earn trust.

No baseline: you can't tell "spike" from "normal" without a baseline. Capture two weeks of clean signal before tightening thresholds.

Prevention without testing: enabling a Tetragon Sigkill on a rule that sometimes fires on legitimate activity. Now production breaks unpredictably. Run detect-only for weeks first.

Single rule, all-or-nothing: a noisy rule is disabled completely, losing useful signal. Carve out specific exceptions; keep the rule for unknown cases.

Runtime security as the only layer: ignoring supply chain, admission, and network controls. It's the last layer, not the only one.

Logs without retention: when you need events from 6 months ago to reconstruct an incident, they're gone. Keep a long-term archive.

What's Next

  • Best Practices — alert fatigue, response runbook, compliance, performance, pitfalls
