Patterns
Policy structure, bundles, testing, mutation, exceptions, gradual rollout, library structure, cross-cutting policies
The patterns that turn ad-hoc rules into a maintainable policy system.
Policy as a Function
A policy is a pure function: input → decision. Treat it like code:
- Single responsibility per rule. "Pods must not run as root" — not "Pods must not run as root AND must have labels AND must use approved images." Three rules, three reasons.
- Tested like code. Every rule has positive (should deny) and negative (should allow) test cases.
- Reviewed like code. PRs into the policy repo go through code review.
- Released like code. Don't push to prod directly; stage and observe first.
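The pure-function framing can be made concrete with a small Python model (not Rego, and not an OPA API). Each rule is its own function from an input document to a list of denial messages, and the policy is simply the union of those lists. The input shape imitates the admission requests used in the Rego examples on this page. The rule names, the team label, and the approved-registry prefix are assumptions for illustration.

```python
from typing import Callable

# An input document is plain JSON-like data; a rule maps it to denial messages.
Rule = Callable[[dict], list[str]]


def _containers(review: dict) -> list[dict]:
    # Hypothetical helper: pull the container list out of an admission-style input.
    return review.get("request", {}).get("object", {}).get("spec", {}).get("containers", [])


def deny_root_user(review: dict) -> list[str]:
    # One rule, one reason: containers must not run as UID 0.
    return [
        f"container {c.get('name', '?')} runs as root"
        for c in _containers(review)
        if c.get("securityContext", {}).get("runAsUser") == 0
    ]


def deny_missing_team_label(review: dict) -> list[str]:
    # One rule, one reason: every object needs a team label.
    labels = review.get("request", {}).get("object", {}).get("metadata", {}).get("labels", {})
    return [] if "team" in labels else ["missing required label: team"]


def deny_unapproved_registry(review: dict) -> list[str]:
    # One rule, one reason: images come from an approved registry (assumed prefix).
    approved = ("registry.company.com/",)
    return [
        f"image {c.get('image')} is not from an approved registry"
        for c in _containers(review)
        if not c.get("image", "").startswith(approved)
    ]


RULES: list[Rule] = [deny_root_user, deny_missing_team_label, deny_unapproved_registry]


def evaluate(review: dict) -> list[str]:
    # The policy is a pure function: same input, same decision, no side effects.
    return [msg for rule in RULES for msg in rule(review)]
```

Because the three rules are separate, a rejection carries one precise reason per failing rule, and each rule can get its own should-deny and should-allow tests.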
Bundles and Discovery
OPA loads policies from "bundles": gzipped tarballs, optionally signed, served over HTTP. Policy source lives in Git; CI tests and builds the bundle; OPA pulls it.
policy-repo/
├── kubernetes/
│ ├── required_labels.rego
│ ├── no_privileged.rego
│ └── ...
├── terraform/
│ └── s3_no_public.rego
└── tests/
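  └── ...
CI:
opa test policy-repo/                          # fail the build if any test fails
opa build -b policy-repo/ -o bundle.tar.gz
aws s3 cp bundle.tar.gz s3://policy-bundles/v1.2.3/
aws s3 cp bundle.tar.gz s3://policy-bundles/latest/
OPA agents and other bundle clients poll s3://policy-bundles/latest/ periodically. (Gatekeeper is the exception: it reads policy from ConstraintTemplate and constraint resources in the cluster, not from bundles.) Updating policy = updating Git = updating the bundle.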
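To make the artifact concrete: a bundle is a gzipped tarball of .rego files and data.json files, optionally with a .manifest file carrying a revision string. The Python sketch below packs a directory into that shape purely to show what is inside. It does no signing and, unlike opa build, does not parse or check anything, so real pipelines should keep using opa build. The function name pack_bundle and the revision value are hypothetical.

```python
import io
import json
import tarfile
from pathlib import Path


def pack_bundle(policy_dir: str, out_path: str, revision: str) -> list[str]:
    """Pack .rego files and data.json files under policy_dir into a .tar.gz.

    Illustrative only: no signing, and unlike `opa build` nothing is parsed
    or checked. Returns the member names written, in order.
    """
    root = Path(policy_dir)
    members: list[str] = []
    with tarfile.open(out_path, "w:gz") as tar:
        for path in sorted(root.rglob("*")):
            if path.is_file() and (path.suffix == ".rego" or path.name == "data.json"):
                name = path.relative_to(root).as_posix()
                tar.add(str(path), arcname=name)
                members.append(name)
        # .manifest records which revision of the policy repo produced the bundle.
        manifest = json.dumps({"revision": revision}).encode("utf-8")
        info = tarfile.TarInfo(name=".manifest")
        info.size = len(manifest)
        tar.addfile(info, io.BytesIO(manifest))
        members.append(".manifest")
    return members
```

The point of the exercise is that a bundle is an ordinary, versionable file, which is why it can be copied to a versioned path and a "latest" path like any other build artifact.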
Testing Policies
Two layers:
- Unit tests: opa test runs Rego test rules. Fast, hermetic.
test_root_user_denied {
    deny[_] with input as {"request": {"object": {"spec": {"containers": [{"securityContext": {"runAsUser": 0}}]}}}}
}

test_non_root_user_allowed {
    count(deny) == 0 with input as {"request": {"object": {"spec": {"containers": [{"securityContext": {"runAsUser": 1000}}]}}}}
}
- Integration tests: spin up Gatekeeper in CI, apply known-bad and known-good resources, and assert that the bad ones are rejected and the good ones are admitted.
kubectl apply -f testdata/bad-pod.yaml && exit 1    # should fail
kubectl apply -f testdata/good-pod.yaml || exit 1   # should succeed
Coverage matters: every policy needs both positive and negative tests. Missing the negative case is how you accidentally block legitimate workloads.
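To show why both directions matter, here is the root-user rule and its two Rego tests re-expressed as a table-driven Python check. This is an illustrative model, not something opa test provides; the names deny_root, pod, CASES, and failing_cases are hypothetical. The helper refuses a case table that only covers one direction.

```python
def deny_root(pod_input: dict) -> list[str]:
    # Python model of the Rego rule: deny containers with runAsUser == 0.
    containers = pod_input["request"]["object"]["spec"]["containers"]
    return [
        "Container runs as root"
        for c in containers
        if c.get("securityContext", {}).get("runAsUser") == 0
    ]


def pod(run_as_user=None) -> dict:
    # Minimal admission-style input with a single container.
    ctx = {} if run_as_user is None else {"runAsUser": run_as_user}
    return {"request": {"object": {"spec": {"containers": [{"securityContext": ctx}]}}}}


# Each case: (name, input, should_deny)
CASES = [
    ("root user denied", pod(0), True),
    ("non-root user allowed", pod(1000), False),
    # This rule only looks at runAsUser; unset is allowed here and left to other rules.
    ("unset runAsUser allowed", pod(), False),
]


def failing_cases(cases) -> list[str]:
    """Return the names of cases whose outcome is wrong.

    Refuses a table lacking either a should-deny or a should-allow case,
    since a one-sided suite cannot catch false positives (or misses).
    """
    if {should_deny for _, _, should_deny in cases} != {True, False}:
        raise ValueError("need at least one positive and one negative case")
    return [
        name for name, inp, should_deny in cases
        if bool(deny_root(inp)) != should_deny
    ]
```

The negative cases are the ones that protect legitimate workloads: if a later edit made the rule fire on runAsUser 1000, "non-root user allowed" would fail in CI rather than in production.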
Mutation Before Validation
When you can fix it automatically, do — don't make engineers fix things you could fix yourself:
- Default runAsNonRoot: true on pods that don't specify it
- Inject a team label based on the namespace
- Add resource requests/limits at sensible defaults
- Strip privileged flags silently
A common stack: Kyverno mutates first (add labels, defaults), then Gatekeeper validates the now-augmented resource. Engineers experience fewer "your pod is missing field X" rejections.
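The ordering can be sketched in Python. A mutation step fills in defaults (a pod-level securityContext.runAsNonRoot and a team label looked up from a namespace-to-team mapping, both assumptions here), and validation then runs against the mutated object, so most omissions never reach the engineer as rejections. This only models the sequencing; it is not Kyverno or Gatekeeper code.

```python
import copy


def mutate(pod: dict, namespace_team: dict[str, str]) -> dict:
    """Return a defaulted copy of the pod; the caller's object is untouched."""
    out = copy.deepcopy(pod)
    meta = out.setdefault("metadata", {})
    labels = meta.setdefault("labels", {})
    team = namespace_team.get(meta.get("namespace", ""))
    if team and "team" not in labels:
        labels["team"] = team  # inject the team label from the namespace mapping
    sec = out.setdefault("spec", {}).setdefault("securityContext", {})
    sec.setdefault("runAsNonRoot", True)  # default only when unspecified
    return out


def validate(pod: dict) -> list[str]:
    errors = []
    if "team" not in pod.get("metadata", {}).get("labels", {}):
        errors.append("missing required label: team")
    if pod.get("spec", {}).get("securityContext", {}).get("runAsNonRoot") is not True:
        errors.append("pod must set runAsNonRoot: true")
    return errors


def admit(pod: dict, namespace_team: dict[str, str]):
    # Mutation runs first; validation sees the augmented object.
    mutated = mutate(pod, namespace_team)
    return mutated, validate(mutated)
```

Note the design choice in mutate: defaults apply only where a field is absent. An explicit runAsNonRoot: false is left alone and still rejected by validation, so mutation never silently overrides a deliberate setting.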
Exceptions and Phased Rollout
Real policies have legitimate exceptions. Some patterns:
Allow-list namespaces
spec:
  match:
    excludedNamespaces: [kube-system, istio-system]
System namespaces often need things user namespaces shouldn't.
Labels-as-opt-out
The resource carries an opt-out label whose value records the reason, e.g. policy.example.com/skip-root-check: legacy-vendor-image. The policy skips the check when the label is present; the audit log captures which workloads opted out and why.
deny[msg] {
not input.request.object.metadata.labels["policy.example.com/skip-root-check"]
container := input.request.object.spec.containers[_]
container.securityContext.runAsUser == 0
msg := "Container runs as root and has no exemption"
}
Warn mode → Enforce mode
Gatekeeper supports enforcementAction: warn (admit the request, but return the violation to the client as a warning) and enforcementAction: deny (block):
spec:
  enforcementAction: warn   # for 2 weeks
  match: ...
Roll out as warn, watch the audit log for legitimate hits, then switch to deny. Avoids the "Friday afternoon outage because a policy fired in prod for the first time" pattern.
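The modes differ only in what happens to a violation at admission time. Below is a Python model of the three actions Gatekeeper documents (deny, warn, and dryrun for audit-only); it illustrates the rollout ladder and is not Gatekeeper's implementation. Outcome and admission_outcome are hypothetical names.

```python
from typing import NamedTuple


class Outcome(NamedTuple):
    allowed: bool
    messages: list[str]  # what the client (e.g. kubectl) sees


def admission_outcome(violations: list[str], action: str) -> Outcome:
    if action == "deny":
        # Any violation rejects the request; the messages explain why.
        return Outcome(allowed=not violations, messages=list(violations))
    if action == "warn":
        # The request goes through; violations come back as warnings.
        return Outcome(allowed=True, messages=list(violations))
    if action == "dryrun":
        # The request goes through silently; violations show up only in audit.
        return Outcome(allowed=True, messages=[])
    raise ValueError(f"unknown enforcementAction: {action!r}")
```

Moving a constraint from warn to deny changes only the action argument. The rule logic, and therefore the set of violations observed during the warn period, stays the same, which is what makes the warn period a reliable preview.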
Time-boxed exceptions
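Encode the expiration in a label: policy.example.com/skip-until: 2026-06-01. The policy honors the exemption only while the current time is before that date (in Rego, compare time.now_ns() with time.parse_ns("2006-01-02", skip_until)) and denies once it has passed. The exemption lapses back into enforcement on its own; nobody has to remember to remove the label.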
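A Python sketch of the expiry check, assuming the label value is an ISO date (YYYY-MM-DD). The helper names are hypothetical, and passing today in explicitly (rather than reading the clock inside the rule) is a choice made here to keep the function pure and testable.

```python
from datetime import date

SKIP_UNTIL = "policy.example.com/skip-until"


def exemption_active(labels: dict, today: date) -> bool:
    """True only on days strictly before the skip-until date.

    A missing or malformed date means no exemption, so typos fail closed.
    """
    raw = labels.get(SKIP_UNTIL)
    if raw is None:
        return False
    try:
        until = date.fromisoformat(raw)
    except ValueError:
        return False
    return today < until


def deny_root_unless_exempt(labels: dict, run_as_user, today: date) -> list[str]:
    # Enforcement resumes automatically once exemption_active turns False.
    if run_as_user == 0 and not exemption_active(labels, today):
        return ["Container runs as root and has no active exemption"]
    return []
```

Treating an unparseable date as "no exemption" matters: otherwise a typo such as skip-until: soon would become a permanent opt-out.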
Library Structure
For policy repos that grow past ~20 files:
policies/
├── lib/
│ ├── kubernetes/
│ │ ├── pods.rego # helpers: containers(), is_root(), etc.
│ │ └── images.rego # helpers: registry_of(), tag_of()
│ └── common/
│ └── labels.rego
├── rules/
│ ├── pod_security.rego # uses lib/kubernetes/pods.rego
│ ├── image_registries.rego # uses lib/kubernetes/images.rego
│ └── required_labels.rego
├── tests/
│ └── (mirrors rules/)
└── .manifest                 # bundle manifest (revision, roots)
DRY the predicates; keep rules thin. The same is_root() helper is referenced from multiple deny rules.
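The same layering can be sketched in Python: a few shared predicates, named after the hypothetical helpers in the tree above, and rules that are one-liners over them. These functions operate on the resource object itself, and the allow-list is an assumption. registry_of follows Docker's convention that the first path segment names a registry only if it looks like a host; that parsing rule is the only subtle part, which is exactly why it belongs in one shared helper rather than being re-implemented per rule.

```python
# Shared predicates (hypothetical Python counterparts of lib/kubernetes/*.rego)

def containers(obj: dict) -> list[dict]:
    spec = obj.get("spec", {})
    # Init containers run in the pod too, so one helper covers both lists.
    return spec.get("containers", []) + spec.get("initContainers", [])


def is_root(container: dict) -> bool:
    return container.get("securityContext", {}).get("runAsUser") == 0


def registry_of(image: str) -> str:
    # Docker convention: the first path segment is a registry host only if it
    # contains "." or ":" or is "localhost"; otherwise Docker Hub is implied.
    first, sep, _ = image.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first
    return "docker.io"


# Thin rules built from the predicates

ALLOWED_REGISTRIES = {"registry.company.com"}  # assumed allow-list


def deny_root_containers(obj: dict) -> list[str]:
    return [f"{c.get('name', '?')} runs as root" for c in containers(obj) if is_root(c)]


def deny_other_registries(obj: dict) -> list[str]:
    return [
        f"{c['image']} comes from {registry_of(c['image'])}"
        for c in containers(obj)
        if registry_of(c["image"]) not in ALLOWED_REGISTRIES
    ]
```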
Cross-Cutting Policies (OPA's Big Win)
One Rego policy can run in many contexts:
- Kubernetes admission via Gatekeeper / OPA sidecar
- Terraform CI via conftest
- API authorization via OPA sidecar called per request
- Image admission via OPA + image scanner integration
# One "nothing public without approval" policy, two input shapes
package public_buckets
deny[msg] {
# K8s: check Service of type=LoadBalancer
input.kind == "Service"
input.spec.type == "LoadBalancer"
not input.metadata.annotations["allow-public"]
msg := "Public LoadBalancer requires explicit allow-public annotation"
}
deny[msg] {
# Terraform: check S3 bucket
resource := input.resource_changes[_]
resource.type == "aws_s3_bucket"
resource.change.after.acl == "public-read"
msg := sprintf("S3 bucket %v has public ACL", [resource.address])
}
One concept ("nothing public without explicit approval"), many enforcement points.
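Modeled in Python for illustration (this is not conftest or OPA code), the package becomes a single function that handles whichever input shape it is given. Checks for a shape that isn't present simply contribute nothing, which mirrors the Rego version: a rule body that references a missing field is undefined rather than an error, so it just doesn't fire. The function name is hypothetical.

```python
def deny_public(doc: dict) -> list[str]:
    """Same concept for two input shapes: a K8s manifest or Terraform plan JSON."""
    msgs = []

    # Kubernetes manifest: kind/spec/metadata at the top level.
    annotations = doc.get("metadata", {}).get("annotations", {})
    if (doc.get("kind") == "Service"
            and doc.get("spec", {}).get("type") == "LoadBalancer"
            and "allow-public" not in annotations):
        msgs.append("Public LoadBalancer requires explicit allow-public annotation")

    # Terraform plan (terraform show -json): a resource_changes list.
    for rc in doc.get("resource_changes", []):
        # "after" is null for deletions, hence the fallbacks.
        after = (rc.get("change") or {}).get("after") or {}
        if rc.get("type") == "aws_s3_bucket" and after.get("acl") == "public-read":
            msgs.append(f"S3 bucket {rc.get('address')} has public ACL")

    return msgs
```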
Policy as Data
For lookup-heavy rules, separate policy (the logic) from data (the table):
# rules/allowed_images.rego
package allowed_images
deny[msg] {
container := input.request.object.spec.containers[_]
registry := split(container.image, "/")[0]
not data.allowed_registries[registry]
msg := sprintf("Image %v from disallowed registry", [container.image])
}
// data/allowed_registries.json
{
"allowed_registries": {
"gcr.io": true,
"registry.company.com": true,
"quay.io": true
}
}
Adding a registry is a one-line PR with a known-safe shape. The logic stays stable; the data evolves.
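A Python sketch of the same split, with the JSON table inlined as a string so the example is self-contained; in OPA the table would instead be loaded as a data document next to the policy. The function names are hypothetical. The logic deliberately keeps the Rego rule's naive split on "/".

```python
import json

# Data: the lookup table (inlined here; in OPA it is loaded as data).
ALLOWED_REGISTRIES_JSON = """
{"allowed_registries": {"gcr.io": true, "registry.company.com": true, "quay.io": true}}
"""


def load_allowed(data_json: str) -> set[str]:
    table = json.loads(data_json)["allowed_registries"]
    return {registry for registry, ok in table.items() if ok}


def deny_disallowed(containers: list[dict], allowed: set[str]) -> list[str]:
    # Logic: same as the Rego rule, including its naive split on "/".
    # An unqualified image such as "nginx:1.25" yields "nginx:1.25", which is
    # never in the table, so it is denied (the rule fails closed).
    msgs = []
    for c in containers:
        registry = c["image"].split("/")[0]
        if registry not in allowed:
            msgs.append(f"Image {c['image']} from disallowed registry")
    return msgs
```

Allowing another registry touches only the JSON; deny_disallowed and its tests stay unchanged.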
Mutating Webhook Order Matters
In Kubernetes, multiple mutating webhooks can fire. Order matters:
- Istio injects its sidecar
- Linkerd adds proxy injection annotations
- Your policy mutator adds labels
- Network policy controller adds default-deny
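If two mutators set the same field, the last writer wins. Validating webhooks always run after every mutating webhook has finished, so they see the final object. reinvocationPolicy: IfNeeded belongs on mutating webhooks: it lets the API server call a mutator again when a later webhook has changed the object.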
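A toy Python model (not API-server code) of two facts about admission: mutators run one after another, so a later writer overwrites an earlier one, and validation runs once on the final object. The mutator names and the conflicting tier label are invented for illustration.

```python
from typing import Callable

Mutator = Callable[[dict], dict]


def run_admission(obj: dict, mutators: list[tuple[str, Mutator]], validate):
    """Apply (name, mutator) pairs in order, then validate the final object once.

    Each mutator sees the previous one's output, so when two of them set the
    same field, whichever runs later wins.
    """
    applied = []
    for name, mutate in mutators:
        obj = mutate(obj)
        applied.append(name)
    return obj, validate(obj), applied


def set_label(key: str, value: str) -> Mutator:
    # Hypothetical mutator factory: returns a copy of the object with one label set.
    def mutate(obj: dict) -> dict:
        return {**obj, "labels": {**obj.get("labels", {}), key: value}}
    return mutate
```

Swapping the order of two mutators that touch the same field flips what the validator sees, which is the kind of conflict to look for when several webhooks edit the same resources.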
Audit Mode for Discovery
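Before enforcing a policy, run it in audit-only mode and see what would fire. Gatekeeper's audit controller periodically scans existing cluster state and records violations in each constraint's status (enforcementAction: dryrun keeps a constraint audit-only):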
kubectl get constraints -A
# Shows current violations against each constraint, without blocking
This is how you find out what's already wrong before enforcing a policy that blocks new things. Often the answer is "fix what's there first, then enforce."
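To turn that output into a fix-first worklist, a small Python sketch can rank constraints by their audited violation count. It assumes the JSON shape printed by kubectl get constraints -o json, with audit results in status.totalViolations and the mode in spec.enforcementAction (which Gatekeeper treats as deny when unset). Check those field names against your Gatekeeper version; rank_constraints is a hypothetical name.

```python
import json


def rank_constraints(constraints_json: str) -> list[tuple[str, str, int]]:
    """Rank constraints by audited violation count, highest first.

    Assumes the JSON printed by `kubectl get constraints -o json`: an object
    with an "items" list, the mode in spec.enforcementAction (default "deny"),
    and audit results in status.totalViolations.
    """
    rows = []
    for c in json.loads(constraints_json).get("items", []):
        rows.append((
            f"{c['kind']}/{c['metadata']['name']}",
            c.get("spec", {}).get("enforcementAction", "deny"),
            c.get("status", {}).get("totalViolations", 0),
        ))
    return sorted(rows, key=lambda row: row[2], reverse=True)
```

Constraints at the top of the list are the ones to clean up, or to rethink, before switching them to deny.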
Composing Policies with Different Tools
A practical multi-layer stack:
| Layer | Tool | Why |
|---|---|---|
| Pre-merge (Terraform) | conftest in CI | Catch before resources exist |
| K8s admission (block) | Gatekeeper / Kyverno | Stop bad workloads at the gate |
| K8s admission (defaults) | Kyverno mutate | Make right-by-default easy |
| Runtime (cloud config) | Cloud Custodian / AWS Config | Catch drift in cloud resources |
| API authorization | OPA / Cedar / OpenFGA | Per-request access checks |
| Image signature | Cosign + policy | Only signed images run |
Each layer catches what the previous missed. Defense in depth.
Anti-Patterns
Policies as wikis. "Here's a YAML for a NetworkPolicy you should apply." Engineers won't. Make it a Gatekeeper constraint, or a Kyverno generate rule that creates the NetworkPolicy automatically.
Block-only. If every policy is "deny," your engineers experience PaC as friction. Mix in mutation and warnings for soft enforcement.
Untested policies in prod. The first time the rule fires is the first time you find the false positive. Test in CI, run in warn mode in staging, then enforce.
Cross-cutting policies in vendor-specific tools. If you use Sentinel for Terraform and OPA for K8s and a custom CI script for images, the same rule lives in three places. Pick one (usually OPA) and standardize.
Policy author ≠ subject expert. A central security team writing policies for product teams without consulting them produces theoretical rules that don't fit real workflows. Co-write policies with the people who'll comply with them.
What's Next
- Best Practices — lifecycle, performance, debugging, compliance, pitfalls