OPA and Gatekeeper
“https://www.openpolicyagent.org/ | https://open-policy-agent.github.io/gatekeeper/”
OPA (Open Policy Agent) is a policy engine that decouples policy from code. It evaluates policies written in Rego (a DSL) against inputs (typically JSON), and returns decisions. Gatekeeper is the k8s-specific implementation of OPA — it runs as a validating (and optionally mutating) admission webhook, evaluates Rego policies against k8s objects, and rejects (or warns about) non-compliant objects. OPA / Gatekeeper is one of the two major policy engines in the k8s ecosystem (the other being Kyverno).
Table of Contents
- The Policy Engine Concept
- OPA Architecture
- Rego — the Policy Language
- Gatekeeper Components
- The Constraint Template Pattern
- A Working Example
- The Audit Mode
- Mutating with Gatekeeper
- OPA Outside Gatekeeper
- The Rego vs CEL Decision
- Performance and Caching
- Common Policy Patterns
- Operations and Debugging
- Gotchas and Common Mistakes
1. The Policy Engine Concept
A policy engine is a system that takes inputs and returns decisions:
Input (JSON)             Policy (Rego)          Output (Decision)
+-----------------+      +----------------+     +----------------+
| Pod spec        |      | "all images    |     | allowed: true  |
| (kind, name,    |  →   | must come      |  →  | OR             |
| namespace,      |      | from ECR"      |     | allowed: false |
| spec, ...)      |      |                |     |                |
+-----------------+      +----------------+     +----------------+
In k8s, the input is a k8s object (Pod, Deployment, etc.), the policy is “must come from approved registries”, and the output is “this Pod is allowed” or “this Pod is denied”.
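The input → policy → decision flow can be sketched in a few lines of Python. This is only a model of the idea, not how OPA is implemented; the registry prefix and the function names are made up for the illustration.

```python
from typing import Callable, Dict, List

# A "policy" is modeled as a function from an input document to a list of
# denial messages. An empty list means the request is allowed.
Policy = Callable[[Dict], List[str]]

# Hypothetical registry prefix, used only for this illustration.
APPROVED_PREFIX = "123456789.dkr.ecr.us-east-1.amazonaws.com/"


def images_from_ecr(pod: Dict) -> List[str]:
    """Return one message per container whose image is outside the registry."""
    containers = pod.get("spec", {}).get("containers", [])
    return [
        f"image '{c.get('image', '')}' is not from the approved registry"
        for c in containers
        if not c.get("image", "").startswith(APPROVED_PREFIX)
    ]


def decide(policy: Policy, input_doc: Dict) -> Dict:
    """Evaluate the policy against the input and wrap the result as a decision."""
    messages = policy(input_doc)
    return {"allowed": not messages, "messages": messages}
```

The point of the separation is that decide() knows nothing about Pods or registries: swapping the policy function changes the rule without touching the caller.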
1.1 Why decouple policy from code
If policy is in the application code:
- Every change requires a code change.
- The team that owns the app is the only one that can review / change.
- Policy is per-app, not cluster-wide.
If policy is in a policy engine:
- Policy is declarative, not imperative.
- Multiple apps can share the same policy.
- The platform team can own policy without owning the apps.
OPA’s design: policy is written as code (Rego) but kept separate from application code. Rego is the language; OPA is the evaluator. You can ship the same Rego package to multiple enforcement points (k8s admission, API gateway, CI checks, etc.).
2. OPA Architecture
OPA decision:
1. Load Rego policy
2. Load input (JSON)
3. Evaluate
4. Return decision
OPA has no database of its own. It keeps the loaded policies and data in memory and evaluates Rego against the input of each query. A policy sees two documents: input (the JSON sent with the query) and data (anything loaded ahead of time from files, bundles, or the API).
The “decision” is whatever the policy returns. For Gatekeeper, it’s allowed: true | false.
2.1 OPA vs Gatekeeper
- OPA — the engine. Generic. Can be embedded in any system. Has its own HTTP API.
- Gatekeeper — the k8s implementation. Runs as a webhook. Embeds OPA as a library (through the OPA Constraint Framework) instead of calling a separate OPA server.
Gatekeeper is the deployment; OPA is the engine.
2.2 The Conftest tool
Conftest is a CLI that runs OPA against config files. Useful for CI:
# install
brew install conftest
# run against a manifest; by default Conftest loads policies from ./policy/*.rego
conftest test deployment.yaml

Conftest is the “shift-left” version of Gatekeeper. Same Rego, but for static files in CI. By default it evaluates the deny, violation, and warn rules in the main package.
3. Rego — the Policy Language
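Rego is a declarative logic language. It’s not imperative: there are no loops and no reassignable variables. A policy is a set of rules that produce values. The examples in this document use the pre-1.0 rule syntax (deny[msg] { ... }); OPA 1.0 and later expect the if and contains keywords by default (deny contains msg if { ... }).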
3.1 A simple rule
package kubernetes.admission

# Rule: deny if the image doesn't come from the approved registry
deny[msg] {
    input.request.kind.kind == "Pod"
    container := input.request.object.spec.containers[_]
    not startswith(container.image, "gcr.io/my-project/")
    msg := sprintf("image '%v' is not from the approved registry", [container.image])
}

The structure:
- package — the namespace for the rule.
- deny[msg] — a set of denial messages. If the set is non-empty, the request is denied.
- input.request.kind.kind == "Pod" — an expression. If true, the rule continues.
- container := input.request.object.spec.containers[_] — a variable. The _ is a wildcard for array index.
- not startswith(...) — negation. The rule applies if the image is NOT in the approved registry.
- msg := ... — set the message.
The rule produces a denial message for every container with a bad image. If the set is empty, the request is allowed.
3.2 The input format
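The input to OPA is a JSON object. When OPA runs as a plain admission webhook, it’s the AdmissionReview request shown here. Gatekeeper passes the same request content under input.review (without the outer request key) and adds input.parameters from the constraint, which is why the Gatekeeper examples later in this document read input.review.object.
{
  "request": {
    "uid": "...",
    "kind": {"group": "", "version": "v1", "kind": "Pod"},
    "operation": "CREATE",
    "object": {... full Pod ...},
    "userInfo": {...}
  }
}

A Rego rule accesses input.request.object.spec.containers to get the Pod’s containers.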
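A Rego reference such as input.request.object.spec.containers does not raise an error when part of the path is missing; it is simply undefined. The sketch below models that behaviour in Python. The UNDEFINED sentinel and both helper names are inventions for this illustration, not part of OPA.

```python
from typing import Any, List

UNDEFINED = object()  # sentinel standing in for Rego's "undefined"


def lookup(doc: Any, *path: str) -> Any:
    """Follow keys through nested dicts; return UNDEFINED if any key is missing."""
    current = doc
    for key in path:
        if not isinstance(current, dict) or key not in current:
            return UNDEFINED
        current = current[key]
    return current


def container_images(admission_review: dict) -> List[str]:
    """Collect image names, mirroring input.request.object.spec.containers[_].image."""
    containers = lookup(admission_review, "request", "object", "spec", "containers")
    if containers is UNDEFINED:
        return []  # nothing to iterate over, so a rule body would never match
    return [c["image"] for c in containers if "image" in c]
```

A request without a spec yields an empty list here, which corresponds to a rule that silently produces no result.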
3.3 Common Rego patterns
Iterate over a list:
deny[msg] {
    input.request.object.spec.containers[i]
    ...
}

The [i] binds i to each index. You can also use [_] for “any element” (no binding).
Multiple conditions:
deny[msg] {
    input.request.kind.kind == "Pod"
    input.request.object.spec.containers[_]
    # multiple conditions separated by newlines (implicit AND)
}

Check existence:
deny[msg] {
    not input.request.object.metadata.labels.app
    msg := "all Pods must have the 'app' label"
}

Combine with OR:
deny[msg] {
    input.request.object.spec.containers[_].securityContext.privileged == true
    msg := "privileged containers are not allowed"
}

# OR: separate deny rule
deny[msg] {
    input.request.object.spec.hostNetwork == true
    msg := "hostNetwork is not allowed"
}

The deny set is the union of all deny rules. If any rule produces a message, the request is denied.
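The union behaviour can be modeled as a list of rule functions whose results are merged into one set. This is a Python sketch of the semantics only; the function names are hypothetical and the input shape follows the AdmissionReview example from section 3.2.

```python
from typing import Callable, Dict, List, Set


def deny_privileged(review: Dict) -> Set[str]:
    """One message if any container is privileged (a set, so duplicates collapse)."""
    spec = review["request"]["object"].get("spec", {})
    containers = spec.get("containers", [])
    if any(c.get("securityContext", {}).get("privileged") is True for c in containers):
        return {"privileged containers are not allowed"}
    return set()


def deny_host_network(review: Dict) -> Set[str]:
    """One message if the Pod uses the host network."""
    spec = review["request"]["object"].get("spec", {})
    if spec.get("hostNetwork") is True:
        return {"hostNetwork is not allowed"}
    return set()


RULES: List[Callable[[Dict], Set[str]]] = [deny_privileged, deny_host_network]


def deny(review: Dict) -> Set[str]:
    """Union of every rule's messages; the request is denied if it is non-empty."""
    messages: Set[str] = set()
    for rule in RULES:
        messages |= rule(review)
    return messages
```

Because the result is a set, two privileged containers produce a single message, just as two identical msg values collapse in the Rego deny set.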
3.4 The data document
OPA can reference external data through data, a global document that sits alongside input (it is not a built-in function). For example:
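deny[msg] {
    container := input.request.object.spec.containers[_]
    not data.approved_images[container.image]
    msg := sprintf("image %v is not in the approved list", [container.image])
}

The data.approved_images document is loaded from outside the policy: a JSON file, a bundle, or a push through OPA’s API. The lookup above assumes it is an object (or set) keyed by image name.
In Gatekeeper, there are two ways to get extra data into a policy. Cluster objects replicated through the sync configuration are available under data.inventory, and data from outside the cluster comes through the external data feature, where a Provider resource points to an HTTPS service that the policy queries with the external_data built-in.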
4. Gatekeeper Components
Gatekeeper is composed of:
- gatekeeper-controller-manager — the control plane. Watches for ConstraintTemplates and Constraints, configures the webhook, audits existing objects.
- gatekeeper-audit (in v3.7+) — runs periodically to check existing objects against the policies. Reports violations.
- The admission webhook — called by the apiserver for every request, evaluates policies.
- Mutating webhook (in v3.7+) — can mutate objects (e.g. add labels).
- ConstraintTemplate — a CRD that defines a parameterized policy.
- Constraint — an instance of a ConstraintTemplate with specific values.
The architecture:
apiserver
│
│ AdmissionReview
│
gatekeeper-controller-manager (validating webhook)
│
├── load templates, constraints, syncs
│
├── evaluate policy (Rego)
│
└── return allowed / denied
5. The Constraint Template Pattern
A ConstraintTemplate is a CRD that defines a parameterized Rego policy. The “constraint” is an instance of the template with specific values.
5.1 A ConstraintTemplate
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8sapprovedregistry
spec:
  crd:
    spec:
      names: { kind: K8sApprovedRegistry }
      validation:
        openAPIV3Schema:
          type: object
          properties:
            registries:
              type: array
              items: { type: string }
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8sapprovedregistry

        violation[{"msg": msg, "details": {}}] {
          container := input.review.object.spec.containers[_]
          not from_approved_registry(container.image)
          msg := sprintf("image '%v' is not from an approved registry", [container.image])
        }

        # True if the image starts with at least one approved prefix.
        # Keeping the [_] inside a helper makes the negation mean
        # "no prefix matches" rather than "some prefix does not match".
        from_approved_registry(image) {
          startswith(image, input.parameters.registries[_])
        }

The template:
- Defines a CRD K8sApprovedRegistry with a registries parameter.
- The Rego policy uses input.parameters.registries (the values from the constraint).
5.2 A Constraint (instance)
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sApprovedRegistry
metadata: { name: must-come-from-ecr }
spec:
  match:
    kinds: [{ apiGroups: [""], kinds: ["Pod"] }]
    namespaces: ["prod", "staging"] # optional
  parameters:
    registries:
      - "123456789.dkr.ecr.us-east-1.amazonaws.com/"
      - "gcr.io/my-project/"

The constraint:
- Matches Pods in prod and staging.
- The parameters.registries is passed to the Rego as input.parameters.registries.
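Gatekeeper evaluates the template’s Rego once for each constraint that matches the object, with that constraint’s spec.parameters supplied as input.parameters. The template is the reusable logic; each constraint decides where it applies and with which values.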
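One subtle point when a parameter is a list: an image is acceptable if any listed prefix matches, so a violation has to mean "no prefix matches". Testing "some prefix does not match" would flag every image as soon as two registries are listed. The Python sketch below models both readings; the function names are hypothetical and the parameters mirror the constraint above.

```python
from typing import Dict, List


def from_approved_registry(image: str, registries: List[str]) -> bool:
    """True if the image starts with at least one approved prefix."""
    return any(image.startswith(prefix) for prefix in registries)


def violations(pod: Dict, parameters: Dict) -> List[str]:
    """One message per container for which no approved prefix matches."""
    registries = parameters.get("registries", [])
    return [
        f"image '{c['image']}' is not from an approved registry"
        for c in pod.get("spec", {}).get("containers", [])
        if not from_approved_registry(c["image"], registries)
    ]


def some_prefix_differs(image: str, registries: List[str]) -> bool:
    """The incorrect test: true as soon as one prefix fails to match."""
    return any(not image.startswith(prefix) for prefix in registries)
```

With two registries configured, some_prefix_differs() is true even for an image that comes from one of them, which is why the check is written as a negated "any".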
6. A Working Example
A complete policy: “all Pods must have a team label, and the value must be one of frontend, backend, or platform.”
6.1 The template
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata: { name: k8srequiredlabels }
spec:
  crd:
    spec:
      names: { kind: K8sRequiredLabels }
      validation:
        openAPIV3Schema:
          type: object
          properties:
            labels:
              type: array
              items:
                type: object
                properties:
                  key: { type: string }
                  allowedRegex: { type: string }
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequiredlabels

        violation[{"msg": msg, "details": {}}] {
          # labels present on the object vs. labels required by the constraint
          provided := {label | input.review.object.metadata.labels[label]}
          required := {label | label := input.parameters.labels[_].key}
          missing := required - provided
          count(missing) > 0
          msg := sprintf("missing labels: %v", [missing])
        }

6.2 The constraint
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata: { name: must-have-team-label }
spec:
  match:
    kinds: [{ apiGroups: [""], kinds: ["Pod"] }]
  parameters:
    labels:
      - key: team
        allowedRegex: "^(frontend|backend|platform)$"

The template in 6.1 only checks that the label is present; its Rego never reads allowedRegex. To enforce the allowed values, add a rule that checks the team label directly:
# (refined rego for the team label)
violation[{"msg": msg, "details": {}}] {
    value := input.review.object.metadata.labels.team
    not valid_team(value)
    msg := sprintf("invalid team label: '%v'", [value])
}

valid_team(team) {
    team == "frontend"
}

valid_team(team) {
    team == "backend"
}

valid_team(team) {
    team == "platform"
}

This is the “deny if the team label is not one of the allowed values” pattern. If the label is missing, value is undefined and this rule produces nothing, so the missing-label rule from the template is still needed alongside it. The Rego is the policy; the constraint is the instance.
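Both checks (label present, value allowed) can also be driven by the key / allowedRegex parameters from the constraint. The following Python sketch models that combined logic; it is an illustration of the intended behaviour under that assumption, not the Rego that Gatekeeper would run, and label_violations is a hypothetical name.

```python
import re
from typing import Dict, List


def label_violations(labels: Dict[str, str], required: List[Dict[str, str]]) -> List[str]:
    """Check each required label for presence and, if given, against allowedRegex."""
    messages = []
    for rule in required:
        key = rule["key"]
        if key not in labels:
            messages.append(f"missing label: {key}")
            continue
        pattern = rule.get("allowedRegex")
        # An entry without allowedRegex only requires the label to exist.
        if pattern and not re.search(pattern, labels[key]):
            messages.append(f"invalid {key} label: '{labels[key]}'")
    return messages
```

Keeping the allowed values in the constraint means a new team name is a parameter change, not a template change.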
7. The Audit Mode
Audit mode runs policies against existing objects in the cluster, not just at admission time.
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sApprovedRegistry
metadata: { name: must-come-from-ecr }
spec:
  enforcementAction: dryrun # "warn" / "dryrun" / "deny"
  match: {...}
  parameters: {...}

The enforcementAction field:
- deny (default) — reject non-compliant objects at admission.
- dryrun — allow, but record a violation in the audit log.
- warn — allow, but warn the user via the admission response.
dryrun is the standard “I’m rolling out a new policy, let me see what’s already broken” mode.
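How the three actions change the admission outcome for the same set of violations can be modeled as follows. This is a simplified Python sketch; the response dictionary is a made-up shape for illustration, not the real AdmissionReview response.

```python
from typing import Dict, List


def admission_response(violations: List[str], enforcement_action: str = "deny") -> Dict:
    """Map violations plus the constraint's enforcementAction to an outcome."""
    if enforcement_action not in {"deny", "warn", "dryrun"}:
        raise ValueError(f"unknown enforcementAction: {enforcement_action}")
    if not violations:
        return {"allowed": True, "message": "", "warnings": []}
    if enforcement_action == "deny":
        # The request is rejected and the user sees the violation messages.
        return {"allowed": False, "message": "; ".join(violations), "warnings": []}
    if enforcement_action == "warn":
        # The request goes through, but the messages are returned as warnings.
        return {"allowed": True, "message": "", "warnings": list(violations)}
    # dryrun: the request goes through silently; the violations only show up
    # in audit results and logs.
    return {"allowed": True, "message": "", "warnings": []}
```

Only deny changes whether the object is stored; warn and dryrun differ in whether the user is told at request time.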
8. Mutating with Gatekeeper
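Gatekeeper 3.7+ supports mutating admission. Mutations are not written in Rego: they are declared with dedicated mutator CRDs in the mutations.gatekeeper.sh group (AssignMetadata, Assign, ModifySet, AssignImage), each of which names a location in the object and the value to set there. A mutator that adds a default app label (the value is a placeholder constant):
apiVersion: mutations.gatekeeper.sh/v1
kind: AssignMetadata
metadata:
  name: default-app-label
spec:
  match:
    scope: Namespaced
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
  location: "metadata.labels.app"
  parameters:
    assign:
      value: "unassigned" # placeholder default

Gatekeeper turns the mutator into a patch that is applied to the object before it’s stored. AssignMetadata only adds a label or annotation that is missing; it does not overwrite an existing value. This is the “inject default labels” pattern.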
Mutating webhooks are riskier than validating. A bug in the mutator can corrupt objects. Use sparingly.
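One concrete way a patch can go wrong: a mutating admission webhook answers the apiserver with a JSON Patch (RFC 6902), and an add operation on /metadata/labels/app fails when the object has no labels map at all. The Python sketch below shows a patch builder that handles that case; default_label_patch is a hypothetical helper, and it assumes the object already has a metadata section.

```python
from typing import Dict, List


def default_label_patch(obj: Dict, key: str, value: str) -> List[Dict]:
    """Build a JSON Patch that adds a label only if it is not already set."""
    labels = obj.get("metadata", {}).get("labels")
    if labels is None:
        # "add" on /metadata/labels/<key> fails when the parent map is
        # missing, so create the whole labels map instead.
        return [{"op": "add", "path": "/metadata/labels", "value": {key: value}}]
    if key in labels:
        return []  # never overwrite a label the user set
    # JSON Pointer escaping (RFC 6901): "~" becomes "~0", "/" becomes "~1".
    escaped = key.replace("~", "~0").replace("/", "~1")
    return [{"op": "add", "path": f"/metadata/labels/{escaped}", "value": value}]
```

Label keys such as app.kubernetes.io/name contain a slash, so the escaping step matters in practice.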
9. OPA Outside Gatekeeper
OPA is generic. It can be used for:
- API gateway authorization — OPA at the gateway, evaluates “can this user call this endpoint”.
- Microservice authorization — the app calls OPA at request time: “can this user access this resource?”
- Terraform validation — Conftest (OPA) in CI, runs Rego against .tf files.
- SSH / sudo authorization — OPA at the authn layer.
- Kafka authorization — OPA at the broker, evaluates “can this client read this topic?”
For k8s, Gatekeeper is the deployment. For everything else, OPA is the engine.
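For the microservice case, the application typically asks a running OPA for a decision over its REST Data API: a POST to /v1/data/ followed by the rule path, with the query wrapped in an "input" field. The sketch below builds and sends such a request with the Python standard library. The base URL, the rule path httpapi/authz/allow, and the input fields are assumptions chosen for the example.

```python
import json
import urllib.request
from typing import Dict


def build_opa_request(base_url: str, rule_path: str, input_doc: Dict) -> urllib.request.Request:
    """Build a POST to OPA's Data API for the given rule path."""
    url = f"{base_url.rstrip('/')}/v1/data/{rule_path.strip('/')}"
    body = json.dumps({"input": input_doc}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def is_allowed(base_url: str, rule_path: str, input_doc: Dict, timeout: float = 2.0) -> bool:
    """Ask OPA for a boolean decision; anything other than true is a deny."""
    request = build_opa_request(base_url, rule_path, input_doc)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    # OPA omits "result" when the rule is undefined; treat that as a deny.
    return payload.get("result") is True
```

A call would look like is_allowed("http://localhost:8181", "httpapi/authz/allow", {"user": "alice", "method": "GET", "path": "/orders"}), assuming an OPA server on its default port with a matching policy loaded.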
10. The Rego vs CEL Decision
CEL (Common Expression Language) is the alternative to Rego, supported natively by k8s (no OPA needed).
| | Rego | CEL |
|---|---|---|
| Used in | OPA / Gatekeeper | k8s native (1.30+), Kyverno |
| Standard | OPA’s DSL | Google’s CEL (used in CEL-Go) |
| Engine | OPA (separate binary) | Built into the apiserver |
| Performance | Slower (separate process) | Faster (in-process) |
| Expressiveness | High (logic programming) | Medium (expression language) |
| Familiarity | Rego-specific | More familiar to most developers |
The decision:
- Use Rego if you want to share policies across systems (k8s, API gateway, CI, etc.) — OPA is the lingua franca.
- Use CEL if you’re k8s-only and want the simplest deployment — no extra pods.
- Use Kyverno if you want k8s-native (no separate language) and YAML-style policies.
11. Performance and Caching
Gatekeeper is on the admission hot path. A slow Gatekeeper slows down all admission.
11.1 The cache
Gatekeeper keeps the compiled templates, the constraints, and any synced cluster data in memory, so evaluating an admission request does not require recompiling Rego or querying the apiserver. This in-memory state is refreshed when constraints, templates, or synced objects change.
11.2 The metrics
# check the Gatekeeper pod's metrics
kubectl -n gatekeeper-system port-forward <gatekeeper-pod> 8888:8888
curl localhost:8888/metrics

Key metrics (names have changed between releases, so check the /metrics output of your version):
- gatekeeper_validation_request_count — total validation (admission) requests.
- gatekeeper_validation_request_duration_seconds — response time of the validation webhook.
- gatekeeper_violations — number of violations found by the last audit run.
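A response-time histogram is easier to read as an average. The sketch below computes the mean from the _sum and _count series in the Prometheus text output. The metric name is passed in as a parameter because it depends on the Gatekeeper version, and the parser assumes plain "name{labels} value" lines without trailing timestamps.

```python
from typing import Optional


def histogram_mean(metrics_text: str, metric: str) -> Optional[float]:
    """Mean observation of a Prometheus histogram, summed over all label sets."""
    total = 0.0
    count = 0.0
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name_and_labels, _, value = line.rpartition(" ")
        name = name_and_labels.split("{", 1)[0]
        if name == f"{metric}_sum":
            total += float(value)
        elif name == f"{metric}_count":
            count += float(value)
    return total / count if count else None
```

Feed it the output of the curl command above, for example histogram_mean(text, "gatekeeper_validation_request_duration_seconds") if that is the name your version exposes.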
11.3 The slow policy
A policy that’s slow:
- Has a large iteration (e.g. over all containers, all env vars, all volumes).
- Calls out to external data.
- Has expensive string operations.
To speed up:
- Limit the constraint’s match to specific resources / namespaces.
- Use the match to skip irrelevant requests.
- Avoid external data calls.
12. Common Policy Patterns
12.1 “All images must come from a specific registry”
violation[{"msg": msg}] {
    container := input.review.object.spec.containers[_]
    not from_approved_registry(container.image)
    msg := sprintf("image '%v' is not from an approved registry", [container.image])
}

# helper: true if at least one approved prefix matches the image
from_approved_registry(image) {
    startswith(image, input.parameters.registries[_])
}

12.2 “All Pods must have resource requests and limits”
violation[{"msg": msg}] {
    container := input.review.object.spec.containers[_]
    not container.resources.requests
    msg := sprintf("container '%v' has no resource requests", [container.name])
}

violation[{"msg": msg}] {
    container := input.review.object.spec.containers[_]
    not container.resources.limits
    msg := sprintf("container '%v' has no resource limits", [container.name])
}

12.3 “No privileged containers”
violation[{"msg": msg}] {
    container := input.review.object.spec.containers[_]
    container.securityContext.privileged == true
    msg := sprintf("container '%v' is privileged", [container.name])
}

12.4 “No host namespaces”
violation[{"msg": msg}] {
    input.review.object.spec.hostNetwork == true
    msg := "hostNetwork is not allowed"
}

violation[{"msg": msg}] {
    input.review.object.spec.hostPID == true
    msg := "hostPID is not allowed"
}

violation[{"msg": msg}] {
    input.review.object.spec.hostIPC == true
    msg := "hostIPC is not allowed"
}

12.5 “All Deployments must have at least 3 replicas”
violation[{"msg": msg}] {
    input.review.object.kind == "Deployment"
    input.review.object.spec.replicas < 3
    msg := "Deployments must have at least 3 replicas"
}

(Note: this needs to be on a Deployment object, not a Pod. The match’s kinds would be apps/Deployment.)
13. Operations and Debugging
13.1 Common commands
# check the Gatekeeper pods
kubectl -n gatekeeper-system get pods
# list templates
kubectl get constrainttemplates
# list constraints
kubectl get constraints
# or by kind
kubectl get k8sapprovedregistry
# describe a constraint
kubectl describe k8sapprovedregistry <name>
# check the audit status
kubectl get k8sapprovedregistry <name> -o jsonpath='{.status.violations}'
# shows existing violations

13.2 The “policy is rejecting everything” case
# 1. Find the violating object
kubectl get events --field-selector reason=FailedCreate -A
# look for "admission webhook denied the request"
# 2. Check which policy denied it
# the event message usually says "Denied by K8sApprovedRegistry" or similar
# 3. See the policy
kubectl get k8sapprovedregistry <name> -o yaml
# look at the parameters
# 4. See the violation
kubectl get k8sapprovedregistry <name> -o jsonpath='{.status.violations}'
# the violation object has the offending object name

13.3 The “Gatekeeper is slow” case
# 1. Check the response time metric (the exact name depends on the Gatekeeper version)
kubectl -n gatekeeper-system port-forward <gatekeeper-pod> 8888:8888
curl -s localhost:8888/metrics | grep duration_seconds
# 2. Check the policy for expensive operations
# large iterations, external data calls
# 3. Narrow the constraint's match (kinds, namespaces, namespaceSelector, labelSelector)
# the constraint's match should be as narrow as possible

14. Gotchas and Common Mistakes
14.1 The 20+ common mistakes
- Rego is not a general-purpose language. It’s a DSL for policy. Don’t try to do everything in it.
- A policy that catches “all” resources is dangerous. Gatekeeper runs on every admission. A bad policy can block all Pods.
- Use dryrun first. Don’t go straight to deny. Run in audit mode, see the violations, then promote to deny.
- The match is critical. A constraint without a namespace match applies cluster-wide. A constraint with a wrong resource match applies to the wrong objects.
- The admissionReviewVersions on the Gatekeeper’s webhook config is ["v1"]. If you’re on an older k8s, it may be different.
- A mutating webhook is harder to debug than a validating one. The patches must be correct. A bad patch is rejected by the apiserver.
- Rego’s [_] is “any element”. A rule with multiple [_] is “any combination”, because each [_] iterates independently. Be careful: the rule may match more than you think.
- Anything a policy reads from data has to be loaded first (synced cluster objects or an external data provider). Setting that up is non-trivial. For most policies, the input is enough.
- A ConstraintTemplate’s Rego must be valid Rego. A syntax error is easy to miss: Gatekeeper reports it in the template’s status and in its logs, but the policy is not active.
- Constraint parameters are validated against the CRD’s OpenAPI schema. A bad parameter is rejected by the apiserver.
- The audit runs periodically, not on every object change. The default interval is 60s.
- The enforcementAction field is on the Constraint, not the Template. The same Template can have Constraints with different enforcement actions.
- A dryrun Constraint still produces audit records. Look at the status.violations to see what would be denied.
- A warn Constraint produces a warning in the kubectl output. The user sees a “Warning:” line with the violation message, and the request still goes through.
- Gatekeeper’s mutation feature has been stable since v3.10. Earlier releases shipped it as alpha or beta, where the API could still change.
- A policy that depends on the default value of an unset field may not work. A reference to an unset field is undefined, and an undefined expression makes the whole rule body fail to match instead of evaluating to false. Also, not field is true both when the field is unset and when it is false.
- The input.review.object is the full k8s object (Pod, Deployment, etc.). The structure depends on the kind.
- A policy that requires the object to be in a specific state (e.g. “Pod has a running status”) doesn’t work at admission time. The Pod is being created; the status is not yet set.
- Rego’s count() returns the size of a collection (set, array, object) or string. Use count(...) > 0 to check non-emptiness.
- The sprintf function is for string formatting. Use it to build dynamic messages.
- A ConstraintTemplate’s CRD is auto-generated. The generated CRD is named after the template’s metadata.name (which must be the lowercase form of the kind), and its kind is crd.spec.names.kind.
- A Constraint must reference a registered ConstraintTemplate. If the template doesn’t exist, the constraint’s kind is unknown to the apiserver and the constraint is rejected.
- The audit results are in status.violations of the Constraint. Not in events.
- Gatekeeper is a single point of failure for admission. With failurePolicy: Ignore (the safe default for availability), requests are admitted unchecked while Gatekeeper is down; failurePolicy: Fail rejects them instead.
- The rego field in a ConstraintTemplate is a string. Multi-line strings use YAML’s | (literal) or > (folded); use | for Rego so that line breaks are preserved.
- A rule whose condition refers to something undefined simply does not match. The rule produces no violations, so the request is allowed. Be explicit about the fields a rule depends on.
- A policy that uses the current time (time.now_ns()) is not idempotent. The same request evaluated twice may produce different results.
- A policy that uses random values (rand.intn) is not idempotent. Don’t.
- The match selector supports excludedNamespaces and labelSelector. Use them to scope the policy precisely.
- There is no separate ClusterConstraint kind. Every Constraint is a cluster-scoped resource; it is limited to particular namespaces only through match (namespaces, excludedNamespaces, namespaceSelector).
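The “any combination” behaviour of multiple [_] wildcards is the gotcha that most often surprises people, so here is a small Python model of it. The container fields are simplified for the illustration (a top-level privileged flag instead of securityContext.privileged), and both function names are hypothetical.

```python
from typing import Dict, List


def independent_wildcards(containers: List[Dict]) -> bool:
    """Two separate [_]: each condition may be satisfied by a different element."""
    return any(c.get("name") == "web" for c in containers) and any(
        c.get("privileged") is True for c in containers
    )


def shared_binding(containers: List[Dict]) -> bool:
    """One binding (c := containers[_]): both conditions apply to the same element."""
    return any(
        c.get("name") == "web" and c.get("privileged") is True for c in containers
    )
```

For a Pod with an unprivileged web container and a privileged debug container, the first function matches and the second does not. In Rego, the second form corresponds to binding the element once (container := ...containers[_]) and testing both conditions on container.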
See also
- Kyverno — the alternative to OPA / Gatekeeper
- Admission Controllers — how Gatekeeper fits in
- PSS — the built-in alternative for basic checks
- Image Hardening — one of the most common policy targets