
Policy as Code: Enforcing Governance

Key Takeaways
  • Automated Governance: Policy as Code replaces manual reviews and "best practices" documents with automated rules enforced at the API level. Non-compliant resources are blocked before they ever reach etcd, preventing misconfigurations from reaching production.
  • Admission Control: Policies are implemented as Kubernetes admission webhooks. When a user runs kubectl apply, the request is intercepted by the policy engine, which validates (or mutates) it before the resource is persisted.
  • Kyverno: A Kubernetes-native policy engine where policies are written entirely in YAML. Supports validation, mutation, generation, image verification, and cleanup policies. Lower learning curve for Kubernetes engineers.
  • OPA/Gatekeeper: Uses the Rego query language for policy logic, which is more powerful for complex cross-resource and cross-domain constraints. Backed by the CNCF and widely adopted in enterprises.
  • Common Policy Categories: Require labels, block latest tag, enforce resource limits, restrict container registries, require non-root containers, enforce network policies, and verify image signatures.
  • Audit Mode: Both engines support running policies in audit/warn mode to identify violations in existing resources without blocking new deployments, enabling safe rollout.

In a large organization, you cannot rely on humans consistently following "best practices" documentation. Configuration standards drift within weeks. Developers forget to set resource limits, use the latest image tag in production, or deploy containers running as root. Manual code reviews catch some issues, but they do not scale and they miss things.

Policy as Code solves this by encoding governance rules as machine-enforceable policies. These policies run as Kubernetes admission controllers — they intercept every kubectl apply, helm install, and CI/CD deployment, and either block non-compliant resources or automatically fix them.

1. How Admission Controllers Work

When you create or update a resource in Kubernetes, the request passes through these stages in the API server before it is stored:

  1. Authentication: Who is making the request?
  2. Authorization (RBAC): Are they allowed to?
  3. Mutating Admission: Modify the request (add defaults, inject sidecars).
  4. Validating Admission: Accept or reject the request.
  5. Persistence: Save to etcd.

Policy engines (Kyverno, Gatekeeper) register as webhook admission controllers in steps 3 and 4. They receive the create and update requests that their webhook rules match and can either modify the resource (mutation) or reject it (validation).
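
To make the registration step concrete, here is a minimal sketch of a ValidatingWebhookConfiguration, the object the API server consults in step 4. Kyverno and Gatekeeper create and manage their own webhook configurations, so you would not normally write this by hand. The webhook name, Service name, namespace, and path below are hypothetical.

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: example-policy-webhook            # hypothetical name
webhooks:
  - name: validate.policy.example.com     # hypothetical; must be a fully qualified name
    admissionReviewVersions: ["v1"]
    sideEffects: None
    failurePolicy: Fail                    # reject requests if the webhook cannot be reached
    timeoutSeconds: 10
    clientConfig:
      service:                             # hypothetical Service fronting the policy engine
        name: policy-engine
        namespace: policy-system
        path: /validate
      # caBundle is normally injected by the engine or a certificate manager
    rules:                                 # which requests are sent to the webhook
      - apiGroups: ["", "apps"]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["pods", "deployments", "statefulsets"]
```

Requests that do not match the rules list never reach the policy engine, which is why the engines keep this object in sync with the policies you install.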

2. Kyverno: Kubernetes-Native Policies

Kyverno is designed specifically for Kubernetes. Policies are written in YAML using Kubernetes-style resource definitions, making them immediately familiar to anyone who works with Kubernetes manifests.

Validation: Require Labels

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-team-label
  annotations:
    policies.kyverno.io/title: Require Team Label
    policies.kyverno.io/description: >-
      All Deployments and StatefulSets must have a 'team' label
      for ownership tracking and incident response.
    policies.kyverno.io/severity: medium
spec:
  validationFailureAction: Enforce  # block non-compliant resources
  background: true                  # also scan existing resources
  rules:
    - name: check-team-label
      match:
        any:
          - resources:
              kinds:
                - Deployment
                - StatefulSet
      validate:
        message: >-
          The label 'team' is required on all Deployments and StatefulSets.
          Add metadata.labels.team to your resource.
        pattern:
          metadata:
            labels:
              team: "?*"  # must be non-empty
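
For reference, a Deployment that should pass this rule might look like the sketch below. The workload name, label value, and image are hypothetical. Note that the pattern checks the labels on the Deployment itself, not the labels on the pod template.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout-api            # hypothetical workload
  labels:
    team: payments              # any non-empty value satisfies the "?*" pattern
spec:
  replicas: 2
  selector:
    matchLabels:
      app: checkout-api
  template:
    metadata:
      labels:
        app: checkout-api       # pod labels are not inspected by this rule
    spec:
      containers:
        - name: app
          image: registry.example.com/checkout-api:1.4.2
```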

Validation: Block Latest Tag

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-latest-tag
spec:
  validationFailureAction: Enforce
  rules:
    - name: validate-image-tag
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Using the ':latest' tag is not allowed. Specify an explicit tag or digest."
        pattern:
          spec:
            containers:
              - image: "*:* & !*:latest"  # require a tag or digest, and reject :latest
            =(initContainers):  # also check init containers if present
              - image: "*:* & !*:latest"
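
As an illustration of the intent stated in the comments (reject :latest and untagged images), the sketch below lists three image references and the outcome each should produce. The Pod and image names are hypothetical, and because one container failing is enough to reject the whole Pod, this manifest as a whole would be denied.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: tag-examples                               # hypothetical
spec:
  containers:
    - name: pinned
      image: registry.example.com/app:1.4.2        # explicit tag: allowed
    - name: floating
      image: registry.example.com/app:latest       # :latest: rejected
    - name: untagged
      image: registry.example.com/app              # no tag (implies :latest): rejected
```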

Validation: Enforce Resource Limits

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-resource-limits
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-limits
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "All containers must define CPU and memory requests and a memory limit."
        pattern:
          spec:
            containers:
              - resources:
                  requests:
                    memory: "?*"
                    cpu: "?*"
                  limits:
                    memory: "?*"
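
A Pod that satisfies the pattern above could look like this sketch. The name, image, and quantities are hypothetical. A CPU limit is included for completeness, although the pattern does not check for one.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sized-app                       # hypothetical
spec:
  containers:
    - name: app
      image: registry.example.com/app:1.4.2
      resources:
        requests:
          cpu: 100m                     # required by the pattern
          memory: 128Mi                 # required by the pattern
        limits:
          memory: 256Mi                 # required by the pattern
          cpu: 500m                     # optional: not checked by the pattern
```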

Validation: Restrict Container Registries

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: restrict-image-registries
spec:
  validationFailureAction: Enforce
  rules:
    - name: validate-registries
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: >-
          Images must come from approved registries:
          registry.example.com or ghcr.io/myorg.
        pattern:
          spec:
            containers:
              - image: "registry.example.com/* | ghcr.io/myorg/*"
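
The pattern above only inspects regular containers. A possible extension, sketched here as a replacement for the pattern block, reuses the =() anchor from the latest-tag policy so that init and ephemeral containers are checked whenever they are present.

```yaml
pattern:
  spec:
    containers:
      - image: "registry.example.com/* | ghcr.io/myorg/*"
    =(initContainers):          # checked only if the Pod defines init containers
      - image: "registry.example.com/* | ghcr.io/myorg/*"
    =(ephemeralContainers):     # checked only if ephemeral (debug) containers are present
      - image: "registry.example.com/* | ghcr.io/myorg/*"
```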

Mutation: Inject Default Labels

Mutation policies automatically modify resources to conform to standards. This is useful for adding defaults that developers commonly forget:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-default-labels
spec:
  rules:
    - name: add-managed-by
      match:
        any:
          - resources:
              kinds:
                - Deployment
                - StatefulSet
                - DaemonSet
      mutate:
        patchStrategicMerge:
          metadata:
            labels:
              +(app.kubernetes.io/managed-by): "kyverno-defaulted"
              +(environment): "production"
    - name: add-default-security-context
      match:
        any:
          - resources:
              kinds:
                - Pod
      mutate:
        patchStrategicMerge:
          spec:
            containers:
              - (name): "*"
                +(securityContext):
                  +(allowPrivilegeEscalation): false
                  +(readOnlyRootFilesystem): true

The +() syntax means "add if not present" — it does not overwrite values that the developer explicitly set.
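
A before-and-after sketch of the add-default-security-context rule, showing only the relevant fields. The Pod name and image are hypothetical, and a real persisted object would carry many additional defaulted fields.

```yaml
# Submitted by the developer: the container has no securityContext
apiVersion: v1
kind: Pod
metadata:
  name: demo                    # hypothetical
spec:
  containers:
    - name: app
      image: registry.example.com/demo:1.0.0
---
# Expected result after mutation: the defaults have been added
apiVersion: v1
kind: Pod
metadata:
  name: demo
spec:
  containers:
    - name: app
      image: registry.example.com/demo:1.0.0
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
```

Because securityContext itself carries the +() anchor in this policy, a container that already defines any securityContext should be left untouched, even if it omits these two fields. Pair the mutation with a validation rule if those fields must always be set.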

Generation: Auto-Create Resources

Generate policies create companion resources automatically. For example, creating a default NetworkPolicy whenever a new namespace is created:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: generate-default-networkpolicy
spec:
  rules:
    - name: default-deny-ingress
      match:
        any:
          - resources:
              kinds:
                - Namespace
      exclude:
        any:
          - resources:
              namespaces:
                - kube-system
                - kube-public
      generate:
        synchronize: true  # keep in sync if the policy changes
        apiVersion: networking.k8s.io/v1
        kind: NetworkPolicy
        name: default-deny-ingress
        namespace: "{{request.object.metadata.name}}"
        data:
          spec:
            podSelector: {}
            policyTypes:
              - Ingress  # deny all ingress by default
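
For a newly created namespace called team-a (a hypothetical name), the rule above should produce a NetworkPolicy equivalent to the following. Labels that Kyverno adds to track generated resources are omitted.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: team-a             # substituted from request.object.metadata.name
spec:
  podSelector: {}               # empty selector: applies to every pod in the namespace
  policyTypes:
    - Ingress                   # no ingress rules listed, so all ingress is denied
```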

Image Verification: Cosign Signatures

Kyverno can verify that container images are signed with Cosign before allowing them to run:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: verify-image-signatures
spec:
  validationFailureAction: Enforce
  webhookTimeoutSeconds: 30
  rules:
    - name: verify-cosign-signature
      match:
        any:
          - resources:
              kinds:
                - Pod
      verifyImages:
        - imageReferences:
            - "registry.example.com/*"
          attestors:
            - entries:
                - keys:
                    publicKeys: |-
                      -----BEGIN PUBLIC KEY-----
                      MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAE...
                      -----END PUBLIC KEY-----
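
If images are signed with Cosign's keyless flow rather than a static key pair, the attestor entry can match on the signer identity instead. The fragment below is a sketch of the verifyImages section only. The subject pattern and organization are hypothetical, and the exact fields available may differ between Kyverno versions, so check the documentation for the release you run.

```yaml
verifyImages:
  - imageReferences:
      - "registry.example.com/*"
    attestors:
      - entries:
          - keyless:
              # hypothetical identity: workflows in the myorg GitHub organization
              subject: "https://github.com/myorg/*"
              issuer: "https://token.actions.githubusercontent.com"
              rekor:
                url: https://rekor.sigstore.dev
```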

3. OPA Gatekeeper: Rego-Based Policies

OPA (Open Policy Agent) Gatekeeper uses the Rego language for policy logic. Rego is more powerful than YAML patterns for complex constraints that require data aggregation, cross-resource references, or conditional logic.

Architecture

Gatekeeper uses two custom resources:

  • ConstraintTemplate: Defines the policy logic in Rego and the parameters it accepts.
  • Constraint: An instance of a template with specific parameter values.

ConstraintTemplate: Require Labels

apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8srequiredlabels
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredLabels
      validation:
        openAPIV3Schema:
          type: object
          properties:
            labels:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequiredlabels

        violation[{"msg": msg}] {
          # Get the list of required labels from parameters
          required := input.parameters.labels[_]
          # Check if the label exists on the resource
          not input.review.object.metadata.labels[required]
          msg := sprintf("Resource is missing required label: %v", [required])
        }

Constraint: Apply the Template

apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: deployments-must-have-team
spec:
  enforcementAction: deny  # or "dryrun" for audit mode
  match:
    kinds:
      - apiGroups: ["apps"]
        kinds: ["Deployment", "StatefulSet"]
    excludedNamespaces:
      - kube-system
      - gatekeeper-system
  parameters:
    labels:
      - "team"
      - "cost-center"
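
To see the template and constraint working together, consider the Deployment sketched below (name, labels, and image are hypothetical). It carries the team label but not cost-center, so the Rego rule should produce one violation with the message "Resource is missing required label: cost-center" and the request should be denied.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: reporting-worker        # hypothetical
  labels:
    team: analytics             # present
    # cost-center is missing, which triggers the violation
spec:
  replicas: 1
  selector:
    matchLabels:
      app: reporting-worker
  template:
    metadata:
      labels:
        app: reporting-worker
    spec:
      containers:
        - name: worker
          image: registry.example.com/reporting-worker:2.0.1
```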

ConstraintTemplate: Block Privileged Containers

apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8sblockprivileged
spec:
  crd:
    spec:
      names:
        kind: K8sBlockPrivileged
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8sblockprivileged

        violation[{"msg": msg}] {
          container := input.review.object.spec.containers[_]
          container.securityContext.privileged == true
          msg := sprintf("Privileged container '%v' is not allowed", [container.name])
        }

        violation[{"msg": msg}] {
          container := input.review.object.spec.initContainers[_]
          container.securityContext.privileged == true
          msg := sprintf("Privileged init container '%v' is not allowed", [container.name])
        }
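
A template has no effect until a Constraint instantiates it. The sketch below applies K8sBlockPrivileged to Pods. The constraint name is hypothetical, and it starts in dryrun so that existing violations are reported before anything is blocked.

```yaml
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sBlockPrivileged
metadata:
  name: block-privileged-containers     # hypothetical name
spec:
  enforcementAction: dryrun             # switch to "deny" once violations are resolved
  match:
    kinds:
      - apiGroups: [""]                 # core API group, where Pod lives
        kinds: ["Pod"]
    excludedNamespaces:
      - kube-system
      - gatekeeper-system
```

This template takes no parameters, so the Constraint needs only the match section.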

4. Kyverno vs. OPA Gatekeeper

Aspect                   | Kyverno                       | OPA Gatekeeper
-------------------------|-------------------------------|------------------------------------------------
Policy language          | YAML (Kubernetes-native)      | Rego (purpose-built policy language)
Learning curve           | Low for K8s engineers         | Moderate-to-high (Rego syntax)
Validation               | Yes                           | Yes
Mutation                 | Yes (native)                  | Yes (Assign, AssignMetadata, ModifySet mutators)
Generation               | Yes (create resources)        | No
Image verification       | Yes (Cosign, Notary)          | Via external tools
Audit existing resources | Yes (background: true)        | Yes (audit mode)
Cross-resource logic     | Limited                       | Powerful (Rego data aggregation)
CNCF status              | CNCF Incubating               | CNCF Graduated (OPA)
Community                | Growing rapidly               | Large, mature ecosystem
Testing                  | kyverno test CLI              | opa test framework
Best for                 | Teams wanting YAML simplicity | Teams needing complex logic or already using OPA

Recommendation: Start with Kyverno if your policies are straightforward Kubernetes guardrails (label enforcement, image restrictions, resource limits). Choose Gatekeeper if you need complex conditional logic, cross-resource validation, or already use OPA elsewhere in your infrastructure.

5. Policy Testing

Kyverno Testing

# kyverno-test.yaml: Kyverno test case (the file name the CLI looks for by default in recent releases)
apiVersion: cli.kyverno.io/v1alpha1
kind: Test
metadata:
  name: test-require-labels
policies:
  - require-team-label.yaml
resources:
  - deployment-with-label.yaml
  - deployment-without-label.yaml
results:
  - policy: require-team-label
    rule: check-team-label
    kind: Deployment
    resources:
      - deployment-with-label
    result: pass
  - policy: require-team-label
    rule: check-team-label
    kind: Deployment
    resources:
      - deployment-without-label
    result: fail

# Run tests
kyverno test .
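
The test case refers to two fixture files that are not shown. A minimal sketch of what they might contain follows, with hypothetical images and label values. The names under results refer to each resource's metadata.name, not to the file name.

```yaml
# deployment-with-label.yaml (expected result: pass)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deployment-with-label
  labels:
    team: payments
spec:
  selector:
    matchLabels:
      app: demo
  template:
    metadata:
      labels:
        app: demo
    spec:
      containers:
        - name: app
          image: registry.example.com/demo:1.0.0
---
# deployment-without-label.yaml (expected result: fail)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deployment-without-label
spec:
  selector:
    matchLabels:
      app: demo
  template:
    metadata:
      labels:
        app: demo
    spec:
      containers:
        - name: app
          image: registry.example.com/demo:1.0.0
```

The two documents are shown together here for brevity. In the layout the test case expects, each would live in its own file.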

Gatekeeper Testing

# Test Rego policies with OPA CLI
opa test ./policies/ -v

Both tools integrate into CI/CD pipelines so policies can be tested before deployment, the same way you test application code.

6. Audit Mode and Gradual Rollout

Rolling out policies that block resources in production is risky. Both engines support audit/dry-run modes:

Kyverno: Set validationFailureAction: Audit to log violations without blocking. Use kubectl get policyreport -A to view violations in existing resources.

Gatekeeper: Set enforcementAction: dryrun on the Constraint. Use kubectl get constraints to see violation counts.

A typical rollout sequence:

  1. Deploy the policy in audit mode.
  2. Wait 1-2 weeks. Review violations in policy reports.
  3. Fix or exempt the legitimate violations.
  4. Switch to enforce mode.
  5. Monitor for unexpected rejections.
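
In both engines the move from step 1 to step 4 is a one-field change. The fragments below show only the relevant fields, with the rest of each resource omitted.

```yaml
# Kyverno ClusterPolicy (fragment)
spec:
  validationFailureAction: Audit    # step 1; change to Enforce in step 4
  background: true                  # report on resources that already exist
---
# Gatekeeper Constraint (fragment)
spec:
  enforcementAction: dryrun         # step 1; change to deny in step 4
```

Keeping this change in Git gives you a reviewable record of when each policy started blocking requests.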

Common Pitfalls

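  1. Deploying in enforce mode immediately: Enforce mode does not stop running pods, but it rejects new creates and updates, so rollouts, restarts, and scale-ups of workloads that already violate the policy start failing. Always start with audit/warn.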
  2. Not excluding system namespaces: Blocking kube-system, monitoring, or the policy engine's own namespace will cause cluster instability. Always exclude critical namespaces.
  3. Overly broad policies: A policy that requires labels on all resources (including Secrets, ConfigMaps, ServiceAccounts) creates excessive friction. Target specific resource kinds.
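  4. Webhook timeouts: If the policy engine is down or slow and the webhook's failurePolicy is Fail, matching admission requests are rejected. Configure failurePolicy: Ignore for non-critical policies so API server requests are not blocked when the webhook is unavailable, and accept that violations can slip through during an outage.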
  5. Ignoring init containers: Policies that check containers should also check initContainers and ephemeralContainers.
  6. Not testing policies: Treat policies as code — write tests, run them in CI, and review changes in pull requests.
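
As an example of pitfall 2, a Kyverno rule can carry an exclude block alongside its match block. The fragment below sketches this for the label rule shown earlier. The namespace list is an assumption and should reflect where your own system components and policy engine run.

```yaml
rules:
  - name: check-team-label
    match:
      any:
        - resources:
            kinds:
              - Deployment
              - StatefulSet
    exclude:
      any:
        - resources:
            namespaces:         # assumed list; adjust to your cluster
              - kube-system
              - kube-public
              - kyverno
```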

Best Practices

  1. Start with 5-10 essential policies (require labels, block latest, enforce limits, restrict registries, require non-root). Do not try to cover every edge case on day one.
  2. Use audit mode for at least 2 weeks before enforcing any policy.
  3. Exclude system namespaces (kube-system, kube-public, gatekeeper-system, kyverno) from all policies.
  4. Version control your policies in Git and deploy them via GitOps (ArgoCD, Flux).
  5. Combine mutation and validation: Use mutation to add sensible defaults, then validate that the result meets your standards.
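  6. Set failurePolicy: Ignore on non-critical validating webhooks to prevent policy engine downtime from blocking all deployments. Keep Fail for security-critical policies, since Ignore lets requests through unchecked while the webhook is down.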
  7. Monitor policy engine health: Set up alerts for webhook latency, error rates, and pod health.
  8. Use policy reports (Kyverno) or constraint status (Gatekeeper) to regularly audit compliance across the cluster.
  9. Integrate policy checks into CI/CD: Run kyverno apply or opa eval in your pipeline to catch violations before they reach the cluster.
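
For best practice 6, Kyverno exposes the webhook failure behaviour on the policy itself. The fragment below is a sketch with the rules omitted. Where this setting lives has shifted between Kyverno releases, so confirm the field against the version you run. In Gatekeeper the equivalent setting sits on the ValidatingWebhookConfiguration rather than on each Constraint.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-team-label
spec:
  validationFailureAction: Enforce
  failurePolicy: Ignore             # fail open: allow requests if Kyverno is unreachable
  webhookTimeoutSeconds: 10         # assumed value; give up quickly instead of stalling deploys
  # rules omitted for brevity
```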

What's Next?

  • Explore Pod Security Admission for built-in Kubernetes security standards that complement policy engines.
  • Learn about Secrets Management and enforce that workloads use external secrets operators instead of inline values.
  • See Multi-Tenancy for combining policies with namespace isolation for team governance.
  • Understand Progressive Delivery and enforce that all Rollouts include AnalysisTemplates via policy.