Webhooks: Intercepting the API
- API Interception: Admission webhooks allow external services to intercept and modify (Mutating) or approve/reject (Validating) API requests before they are persisted in etcd. They are the extension point that powers policy engines, sidecar injection, and custom validation logic.
- Mutating vs. Validating: Mutating webhooks are called first and can alter resource definitions (inject sidecars, add labels, set defaults). Validating webhooks are called second and can only allow or deny requests. A single webhook server can implement both types.
- Dynamic Policy Enforcement: Webhooks provide a dynamic and extensible way to enforce custom policies and automate resource modifications without recompiling the Kubernetes API server. They are the mechanism behind tools like Istio, Kyverno, OPA/Gatekeeper, and cert-manager.
- Availability Risk: A failing webhook server can block all resource creation in the cluster. The failurePolicy setting (Fail or Ignore) determines whether a webhook outage causes cluster lockup or silently bypasses checks. Production webhooks must be highly available.
- Performance Impact: Every webhook adds latency to API requests. Timeouts, namespace selectors, and object selectors should be configured carefully to minimize the performance impact on cluster operations.
How does Istio automatically inject a sidecar container into every Pod? How does your cluster block Pods that lack a cost-center label? How does cert-manager automatically add TLS annotations? The answer is Admission Webhooks -- the API server's built-in extension mechanism for intercepting, modifying, and validating every resource change.
1. The Admission Control Chain
When you run kubectl apply, your request passes through several stages in the API server before it is saved to etcd. Webhooks sit in the admission control phase, after authentication and authorization but before persistence.
The full request flow is:
- Authentication: The API server verifies the caller's identity (client certificate, token, OIDC).
- Authorization: RBAC or ABAC checks whether the caller is allowed to perform the action.
- Mutating Admission Webhooks: Called serially, one after another. Each webhook can modify the resource, and later webhooks see the changes made by earlier ones. After all mutating webhooks run, the resource is re-validated against the schema.
- Object Schema Validation: The API server validates the mutated resource against the OpenAPI schema.
- Validating Admission Webhooks: Called in parallel (not sequentially). Each webhook can approve or reject the request. If any validating webhook rejects, the request fails.
- Persistence: The resource is written to etcd.
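The ordering of the two webhook phases can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how the API server is implemented: `admit`, `add_cost_center`, and `require_cost_center` are hypothetical names, objects are plain dictionaries, and authentication, authorization, and schema validation are left out.

```python
from typing import Callable, List, Optional, Tuple

Mutator = Callable[[dict], dict]
# A validator returns a rejection reason, or None to allow the request.
Validator = Callable[[dict], Optional[str]]


def admit(obj: dict, mutators: List[Mutator],
          validators: List[Validator]) -> Tuple[Optional[dict], List[str]]:
    """Run mutators one after another, then collect every validator verdict."""
    for mutate in mutators:
        obj = mutate(obj)  # each mutator sees the previous mutator's output
    reasons = [r for r in (check(obj) for check in validators) if r]
    if reasons:
        return None, reasons  # any single rejection fails the request
    return obj, []  # this is the object that would be persisted


def add_cost_center(obj: dict) -> dict:
    """Hypothetical mutator: set a default cost-center label."""
    labels = dict(obj.get("labels", {}))
    labels.setdefault("cost-center", "unassigned")
    return {**obj, "labels": labels}


def require_cost_center(obj: dict) -> Optional[str]:
    """Hypothetical validator: reject objects without the label."""
    if "cost-center" in obj.get("labels", {}):
        return None
    return "missing cost-center label"
```

Because validation runs on the mutated object, a defaulting mutator and a validator that requires the same label can coexist: the validator only rejects when no mutator filled the gap.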
Mutating Admission Webhooks
Mutating webhooks are called first and can modify the resource being created or updated. They receive the resource inside an AdmissionReview request and return a base64-encoded JSON Patch describing the changes, rather than sending back the full modified resource.
Common use cases:
- Sidecar injection: Istio's webhook adds an Envoy sidecar container to every Pod in labeled namespaces.
- Default resource limits: Inject default CPU/memory requests and limits if not specified.
- Label injection: Add organizational labels (team, cost-center, environment) automatically.
- Image mutation: Rewrite image references to point to an internal registry mirror.
Validating Admission Webhooks
Validating webhooks are called second and can only approve or reject the request. They cannot modify the resource. Validating webhooks run in parallel for better performance.
Common use cases:
- Policy enforcement: Reject Pods that request privileged mode or host networking.
- Label requirements: Reject resources missing required labels.
- Image allowlisting: Reject images from unapproved registries.
- Resource quotas: Enforce custom resource limits beyond what ResourceQuota supports.
2. Webhook Configuration
Webhook configurations are cluster-scoped resources that tell the API server which webhook servers to call and for which resources.
# MutatingWebhookConfiguration: inject a sidecar into opted-in pods
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: sidecar-injector
  annotations:
    cert-manager.io/inject-ca-from: webhooks/sidecar-injector-tls
webhooks:
  - name: sidecar-injector.example.com
    admissionReviewVersions: ["v1"]
    sideEffects: None               # Required: declare no side effects
    timeoutSeconds: 10              # Default is 10s, maximum is 30s
    failurePolicy: Fail             # Reject requests if webhook is unavailable
    reinvocationPolicy: IfNeeded    # Re-invoke if another webhook mutates the object
    clientConfig:
      service:
        name: sidecar-injector      # Webhook server Service name
        namespace: webhooks
        path: /mutate               # HTTP endpoint on the webhook server
        port: 443
    namespaceSelector:
      matchLabels:
        sidecar-injection: enabled  # Only apply to labeled namespaces
    objectSelector:
      matchLabels:
        inject-sidecar: "true"      # Only apply to labeled pods
    rules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE"]      # Only on pod creation, not updates
        resources: ["pods"]
        scope: Namespaced
# ValidatingWebhookConfiguration: enforce security policies
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: security-policy-validator
webhooks:
  - name: security-policy.example.com
    admissionReviewVersions: ["v1"]
    sideEffects: None
    timeoutSeconds: 5               # Validation should be fast
    failurePolicy: Fail
    matchPolicy: Equivalent         # Match equivalent API versions
    clientConfig:
      service:
        name: policy-validator
        namespace: webhooks
        path: /validate
        port: 443
    namespaceSelector:
      matchExpressions:
        - key: kubernetes.io/metadata.name
          operator: NotIn
          values: ["kube-system", "kube-public"]  # Exclude system namespaces
    rules:
      - apiGroups: ["apps"]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["deployments", "statefulsets", "daemonsets"]
        scope: Namespaced
3. How the API Server Calls Webhooks
The API server sends an AdmissionReview request to the webhook server over HTTPS. The request contains the resource being created or updated, along with metadata about the operation.
# AdmissionReview request (sent by API server to webhook)
apiVersion: admission.k8s.io/v1
kind: AdmissionReview
request:
  uid: "abc-123"                    # Unique request ID
  kind:
    group: ""
    version: v1
    kind: Pod
  resource:
    group: ""
    version: v1
    resource: pods
  operation: CREATE
  userInfo:
    username: "developer@example.com"
  object:                           # The resource being created
    apiVersion: v1
    kind: Pod
    metadata:
      name: my-app
    spec:
      containers:
        - name: app
          image: myregistry.io/app:v1.0
The webhook server must respond with an AdmissionReview response containing either an allowed/denied decision (validating) or a JSON Patch (mutating).
# AdmissionReview response (mutating: adds a sidecar container)
apiVersion: admission.k8s.io/v1
kind: AdmissionReview
response:
  uid: "abc-123"                    # Must match the request UID
  allowed: true
  patchType: JSONPatch
  patch: "W3sib3AiOiAiYWRkIiwgInBhdGgiOiAiL3NwZWMvY29udGFpbmVycy8tIiwgInZhbHVlIjogeyJuYW1lIjogImVudm95IiwgImltYWdlIjogImVudm95cHJveHkvZW52b3k6djEuMjgifX1d"
# Base64-decoded patch:
# [{"op": "add", "path": "/spec/containers/-", "value": {"name": "envoy", "image": "envoyproxy/envoy:v1.28"}}]
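A webhook server has to produce that base64-encoded patch itself. The following is a minimal sketch of how a response like the one above could be assembled in Python using only the standard library; `mutating_response` is a hypothetical helper name, and a real server would also need an HTTPS endpoint around it.

```python
import base64
import json


def mutating_response(uid: str, patch_ops: list) -> dict:
    """Build an AdmissionReview response that carries a JSON Patch.

    The patch list is serialized to JSON and then base64-encoded,
    because the `patch` field is transported as base64 text.
    """
    patch_bytes = json.dumps(patch_ops).encode("utf-8")
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": {
            "uid": uid,  # must echo the uid from the request
            "allowed": True,
            "patchType": "JSONPatch",
            "patch": base64.b64encode(patch_bytes).decode("ascii"),
        },
    }


# Example input mirroring the sidecar patch shown above.
SIDECAR_OPS = [{
    "op": "add",
    "path": "/spec/containers/-",
    "value": {"name": "envoy", "image": "envoyproxy/envoy:v1.28"},
}]
```

Calling `mutating_response("abc-123", SIDECAR_OPS)` returns a dictionary that can be serialized with `json.dumps` and written as the HTTP response body.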
4. Failure Policies
The failurePolicy field determines what happens when the webhook server is unreachable or returns an error.
failurePolicy: Fail (Default for Security)
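If the webhook is unavailable or times out, the API request is rejected. Fail is also the value the admissionregistration.k8s.io/v1 API applies when the field is omitted, and it is the safe choice for security-critical webhooks. However, it means a webhook outage blocks every request the webhook matches. With broad rules that can freeze the entire cluster: no Pods, Deployments, or other matched resources can be created or updated.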
failurePolicy: Ignore (Default for Reliability)
If the webhook is unavailable, the API request is allowed through without the webhook check. This ensures cluster availability but means security policies are silently bypassed during webhook outages.
Best Practice
For production clusters, most security-critical webhooks should use failurePolicy: Fail combined with:
- At least 3 replicas of the webhook server, spread across availability zones.
- PodDisruptionBudget with minAvailable: 2 to prevent all replicas from going down during node maintenance.
- PriorityClass set to system-cluster-critical so the webhook server is scheduled before application pods.
- Resource requests and limits to prevent resource starvation.
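The difference between the two policies comes down to what happens when the call itself fails. The sketch below models that decision in Python; it is an illustration only, `apply_failure_policy` is a hypothetical name, and the webhook call is represented by any function that returns True (allow), returns False (deny), or raises on an outage.

```python
from typing import Callable


def apply_failure_policy(call_webhook: Callable[[], bool],
                         failure_policy: str) -> bool:
    """Return True if the request is admitted, False if it is rejected."""
    if failure_policy not in ("Fail", "Ignore"):
        raise ValueError(f"unknown failurePolicy: {failure_policy}")
    try:
        # A reachable webhook decides for itself; the policy is not consulted.
        return call_webhook()
    except (TimeoutError, ConnectionError):
        # The policy only matters when the webhook cannot give an answer.
        return failure_policy == "Ignore"
```

Note that an explicit denial from a healthy webhook is a denial under both policies; Ignore only waves requests through when no verdict arrives.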
5. Side Effects and Reinvocation
sideEffects
The sideEffects field declares whether the webhook has side effects beyond modifying the admission response. The API server uses this for dry-run requests (kubectl apply --dry-run=server).
- None: The webhook has no side effects. Dry-run requests will call the webhook.
- NoneOnDryRun: The webhook has side effects on real requests but behaves correctly for dry-run. Dry-run requests will call the webhook.
reinvocationPolicy
When multiple mutating webhooks are configured, one webhook's mutation might affect another webhook's logic. The reinvocationPolicy controls whether webhooks are called again after mutations.
- Never (default): Each mutating webhook is called exactly once.
- IfNeeded: The webhook is re-invoked if the object was modified by another mutating webhook after this one ran.
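Since IfNeeded means a webhook may see the same object more than once, its mutation should be safe to repeat. A minimal sketch of an idempotent sidecar patch in Python follows; `sidecar_patch` is a hypothetical helper, and matching on the container name is an assumption about how an injector might detect its own earlier work.

```python
def sidecar_patch(pod: dict, sidecar: dict) -> list:
    """Return JSON Patch operations that add `sidecar`, or [] if present.

    Returning an empty patch on the second call keeps re-invocation
    from appending a duplicate container.
    """
    containers = pod.get("spec", {}).get("containers", [])
    if any(c.get("name") == sidecar["name"] for c in containers):
        return []  # already injected, nothing to do
    return [{"op": "add", "path": "/spec/containers/-", "value": sidecar}]
```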
6. Namespace and Object Selectors
Scope your webhooks to minimize their blast radius and performance impact.
namespaceSelector
Limits the webhook to resources in namespaces matching label criteria. This is the most important scoping mechanism.
# Only intercept resources in namespaces with the "env: production" label
namespaceSelector:
  matchLabels:
    env: production
A critical best practice is to exclude system namespaces (kube-system, kube-public, kube-node-lease) from your webhooks. A webhook that intercepts kube-system resources can prevent CoreDNS, kube-proxy, or the CNI plugin from starting, which will cascade into a cluster-wide outage.
objectSelector
Limits the webhook to resources matching specific labels. This allows fine-grained opt-in at the resource level.
# Only intercept pods with the "inject-sidecar: true" label
objectSelector:
  matchLabels:
    inject-sidecar: "true"
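Both selectors use standard Kubernetes label-selector semantics. The Python sketch below shows how matchLabels and matchExpressions could be evaluated against a set of labels; `selector_matches` is a hypothetical function written for illustration, not the API server's code.

```python
def selector_matches(selector: dict, labels: dict) -> bool:
    """Evaluate a label selector; all requirements must hold (logical AND)."""
    for key, value in selector.get("matchLabels", {}).items():
        if labels.get(key) != value:
            return False
    for expr in selector.get("matchExpressions", []):
        key, op = expr["key"], expr["operator"]
        values = expr.get("values", [])
        if op == "In":
            ok = key in labels and labels[key] in values
        elif op == "NotIn":
            # A missing key also satisfies NotIn.
            ok = labels.get(key) not in values
        elif op == "Exists":
            ok = key in labels
        elif op == "DoesNotExist":
            ok = key not in labels
        else:
            raise ValueError(f"unsupported operator: {op}")
        if not ok:
            return False
    return True  # an empty selector matches everything
```

The last line is worth keeping in mind: a webhook with no selectors at all matches every namespace and every object covered by its rules.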
7. Performance Impact and Timeout Configuration
Every webhook adds latency to API server requests. In a cluster with multiple webhooks (Istio, Kyverno, cert-manager, custom policies), the cumulative latency can significantly slow down deployments.
- Timeout: Set timeoutSeconds as low as practical. Validating webhooks should respond in under 5 seconds. Mutating webhooks should respond in under 10 seconds. The maximum is 30 seconds.
- Scope rules tightly: Use rules to limit which resources and operations trigger the webhook. Admission webhooks are not called for read requests (get, list, watch), but a webhook that matches every CREATE and UPDATE on every resource sits in the path of nearly all write traffic.
- Use namespaceSelector: Avoid cluster-wide webhooks when possible.
- Monitor webhook latency: The API server exposes apiserver_admission_webhook_admission_duration_seconds metrics. Alert if webhook latency exceeds your thresholds.
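To reason about cumulative latency, a rough upper bound can be computed from the configured timeouts. The sketch below assumes what section 1 describes (mutating webhooks run one after another, validating webhooks run in parallel) and ignores reinvocation and the API server's own processing time; `worst_case_admission_seconds` is a hypothetical helper.

```python
from typing import Iterable


def worst_case_admission_seconds(mutating_timeouts: Iterable[float],
                                 validating_timeouts: Iterable[float]) -> float:
    """Rough upper bound on time a request can spend waiting on webhooks.

    Serial mutating calls add up; parallel validating calls are bounded
    by the slowest one.
    """
    return sum(mutating_timeouts) + max(validating_timeouts, default=0)
```

For example, two mutating webhooks at 10 seconds each and three validating webhooks at 5, 5, and 3 seconds give a bound of 25 seconds for a single request, which is a strong argument for tightening timeouts.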
8. Common Webhook Patterns
Sidecar Injection (Istio-Style)
A mutating webhook that adds a container to every Pod in labeled namespaces. The webhook reads the Pod spec and returns a JSON Patch that adds the sidecar container and init container.
Default Labels and Annotations
A mutating webhook that adds organizational metadata (team, environment, cost-center) to resources that lack them. This ensures consistent labeling without burdening developers.
Image Registry Rewriting
A mutating webhook that rewrites container image references from public registries to an internal mirror. For example, nginx:1.25 becomes mirror.internal.io/library/nginx:1.25.
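The rewrite rule itself is small. A hedged Python sketch follows: `rewrite_image` is a hypothetical function, the registry-detection heuristic (a first path component containing a dot or colon, or equal to localhost) follows the usual Docker reference convention, and leaving images from other registries untouched is an assumption about the desired policy.

```python
def rewrite_image(image: str, mirror: str = "mirror.internal.io") -> str:
    """Point Docker Hub image references at an internal mirror."""
    first, sep, rest = image.partition("/")
    has_registry = bool(sep) and ("." in first or ":" in first
                                  or first == "localhost")
    if has_registry:
        if first not in ("docker.io", "index.docker.io"):
            return image  # assumption: only Docker Hub images are mirrored
        path = rest
    else:
        path = image
    if "/" not in path:
        path = "library/" + path  # official images live under library/
    return f"{mirror}/{path}"
```

With this rule, `rewrite_image("nginx:1.25")` yields `mirror.internal.io/library/nginx:1.25`, and an image that already points at the mirror is returned unchanged, so the mutation is safe to apply twice.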
Policy Validation
A validating webhook that enforces organizational policies: no privileged containers, required resource limits, required labels, restricted host mounts. This is what Kyverno and OPA/Gatekeeper implement under the hood.
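As an illustration of the simplest such policy, the Python sketch below rejects objects that lack a required label. It is not how Kyverno or Gatekeeper are implemented; `validate` and `REQUIRED_LABELS` are hypothetical names, and the cost-center label is borrowed from the example in the introduction.

```python
REQUIRED_LABELS = ("cost-center",)  # hypothetical organizational policy


def validate(review: dict) -> dict:
    """Turn an AdmissionReview request into an allow/deny response."""
    request = review["request"]
    obj = request.get("object") or {}
    labels = (obj.get("metadata") or {}).get("labels") or {}
    missing = [key for key in REQUIRED_LABELS if key not in labels]

    response = {"uid": request["uid"], "allowed": not missing}
    if missing:
        # The message is what the user sees in kubectl output.
        response["status"] = {
            "code": 403,
            "message": "missing required labels: " + ", ".join(missing),
        }
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }
```

A validating response never carries a patch; the only levers are `allowed` and the explanatory `status`.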
9. Debugging Webhooks
When a webhook rejects a request, the error message appears in kubectl output. However, debugging why a webhook rejected or incorrectly mutated a request requires deeper investigation.
# Check webhook configurations
kubectl get mutatingwebhookconfigurations
kubectl get validatingwebhookconfigurations
# Describe a specific webhook to see its rules and selectors
kubectl describe mutatingwebhookconfiguration sidecar-injector
# Check webhook server logs
kubectl logs -n webhooks deployment/sidecar-injector
# Test a dry-run to see if the webhook would fire (-v=8 also prints request and response bodies)
kubectl apply --dry-run=server -f pod.yaml -v=8
Common debugging steps:
- Check if the webhook is registered: Run kubectl get mutatingwebhookconfigurations and verify your webhook appears.
- Check webhook server health: Ensure the webhook server Pods are running and their Service endpoint resolves.
- Check TLS certificates: The API server must trust the webhook's TLS certificate. Certificate expiration is a common cause of webhook failures.
- Check namespace labels: If your webhook uses namespaceSelector, verify the target namespace has the correct labels.
- Increase kubectl verbosity: Use -v=8 or higher with kubectl to see the request and response bodies exchanged with the API server, including any rejection message a webhook returned. The AdmissionReview exchange itself happens between the API server and the webhook, so look for it in the webhook server's logs rather than in kubectl output.
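When a mutation looks wrong, it often helps to decode the patch from a logged AdmissionReview response. A small Python sketch for that is shown here; `decode_patch` is a hypothetical helper, and it assumes the webhook server logs its responses as JSON.

```python
import base64
import json


def decode_patch(review: dict) -> list:
    """Return the JSON Patch operations carried by an AdmissionReview response.

    Responses without a patch (for example, from validating webhooks)
    yield an empty list.
    """
    patch = (review.get("response") or {}).get("patch")
    if not patch:
        return []
    return json.loads(base64.b64decode(patch))
```

Feeding it the mutating response from section 3 returns the single `add` operation targeting `/spec/containers/-`, which makes it easy to confirm what the webhook actually asked the API server to change.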
10. cert-manager for Webhook Certificates
The API server communicates with webhooks over HTTPS, requiring valid TLS certificates. cert-manager automates certificate issuance and renewal for webhook servers.
# cert-manager Certificate for the webhook server
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: sidecar-injector-tls
  namespace: webhooks
spec:
  secretName: sidecar-injector-tls                # Secret where the cert is stored
  dnsNames:
    - sidecar-injector.webhooks.svc               # Must match the Service DNS name
    - sidecar-injector.webhooks.svc.cluster.local
  issuerRef:
    name: cluster-ca-issuer                       # Reference to a cert-manager Issuer
    kind: ClusterIssuer
  duration: 8760h                                 # 1 year validity
  renewBefore: 720h                               # Renew 30 days before expiry
The cert-manager.io/inject-ca-from annotation on the webhook configuration automatically injects the CA bundle, so the API server trusts the webhook's certificate without manual CA distribution.
Common Pitfalls
- Webhook intercepting kube-system: A misconfigured webhook that intercepts kube-system can prevent critical system components from starting, causing a cluster-wide outage. Always exclude system namespaces.
- Certificate expiration: If the webhook's TLS certificate expires, the API server can no longer establish a trusted connection to it. With failurePolicy: Fail, this freezes the cluster. Use cert-manager with automatic renewal.
- Circular dependencies: A webhook server deployed in the same cluster it monitors can create a chicken-and-egg problem. If the webhook Pod needs to be recreated but the webhook blocks its own creation, the cluster is stuck. Use namespaceSelector to exclude the webhook's own namespace.
- Too-broad rules: A webhook that fires on all resources, all operations, and all namespaces adds unnecessary latency. Scope rules to only the resources and operations you need to intercept.
- Not handling timeouts: Webhook servers that perform expensive operations (external API calls, database lookups) can time out. Return a default response quickly and perform expensive checks asynchronously if possible.
- Missing PodDisruptionBudget: Without a PDB, node drains during maintenance can simultaneously terminate all webhook replicas, causing a cluster lockup.
What's Next?
- Explore ValidatingAdmissionPolicy (GA in Kubernetes 1.30), which provides in-process validation using CEL expressions without external webhook servers, reducing latency and operational complexity.
- Deploy cert-manager to automate TLS certificate management for your webhook servers.
- Implement custom webhooks using the controller-runtime library (Go) or FastAPI (Python) for rapid prototyping.
- Study how Kyverno and OPA/Gatekeeper implement policy engines on top of the webhook framework.
- Set up monitoring and alerting on apiserver_admission_webhook_admission_duration_seconds to detect webhook performance degradation before it impacts cluster operations.