The API Aggregation Layer
- Beyond CRDs: The API Aggregation Layer extends the Kubernetes API with fully custom API servers that handle their own storage, business logic, and sub-resources -- capabilities that CRDs cannot provide.
- Proxy Architecture: The main kube-apiserver acts as a proxy, forwarding requests for specific API groups to extension API servers running as pods in the cluster. Clients interact with a single unified API endpoint.
- APIService Resource: The APIService object registers an extension API server with the main API server, specifying which API group and version it handles and where to route requests.
- Authentication Flow: The main API server authenticates the client, then forwards the request to the extension server with the user's identity in HTTP headers. The extension server can implement its own authorization logic.
- Real-World Examples: metrics-server (the most common aggregated API), the deprecated service-catalog, and custom APIs for hardware inventory, cost management, and multi-cluster federation.
- Decision Point: Use CRDs for declarative data stored in etcd; use API Aggregation when you need custom storage backends, computed responses, sub-resource support, or protocol-level control.
Sometimes CRDs (Custom Resource Definitions) are not enough. When you need an API endpoint that computes responses on the fly (like live resource metrics), uses its own storage backend (like a time-series database), or implements complex sub-resources (like exec or logs), you use the API Aggregation Layer.
1. How API Aggregation Works
The API Aggregation Layer extends the Kubernetes API by allowing external API servers to register themselves as handlers for specific API groups. The main kube-apiserver acts as a smart reverse proxy:
- A client sends a request to https://<api-server>/apis/metrics.k8s.io/v1beta1/nodes/worker-01.
- The main API server checks its list of registered APIService objects.
- It finds that metrics.k8s.io/v1beta1 is handled by the metrics-server Service in kube-system.
- It proxies the request to the extension API server pod(s) behind that Service.
- The extension server processes the request (in this case, returning the node's current CPU and memory usage) and returns the response.
- The main API server forwards the response back to the client.
From the client's perspective, the aggregated API is indistinguishable from built-in APIs. It appears in kubectl api-resources, supports kubectl get, and works with client-go and all standard Kubernetes tooling.
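The routing decision in the steps above can be sketched in a few lines of Go. This is a simplified illustration, not the kube-apiserver's actual implementation: the `registry` and `backend` types and the `route` function are hypothetical names, and the real server also handles TLS, identity headers, and discovery.

```go
package main

import (
	"fmt"
	"strings"
)

// backend identifies the Service that serves one group/version.
// The type and its fields are illustrative, not kube-apiserver internals.
type backend struct {
	Namespace, Service string
}

// registry maps "<group>/<version>" to the registered extension server,
// playing the role of the list of APIService objects.
type registry map[string]backend

// groupVersion extracts "<group>/<version>" from a path such as
// /apis/metrics.k8s.io/v1beta1/nodes/worker-01.
func groupVersion(path string) (string, bool) {
	parts := strings.Split(strings.Trim(path, "/"), "/")
	if len(parts) < 3 || parts[0] != "apis" {
		return "", false
	}
	return parts[1] + "/" + parts[2], true
}

// route returns the backend to proxy to, or false when no extension
// server is registered for the request's group/version.
func (r registry) route(path string) (backend, bool) {
	gv, ok := groupVersion(path)
	if !ok {
		return backend{}, false
	}
	b, ok := r[gv]
	return b, ok
}

func main() {
	r := registry{
		"metrics.k8s.io/v1beta1": {Namespace: "kube-system", Service: "metrics-server"},
	}
	paths := []string{
		"/apis/metrics.k8s.io/v1beta1/nodes/worker-01",
		"/apis/apps/v1/deployments",
	}
	for _, p := range paths {
		if b, ok := r.route(p); ok {
			fmt.Printf("%s -> proxy to %s/%s\n", p, b.Namespace, b.Service)
		} else {
			fmt.Printf("%s -> served locally\n", p)
		}
	}
}
```

Keying the lookup on group and version is what lets one extension server own an entire API group without the client knowing a proxy is involved.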
Enabling API Aggregation
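The aggregation layer is built into the kube-apiserver and becomes usable once the request-header and proxy-client flags below are configured; kubeadm clusters and most managed Kubernetes services set them by default. The separate --enable-aggregator-routing=true flag is only needed when kube-proxy is not running on the API server host, in which case requests are routed to endpoint IPs instead of the Service's cluster IP. The relevant flags are: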
# API server flags for aggregation (typically already set)
--requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt
--requestheader-allowed-names=front-proxy-client
--requestheader-extra-headers-prefix=X-Remote-Extra-
--requestheader-group-headers=X-Remote-Group
--requestheader-username-headers=X-Remote-User
--proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt
--proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key
2. CRDs vs. API Aggregation
This is a critical architectural decision. The following comparison covers the dimensions that matter most:
| Feature | CRDs | API Aggregation |
|---|---|---|
| Complexity | Low -- just a YAML definition | High -- requires building and operating a custom API server |
| Storage | Uses the cluster's etcd | Can use any backend (database, in-memory, external service) |
| Logic | Declarative; logic lives in controllers | Can implement arbitrary request handling and computed responses |
| Sub-resources | Limited (/status, /scale only) | Full control over any sub-resources |
| Validation | CEL expressions, webhooks | In-code validation (full programmatic control) |
| Versioning | Conversion webhooks for multi-version | Native multi-version support with in-code conversion |
| Performance | Bound by etcd read/write throughput | Optimized for specific access patterns |
| Availability | Tied to etcd and kube-apiserver | Independent failure domain (but must be available for API to work) |
| Example | Cert-Manager Certificate, Argo Workflow | metrics-server, service-catalog |
When to Choose CRDs
- You want to store declarative configuration (desired state + status).
- Your data model fits the Kubernetes resource model (metadata, spec, status).
- You are building controllers/operators that reconcile desired state.
- You want to leverage existing Kubernetes tooling (kubectl, RBAC, audit logging) without building a custom server.
When to Choose API Aggregation
- You need computed responses that do not come from stored data (e.g., real-time metrics, hardware status, cost estimates).
- You need a custom storage backend (time-series database, graph database, external SaaS API).
- You need custom sub-resources beyond /status and /scale (e.g., /exec, /logs, /proxy).
- You need protocol-level control (WebSocket upgrades, streaming responses, custom serialization).
- You need high-performance reads that would be bottlenecked by etcd.
3. The APIService Resource
The APIService object is how you register an extension API server with the main kube-apiserver.
# Register the metrics-server as the handler for metrics.k8s.io/v1beta1
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.metrics.k8s.io    # Format: <version>.<group>
spec:
  group: metrics.k8s.io           # API group being served
  version: v1beta1                # Version being served
  service:
    name: metrics-server          # Service name in the cluster
    namespace: kube-system        # Namespace of the service
    port: 443                     # Port the service listens on
  groupPriorityMinimum: 100       # Priority for API group sorting
  versionPriority: 100            # Priority for version sorting within the group
  insecureSkipTLSVerify: false    # Should be false in production
  caBundle: <base64-encoded-CA>   # CA to verify the extension server's TLS cert
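To make the naming rule concrete, here is a minimal Go sketch that builds the same registration programmatically. The structs are a stdlib-only mirror of a few fields from the manifest above (`caBundle` is left out for brevity), and `apiServiceName` and `newAPIService` are hypothetical helpers; a real project would more likely use the typed APIService from `k8s.io/kube-aggregator`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// serviceRef, apiServiceSpec, and apiService mirror a subset of the
// APIService fields shown in the manifest; they are illustrative only.
type serviceRef struct {
	Name      string `json:"name"`
	Namespace string `json:"namespace"`
	Port      int32  `json:"port"`
}

type apiServiceSpec struct {
	Group                string     `json:"group"`
	Version              string     `json:"version"`
	Service              serviceRef `json:"service"`
	GroupPriorityMinimum int32      `json:"groupPriorityMinimum"`
	VersionPriority      int32      `json:"versionPriority"`
}

type objectMeta struct {
	Name string `json:"name"`
}

type apiService struct {
	APIVersion string         `json:"apiVersion"`
	Kind       string         `json:"kind"`
	Metadata   objectMeta     `json:"metadata"`
	Spec       apiServiceSpec `json:"spec"`
}

// apiServiceName applies the <version>.<group> naming rule.
func apiServiceName(group, version string) string {
	return version + "." + group
}

// newAPIService fills in the fields that are derived from group and version.
func newAPIService(group, version string, svc serviceRef) apiService {
	return apiService{
		APIVersion: "apiregistration.k8s.io/v1",
		Kind:       "APIService",
		Metadata:   objectMeta{Name: apiServiceName(group, version)},
		Spec: apiServiceSpec{
			Group:                group,
			Version:              version,
			Service:              svc,
			GroupPriorityMinimum: 100,
			VersionPriority:      100,
		},
	}
}

func main() {
	svc := serviceRef{Name: "metrics-server", Namespace: "kube-system", Port: 443}
	out, err := json.MarshalIndent(newAPIService("metrics.k8s.io", "v1beta1", svc), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Deriving `metadata.name` from the group and version in one place avoids the common mistake of registering an APIService whose name does not match its spec.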
APIService Status
The kube-apiserver periodically checks that it can reach the extension server. The APIService status (its Available condition) reflects the result:
kubectl get apiservices v1beta1.metrics.k8s.io
# NAME SERVICE AVAILABLE AGE
# v1beta1.metrics.k8s.io kube-system/metrics-server True 30d
If AVAILABLE is False, requests to that API group fail, typically with a 503 Service Unavailable error. This is a common troubleshooting signal: if kubectl top nodes returns an error, check the APIService status first.
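The same check can be automated. The sketch below assumes you have captured the output of `kubectl get apiservices -o json` and decodes only the fields it needs; the `unavailable` function and the embedded sample document are illustrative, not output copied from a real cluster.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// condition and apiServiceList mirror only the parts of the
// APIService list that this sketch reads.
type condition struct {
	Type    string `json:"type"`
	Status  string `json:"status"`
	Reason  string `json:"reason"`
	Message string `json:"message"`
}

type apiServiceList struct {
	Items []struct {
		Metadata struct {
			Name string `json:"name"`
		} `json:"metadata"`
		Status struct {
			Conditions []condition `json:"conditions"`
		} `json:"status"`
	} `json:"items"`
}

// sampleList is a hand-written example, not real cluster output.
const sampleList = `{"items":[
 {"metadata":{"name":"v1beta1.metrics.k8s.io"},
  "status":{"conditions":[{"type":"Available","status":"True","reason":"Passed","message":"all checks passed"}]}},
 {"metadata":{"name":"v1.cost.example.com"},
  "status":{"conditions":[{"type":"Available","status":"False","reason":"FailedDiscoveryCheck","message":"x509: certificate signed by unknown authority"}]}}
]}`

// unavailable returns one line per APIService whose Available
// condition is missing or not "True".
func unavailable(data []byte) ([]string, error) {
	var list apiServiceList
	if err := json.Unmarshal(data, &list); err != nil {
		return nil, err
	}
	var out []string
	for _, item := range list.Items {
		available := false
		reason := "no Available condition reported"
		for _, c := range item.Status.Conditions {
			if c.Type == "Available" {
				available = c.Status == "True"
				reason = c.Reason + ": " + c.Message
			}
		}
		if !available {
			out = append(out, item.Metadata.Name+" ("+reason+")")
		}
	}
	return out, nil
}

func main() {
	bad, err := unavailable([]byte(sampleList))
	if err != nil {
		panic(err)
	}
	for _, line := range bad {
		fmt.Println("UNAVAILABLE:", line)
	}
}
```

Printing the condition's reason and message alongside the name usually points straight at the cause, such as a certificate or connectivity problem.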
4. Authentication and Authorization Flow
The authentication and authorization flow for aggregated APIs involves coordination between the main API server and the extension server.
Client ──(1. AuthN)──> kube-apiserver ──(2. Proxy)──> Extension API Server
                                                               |
                                                (3. AuthZ: SubjectAccessReview)
                                                               |
                                                      (4. Handle Request)
                                                               |
                                                      (5. Return Response)
- Authentication (AuthN): The main API server authenticates the client (bearer token, client certificate, OIDC, etc.) as it would for any API request.
- Proxy with Identity: The main API server proxies the request to the extension server, injecting the authenticated user's identity into HTTP headers:
  - X-Remote-User: The authenticated username.
  - X-Remote-Group: The user's groups (one header value per group).
  - X-Remote-Extra-*: Any extra attributes (e.g., OIDC claims).

  The extension server trusts these headers because the connection from the main API server is authenticated with the front-proxy client certificate.
- Authorization (AuthZ): The extension server can perform its own authorization. Typically, it delegates back to the main API server by sending a SubjectAccessReview request. This ensures that standard RBAC rules apply to aggregated API resources.
- Request Handling: The extension server processes the request using its own logic and storage.
- Response: The response flows back through the main API server to the client.
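Steps 2 and 3 can be illustrated with a small stdlib-only Go sketch: read the identity headers, then build the SubjectAccessReview body that would be posted back to the main API server. The `userInfo` type and the `userFromHeaders` and `newSAR` helpers are hypothetical, the header names assume the flag values shown earlier, and a real server must first verify the front-proxy client certificate, which `k8s.io/apiserver` handles for you.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"strings"
)

// userInfo holds the identity forwarded by the main API server.
type userInfo struct {
	Name   string
	Groups []string
	Extra  map[string][]string
}

const extraPrefix = "X-Remote-Extra-"

// userFromHeaders reads the identity headers. It must only be called
// after the caller's client certificate has been verified.
func userFromHeaders(h http.Header) (userInfo, bool) {
	name := h.Get("X-Remote-User")
	if name == "" {
		return userInfo{}, false
	}
	u := userInfo{Name: name, Groups: h.Values("X-Remote-Group"), Extra: map[string][]string{}}
	for key, vals := range h {
		if strings.HasPrefix(key, extraPrefix) {
			u.Extra[strings.ToLower(strings.TrimPrefix(key, extraPrefix))] = vals
		}
	}
	return u, true
}

// The types below mirror a subset of authorization.k8s.io/v1 SubjectAccessReview.
type resourceAttributes struct {
	Group     string `json:"group"`
	Version   string `json:"version"`
	Resource  string `json:"resource"`
	Verb      string `json:"verb"`
	Namespace string `json:"namespace,omitempty"`
}

type sarSpec struct {
	User               string             `json:"user"`
	Groups             []string           `json:"groups,omitempty"`
	ResourceAttributes resourceAttributes `json:"resourceAttributes"`
}

type subjectAccessReview struct {
	APIVersion string  `json:"apiVersion"`
	Kind       string  `json:"kind"`
	Spec       sarSpec `json:"spec"`
}

// newSAR builds the review asking "may this user perform this verb?".
func newSAR(u userInfo, attrs resourceAttributes) subjectAccessReview {
	return subjectAccessReview{
		APIVersion: "authorization.k8s.io/v1",
		Kind:       "SubjectAccessReview",
		Spec:       sarSpec{User: u.Name, Groups: u.Groups, ResourceAttributes: attrs},
	}
}

func main() {
	h := http.Header{}
	h.Set("X-Remote-User", "jane")
	h.Add("X-Remote-Group", "dev")
	h.Add("X-Remote-Group", "system:authenticated")
	h.Set("X-Remote-Extra-Scopes", "openid")

	u, ok := userFromHeaders(h)
	if !ok {
		fmt.Println("no identity headers present")
		return
	}
	attrs := resourceAttributes{Group: "cost.example.com", Version: "v1", Resource: "costs", Verb: "get", Namespace: "production"}
	out, _ := json.MarshalIndent(newSAR(u, attrs), "", "  ")
	fmt.Println(string(out))
}
```

The main API server answers the review with `status.allowed`, so the extension server never needs its own copy of the cluster's RBAC rules.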
5. Building an Aggregated API Server
Building a custom aggregated API server is a significant undertaking. The Kubernetes project provides the apiserver library (k8s.io/apiserver) as a foundation.
Project Structure
A typical aggregated API server project includes:
my-api-server/
  cmd/
    apiserver/
      main.go              # Entry point
  pkg/
    apis/
      mygroup/
        types.go           # API types (Go structs)
        v1alpha1/
          types.go         # Versioned types
          register.go      # Register types with the scheme
    registry/
      myresource/
        storage.go         # Storage implementation (REST handlers)
    apiserver/
      apiserver.go         # Server configuration and wiring
Key Components
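// types.go: Define your API types
package v1alpha1 // package name assumed from the project layout above

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// HardwareInventory reports the hardware discovered on a single node.
type HardwareInventory struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   HardwareInventorySpec   `json:"spec"`
	Status HardwareInventoryStatus `json:"status"`
}

// HardwareInventorySpec selects the node to report on.
type HardwareInventorySpec struct {
	NodeName string `json:"nodeName"`
}

// HardwareInventoryStatus holds the reported hardware data.
// GPUInfo, CPUTopology, and MemoryInfo are assumed to be defined elsewhere in this package.
type HardwareInventoryStatus struct {
	GPUs    []GPUInfo   `json:"gpus"`
	CPUInfo CPUTopology `json:"cpuInfo"`
	Memory  MemoryInfo  `json:"memory"`
}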
The k8s.io/apiserver library provides:
- TLS configuration and certificate management.
- Request authentication via the front-proxy headers.
- Authorization delegation (SubjectAccessReview).
- API discovery and OpenAPI schema generation.
- Audit logging integration.
- Admission control hooks.
Alternative: Use the apiserver-builder Framework
The apiserver-builder project (published as apiserver-builder-alpha under kubernetes-sigs) scaffolds aggregated API server projects, similar to how kubebuilder scaffolds CRD-based operators. It generates boilerplate code for types, storage, and server configuration.
6. Real-World Examples
metrics-server
The most ubiquitous aggregated API. metrics-server registers itself as the handler for metrics.k8s.io/v1beta1 and serves real-time CPU and memory usage data for nodes and pods. The HPA controller queries this API to make scaling decisions.
# These commands work because metrics-server is an aggregated API
kubectl top nodes
kubectl top pods -n production
# The API call behind the scenes
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/pods" | jq .
metrics-server does not use etcd. It scrapes the kubelet's /metrics/resource endpoint on each node, aggregates the data in memory, and serves it via the aggregated API.
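For programmatic access, the response from the node metrics endpoint can be decoded with plain `encoding/json`. The sketch below mirrors only a few fields of the node metrics list; the sample document and its values are made up for illustration, and `summarize` is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// nodeMetricsList mirrors only the fields of a metrics.k8s.io/v1beta1
// node metrics list that this sketch reads.
type nodeMetricsList struct {
	Items []struct {
		Metadata struct {
			Name string `json:"name"`
		} `json:"metadata"`
		Window string            `json:"window"`
		Usage  map[string]string `json:"usage"`
	} `json:"items"`
}

// sampleNodeMetrics is a hand-written example with made-up values.
const sampleNodeMetrics = `{
  "kind": "NodeMetricsList",
  "apiVersion": "metrics.k8s.io/v1beta1",
  "items": [
    {
      "metadata": {"name": "worker-01"},
      "window": "10s",
      "usage": {"cpu": "182837912n", "memory": "2097152Ki"}
    }
  ]
}`

// summarize returns one line per node with its CPU and memory usage.
// Quantities are kept as strings; parsing them is out of scope here.
func summarize(data []byte) ([]string, error) {
	var list nodeMetricsList
	if err := json.Unmarshal(data, &list); err != nil {
		return nil, err
	}
	lines := make([]string, 0, len(list.Items))
	for _, item := range list.Items {
		lines = append(lines, fmt.Sprintf("%s cpu=%s memory=%s",
			item.Metadata.Name, item.Usage["cpu"], item.Usage["memory"]))
	}
	return lines, nil
}

func main() {
	lines, err := summarize([]byte(sampleNodeMetrics))
	if err != nil {
		panic(err)
	}
	for _, l := range lines {
		fmt.Println(l)
	}
}
```

In practice the input would come from `kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes"` or from a Kubernetes client library rather than an embedded constant.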
Custom Cost API
An organization might build an aggregated API that exposes per-namespace or per-pod cost estimates:
# Query cost data from a custom aggregated API
kubectl get --raw "/apis/cost.example.com/v1/namespaces/production/costs" | jq .
# {
# "monthlyCost": "$1,234.56",
# "topWorkloads": [
# {"name": "ml-training", "cost": "$450.00", "resource": "GPU"},
# {"name": "database", "cost": "$200.00", "resource": "Memory"}
# ]
# }
This API queries a cloud billing API and a Prometheus instance, computes cost attribution, and returns the result -- none of which is stored in etcd.
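The core idea, a response computed per request rather than read from storage, can be sketched with the Go standard library alone. Everything here is hypothetical: the `costSource` interface stands in for the billing API and Prometheus, `staticCosts` is a fake data source, and the response shape is simplified. TLS, authentication, and discovery, which a real extension server needs, are omitted.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// costSource is a hypothetical abstraction over the billing API and Prometheus.
type costSource interface {
	MonthlyCost(namespace string) (float64, error)
}

// staticCosts is a fake costSource used for demonstration.
type staticCosts map[string]float64

func (s staticCosts) MonthlyCost(ns string) (float64, error) {
	c, ok := s[ns]
	if !ok {
		return 0, fmt.Errorf("no cost data for namespace %q", ns)
	}
	return c, nil
}

const pathPrefix = "/apis/cost.example.com/v1/namespaces/"

// costHandler computes the response on every request; nothing is stored.
func costHandler(src costSource) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		p := r.URL.Path
		if !strings.HasPrefix(p, pathPrefix) || !strings.HasSuffix(p, "/costs") {
			http.NotFound(w, r)
			return
		}
		ns := strings.TrimSuffix(strings.TrimPrefix(p, pathPrefix), "/costs")
		if ns == "" || strings.Contains(ns, "/") {
			http.NotFound(w, r)
			return
		}
		cost, err := src.MonthlyCost(ns)
		if err != nil {
			http.Error(w, err.Error(), http.StatusNotFound)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(map[string]string{
			"namespace":   ns,
			"monthlyCost": fmt.Sprintf("$%.2f", cost),
		})
	})
}

// query runs one in-memory request against h and returns status and body.
func query(h http.Handler, path string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	h := costHandler(staticCosts{"production": 1234.56})
	code, body := query(h, "/apis/cost.example.com/v1/namespaces/production/costs")
	fmt.Println(code, body)
}
```

Hiding the data sources behind an interface keeps the handler testable without a cloud account, which matters because the extension server sits on the request path of every client that uses the API group.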
APIService Registration for a Custom API
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1.cost.example.com
spec:
  group: cost.example.com
  version: v1
  service:
    name: cost-api-server
    namespace: kube-system
    port: 443
  groupPriorityMinimum: 1000
  versionPriority: 15
  caBundle: LS0tLS1CRUdJTi...   # Base64-encoded CA certificate
---
apiVersion: v1
kind: Service
metadata:
  name: cost-api-server
  namespace: kube-system
spec:
  selector:
    app: cost-api-server
  ports:
    - port: 443
      targetPort: 8443
      protocol: TCP
7. API Discovery and kubectl Integration
Once an aggregated API is registered, it automatically appears in:
- kubectl api-resources -- lists the resources served by the extension server.
- kubectl api-versions -- lists the API group and version.
- kubectl get <resource> -- works natively with standard kubectl commands.
- kubectl explain <resource> -- shows the resource schema (if an OpenAPI spec is provided).
The main API server merges the extension server's OpenAPI schema into its own, providing a unified API discovery experience.
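For this to work, the extension server has to describe its resources itself. As a rough sketch, the per-group-version discovery document it serves (at a path like /apis/cost.example.com/v1) looks like the structure below. The structs mirror a subset of the `APIResourceList` fields, and the `costs` resource and `Cost` kind are hypothetical; `k8s.io/apiserver` normally generates this document for you.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// apiResource and apiResourceList mirror a subset of the discovery types.
type apiResource struct {
	Name         string   `json:"name"`
	SingularName string   `json:"singularName"`
	Namespaced   bool     `json:"namespaced"`
	Kind         string   `json:"kind"`
	Verbs        []string `json:"verbs"`
}

type apiResourceList struct {
	Kind         string        `json:"kind"`
	APIVersion   string        `json:"apiVersion"`
	GroupVersion string        `json:"groupVersion"`
	Resources    []apiResource `json:"resources"`
}

// newResourceList builds the discovery document for one group/version.
func newResourceList(group, version string, resources ...apiResource) apiResourceList {
	return apiResourceList{
		Kind:         "APIResourceList",
		APIVersion:   "v1",
		GroupVersion: group + "/" + version,
		Resources:    resources,
	}
}

// discoveryHandler serves the document as JSON.
func discoveryHandler(list apiResourceList) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(list)
	})
}

func main() {
	list := newResourceList("cost.example.com", "v1", apiResource{
		Name:         "costs", // hypothetical resource
		SingularName: "cost",
		Namespaced:   true,
		Kind:         "Cost",
		Verbs:        []string{"get", "list"},
	})
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/apis/cost.example.com/v1", nil)
	discoveryHandler(list).ServeHTTP(rec, req)
	fmt.Print(rec.Body.String())
}
```

The `verbs` list is what tells kubectl which operations to offer, so a read-only computed API would advertise only `get` and `list`.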
8. Common Pitfalls
- APIService AVAILABLE=False. If the extension server pods are not running, are crashing, or are unreachable (network policy, wrong Service selector), the APIService shows AVAILABLE=False and all requests to that API group fail. This can cascade -- for example, if metrics-server is down, the HPA cannot read resource metrics and stops scaling on them.
- TLS certificate errors. The main API server must trust the extension server's TLS certificate. If the caBundle in the APIService is wrong, or the serving certificate has expired, the proxy connection fails with an x509 error and the APIService reports AVAILABLE=False. Rotate certificates before they expire.
- Missing front-proxy certificates. The kube-apiserver authenticates itself to the extension server with the certificate from --proxy-client-cert-file and --proxy-client-key-file, and the extension server verifies that certificate against the CA from --requestheader-client-ca-file. If any of these is missing or mismatched, the extension server rejects all proxied requests.
- Performance bottleneck. The main API server proxies all requests synchronously. If the extension server is slow (high latency, unresponsive), it can consume API server goroutines and degrade the entire cluster API. Set aggressive timeouts on the extension server.
- Blocking cluster upgrades. If an APIService is AVAILABLE=False during a cluster upgrade, the upgrade may hang because the upgrade process checks API health. Ensure extension servers are running and healthy during upgrades, or temporarily remove the APIService.
- Confusing CRD and aggregation approaches. Some teams start with CRDs, realize they need computed responses, and attempt to "bolt on" logic via admission webhooks. This works to a point but becomes fragile. If your API fundamentally returns computed data, start with aggregation.
- No high availability. Running a single replica of the extension API server creates a single point of failure for that entire API group. Run at least 2 replicas with readiness probes; leader election is only needed if the server also runs background work that must not execute concurrently.
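The timeout advice in the performance pitfall can be sketched with `http.TimeoutHandler` from the Go standard library. This is one possible approach, not what any particular extension server does; the `withTimeout`, `slowBackend`, and `demoStatuses` names and the chosen durations are illustrative.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// withTimeout bounds how long a request may run. On timeout the client
// receives 503 Service Unavailable instead of waiting indefinitely.
func withTimeout(h http.Handler, d time.Duration) http.Handler {
	return http.TimeoutHandler(h, d, "extension server timed out")
}

// slowBackend simulates a backend call that takes the given time.
// A real handler should also watch r.Context() and stop work early.
func slowBackend(delay time.Duration) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		time.Sleep(delay)
		fmt.Fprint(w, "ok")
	})
}

// statusFor runs one in-memory request and returns the status code.
func statusFor(h http.Handler) int {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/apis/cost.example.com/v1", nil)
	h.ServeHTTP(rec, req)
	return rec.Code
}

// demoStatuses returns the status for a fast and a slow backend,
// both wrapped with the same timeout.
func demoStatuses() (fast, slow int) {
	fast = statusFor(withTimeout(slowBackend(0), 200*time.Millisecond))
	slow = statusFor(withTimeout(slowBackend(300*time.Millisecond), 20*time.Millisecond))
	return fast, slow
}

func main() {
	fast, slow := demoStatuses()
	fmt.Println("fast backend:", fast)
	fmt.Println("slow backend:", slow)
}
```

Failing fast with a 503 frees the proxying connection on the main API server quickly, which limits how far a slow backend can degrade the rest of the cluster API.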
9. What's Next?
- CRDs and Operators: If you decided that CRDs are the right choice, learn how to build CRDs with kubebuilder and implement reconciliation controllers.
- Custom Schedulers: Some custom schedulers expose their own aggregated APIs for scheduling analytics and decision explanations. See Custom Schedulers.
- Monitoring: Monitor APIService health with kubectl get apiservices and set up alerts for AVAILABLE=False conditions. The apiserver_request_total metric (filtered by API group) tracks request volume to aggregated APIs.
- Security: Review the RBAC policies for your aggregated API resources. Aggregated APIs support standard Kubernetes RBAC rules -- you can grant access to specific resources and verbs within the custom API group.
- API Design: Follow the Kubernetes API conventions (metadata, spec, status pattern; list types; watch support) to ensure your aggregated API feels native to Kubernetes users.