The API Aggregation Layer

Key Takeaways for AI & Readers

Beyond CRDs: The API Aggregation Layer extends the Kubernetes API with fully custom API servers that handle their own storage, business logic, and sub-resources -- capabilities that CRDs cannot provide.
Proxy Architecture: The main kube-apiserver acts as a proxy, forwarding requests for specific API groups to extension API servers running as pods in the cluster. Clients interact with a single unified API endpoint.
APIService Resource: The APIService object registers an extension API server with the main API server, specifying which API group and version it handles and where to route requests.
Authentication Flow: The main API server authenticates the client, then forwards the request to the extension server with the user's identity in HTTP headers. The extension server can implement its own authorization logic.
Real-World Examples: metrics-server (the most common aggregated API), the deprecated service-catalog, and custom APIs for hardware inventory, cost management, and multi-cluster federation.
Decision Point: Use CRDs for declarative data stored in etcd; use API Aggregation when you need custom storage backends, computed responses, sub-resource support, or protocol-level control.

Sometimes CRDs (Custom Resource Definitions) are not enough. When you need an API endpoint that computes responses on the fly (like live resource metrics), uses its own storage backend (like a time-series database), or implements complex sub-resources (like exec or logs), you use the API Aggregation Layer.

1. How API Aggregation Works

[Diagram: Aggregator (kube-apiserver), Local Registry (etcd), Extension API Server (Metrics-Server / Prometheus)]

The Aggregation Layer allows you to provide custom APIs that look and act like native Kubernetes objects but are handled by external services.

The API Aggregation Layer extends the Kubernetes API by allowing external API servers to register themselves as handlers for specific API groups. The main kube-apiserver acts as a smart reverse proxy:

A client sends a request to https://<api-server>/apis/metrics.k8s.io/v1beta1/nodes/worker-01.
The main API server checks its list of registered APIService objects.
It finds that metrics.k8s.io/v1beta1 is handled by the metrics-server Service in kube-system.
It proxies the request to the extension API server pod(s) behind that Service.
The extension server processes the request (in this case, returning the node's current CPU and memory usage) and returns the response.
The main API server forwards the response back to the client.

From the client's perspective, the aggregated API is indistinguishable from built-in APIs. It appears in kubectl api-resources, supports kubectl get, and works with client-go and all standard Kubernetes tooling.
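The routing decision in the steps above can be sketched in a few lines of Go. This is a simplified illustration, not the aggregator's actual code: the `registry` map, `backend` type, and `route` function are hypothetical names, and the single registry entry just mirrors the metrics-server example.

```go
package main

import (
	"fmt"
	"strings"
)

// backend identifies the Service behind a registered APIService.
type backend struct {
	Namespace, Name string
}

// registry maps an APIService name ("<version>.<group>") to its backend.
// Hypothetical data: one entry mirroring the metrics-server example.
var registry = map[string]backend{
	"v1beta1.metrics.k8s.io": {Namespace: "kube-system", Name: "metrics-server"},
}

// route returns the extension backend for a request path, or ok=false when
// the path is not registered and would be served by the main API server.
func route(path string) (b backend, ok bool) {
	parts := strings.Split(strings.Trim(path, "/"), "/")
	// Aggregated paths look like /apis/<group>/<version>/...
	if len(parts) < 3 || parts[0] != "apis" {
		return backend{}, false
	}
	b, ok = registry[parts[2]+"."+parts[1]]
	return b, ok
}

func main() {
	for _, p := range []string{
		"/apis/metrics.k8s.io/v1beta1/nodes/worker-01",
		"/apis/apps/v1/deployments",
		"/api/v1/pods",
	} {
		if b, ok := route(p); ok {
			fmt.Printf("%s -> proxy to %s/%s\n", p, b.Namespace, b.Name)
		} else {
			fmt.Printf("%s -> handled locally\n", p)
		}
	}
}
```

The lookup key is the same `<version>.<group>` string used as the APIService object name, which is why that naming format matters.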

Enabling API Aggregation

The aggregation layer is turned on by the front-proxy flags shown below; in most managed Kubernetes services and kubeadm clusters they are already set. The separate --enable-aggregator-routing=true flag is only needed when kube-proxy is not running on the host that runs the kube-apiserver. The kube-apiserver needs the following flags configured:

# API server flags for aggregation (typically already set)
--requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt
--requestheader-allowed-names=front-proxy-client
--requestheader-extra-headers-prefix=X-Remote-Extra-
--requestheader-group-headers=X-Remote-Group
--requestheader-username-headers=X-Remote-User
--proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt
--proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key
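When troubleshooting, it can help to check an API server's argument list against the flags above. The following Go sketch does that; `missingFlags` is a hypothetical helper, the argument list in `main` is made up for illustration, and it assumes every argument uses the `--name=value` form.

```go
package main

import (
	"fmt"
	"strings"
)

// aggregationFlags are the kube-apiserver flags listed above.
var aggregationFlags = []string{
	"--requestheader-client-ca-file",
	"--requestheader-allowed-names",
	"--requestheader-extra-headers-prefix",
	"--requestheader-group-headers",
	"--requestheader-username-headers",
	"--proxy-client-cert-file",
	"--proxy-client-key-file",
}

// missingFlags returns the listed flags that do not appear in args.
// Each argument is expected in the "--name=value" form.
func missingFlags(args []string) []string {
	seen := map[string]bool{}
	for _, a := range args {
		name, _, _ := strings.Cut(a, "=")
		seen[name] = true
	}
	var missing []string
	for _, f := range aggregationFlags {
		if !seen[f] {
			missing = append(missing, f)
		}
	}
	return missing
}

func main() {
	// Hypothetical, truncated argument list for illustration only.
	args := []string{
		"--requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt",
		"--proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt",
	}
	for _, f := range missingFlags(args) {
		fmt.Println("missing:", f)
	}
}
```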

2. CRDs vs. API Aggregation

Choosing between CRDs and API Aggregation is a critical architectural decision. The following comparison covers the dimensions that matter most:

Feature	CRDs	API Aggregation
Complexity	Low -- just a YAML definition	High -- requires building and operating a custom API server
Storage	Uses the cluster's `etcd`	Can use any backend (database, in-memory, external service)
Logic	Declarative; logic lives in controllers	Can implement arbitrary request handling and computed responses
Sub-resources	Limited (`/status`, `/scale` only)	Full control over any sub-resources
Validation	CEL expressions, webhooks	In-code validation (full programmatic control)
Versioning	Conversion webhooks for multi-version	Native multi-version support with in-code conversion
Performance	Bound by etcd read/write throughput	Optimized for specific access patterns
Availability	Tied to etcd and kube-apiserver	Independent failure domain (but must be available for API to work)
Example	Cert-Manager `Certificate`, Argo `Workflow`	`metrics-server`, `service-catalog`

When to Choose CRDs

You want to store declarative configuration (desired state + status).
Your data model fits the Kubernetes resource model (metadata, spec, status).
You are building controllers/operators that reconcile desired state.
You want to leverage existing Kubernetes tooling (kubectl, RBAC, audit logging) without building a custom server.

When to Choose API Aggregation

You need computed responses that do not come from stored data (e.g., real-time metrics, hardware status, cost estimates).
You need a custom storage backend (time-series database, graph database, external SaaS API).
You need custom sub-resources beyond /status and /scale (e.g., /exec, /logs, /proxy).
You need protocol-level control (WebSocket upgrades, streaming responses, custom serialization).
You need high-performance reads that would be bottlenecked by etcd.

3. The APIService Resource

The APIService object is how you register an extension API server with the main kube-apiserver.

# Register the metrics-server as the handler for metrics.k8s.io/v1beta1
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.metrics.k8s.io       # Format: <version>.<group>
spec:
  group: metrics.k8s.io              # API group being served
  version: v1beta1                   # Version being served
  service:
    name: metrics-server             # Service name in the cluster
    namespace: kube-system           # Namespace of the service
    port: 443                        # Port the service listens on
  groupPriorityMinimum: 100         # Priority for API group sorting
  versionPriority: 100              # Priority for version sorting within the group
  insecureSkipTLSVerify: false      # Should be false in production
  caBundle: <base64-encoded-CA>     # CA to verify the extension server's TLS cert
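If you generate registrations programmatically, the same object can be built in Go and emitted as JSON, which kubectl accepts alongside YAML. The sketch below uses small local structs so it stays self-contained; real code would more likely use the typed APIService from k8s.io/kube-aggregator. The struct and function names are hypothetical, and caBundle is deliberately left out and would still need to be supplied.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// serviceRef points at the Service in front of the extension API server.
type serviceRef struct {
	Name      string `json:"name"`
	Namespace string `json:"namespace"`
	Port      int32  `json:"port"`
}

type apiServiceSpec struct {
	Group                string     `json:"group"`
	Version              string     `json:"version"`
	Service              serviceRef `json:"service"`
	GroupPriorityMinimum int32      `json:"groupPriorityMinimum"`
	VersionPriority      int32      `json:"versionPriority"`
}

type apiService struct {
	APIVersion string            `json:"apiVersion"`
	Kind       string            `json:"kind"`
	Metadata   map[string]string `json:"metadata"`
	Spec       apiServiceSpec    `json:"spec"`
}

// newAPIService derives the object name from group and version so the
// "<version>.<group>" convention cannot drift from the spec fields.
func newAPIService(group, version string, svc serviceRef) apiService {
	return apiService{
		APIVersion: "apiregistration.k8s.io/v1",
		Kind:       "APIService",
		Metadata:   map[string]string{"name": version + "." + group},
		Spec: apiServiceSpec{
			Group:                group,
			Version:              version,
			Service:              svc,
			GroupPriorityMinimum: 100,
			VersionPriority:      100,
		},
	}
}

func main() {
	obj := newAPIService("metrics.k8s.io", "v1beta1",
		serviceRef{Name: "metrics-server", Namespace: "kube-system", Port: 443})
	out, err := json.MarshalIndent(obj, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```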

APIService Status

The kube-apiserver periodically probes the extension server by requesting the discovery document for the registered group and version. The APIService status reflects whether the extension server is reachable:

kubectl get apiservices v1beta1.metrics.k8s.io
# NAME                       SERVICE                      AVAILABLE   AGE
# v1beta1.metrics.k8s.io     kube-system/metrics-server   True        30d

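If AVAILABLE is False, requests to that API group fail, typically with a 503 Service Unavailable error. This is a common troubleshooting signal: if kubectl top nodes returns an error, check the APIService status first, then inspect the condition's reason and message with kubectl describe apiservice.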
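The same check can be automated by reading the Available condition from the JSON form of the APIService list. A minimal Go sketch, assuming the input comes from `kubectl get apiservices -o json`; the `sample` constant is a hand-written, trimmed stand-in for real output, and `unavailable` is a hypothetical helper name.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// apiServiceList models only the fields this sketch reads.
type apiServiceList struct {
	Items []struct {
		Metadata struct {
			Name string `json:"name"`
		} `json:"metadata"`
		Status struct {
			Conditions []struct {
				Type   string `json:"type"`
				Status string `json:"status"`
				Reason string `json:"reason"`
			} `json:"conditions"`
		} `json:"status"`
	} `json:"items"`
}

// unavailable returns "name: reason" for every APIService whose
// Available condition is not "True".
func unavailable(raw []byte) ([]string, error) {
	var list apiServiceList
	if err := json.Unmarshal(raw, &list); err != nil {
		return nil, err
	}
	var out []string
	for _, item := range list.Items {
		for _, c := range item.Status.Conditions {
			if c.Type == "Available" && c.Status != "True" {
				out = append(out, item.Metadata.Name+": "+c.Reason)
			}
		}
	}
	return out, nil
}

// sample is illustrative data, not captured from a real cluster.
const sample = `{"items":[
 {"metadata":{"name":"v1.apps"},
  "status":{"conditions":[{"type":"Available","status":"True","reason":"Local"}]}},
 {"metadata":{"name":"v1beta1.metrics.k8s.io"},
  "status":{"conditions":[{"type":"Available","status":"False","reason":"MissingEndpoints"}]}}
]}`

func main() {
	names, err := unavailable([]byte(sample))
	if err != nil {
		panic(err)
	}
	for _, n := range names {
		fmt.Println(n)
	}
}
```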

4. Authentication and Authorization Flow

The authentication and authorization flow for aggregated APIs involves coordination between the main API server and the extension server.

Client ──(1. AuthN)──> kube-apiserver ──(2. Proxy)──> Extension API Server
                                                           |
                                                    (3. AuthZ: SubjectAccessReview)
                                                           |
                                                    (4. Handle Request)
                                                           |
                                                    (5. Return Response)

Authentication (AuthN): The main API server authenticates the client (bearer token, client certificate, OIDC, etc.) as it would for any API request.
Proxy with Identity: The main API server proxies the request to the extension server, injecting the authenticated user's identity into HTTP headers:
- X-Remote-User: The authenticated username.
- X-Remote-Group: The user's groups (one header value per group; the header is repeated when the user belongs to several groups).
- X-Remote-Extra-*: Any extra attributes (e.g., OIDC claims).
The extension server trusts these headers because the connection from the main API server is authenticated with the front-proxy client certificate.
Authorization (AuthZ): The extension server can perform its own authorization. Typically, it delegates back to the main API server by sending a SubjectAccessReview request. This ensures that standard RBAC rules apply to aggregated API resources.
Request Handling: The extension server processes the request using its own logic and storage.
Response: The response flows back through the main API server to the client.
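To make step 2 concrete, here is a sketch of how an extension server might read the forwarded identity using only the Go standard library. Servers built on k8s.io/apiserver get this behavior from the library rather than writing it by hand; the `userInfo` type and `identityFromHeaders` function are hypothetical, and the header names assume the default flag values shown earlier.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// userInfo is the identity forwarded by the main API server.
type userInfo struct {
	Name   string
	Groups []string
	Extra  map[string][]string
}

const extraPrefix = "X-Remote-Extra-"

// identityFromHeaders reads the front-proxy headers. It must only be called
// after the front-proxy client certificate has been verified; otherwise any
// client could claim any identity. Keys are expected in canonical form.
func identityFromHeaders(h map[string][]string) (userInfo, bool) {
	hdr := http.Header(h)
	name := hdr.Get("X-Remote-User")
	if name == "" {
		return userInfo{}, false
	}
	u := userInfo{
		Name:   name,
		Groups: hdr.Values("X-Remote-Group"), // one value per group
		Extra:  map[string][]string{},
	}
	for key, vals := range hdr {
		if strings.HasPrefix(key, extraPrefix) {
			u.Extra[strings.ToLower(strings.TrimPrefix(key, extraPrefix))] = vals
		}
	}
	return u, true
}

func main() {
	h := http.Header{}
	h.Set("X-Remote-User", "alice")
	h.Add("X-Remote-Group", "dev")
	h.Add("X-Remote-Group", "system:authenticated")
	h.Add("X-Remote-Extra-Scopes", "openid")

	u, ok := identityFromHeaders(h)
	fmt.Println(ok, u.Name, u.Groups, u.Extra)
}
```

The resulting user and groups are what the extension server would place in a SubjectAccessReview when it delegates authorization back to the main API server.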

5. Building an Aggregated API Server

Building a custom aggregated API server is a significant undertaking. The Kubernetes project provides the apiserver library (k8s.io/apiserver) as a foundation.

Project Structure

A typical aggregated API server project includes:

my-api-server/
  cmd/
    apiserver/
      main.go                  # Entry point
  pkg/
    apis/
      mygroup/
        types.go               # API types (Go structs)
        v1alpha1/
          types.go             # Versioned types
          register.go          # Register types with the scheme
    registry/
      myresource/
        storage.go             # Storage implementation (REST handlers)
    apiserver/
      apiserver.go             # Server configuration and wiring

Key Components

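// types.go: Define your API types
// (package name assumed from the v1alpha1 directory in the project structure)
package v1alpha1

import (
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// HardwareInventory is the top-level API object served by the extension server.
type HardwareInventory struct {
    metav1.TypeMeta   `json:",inline"`
    metav1.ObjectMeta `json:"metadata,omitempty"`

    Spec   HardwareInventorySpec   `json:"spec"`
    Status HardwareInventoryStatus `json:"status"`
}

// HardwareInventorySpec names the node the inventory refers to.
type HardwareInventorySpec struct {
    NodeName string `json:"nodeName"`
}

// HardwareInventoryStatus holds the reported hardware. GPUInfo, CPUTopology,
// and MemoryInfo are assumed to be defined elsewhere in this package.
type HardwareInventoryStatus struct {
    GPUs    []GPUInfo   `json:"gpus"`
    CPUInfo CPUTopology `json:"cpuInfo"`
    Memory  MemoryInfo  `json:"memory"`
}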

The k8s.io/apiserver library provides:

TLS configuration and certificate management.
Request authentication via the front-proxy headers.
Authorization delegation (SubjectAccessReview).
API discovery and OpenAPI schema generation.
Audit logging integration.
Admission control hooks.

Alternative: Use the apiserver-builder Framework

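The apiserver-builder project (published as apiserver-builder-alpha under the kubernetes-sigs organization) scaffolds aggregated API server projects, similar to how kubebuilder scaffolds CRD-based operators. It generates boilerplate code for types, storage, and server configuration. As the name suggests, it is alpha-quality, so check its maintenance status before relying on it.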

6. Real-World Examples

metrics-server

metrics-server is the most widely deployed aggregated API. It registers itself as the handler for metrics.k8s.io/v1beta1 and serves recent CPU and memory usage data for nodes and pods. The HPA controller queries this API when it scales on resource metrics.

# These commands work because metrics-server is an aggregated API
kubectl top nodes
kubectl top pods -n production

# The API call behind the scenes
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/pods" | jq .

metrics-server does not use etcd. It scrapes the kubelet's /metrics/resource endpoint on each node, aggregates the data in memory, and serves it via the aggregated API.
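The "no etcd" design boils down to a concurrency-safe in-memory map that a scraper writes and the API handlers read. The Go sketch below is a deliberately simplified stand-in, not metrics-server's actual storage; the `store` and `usage` types and the sample values are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// usage holds the latest sample for one node as Kubernetes quantity strings.
type usage struct {
	CPU, Memory string
}

// store keeps the most recent sample per node in memory. Nothing is
// persisted: after a restart the data is rebuilt by the next scrape.
type store struct {
	mu    sync.RWMutex
	nodes map[string]usage
}

func newStore() *store {
	return &store{nodes: map[string]usage{}}
}

// put is called by the scraper after each scrape of a node.
func (s *store) put(node string, u usage) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.nodes[node] = u
}

// get is called by the API handler serving a single node's metrics.
func (s *store) get(node string) (usage, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	u, ok := s.nodes[node]
	return u, ok
}

func main() {
	s := newStore()
	s.put("worker-01", usage{CPU: "250m", Memory: "1Gi"}) // made-up sample
	if u, ok := s.get("worker-01"); ok {
		fmt.Printf("worker-01 cpu=%s memory=%s\n", u.CPU, u.Memory)
	}
}
```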

Custom Cost API

An organization might build an aggregated API that exposes per-namespace or per-pod cost estimates:

# Query cost data from a custom aggregated API
kubectl get --raw "/apis/cost.example.com/v1/namespaces/production/costs" | jq .
# {
#   "monthlyCost": "$1,234.56",
#   "topWorkloads": [
#     {"name": "ml-training", "cost": "$450.00", "resource": "GPU"},
#     {"name": "database", "cost": "$200.00", "resource": "Memory"}
#   ]
# }

This API queries a cloud billing API and a Prometheus instance, computes cost attribution, and returns the result -- none of which is stored in etcd.
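The "computed, not stored" idea might look like the following Go sketch. All names, inputs, and prices here are hypothetical: a real server would fetch usage from Prometheus and prices from the billing API on each request, and the JSON field names are illustrative rather than a defined schema.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// workloadUsage is a hypothetical usage record for one workload.
type workloadUsage struct {
	Name           string
	GPUHours       float64
	MemoryGiBHours float64
}

// prices holds hypothetical unit prices.
type prices struct {
	PerGPUHour       float64
	PerMemoryGiBHour float64
}

// costReport is the response body computed for each request.
type costReport struct {
	MonthlyCost float64            `json:"monthlyCost"`
	ByWorkload  map[string]float64 `json:"byWorkload"`
}

// computeCosts attributes cost to each workload and sums the total.
// The result is derived on the fly and never written to storage.
func computeCosts(us []workloadUsage, p prices) costReport {
	r := costReport{ByWorkload: map[string]float64{}}
	for _, u := range us {
		c := u.GPUHours*p.PerGPUHour + u.MemoryGiBHours*p.PerMemoryGiBHour
		r.ByWorkload[u.Name] += c
		r.MonthlyCost += c
	}
	return r
}

func main() {
	report := computeCosts(
		[]workloadUsage{
			{Name: "ml-training", GPUHours: 10},
			{Name: "database", MemoryGiBHours: 100},
		},
		prices{PerGPUHour: 2, PerMemoryGiBHour: 0.5},
	)
	out, err := json.MarshalIndent(report, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```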

APIService Registration for a Custom API

apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1.cost.example.com
spec:
  group: cost.example.com
  version: v1
  service:
    name: cost-api-server
    namespace: kube-system
    port: 443
  groupPriorityMinimum: 1000
  versionPriority: 15
  caBundle: LS0tLS1CRUdJTi...       # Base64-encoded CA certificate
---
apiVersion: v1
kind: Service
metadata:
  name: cost-api-server
  namespace: kube-system
spec:
  selector:
    app: cost-api-server
  ports:
    - port: 443
      targetPort: 8443
      protocol: TCP
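The caBundle value is the base64 encoding of the PEM-encoded CA certificate, which is why it starts with LS0tLS1CRUdJTi (the encoding of "-----BEGIN"). A small Go sketch of producing it follows; `encodeCABundle` and `demoPEM` are hypothetical helpers, and the demo input is a placeholder PEM block rather than a real certificate.

```go
package main

import (
	"encoding/base64"
	"encoding/pem"
	"errors"
	"fmt"
)

// encodeCABundle checks that the input contains a PEM CERTIFICATE block and
// returns the base64 string expected in the APIService caBundle field.
// It does not validate the certificate itself.
func encodeCABundle(pemBytes []byte) (string, error) {
	block, _ := pem.Decode(pemBytes)
	if block == nil || block.Type != "CERTIFICATE" {
		return "", errors.New("input does not contain a PEM CERTIFICATE block")
	}
	return base64.StdEncoding.EncodeToString(pemBytes), nil
}

// demoPEM returns a syntactically valid PEM block with placeholder bytes.
// It is NOT a usable certificate; read your real CA file instead.
func demoPEM() []byte {
	return pem.EncodeToMemory(&pem.Block{
		Type:  "CERTIFICATE",
		Bytes: []byte("placeholder"),
	})
}

func main() {
	bundle, err := encodeCABundle(demoPEM())
	if err != nil {
		panic(err)
	}
	fmt.Println("caBundle:", bundle)
}
```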

7. API Discovery and kubectl Integration

Once an aggregated API is registered, it automatically appears in:

kubectl api-resources -- lists the resources served by the extension server.
kubectl api-versions -- lists the API group and version.
kubectl get <resource> -- works natively with standard kubectl commands.
kubectl explain <resource> -- shows the resource schema (if OpenAPI spec is provided).

The main API server merges the extension server's OpenAPI schema into its own, providing a unified API discovery experience.

8. Common Pitfalls

APIService AVAILABLE=False. If the extension server pods are not running, crashing, or unreachable (network policy, wrong Service selector), the APIService will show AVAILABLE=False. All requests to that API group will fail. This can cascade -- for example, if metrics-server is down, HPAs that scale on CPU or memory metrics stop scaling.
TLS certificate errors. The main API server must trust the extension server's TLS certificate. If the caBundle in the APIService is wrong or the serving certificate has expired, the proxy connection fails with a TLS error and the APIService reports AVAILABLE=False. Always rotate certificates before expiry.
Missing front-proxy certificates. The kube-apiserver authenticates itself to the extension server with the client certificate from --proxy-client-cert-file and --proxy-client-key-file, and the extension server verifies that certificate against the CA given in --requestheader-client-ca-file. If either side is missing or mismatched, the extension server will reject all proxied requests.
Performance bottleneck. The main API server proxies all requests synchronously. If the extension server is slow (high latency, unresponsive), it can consume API server goroutines and degrade the entire cluster API. Set aggressive timeouts on the extension server, and keep discovery responses fast: the Kubernetes documentation asks that discovery requests round-trip in five seconds or less.
Blocking cluster upgrades. If an APIService is AVAILABLE=False during a cluster upgrade, the upgrade may hang because the upgrade process checks API health. Ensure extension servers are running and healthy during upgrades, or temporarily remove the APIService.
Confusing CRD and aggregation approaches. Some teams start with CRDs, realize they need computed responses, and attempt to "bolt on" logic via admission webhooks. This works to a point but becomes fragile. If your API fundamentally returns computed data, start with aggregation.
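No high availability. Running a single replica of the extension API server creates a single point of failure for that entire API group. Run at least 2 replicas behind the Service with proper readiness probes. Request serving can run on every replica; leader election is only needed for controller loops that run alongside the server.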
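For the timeout advice in the performance pitfall, one option in Go is to wrap the extension server's handlers with the standard library's http.TimeoutHandler, so that a slow backend produces a quick 503 instead of holding the proxied connection open. This is a sketch under that assumption; `withTimeout` and `statusFor` are hypothetical helpers, and the durations are arbitrary.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// withTimeout bounds how long h may take. When the budget is exceeded the
// client (here, the proxying kube-apiserver) receives 503 Service Unavailable.
func withTimeout(h http.Handler, d time.Duration) http.Handler {
	return http.TimeoutHandler(h, d, "backend timed out")
}

// statusFor sends one request to a handler that takes delayMs to respond,
// wrapped with a timeoutMs budget, and returns the resulting HTTP status.
func statusFor(delayMs, timeoutMs int) int {
	slow := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		select {
		case <-time.After(time.Duration(delayMs) * time.Millisecond):
			fmt.Fprint(w, "ok")
		case <-r.Context().Done():
			// The timeout fired; stop working on this request.
		}
	})
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet,
		"/apis/cost.example.com/v1/namespaces/production/costs", nil)
	withTimeout(slow, time.Duration(timeoutMs)*time.Millisecond).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println("fast backend:", statusFor(1, 500))
	fmt.Println("slow backend:", statusFor(500, 10))
}
```

The handler also watches the request context, so work for a timed-out request is abandoned rather than left running in the background.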

9. What's Next?

CRDs and Operators: If you decided that CRDs are the right choice, learn how to build CRDs with kubebuilder and implement reconciliation controllers.
Custom Schedulers: Some custom schedulers expose their own aggregated APIs for scheduling analytics and decision explanations. See Custom Schedulers.
Monitoring: Monitor APIService health with kubectl get apiservices and set up alerts for AVAILABLE=False conditions. The apiserver_request_total metric (filtered by API group) tracks request volume to aggregated APIs.
Security: Review the RBAC policies for your aggregated API resources. Aggregated APIs support standard Kubernetes RBAC rules -- you can grant access to specific resources and verbs within the custom API group.
API Design: Follow the Kubernetes API conventions (metadata, spec, status pattern; list types; watch support) to ensure your aggregated API feels native to Kubernetes users.
