Cluster API (CAPI)
- Declarative Cluster Management: Cluster API (CAPI) extends Kubernetes to manage the lifecycle of Kubernetes clusters themselves using Kubernetes-native APIs. You define your desired cluster state in YAML, and CAPI controllers reconcile it into reality.
- Management Cluster Pattern: A dedicated "Management Cluster" runs the CAPI controllers, provisioning and operating multiple "Workload Clusters" across various infrastructure providers. The management cluster holds all cluster definitions as Custom Resources.
- Infrastructure as Code for Clusters: CAPI enables GitOps for cluster creation, configuration, and upgrades, standardizing cluster operations across different environments. Every cluster change flows through version-controlled manifests.
- Provider Agnostic: It provides a unified way to manage clusters on diverse platforms (AWS, Azure, GCP, VMware, bare metal) through pluggable infrastructure, bootstrap, and control plane providers.
- ClusterClass Templating: ClusterClass resources let you define reusable cluster templates, so teams can stamp out standardized clusters with a single reference and override only what differs.
Cluster API is a Kubernetes sub-project that brings declarative, Kubernetes-style APIs to cluster creation, configuration, and management. Instead of writing bespoke Terraform modules or clicking through cloud consoles, you define your entire cluster as a set of Kubernetes Custom Resources. A central management cluster watches those resources and drives infrastructure to match.
1. Clusters Managing Clusters
The core idea behind CAPI is elegantly recursive: you use a Kubernetes cluster to manage other Kubernetes clusters. The management cluster runs CAPI controllers that watch for Cluster, Machine, and related resources, then call out to cloud APIs to create the infrastructure described by those resources.
[Diagram: a management cluster running CAPI controllers provisions and observes multiple workload clusters across infrastructure providers]
This pattern means that every cluster in your fleet is represented as a set of objects inside the management cluster's etcd. You can query, diff, audit, and GitOps your entire fleet the same way you manage Deployments and Services.
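Because clusters and machines are ordinary API objects, standard kubectl tooling works for fleet inspection. A quick sketch, assuming kubectl points at the management cluster and uses the `clusters` namespace from the examples below:

```shell
# List every workload cluster the management cluster knows about
kubectl get clusters -A

# Inspect the machines (nodes) backing a specific cluster,
# using CAPI's standard cluster-name label
kubectl get machines -n clusters \
  -l cluster.x-k8s.io/cluster-name=production-us-east

# Show the full reconciled state of one cluster object
kubectl describe cluster production-us-east -n clusters
```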
2. Why Use CAPI?
- Standardization: Use the same YAML structure to create a cluster on AWS, Azure, GCP, vSphere, or bare metal. Only the infrastructure-specific fields change; the core lifecycle semantics remain identical.
- Infrastructure as Code: Because a Cluster is just a Custom Resource, you can store it in Git, review changes in pull requests, and use ArgoCD or Flux to manage your entire fleet of Kubernetes clusters declaratively.
- Automated Upgrades: Upgrading a cluster's Kubernetes version is as simple as updating the `version` field in the YAML. CAPI handles the rolling replacement of control plane nodes and worker machines one by one, respecting drain and cordon logic.
- Self-Healing: If a node becomes unhealthy (detected by MachineHealthCheck resources), CAPI automatically removes it and provisions a replacement, keeping your desired machine count stable.
- Day-2 Operations: Beyond initial provisioning, CAPI manages scaling, node rotation, etcd backup coordination, and certificate renewal through its controller ecosystem.
3. Core CAPI Resources
CAPI introduces several Custom Resource Definitions that model the components of a Kubernetes cluster.
Cluster
The top-level resource representing a single Kubernetes cluster. It references an infrastructure-specific resource (for example, an AWSCluster) and a control plane resource.
Machine
Represents a single node in a cluster. Each Machine maps to a virtual machine or bare-metal server. Machines are typically not created directly; they are managed by higher-level abstractions.
MachineDeployment and MachineSet
These mirror the Deployment/ReplicaSet pattern for pods. A MachineDeployment manages MachineSets, which in turn manage Machines. When you update a MachineDeployment (for example, to change the Kubernetes version or instance type), CAPI performs a rolling replacement by creating a new MachineSet and scaling down the old one.
MachineHealthCheck
Watches Machines for unhealthy conditions (for example, a node that has been NotReady for five minutes). When a Machine is deemed unhealthy, the MachineHealthCheck marks it for remediation, and the owning MachineDeployment replaces it.
```yaml
# A complete CAPI Cluster definition for AWS
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: production-us-east
  namespace: clusters
spec:
  clusterNetwork:
    pods:
      cidrBlocks: ["192.168.0.0/16"]   # Pod CIDR for the workload cluster
    services:
      cidrBlocks: ["10.96.0.0/12"]     # Service CIDR
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: production-us-east-cp
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
    kind: AWSCluster
    name: production-us-east
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: AWSCluster
metadata:
  name: production-us-east
  namespace: clusters
spec:
  region: us-east-1
  sshKeyName: capi-key
  network:
    vpc:
      cidrBlock: "10.0.0.0/16"   # VPC CIDR for the cluster
```
4. Infrastructure Providers
Infrastructure providers are controllers that translate generic CAPI resources into cloud-specific API calls. Each provider follows the CAPI contract, implementing the same interfaces with cloud-native semantics.
| Provider | Code Name | Target Platform |
|---|---|---|
| AWS | CAPA | Amazon EC2, EKS |
| Azure | CAPZ | Azure VMs, AKS |
| GCP | CAPG | Google Compute Engine, GKE |
| vSphere | CAPV | VMware vSphere |
| Metal3 | CAPM3 | Bare-metal via Ironic |
| Docker | CAPD | Local development (Docker containers as "machines") |
You install providers into the management cluster using clusterctl, the CAPI CLI tool. A single management cluster can run multiple providers simultaneously, enabling true multi-cloud fleet management.
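As a sketch of the installation flow, `clusterctl init` installs the core components and the named providers into the current kubeconfig context (provider credentials come from environment variables, e.g. `AWS_B64ENCODED_CREDENTIALS` for CAPA):

```shell
# Install core CAPI plus the AWS and Azure infrastructure providers
# into the current kubeconfig context (the management cluster)
clusterctl init --infrastructure aws,azure

# Later, upgrade all installed providers to the latest versions
# compatible with the v1beta1 API contract
clusterctl upgrade apply --contract v1beta1
```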
5. Bootstrap and Control Plane Providers
Bootstrap Providers
Bootstrap providers generate the cloud-init or ignition configuration that turns a bare VM into a Kubernetes node. The most common is the Kubeadm Bootstrap Provider, which generates kubeadm init and kubeadm join configurations.
Control Plane Providers
Control plane providers manage the lifecycle of the Kubernetes control plane. The KubeadmControlPlane provider handles etcd membership, API server certificates, and rolling upgrades of control plane nodes. It ensures that only one control plane machine is replaced at a time and that etcd quorum is maintained throughout.
```yaml
# KubeadmControlPlane with upgrade configuration
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
metadata:
  name: production-us-east-cp
  namespace: clusters
spec:
  replicas: 3          # 3 control plane nodes for HA
  version: v1.29.2     # Target Kubernetes version
  machineTemplate:
    infrastructureRef:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
      kind: AWSMachineTemplate
      name: production-cp-template
  kubeadmConfigSpec:
    clusterConfiguration:
      apiServer:
        extraArgs:
          audit-log-maxage: "30"       # Keep audit logs for 30 days
          audit-log-path: /var/log/audit.log
    initConfiguration:
      nodeRegistration:
        kubeletExtraArgs:
          cloud-provider: external     # Use external cloud controller
  rolloutStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1      # Create 1 new node before removing an old one
```
6. MachineDeployments for Worker Nodes
Worker nodes are managed through MachineDeployments. This mirrors how you manage application pods with Deployments, giving you rollback, scaling, and rolling update capabilities for your cluster's compute layer.
```yaml
# MachineDeployment for a pool of worker nodes
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: production-us-east-workers
  namespace: clusters
spec:
  clusterName: production-us-east
  replicas: 5            # Desired number of worker nodes
  selector:
    matchLabels: {}
  template:
    spec:
      clusterName: production-us-east
      version: v1.29.2   # Must match control plane version
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: production-workers-config
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
        kind: AWSMachineTemplate
        name: production-workers-template
---
# MachineHealthCheck to auto-replace failed nodes
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineHealthCheck
metadata:
  name: production-us-east-mhc
  namespace: clusters
spec:
  clusterName: production-us-east
  selector:
    matchLabels:
      cluster.x-k8s.io/deployment-name: production-us-east-workers
  unhealthyConditions:
    - type: Ready
      status: "False"
      timeout: 5m        # Node unhealthy for 5 minutes triggers replacement
    - type: Ready
      status: Unknown
      timeout: 5m
  maxUnhealthy: "40%"    # Don't remediate if >40% of nodes are unhealthy
```
7. ClusterClass: Templated Cluster Creation
ClusterClass, introduced alongside the CAPI v1beta1 API, enables you to define a reusable cluster topology template. Instead of duplicating hundreds of lines of YAML for each new cluster, you define a ClusterClass once and reference it.
```yaml
# Reference a ClusterClass to create a cluster from a template
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: dev-team-alpha
  namespace: clusters
spec:
  topology:
    class: standard-aws-cluster   # Reference to the ClusterClass
    version: v1.29.2
    controlPlane:
      replicas: 1                 # Dev clusters get 1 CP node
    workers:
      machineDeployments:
        - class: default-worker
          name: md-0
          replicas: 2             # Override worker count
```
Platform teams define the ClusterClass with all the defaults (networking, machine types, security groups). Application teams consume it with just a few lines, overriding only what they need. This separation of concerns is critical for organizations managing dozens or hundreds of clusters.
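For context, the producer side might look like the following abbreviated sketch. The template names are placeholders, and a real ClusterClass would also define variables and patches for the fields teams may override:

```yaml
# Abbreviated ClusterClass: the reusable topology that
# 'standard-aws-cluster' clusters are stamped from
apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
metadata:
  name: standard-aws-cluster
  namespace: clusters
spec:
  controlPlane:
    ref:
      apiVersion: controlplane.cluster.x-k8s.io/v1beta1
      kind: KubeadmControlPlaneTemplate
      name: standard-cp-template
    machineInfrastructure:
      ref:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
        kind: AWSMachineTemplate
        name: standard-cp-machines
  infrastructure:
    ref:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
      kind: AWSClusterTemplate
      name: standard-aws-template
  workers:
    machineDeployments:
      - class: default-worker    # Consumed by spec.topology.workers above
        template:
          bootstrap:
            ref:
              apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
              kind: KubeadmConfigTemplate
              name: standard-worker-bootstrap
          infrastructure:
            ref:
              apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
              kind: AWSMachineTemplate
              name: standard-worker-machines
```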
8. Cluster Upgrades with CAPI
One of CAPI's strongest features is declarative cluster upgrades. To upgrade a cluster from Kubernetes 1.28 to 1.29, you update the version field in the KubeadmControlPlane and MachineDeployment resources. CAPI then orchestrates the upgrade automatically:
- Control plane first: CAPI creates a new control plane machine running 1.29, waits for it to join the cluster and become healthy, then removes an old 1.28 machine. This repeats for each control plane node, maintaining etcd quorum throughout.
- Workers second: CAPI creates a new MachineSet with the updated version, gradually scales it up, and scales down the old MachineSet. Nodes are drained before deletion, so workloads are migrated gracefully.
- Rollback: If the new machines fail health checks, CAPI halts the rollout. You can revert by changing the version field back.
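In practice the upgrade is a small declarative edit. A sketch of just the fields you would change, using the resources from the earlier examples:

```yaml
# KubeadmControlPlane: bump the control plane version first
spec:
  version: v1.29.2   # was v1.28.x; CAPI rolls control plane machines one at a time
---
# MachineDeployment: bump workers once the control plane is healthy
spec:
  template:
    spec:
      version: v1.29.2   # triggers a new MachineSet and a rolling replacement
```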
9. Multi-Cluster Management Patterns
In production environments, CAPI is typically combined with GitOps tools:
- ArgoCD + CAPI: Store all Cluster manifests in Git. ArgoCD watches the repository and applies changes to the management cluster. A pull request to change a cluster's version triggers an automated upgrade.
- Fleet management: Use ArgoCD ApplicationSets to generate one Application per cluster, deploying shared infrastructure (monitoring, logging, policy engines) to every workload cluster automatically.
- Management cluster HA: The management cluster itself should be highly available, with its own backups and disaster recovery plan. Some organizations use a "bootstrap" cluster to create the management cluster, then pivot CAPI to manage itself.
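A sketch of the fleet-management pattern with an ArgoCD ApplicationSet; the repository URL and path are placeholders, and it assumes each workload cluster has been registered with ArgoCD:

```yaml
# Deploy a shared monitoring stack to every registered workload cluster
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: fleet-monitoring
  namespace: argocd
spec:
  generators:
    - clusters: {}   # One Application per cluster known to ArgoCD
  template:
    metadata:
      name: 'monitoring-{{name}}'           # {{name}} comes from the generator
    spec:
      project: default
      source:
        repoURL: https://example.com/fleet-addons.git   # Placeholder repo
        path: monitoring
        targetRevision: main
      destination:
        server: '{{server}}'                # Target cluster's API endpoint
        namespace: monitoring
      syncPolicy:
        automated: {}
```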
Common Pitfalls
- Management cluster as a single point of failure: If your management cluster goes down, you cannot provision or upgrade workload clusters. Ensure it is highly available and backed up. Workload clusters continue to run independently, but lifecycle operations are blocked.
- Provider version mismatches: CAPI core and infrastructure providers must be compatible. Always check the provider compatibility matrix before upgrading.
- Ignoring MachineHealthChecks: Without them, failed nodes sit indefinitely. Always configure MachineHealthChecks with appropriate thresholds and set
maxUnhealthyto prevent cascading remediation during large-scale failures. - Skipping
clusterctl move: When migrating the management cluster, useclusterctl moveto transfer all CAPI objects. Manual copying misses owner references and secrets. - Overly permissive cloud credentials: The management cluster's cloud credentials can create and destroy infrastructure. Follow least-privilege principles and rotate credentials regularly.
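A sketch of the management-cluster migration with `clusterctl move`; the kubeconfig paths are placeholders:

```shell
# Transfer all CAPI objects (Clusters, Machines, Secrets, and their
# owner references) from the old management cluster to the new one
clusterctl move \
  --kubeconfig old-mgmt.kubeconfig \
  --to-kubeconfig new-mgmt.kubeconfig \
  --namespace clusters
```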
What's Next?
- Explore ClusterResourceSets for automatically deploying CNI plugins and other add-ons to new workload clusters.
- Investigate CAPI Operator for managing CAPI provider installations declaratively.
- Learn about cluster autoscaler integration with CAPI for dynamic worker node scaling based on workload demand.
- Study Cluster API Provider Helm (CAAPH) for deploying Helm charts as part of cluster provisioning.
- Review the Cluster API Book for the official documentation and provider guides.