ReplicaSets: The Reconcilers
- Self-Healing: ReplicaSets maintain the desired count of Pods by constantly reconciling the actual state with the specified configuration. If a Pod crashes or is deleted, the ReplicaSet controller creates a replacement within seconds.
- Label-Based Tracking: They identify "their" Pods using label selectors rather than direct ownership. Any Pod matching the selector and without an existing controller owner is eligible for adoption.
- Deployment-Managed: In modern Kubernetes, you rarely manage ReplicaSets directly; instead, you manage Deployments which orchestrate ReplicaSets to enable rolling updates, rollbacks, and revision history.
- Owner References: Each Pod created by a ReplicaSet carries an `ownerReferences` metadata field that links it back to the controlling ReplicaSet, enabling garbage collection when the ReplicaSet is deleted.
While you usually manage Deployments, under the hood, the Deployment creates and manages ReplicaSets. A ReplicaSet is the component that actually ensures the "Desired Number" of Pods is running at any given time. Understanding how ReplicaSets work gives you deep insight into Kubernetes self-healing and the entire controller pattern.
1. How the Reconciliation Loop Works
The ReplicaSet controller runs a continuous reconciliation loop inside the kube-controller-manager. On every iteration, the controller:
- Observes the current state by listing all Pods that match its label selector.
- Compares the count of running Pods against the `.spec.replicas` field (the desired state).
- Acts to close the gap: it creates new Pods if there are too few, or deletes excess Pods if there are too many.
This loop runs every time a relevant event occurs (Pod creation, deletion, status change) and also on a periodic resync interval. The result is that the cluster continuously converges toward the desired state without manual intervention.
What Happens When a Pod Dies?
When a node fails or a Pod crashes, the kubelet stops reporting the Pod as Running. The ReplicaSet controller detects that the actual count has dropped below the desired count and immediately schedules a replacement Pod. The new Pod is assigned to a healthy node by the scheduler. This entire process typically completes in a few seconds.
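You can watch this self-healing behavior in a live cluster. The snippet below assumes the `frontend` ReplicaSet from the manifest in the next section is already applied; `<pod-name>` is a placeholder for any Pod it owns:

```shell
# Delete one Pod owned by the ReplicaSet
kubectl delete pod <pod-name>

# Watch a replacement Pod appear within seconds
kubectl get pods -w

# Compare desired vs. current vs. ready replica counts
kubectl get rs frontend
```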
2. ReplicaSet YAML Example
Here is a complete ReplicaSet manifest:
```yaml
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: frontend
  labels:
    app: guestbook
    tier: frontend
spec:
  replicas: 3
  selector:
    matchLabels:
      app: guestbook
      tier: frontend
  template:
    metadata:
      labels:
        app: guestbook
        tier: frontend
    spec:
      containers:
      - name: php-redis
        image: gcr.io/google-samples/gb-frontend:v5
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 250m
            memory: 256Mi
```
Key fields to understand:
| Field | Purpose |
|---|---|
| `spec.replicas` | The desired number of Pod replicas. Defaults to 1 if omitted. |
| `spec.selector` | A label selector that determines which Pods the ReplicaSet manages. Must match `.spec.template.metadata.labels`. |
| `spec.template` | The Pod template used to create new replicas. This defines containers, volumes, and all Pod-level configuration. |
3. How They Find Pods: Label Selectors
ReplicaSets use label selectors to identify which Pods belong to them. The controller does not track Pods by name or by an internal list. Instead, it queries the Kubernetes API for all Pods whose labels match the selector defined in .spec.selector.
matchLabels vs. matchExpressions
ReplicaSets support two forms of selector:
```yaml
# Simple equality-based selector
selector:
  matchLabels:
    app: web
    tier: frontend
```

```yaml
# Set-based selector for more complex logic
selector:
  matchExpressions:
  - key: app
    operator: In
    values: [web, api]
  - key: environment
    operator: NotIn
    values: [staging]
```
The `matchExpressions` form supports the operators `In`, `NotIn`, `Exists`, and `DoesNotExist`, giving you more flexible Pod selection.
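The same set-based syntax works on the command line, so you can preview exactly which Pods a selector would capture before putting it in a manifest. The labels here match the example selector above:

```shell
# List Pods the set-based selector above would match
kubectl get pods -l 'app in (web,api),environment notin (staging)'
```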
Adoption and Orphaned Pods
If you manually create a Pod whose labels match a ReplicaSet's selector, the ReplicaSet adopts that Pod. If the ReplicaSet already has enough replicas, it will terminate the excess Pod. This behavior catches many beginners off guard.
Conversely, if you remove the matching labels from a Pod, the ReplicaSet releases (orphans) it. The Pod continues to run but is no longer managed by the ReplicaSet. The ReplicaSet then sees one fewer replica than desired and creates a replacement.
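You can trigger this orphaning deliberately, which is occasionally useful for debugging a single misbehaving Pod without the controller deleting it. The snippet assumes the `guestbook` frontend labels from earlier; `<pod-name>` is a placeholder:

```shell
# Strip the labels the selector matches on (trailing '-' removes a label)
kubectl label pod <pod-name> app- tier-

# The orphaned Pod keeps running; the ReplicaSet spins up a replacement
kubectl get pods -l app=guestbook,tier=frontend
```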
4. Ownership References
Every Pod created by a ReplicaSet carries an `ownerReferences` field in its metadata:

```yaml
metadata:
  ownerReferences:
  - apiVersion: apps/v1
    kind: ReplicaSet
    name: frontend
    uid: a1b2c3d4-e5f6-7890-abcd-ef1234567890
    controller: true
    blockOwnerDeletion: true
```
This reference serves two purposes:
- Garbage Collection: When the ReplicaSet is deleted, the garbage collector automatically deletes all Pods it owns (cascade delete). You can override this with `--cascade=orphan` in `kubectl delete` to leave the Pods running.
- Conflict Prevention: The owner reference prevents two controllers from fighting over the same Pod. Only the controller listed with `controller: true` manages the Pod.
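You can inspect ownership directly with a JSONPath query, or exercise the orphan behavior yourself. `<pod-name>` below is a placeholder for a Pod owned by the `frontend` ReplicaSet:

```shell
# Show which controller owns a Pod
kubectl get pod <pod-name> \
  -o jsonpath='{.metadata.ownerReferences[0].kind}/{.metadata.ownerReferences[0].name}'

# Delete the ReplicaSet but leave its Pods running (orphaned)
kubectl delete rs frontend --cascade=orphan
```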
5. Scaling a ReplicaSet
You can scale a ReplicaSet imperatively:
```shell
# Scale to 5 replicas
kubectl scale replicaset frontend --replicas=5

# Verify the scaling
kubectl get replicaset frontend
```
Or declaratively by updating the manifest and applying it:
```shell
kubectl apply -f frontend-replicaset.yaml
```
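A middle ground between the two approaches is patching just the replica count from the command line, which avoids editing the file for a one-off change:

```shell
# Patch only .spec.replicas, leaving the rest of the manifest untouched
kubectl patch replicaset frontend --type merge -p '{"spec":{"replicas":5}}'
```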
The Horizontal Pod Autoscaler (HPA) can also target a ReplicaSet, although in practice you should point the HPA at the Deployment instead:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: frontend-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: ReplicaSet
    name: frontend
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```
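The reason to prefer the Deployment as the HPA target is that rolling updates replace the ReplicaSet, which would strand an HPA pointed at the old one. The only change needed is the `scaleTargetRef` (this sketch assumes a Deployment named `frontend` exists):

```yaml
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment   # survives rolling updates, unlike a specific ReplicaSet
    name: frontend
```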
6. Why Not Use ReplicaSets Directly?
You should almost always use a Deployment instead of a ReplicaSet. Here is why:
| Capability | ReplicaSet | Deployment |
|---|---|---|
| Maintain N replicas | Yes | Yes (via ReplicaSet) |
| Rolling updates | No | Yes |
| Rollback to previous version | No | Yes (`kubectl rollout undo`) |
| Revision history | No | Yes (configurable via `revisionHistoryLimit`) |
| Pause and resume rollouts | No | Yes |
| Declarative update strategy | No | Yes (`RollingUpdate`, `Recreate`) |
When a Deployment performs a rolling update, it creates a new ReplicaSet with the updated Pod template and gradually scales it up while scaling down the old ReplicaSet. This is how zero-downtime deployments work. The old ReplicaSet is kept (with 0 replicas) so you can roll back later.
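The `kubectl rollout` family of commands drives this mechanism. The examples below assume a Deployment named `nginx-deployment`:

```shell
# Watch a rolling update as the new ReplicaSet scales up
kubectl rollout status deployment/nginx-deployment

# List the stored revisions (each backed by a retained ReplicaSet)
kubectl rollout history deployment/nginx-deployment

# Roll back by scaling the old ReplicaSet back up
kubectl rollout undo deployment/nginx-deployment --to-revision=1
```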
The Deployment-ReplicaSet Relationship
```
Deployment (nginx-deployment)
├── ReplicaSet (nginx-deployment-6b8f7c5d9)   # revision 2 - current (3 replicas)
└── ReplicaSet (nginx-deployment-4a2e1b8c3)   # revision 1 - previous (0 replicas)
```
You can see this relationship by running:
```shell
kubectl get replicasets -l app=nginx
```
Each ReplicaSet corresponds to a unique Pod template version. The Deployment tracks which ReplicaSets represent which revision.
7. Common Pitfalls
- Mismatched labels: The labels in `.spec.selector` must match the labels in `.spec.template.metadata.labels`. If they do not match, the API server rejects the manifest. This is a validation rule, not a runtime error.
- Overlapping selectors: If two ReplicaSets have the same selector, they will fight over the same Pods. Each controller sees the Pods as its own and will try to reconcile independently. This leads to constant Pod creation and deletion. Always use unique label combinations per ReplicaSet.
- Editing a ReplicaSet template does not update existing Pods: Changing the Pod template in a ReplicaSet only affects newly created Pods. Existing Pods continue running with the old configuration. This is the primary reason Deployments exist -- they automate the process of rolling out a new ReplicaSet when the template changes.
- Accidentally adopting Pods: If you create standalone Pods with labels that match an existing ReplicaSet's selector, the ReplicaSet adopts them and may terminate them if the replica count is already satisfied.
- Deleting a ReplicaSet without understanding cascade behavior: By default, deleting a ReplicaSet deletes all its Pods. Use `kubectl delete rs frontend --cascade=orphan` if you want the Pods to survive.
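The first pitfall is easy to reproduce. This deliberately broken manifest (names here are illustrative) is rejected at apply time with a validation error along the lines of "`selector` does not match template `labels`":

```yaml
# REJECTED by the API server: selector and template labels disagree
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: broken
spec:
  replicas: 1
  selector:
    matchLabels:
      app: web          # the selector expects app=web ...
  template:
    metadata:
      labels:
        app: website    # ... but the template sets app=website
    spec:
      containers:
      - name: nginx
        image: nginx:1.25
```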
8. Best Practices
- Always use Deployments: Create Deployments and let them manage ReplicaSets for you. This gives you rolling updates, rollbacks, and revision history out of the box.
- Set resource requests and limits: Define `resources.requests` and `resources.limits` in your Pod template so the scheduler can make intelligent placement decisions and the kubelet can enforce resource boundaries.
- Use Pod Disruption Budgets (PDBs): Pair your ReplicaSets (via Deployments) with PDBs to ensure that voluntary disruptions like node drains do not take down too many replicas simultaneously.
- Label everything consistently: Use a standard labeling scheme (`app`, `tier`, `version`, `environment`) across all workloads to avoid selector collisions and simplify observability.
- Monitor replica counts: Set up alerts for when the number of ready replicas diverges from the desired count for an extended period. This can indicate scheduling failures, resource exhaustion, or image pull errors.
What's Next?
- StatefulSets: Learn how Kubernetes handles workloads that need stable identity and persistent storage, such as databases.
- DaemonSets: Understand how to run exactly one Pod per node for infrastructure agents like log collectors and monitoring daemons.
- Health Checks: Configure liveness and readiness probes so your ReplicaSets can accurately determine Pod health.
- Resource Management: Learn how to set CPU and memory requests and limits in your Pod templates.