Deployments: Managed Updates
- Role: A Deployment manages ReplicaSets to provide declarative updates for Pods. It is the most commonly used workload controller in Kubernetes.
- Strategies: `RollingUpdate` (gradual replacement, the default) vs. `Recreate` (kill all old Pods before creating new ones; causes downtime but avoids version mixing).
- Rolling Update Tuning: `maxSurge` controls how many extra Pods can exist during updates; `maxUnavailable` controls how many Pods can be unavailable. For zero downtime, set `maxUnavailable: 0` and `maxSurge: 1`.
- Rollback: Built-in revision history allows instant rollback to a previous revision (up to `revisionHistoryLimit`) with `kubectl rollout undo`.
- Advanced Patterns: Blue-green and canary deployments are achievable with multiple Deployments and Service selector manipulation.
1. Deep Dive: RollingUpdate Logic
The default strategy is RollingUpdate. It replaces Pods gradually. You control the speed and safety with two parameters:
maxSurge (Default: 25%)
- Definition: How many extra pods can be created above the desired replica count.
- Example: Replicas=4, maxSurge=25% (1 pod).
- Result: During update, you might have up to 5 pods running (4 old + 1 new).
- Higher Value: Faster rollout, but consumes more CPU/RAM quota.
maxUnavailable (Default: 25%)
- Definition: How many pods can be down during the update.
- Example: Replicas=4, maxUnavailable=25% (1 pod).
- Result: You are guaranteed to have at least 3 pods running at all times.
- Zero Value: Setting this to 0 ensures 100% capacity is maintained, but requires `maxSurge > 0`.
Pro Tip: For critical high-availability apps, set maxUnavailable: 0 and maxSurge: 1. This ensures you never drop below full capacity.
Rolling Update Math: Worked Examples
Understanding the exact numbers helps you predict capacity requirements during rollouts. Kubernetes applies these rounding rules: maxSurge rounds up (ceil) and maxUnavailable rounds down (floor). Absolute values (e.g., maxSurge: 2) need no rounding.
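The rounding can be sketched in a few lines of shell arithmetic. The 10-replica figures below are illustrative, chosen because they show the ceil/floor asymmetry (in the 4-replica examples that follow, both percentages happen to resolve to 1):

```shell
#!/bin/sh
# Effective maxSurge (rounds up) and maxUnavailable (rounds down)
# for percentage-based settings on a Deployment.
replicas=10
surge_pct=25      # maxSurge: 25%
unavail_pct=25    # maxUnavailable: 25%

# ceil(replicas * pct / 100) via integer arithmetic
max_surge=$(( (replicas * surge_pct + 99) / 100 ))
# floor(replicas * pct / 100)
max_unavailable=$(( replicas * unavail_pct / 100 ))

echo "maxSurge=$max_surge maxUnavailable=$max_unavailable"
# With 10 replicas: ceil(2.5) = 3, floor(2.5) = 2
```

Because surge rounds up and unavailability rounds down, percentage settings never resolve to "zero progress allowed" on small replica counts.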
Example 1: Default 25%/25% with 4 replicas
```yaml
spec:
  replicas: 4
  strategy:
    rollingUpdate:
      maxSurge: 25%       # ceil(4 * 0.25) = ceil(1.0) = 1
      maxUnavailable: 25% # floor(4 * 0.25) = floor(1.0) = 1
```
- Max total pods during update: 4 + 1 = 5 (desired + maxSurge)
- Min available pods during update: 4 - 1 = 3 (desired - maxUnavailable)
- Rollout sequence: Kubernetes can kill 1 old Pod and create 1 new Pod simultaneously, keeping between 3 and 5 Pods running at all times. Each new Pod must pass readiness before the next old Pod is terminated.
Example 2: Zero-Downtime (maxUnavailable: 0, maxSurge: 1)
```yaml
spec:
  replicas: 4
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
```
- Max total pods: 4 + 1 = 5
- Min available pods: 4 - 0 = 4 (capacity never drops)
- Rollout sequence: One new Pod is created. Only after it passes readiness is one old Pod terminated. Then the next new Pod is created. This is the slowest but safest strategy.
Example 3: Fast Rollout (maxUnavailable: 0, maxSurge: 100%)
```yaml
spec:
  replicas: 4
  strategy:
    rollingUpdate:
      maxSurge: 100% # ceil(4 * 1.0) = 4
      maxUnavailable: 0
```
- Max total pods: 4 + 4 = 8 (200% capacity)
- Min available pods: 4 - 0 = 4
- Rollout sequence: All 4 new Pods are created at once. As each passes readiness, a corresponding old Pod is terminated. This is essentially a blue-green deployment using the rolling update mechanism.
| Strategy | Peak Pods | Extra Compute | Rollout Speed |
|---|---|---|---|
| Default (25%/25%) | 5 | +25% | Moderate |
| Zero-downtime (0/1) | 5 | +25% | Slow |
| Fast (0/100%) | 8 | +100% | Fast |
| Recreate | 4 | +0% | Fastest (with downtime) |
If the Cluster Autoscaler needs to provision new nodes to accommodate the surge, your rollout will stall waiting for node readiness (typically 1-3 minutes on cloud providers). Factor this into your progressDeadlineSeconds.
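Taken together, a surge-friendly manifest might look like the sketch below. The field placements are standard (`progressDeadlineSeconds` sits at the top level of `spec`), while the 900-second deadline is an illustrative value chosen to absorb autoscaler node-provisioning delays:

```yaml
spec:
  replicas: 4
  progressDeadlineSeconds: 900 # illustrative; the default is 600
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 100%
      maxUnavailable: 0
```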
Revision History Limit
By default, Kubernetes keeps 10 old ReplicaSets (spec.revisionHistoryLimit: 10) so you can roll back to previous versions. Each old ReplicaSet stores the complete Pod template of that revision in etcd, even when scaled to zero replicas.
| Setting | Behavior | Best For |
|---|---|---|
| `0` | No rollback possible; old ReplicaSets are deleted immediately | Not recommended |
| `2-5` | Keeps recent history, low etcd footprint | Most teams |
| `10` (default) | Comfortable history for long release cycles | Teams with infrequent deployments |
| `50+` | Wastes etcd storage, clutters `kubectl get rs` output | Not recommended |
At scale (hundreds of Deployments), high revisionHistoryLimit values contribute to etcd storage bloat. Each old ReplicaSet is a full API object stored in etcd.
GitOps recommendation: If you use ArgoCD or Flux, set revisionHistoryLimit: 2. Rollbacks are performed via git revert, not kubectl rollout undo, so you rarely need the Kubernetes-side revision history.
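Trimming the history is a one-line change in the Deployment spec:

```yaml
spec:
  revisionHistoryLimit: 2 # keep only the two most recent old ReplicaSets
```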
Progress Deadline Seconds
How long should Kubernetes wait for a Deployment to make progress before marking it as failed?
- Default: 600 seconds (10 minutes).
- Behavior: If the Deployment makes no progress (no new Pods becoming ready) for this duration, the controller sets the `Progressing` condition to `False` with reason `ProgressDeadlineExceeded`.
```bash
# Check the deployment condition message
kubectl get deployment my-app -o jsonpath='{.status.conditions[?(@.type=="Progressing")].message}'
```
Common causes of a stuck rollout:
- ImagePullBackOff: Wrong image tag or missing registry credentials.
- Insufficient resources: The cluster does not have enough CPU/memory to schedule the new Pods. Check `kubectl describe pod` for `FailedScheduling` events.
- Failing readiness probes: The new Pod starts but never passes its readiness check.
- ResourceQuota exceeded: The namespace quota forbids creating additional Pods.
kubectl rollout status deployment/my-app exits with code 1 when ProgressDeadlineExceeded is reached. Use this in CI/CD pipelines to automatically fail the deployment step:
```bash
kubectl rollout status deployment/my-app --timeout=300s || {
  echo "Deployment failed, triggering rollback"
  kubectl rollout undo deployment/my-app
  exit 1
}
```
The Hidden Cost of Rolling Updates
When you set maxSurge: 100%, Kubernetes creates a full set of new Pods before deleting the old ones.
- Resource Spike: For a brief window, you need 200% capacity (old + new).
- Cloud Bill: If your cluster autoscaler spins up new nodes to accommodate this surge, you pay for those extra nodes.
- Mitigation: Use a lower `maxSurge` (e.g., 25%) if you are budget-constrained, at the cost of a slower rollout.
2. Managing Rollouts
Check Status
```bash
kubectl rollout status deployment/my-app
```
Waits until the rollout finishes. Useful in CI/CD scripts!
Pause & Resume
You can pause a rollout to verify a "canary" set of pods before letting it finish.
```bash
kubectl rollout pause deployment/my-app
# ... verify the new version ...
kubectl rollout resume deployment/my-app
```
Rollback (The "Undo" Button)
If you deploy v2 and it's crashing, you can instantly revert.
```bash
kubectl rollout undo deployment/my-app
```
This updates the Deployment to use the previous ReplicaSet revision.
3. Deployment Patterns
Blue/Green (Not native, but possible)
A deployment strategy that ensures zero downtime by running two identical environments, one live ("Blue") and one new ("Green").
1. Deploy Blue (v1): Start with your initial version.
```yaml
# blue-v1-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-blue
  labels:
    app: my-app
    version: v1
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
      version: v1
  template:
    metadata:
      labels:
        app: my-app
        version: v1
    spec:
      containers:
      - name: my-app
        image: my-repo/my-app:v1.0
        ports:
        - containerPort: 80
```

```yaml
# service.yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app-service
spec:
  selector:
    app: my-app
    version: v1 # Initially points to blue
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
  type: LoadBalancer # Or ClusterIP/NodePort
```

Apply:

```bash
kubectl apply -f blue-v1-deployment.yaml -f service.yaml
```
2. Deploy Green (v2): Deploy the new version in parallel. It will not receive traffic yet.
```yaml
# green-v2-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-green
  labels:
    app: my-app
    version: v2
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
      version: v2
  template:
    metadata:
      labels:
        app: my-app
        version: v2
    spec:
      containers:
      - name: my-app
        image: my-repo/my-app:v2.0
        ports:
        - containerPort: 80
```

Apply:

```bash
kubectl apply -f green-v2-deployment.yaml
```

Wait for the `my-app-green` Pods to be healthy.
3. Switch Traffic: Update the Service selector to point to the new version. This is an instant switch.

```bash
kubectl patch service my-app-service -p '{"spec":{"selector":{"version":"v2"}}}'
```

Now all traffic goes to `my-app-green`.
4. Monitor & Cleanup: If v2 is stable, you can safely delete the `my-app-blue` Deployment. If not, patch the Service selector back to `version: v1`.

```bash
kubectl delete deployment my-app-blue
```
Canary (Native-ish)
A strategy where a new version (canary) is rolled out to a small subset of users, observed for stability, and then gradually rolled out to the entire user base.
1. Primary Deployment (v1): Your current stable version.
```yaml
# primary-v1-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-primary
  labels:
    app: my-app
    version: v1 # Primary version
spec:
  replicas: 9
  selector:
    matchLabels:
      app: my-app
      version: v1
  template:
    metadata:
      labels:
        app: my-app
        version: v1
    spec:
      containers:
      - name: my-app
        image: my-repo/my-app:v1.0
        ports:
        - containerPort: 80
```

Apply:

```bash
kubectl apply -f primary-v1-deployment.yaml
```
2. Canary Deployment (v2): A small deployment of the new version.
```yaml
# canary-v2-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-canary
  labels:
    app: my-app
    version: v2 # Canary version
spec:
  replicas: 1 # Small percentage of traffic
  selector:
    matchLabels:
      app: my-app
      version: v2
  template:
    metadata:
      labels:
        app: my-app
        version: v2
    spec:
      containers:
      - name: my-app
        image: my-repo/my-app:v2.0
        ports:
        - containerPort: 80
```

Apply:

```bash
kubectl apply -f canary-v2-deployment.yaml
```
3. Service: Both primary and canary Deployments are targeted by the same Service, which balances traffic between them.
```yaml
# service.yaml (Ensure this exists and targets 'app: my-app')
apiVersion: v1
kind: Service
metadata:
  name: my-app-service
spec:
  selector:
    app: my-app # Targets both v1 and v2 deployments
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
  type: LoadBalancer # Or ClusterIP/NodePort
```

With `primary` at 9 replicas and `canary` at 1 replica, the Service will naturally distribute traffic approximately 9:1.
4. Monitor & Scale: Monitor the health and performance of the canary (v2).
- If stable, gradually scale up the canary deployment or perform a rolling update on the primary deployment with the new version and then delete the canary.
- If issues are found, simply scale down or delete the canary deployment.
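Since the Service load-balances roughly uniformly across ready endpoints, the expected canary share follows directly from the replica counts; a quick sanity check in shell (replica counts match the manifests above):

```shell
#!/bin/sh
# Approximate percentage of requests hitting the canary,
# assuming uniform load-balancing across ready Pods.
primary=9
canary=1
share=$(( canary * 100 / (primary + canary) ))
echo "canary receives ~${share}% of traffic"
```

To shift more traffic to the canary, scale its replica count up relative to the primary; precise percentage-based splits require an ingress controller or service mesh rather than plain Services.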
4. Common Pitfalls
- Missing Probes: If you don't have Readiness Probes, Kubernetes will assume the new Pod is "Ready" as soon as the container starts. It will kill the old Pods immediately, potentially causing downtime if your app takes 10s to boot.
- Resource Quotas: If your Namespace has a strict quota, a `maxSurge` update might fail because the cluster forbids creating the extra temporary Pod.
- Label Selector Immutability: You cannot change the label selector of an existing Deployment. You must delete and recreate it.
- Forgetting `revisionHistoryLimit`: Leaving too many old ReplicaSets clutters etcd. Set this to a reasonable value (e.g., 2-5).
- Ignoring `progressDeadlineSeconds`: The default is 600 seconds, so a stuck rollout (ImagePullBackOff, CrashLoopBackOff) sits unreported for 10 minutes, and CI/CD pipelines that never check the `Progressing` condition won't know the deployment failed.
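The first pitfall is fixed by declaring a readiness probe in the Pod template so the rollout waits for real readiness before retiring old Pods. A minimal sketch, assuming the app serves an HTTP health endpoint at `/healthz` (adjust the path, port, and timings to your app):

```yaml
spec:
  template:
    spec:
      containers:
      - name: my-app
        image: my-repo/my-app:v1.0
        ports:
        - containerPort: 80
        readinessProbe:
          httpGet:
            path: /healthz # assumed health endpoint
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 5
```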
5. Recreate Strategy
When you need to avoid running two versions simultaneously (e.g., single-writer databases, or apps with incompatible schema migrations), use the Recreate strategy:
```yaml
spec:
  strategy:
    type: Recreate
```
Behavior: All existing Pods are killed before new ones are created. This causes downtime but guarantees that only one version runs at a time.
Stateful Workloads & The Database Problem
Rolling updates assume your application is stateless. If your app connects to a SQL database, a rolling update runs v1 and v2 simultaneously.
- The Risk: If v2 runs a migration that renames a column, the still-running v1 Pods will crash.
- The Solution:
- Expand: Add the new column (nullable) in migration v1.
- Deploy: Roll out app v2 that writes to both columns.
- Contract: Remove the old column in a future deployment.
- Alternative: Use `initContainers` or Kubernetes Jobs to run schema migrations before the application starts, but ensure they are backward-compatible.
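A sketch of the initContainer approach; the migration image and command are hypothetical placeholders. Note that an initContainer runs in every replica, so the migration must be idempotent (a Kubernetes Job runs it exactly once and avoids that repetition):

```yaml
spec:
  template:
    spec:
      initContainers:
      - name: db-migrate # hypothetical migration image and command
        image: my-repo/my-app-migrations:v2.0
        command: ["/app/migrate", "--expand-only"]
      containers:
      - name: my-app
        image: my-repo/my-app:v2.0
```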
6. Hands-On Exercise
```bash
# Create a deployment
kubectl create deployment web --image=nginx:1.25-alpine --replicas=3

# Trigger a rolling update
kubectl set image deployment/web nginx=nginx:1.27-alpine

# Watch the rollout
kubectl rollout status deployment/web

# View revision history
kubectl rollout history deployment/web

# Roll back to the previous version
kubectl rollout undo deployment/web

# Roll back to a specific revision
kubectl rollout undo deployment/web --to-revision=1
```
What's Next?
Now that you understand Deployments, explore:
- StatefulSets — For workloads that need stable identity and persistent storage
- Services — Expose your Deployment to network traffic
- Health Checks — Configure probes to ensure safe rollouts
- Progressive Delivery — Automated canary deployments with Argo Rollouts