Deployments: Managed Updates

Key Takeaways for AI & Readers
  • Role: A Deployment manages ReplicaSets to provide declarative updates for Pods. It is the most commonly used workload controller in Kubernetes.
  • Strategies: RollingUpdate (gradual replacement, the default) vs. Recreate (kill all old Pods before creating new ones — causes downtime but avoids version mixing).
  • Rolling Update Tuning: maxSurge controls how many extra Pods can exist during updates; maxUnavailable controls how many Pods can be unavailable. For zero-downtime, set maxUnavailable: 0, maxSurge: 1.
  • Rollback: Built-in revision history allows instant rollback to any previous version with kubectl rollout undo.
  • Advanced Patterns: Blue-green and canary deployments are achievable with multiple Deployments and Service selector manipulation.

1. Deep Dive: RollingUpdate Logic

The default strategy is RollingUpdate. It replaces Pods gradually. You control the speed and safety with two parameters:

maxSurge (Default: 25%)

  • Definition: How many extra pods can be created above the desired replica count.
  • Example: Replicas=4, maxSurge=25% (1 pod).
  • Result: During update, you might have up to 5 pods running (4 old + 1 new).
  • Higher Value: Faster rollout, but consumes more CPU/RAM quota.

maxUnavailable (Default: 25%)

  • Definition: How many pods can be down during the update.
  • Example: Replicas=4, maxUnavailable=25% (1 pod).
  • Result: You are guaranteed to have at least 3 pods running at all times.
  • Zero Value: Setting this to 0 ensures 100% capacity is maintained, but requires maxSurge > 0.

Pro Tip: For critical high-availability apps, set maxUnavailable: 0 and maxSurge: 1. This ensures you never drop below full capacity.

Rolling Update Math: Worked Examples

Understanding the exact numbers helps you predict capacity requirements during rollouts. Kubernetes applies these rounding rules: maxSurge rounds up (ceil) and maxUnavailable rounds down (floor). Absolute values (e.g., maxSurge: 2) need no rounding.
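
These rounding rules can be sketched with shell arithmetic (a minimal illustration; the variable names are mine, not Kubernetes API fields):

```shell
# Kubernetes rounds a percentage maxSurge up (ceil) and a
# percentage maxUnavailable down (floor) against the replica count.
replicas=4
surge_pct=25
unavail_pct=25

max_surge=$(( (replicas * surge_pct + 99) / 100 ))   # ceil(4 * 0.25) = 1
max_unavailable=$(( replicas * unavail_pct / 100 ))  # floor(4 * 0.25) = 1

echo "max total pods during update: $(( replicas + max_surge ))"
echo "min available pods:           $(( replicas - max_unavailable ))"
```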

Example 1: Default 25%/25% with 4 replicas

spec:
  replicas: 4
  strategy:
    rollingUpdate:
      maxSurge: 25%        # ceil(4 * 0.25) = ceil(1.0) = 1
      maxUnavailable: 25%  # floor(4 * 0.25) = floor(1.0) = 1

  • Max total pods during update: 4 + 1 = 5 (desired + maxSurge)
  • Min available pods during update: 4 - 1 = 3 (desired - maxUnavailable)
  • Rollout sequence: Kubernetes can kill 1 old Pod and create 1 new Pod simultaneously, keeping between 3 and 5 Pods running at all times. Each new Pod must pass readiness before the next old Pod is terminated.

Example 2: Zero-Downtime (maxUnavailable: 0, maxSurge: 1)

spec:
  replicas: 4
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0

  • Max total pods: 4 + 1 = 5
  • Min available pods: 4 - 0 = 4 (capacity never drops)
  • Rollout sequence: One new Pod is created. Only after it passes readiness is one old Pod terminated. Then the next new Pod is created. This is the slowest but safest strategy.

Example 3: Fast Rollout (maxUnavailable: 0, maxSurge: 100%)

spec:
  replicas: 4
  strategy:
    rollingUpdate:
      maxSurge: 100%   # ceil(4 * 1.0) = 4
      maxUnavailable: 0

  • Max total pods: 4 + 4 = 8 (200% capacity)
  • Min available pods: 4 - 0 = 4
  • Rollout sequence: All 4 new Pods are created at once. As each passes readiness, a corresponding old Pod is terminated. This is essentially a blue-green deployment using the rolling update mechanism.

FinOps: The Cost of Speed

Strategy              Peak Pods   Extra Compute   Rollout Speed
Default (25%/25%)     5           +25%            Moderate
Zero-downtime (0/1)   5           +25%            Slow
Fast (0/100%)         8           +100%           Fast
Recreate              4           +0%             Fastest (with downtime)

If the Cluster Autoscaler needs to provision new nodes to accommodate the surge, your rollout will stall waiting for node readiness (typically 1-3 minutes on cloud providers). Factor this into your progressDeadlineSeconds.

Revision History Limit

By default, Kubernetes keeps 10 old ReplicaSets (spec.revisionHistoryLimit: 10) so you can roll back to previous versions. Each old ReplicaSet stores the complete Pod template of that revision in etcd, even when scaled to zero replicas.

Setting        Behavior                                                    Best For
0              No rollback possible; old ReplicaSets deleted immediately   Not recommended
2-5            Keeps recent history, low etcd footprint                    Most teams
10 (default)   Comfortable history for long release cycles                 Teams with infrequent deployments
50+            Wastes etcd storage, clutters kubectl get rs output         Not recommended

At scale (hundreds of Deployments), high revisionHistoryLimit values contribute to etcd storage bloat. Each old ReplicaSet is a full API object stored in etcd.

GitOps recommendation: If you use ArgoCD or Flux, set revisionHistoryLimit: 2. Rollbacks are performed via git revert, not kubectl rollout undo, so you rarely need the Kubernetes-side revision history.
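
Wherever you land, the limit is a single field on the Deployment spec (a config sketch using the low-history value):

```yaml
spec:
  revisionHistoryLimit: 2   # keep only the two most recent old ReplicaSets
```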

Progress Deadline Seconds

How long should Kubernetes wait for a Deployment to make progress before marking it as failed?

  • Default: 600 seconds (10 minutes).
  • Behavior: If the Deployment makes no progress (no new Pods becoming ready) for this duration, the controller sets the Progressing condition to False with reason ProgressDeadlineExceeded.

# Check the deployment condition message
kubectl get deployment my-app -o jsonpath='{.status.conditions[?(@.type=="Progressing")].message}'
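
The deadline itself is one field on the Deployment spec (a sketch; 300 seconds is an arbitrary choice, not a recommendation):

```yaml
spec:
  progressDeadlineSeconds: 300   # mark the rollout as failed after 5 minutes without progress
```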

Common causes of a stuck rollout:

  • ImagePullBackOff: Wrong image tag or missing registry credentials.
  • Insufficient resources: The cluster does not have enough CPU/memory to schedule the new Pod. Check kubectl describe pod for FailedScheduling events.
  • Failing readiness probes: The new Pod starts but never passes its readiness check.
  • ResourceQuota exceeded: The namespace quota forbids creating additional Pods.

CI/CD Integration

kubectl rollout status deployment/my-app exits with code 1 when ProgressDeadlineExceeded is reached. Use this in CI/CD pipelines to automatically fail the deployment step:

kubectl rollout status deployment/my-app --timeout=300s || {
  echo "Deployment failed, triggering rollback"
  kubectl rollout undo deployment/my-app
  exit 1
}

The Hidden Cost of Rolling Updates

When you set maxSurge: 100%, Kubernetes creates a full set of new Pods before deleting the old ones.

  • Resource Spike: For a brief window, you need 200% capacity (old + new).
  • Cloud Bill: If your cluster autoscaler spins up new nodes to accommodate this surge, you pay for those extra nodes.
  • Mitigation: Use a lower maxSurge (e.g., 25%) if you are budget-constrained, at the cost of a slower rollout.

2. Managing Rollouts

Check Status

kubectl rollout status deployment/my-app

Waits until the rollout finishes. Useful in CI/CD scripts!

Pause & Resume

You can pause a rollout to verify a "canary" set of pods before letting it finish.

kubectl rollout pause deployment/my-app
# ... verify the new version ...
kubectl rollout resume deployment/my-app

Rollback (The "Undo" Button)

If you deploy v2 and it's crashing, you can instantly revert.

kubectl rollout undo deployment/my-app

This updates the Deployment to use the previous ReplicaSet revision.


3. Deployment Patterns

Blue/Green (Not native, but possible)

A deployment strategy that ensures zero downtime by running two identical environments, one live ("Blue") and one new ("Green").

  1. Deploy Blue (v1): Start with your initial version.

    # blue-v1-deployment.yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-app-blue
      labels:
        app: my-app
        version: v1
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: my-app
          version: v1
      template:
        metadata:
          labels:
            app: my-app
            version: v1
        spec:
          containers:
          - name: my-app
            image: my-repo/my-app:v1.0
            ports:
            - containerPort: 80

    # service.yaml
    apiVersion: v1
    kind: Service
    metadata:
      name: my-app-service
    spec:
      selector:
        app: my-app
        version: v1        # Initially points to blue
      ports:
      - protocol: TCP
        port: 80
        targetPort: 80
      type: LoadBalancer   # Or ClusterIP/NodePort

    Apply: kubectl apply -f blue-v1-deployment.yaml -f service.yaml

  2. Deploy Green (v2): Deploy the new version in parallel. It will not receive traffic yet.

    # green-v2-deployment.yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-app-green
      labels:
        app: my-app
        version: v2
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: my-app
          version: v2
      template:
        metadata:
          labels:
            app: my-app
            version: v2
        spec:
          containers:
          - name: my-app
            image: my-repo/my-app:v2.0
            ports:
            - containerPort: 80

    Apply: kubectl apply -f green-v2-deployment.yaml, then wait for the my-app-green Pods to become healthy.

  3. Switch Traffic: Update the Service selector to point to the new version. This is an instant switch.

    kubectl patch service my-app-service -p '{"spec":{"selector":{"version":"v2"}}}'

    Now all traffic goes to my-app-green.

  4. Monitor & Cleanup: If v2 is stable, you can safely delete the my-app-blue deployment. If not, patch the service selector back to version: v1.

    kubectl delete deployment my-app-blue

Canary (Native-ish)

A strategy where a new version (canary) is rolled out to a small subset of users, observed for stability, and then gradually rolled out to the entire user base.

  1. Primary Deployment (v1): Your current stable version.

    # primary-v1-deployment.yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-app-primary
      labels:
        app: my-app
        version: v1        # Primary version
    spec:
      replicas: 9
      selector:
        matchLabels:
          app: my-app
          version: v1
      template:
        metadata:
          labels:
            app: my-app
            version: v1
        spec:
          containers:
          - name: my-app
            image: my-repo/my-app:v1.0
            ports:
            - containerPort: 80

    Apply: kubectl apply -f primary-v1-deployment.yaml

  2. Canary Deployment (v2): A small deployment of the new version.

    # canary-v2-deployment.yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-app-canary
      labels:
        app: my-app
        version: v2        # Canary version
    spec:
      replicas: 1          # Small percentage of traffic
      selector:
        matchLabels:
          app: my-app
          version: v2
      template:
        metadata:
          labels:
            app: my-app
            version: v2
        spec:
          containers:
          - name: my-app
            image: my-repo/my-app:v2.0
            ports:
            - containerPort: 80

    Apply: kubectl apply -f canary-v2-deployment.yaml

  3. Service: Both primary and canary deployments are targeted by the same service, which balances traffic between them.

    # service.yaml (Ensure this exists and targets 'app: my-app')
    apiVersion: v1
    kind: Service
    metadata:
      name: my-app-service
    spec:
      selector:
        app: my-app        # Targets both v1 and v2 deployments
      ports:
      - protocol: TCP
        port: 80
        targetPort: 80
      type: LoadBalancer   # Or ClusterIP/NodePort

    With primary at 9 replicas and canary at 1 replica, the Service will naturally distribute traffic approximately 9:1.

  4. Monitor & Scale: Monitor the health and performance of the canary (v2).

    • If stable, gradually scale up the canary deployment or perform a rolling update on the primary deployment with the new version and then delete the canary.
    • If issues are found, simply scale down or delete the canary deployment.
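
The replica-ratio math behind the 9:1 split is easy to sanity-check in shell (an illustration of the expected share, not a traffic measurement):

```shell
# With load balancing spread evenly across ready endpoints,
# each version's expected traffic share equals its replica share.
primary=9
canary=1
share=$(( 100 * canary / (primary + canary) ))
echo "canary expected share: ~${share}%"
```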

4. Common Pitfalls

  1. Missing Probes: If you don't have Readiness Probes, Kubernetes will assume the new Pod is "Ready" as soon as the container starts. It will kill the old Pods immediately, potentially causing downtime if your app takes 10s to boot.
  2. Resource Quotas: If your Namespace has a strict quota, a maxSurge update might fail because the cluster forbids creating the extra temporary Pod.
  3. Label Selector Immutability: You cannot change the label selector of an existing Deployment. You must delete and recreate it.
  4. Forgetting revisionHistoryLimit: Leaving too many old ReplicaSets clutters etcd. Set this to a reasonable value (e.g., 5-10).
  5. Not setting progressDeadlineSeconds: Without a deadline, a stuck rollout (ImagePullBackOff, CrashLoopBackOff) will hang indefinitely. CI/CD pipelines won't know the deployment failed.
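
For pitfall 1, a minimal readiness probe looks like this (the /healthz path and the timings are assumptions for illustration):

```yaml
containers:
- name: my-app
  image: my-repo/my-app:v1.0
  readinessProbe:
    httpGet:
      path: /healthz          # assumed health endpoint
      port: 80
    initialDelaySeconds: 5    # give the app time to boot
    periodSeconds: 5
```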

5. Recreate Strategy

When you need to avoid running two versions simultaneously (e.g., single-writer databases, or apps with incompatible schema migrations), use the Recreate strategy:

spec:
  strategy:
    type: Recreate

Behavior: All existing Pods are killed before new ones are created. This causes downtime but guarantees that only one version runs at a time.

Stateful Workloads & The Database Problem

Rolling updates assume the old and new versions can run side by side. If your app connects to a SQL database, a rolling update runs v1 and v2 against the same schema simultaneously.

  • The Risk: If v2 runs a migration that renames a column, the still-running v1 Pods will crash.
  • The Solution:
    1. Expand: Add the new column (nullable) in migration v1.
    2. Deploy: Roll out app v2 that writes to both columns.
    3. Contract: Remove the old column in a future deployment.
  • Alternative: Use initContainers or Kubernetes Jobs to run schema migrations before the application starts, but ensure they are backward-compatible.
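
The Job alternative might be sketched like this (the image and migrate command are hypothetical; actual migration tooling is app-specific):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: schema-migrate-v2
spec:
  backoffLimit: 2
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: migrate
        image: my-repo/my-app:v2.0                 # hypothetical: app image bundling the migration tool
        command: ["./migrate", "--expand-only"]    # hypothetical: run only backward-compatible steps
```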

6. Hands-On Exercise

# Create a deployment
kubectl create deployment web --image=nginx:1.25-alpine --replicas=3

# Trigger a rolling update
kubectl set image deployment/web nginx=nginx:1.27-alpine

# Watch the rollout
kubectl rollout status deployment/web

# View revision history
kubectl rollout history deployment/web

# Roll back to the previous version
kubectl rollout undo deployment/web

# Roll back to a specific revision
kubectl rollout undo deployment/web --to-revision=1

What's Next?

Now that you understand Deployments, explore:

  • StatefulSets — For workloads that need stable identity and persistent storage
  • Services — Expose your Deployment to network traffic
  • Health Checks — Configure probes to ensure safe rollouts
  • Progressive Delivery — Automated canary deployments with Argo Rollouts