
Topology Spread Constraints

Key Takeaways
  • High Availability Enforcement: TopologySpreadConstraints ensure Pods are evenly distributed across different topological domains (nodes, zones, regions) to prevent single points of failure. If an entire availability zone goes down, properly spread workloads continue serving from remaining zones.
  • Minimizing Skew: The maxSkew parameter defines the maximum allowed difference in pod count between the most-populated and least-populated topology domain. A maxSkew of 1 enforces near-perfect balance; higher values allow more flexibility in scheduling.
  • Granular Control with topologyKey: By specifying topologyKey (for example, topology.kubernetes.io/zone for zone-level spreading or kubernetes.io/hostname for node-level spreading), you dictate the axis along which Pods should be distributed.
  • Flexible Scheduling: The whenUnsatisfiable field offers two modes: DoNotSchedule (hard constraint -- pods stay Pending if the spread cannot be satisfied) and ScheduleAnyway (soft constraint -- the scheduler tries its best but schedules the pod regardless). This lets you balance between strict HA and scheduling flexibility.
  • Complement to Pod Anti-Affinity: TopologySpreadConstraints provide more granular control than Pod Anti-Affinity. Anti-affinity is binary (one pod per domain or zero), while topology spread allows multiple pods per domain as long as the skew across domains stays within bounds.

How do you ensure your application stays online when an entire data center or availability zone goes dark? Even if you have 10 replicas, the Kubernetes scheduler might place all of them on the same node or in the same availability zone (AZ). If that zone experiences an outage, all 10 replicas go down simultaneously.

TopologySpreadConstraints allow you to control how Pods are distributed across your cluster's topology, ensuring true high availability by spreading replicas across failure domains.

1. The Problem: Uneven Distribution

Without topology spread constraints, the Kubernetes scheduler optimizes for resource utilization and node fit, not for distribution across failure domains. Consider a cluster with three availability zones:

  • Zone A: 10 nodes, 80% utilized
  • Zone B: 5 nodes, 50% utilized
  • Zone C: 5 nodes, 50% utilized

When you deploy 6 replicas, the scheduler may place 4 in Zone B and 2 in Zone C (because those zones have the most available resources), leaving zero replicas in Zone A. If Zone B fails, you lose 4 of 6 replicas instantly.

The goal of topology spread is to minimize skew -- the difference in pod count between the most-populated and least-populated zones.
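The skew arithmetic is simple enough to sketch in a few lines of Python (an illustrative calculation mirroring the example above, not actual scheduler code; zone names are made up):

```python
# Illustrative skew calculation (not scheduler code).
# Skew = pods in the most-populated domain minus pods in the least-populated one.

def skew(pods_per_zone: dict[str, int]) -> int:
    counts = pods_per_zone.values()
    return max(counts) - min(counts)

# The unlucky placement from the example: 4 pods in Zone B, 2 in Zone C, 0 in Zone A.
print(skew({"zone-a": 0, "zone-b": 4, "zone-c": 2}))  # 4 -- a Zone B outage kills 4 of 6 replicas
# A balanced placement keeps the skew at zero.
print(skew({"zone-a": 2, "zone-b": 2, "zone-c": 2}))  # 0
```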

[Interactive diagram: pods being distributed across zones AZ-A, AZ-B, and AZ-C under maxSkew: 1 and whenUnsatisfiable: DoNotSchedule. Kubernetes ensures the difference between the maximum and minimum pod count across zones never exceeds maxSkew.]

2. Core Parameters

topologyKey

The topologyKey is a node label key that defines the topology domains. The scheduler groups nodes by the value of this label and distributes pods across the resulting groups.

Common topology keys:

topologyKey | What It Spreads Across | Use Case
topology.kubernetes.io/zone | Availability zones | Zone-level HA (most common)
topology.kubernetes.io/region | Cloud regions | Multi-region spreading
kubernetes.io/hostname | Individual nodes | Node-level spreading (similar to anti-affinity)
Custom label (e.g., rack) | Custom domains | Rack-aware scheduling in bare-metal clusters

maxSkew

The maximum degree to which pods may be unevenly distributed across topology domains. For each domain, the skew is the matching pod count in that domain minus the minimum matching pod count across all eligible domains; every domain's skew must stay at or below maxSkew.

  • maxSkew: 1: The difference between the most-populated and least-populated domain can be at most 1. This enforces near-perfect balance. With 6 replicas across 3 zones, you get 2-2-2.
  • maxSkew: 2: Allows more imbalance. With 6 replicas across 3 zones, distributions like 3-2-1 are acceptable.
  • maxSkew: 3 or higher: Provides very loose spreading. Useful when you want a soft preference for distribution without strict enforcement.
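The bullets above amount to a one-line check; here it is as a toy Python helper (illustrative only, not how the scheduler is implemented):

```python
# Does a given distribution of replicas across zones satisfy a maxSkew bound?
# Toy helper for illustration; the real scheduler evaluates placements incrementally.

def satisfies(pods_per_zone: list[int], max_skew: int) -> bool:
    return max(pods_per_zone) - min(pods_per_zone) <= max_skew

print(satisfies([2, 2, 2], max_skew=1))  # True  -- perfect balance
print(satisfies([3, 2, 1], max_skew=1))  # False -- skew of 2 exceeds maxSkew: 1
print(satisfies([3, 2, 1], max_skew=2))  # True  -- acceptable under maxSkew: 2
```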

whenUnsatisfiable

Determines what happens when the scheduler cannot place a pod while satisfying the maxSkew constraint.

  • DoNotSchedule (hard constraint): The pod stays in Pending state until a node becomes available that satisfies the skew constraint. Use this for critical production workloads where HA is non-negotiable.
  • ScheduleAnyway (soft constraint): The scheduler places the pod on the node that minimizes skew, even if the constraint would be violated. Use this when you prefer balanced distribution but cannot afford pods stuck in Pending.

labelSelector

Defines which pods count toward the skew calculation. Only pods matching this selector are considered when computing the current distribution. This is typically set to match the same labels as your Deployment's selector.

3. YAML Examples

Basic Zone-Level Spreading

# Spread pods evenly across availability zones
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-frontend
  namespace: production
spec:
  replicas: 6
  selector:
    matchLabels:
      app: web-frontend
  template:
    metadata:
      labels:
        app: web-frontend
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone  # Spread across AZs
          whenUnsatisfiable: DoNotSchedule          # Hard constraint
          labelSelector:
            matchLabels:
              app: web-frontend
      containers:
        - name: frontend
          image: myregistry.io/web-frontend:v3.1.0
          resources:
            requests:
              cpu: "250m"
              memory: "256Mi"

With this configuration, the scheduler distributes the 6 replicas as evenly as possible across zones: 2-2-2 in a three-zone cluster.

Combined Node and Zone Spreading

# Spread across both zones AND nodes for maximum distribution
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-service
  namespace: production
spec:
  replicas: 9
  selector:
    matchLabels:
      app: payment-service
  template:
    metadata:
      labels:
        app: payment-service
    spec:
      topologySpreadConstraints:
        # First constraint: spread across zones
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: payment-service
        # Second constraint: spread across nodes within each zone
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: ScheduleAnyway  # Soft for node-level
          labelSelector:
            matchLabels:
              app: payment-service
      containers:
        - name: payment
          image: myregistry.io/payment-service:v2.0.0
          resources:
            requests:
              cpu: "500m"
              memory: "512Mi"

This configuration first ensures pods are evenly distributed across zones (hard constraint), then tries to spread them across nodes within each zone (soft constraint). The result is maximum distribution: 3 pods per zone, each on a different node if possible.

Soft Spreading with ScheduleAnyway

# Best-effort spreading — prefer balance but don't block scheduling
apiVersion: apps/v1
kind: Deployment
metadata:
  name: logging-agent
  namespace: monitoring
spec:
  replicas: 4
  selector:
    matchLabels:
      app: logging-agent
  template:
    metadata:
      labels:
        app: logging-agent
    spec:
      topologySpreadConstraints:
        - maxSkew: 2  # Allow some imbalance
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway  # Never block scheduling
          labelSelector:
            matchLabels:
              app: logging-agent
      containers:
        - name: agent
          image: myregistry.io/logging-agent:v1.5.0

4. Multiple Constraints

When multiple topologySpreadConstraints are specified, the scheduler evaluates all of them. A pod is only placed on a node that satisfies all constraints simultaneously (for DoNotSchedule constraints) or that minimizes the total skew (for ScheduleAnyway constraints).

If constraints conflict (for example, zone spreading requires placing the pod in Zone A, but node spreading requires placing it on a node in Zone B), DoNotSchedule constraints take priority and the pod stays Pending until a valid placement exists.
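How multiple hard constraints intersect can be sketched as a toy feasibility check (a simplified model with made-up node and zone names; the real scheduler's filtering is more involved):

```python
# Toy model: a node is feasible only if *every* DoNotSchedule constraint still
# holds after placing the pod there. Topology and counts are illustrative.

nodes = {
    "node-1": {"zone": "a", "hostname": "node-1"},
    "node-2": {"zone": "a", "hostname": "node-2"},
    "node-3": {"zone": "b", "hostname": "node-3"},
}

# Current matching-pod counts per domain, keyed by topologyKey.
pod_counts = {
    "zone": {"a": 1, "b": 1},
    "hostname": {"node-1": 1, "node-2": 0, "node-3": 1},
}

# (topologyKey, maxSkew) pairs, both treated as DoNotSchedule.
constraints = [("zone", 1), ("hostname", 1)]

def feasible(node: str) -> bool:
    for key, max_skew in constraints:
        counts = dict(pod_counts[key])
        counts[nodes[node][key]] += 1  # simulate placing the pod on this node
        if max(counts.values()) - min(counts.values()) > max_skew:
            return False
    return True

# Only node-2 satisfies both the zone and the hostname constraint.
print([n for n in nodes if feasible(n)])  # ['node-2']
```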

5. Interaction with Affinity Rules

TopologySpreadConstraints and affinity/anti-affinity rules can be used together, but understanding their interaction is essential.

With Node Affinity

Node affinity restricts which nodes are eligible. TopologySpreadConstraints then distribute pods among the eligible nodes.

# Combine node affinity with topology spread
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: node-type
                operator: In
                values: ["compute-optimized"]  # Only schedule on compute nodes
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          app: ml-inference

With Pod Anti-Affinity

Pod anti-affinity and topology spread can work together but may conflict. Anti-affinity says "do not place two pods on the same node," while topology spread says "keep zones balanced." If the only node that balances zones already has a pod, anti-affinity blocks placement and the pod stays Pending.

In general, prefer TopologySpreadConstraints over Pod Anti-Affinity for most spreading use cases. TopologySpreadConstraints are more flexible and predictable.

6. Comparison: TopologySpreadConstraints vs. Pod Anti-Affinity

Aspect | TopologySpreadConstraints | Pod Anti-Affinity
Granularity | Controls the maximum skew (allows multiple pods per domain) | Binary: one pod per domain or zero
Multi-domain | Can spread across multiple topology levels simultaneously | Requires separate affinity rules per level
Soft/Hard | DoNotSchedule (hard) or ScheduleAnyway (soft) | required (hard) or preferred (soft)
Scaling behavior | Handles any replica count gracefully | Breaks when replicas > domains (e.g., 5 replicas on 3 nodes)
Performance | Efficient scheduler evaluation | Can be expensive for large clusters
Introduced | Kubernetes 1.16 (stable in 1.19) | Kubernetes 1.4

Pod Anti-Affinity is still useful when you truly need at most one pod per node (for DaemonSet-like patterns or stateful workloads). For all other spreading use cases, TopologySpreadConstraints are the preferred mechanism.

7. Default Cluster-Level Topology Spread

Starting in Kubernetes 1.24, you can configure default TopologySpreadConstraints at the cluster level via the kube-scheduler configuration. This ensures all workloads get basic spreading even if individual Deployments do not specify constraints.

# kube-scheduler configuration: default topology spread for all pods
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - pluginConfig:
      - name: PodTopologySpread
        args:
          defaultConstraints:
            - maxSkew: 3
              topologyKey: topology.kubernetes.io/zone
              whenUnsatisfiable: ScheduleAnyway
            - maxSkew: 5
              topologyKey: kubernetes.io/hostname
              whenUnsatisfiable: ScheduleAnyway
          defaultingType: List  # Use these defaults for pods without constraints

When defaultingType is List, the specified constraints are applied to pods that do not define their own TopologySpreadConstraints. When set to System, Kubernetes applies built-in defaults (spread by zone and hostname with ScheduleAnyway).

8. Real-World HA Patterns

Three-AZ Production Deployment

The most common HA pattern: deploy replicas across three availability zones with strict zone spreading and soft node spreading.

# Production HA: 3 AZs, strict zone balance, soft node balance
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-gateway
  namespace: production
spec:
  replicas: 9  # 3 per zone in a 3-zone cluster
  selector:
    matchLabels:
      app: api-gateway
  template:
    metadata:
      labels:
        app: api-gateway
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: api-gateway
        - maxSkew: 2
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: api-gateway
      containers:
        - name: gateway
          image: myregistry.io/api-gateway:v4.0.0
          resources:
            requests:
              cpu: "500m"
              memory: "512Mi"
# PDB ensures at least 6 of 9 replicas remain available during disruptions
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-gateway-pdb
  namespace: production
spec:
  minAvailable: 6  # At least 6 of 9 must be running
  selector:
    matchLabels:
      app: api-gateway

Rack-Aware Bare-Metal Spreading

For bare-metal clusters, custom node labels define rack topology:

# Spread across racks in a bare-metal cluster
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: rack  # Custom label: rack=rack-01, rack-02, etc.
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: database-proxy

9. Version History and Feature Gates

TopologySpreadConstraints were introduced as an alpha feature in Kubernetes 1.16, moved to beta in 1.18, and became generally available in Kubernetes 1.19. The minDomains field (which sets a minimum number of eligible domains) was added as alpha in 1.24, reached beta in 1.25, and became GA in 1.30. The matchLabelKeys field was added in 1.25 (alpha) and reached beta in 1.27; it tells the scheduler to take the values of the named pod labels (such as pod-template-hash) into account alongside the labelSelector, so each rolling-update revision is spread independently.

Common Pitfalls

  1. Uneven zone sizes: If Zone A has 10 nodes and Zone B has 2 nodes, maxSkew: 1 with DoNotSchedule can leave pods Pending because there are not enough nodes in Zone B. Use ScheduleAnyway or ensure zones have roughly equal capacity.
  2. Forgetting the labelSelector: Without labelSelector, the constraint applies to all pods in the namespace. This almost certainly is not what you want and will produce confusing scheduling behavior.
  3. Conflicting constraints: Multiple hard constraints that cannot be simultaneously satisfied result in pods stuck in Pending. Use kubectl describe pod to see scheduler events and identify which constraint is blocking placement.
  4. maxSkew too restrictive during scale-up: With maxSkew: 1 and DoNotSchedule, scaling from 3 to 4 replicas in a 3-zone cluster requires placing the 4th pod in the zone with the fewest pods. If that zone has no available capacity, the pod stays Pending. Use ScheduleAnyway for non-critical workloads.
  5. Not combining with PodDisruptionBudget: Topology spread only controls initial placement. During node drains or voluntary disruptions, pods may be evicted and rescheduled unevenly. PDBs ensure that enough replicas remain available during disruptions to maintain the HA posture.
  6. Ignoring minDomains: Without minDomains, the scheduler counts only domains that already have matching pods. If you have a 3-zone cluster but pods only exist in 2 zones, the third zone is ignored in skew calculations. Setting minDomains: 3 forces the scheduler to consider all zones.
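The minDomains pitfall (item 6) can be illustrated numerically. This is a toy calculation mirroring the skew bookkeeping described above, not the scheduler's actual implementation; zone names are made up:

```python
# Toy illustration of minDomains. With pods in only 2 of 3 zones, the skew over
# the *observed* domains looks fine; counting the empty third zone (as
# minDomains: 3 forces the scheduler to do) reveals the real imbalance.

def skew(counts: dict[str, int]) -> int:
    return max(counts.values()) - min(counts.values())

observed = {"zone-a": 3, "zone-b": 3}        # zone-c has no matching pods yet
print(skew(observed))                         # 0 -- looks perfectly balanced

# With minDomains: 3, a missing eligible domain counts as having 0 pods.
with_min_domains = {**observed, "zone-c": 0}
print(skew(with_min_domains))                 # 3 -- zone-c's emptiness now counts
```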

What's Next?

  • Apply TopologySpreadConstraints to your critical production Deployments with maxSkew: 1 across availability zones.
  • Configure cluster-level default constraints in the kube-scheduler configuration to ensure all workloads get basic spreading.
  • Combine topology spread with PodDisruptionBudgets to maintain HA guarantees during node maintenance and voluntary disruptions.
  • Explore the minDomains field (GA in Kubernetes 1.30) to force the scheduler to consider all topology domains.
  • Use the matchLabelKeys field to automatically match pod-template-hash labels, simplifying topology spread for Deployments with rolling updates.
  • Monitor pod distribution with kubectl get pods -o wide and verify that replicas are actually spread across the expected zones and nodes.