DaemonSets: Per-Node Agents
- Scope: Ensures exactly one copy of a Pod runs on every (or a subset of) Node(s) in the cluster. You do not set a replica count -- the cluster topology determines how many Pods exist.
- Use Cases: Log collectors (Fluentd, Filebeat), monitoring agents (Prometheus Node Exporter, Datadog), CNI plugins (Calico, Cilium), and storage daemons (Ceph, GlusterFS).
- Automatic Scaling: Automatically adds a Pod when a new node joins the cluster and garbage-collects the Pod when a node is removed.
- Taints and Tolerations: DaemonSet Pods often tolerate taints that would repel normal workloads, allowing them to run on control plane nodes and other special-purpose nodes.
- Update Strategies: Supports `RollingUpdate` (default) and `OnDelete` strategies for managing rollouts of new Pod templates.
A DaemonSet ensures that all (or some) Nodes run a copy of a Pod.
- As nodes are added to the cluster, Pods are added to them automatically.
- As nodes are removed from the cluster, those Pods are garbage collected.
- Deleting a DaemonSet cleans up all the Pods it created.
Unlike Deployments and ReplicaSets where you specify a replica count, a DaemonSet derives its Pod count from the number of eligible nodes in the cluster.
Use Cases
DaemonSets are the standard mechanism for deploying infrastructure-level agents that must run on every node:
- Cluster Storage Daemons: Running `glusterd`, `ceph`, or CSI node plugins on each node to provide distributed storage.
- Log Collection: Running `fluentd`, `fluent-bit`, `filebeat`, or `logstash` on every node to collect container logs and forward them to a central logging backend (Elasticsearch, Loki, Splunk).
- Node Monitoring: Running Prometheus Node Exporter, `collectd`, the Datadog agent, or the New Relic agent on every node to collect system-level metrics (CPU, memory, disk, network).
- CNI Plugins: Networking plugins like Calico, Cilium, or Weave Net run as DaemonSets to configure pod networking on each node.
- Security Agents: Runtime security tools like Falco or Sysdig run as DaemonSets to monitor system calls on every node.
- GPU Device Plugins: NVIDIA device plugins run as DaemonSets on GPU-equipped nodes to expose GPU resources to the kubelet.
How Scheduling Works
The DaemonSet controller operates differently from the standard scheduler:
- The DaemonSet controller watches for node events (node added, node removed, node labels changed).
- For each eligible node, the controller checks whether a Pod matching its selector already exists.
- If no matching Pod exists on an eligible node, the controller creates one. If a Pod exists on an ineligible node, the controller deletes it.
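The reconciliation steps above can be sketched in a few lines of Python. This is a simplified model, not the actual controller code: node eligibility is reduced to a plain label-subset check, and the function names and data shapes are illustrative.

```python
def reconcile(nodes, pods, selector):
    """Simplified model of one DaemonSet reconcile pass.

    nodes:    dict of node name -> node labels
    pods:     set-like mapping of node names that already have a DaemonSet Pod
    selector: labels a node must carry to be eligible (empty = all nodes)
    """
    to_create, to_delete = [], []
    for name, labels in nodes.items():
        eligible = all(labels.get(k) == v for k, v in selector.items())
        if eligible and name not in pods:
            to_create.append(name)   # eligible node is missing its Pod
        elif not eligible and name in pods:
            to_delete.append(name)   # Pod sits on a node that no longer qualifies
    return to_create, to_delete


nodes = {
    "node-1": {"disk": "ssd"},
    "node-2": {"disk": "hdd"},
    "node-3": {"disk": "ssd"},
}
pods = {"node-2": True, "node-3": True}
creates, deletes = reconcile(nodes, pods, {"disk": "ssd"})
print(creates, deletes)  # ['node-1'] ['node-2']
```

Running this repeatedly until both lists are empty mirrors how the controller converges the cluster toward one Pod per eligible node.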
Historically, the DaemonSet controller set the `.spec.nodeName` field on each Pod it created, which bypassed the default Kubernetes scheduler entirely. Since Kubernetes 1.12, however, DaemonSets use the default scheduler by default (controlled by the `ScheduleDaemonSetPods` feature gate, which is enabled by default): the controller pins each Pod to its target node via node affinity, and the Pod then goes through the normal scheduling pipeline, respecting node affinity, taints, and tolerations.
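When the default scheduler handles DaemonSet Pods, the controller pins each Pod to its intended node by injecting a required node affinity term of roughly this shape (`target-node-name` stands in for the actual node name):

```yaml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchFields:
        - key: metadata.name
          operator: In
          values:
          - target-node-name
```

Note the use of `matchFields` on `metadata.name` rather than a label match: the Pod is bound to one specific node, not a class of nodes.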
nodeSelector
You can restrict a DaemonSet to a subset of nodes using nodeSelector:
```yaml
spec:
  template:
    spec:
      nodeSelector:
        disk: ssd
```
This DaemonSet only runs on nodes labeled `disk=ssd`. When you add or remove that label from a node, the DaemonSet controller adds or removes the Pod accordingly.
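You can watch this behavior by toggling the label with `kubectl` (the node name `worker-1` is illustrative):

```shell
# Label the node; the DaemonSet controller creates a Pod there shortly after
kubectl label node worker-1 disk=ssd

# Remove the label (trailing dash); the controller deletes the Pod from the node
kubectl label node worker-1 disk-
```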
nodeAffinity
For more flexible node selection, use nodeAffinity:
```yaml
spec:
  template:
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/arch
                operator: In
                values:
                - amd64
                - arm64
```
This ensures the DaemonSet runs only on `amd64` and `arm64` nodes, which is useful in heterogeneous clusters.
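A sketch of how a single `matchExpressions` entry is evaluated against a node's labels (a simplified model covering only the `In` and `NotIn` operators, not the actual scheduler code):

```python
def matches_expression(labels, expr):
    """Evaluate one nodeAffinity matchExpression against a node's labels."""
    value = labels.get(expr["key"])
    if expr["operator"] == "In":
        return value in expr["values"]
    if expr["operator"] == "NotIn":
        return value not in expr["values"]
    raise NotImplementedError(expr["operator"])


arch_expr = {"key": "kubernetes.io/arch", "operator": "In",
             "values": ["amd64", "arm64"]}

print(matches_expression({"kubernetes.io/arch": "amd64"}, arch_expr))  # True
print(matches_expression({"kubernetes.io/arch": "s390x"}, arch_expr))  # False
```

All expressions within one `matchExpressions` list must match (AND), while separate entries under `nodeSelectorTerms` are alternatives (OR).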
Taints and Tolerations
By default, DaemonSet Pods respect node taints. A tainted node repels Pods that do not have a matching toleration. However, DaemonSet workloads (like log collectors and CNI plugins) typically need to run on every node, including control plane nodes and nodes with special taints.
The DaemonSet controller automatically adds the following tolerations to DaemonSet Pods:
| Toleration Key | Effect | Purpose |
|---|---|---|
| `node.kubernetes.io/not-ready` | `NoExecute` | Keep running during node problems |
| `node.kubernetes.io/unreachable` | `NoExecute` | Keep running when node is unreachable |
| `node.kubernetes.io/disk-pressure` | `NoSchedule` | Run even under disk pressure |
| `node.kubernetes.io/memory-pressure` | `NoSchedule` | Run even under memory pressure |
| `node.kubernetes.io/pid-pressure` | `NoSchedule` | Run even under PID pressure |
| `node.kubernetes.io/unschedulable` | `NoSchedule` | Run even when node is cordoned |
To run a DaemonSet on control plane nodes (which typically have the `node-role.kubernetes.io/control-plane:NoSchedule` taint), you must explicitly add a toleration:
```yaml
spec:
  template:
    spec:
      tolerations:
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
        effect: NoSchedule
```
Full YAML Example: Monitoring Agent DaemonSet
Here is a complete DaemonSet manifest for running a Prometheus Node Exporter on every node:
```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitoring
  labels:
    app: node-exporter
spec:
  selector:
    matchLabels:
      app: node-exporter
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
  template:
    metadata:
      labels:
        app: node-exporter
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9100"
    spec:
      hostNetwork: true
      hostPID: true
      tolerations:
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
        effect: NoSchedule
      containers:
      - name: node-exporter
        image: prom/node-exporter:v1.7.0
        args:
        - --path.procfs=/host/proc
        - --path.sysfs=/host/sys
        - --path.rootfs=/host/root
        - --collector.filesystem.mount-points-exclude=^/(dev|proc|sys|var/lib/docker/.+|var/lib/kubelet/.+)($|/)
        ports:
        - containerPort: 9100
          hostPort: 9100
          name: metrics
        resources:
          requests:
            cpu: 50m
            memory: 64Mi
          limits:
            cpu: 200m
            memory: 128Mi
        volumeMounts:
        - name: proc
          mountPath: /host/proc
          readOnly: true
        - name: sys
          mountPath: /host/sys
          readOnly: true
        - name: root
          mountPath: /host/root
          readOnly: true
          mountPropagation: HostToContainer
      volumes:
      - name: proc
        hostPath:
          path: /proc
      - name: sys
        hostPath:
          path: /sys
      - name: root
        hostPath:
          path: /
```
Key observations:
- `hostNetwork: true` and `hostPID: true` allow the exporter to collect host-level metrics directly.
- Host paths are mounted read-only for security.
- The toleration allows this DaemonSet to run on control plane nodes.
- `hostPort: 9100` exposes the metrics endpoint directly on the node IP, making it discoverable by Prometheus.
- Low resource requests (50m CPU, 64Mi memory) reflect the lightweight nature of monitoring agents.
Update Strategies
DaemonSets support two update strategies via `.spec.updateStrategy.type`:
RollingUpdate (Default)
When you update the Pod template, the DaemonSet controller terminates old Pods and creates new ones node by node. The `maxUnavailable` field controls how many nodes can have their DaemonSet Pod down simultaneously during an update:
```yaml
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
```
With `maxUnavailable: 1`, only one node at a time loses its DaemonSet Pod during the rollout. You can also use a percentage (e.g., `maxUnavailable: "25%"`) for large clusters where updating one node at a time would take too long.
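The rollout-duration trade-off is simple arithmetic. A hedged sketch (assuming percentages convert to at least one node and round up, and ignoring variance in Pod startup time):

```python
import math

def rollout_waves(node_count, max_unavailable):
    """Approximate number of sequential update waves for a DaemonSet rollout.

    max_unavailable: an int, or a percentage string like "25%".
    """
    if isinstance(max_unavailable, str) and max_unavailable.endswith("%"):
        pct = int(max_unavailable.rstrip("%"))
        batch = max(1, math.ceil(node_count * pct / 100))
    else:
        batch = int(max_unavailable)
    return math.ceil(node_count / batch)


print(rollout_waves(100, 1))      # 100 waves: one node at a time
print(rollout_waves(100, "25%"))  # 4 waves: 25 nodes at a time
```

If each Pod takes a minute to restart and pass readiness checks, the difference is a 100-minute rollout versus a 4-minute one.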
The `maxSurge` field (available since Kubernetes 1.22) controls whether a new Pod is created before the old one is deleted, enabling zero-downtime DaemonSet updates:
```yaml
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1
```
OnDelete
With `OnDelete`, the DaemonSet controller does not automatically roll out updates. You must manually delete each Pod, and the controller recreates it with the updated template. This is useful when you want to control exactly when each node gets updated, which is common for critical infrastructure like CNI plugins.
```yaml
spec:
  updateStrategy:
    type: OnDelete
```
DaemonSet vs. Deployment
| Feature | DaemonSet | Deployment |
|---|---|---|
| Replica count | One per eligible node (automatic) | Explicitly set via `.spec.replicas` |
| Scheduling | One Pod per node guaranteed | Scheduler decides placement (may colocate) |
| Node coverage | Runs on every node (or a filtered subset) | No per-node guarantee |
| Scaling | Scales with cluster size | Scales with replica count |
| Use case | Infrastructure agents, node-level daemons | Application workloads, APIs, web servers |
| Host access | Often uses `hostNetwork`, `hostPath` | Rarely needs host access |
When to use a DaemonSet: Your workload must run on every node (or a specific subset of nodes), and you need exactly one instance per node. Examples include log collectors, monitoring agents, and networking plugins.
When to use a Deployment: Your workload is an application that should scale based on load, not based on cluster size. The scheduler should decide where Pods land based on available resources.
Common Pitfalls
- Forgetting tolerations for control plane nodes: If you want full cluster coverage (including control plane nodes), you must add explicit tolerations. Without them, DaemonSet Pods are not scheduled on tainted nodes.
- Resource contention with host resources: DaemonSet Pods using `hostNetwork` or `hostPort` occupy actual node ports. If two DaemonSets try to bind the same `hostPort`, one will fail. Always use unique port numbers.
- Not setting resource requests: If you omit resource requests, the scheduler cannot account for DaemonSet Pods when making placement decisions for other workloads. This can lead to node overcommitment and OOM kills.
- Ignoring `maxUnavailable` during updates: With the default `maxUnavailable: 1`, updating a 100-node cluster takes a long time. For non-critical DaemonSets, consider setting `maxUnavailable` to a percentage like `"10%"` to speed up rollouts.
- Using DaemonSets for application workloads: DaemonSets are not a substitute for Deployments. If your workload does not need per-node guarantees, use a Deployment with appropriate resource requests and let the scheduler optimize placement.
- Forgetting that DaemonSet Pods count toward node resource limits: Every DaemonSet Pod consumes CPU and memory on the node. In large clusters with many DaemonSets, the aggregate overhead can be significant. Account for this when sizing your nodes.
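The `hostPort` pitfall lends itself to a quick preflight check. A sketch, assuming a hypothetical input format of (daemonset-name, hostPort) pairs you would collect from your manifests:

```python
from collections import defaultdict

def host_port_conflicts(claims):
    """Return hostPorts claimed by more than one DaemonSet.

    claims: list of (daemonset_name, host_port) pairs gathered from manifests.
    """
    by_port = defaultdict(list)
    for name, port in claims:
        by_port[port].append(name)
    # Only ports with multiple claimants are conflicts
    return {port: names for port, names in by_port.items() if len(names) > 1}


claims = [("node-exporter", 9100), ("fluent-bit", 2020), ("custom-agent", 9100)]
print(host_port_conflicts(claims))  # {9100: ['node-exporter', 'custom-agent']}
```

Running a check like this in CI catches port collisions before two DaemonSets fight over the same node port at deploy time.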
Best Practices
- Set resource requests and limits: Even small agents add up. A monitoring agent using 100Mi of memory across 100 nodes consumes 10Gi of cluster memory. Set accurate requests so the scheduler can account for this overhead.
- Use `readOnlyRootFilesystem: true`: For security, mount the container filesystem as read-only and use `emptyDir` or `hostPath` volumes only where writes are needed.
- Label your DaemonSet Pods consistently: Use labels like `app`, `component`, and `tier` so that monitoring and log aggregation pipelines can easily identify DaemonSet workloads.
- Use `priorityClassName`: Set a high priority (e.g., `system-node-critical`) for essential DaemonSets like CNI plugins and log collectors. This ensures they are not evicted when the node is under resource pressure.
- Monitor DaemonSet rollout status: Use `kubectl rollout status daemonset/<name>` to track update progress. Set up alerts if `desiredNumberScheduled` does not match `numberReady` for an extended period.
- Prefer `maxSurge` for zero-downtime updates: When running critical node agents, use `maxSurge: 1` with `maxUnavailable: 0` so a new Pod is running before the old one is removed.
What's Next?
- StatefulSets: Learn how StatefulSets provide stable identity and persistent storage for databases and distributed systems.
- ConfigMaps & Secrets: Inject configuration and credentials into your DaemonSet Pods without hardcoding them in the image.
- Scheduling & Affinity: Deep dive into nodeSelector, nodeAffinity, taints, tolerations, and pod topology spread constraints.
- Observability: Learn how to build monitoring and logging stacks that DaemonSets power.
- Pod Security: Understand security contexts, Pod Security Standards, and how to secure DaemonSet Pods that require privileged access.