Storage: PVs and PVCs
- Persistence Strategy: PVs and PVCs separate the physical storage (PV) from the user's request for storage (PVC), ensuring data survives container restarts and even Pod rescheduling across nodes.
- The Binding Handshake: A PVC "binds" to a PV only if the PV meets the requested capacity, access mode, and StorageClass requirements. Once bound, the relationship is one-to-one.
- Dynamic Provisioning: StorageClasses automate PV creation, eliminating the need for administrators to manually pre-provision physical disks. This is the standard approach in production.
- Access Modes: Understanding `ReadWriteOnce` (single node), `ReadOnlyMany` (many nodes read-only), `ReadWriteMany` (shared read-write across nodes), and `ReadWriteOncePod` (single pod) is critical for architectural planning.
- Reclaim Policies: When a PVC is deleted, the `Retain`, `Delete`, and `Recycle` policies determine what happens to the underlying PV and its data.
- CSI Drivers: The Container Storage Interface is the standard plugin mechanism for connecting external storage systems to Kubernetes.
Containers are ephemeral: when they die, their data dies with them. To persist data beyond a container's lifetime (for example, a database's files), Kubernetes uses PersistentVolumes (PV) and PersistentVolumeClaims (PVC).
- PV: A piece of storage in the cluster (administrator-created or dynamically provisioned).
- PVC: A request for storage (user-created).
A Claim binds to a Volume only if the Volume is large enough and available.
Key Concepts
- PersistentVolume (PV): A cluster-level resource representing a piece of physical storage. A PV has a lifecycle independent of any Pod. It can be backed by AWS EBS, NFS, a local disk, or any CSI-compatible storage system.
- PersistentVolumeClaim (PVC): A request for storage by a user. It is similar to a Pod -- Pods consume node resources (CPU, memory) and PVCs consume PV resources (capacity, access modes). A PVC binds to exactly one PV.
- StorageClass: A "template" that allows PVs to be dynamically provisioned when a PVC is created. StorageClasses define the provisioner (e.g., `ebs.csi.aws.com`), parameters (e.g., volume type), and reclaim policy.
PersistentVolume (PV) in Detail
A PV is created by an administrator or dynamically by a StorageClass. Here is a full PV manifest for an AWS EBS volume:
```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: ebs-pv-01
  labels:
    type: ebs
    environment: production
spec:
  capacity:
    storage: 50Gi
  volumeMode: Filesystem  # Filesystem (default) or Block
  accessModes:
    - ReadWriteOnce       # EBS can only attach to one node
  persistentVolumeReclaimPolicy: Retain
  storageClassName: gp3-encrypted
  csi:
    driver: ebs.csi.aws.com
    volumeHandle: vol-0abcdef1234567890
    fsType: ext4
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: topology.kubernetes.io/zone
              operator: In
              values:
                - us-east-1a
```
Key fields explained:
- `capacity.storage`: The size of the volume. PVCs request a minimum capacity, and the PV must meet or exceed it.
- `volumeMode`: Either `Filesystem` (mounted as a directory) or `Block` (raw block device). Most workloads use `Filesystem`.
- `accessModes`: What read/write patterns are allowed (see the Access Modes section below).
- `persistentVolumeReclaimPolicy`: What happens when the PVC is deleted (see Reclaim Policies below).
- `storageClassName`: Links this PV to a StorageClass. A PVC must request the same StorageClass to bind to this PV.
- `nodeAffinity`: Constrains which nodes the PV can be accessed from. Required for zone-specific storage like EBS.
Here is an NFS-backed PV for shared file storage:
```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-shared-assets
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteMany  # NFS supports multi-node access
  persistentVolumeReclaimPolicy: Retain
  storageClassName: nfs-shared
  nfs:
    server: 10.0.1.50
    path: /exports/shared-assets
  mountOptions:
    - hard
    - nfsvers=4.1
```
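A claim that would bind to this PV might look like the following sketch; the PVC name is hypothetical, while the `storageClassName`, access mode, and capacity match the PV above:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-assets        # hypothetical name
spec:
  accessModes:
    - ReadWriteMany          # matches the PV's multi-node access mode
  storageClassName: nfs-shared
  resources:
    requests:
      storage: 100Gi         # the PV must meet or exceed this request
```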
PersistentVolumeClaim (PVC) in Detail
A PVC is the user-facing object. Developers create PVCs to request storage without needing to know the underlying infrastructure details.
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data
  namespace: databases
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: gp3-encrypted
  resources:
    requests:
      storage: 50Gi
  selector:  # Optional: select a specific PV by label
    matchLabels:
      type: ebs
      environment: production
```
When this PVC is created, Kubernetes searches for a PV that satisfies all of these conditions:
- The PV has at least `50Gi` of capacity.
- The PV supports `ReadWriteOnce` access.
- The PV has `storageClassName: gp3-encrypted`.
- The PV's labels match the selector (if specified).
- The PV's status is `Available` (not already bound to another PVC).
If no matching PV exists and a StorageClass with dynamic provisioning is configured, Kubernetes automatically creates a new PV.
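You can watch this matching with standard kubectl commands; the Events section of `describe` explains why a claim is stuck in `Pending`. The commands below use the `postgres-data` claim from this page as an example:

```shell
# PVs are cluster-scoped; STATUS shows Available, Bound, or Released
kubectl get pv

# PVCs are namespaced; Events explain binding or provisioning failures
kubectl get pvc --all-namespaces
kubectl describe pvc postgres-data -n databases
```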
StorageClass and Dynamic Provisioning
In production, you rarely create PVs manually. Instead, you define StorageClasses and let the cluster create PVs on demand.
Crucial: The Move to CSI (Container Storage Interface)
Older Kubernetes versions used "in-tree" drivers (e.g., `kubernetes.io/aws-ebs`). These have been deprecated and removed; you must use CSI drivers (e.g., `ebs.csi.aws.com`).
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-encrypted
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: ebs.csi.aws.com  # Always use the CSI driver!
parameters:
  type: gp3
  encrypted: "true"
  iops: "3000"
  throughput: "125"
reclaimPolicy: Delete
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
```
Key fields:
- `provisioner`: The CSI driver or in-tree plugin responsible for creating the volume. Examples include `ebs.csi.aws.com`, `efs.csi.aws.com`, `pd.csi.storage.gke.io`, and `disk.csi.azure.com`.
- `parameters`: Provisioner-specific settings. For AWS EBS, this includes volume type, IOPS, throughput, and encryption.
- `reclaimPolicy`: `Delete` (default for dynamic provisioning) or `Retain`. Controls what happens to the PV when the PVC is deleted.
- `allowVolumeExpansion`: When `true`, PVCs using this class can be resized by editing the PVC's `spec.resources.requests.storage` field.
- `volumeBindingMode`: Controls when volume binding and provisioning occur. This is a critical setting, explained below.
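Because `gp3-encrypted` is annotated as the default class, a PVC can omit `storageClassName` entirely; a minimal sketch (the PVC name is hypothetical):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cache-data             # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
  # no storageClassName: the cluster's default class is used
```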
Volume Binding Modes
| Mode | Behavior | Use Case |
|---|---|---|
| `Immediate` | PV is provisioned as soon as the PVC is created | Storage that is accessible from any zone |
| `WaitForFirstConsumer` | PV is provisioned only when a Pod using the PVC is scheduled | Zone-specific storage like EBS, GCE PD |

Always use `WaitForFirstConsumer` for zone-aware storage. If you use `Immediate` with AWS EBS, the volume might be created in us-east-1a while the Pod gets scheduled to us-east-1b, causing a scheduling failure.
Access Modes
Kubernetes defines four access modes that describe how a volume can be mounted:
| Mode | Abbreviation | Description |
|---|---|---|
| `ReadWriteOnce` | RWO | Mounted read-write by a single node. Multiple Pods on the same node can share it. |
| `ReadOnlyMany` | ROX | Mounted read-only by many nodes simultaneously. |
| `ReadWriteMany` | RWX | Mounted read-write by many nodes simultaneously. |
| `ReadWriteOncePod` | RWOP | Mounted read-write by a single Pod only. Stable since Kubernetes 1.29. |
Not all storage backends support all access modes. Block storage (EBS, Azure Disk) typically only supports RWO. File storage (NFS, EFS, Azure Files) supports RWX.
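For example, a claim that needs concurrent writers on multiple nodes must request RWX and a file-backed class; this sketch assumes a hypothetical EFS-backed StorageClass named `efs-shared`:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cms-uploads            # hypothetical name
spec:
  accessModes:
    - ReadWriteMany            # only file-backed storage can satisfy this
  storageClassName: efs-shared # hypothetical EFS-backed class
  resources:
    requests:
      storage: 100Gi
```

If the class's provisioner only supports RWO (e.g., an EBS driver), provisioning fails and the PVC stays `Pending`.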
Reclaim Policies
When a PVC is deleted, the reclaim policy on the PV determines what happens next:
| Policy | Behavior | Use Case |
|---|---|---|
| `Retain` | PV is kept with its data intact. The PV status becomes `Released` and must be manually cleaned up before it can be reused. | Production databases, critical data |
| `Delete` | The PV and its backing storage (e.g., the EBS volume) are both deleted. | Development, ephemeral data |
| `Recycle` | Deprecated. Performs a basic `rm -rf /thevolume/*` on the volume. Use dynamic provisioning instead. | Legacy systems only |
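Dynamically provisioned PVs inherit the StorageClass's `reclaimPolicy` (usually `Delete`); you can flip an individual PV to `Retain` after the fact with a patch, shown here for the `ebs-pv-01` PV from earlier:

```shell
# Change the reclaim policy of an existing PV without recreating it
kubectl patch pv ebs-pv-01 \
  -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
```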
Block vs. File Storage Deep Dive
Block Storage (ReadWriteOnce)
- Examples: AWS EBS, Azure Disk, GCE Persistent Disk, iSCSI.
- Performance: High throughput, low latency. Ideal for databases (PostgreSQL, MySQL, MongoDB).
- Limitation: Can only be attached to one node at a time. If your Pod moves to a different node, the disk must be detached and re-attached (typically takes 15-30 seconds).
- `volumeMode: Block`: You can expose the raw block device to the Pod (no filesystem). Used by some databases and performance-sensitive applications that manage their own on-disk format.
File Storage (ReadWriteMany)
- Examples: NFS, AWS EFS, Azure Files, GlusterFS, CephFS.
- Performance: Slightly higher latency than block storage due to the network file system protocol overhead.
- Benefit: Can be shared by many Pods on many nodes simultaneously. Perfect for shared assets, content management, or any scenario where multiple Pods need concurrent read-write access.
CSI Drivers
The Container Storage Interface (CSI) is the standard for exposing storage systems to Kubernetes. CSI replaced the older in-tree volume plugins (which required changes to the core Kubernetes codebase).
Popular CSI drivers:
| Provider | CSI Driver | Storage Type |
|---|---|---|
| AWS EBS | ebs.csi.aws.com | Block (RWO) |
| AWS EFS | efs.csi.aws.com | File (RWX) |
| GCP PD | pd.csi.storage.gke.io | Block (RWO) |
| Azure Disk | disk.csi.azure.com | Block (RWO) |
| Azure Files | file.csi.azure.com | File (RWX) |
| Ceph RBD | rbd.csi.ceph.com | Block (RWO) |
| Ceph FS | cephfs.csi.ceph.com | File (RWX) |
CSI drivers are deployed as DaemonSets and Deployments in the cluster and handle volume create, attach, mount, snapshot, and expand operations.
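You can inspect which CSI drivers are registered in a cluster and what each node reports for them:

```shell
kubectl get csidrivers   # registered drivers and their capabilities
kubectl get csinodes     # per-node driver registration and attach limits
```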
Volume Snapshots
Volume snapshots let you create point-in-time copies of your volumes. This requires a CSI driver that supports snapshots and the VolumeSnapshot CRDs to be installed.
```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: postgres-snapshot-2024-01-15
spec:
  volumeSnapshotClassName: csi-aws-snapclass
  source:
    persistentVolumeClaimName: postgres-data
```
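The snapshot references a VolumeSnapshotClass, which names the CSI driver and the deletion policy for the snapshot content; a minimal sketch matching the class name used above:

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: csi-aws-snapclass
driver: ebs.csi.aws.com  # must match the CSI driver that provisioned the PVC
deletionPolicy: Delete   # remove the snapshot content when the VolumeSnapshot is deleted
```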
You can then restore a snapshot by creating a new PVC that references it:
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data-restored
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: gp3-encrypted
  resources:
    requests:
      storage: 50Gi
  dataSource:
    name: postgres-snapshot-2024-01-15
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
```
Using PVCs in Pods
To use a PVC in a Pod, reference it in the volumes section and mount it into a container:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: postgres
spec:
  containers:
    - name: postgres
      image: postgres:16
      volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
          subPath: pgdata  # Avoid "lost+found" issue
      env:
        - name: PGDATA
          value: /var/lib/postgresql/data/pgdata
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: postgres-data
```
The subPath field is important for databases like PostgreSQL. Without it, the root of the mounted volume (which may contain a lost+found directory on ext4 filesystems) is used as the data directory, causing initialization failures.
Expanding Volumes
If a StorageClass has allowVolumeExpansion: true, you can resize a PVC by editing its spec.resources.requests.storage:
```shell
kubectl patch pvc postgres-data -n databases \
  -p '{"spec":{"resources":{"requests":{"storage":"100Gi"}}}}'
```
The underlying CSI driver expands the volume. Most drivers resize the filesystem online (no downtime), but some require the Pod to be restarted before the filesystem resize takes effect.
You cannot shrink a PVC. The new size must always be greater than or equal to the current size.
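You can follow a resize with `describe`; while a filesystem resize is pending, many drivers surface a `FileSystemResizePending` condition in the PVC status:

```shell
# Events and Conditions show resize progress and errors
kubectl describe pvc postgres-data -n databases

# Reported capacity updates once the expansion completes
kubectl get pvc postgres-data -n databases \
  -o jsonpath='{.status.capacity.storage}'
```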
Binding Rules
- Capacity: The PV must have at least as much storage as the PVC requested.
- Access Modes: The PV must support the access modes requested by the PVC.
- StorageClass: The PV and PVC must have the same `storageClassName`. If the PVC omits this field, it uses the cluster's default StorageClass.
- Selector: If the PVC has a `selector`, only PVs with matching labels are considered.
- Status: The PV must be in `Available` status (not already `Bound` or `Released`).
Common Pitfalls
- Zone mismatch with `Immediate` binding: Using `volumeBindingMode: Immediate` with zone-aware block storage (EBS, GCE PD) can cause the volume to be provisioned in a different zone than where the Pod is scheduled. Always use `WaitForFirstConsumer`.
- `Released` PVs cannot be rebound: When a PVC is deleted and the PV has the `Retain` policy, the PV goes to `Released` status. You cannot bind a new PVC to a `Released` PV without manually clearing the `spec.claimRef` field.
- PostgreSQL `lost+found` issue: Mounting a PVC directly as the PostgreSQL data directory fails because the ext4 filesystem's `lost+found` directory interferes with initialization. Use `subPath` to mount into a subdirectory.
- Forgetting `allowVolumeExpansion`: If you need to resize volumes later and the StorageClass does not have `allowVolumeExpansion: true`, the resize request will be rejected. Plan ahead when defining StorageClasses.
- Using the `Recycle` reclaim policy: The `Recycle` policy is deprecated. Use dynamic provisioning with the `Delete` policy and create new PVCs instead.
- Access mode confusion: `ReadWriteOnce` means one node, not one Pod. Multiple Pods on the same node can mount the same RWO volume. Use `ReadWriteOncePod` if you need exclusive single-Pod access.
- StatefulSet PVC deletion: Deleting a StatefulSet does not delete its PVCs. This is by design to prevent data loss, but it means you must manually clean up PVCs when decommissioning a StatefulSet.
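For a `Released` PV with the `Retain` policy, the manual fix is to clear the stale claim reference so the PV returns to `Available`, shown here for the `ebs-pv-01` PV from earlier. Note this makes the old data bindable by a new claim, so verify the data first:

```shell
# Remove the stale claimRef; the PV becomes Available again
kubectl patch pv ebs-pv-01 -p '{"spec":{"claimRef": null}}'
```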
Best Practices
- Always use dynamic provisioning in production. Pre-provisioned PVs are hard to manage at scale and do not support volume expansion or snapshots.
- Set a default StorageClass. Mark exactly one StorageClass with `storageclass.kubernetes.io/is-default-class: "true"` so PVCs without an explicit class still get provisioned.
- Use `WaitForFirstConsumer` for all zone-aware storage. This ensures the volume is created in the same zone as the consuming Pod.
- Enable `allowVolumeExpansion` on all StorageClasses. There is no downside, and it avoids painful migrations when you need more space.
- Use `Retain` for production databases. Accidental PVC deletion should not destroy your data. Pair with regular volume snapshots for backup.
- Set resource requests accurately. Over-provisioning wastes money. Under-provisioning leads to out-of-space errors that can crash your application.
- Use labels and selectors for static PVs to ensure PVCs bind to the correct volumes, especially in multi-tenant clusters.
- Monitor PVC usage. Use tools like Prometheus with the `kubelet_volume_stats_used_bytes` and `kubelet_volume_stats_capacity_bytes` metrics to set up alerts before volumes fill up.
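For the monitoring point above, a typical Prometheus alert expression fires when any volume passes a usage threshold; the 85% cutoff here is an arbitrary example:

```promql
# Fires per PVC when usage exceeds 85% of capacity
(kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes) > 0.85
```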
What's Next?
- StatefulSets -- Learn how StatefulSets use `volumeClaimTemplates` to automatically create PVCs for stateful workloads like databases.
- ConfigMaps & Secrets -- For non-persistent configuration data, ConfigMaps and Secrets can be mounted as volumes without needing PVs.
- Helm -- Use Helm charts to template StorageClass, PV, and PVC definitions across environments.
- Kustomize -- Use Kustomize overlays to customize storage sizes and classes per environment.