
Storage: PVs and PVCs

Key Takeaways
  • Persistence Strategy: PVs and PVCs separate the physical storage (PV) from the user's request for storage (PVC), ensuring data survives container restarts and even Pod rescheduling across nodes.
  • The Binding Handshake: A PVC "binds" to a PV only if the PV meets the requested capacity, access mode, and StorageClass requirements. Once bound, the relationship is one-to-one.
  • Dynamic Provisioning: StorageClasses automate PV creation, eliminating the need for administrators to manually pre-provision physical disks. This is the standard approach in production.
  • Access Modes: Understanding ReadWriteOnce (single node), ReadOnlyMany (many nodes read-only), ReadWriteMany (shared read-write across nodes), and ReadWriteOncePod (single pod) is critical for architectural planning.
  • Reclaim Policies: When a PVC is deleted, the Retain, Delete, and Recycle policies determine what happens to the underlying PV and its data.
  • CSI Drivers: The Container Storage Interface is the standard plugin mechanism for connecting external storage systems to Kubernetes.

Containers are ephemeral. When they die, their data dies with them. To save data permanently (like a database), Kubernetes uses PersistentVolumes (PV) and PersistentVolumeClaims (PVC).

The Binding Handshake

  • PV: A piece of storage in the cluster (Administrator created or dynamically provisioned).
  • PVC: A request for storage (User created).

A Claim will only bind to a Volume if the Volume is large enough and available.

Key Concepts

  1. PersistentVolume (PV): A cluster-level resource representing a piece of physical storage. A PV has a lifecycle independent of any Pod. It can be backed by AWS EBS, NFS, a local disk, or any CSI-compatible storage system.
  2. PersistentVolumeClaim (PVC): A request for storage by a user. It is similar to a Pod -- Pods consume node resources (CPU, memory) and PVCs consume PV resources (capacity, access modes). A PVC binds to exactly one PV.
  3. StorageClass: A "template" that allows PVs to be dynamically provisioned when a PVC is created. StorageClasses define the provisioner (e.g., ebs.csi.aws.com), parameters (e.g., volume type), and reclaim policy.

PersistentVolume (PV) in Detail

A PV is created by an administrator or dynamically by a StorageClass. Here is a full PV manifest for an AWS EBS volume:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: ebs-pv-01
  labels:
    type: ebs
    environment: production
spec:
  capacity:
    storage: 50Gi
  volumeMode: Filesystem # Filesystem (default) or Block
  accessModes:
    - ReadWriteOnce # EBS can only attach to one node
  persistentVolumeReclaimPolicy: Retain
  storageClassName: gp3-encrypted
  csi:
    driver: ebs.csi.aws.com
    volumeHandle: vol-0abcdef1234567890
    fsType: ext4
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: topology.kubernetes.io/zone
              operator: In
              values:
                - us-east-1a

Key fields explained:

  • capacity.storage: The size of the volume. PVCs request a minimum capacity, and the PV must meet or exceed it.
  • volumeMode: Either Filesystem (mounted as a directory) or Block (raw block device). Most workloads use Filesystem.
  • accessModes: What read/write patterns are allowed (see the Access Modes section below).
  • persistentVolumeReclaimPolicy: What happens when the PVC is deleted (see Reclaim Policies below).
  • storageClassName: Links this PV to a StorageClass. A PVC must request the same StorageClass to bind to this PV.
  • nodeAffinity: Constrains which nodes the PV can be accessed from. Required for zone-specific storage like EBS.

Here is an NFS-backed PV for shared file storage:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-shared-assets
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteMany # NFS supports multi-node access
  persistentVolumeReclaimPolicy: Retain
  storageClassName: nfs-shared
  nfs:
    server: 10.0.1.50
    path: /exports/shared-assets
  mountOptions:
    - hard
    - nfsvers=4.1

PersistentVolumeClaim (PVC) in Detail

A PVC is the user-facing object. Developers create PVCs to request storage without needing to know the underlying infrastructure details.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data
  namespace: databases
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: gp3-encrypted
  resources:
    requests:
      storage: 50Gi
  selector: # Optional: select a specific PV by label
    matchLabels:
      type: ebs
      environment: production

When this PVC is created, Kubernetes searches for a PV that satisfies all of these conditions:

  1. The PV has at least 50Gi of capacity.
  2. The PV supports ReadWriteOnce access.
  3. The PV has storageClassName: gp3-encrypted.
  4. The PV's labels match the selector (if specified).
  5. The PV's status is Available (not already bound to another PVC).

If no matching PV exists and a StorageClass with dynamic provisioning is configured, Kubernetes automatically creates a new PV.
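As a sketch, a minimal claim that relies on dynamic provisioning through the cluster's default StorageClass could look like this (the name scratch-data is illustrative):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: scratch-data # illustrative name
spec:
  # storageClassName is omitted, so the default StorageClass is used
  # and a matching PV is provisioned on demand.
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```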


StorageClass and Dynamic Provisioning

In production, you rarely create PVs manually. Instead, you define StorageClasses and let the cluster create PVs on demand.

Crucial: The Move to CSI (Container Storage Interface). Older Kubernetes versions used "in-tree" drivers (e.g., kubernetes.io/aws-ebs). These are deprecated or removed; you must use CSI drivers (e.g., ebs.csi.aws.com) instead.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-encrypted
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: ebs.csi.aws.com # Always use the CSI driver!
parameters:
  type: gp3
  encrypted: "true"
  iops: "3000"
  throughput: "125"
reclaimPolicy: Delete
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer

Key fields:

  • provisioner: The CSI driver responsible for creating the volume. Examples include ebs.csi.aws.com, efs.csi.aws.com, pd.csi.storage.gke.io, and disk.csi.azure.com.
  • parameters: Provisioner-specific settings. For AWS EBS, this includes volume type, IOPS, throughput, and encryption.
  • reclaimPolicy: Delete (default for dynamic provisioning) or Retain. Controls what happens to the PV when the PVC is deleted.
  • allowVolumeExpansion: When true, PVCs using this class can be resized by editing the PVC's spec.resources.requests.storage field.
  • volumeBindingMode: Controls when volume binding and provisioning occurs. This is a critical setting explained below.

Volume Binding Modes

| Mode | Behavior | Use Case |
| --- | --- | --- |
| Immediate | PV is provisioned as soon as the PVC is created | Storage that is accessible from any zone |
| WaitForFirstConsumer | PV is provisioned only when a Pod using the PVC is scheduled | Zone-specific storage like EBS, GCE PD |

Always use WaitForFirstConsumer for zone-aware storage. If you use Immediate with AWS EBS, the volume might be created in us-east-1a while the Pod gets scheduled to us-east-1b, causing a scheduling failure.


Access Modes

Kubernetes defines four access modes that describe how a volume can be mounted:

| Mode | Abbreviation | Description |
| --- | --- | --- |
| ReadWriteOnce | RWO | Mounted as read-write by a single node. Multiple Pods on the same node can share it. |
| ReadOnlyMany | ROX | Mounted as read-only by many nodes simultaneously. |
| ReadWriteMany | RWX | Mounted as read-write by many nodes simultaneously. |
| ReadWriteOncePod | RWOP | Mounted as read-write by a single Pod only. Stable since Kubernetes 1.29. |

Not all storage backends support all access modes. Block storage (EBS, Azure Disk) typically only supports RWO. File storage (NFS, EFS, Azure Files) supports RWX.
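For shared access, the claim itself just requests RWX; the StorageClass must point at an RWX-capable backend. A sketch, assuming a hypothetical efs-shared class backed by efs.csi.aws.com:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-assets
spec:
  accessModes:
    - ReadWriteMany            # needs a file-storage backend (NFS, EFS, ...)
  storageClassName: efs-shared # hypothetical RWX-capable class
  resources:
    requests:
      storage: 20Gi
```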


Reclaim Policies

When a PVC is deleted, the reclaim policy on the PV determines what happens next:

| Policy | Behavior | Use Case |
| --- | --- | --- |
| Retain | PV is kept with its data intact. The PV status becomes Released and must be manually cleaned up before it can be reused. | Production databases, critical data |
| Delete | The PV and its backing storage (e.g., the EBS volume) are both deleted. | Development, ephemeral data |
| Recycle | Deprecated. Performs a basic rm -rf /thevolume/* on the volume. Use dynamic provisioning instead. | Legacy systems only |
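Dynamically provisioned PVs inherit their policy from the StorageClass, usually Delete. A common safeguard, sketched here with a placeholder PV name, is to patch important PVs to Retain after provisioning:

```shell
# Switch an existing PV to Retain so its data survives PVC deletion.
# pvc-0a1b2c3d-example is a placeholder; substitute your PV's name.
kubectl patch pv pvc-0a1b2c3d-example \
  -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
```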

Block vs. File Storage Deep Dive

Block Storage (ReadWriteOnce)

  • Examples: AWS EBS, Azure Disk, GCE Persistent Disk, iSCSI.
  • Performance: High throughput, low latency. Ideal for databases (PostgreSQL, MySQL, MongoDB).
  • Limitation: Can only be attached to one node at a time. If your Pod moves to a different node, the disk must be detached and re-attached (typically takes 15-30 seconds).
  • volumeMode: Block: You can expose the raw block device to the Pod (no filesystem). Used by some databases and performance-sensitive applications that manage their own on-disk format.

File Storage (ReadWriteMany)

  • Examples: NFS, AWS EFS, Azure Files, GlusterFS, CephFS.
  • Performance: Slightly higher latency than block storage due to the network file system protocol overhead.
  • Benefit: Can be shared by many Pods on many nodes simultaneously. Perfect for shared assets, content management, or any scenario where multiple Pods need concurrent read-write access.

CSI Drivers

The Container Storage Interface (CSI) is the standard for exposing storage systems to Kubernetes. CSI replaced the older in-tree volume plugins (which required changes to the core Kubernetes codebase).

Popular CSI drivers:

| Provider | CSI Driver | Storage Type |
| --- | --- | --- |
| AWS EBS | ebs.csi.aws.com | Block (RWO) |
| AWS EFS | efs.csi.aws.com | File (RWX) |
| GCP PD | pd.csi.storage.gke.io | Block (RWO) |
| Azure Disk | disk.csi.azure.com | Block (RWO) |
| Azure Files | file.csi.azure.com | File (RWX) |
| Ceph RBD | rbd.csi.ceph.com | Block (RWO) |
| CephFS | cephfs.csi.ceph.com | File (RWX) |

CSI drivers are deployed as DaemonSets and Deployments in the cluster and handle volume create, attach, mount, snapshot, and expand operations.
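You can see which drivers a cluster has registered with:

```shell
# List registered CSI drivers and the per-node driver info objects.
kubectl get csidrivers
kubectl get csinodes
```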


Volume Snapshots

Volume snapshots let you create point-in-time copies of your volumes. This requires a CSI driver that supports snapshots and the VolumeSnapshot CRDs to be installed.

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: postgres-snapshot-2024-01-15
spec:
  volumeSnapshotClassName: csi-aws-snapclass
  source:
    persistentVolumeClaimName: postgres-data
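The csi-aws-snapclass referenced above must itself exist as a VolumeSnapshotClass. A sketch for the AWS EBS CSI driver:

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: csi-aws-snapclass
driver: ebs.csi.aws.com
deletionPolicy: Delete # also delete the backing snapshot when the VolumeSnapshot is removed
```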

You can then restore a snapshot by creating a new PVC that references it:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data-restored
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: gp3-encrypted
  resources:
    requests:
      storage: 50Gi
  dataSource:
    name: postgres-snapshot-2024-01-15
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io

Using PVCs in Pods

To use a PVC in a Pod, reference it in the volumes section and mount it into a container:

apiVersion: v1
kind: Pod
metadata:
  name: postgres
spec:
  containers:
    - name: postgres
      image: postgres:16
      volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
          subPath: pgdata # Avoid "lost+found" issue
      env:
        - name: PGDATA
          value: /var/lib/postgresql/data/pgdata
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: postgres-data

The subPath field is important for databases like PostgreSQL. Without it, the root of the mounted volume (which may contain a lost+found directory on ext4 filesystems) is used as the data directory, causing initialization failures.


Expanding Volumes

If a StorageClass has allowVolumeExpansion: true, you can resize a PVC by editing its spec.resources.requests.storage:

kubectl patch pvc postgres-data -p '{"spec":{"resources":{"requests":{"storage":"100Gi"}}}}'

The underlying CSI driver expands the volume. For file systems, the expansion happens online (no downtime). For block storage, some CSI drivers require the Pod to be restarted for the filesystem resize to take effect.

You cannot shrink a PVC. The new size must always be greater than or equal to the current size.
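To follow a resize in progress, inspect the PVC's events and conditions (for example, a FileSystemResizePending condition indicates the driver is waiting for the node-side filesystem resize):

```shell
# Show the PVC's events and conditions during an expansion.
kubectl describe pvc postgres-data
```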


Binding Rules

  • Capacity: The PV must have at least as much storage as the PVC requested.
  • Access Modes: The PV must support the access modes requested by the PVC.
  • StorageClass: The PV and PVC must have the same storageClassName. If the PVC omits this field, it uses the cluster's default StorageClass.
  • Selector: If the PVC has a selector, only PVs with matching labels are considered.
  • Status: The PV must be in Available status (not already Bound or Released).

Common Pitfalls

  1. Zone mismatch with Immediate binding: Using volumeBindingMode: Immediate with zone-aware block storage (EBS, GCE PD) can cause the volume to be provisioned in a different zone than where the Pod is scheduled. Always use WaitForFirstConsumer.

  2. Released PVs cannot be rebound: When a PVC is deleted and the PV has Retain policy, the PV goes to Released status. You cannot bind a new PVC to a Released PV without manually clearing the spec.claimRef field.

  3. PostgreSQL lost+found issue: Mounting a PVC directly as the PostgreSQL data directory fails because the ext4 filesystem's lost+found directory interferes with initialization. Use subPath to mount into a subdirectory.

  4. Forgetting allowVolumeExpansion: If you need to resize volumes later and the StorageClass does not have allowVolumeExpansion: true, the resize request will be rejected. Plan ahead when defining StorageClasses.

  5. Using Recycle reclaim policy: The Recycle policy is deprecated. Use dynamic provisioning with Delete policy and create new PVCs instead.

  6. Access mode confusion: ReadWriteOnce means one node, not one Pod. Multiple Pods on the same node can mount the same RWO volume. Use ReadWriteOncePod if you need exclusive single-Pod access.

  7. StatefulSet PVC deletion: Deleting a StatefulSet does not delete its PVCs. This is by design to prevent data loss, but it means you must manually clean up PVCs when decommissioning a StatefulSet.
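For pitfall 2, the manual cleanup is typically a patch that clears the stale claimRef, after which the PV returns to Available (a sketch; my-released-pv is a placeholder name):

```shell
# Clear the stale claimRef on a Released PV; the data itself is untouched.
kubectl patch pv my-released-pv --type json \
  -p '[{"op":"remove","path":"/spec/claimRef"}]'
```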


Best Practices

  1. Always use dynamic provisioning in production. Pre-provisioned PVs are hard to manage at scale and do not support volume expansion or snapshots.

  2. Set a default StorageClass. Mark exactly one StorageClass with storageclass.kubernetes.io/is-default-class: "true" so PVCs without an explicit class still get provisioned.

  3. Use WaitForFirstConsumer for all zone-aware storage. This ensures the volume is created in the same zone as the consuming Pod.

  4. Enable allowVolumeExpansion on all StorageClasses. There is no downside, and it avoids painful migrations when you need more space.

  5. Use Retain for production databases. Accidental PVC deletion should not destroy your data. Pair with regular volume snapshots for backup.

  6. Set resource requests accurately. Over-provisioning wastes money. Under-provisioning leads to out-of-space errors that can crash your application.

  7. Use labels and selectors for static PVs to ensure PVCs bind to the correct volumes, especially in multi-tenant clusters.

  8. Monitor PVC usage. Use tools like Prometheus with kubelet_volume_stats_used_bytes and kubelet_volume_stats_capacity_bytes metrics to set up alerts before volumes fill up.


What's Next?

  • StatefulSets -- Learn how StatefulSets use volumeClaimTemplates to automatically create PVCs for stateful workloads like databases.
  • ConfigMaps & Secrets -- For non-persistent configuration data, ConfigMaps and Secrets can be mounted as volumes without needing PVs.
  • Helm -- Use Helm charts to template StorageClass, PV, and PVC definitions across environments.
  • Kustomize -- Use Kustomize overlays to customize storage sizes and classes per environment.