Local Storage vs Network Storage
In Kubernetes, not all storage is equal. Network-attached storage (Ceph, NFS, AWS EBS) can be accessed from any node in the cluster. Your pod can run on node A while the volume lives somewhere else entirely — the storage layer handles the network transport.
Local storage is different. With local storage (like OpenEBS ZFS LocalPV), the data physically lives on a specific node’s disk. There is no network replication, no remote access. The pod and the PersistentVolume (PV) must be on the same node, without exception. If they aren’t, the pod will fail to start.
This single constraint is the root cause of most local storage headaches in Kubernetes.
How OpenEBS ZFS LocalPV Works
When you create a PersistentVolumeClaim (PVC) using the zfs-localpv StorageClass, the OpenEBS ZFS CSI driver:
- Picks a node that has a ZFS pool available
- Creates a ZFS dataset on that node
- Creates a PV bound to that PVC
- Automatically sets a
nodeAffinityon the PV:
nodeAffinity:
required:
nodeSelectorTerms:
- matchExpressions:
- key: openebs.io/nodeid
operator: In
values:
- <node-name>
This nodeAffinity is what pins the PV to its node permanently. Kubernetes uses it to ensure any pod mounting that PV is scheduled on the same node. You don’t set this — the provisioner sets it for you after the volume is created.
The Trap: volumeBindingMode: Immediate
Here’s where things go wrong. StorageClasses have a volumeBindingMode field with two options:
Immediate(default): The PVC is provisioned the moment it’s created, before any pod is scheduledWaitForFirstConsumer: The PVC staysPendinguntil a pod that needs it is actually being scheduled
With Immediate, the sequence looks like this:
- PVC is created
- Provisioner immediately picks any available node with ZFS storage
- PV is created on that node, PVC goes
Bound - Pod tries to schedule with your
nodeSelectorpointing to a specific node - Conflict: pod wants node A, but PV’s nodeAffinity says node B
- PV’s nodeAffinity wins — pod gets scheduled on node B, your
nodeSelectoris ignored
Your carefully configured nodeSelector on the pod does nothing when the PVC is already bound to a PV on a different node. The PV’s nodeAffinity always takes precedence.
The Fix: WaitForFirstConsumer
With WaitForFirstConsumer, the sequence becomes correct:
- PVC is created → stays
Pending - Pod is scheduled → Kubernetes picks the right node based on your
nodeSelector - Provisioner sees which node the pod is going to → creates the PV on that node
- PVC goes
Bound, pod starts — both on the same intended node
This is the recommended binding mode for any local storage. Immediate is technically wrong for local storage because it makes pod placement non-deterministic.
To update your StorageClass:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: zfs-localpv
provisioner: zfs.csi.openebs.io
parameters:
compression: "off"
dedup: "off"
fstype: zfs
poolname: primary
recordsize: 4k
thinprovision: "no"
reclaimPolicy: Delete
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer # <-- this is the key change
Is this change safe cluster-wide? Yes. Changing volumeBindingMode on an existing StorageClass only affects new PVC creations. Already-bound PVCs are completely unaffected. Existing workloads keep running without any disruption.
Pinning a Pod to a Node (for Local Storage)
To control which node your pod — and therefore your PV — lands on, add a nodeSelector to the pod spec. For OpenEBS ZFS LocalPV, the node label to use is openebs.io/nodeid:
# For a Helm chart like Bitnami Redis:
master:
nodeSelector:
openebs.io/nodeid: your-node-name
replica:
nodeSelector:
openebs.io/nodeid: your-node-name
Or using full nodeAffinity if you need more control:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: openebs.io/nodeid
operator: In
values:
- your-node-name
With WaitForFirstConsumer on the StorageClass, this nodeSelector is what the provisioner reads to decide where to create the PV.
How to Shift a PV from One Node to Another
Since PVs are pinned to their node by nodeAffinity, you cannot move a ZFS LocalPV volume to a different node in-place. The data is physically on the original node’s ZFS pool. The migration process is:
Step 1 — Update the pod’s nodeSelector
Change the nodeSelector in your deployment/StatefulSet to point to the target node. With WaitForFirstConsumer, new PVCs will be created on the correct node.
Step 2 — Back up the data (if needed)
If the PV has data you care about, back it up before proceeding. For stateless or cache workloads (like Redis used as a cache), this may not be necessary.
Step 3 — Delete the PVC
kubectl delete pvc <pvc-name> -n <namespace>
This also deletes the PV (if reclaimPolicy: Delete) and the underlying ZFS dataset on the old node.
Step 4 — Let it recreate
For a Deployment, the pod restart triggers PVC recreation. For a StatefulSet, you may need to delete the pod:
kubectl delete pod <pod-name> -n <namespace>
The StatefulSet controller recreates the pod, which triggers PVC creation, which (with WaitForFirstConsumer) creates the PV on the node your nodeSelector points to.
Step 5 — Verify
kubectl describe pvc <pvc-name> -n <namespace>
Check that the PV’s nodeAffinity now shows the correct target node.
Key Takeaways
- Pod and PV must be on the same node for local storage — this is a hard requirement, not a preference
- The provisioner sets nodeAffinity on the PV automatically — you don’t configure it, but it controls pod scheduling
volumeBindingMode: Immediateis wrong for local storage — it creates PVs before pods are scheduled, leading to node mismatchesvolumeBindingMode: WaitForFirstConsumeris the correct mode — provisioning waits for pod scheduling so the PV is always created on the right node- Changing
volumeBindingModeis safe cluster-wide — it only affects new PVCs, not existing ones - You cannot move a local PV — you can only delete and recreate it on the target node
nodeSelectoron the pod only works correctly withWaitForFirstConsumer— withImmediate, the PV’s nodeAffinity overrides it