
Race condition for helper-pod when multiple pvc's are provisioned #154

Closed
stefan-kiss opened this issue Nov 11, 2020 · 9 comments
stefan-kiss commented Nov 11, 2020

Hello. I think I have uncovered a bug.
If you provision multiple PVs in rapid succession, the helper pod will only run for the first one.

By "first" I mean the helper pod that gets created first, which may or may not correspond to the PV that should be created.

How to reproduce, on a single-node Kubernetes cluster:

  1. Install the provisioner:

kubectl apply -f https://raw.githubusercontent.com/rancher/local-path-provisioner/master/deploy/local-path-storage.yaml

  2. Instead of creating one PVC and one pod, create 3 of each:

    pvcs.yaml:
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: local-path-pvc1
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: local-path
  resources:
    requests:
      storage: 2Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: local-path-pvc2
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: local-path
  resources:
    requests:
      storage: 2Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: local-path-pvc3
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: local-path
  resources:
    requests:
      storage: 2Gi

pods.yaml:

---
apiVersion: v1
kind: Pod
metadata:
  name: volume-test01
  namespace: default
spec:
  containers:
  - name: volume-test
    image: nginx:stable-alpine
    imagePullPolicy: IfNotPresent
    volumeMounts:
    - name: volv
      mountPath: /data
    ports:
    - containerPort: 80
  volumes:
  - name: volv
    persistentVolumeClaim:
      claimName: local-path-pvc1
---
apiVersion: v1
kind: Pod
metadata:
  name: volume-test02
  namespace: default
spec:
  containers:
  - name: volume-test
    image: nginx:stable-alpine
    imagePullPolicy: IfNotPresent
    volumeMounts:
    - name: volv
      mountPath: /data
    ports:
    - containerPort: 80
  volumes:
  - name: volv
    persistentVolumeClaim:
      claimName: local-path-pvc2
---
apiVersion: v1
kind: Pod
metadata:
  name: volume-test03
  namespace: default
spec:
  containers:
  - name: volume-test
    image: nginx:stable-alpine
    imagePullPolicy: IfNotPresent
    volumeMounts:
    - name: volv
      mountPath: /data
    ports:
    - containerPort: 80
  volumes:
  - name: volv
    persistentVolumeClaim:
      claimName: local-path-pvc3

Then check the /opt/local-path-provisioner path on the node:

[skiss-dev2 ~]$ ls -la /opt/local-path-provisioner
total 0
drwxr-xr-x  5 root root 222 Nov 11 08:56 .
drwxr-xr-x. 5 root root  65 Nov 11 08:50 ..
drwxr-xr-x  2 root root   6 Nov 11 08:56 pvc-919db3fc-6c88-446e-b77c-01e7ae260289_default_local-path-pvc1
drwxrwxrwx  2 root root   6 Nov 11 08:56 pvc-95bc6533-cd29-4f89-adaf-7b740e311969_default_local-path-pvc2
drwxr-xr-x  2 root root   6 Nov 11 08:56 pvc-c3e09e1e-19a6-44ec-9cf3-843fe24fe1b5_default_local-path-pvc3

As you can see, all 3 PVs are created, but only one has the correct set of permissions.

The provisioner logs:

I1111 08:56:10.202181       1 controller.go:1202] provision "default/local-path-pvc1" class "local-path": started
I1111 08:56:10.214691       1 controller.go:1202] provision "default/local-path-pvc2" class "local-path": started
time="2020-11-11T08:56:10Z" level=debug msg="config doesn't contain node master-worker-1487cf6df4b42f3c60ef, use DEFAULT_PATH_FOR_NON_LISTED_NODES instead"
time="2020-11-11T08:56:10Z" level=info msg="Creating volume pvc-919db3fc-6c88-446e-b77c-01e7ae260289 at master-worker-1487cf6df4b42f3c60ef:/opt/local-path-provisioner/pvc-919db3fc-6c88-446e-b77c-01e7ae260289_default_local-path-pvc1"
time="2020-11-11T08:56:10Z" level=info msg="create the helper pod helper-pod into local-path-storage"
I1111 08:56:10.221367       1 event.go:281] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"local-path-pvc1", UID:"919db3fc-6c88-446e-b77c-01e7ae260289", APIVersion:"v1", ResourceVersion:"26463", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "default/local-path-pvc1"
I1111 08:56:10.403174       1 controller.go:1202] provision "default/local-path-pvc3" class "local-path": started
time="2020-11-11T08:56:10Z" level=debug msg="config doesn't contain node master-worker-1487cf6df4b42f3c60ef, use DEFAULT_PATH_FOR_NON_LISTED_NODES instead"
time="2020-11-11T08:56:10Z" level=info msg="Creating volume pvc-95bc6533-cd29-4f89-adaf-7b740e311969 at master-worker-1487cf6df4b42f3c60ef:/opt/local-path-provisioner/pvc-95bc6533-cd29-4f89-adaf-7b740e311969_default_local-path-pvc2"
time="2020-11-11T08:56:10Z" level=info msg="create the helper pod helper-pod into local-path-storage"
I1111 08:56:10.410110       1 event.go:281] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"local-path-pvc2", UID:"95bc6533-cd29-4f89-adaf-7b740e311969", APIVersion:"v1", ResourceVersion:"26467", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "default/local-path-pvc2"
time="2020-11-11T08:56:10Z" level=debug msg="config doesn't contain node master-worker-1487cf6df4b42f3c60ef, use DEFAULT_PATH_FOR_NON_LISTED_NODES instead"
time="2020-11-11T08:56:10Z" level=info msg="Creating volume pvc-c3e09e1e-19a6-44ec-9cf3-843fe24fe1b5 at master-worker-1487cf6df4b42f3c60ef:/opt/local-path-provisioner/pvc-c3e09e1e-19a6-44ec-9cf3-843fe24fe1b5_default_local-path-pvc3"
time="2020-11-11T08:56:10Z" level=info msg="create the helper pod helper-pod into local-path-storage"
I1111 08:56:10.420310       1 event.go:281] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"local-path-pvc3", UID:"c3e09e1e-19a6-44ec-9cf3-843fe24fe1b5", APIVersion:"v1", ResourceVersion:"26472", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "default/local-path-pvc3"
time="2020-11-11T08:56:25Z" level=info msg="Volume pvc-c3e09e1e-19a6-44ec-9cf3-843fe24fe1b5 has been created on master-worker-1487cf6df4b42f3c60ef:/opt/local-path-provisioner/pvc-c3e09e1e-19a6-44ec-9cf3-843fe24fe1b5_default_local-path-pvc3"
time="2020-11-11T08:56:25Z" level=info msg="Volume pvc-919db3fc-6c88-446e-b77c-01e7ae260289 has been created on master-worker-1487cf6df4b42f3c60ef:/opt/local-path-provisioner/pvc-919db3fc-6c88-446e-b77c-01e7ae260289_default_local-path-pvc1"
time="2020-11-11T08:56:25Z" level=info msg="Volume pvc-95bc6533-cd29-4f89-adaf-7b740e311969 has been created on master-worker-1487cf6df4b42f3c60ef:/opt/local-path-provisioner/pvc-95bc6533-cd29-4f89-adaf-7b740e311969_default_local-path-pvc2"
time="2020-11-11T08:56:28Z" level=error msg="unable to delete the helper pod: pods \"helper-pod\" not found"
I1111 08:56:28.440603       1 controller.go:1284] provision "default/local-path-pvc1" class "local-path": volume "pvc-919db3fc-6c88-446e-b77c-01e7ae260289" provisioned
I1111 08:56:28.440662       1 controller.go:1301] provision "default/local-path-pvc1" class "local-path": succeeded
I1111 08:56:28.440687       1 volume_store.go:212] Trying to save persistentvolume "pvc-919db3fc-6c88-446e-b77c-01e7ae260289"
I1111 08:56:28.444032       1 controller.go:1284] provision "default/local-path-pvc3" class "local-path": volume "pvc-c3e09e1e-19a6-44ec-9cf3-843fe24fe1b5" provisioned
I1111 08:56:28.444058       1 controller.go:1301] provision "default/local-path-pvc3" class "local-path": succeeded
I1111 08:56:28.444067       1 volume_store.go:212] Trying to save persistentvolume "pvc-c3e09e1e-19a6-44ec-9cf3-843fe24fe1b5"
time="2020-11-11T08:56:28Z" level=error msg="unable to delete the helper pod: pods \"helper-pod\" not found"
I1111 08:56:28.444364       1 controller.go:1284] provision "default/local-path-pvc2" class "local-path": volume "pvc-95bc6533-cd29-4f89-adaf-7b740e311969" provisioned
I1111 08:56:28.444381       1 controller.go:1301] provision "default/local-path-pvc2" class "local-path": succeeded
I1111 08:56:28.444387       1 volume_store.go:212] Trying to save persistentvolume "pvc-95bc6533-cd29-4f89-adaf-7b740e311969"
I1111 08:56:28.456125       1 volume_store.go:219] persistentvolume "pvc-95bc6533-cd29-4f89-adaf-7b740e311969" saved
I1111 08:56:28.456388       1 event.go:281] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"local-path-pvc2", UID:"95bc6533-cd29-4f89-adaf-7b740e311969", APIVersion:"v1", ResourceVersion:"26467", FieldPath:""}): type: 'Normal' reason: 'ProvisioningSucceeded' Successfully provisioned volume pvc-95bc6533-cd29-4f89-adaf-7b740e311969
I1111 08:56:28.457477       1 volume_store.go:219] persistentvolume "pvc-c3e09e1e-19a6-44ec-9cf3-843fe24fe1b5" saved
I1111 08:56:28.457550       1 event.go:281] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"local-path-pvc3", UID:"c3e09e1e-19a6-44ec-9cf3-843fe24fe1b5", APIVersion:"v1", ResourceVersion:"26472", FieldPath:""}): type: 'Normal' reason: 'ProvisioningSucceeded' Successfully provisioned volume pvc-c3e09e1e-19a6-44ec-9cf3-843fe24fe1b5
I1111 08:56:28.459975       1 volume_store.go:219] persistentvolume "pvc-919db3fc-6c88-446e-b77c-01e7ae260289" saved
I1111 08:56:28.460001       1 event.go:281] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"local-path-pvc1", UID:"919db3fc-6c88-446e-b77c-01e7ae260289", APIVersion:"v1", ResourceVersion:"26463", FieldPath:""}): type: 'Normal' reason: 'ProvisioningSucceeded' Successfully provisioned volume pvc-919db3fc-6c88-446e-b77c-01e7ae260289

Note the two messages about being unable to delete the helper pod. That's because it was never created for those requests. However, there is no creation error message, because of this check:

	// If it already exists due to some previous errors, the pod will be cleaned up later automatically
	// https://github.com/rancher/local-path-provisioner/issues/27
	logrus.Infof("create the helper pod %s into %s", helperPod.Name, p.namespace)
	_, err = p.kubeClient.CoreV1().Pods(p.namespace).Create(helperPod)
	if err != nil && !k8serror.IsAlreadyExists(err) {
		return err
	}

From my understanding, all 3 PV provisioning requests are sent very close together, and since the helper pod always has the same name it cannot be created multiple times. The first request that gets through creates the pod; the rest fail silently. I'm not entirely sure why the path on the node exists in every case, since the helper pod never runs for those volumes.

However, it's pretty clear that only one helper pod runs at a time, so the custom provisioning code (such as the part that sets the permissions) runs only once.

xor22h commented Nov 11, 2020

I got two of them with drwxrwxrwx and one with drwxr-xr-x. But, yes, they are inconsistent.

yasker added a commit to yasker/local-path-provisioner that referenced this issue Nov 12, 2020
rancher#154

Signed-off-by: Sheng Yang <sheng.yang@rancher.com>
yasker added a commit to yasker/local-path-provisioner that referenced this issue Nov 12, 2020
To fix the race condition caused by reusing the helper pod name.

rancher#154

Signed-off-by: Sheng Yang <sheng.yang@rancher.com>
yasker added a commit to yasker/local-path-provisioner that referenced this issue Nov 12, 2020
To fix the race condition caused by reusing the helper pod name.

rancher#154

Signed-off-by: Sheng Yang <sheng.yang@rancher.com>
yasker added a commit that referenced this issue Nov 12, 2020
To fix the race condition caused by reusing the helper pod name.

#154

Signed-off-by: Sheng Yang <sheng.yang@rancher.com>

yasker commented Nov 12, 2020

@stefan-kiss Good catch!

Can you check whether the image yasker/local-path-provisioner:57801fc-amd64 fixes the issue in your environment? (Assuming you're using AMD64.) You'll need to replace the image in the original yaml file.


stefan-kiss commented Nov 12, 2020

Hello, yes, I tested it and I get correct results.

For my project (since I needed to move forward very fast), I am for now combining the helper pod name (string from config) with the actual PV name (from the function arguments), trimming the result to 63 characters and removing any trailing '-' to be sure I don't hit name/format problems.

But this approach (or any other that uses a unique pod name) leaves open the possibility of helper pods being left hanging. There are only a couple of scenarios where this could happen - for example, the provisioner pod getting killed while the helper pod is up - but it can still have an impact ...

I don't yet know how to fix this, but I'm thinking of replacing the deferred delete-pod function with a cleanup function that looks for all helper pods (by some label) and cleans them up if they are in 'Completed' or one of the failure statuses.
(well, anything != 'Running')
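The cleanup idea above could look roughly like this (a minimal sketch: the pod struct and podsToCleanUp are hypothetical stand-ins for pods already selected by a label via the real API, and the phases mirror the Kubernetes PodPhase values, where 'Completed' appears as Succeeded):

```go
package main

import "fmt"

// PodPhase mirrors the corev1.PodPhase string values.
type PodPhase string

const (
	Running   PodPhase = "Running"
	Succeeded PodPhase = "Succeeded"
	Failed    PodPhase = "Failed"
)

type pod struct {
	Name  string
	Phase PodPhase
}

// podsToCleanUp returns the names of every labeled helper pod whose phase
// is anything other than Running, so a periodic job can delete stragglers
// left behind if the provisioner was killed mid-provision.
func podsToCleanUp(pods []pod) []string {
	var stale []string
	for _, p := range pods {
		if p.Phase != Running {
			stale = append(stale, p.Name)
		}
	}
	return stale
}

func main() {
	pods := []pod{
		{"helper-pod-pvc-1", Succeeded},
		{"helper-pod-pvc-2", Running},
		{"helper-pod-pvc-3", Failed},
	}
	fmt.Println(podsToCleanUp(pods)) // prints: [helper-pod-pvc-1 helper-pod-pvc-3]
}
```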

P.S. Thank you for a very fast response!


yasker commented Nov 13, 2020

@stefan-kiss Labeling the helper pods and cleaning them up once completed sounds like a good idea. Though I probably won't have time to work on it until after Thanksgiving, due to the upcoming KubeCon and other projects. Feel free to raise a PR if you need it sooner.

@stefan-kiss

Sure, I'll try ... Thanks!


unki commented Nov 16, 2020

@yasker could you maybe already create a new release containing your fix #155, as v0.0.19?

Otherwise all new users will probably hit the same issue with the current v0.0.18 (or whichever version it was introduced in).
I was upgrading today from v0.0.14->v0.0.19 - without checking the issues here first - and 💥 ... :)

Big thanks in advance and for all your great work!


yasker commented Nov 19, 2020

I've just merged #156, which uses the PV name as the suffix for the helper pod name. I'll leave it there for a couple of days for testing, then I can release v0.0.19.

@sgoodliff

Hi,

I'm also hitting this, so I'd like to see v0.0.19 ship when it's ready.

Thanks!

yasker added a commit to yasker/local-path-provisioner that referenced this issue Nov 30, 2020
Fixed the helper pod race condition for multiple PVCs.

Issue: rancher#154

Signed-off-by: Sheng Yang <sheng.yang@rancher.com>
yasker added a commit that referenced this issue Nov 30, 2020
Fixed the helper pod race condition for multiple PVCs.

Issue: #154

Signed-off-by: Sheng Yang <sheng.yang@rancher.com>

yasker commented Nov 30, 2020

v0.0.19 has been released. https://github.com/rancher/local-path-provisioner/releases/tag/v0.0.19

Note that the labeled cleanup mechanism is not in this release, but since that behavior is the same as before, we can treat the labeled cleanup as an enhancement and close this bug for now. Feel free to file another issue for it.

yasker closed this as completed Nov 30, 2020
fsdrw08 pushed a commit to fsdrw08/local-path-provisioner that referenced this issue Aug 12, 2021
Fixed the helper pod race condition for multiple PVCs.

Issue: rancher#154

Signed-off-by: Sheng Yang <sheng.yang@rancher.com>