Velero not re-creating volume using storage class with Retain policy #5506
Looks like this is an environmental problem: the PV could not be provisioned by the CSI provisioner. Could you try the verification here, to make sure all the CSI functionality works well? EKS 1.23 has introduced CSI migration and some other security changes; if you are using KMS, there are additional steps in order to configure the policies for both the service role and the node group.
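One hedged way to run such a verification is to exercise the provisioner directly with a throwaway StorageClass and PVC. This is a sketch only: all names here are illustrative, not from this thread, and the cluster-side commands are shown commented out.

```shell
# Illustrative check that the EBS CSI driver can dynamically provision a volume.
# Names (ebs-sc, csi-check) are examples, not from this thread.
cat <<'EOF' > /tmp/csi-check.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-sc
provisioner: ebs.csi.aws.com
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: csi-check
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: ebs-sc
  resources:
    requests:
      storage: 1Gi
EOF

# Apply and watch; with WaitForFirstConsumer the PVC binds once a pod uses it.
# kubectl apply -f /tmp/csi-check.yaml
# kubectl get pvc csi-check -w
```

If the PVC never leaves `Pending`, the `kubectl describe pvc` events usually point at the missing permission or driver problem.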
Thank you for replying. I tested the use case you linked and everything worked without issues. It was pretty similar to our use case, with the ReclaimPolicy on the PVC set to `Retain`. Could you maybe please link the changes you mentioned in EKS? Specifically, what other steps do we need to take? I tried to google but I wasn't able to find anything that seemed to help. We are using a new cluster created on 1.23 and we are not migrating any volumes from the old EBS storage class. Below is the current policy we are using for Velero.
Could you run the commands below in your env:
Running into the exact same issue. Restores fail if the storage class used has its policy set to Retain. Deterministically reproducible. Kindly prioritize this issue/bug.
@raghavkaranth To help us troubleshoot, could you run the command below after the problem happens and share the output with us:
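The specific command the maintainer asked for is not preserved above. As a hedged illustration only, post-failure diagnostics for this kind of problem typically look like the following (all resource names are examples, and the cluster commands are shown commented out since they need a live cluster):

```shell
# Illustrative, read-only diagnostics after a failed restore.
# Names (data-pvc, my-app) are placeholders, not from this thread.
#
# kubectl describe pvc data-pvc -n my-app   # Events show provisioning errors
# kubectl get pv                            # phase: Bound / Released / Available
# kubectl get volumesnapshot,volumesnapshotcontent -A
# kubectl logs -n kube-system -l app=ebs-csi-controller -c csi-provisioner
echo "checked PVC events, PV phases, snapshot objects, and CSI provisioner logs" > /tmp/velero-diag-note.txt
```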
I have a similar problem, but I'm using restic instead of CSI snapshots. If PVC data is included in the backup (proper restic volumes) everything works fine: Velero re-creates the volume. But when only PVCs and PVs are backed up, it doesn't; the restored PVC takes over the backed-up PV (making the original PVC "Lost"), or, if the restore happens on another Kubernetes cluster, it can't mount the volume to the pod. So I can't back up and restore just pods that have PVCs (without their data). Also kindly prioritize this issue/bug.
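A possible mechanism behind the "takeover"/"Lost" symptom, sketched here as an assumption rather than a confirmed diagnosis: a PV with reclaim policy `Retain` stays in the `Released` phase after its PVC is deleted and keeps `spec.claimRef` pointing at the old PVC, so a newly restored PVC cannot bind to it until the claimRef is cleared. The PV name below is illustrative.

```shell
# Sketch: inspect and clear a stale claimRef on a Released, Retained PV.
# "my-pv" is a placeholder name; cluster commands are commented out.
#
# kubectl get pv my-pv -o jsonpath='{.status.phase} {.spec.claimRef.name}'
# kubectl patch pv my-pv --type json -p '[{"op":"remove","path":"/spec/claimRef"}]'
echo "removing spec.claimRef returns a Released PV to Available" > /tmp/pv-claimref-note.txt
```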
@waclawikj
@aschi1 @raghavkaranth For the original problem, could you help collect the info mentioned above so that we can troubleshoot further?
I guess all the problems mentioned in this issue thread are related to overwriting existing items during restore, which is not supported by Velero.
For CSI snapshot restore, besides deleting the PVC/PV, we also need to guarantee that the VolumeSnapshot/VolumeSnapshotContent/snapshot class doesn't exist or contains the correct info.
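Since Velero skips items that already exist, a cleanup step before retrying a restore might look like the following sketch. Every name here is illustrative (namespace, PVC/PV, snapshot objects, and backup name are all placeholders), and the destructive commands are shown commented out.

```shell
# Sketch: remove leftovers from a failed restore before retrying, because
# Velero does not overwrite existing items. All names are placeholders.
NS=my-app
# kubectl delete pvc data-pvc -n "$NS"
# kubectl delete pv data-pv
# kubectl delete volumesnapshot data-snap -n "$NS"
# kubectl delete volumesnapshotcontent data-snap-content
# velero restore create --from-backup my-backup --include-namespaces "$NS"
echo "cleaned PVC/PV/VolumeSnapshot leftovers in $NS" > /tmp/velero-cleanup-note.txt
```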
@Lyndon-Li is this the expected behavior? https://github.com/vmware-tanzu/velero/blob/main/pkg/restore/restore.go#L1251
Hi,
We tried testing a DR scenario for our deployments and hit a problem with a PVC using a storage class with the `Retain` policy. We are using an AWS EKS cluster running Kubernetes 1.23, with the CSI drivers for disks and snapshotting, the Velero plugin for CSI, and `features: EnableCSI` set in Helm. I tried searching through the issues but did not find anything similar, only one discussion about missing roles to access the KMS keys used for disk encryption, but even after adding those permissions our situation did not change.

Thank you for your help.
What steps did you take and what happened:
- Labeled the PVC with `velero: true` so that it is backed up by Velero.
- Ran `velero restore create --from-backup velero-snapshot-every-hour-20221027124246`.
What did you expect to happen:
We expected the volume to be re-created; instead the pod got stuck in `ContainerCreating` with the error message below.

The following information will help us better understand what's going on:
If you are using velero v1.7.0+: please use `velero debug --backup <backupname> --restore <restorename>` to generate the support bundle and attach it to this issue. For more options, please refer to `velero debug --help`.
bundle-2022-10-27-15-22-52.tar.gz
Anything else you would like to add:
At first we thought Velero was maybe expecting the EBS volume to stay in the AWS console, since its policy is `Retain`, but we disproved this by creating a PVC with the manifest below. When we created a PVC like this manually, the volume was automatically created in the AWS console and the pod started with the data restored. Afterwards, we compared it to the manifest that Velero creates for the PVC after performing the restore, and they were trying to do the same thing.

When we tested the same behavior with a PVC using a storage class with the `Delete` policy, everything worked without issues. This leaves us wondering what the problem could be. We thought it might be missing permissions in AWS (we are using IRSA to pass the role to the Velero pod), but if disks with the `Delete` policy work, then it does not seem like a permission problem.

PVC manifest used to recreate the volume manually:
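The reporter's actual manifest was not preserved in this scrape. As a hedged illustration only, a PVC that re-attaches to a retained EBS volume is typically paired with a statically provisioned PV referencing the existing volume; every name, size, and the volume ID below are placeholders, not values from this issue.

```shell
# Sketch of static provisioning against an existing (retained) EBS volume.
# All names and the volume ID are placeholders.
cat <<'EOF' > /tmp/manual-pv-pvc.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: restored-pv
spec:
  capacity:
    storage: 10Gi
  accessModes: ["ReadWriteOnce"]
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ebs-retain
  csi:
    driver: ebs.csi.aws.com
    volumeHandle: vol-0123456789abcdef0   # placeholder EBS volume ID
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: restored-pvc
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: ebs-retain
  volumeName: restored-pv
  resources:
    requests:
      storage: 10Gi
EOF
# kubectl apply -f /tmp/manual-pv-pvc.yaml
```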
Environment:
- Velero version (use `velero version`):
- Velero features (use `velero client config get features`):
- Kubernetes version (use `kubectl version`):
- OS (e.g. from `/etc/os-release`): Linux

Vote on this issue!
This is an invitation to the Velero community to vote on issues; you can see the project's top-voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.