udev SR not working after upgrade to XCP-ng 8.3 #666

Open
n-buck opened this issue Oct 15, 2024 · 27 comments

@n-buck

n-buck commented Oct 15, 2024

Hi

After updating to 8.3, my SR did not work anymore.
I have created it like this:

mkdir /srv/NasDrives

ln -s /dev/sda2 /srv/NasDrives/sda

xe sr-create name-label="My Nas Storage" name-description="Nas Storage" type=udev content-type=disk device-config:location=/srv/NasDrives


xe sr-scan uuid=id

This results in disks with an unrecognized bus type and 0B size.
I am therefore not able to assign this SR to a VM.

Recreating the SR in a similar manner did not help either.

@stormi
Member

stormi commented Oct 29, 2024

Hi, and sorry for the late reply.

Has the situation evolved in the last two weeks?

Pinging @Wescoeur and @Nambrok for insight regarding the use of the udev driver.

@stormi
Member

stormi commented Oct 29, 2024

And pinging @benjamreis regarding pass-through.

@Nambrok

Nambrok commented Oct 29, 2024

The udev sm driver is not PCI passthrough; it makes tapdisk directly show the block device through a PV device to the VM. I will take a look to see if there are any changes in 8.3.

@n-buck
Author

n-buck commented Nov 1, 2024

@stormi thank you for the reply.
The situation has not changed (downgraded to 8.2 again).

How would I test this again? Has a patch been released, or should I install again from the ISO and then update with yum?

@stormi
Member

stormi commented Nov 4, 2024

No, no patch released that would address this kind of issue. If you can reproduce on a test host, or find a way for us to reproduce, we can try to find the cause.

@n-buck
Author

n-buck commented Jan 4, 2025

Hello Stormi,
Sorry for the late reply.

Since this is my only set-up and I did not have the storage to back up my data, I was somewhat afraid to test it further on this system.

What I did:

  1. Run XCP-NG 8.2.1
  2. Create a new Storage-Repository:
    2.1 mkdir /srv/NasDrives
    2.2 xe sr-create name-label="My Nas Storage" name-description="Nas Storage" type=udev content-type=disk device-config:location=/srv/NasDrives
    2.3 ln -s /dev/sda2 /srv/NasDrives/sda ...
    2.4 xe sr-scan uuid=664fb621-a084-8c9a-9858-1041fe5f5a67
  3. Assign it to a VM
  4. Update to XCP-NG 8.3
    4.1 The Storage-Repository has no drives.
    4.2 Recreating a Storage Repository as described in step 2 did not work either
  5. Restore XCP-NG 8.2.1
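
For anyone trying to reproduce this on a test host, the steps above could be scripted roughly as follows. This is only a sketch: the function names are made up for illustration, the `xe` arguments are copied from the commands in this thread, and it assumes it runs as root in dom0 and that `xe sr-create` prints the new SR's UUID.

```python
import os
import subprocess


def sr_create_cmd(sr_dir, label="My Nas Storage", description="Nas Storage"):
    """argv for the `xe sr-create` call used in this report."""
    return [
        "xe", "sr-create",
        "name-label=" + label,
        "name-description=" + description,
        "type=udev",
        "content-type=disk",
        "device-config:location=" + sr_dir,
    ]


def sr_scan_cmd(sr_uuid):
    """argv for rescanning the SR."""
    return ["xe", "sr-scan", "uuid=" + sr_uuid]


def vdi_list_cmd(sr_uuid):
    """argv for listing the VDIs the scan produced."""
    return ["xe", "vdi-list", "sr-uuid=" + sr_uuid]


def reproduce(sr_dir, device, link_name):
    """Create the symlink and the SR, scan it, and return the VDI listing."""
    os.makedirs(sr_dir, exist_ok=True)
    link = os.path.join(sr_dir, link_name)
    if not os.path.islink(link):
        os.symlink(device, link)
    # Assumption: sr-create prints the UUID of the new SR on stdout.
    sr_uuid = subprocess.check_output(sr_create_cmd(sr_dir)).decode().strip()
    subprocess.check_call(sr_scan_cmd(sr_uuid))
    return subprocess.check_output(vdi_list_cmd(sr_uuid)).decode()


# Example, on a disposable test host only:
# print(reproduce("/srv/NasDrives", "/dev/sda2", "sda"))
```

Running the same call on 8.2.1 and on 8.3 and comparing the two VDI listings would show whether the disks appear, and with what size and bus type.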

Since this is also not a supported way to pass the drives to TrueNAS, I am in the process of upgrading my system to pass the disks through with an HBA.

I hope this is of some help, and thank you for an amazing product!

@jeremfg

jeremfg commented Jan 17, 2025

I've just faced this exact issue. I've already commented here also.

My use case is a server that doesn't support IOMMU.

By the sound of it, I should be able to get it working if I install 8.2.1 instead, which would be an acceptable temporary workaround for me. All my other nodes are still running 8.2.1 anyway; this was my first foray into 8.3. I'll test out the downgrade in a few days.

EDIT: I've since made the move to 8.2.1. Making the exact same configuration worked on the older release, so there's definitely something broken specific to 8.3.

stormi changed the title from "Disk Passthrough not working after upgrade" to "udev SR not working after upgrade to XCP-ng 8.3" on Jan 17, 2025
@stormi
Member

stormi commented Jan 17, 2025

Using XCP-ng 8.2.1 is a temporary solution, but it will only last while XCP-ng 8.2.1 is supported (well, you can continue to use it after it's EOL, but without security fixes).

> The udev sm driver is not PCI passthrough; it makes tapdisk directly show the block device through a PV device to the VM. I will take a look to see if there are any changes in 8.3.

@Nambrok Did you reach a conclusion regarding this issue?

@Nambrok

Nambrok commented Jan 17, 2025

@stormi Sorry, I haven't had time to look yet. I will be investigating when I have time.

@stormi
Member

stormi commented Jan 27, 2025

@jeremfg could you give us the exact commands that you tried on XCP-ng 8.3 and that didn't work?

@blackliner

blackliner commented Feb 13, 2025

Does this mean there is no way to pass through disks with 8.3? (PCI passthrough of an HBA doesn't count)

@jeremfg

jeremfg commented Feb 13, 2025

> @jeremfg could you give us the exact commands that you tried on XCP-ng 8.3 and that didn't work?

@stormi
Sorry I didn't see your message until now.
If you are willing to read some bash, this is the exact script I've been running to create the SR.

If you need to see how I use this code, you can find the reference here in function nas_storage_update.

@jeremfg

jeremfg commented Feb 13, 2025

> Does this mean there is no way to pass through disks with 8.3? (PCI passthrough of an HBA doesn't count)

@blackliner
Not sure how you define "disk" here. You can still, for example, create an LVM SR, create virtual disks on top of it, and attach those to a VM. The regular stuff still works.

But a whole disk as a udev? Nope, doesn't seem to work anymore. The symlinks to /dev/sd[a,b,c] don't show up in the SR. I had to revert to the latest 8.2.1 for now.

@stormi
Copy link
Member

stormi commented Feb 17, 2025

So, @Nambrok will provide details if needed, but here's the situation I understood from talking with him:

  • The udev SR driver was never made for what users do with it, that is, assign a whole disk to a VM without doing pass-through. If I'm not mistaken, the udev driver is here mostly to handle removable devices. To my knowledge, this use of the udev driver to attach whole disks or partitions to VMs isn't documented in XCP-ng's official documentation either.
  • It was even less designed to assign a partition to a VM, and there even was a check to prevent it, but the check was easily fooled by renaming the symlink.
  • Upstream developers have improved the driver so that it actually follows symlinks, rather than assuming that their names convey meaningful information. This makes it possible to use UUIDs, not just device names, which is a nice improvement. However, it also means that the check that the symlink actually points to a device (not a partition) is now effective.
  • It so happens that various users have come to rely on that "let's rename the symlink to fool the driver" behaviour, and although the driver was not made for this, it worked to some extent.
  • It still works with entire disks.
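
To illustrate the difference between trusting a symlink's name and following it, here is a minimal sketch of such a check. It is not the sm driver's code: the function names are invented for this example, and it relies only on the Linux sysfs convention that a partition's directory under /sys/class/block contains a `partition` attribute while a whole disk's does not.

```python
import os


def resolve_block_name(link_path):
    """Follow the symlink and return the kernel name of its target, e.g. 'sda2'."""
    return os.path.basename(os.path.realpath(link_path))


def is_partition(block_name, sysfs_root="/sys/class/block"):
    """True if sysfs marks the block device as a partition.

    Linux exposes a 'partition' attribute only for partitions.
    """
    return os.path.exists(os.path.join(sysfs_root, block_name, "partition"))


def classify_link(link_path, sysfs_root="/sys/class/block"):
    """Return (resolved name, kind) for one symlink in the SR directory."""
    name = resolve_block_name(link_path)
    if not os.path.isdir(os.path.join(sysfs_root, name)):
        return name, "not a block device"
    kind = "partition" if is_partition(name, sysfs_root) else "whole disk"
    return name, kind


# Example: a link named "sda" that points to /dev/sda2 is classified by its
# target, so the renaming trick no longer hides the partition.
# print(classify_link("/srv/NasDrives/sda"))
```

A check based on the link name alone would treat /srv/NasDrives/sda as a whole disk even when it points to /dev/sda2; a check based on the resolved target would not.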

From here, how to go on?

  1. If there are other suitable solutions to your needs, avoid the udev driver to attach whole disks to VMs.
  2. If you really have to use the udev driver in this unsupported way, use entire disks, not partitions.
  3. If you really have to use partitions, and I don't recommend it, you could likely cheat by creating a loopback device for the disk partition and using it as the device for the SR (also ensuring it's correctly set up at boot).
  4. If we can understand the use case for associating disk partitions with VMs, maybe we'll find a way to implement something cleaner, or suggest a better solution.
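
The loopback idea in option 3 might look something like the sketch below. It is untested against the udev driver, so whether the SR accepts a loop device is an open assumption. The function names are hypothetical; `losetup --find --show` is the standard util-linux way to attach a backing device to the first free loop device and print its name. The mapping does not survive a reboot, so it would have to be recreated at boot.

```python
import os
import subprocess


def loop_attach_cmd(partition):
    """argv asking losetup to bind the partition to the first free loop device.

    --find picks a free /dev/loopN and --show prints its name.
    """
    return ["losetup", "--find", "--show", partition]


def expose_partition_via_loop(partition, sr_dir):
    """Attach the partition to a loop device and symlink it into the SR directory.

    Must run as root in dom0. Returns the loop device path, e.g. '/dev/loop0'.
    """
    loop_dev = subprocess.check_output(loop_attach_cmd(partition)).decode().strip()
    link = os.path.join(sr_dir, os.path.basename(loop_dev))
    if not os.path.islink(link):
        os.symlink(loop_dev, link)
    return loop_dev


# Example, on a test host only, followed by `xe sr-scan uuid=<sr-uuid>`:
# print(expose_partition_via_loop("/dev/sda2", "/srv/NasDrives"))
```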

@jeremfg

jeremfg commented Feb 17, 2025

> • It still works with entire disks.

No, that's what broke for me, and it's what distinguishes 8.2.1 from 8.3. I can no longer pass a whole disk; the disks don't show up after an sr-scan. My symlinks point to /dev/sda, not sda1 or something.

Honestly, I had no idea it was possible to pass a partition until you wrote all of this. The behavior described about following symlinks doesn't match my experience, quite the opposite. But I could be wrong in my interpretation, being familiar with the subject only as a black box and through my own experiments.

True. It is and has always been a hack, not officially supported. So should it be supported? All I know is I've been reduced to using this hack, having no choice due to the hardware I have.
Between this issue and the lack of support for nested virtualization, 8.3 is a no-go for me. And yes, granted, I'm not the target consumer either. I realize that.

@stormi
Member

stormi commented Feb 17, 2025

(Regarding Nested Virt, proper support is still quite a long way ahead, but you'll probably be pleased to learn that we found a way to bring back to 8.3 a similar level of support of the nested virtualization as we had in 8.2.1, that is: partial, unsupported, but working for some use cases.)

@Nambrok

Nambrok commented Feb 18, 2025

@jeremfg

> No, that's what broke for me, and makes a distinction between 8.2.1 and 8.3. I can no longer pass a whole disk, they don't show after a sr-scan. My symlinks point to /dev/sda, not sda1 or something.

Do you still have an 8.3 host you could test on?
Could you try this on 8.2.1 and 8.3?

# python3
Python 3.6.8 (default, Nov 16 2020, 16:55:22) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-44)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.path.append("/opt/xensource/sm/")
>>> import sysdevice
>>> sysdevice.stat("sda")
{'size': 239444426752, 'bus': 'SCSI', 'bus_path': '0:2:0:0', 'hwinfo': 'DELL model PERC H310 rev 2.12 type 0'}

On 8.2.1, use just python instead of python3.
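
To run the same check over every symlink in the SR directory, a small helper along these lines may be handy. It is only a sketch: `load_sm_stat` and `check_sr_links` are names invented here, and the only thing taken from the sm code is `sysdevice.stat(name)` as used in the session above. It calls `stat` once with the link's own name and once with the name of the device the link resolves to, so both interpretations can be compared on 8.2.1 and 8.3.

```python
import os
import sys


def load_sm_stat(sm_path="/opt/xensource/sm/"):
    """Import sysdevice from the sm directory, as in the session above."""
    if sm_path not in sys.path:
        sys.path.append(sm_path)
    import sysdevice
    return sysdevice.stat


def check_sr_links(sr_dir, stat_fn):
    """Call stat_fn on each symlink's own name and on its resolved target name."""
    results = {}
    for entry in sorted(os.listdir(sr_dir)):
        link = os.path.join(sr_dir, entry)
        if not os.path.islink(link):
            continue  # only symlinks are of interest here
        target = os.path.basename(os.path.realpath(link))
        row = {"target": target}
        for key, name in (("by_link_name", entry), ("by_target", target)):
            try:
                row[key] = stat_fn(name)
            except Exception as exc:
                # Record the failure instead of stopping at the first bad link.
                row[key] = "failed: {!r}".format(exc)
        results[entry] = row
    return results


# Example, in dom0:
# for link, row in check_sr_links("/srv/NasDrives", load_sm_stat()).items():
#     print(link, row)
```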

@jeremfg

jeremfg commented Feb 18, 2025

@Nambrok

> Do you still have an 8.3 host you could test on?

Sadly no. I would have to take my home network offline for quite a while, manually migrate a few VMs, open the hardware to swap boot drives, etc... to make any tests with 8.3 again. In other words, it's something I might consider in the future, if there's a strong enough argument for it, but it'll be quite the hassle for me.

Still, here is what I'm getting on 8.2.1, where it works:

>>> sysdevice.stat("sda")
{'bus': 'SCSI', 'hwinfo': 'ATA model ST24000NM000C-3W rev SN02 type 0', 'bus_path': '0:0:0:0', 'size': 24000277250048L}

Based on the code of the stat() function I'm seeing, I'm pretty sure it would output the same on 8.3 (assuming this part of sysdevice.py hasn't changed much in 8.3).

@Nambrok

Nambrok commented Feb 18, 2025

Yes, it's supposed to work the same in 8.3 when you give it a base device.
If you try to give it a partition, e.g. sda1, this function will fail.
That's why I'm surprised a whole disk didn't work for you in 8.3.
Another thing must have changed then. I'll take another look at the history to see if something else could be responsible.
But from what I remember from the last time I checked, this driver didn't have many changes that could explain this change of behavior between 8.2.1 and 8.3.
Thank you for the test.

@jeremfg

jeremfg commented Feb 18, 2025

> That's why I'm surprised a whole disk didn't work for you in 8.3.

Are you implying you've run a similar test yourself, and saw it working on 8.3? I was under the assumption that it's pretty easy to reproduce, and that it would behave the same as it did for me, whatever SATA drive you used.

@Nambrok

Nambrok commented Feb 18, 2025

I only tried with an NVMe device since I don't have a SATA drive available.
But it's more based on my understanding of this code.

@Nambrok

Nambrok commented Feb 18, 2025

I'll try a few things soon.
The whole disk should work with this driver. I thought it was only giving it a partition that no longer worked in 8.3, which is expected.

@Nambrok

Nambrok commented Feb 18, 2025

Yeah, I can confirm it does work with an NVMe device, as I remembered.

# xe vdi-list sr-uuid=8a97c0dc-ec46-6ff6-eddd-8c738932457c 
uuid ( RO)                : c7209e1d-30e8-4692-81b9-5e8b30cd03eb
          name-label ( RW): Unrecognised bus type 
    name-description ( RW): model CT500T500SSD8
             sr-uuid ( RO): 8a97c0dc-ec46-6ff6-eddd-8c738932457c
        virtual-size ( RO): 500107862016
            sharable ( RO): false
           read-only ( RO): false

I'll try to see if I can steal a machine with a free SATA drive to test.

@dockwell-jenner

FWIW, this doesn't work for me on a fully patched 8.2.1 host with a USB-connected 8TB drive. Creating the SR works fine, but scanning the SR always fails with SR_BACKEND_FAILURE_46.

@olivierlambert
Member

@dockwell-jenner hi! Can you be more specific on what doesn't work exactly, at which step?

@dockwell-jenner

@olivierlambert this is what I see:

# mkdir /srv/pt
# ln -s /dev/sda /srv/pt/sda
# xe sr-create name-label="PT Storage" name-description="PT Storage" type=udev content-type=disk device-config:location=/srv/pt
<uuid is returned>
# xe sr-scan uuid=<uuid returned above>
Error code: SR_BACKEND_FAILURE_46
Error parameters: , The VDI is not available,

@dockwell-jenner

After a recent patch and reboot of my 8.2.1 host, and a physical disconnect/reconnect of the USB-connected drive, the issue I mentioned above is gone. Just recording for posterity.


7 participants