
Multi-GPU support #4

Open
Lissanro opened this issue Feb 4, 2024 · 5 comments

Comments

@Lissanro commented Feb 4, 2024

Since multiple experts can take a lot of VRAM, especially for SDXL, it would be useful to have a way to choose which experts to load onto which GPU (since each GPU can have a different amount of VRAM).
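A rough sketch of what such a mapping could look like, assuming each expert is an ordinary torch.nn.Module; the `expert_device_map` dictionary and `place_experts` helper are hypothetical names for illustration, not part of the current API:

```python
import torch.nn as nn

# Hypothetical user-chosen mapping: expert index -> device.
# A larger card can hold more experts than a smaller one.
expert_device_map = {0: "cuda:0", 1: "cuda:0", 2: "cuda:1", 3: "cuda:1"}

def place_experts(experts: nn.ModuleList, device_map: dict) -> None:
    """Move each expert module to the GPU chosen for it."""
    for idx, expert in enumerate(experts):
        expert.to(device_map.get(idx, "cuda:0"))
```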

@Andrey36652

@Lissanro Wouldn't it be killed by PCIe latency?

@Lissanro (Author) commented Feb 5, 2024

I think PCIe latency is only relevant during training (and even then it can be quite good if PCIe 4.0 or 5.0 with a sufficient number of lanes is used, or NVLink in the case of a pair of 3090 cards).

For inference, PCIe latency should not matter much: it is just independent experts doing their job once they are fully loaded into VRAM. This is how, for example, running Mixtral (8x7B MoE) at 4-bit or higher quantization is possible with 24GB cards: since it cannot fit into the 24GB of a single card, it gets split across more than one GPU, and the speed is comparable to running on a single GPU.
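For reference, this is roughly how Mixtral gets split today with Hugging Face transformers and accelerate: `device_map="auto"` spreads the weights over the available GPUs, and 4-bit loading matches the quantization mentioned above (the model id and flags here are only an illustration):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" shards the weights across all visible GPUs;
# load_in_4bit keeps each shard small enough for 24 GB cards.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    load_in_4bit=True,
)
```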

Potentially, it could be even better if parallelism across multiple GPUs were implemented (for the case when one expert is fully allocated on one GPU, another expert on a different GPU, and the gate network decides it needs to use both). In any case, even a naive sequential implementation (processing experts one by one even if they are on different GPUs, as in the sketch below) is still better than crashing with OOM, and in terms of speed it should be at least comparable to running on a single GPU with more VRAM.
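Here is a minimal sketch of that naive sequential path, assuming the gate has already picked the expert indices and their weights; the activations are copied to whichever GPU holds the current expert and the result is copied back (all names are hypothetical):

```python
import torch

def run_selected_experts(hidden, experts, selected_indices, weights):
    """Run each selected expert on its own GPU, one after another."""
    output = torch.zeros_like(hidden)
    for idx, weight in zip(selected_indices, weights):
        expert = experts[idx]
        device = next(expert.parameters()).device
        # Copy activations to the expert's GPU, run it, bring the result back.
        # weight is assumed to be a plain float from the gate's softmax.
        out = expert(hidden.to(device))
        output = output + weight * out.to(hidden.device)
    return output
```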

@Warlord-K (Contributor)

Thanks for the suggestion. We are working on optimizing the memory usage, but feel free to create a PR for multi-GPU usage.

@g29times commented Feb 6, 2024

@Warlord-K Hi, would it be possible to add the GPU requirements or specifications to the README on the homepage?

@Warlord-K (Contributor)

@g29times I have added the GPU requirements, thanks for the suggestion!
