Multi-GPU support #4
@Lissanro wouldn't it be killed by PCIe latency?
I think PCIe latency is only relevant during training (and even then it can be quite low if PCIe 4.0 or 5.0 with a sufficient number of lanes is used, or NVLink in the case of a pair of 3090 cards). For inference, PCIe latency should not matter much: the experts do their work independently once they are fully loaded into VRAM. This is, for example, how running Mixtral (8x7B MoE) at 4-bit or higher quantization is possible with 24GB cards - since it cannot fit in the 24GB of a single card, it gets split across more than one GPU, and the speed is comparable to running on a single GPU. It could potentially be even faster if parallelism across multiple GPUs is implemented (for the case where one expert is fully allocated on one GPU, another expert on a different GPU, and the gate network decides it needs both). In any case, even a naive sequential implementation (processing experts one by one even if they are on different GPUs) is still better than crashing with OOM, and in terms of speed it should be at least comparable to running on a single GPU with more VRAM.
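To make the "naive sequential" idea above concrete, here is a minimal PyTorch sketch, not this repo's API: the module and variable names (`SimpleExpert`, `gate`, `experts`) are made up for illustration. Each expert sits on its own device, and only the routed activations cross PCIe.

```python
import torch
import torch.nn as nn

class SimpleExpert(nn.Module):
    """Toy feed-forward expert, standing in for a full expert model."""
    def __init__(self, dim: int):
        super().__init__()
        self.ff = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):
        return self.ff(x)

dim = 64
# Fall back to CPU so the sketch also runs on a machine without two GPUs.
devices = ["cuda:0", "cuda:1"] if torch.cuda.device_count() > 1 else ["cpu", "cpu"]

# Place each expert on its own device so the whole model never has to fit on one GPU.
experts = [SimpleExpert(dim).to(dev) for dev in devices]
gate = nn.Linear(dim, len(experts)).to(devices[0])

with torch.no_grad():
    x = torch.randn(1, 16, dim, device=devices[0])
    weights = gate(x).softmax(dim=-1)   # routing weights, computed on the first device
    top = weights.argmax(dim=-1)        # top-1 routing: one expert per token

    out = torch.zeros_like(x)
    for i, (expert, dev) in enumerate(zip(experts, devices)):
        mask = top == i
        if mask.any():
            # Move only the routed tokens to the expert's device, run it there,
            # then bring the result back; PCIe traffic is just these activations.
            y = expert(x[mask].to(dev))
            out[mask] = y.to(devices[0])
```

The experts run one after another here, so throughput is roughly that of a single GPU; the gain is that models too large for one card no longer OOM.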
Thanks for the suggestion, we are working on optimizing the memory usage, but feel free to create a PR for Multi-GPU usage.
@Warlord-K Hi Admin, would it be possible for the homepage README file to state the GPU requirements or specifications?
@g29times I have added the GPU requirements, thanks for the suggestion! |
Since multiple experts can take a lot of VRAM, especially for SDXL, it would be useful to have a way to choose which experts to load onto which GPU (since GPUs can each have a different amount of VRAM).
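As a rough sketch of what such an option could look like (the names `expert_device_map` and `place_experts` below are hypothetical, not an existing option in this project): a user-supplied map from expert index to device, with a fallback that picks the GPU with the most free VRAM.

```python
import torch

def device_with_most_free_vram() -> str:
    """Return the CUDA device with the most free memory right now."""
    free = [torch.cuda.mem_get_info(i)[0] for i in range(torch.cuda.device_count())]
    return f"cuda:{free.index(max(free))}"

def place_experts(experts, expert_device_map=None):
    """Move each expert module to its assigned device, or to the emptiest GPU."""
    for idx, expert in enumerate(experts):
        device = (expert_device_map or {}).get(idx) or device_with_most_free_vram()
        expert.to(device)
    return experts

# Example: pin two large SDXL experts to the 24GB card and the rest to a smaller one.
# experts = place_experts(experts, {0: "cuda:0", 1: "cuda:0", 2: "cuda:1", 3: "cuda:1"})
```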