
[RFC] HIP backend for AMD GPU support #3231

Closed
iotamudelta opened this issue Jan 31, 2024 · 4 comments

Comments

@iotamudelta
Contributor

We would like to contribute a HIP backend to Faiss to support AMD GPUs. We have a working prototype that passes all unit tests on Navi hardware (6800XT, 7900XTX). The prototype is a statically hipified version of the existing CUDA backend with manual AMD-specific changes (build system, PTX intrinsics replaced with amdgcn builtins, ...).

Assuming you are interested in this work, how would we best go about upstreaming it?

Would a static HIP backend (in the faiss::hip namespace) be preferred? If not, what architecture would be preferable (e.g., overriding faiss::gpu)?

Unlike the CUDA backend, we ultimately need to support multiple warp sizes (wavefronts) at runtime: 32 for Navi and 64 for the MI series. There are some uses of kWarpSize that will not work out of the box (sizing shared memory, some of the replacements in CMake, static uses at dispatch sites, ...). We will need guidance on how such support would best be architected and integrated.

Lastly, we have done only minor performance analysis and tuning on the current prototype. Are there public benchmarks or particular protocols you have used for the other backends that we should use as a reference? So far we have used the GPU benchmark scripts with the SIFT data sets to assess performance.

As part of this, we are currently using a GPU_MAX_SELECTION_K of 1024. What would a recommended protocol look like for deciding between 1024 and 2048? Ideally we would like to use a single value independent of the HW/SW generation.
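One conceivable protocol (a sketch under our own assumptions, not an established Faiss procedure) is to benchmark top-k selection at both candidate limits over the k values an application actually uses, and keep the smaller limit unless the larger one is required and not materially slower. The CPU/NumPy stand-in below only illustrates the shape of the measurement; a real run would use the Faiss GPU benchmark scripts on the target hardware:

```python
# Hypothetical measurement harness: time a two-stage top-k selection
# (argpartition, then sort of the partitioned slice) at the candidate
# k limits. Sizes and data are illustrative placeholders.
import time
import numpy as np

def time_topk(n_queries, n_db, k, repeats=3):
    """Return the best-of-N wall time for selecting the k smallest
    distances per query from a random (n_queries, n_db) matrix."""
    rng = np.random.default_rng(0)
    dists = rng.standard_normal((n_queries, n_db)).astype(np.float32)
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        idx = np.argpartition(dists, k - 1, axis=1)[:, :k]
        part = np.take_along_axis(dists, idx, axis=1)
        order = np.argsort(part, axis=1)
        np.take_along_axis(idx, order, axis=1)
        best = min(best, time.perf_counter() - t0)
    return best

if __name__ == "__main__":
    for k in (1024, 2048):
        t = time_topk(64, 100_000, k)
        print(f"k={k}: {t * 1e3:.1f} ms")
```

If the k=2048 limit costs little over k=1024 on both Navi and MI parts, a single limit of 2048 would be defensible; otherwise the split would need to stay hardware-dependent.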

To browse the current prototype as a reference, please see https://github.com/iotamudelta/faiss/tree/wf32.

@mdouze
Contributor

mdouze commented Feb 1, 2024

FYI there is an open (but inactive) pull request for AMD:
#3126
Let's discuss how we can coordinate this effort.

@mdouze mdouze added the GPU label Feb 1, 2024
@iotamudelta
Contributor Author

@mdouze Thanks for the pointer! I see your concerns in that PR; we'll discuss internally how to address them.

@iotamudelta
Contributor Author

@mdouze We had some further discussions internally. @ItsPitt and I can maintain an AMD/HIP backend for now. AMD can contribute two servers for CI, with the same setup as the PyTorch CI.

I'd recommend closing #3126 as overcome by events. With that in mind, what's the preferred strategy to make this all happen? I assume we'll want multiple steps.

The current state of the AMD/HIP port is in https://github.com/iotamudelta/faiss/tree/rocm_support

@iotamudelta
Contributor Author

Solved: ROCm support is now integrated at the tip of tree.
