We would like to contribute a HIP backend to Faiss to support AMD GPUs. We have a working prototype that passes all unit tests on Navi hardware (6800XT, 7900XTX). The prototype features a statically hipified version of the existing CUDA backend with manual AMD-specific changes (build system, PTX to amdgcn builtins, ...).
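To illustrate the kind of manual change involved, here is a rough sketch (not the exact code from the prototype) of replacing an inline-PTX lane-ID helper with amdgcn builtins:

```cpp
#include <hip/hip_runtime.h>

// Sketch only - the prototype's actual replacements may differ in detail.
__device__ __forceinline__ int getLaneId() {
#if defined(__HIP_PLATFORM_AMD__)
    // amdgcn builtins: compute the lane index within the wavefront.
    return __builtin_amdgcn_mbcnt_hi(~0u, __builtin_amdgcn_mbcnt_lo(~0u, 0u));
#else
    // Original CUDA path reads the lane id via inline PTX.
    int laneId;
    asm("mov.u32 %0, %%laneid;" : "=r"(laneId));
    return laneId;
#endif
}
```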
Assuming you are interested in this work, what would be the best way to upstream it?
Would a static HIP backend (in the faiss::hip namespace) be preferred? If not, what architecture would be preferable (e.g., overriding faiss::gpu)?
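For concreteness, the static variant we have in mind would look roughly like this (a sketch - the class names are just hipified counterparts of the existing faiss::gpu ones, and the build option is hypothetical):

```cpp
// Sketch of a faiss::hip layout - names mirror the existing faiss::gpu classes.
namespace faiss {
namespace hip {

class StandardGpuResources; // hipified faiss::gpu::StandardGpuResources
class GpuIndexFlatL2;       // hipified faiss::gpu::GpuIndexFlatL2
class GpuIndexIVFPQ;        // hipified faiss::gpu::GpuIndexIVFPQ

} // namespace hip

// Alternative: let HIP builds stand in for faiss::gpu so existing call sites
// keep compiling unchanged (FAISS_USE_HIP_AS_GPU is a hypothetical CMake option).
#ifdef FAISS_USE_HIP_AS_GPU
namespace gpu = hip;
#endif

} // namespace faiss
```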
Unlike the CUDA backend, we ultimately need to support multiple "warp sizes" (wavefronts) at runtime - 32 for Navi and 64 for the MI series. There are some questions pertaining to uses of kWarpSize that will not work out of the box (sizing shared memory, some of the replacements in CMake, static uses at dispatch sites, ...). We will need guidance on how best to architect and integrate such support.
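As a strawman for the discussion, one option is to keep kWarpSize a template parameter and dispatch on hipDeviceProp_t::warpSize at launch time, roughly like this (a sketch, not what the prototype does today):

```cpp
#include <hip/hip_runtime.h>

// Kernel templated on the wavefront size so shared-memory sizing and other
// uses of kWarpSize remain compile-time constants, as in the CUDA backend.
template <int kWarpSize>
__global__ void partialSum(const float* in, float* out, int n) {
    __shared__ float smem[kWarpSize];
    int tid = threadIdx.x;
    float v = 0.0f;
    for (int i = tid; i < n; i += kWarpSize) {
        v += in[i];
    }
    smem[tid] = v;
    __syncthreads();
    if (tid == 0) {
        float s = 0.0f;
        for (int i = 0; i < kWarpSize; ++i) {
            s += smem[i];
        }
        *out = s;
    }
}

// Host-side dispatch on the device's actual wavefront size
// (32 on Navi / RDNA, 64 on the MI series / CDNA).
void runPartialSum(const float* in, float* out, int n, int device) {
    hipDeviceProp_t prop;
    hipGetDeviceProperties(&prop, device);
    if (prop.warpSize == 64) {
        hipLaunchKernelGGL(partialSum<64>, dim3(1), dim3(64), 0, 0, in, out, n);
    } else {
        hipLaunchKernelGGL(partialSum<32>, dim3(1), dim3(32), 0, 0, in, out, n);
    }
}
```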
Lastly, we have done only minor performance analysis and tuning with the current prototype - are there any public benchmarks or particular protocols you have used for the existing backend that we should use as a reference? So far we have used the GPU benchmark scripts with the SIFT datasets to assess performance.
As part of this, we're currently using a GPU_MAX_SELECTION_K of 1024 - what would a recommended protocol look like for deciding between 1024 and 2048? Ideally we'd like to use a single value independent of the HW/SW generation.
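To make the comparison concrete: one protocol we could follow is to build Faiss twice (GPU_MAX_SELECTION_K = 1024 and 2048) and time searches across a sweep of k values on both Navi and MI parts, along the lines of the sketch below (assuming the existing faiss::gpu API names carry over unchanged to the hipified backend; the real runs would use the SIFT datasets rather than random data):

```cpp
#include <faiss/gpu/GpuIndexFlat.h>
#include <faiss/gpu/StandardGpuResources.h>

#include <chrono>
#include <cstdio>
#include <random>
#include <vector>

int main() {
    const int d = 128;       // SIFT-like dimensionality
    const int nb = 1000000;  // database size
    const int nq = 10000;    // number of queries

    // Random data stands in for SIFT1M in this sketch.
    std::mt19937 rng(12345);
    std::uniform_real_distribution<float> dist(0.0f, 1.0f);
    std::vector<float> xb(size_t(nb) * d), xq(size_t(nq) * d);
    for (auto& x : xb) x = dist(rng);
    for (auto& x : xq) x = dist(rng);

    faiss::gpu::StandardGpuResources res;
    faiss::gpu::GpuIndexFlatL2 index(&res, d);
    index.add(nb, xb.data());

    // Sweep k up to the configured selection limit; rebuild with the other
    // GPU_MAX_SELECTION_K value and compare the resulting timings.
    for (int k : {64, 256, 1024 /*, 2048 with the larger build */}) {
        std::vector<float> distances(size_t(nq) * k);
        std::vector<faiss::idx_t> labels(size_t(nq) * k);

        auto t0 = std::chrono::steady_clock::now();
        index.search(nq, xq.data(), k, distances.data(), labels.data());
        auto t1 = std::chrono::steady_clock::now();

        double ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
        std::printf("k=%d: %.2f ms for %d queries\n", k, ms, nq);
    }
    return 0;
}
```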
@mdouze We had some further discussions internally. @ItsPitt and I can maintain an AMD/HIP backend for now, and AMD can contribute two servers for CI - the same setup as the PyTorch CI.
I'd recommend closing #3126 as overcome by events. With that in mind, what's the preferred strategy to make this all happen? I assume we'll want to proceed in multiple steps.
To browse the current prototype as a reference, please see https://github.com/iotamudelta/faiss/tree/wf32.