[ROCm] initial port #3126
Conversation
- set(CMAKE_CXX_STANDARD 17)
- add_definitions(-DUSE_ROCM)
- hipify.sh runs hipify-perl per file in parallel
- hipify.sh only replaces the hipified file if it changes (avoids unnecessary rebuilds)
- USE_ROCM section of faiss/gpu/utils/DeviceDefs.cuh
- USE_ROCM section of faiss/gpu/utils/PtxUtils.cuh
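The "only replaces the hipified file if it changes" step above can be sketched as follows. This is a hedged illustration; the helper name and layout are assumptions, not the actual contents of hipify.sh:

```shell
#!/bin/sh
# Sketch of the "replace only if changed" idea (helper name is
# hypothetical): install the freshly hipified output only when it
# differs from the existing file, so unchanged files keep their
# timestamps and the build system skips rebuilding them.
replace_if_changed() {
    tmp="$1"   # freshly generated file, e.g. from: hipify-perl src.cu > tmp
    dst="$2"   # previously hipified file in gpu-rocm/
    if cmp -s "$tmp" "$dst"; then
        rm -f "$tmp"      # identical: keep the old file and its mtime
    else
        mv "$tmp" "$dst"  # new or different: install the update
    fi
}
```

Because `cmp -s` also fails when the destination does not yet exist, a first-time hipify falls through to the `mv` branch and installs the file.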
Also add runtime error checks in setBitfield as a safeguard.
This should be handled by hipify-perl.
Currently broken build.
must instead query for warpSize of the current device
cannot use kWarpSize in host code
It is excellent news that GPU Faiss has been ported to AMD devices! We need to validate the diff:
In the meantime, could you give a high-level overview of what's supported from Faiss on AMD GPUs: which index types are supported, any GPU vs. CPU benchmarks (even rough), install requirements... Thanks!
@jeffdaily Did you have a chance to run any benchmarks, or was the goal to ensure that Faiss just works? Thanks.
This initial porting was functional only. All unit tests pass for CUDA/V100 (as of ee8aea9) and for AMD Radeon PRO W7900 (warp size 32) and AMD Instinct MI250X/MI250 (warp size 64). Do you have instructions for some benchmarks I could run?
The recent clang-format changes to fix the ci/circleci:Format job cause a lot of whitespace changes that I was trying to avoid.
@jeffdaily Thank you for this PR, it's great to see Faiss running on AMD devices! Can you clarify your intentions/objectives regarding Faiss with ROCm? There are various degrees of results you could aim for:
Depending on your objectives and the level of investment that you intend to make, it would be worth discussing the best way to collaborate with AMD. (As an example, we have been working closely with Nvidia on their integration of RAFT, with reasonable clarity on the points above.)
@algoriddle I think the claim is at minimum (1), but closer to (2). This PR does not add support for the Python bindings, and it does not add any CI jobs. It does provide ROCm-specific build instructions, and all tests were personally verified to pass on AMD Radeon™ Pro W7900, AMD Instinct™ MI250X, and NVIDIA V100, which hopefully covers the "no other platform is affected" point. We could add a CI job to this PR that performs a ROCm build of Faiss, if desired, but for any testing we would need to identify a publicly available resource with AMD GPUs. AMD is supporting customers that wanted this Faiss support, so I would expect additional support going forward.
I will review this later today or shortly after the holiday weekend. We are super excited to have AMD GPUs supported for Faiss, but our main concern is that this ends up becoming an orphaned feature that clutters up the code and is liable to rot over time without both AMD (and Meta) pushing it all the way through to @algoriddle's (4) on his list. Personally, I don't think this is worth accepting unless we have a plan to push it through to first-class support, though Matthijs or Gergely might be OK with (2) or (3). To what degree would AMD be willing to help us get to (4) in the list above, beyond this PR?
Superseded by #3462 |
Prior to building the ROCm port, run the hipify script to create the gpu-rocm directory.
All unit tests are passing for AMD GPUs featuring warp size 32 (gfx1100) and warp size 64 (gfx90a).
Though Faiss was written for warpSize == 32, it was possible to adapt the existing code for warpSize 64 by using `if constexpr` in many places to separate the 32- vs. 64-lane logic at compile time. In some cases this results in empty kernel implementations, but this was necessary to avoid missing symbols at link time; the code that launches these empty kernels performs a runtime check of the warpSize, so the empty kernels are never launched. The kWarpSize constexpr should only be used in device code. In host code, the current device's warpSize must be queried at runtime when calculating grid and block sizes.