[Issue]: hip::host not working for NVIDIA Platforms #3748

Open
wme7 opened this issue Feb 14, 2025 · 1 comment

wme7 commented Feb 14, 2025

Problem Description

I'm working on porting a kaiju CUDA/C++17 project. To do this, I'm using an NVIDIA-based workstation with HIP 6.2.4 on top of CUDA Toolkit 12.6 and, to play it safe, Ubuntu 24.04 as my OS. All package installations went smoothly.

My target is to port my code to an AMD platform with ROCm 6.0.1 and the GPU architecture "gfx90a".

However, while doing this porting work, I noticed that my build with HIP on my NVIDIA platform always fails due to linkage errors. Investigating further, I started to suspect that the hip::host module is not doing its job as described in the HIP documentation: Consuming the HIP API in C++ code

I reported this issue on the CMake Discourse forum (discourse.cmake). After inspecting the build trace, and after a CMake maintainer was able to reproduce my issue, I suspect that the NVIDIA-backed implementation of HIP's CMake support is to blame.
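For context, the consumption pattern that documentation page describes looks roughly like the sketch below. The target name `devEvent_library_clang` and the source file `main.cpp` are taken from the build log further down. The project name, the minimum CMake version, and the assumption that the ROCm prefix is discoverable through `CMAKE_PREFIX_PATH` are illustrative guesses, and the actual CMakeLists.txt in the attached archive may differ.

```cmake
# Sketch only: not the project's real CMakeLists.txt.
cmake_minimum_required(VERSION 3.21)
project(devEvent_library LANGUAGES CXX)

# Locate the HIP CMake package. This assumes the ROCm install prefix
# (for example /opt/rocm) is on CMAKE_PREFIX_PATH.
find_package(hip REQUIRED)

# main.cpp is plain C++ that only calls the HIP runtime API.
add_executable(devEvent_library_clang main.cpp)

# hip::host is expected to carry the HIP include directories and the
# runtime library, so that <hip/hip_runtime.h> resolves in host-only code.
target_link_libraries(devEvent_library_clang PRIVATE hip::host)
```

If `hip::host` propagated its include directories on the NVIDIA platform, a C++ translation unit linked this way should be able to find `hip/hip_runtime.h`.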

Here, I request an investigation of this issue.

Operating System

Ubuntu 24.04.1 LTS (Noble Numbat)

CPU

11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz

GPU

NVIDIA Corporation GA100 [A100 SXM4 40GB] (rev a1)

ROCm Version

ROCm 6.2.4

ROCm Component

HIP

Steps to Reproduce

A full reproducible example is described in this post on the CMake Discourse forum.

For the sake of completeness, I also attach a full copy of the example here so the issue can be reproduced and studied:

devEvent_library.zip

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

$ /opt/rocm/bin/rocminfo --support
ROCk module is NOT loaded, possibly no GPU devices

Additional Information

I have verified that my example works well on AMD platforms with MI200x GPUs.

Again, the issue is that when this example is built on any NVIDIA platform, the CMake build fails, usually during the linking process, as if no hip::host module existed in the build. It produces output like:

$ cmake ..
-- The CXX compiler identification is GNU 13.3.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- The HIP compiler identification is NVIDIA 12.6.85
-- Detecting HIP compiler ABI info
-- Detecting HIP compiler ABI info - done
-- Check for working HIP compiler: /usr/local/cuda-12.6/bin/nvcc - skipped
-- Detecting HIP compile features
-- Detecting HIP compile features - done
-- Configuring done (2.1s)
-- Generating done (0.0s)
-- Build files have been written to: /home/mdiaz/Depots/devLibrary/devEvent_library/build

$ make
[ 16%] Building HIP object CMakeFiles/Test.dir/library/library.cpp.o
[ 33%] Linking HIP shared library libTest.so
[ 33%] Built target Test
[ 50%] Building CXX object CMakeFiles/devEvent_library_clang.dir/main.cpp.o
In file included from /home/mdiaz/Depots/devLibrary/devEvent_library/library/library.h:6,
                 from /home/mdiaz/Depots/devLibrary/devEvent_library/main.cpp:1:
/home/mdiaz/Depots/devLibrary/devEvent_library_FAIL/library/common.h:16:10: fatal error: hip/hip_runtime.h: No such file or directory
   16 | #include <hip/hip_runtime.h>
      |          ^~~~~~~~~~~~~~~~~~~
compilation terminated.
make[2]: *** [CMakeFiles/devEvent_library_clang.dir/build.make:76: CMakeFiles/devEvent_library_clang.dir/main.cpp.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:113: CMakeFiles/devEvent_library_clang.dir/all] Error 2
make: *** [Makefile:101: all] Error 2

Let me remark that this example builds and executes correctly on any AMD Platform, but fails to build on any NVIDIA Platform.
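One way to narrow this down is to ask CMake what `hip::host` actually provides on each platform. The following is a minimal diagnostic sketch, assuming `find_package(hip)` is how the project obtains the target. The variable name `_hip_host_includes` is hypothetical, and no output is shown because it was not captured for this report.

```cmake
# Diagnostic sketch: print what hip::host exposes at configure time.
find_package(hip REQUIRED)

if(TARGET hip::host)
  # Include directories that consumers of hip::host should inherit.
  get_target_property(_hip_host_includes hip::host INTERFACE_INCLUDE_DIRECTORIES)
  message(STATUS "hip::host include dirs: ${_hip_host_includes}")
else()
  message(WARNING "hip::host target is not defined on this platform")
endif()
```

Comparing the printed value between the AMD and NVIDIA machines would show whether the include path for `hip/hip_runtime.h` is missing from the target or whether the target is never defined.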

@ppanchad-amd

Hi @wme7. Internal ticket has been created to investigate your issue. Thanks!
