
AMD GPU support via llama.cpp HIPBLAS #92

Closed
jeromew opened this issue Dec 13, 2023 · 26 comments

Comments
@jeromew
Contributor

jeromew commented Dec 13, 2023

Hello,

First of all, thank you for your work on llamafile; it seems like a great idea for simplifying model usage.

It seems from the README that, at this stage, llamafile does not support AMD GPUs.
The cuda.c in the llamafile backend seems dedicated to CUDA, while ggml-cuda.h in llama.cpp has a GGML_USE_HIPBLAS option for ROCm support. ROCm is now officially supported by llama.cpp, according to their README section on hipBLAS.

I understand that ROCm support was perhaps not priority #1 for llamafile, but I was wondering if you had already tried llama.cpp's HIPBLAS option and had some insight into the work that would need to be done in llamafile in order to add this GPU family as a target.

From what I understand, llama.cpp would take care of the GPU side of things, and llamafile would need to be modified to JIT-compile llama.cpp with the correct flags, possibly requiring a specific toolchain for the compilation (at least the ROCm SDK).
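
For reference, a rough sketch of how standalone llama.cpp enables its ROCm backend (based on its hipBLAS instructions at the time; the option names and the example gfx target are assumptions that may differ between versions):

# sketch only: assumes ROCm under /opt/rocm and the LLAMA_HIPBLAS option of that era
make LLAMA_HIPBLAS=1
# or via CMake, picking an example GPU architecture:
CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ cmake -B build -DLLAMA_HIPBLAS=ON -DAMDGPU_TARGETS=gfx1030
cmake --build build --config Release

So llamafile would presumably need to pass similar defines when it JIT-compiles the GGML CUDA/HIP module.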

Thanks for sharing your experience on this

@jart
Collaborator

jart commented Dec 13, 2023

I've never used an AMD GPU before, so I can't answer questions about them. However, we're happy to consider your request that llamafile support them. I'd encourage anyone else who wants this to leave a comment saying so. That'll help us gauge the interest level and determine what to focus on.

@franalbani

I would also like this!

Thanks! Your work is mind-blowing.

@mildwood

I'm also interested in using ROCm as I don't want to pay double for Nvidia!

@jesserizzo

Agreed, I'd also love AMD support. In my opinion it also fits with the general mission of this project. Don't let big tech companies get a monopoly on LLMs, and also don't let Nvidia get a monopoly on AI computing. I dunno. Thanks for all the hard work, this is great.

@stlhood
Collaborator

stlhood commented Dec 18, 2023

Thanks for the suggestions, folks. We're going to look into this.

Any recommendations on a specific AMD card we should use for dev and testing? We could pick up an RX 7900 XTX (i.e. basically a RTX 4090 equivalent) but I want to make sure the model we pick is broadly representative in terms of hardware support.

@lovenemesis

AMD officially supports ROCm on only one or two consumer-level GPUs, the RX 7900 XTX being one of them, and only on a limited set of Linux distributions.

However, by following the guide here on Fedora, I managed to get both an RX 7800 XT and the integrated GPU inside the Ryzen 7840U running ROCm perfectly fine. Those are the mid and lower models of the RDNA3 lineup, so I think it's fair to say all RDNA3 cards should work.

Judging from other ROCm-related topics on PyTorch, it seems that people with RDNA2 (RX 6xxx) series cards are in the majority, probably due to the competitive pricing after the crypto-mining boom.

@jammm
Contributor

jammm commented Dec 18, 2023

Thanks for the suggestions, folks. We're going to look into this.

Any recommendations on a specific AMD card we should use for dev and testing? We could pick up an RX 7900 XTX (i.e. basically a RTX 4090 equivalent) but I want to make sure the model we pick is broadly representative in terms of hardware support.

Any RDNA3 card should work fine. It's just that "official" support has been limited to Navi31-based GPUs.
You can also compile and run llama.cpp just fine on Windows using the HIP SDK. My primary OS is Windows and I could get this port running myself, but I'm having difficulty setting up cosmos bash (I'm getting cosmo++ permission-denied errors). If someone can help me set up cosmos bash properly, I should be able to get llamafile up and running on RDNA3 GPUs within a couple of hours or so.

Ideally a CMake-based pipeline would be best for Windows support, but I'd understand if the makefile paradigm is what Cosmopolitan is built on.
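
For what it's worth, the kind of CMake invocation I have in mind on Windows looks roughly like this (a sketch only, assuming the HIP SDK's clang is on PATH and a Navi31 target; exact option names depend on the llama.cpp version):

cmake -B build -G Ninja -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DLLAMA_HIPBLAS=ON -DAMDGPU_TARGETS=gfx1100
cmake --build build --config Release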

@jammm
Contributor

jammm commented Dec 18, 2023

So I managed to get llamafile compiling ggml-cuda.so using HIP, but it fails at runtime:

building ggml-cuda with nvcc -arch=native...
/usr/bin/hipcc -march=native --shared -use_fast_math -fPIC -O3 -march=native -mtune=native -DNDEBUG -DGGML_BUILD=1 -DGGML_SHARED=1 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_HIPBLAS -o /home/rpr/.llamafile/ggml-cuda.so.zmifde /home/rpr/.llamafile/ggml-cuda.cu -lhipblas -lrocblas
/home/rpr/.llamafile/ggml-cuda.so
libamdhip64.so.5: cannot enable executable stack as shared object requires: Invalid argument: failed to load library
warning: GPU offload not supported on this platform; GPU related options will be ignored
warning: you might need to install xcode (macos) or cuda (windows, linux, etc.) check the output above to see why support wasn't linked

normal dlopen() works fine with ggml-cuda.so. I'm not sure why cosmo_dlopen fails here. CC @jart any ideas?

Attaching strace log:
strace_llamafile_hip.txt

Using Ubuntu 22.04.2 LTS.

EDIT: Looks like this error goes away by clearing the executable-stack requirement on the amdhip64 runtime library:
sudo execstack -c /opt/rocm/lib/libamdhip64.so.5
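
(If anyone wants to check whether a given library actually requests an executable stack, inspecting its GNU_STACK program header is one way; a small sketch, assuming the default ROCm install path:)

readelf -lW /opt/rocm/lib/libamdhip64.so.5 | grep GNU_STACK
# flags ending in "E" (RWE) mean the library asks for an executable stack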

EDIT EDIT: PR made at #122. Works fine on my navi31 machine running Ubuntu 22.04. I expect it to work fine on Windows as well, though it's not tested yet.

@jammm mentioned this issue Dec 18, 2023
@franalbani

Thanks for the suggestions, folks. We're going to look into this.

Any recommendations on a specific AMD card we should use for dev and testing? We could pick up an RX 7900 XTX (i.e. basically a RTX 4090 equivalent) but I want to make sure the model we pick is broadly representative in terms of hardware support.

I don't have enough experience to give advice, but I can contribute testing on a ThinkPad T495 with AMD Radeon Vega 10 Graphics.

@batfasturd

I would also like this feature added since it is technically possible. I've used my 6750 XT successfully with llama.cpp on Linux.

@github12101

User with a Radeon 6800 XT here; I'll be more than happy to test things out in order to get Radeon GPU support. I am using Debian GNU/Linux.

@jammm
Contributor

jammm commented Dec 22, 2023

@github12101 @batfasturd try this PR for linux? #122

I've compiled the binary. Let me attach it
llamafile_hip_linux.zip

@github12101

@github12101 @batfasturd try this PR for linux? #122

I've compiled the binary. Let me attach it llamafile_hip_linux.zip

My apologies but I don't know how to install and launch this. I tried to run it, but it throws an error. Any guideline/tutorial would be great to have.

@jammm
Contributor

jammm commented Dec 22, 2023

@github12101 @batfasturd try this PR for linux? #122
I've compiled the binary. Let me attach it llamafile_hip_linux.zip

My apologies but I don't know how to install and launch this. I tried to run it, but it throws an error. Any guideline/tutorial would be great to have.

Can you share the error here?
In order for this to work, you need to have ROCm installed on your Linux system. Is that installed?

@jesserizzo

jesserizzo commented Dec 23, 2023

@github12101 @batfasturd try this PR for linux? #122

I've compiled the binary. Let me attach it llamafile_hip_linux.zip

Should this work on AMD 6000-series GPUs?
Edit: I tried it on my 6600 and there are no errors, but it doesn't seem to be doing anything. I thought I read somewhere that it only works on 7000-series GPUs, but one of the project maintainers commented on PR #122 that they were going to test on a 6800, so now I'm confused.

@stlhood
Collaborator

stlhood commented Jan 2, 2024

Just a quick update that we now have an RDNA2 card, and an RDNA3 card is on the way. @jart is actively working on adding this support!

@jart
Collaborator

jart commented Jan 5, 2024

I've just cut a 0.5 release which features outstanding AMD support for Windows users. Let me tell you about how cool it is. If I download and run the LLaVA llamafile, being certain to pass the -ngl 35 flag to turn GPU support on:

curl -L -o llava-v1.5-7b-q4.llamafile https://huggingface.co/jartine/llava-v1.5-7B-GGUF/resolve/main/llava-v1.5-7b-q4.llamafile
.\llava-v1.5-7b-q4.llamafile -ngl 35 --nocompile

Then it takes ~2 seconds to start up (on a cold start!) before a tab with the web UI pops up in my browser. I can then upload a picture and ask LLaVA a question, and I get a response back in a few seconds. That's a good deal, since I'm doing it with a $300 AMD Radeon RX 6800 graphics card.

[screenshot of the LLaVA web UI]

Here's the best part. Support only depends on the graphics card driver. The HIP SDK is for developers, so I think it's nice that we won't need to ask users to install it in order to use llamafile. You can if you want to, in which case llamafile will compile a better GPU module that links hipBLAS (instead of tinyBLAS) the first time you run your llamafile. That will make inference go faster, although it takes about 30 seconds to run the clang++ command that comes with the ROCm SDK.
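
(A hedged example of that path: the assumption is that with the HIP SDK installed you simply drop --nocompile, so the hipBLAS-linked GPU module gets built on the first run and reused on later runs:)

.\llava-v1.5-7b-q4.llamafile -ngl 35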

I'm going to have a Linux computer with the RDNA3 card @stlhood mentioned soon, probably by mid-month. We should be able to have excellent AMD support there too, although installing the AMD HIP tools will need to be a requirement, since the Linux and BSD platforms don't have the same kind of binary friendliness.

@bennmann

bennmann commented Jan 9, 2024

I've just cut a 0.5 release which features outstanding AMD support for Windows users. Let me tell you about how cool it is. If I download and run the LLaVA llamafile, being certain to pass the -ngl 35 flag to turn GPU support on:

curl -L -o llava-v1.5-7b-q4.llamafile https://huggingface.co/jartine/llava-v1.5-7B-GGUF/resolve/main/llava-v1.5-7b-q4.llamafile
.\llava-v1.5-7b-q4.llamafile -ngl 35 --nocompile

My 6900 XT on Windows "just worked", pulling 143 watts and producing a nice LLaVA 7B output. Thank you!!

{"timestamp":1704831422,"level":"INFO","function":"log_server_request","line":2741,"message":"request","remote_addr":"","remote_port":-1,"status":200,"method":"POST","path":"/completion","params":{}}
slot 0 released (153 tokens in cache)
slot 0 is processing [task id: 2]
slot 0 : in cache: 47 tokens | to process: 22 tokens
slot 0 : kv cache rm - [47, end)

print_timings: prompt eval time =      88.55 ms /    22 tokens (    4.02 ms per token,   248.46 tokens per second)
print_timings:        eval time =    7817.18 ms /   400 runs   (   19.54 ms per token,    51.17 tokens per second)
print_timings:       total time =    7905.73 ms
slot 0 released (470 tokens in cache)

@jart
Collaborator

jart commented Jan 9, 2024

Happy to hear it @bennmann!

Also, there's more good news. I've just shipped llamafile v0.6, which adds support for AMD GPUs on Linux too. Unlike on Windows, Linux users need to install the AMD ROCm SDK; there's no prebuilt binary. llamafile will build your GPU support the first time you run your llamafile, using the hipcc compiler. I've tested it with a Radeon RX 7900 XTX. The v0.6 release also adds support for multiple GPUs. I know for certain it works with NVIDIA. I have a second Radeon coming in the mail, so I'll be able to test that it works with AMD too.
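
(A minimal sketch of the Linux flow, assuming a default ROCm install under /opt/rocm; your distribution may put hipcc elsewhere:)

chmod +x llava-v1.5-7b-q4.llamafile
export PATH=/opt/rocm/bin:$PATH          # make sure hipcc is visible
./llava-v1.5-7b-q4.llamafile -ngl 35     # first run compiles the hipBLAS GPU module with hipcc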

With that said, I think this issue is satisfactorily solved. Please report any issues or suboptimal experiences you have. The only real area of improvement I know we need at the moment is that our tinyBLAS kernels don't go as fast on AMD as they do on NVIDIA, where we developed them. We'll be looking into that soon. Note that this only impacts Windows users who haven't installed the HIP ROCm SDK on their computers. Installing it is what you want if your goal is maximum performance, since the hipBLAS library doesn't come with the video drivers that Windows installs.

Enjoy!

@jart closed this as completed Jan 9, 2024
@lovenemesis

I'm having an issue getting GPU support on Fedora 39 with the 0.6 release.

HSA_OVERRIDE_GFX_VERSION=11.0.0 ./llamafile-0.6 -ngl 35 -m mixtral-8x7b-instruct-v0.1.Q5_K_M.gguf 
initializing gpu module...
note: won't compile AMD GPU support because $HIP_PATH/bin/clang++ is missing
prebuilt binary /zip/ggml-rocm.so not found
prebuilt binary /zip/ggml-cuda.so not found
fatal error: --n-gpu-layers 35 was passed but no gpus were found

Meanwhile, I do have clang++ and hipcc available in $PATH.

sudo rpm -ql hipcc clang
/usr/bin/hipcc
/usr/bin/hipcc.pl
/usr/bin/hipconfig
/usr/bin/hipconfig.pl
/usr/share/perl5/vendor_perl/hipvars.pm
/usr/bin/clang
/usr/bin/clang++
/usr/bin/clang++-17
/usr/bin/clang-17
/usr/bin/clang-cl
/usr/bin/clang-cpp
/usr/lib/.build-id
/usr/lib/.build-id/32
/usr/lib/.build-id/32/e94d93e9ba24c19eb5ffdd7288d637d7cda793
/usr/lib/.build-id/32/e94d93e9ba24c19eb5ffdd7288d637d7cda793.1
/usr/lib/.build-id/32/e94d93e9ba24c19eb5ffdd7288d637d7cda793.2
/usr/lib/.build-id/32/e94d93e9ba24c19eb5ffdd7288d637d7cda793.3
/usr/share/licenses/clang
/usr/share/licenses/clang/LICENSE.TXT
/usr/share/man/man1/clang++-17.1.gz
/usr/share/man/man1/clang++.1.gz
/usr/share/man/man1/clang-17.1.gz
/usr/share/man/man1/clang.1.gz

Is there anything else I need?
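
(One thing that might be worth trying, purely as an assumption based on the "$HIP_PATH/bin/clang++ is missing" message above: on Fedora the ROCm tools live under /usr rather than /opt/rocm, so pointing HIP_PATH at that prefix could help:)

export HIP_PATH=/usr     # assumption: the prefix that contains bin/hipcc and bin/clang++
HSA_OVERRIDE_GFX_VERSION=11.0.0 ./llamafile-0.6 -ngl 35 -m mixtral-8x7b-instruct-v0.1.Q5_K_M.gguf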

@AwesomeApple12

AwesomeApple12 commented Jan 10, 2024

Screenshot 2024-01-10 021429

I've just cut a 0.5 release which features outstanding AMD support for Windows users. Let me tell you about how cool it is. If I download and run the LLaVA llamafile, being certain to pass the -ngl 35 flag to turn GPU support on:

curl -L -o llava-v1.5-7b-q4.llamafile https://huggingface.co/jartine/llava-v1.5-7B-GGUF/resolve/main/llava-v1.5-7b-q4.llamafile
.\llava-v1.5-7b-q4.llamafile -ngl 35 --nocompile

Can confirm it's working really well with a 5700 XT on Windows 11.

@jeromew
Contributor Author

jeromew commented Jan 10, 2024

Can confirm it works really well with an AMD Radeon RX 6700 XT on Windows 11 (~36 tokens/sec versus ~5.6 tokens/sec on CPU only)! Thank you for landing the AMD support!

Note that Windows complained about it containing a trojan:

Detected: Trojan:Win32/Sabsik.FL.A!ml
File: D:\user\dev\llava-v1.5-7b-q4.llamafile

I checked that the SHA-256 was equal to the one declared on Hugging Face (9c37a9a8e3f067dea8c028db9525b399fc53b267667ed9c2a60155b1aa75) and went through with it, but that was a bit surprising. Am I the only one getting this warning?

The parameters are "-ngl 35 --nocompile" for the tinyBLAS solution, but what are the parameters if I install ROCm?

@Amine-Smahi

Is there documentation somewhere to guide us on running llamafile on Ubuntu with an AMD GPU?

@Dark-Thoughts

Tried to get my 6650 XT to work under Nobara (Fedora-based) by installing rocm-hip-sdk, and got this error after it (I think) failed to build properly on first launch:

./mistral-7b-instruct-v0.2.Q5_K_M.llamafile -ngl 999
import_cuda_impl: initializing gpu module...
get_rocm_bin_path: note: amdclang++ not found on $PATH
get_rocm_bin_path: note: $HIP_PATH/bin/amdclang++ does not exist
get_rocm_bin_path: note: hipInfo not found on $PATH
get_rocm_bin_path: note: $HIP_PATH/bin/hipInfo does not exist
get_rocm_bin_path: note: /opt/rocm/bin/hipInfo does not exist
llamafile_log_command: /usr/bin/rocminfo
llamafile_log_command: hipcc -O3 -fPIC -shared -DNDEBUG --offload-arch=gfx1032 -march=native -mtune=native -use_fast_math -DGGML_BUILD=1 -DGGML_SHARED=1 -Wno-return-type -Wno-unused-result -DGGML_USE_HIPBLAS -DGGML_CUDA_MMV_Y=1 -DGGML_MULTIPLATFORM -DGGML_CUDA_DMMV_X=32 -DIGNORE4 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DIGNORE -o /home/*****/.llamafile/ggml-rocm.so.ikigfn /home/*****/.llamafile/ggml-cuda.cu -lhipblas -lrocblas
/home/*****/.llamafile/ggml-cuda.cu:408:1: warning: function declared 'noreturn' should not return [-Winvalid-noreturn]
}
^
/home/*****/.llamafile/ggml-cuda.cu:777:1: warning: function declared 'noreturn' should not return [-Winvalid-noreturn]
}
^
/home/*****/.llamafile/ggml-cuda.cu:5132:5: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
    mul_mat_q4_K(
    ^
/home/*****/.llamafile/ggml-cuda.cu:5132:5: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
/home/*****/.llamafile/ggml-cuda.cu:5199:1: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
mul_mat_q5_K(
^
/home/*****/.llamafile/ggml-cuda.cu:5199:1: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
/home/*****/.llamafile/ggml-cuda.cu:5268:5: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
    mul_mat_q6_K(
    ^
/home/*****/.llamafile/ggml-cuda.cu:5268:5: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
/home/*****/.llamafile/ggml-cuda.cu:6034:24: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
static __global__ void soft_max_f32(const float * x, const float * y, float * dst, const int ncols_par, const int nrows_y, const float scale) {
                       ^
/home/*****/.llamafile/ggml-cuda.cu:6034:24: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
/home/*****/.llamafile/ggml-cuda.cu:6034:24: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
/home/*****/.llamafile/ggml-cuda.cu:6034:24: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
/home/*****/.llamafile/ggml-cuda.cu:6034:24: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
/home/*****/.llamafile/ggml-cuda.cu:6034:24: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
14 warnings generated when compiling for gfx1032.
/home/*****/.llamafile/ggml-cuda.cu:408:1: warning: function declared 'noreturn' should not return [-Winvalid-noreturn]
}
^
/home/*****/.llamafile/ggml-cuda.cu:777:1: warning: function declared 'noreturn' should not return [-Winvalid-noreturn]
}
^
2 warnings generated when compiling for host.
link_cuda_dso: note: dynamically linking /home/*****/.llamafile/ggml-rocm.so
ggml_cuda_link: welcome to ROCm SDK with hipBLAS
link_cuda_dso: GPU support linked

rocBLAS error: Cannot read /opt/rocm-5.6.1/lib/rocblas/library/TensileLibrary.dat: No such file or directory
Aborted (core dumped)

Launching with the GPU again just gives me the last error now. No idea what else I'm missing or what I did wrong here, but it's certainly not an easy experience under Linux with AMD GPUs (previously it would just default to CPU mode, which is way too slow to be usable).
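
(A possible workaround, offered as an assumption rather than something confirmed in this thread: rocBLAS typically only ships Tensile kernel libraries for officially supported targets such as gfx1030, so RDNA2 cards like the 6650 XT (gfx1032) are often run with a GFX version override:)

HSA_OVERRIDE_GFX_VERSION=10.3.0 ./mistral-7b-instruct-v0.2.Q5_K_M.llamafile -ngl 999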

@nameiwillforget

nameiwillforget commented Feb 25, 2024

I'm getting a very similar bug:

[alex@Arch wizard]$ sh wizardcoder-python-34b-v1.0.Q5_K_M.llamafile -ngl 9999 
import_cuda_impl: initializing gpu module...
get_rocm_bin_path: note: amdclang++ not found on $PATH
get_rocm_bin_path: note: $HIP_PATH/bin/amdclang++ does not exist
get_rocm_bin_path: note: /opt/rocm/bin/amdclang++ does not exist
get_rocm_bin_path: note: hipInfo not found on $PATH
get_rocm_bin_path: note: $HIP_PATH/bin/hipInfo does not exist
get_rocm_bin_path: note: /opt/rocm/bin/hipInfo does not exist
llamafile_log_command: /opt/rocm/bin/rocminfo
llamafile_log_command: hipcc -O3 -fPIC -shared -DNDEBUG --offload-arch=gfx1032 -march=native -mtune=native -use_fast_math -DGGML_BUILD=1 -DGGML_SHARED=1 -Wno-return-type -Wno-unused-result -DGGML_USE_HIPBLAS -DGGML_CUDA_MMV_Y=1 -DGGML_MULTIPLATFORM -DGGML_CUDA_DMMV_X=32 -DIGNORE4 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DIGNORE -o /home/alex/.llamafile/ggml-rocm.so.r2whcc /home/alex/.llamafile/ggml-cuda.cu -lhipblas -lrocblas
/home/alex/.llamafile/ggml-cuda.cu:408:1: warning: function declared 'noreturn' should not return [-Winvalid-noreturn]
}
^
/home/alex/.llamafile/ggml-cuda.cu:777:1: warning: function declared 'noreturn' should not return [-Winvalid-noreturn]
}
^
/home/alex/.llamafile/ggml-cuda.cu:5132:5: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
    mul_mat_q4_K(
    ^
/home/alex/.llamafile/ggml-cuda.cu:5132:5: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
/home/alex/.llamafile/ggml-cuda.cu:5199:1: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
mul_mat_q5_K(
^
/home/alex/.llamafile/ggml-cuda.cu:5199:1: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
/home/alex/.llamafile/ggml-cuda.cu:5268:5: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
    mul_mat_q6_K(
    ^
/home/alex/.llamafile/ggml-cuda.cu:5268:5: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
/home/alex/.llamafile/ggml-cuda.cu:6034:24: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
static __global__ void soft_max_f32(const float * x, const float * y, float * dst, const int ncols_par, const int nrows_y, const float scale) {
                       ^
/home/alex/.llamafile/ggml-cuda.cu:6034:24: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
/home/alex/.llamafile/ggml-cuda.cu:6034:24: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
/home/alex/.llamafile/ggml-cuda.cu:6034:24: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
/home/alex/.llamafile/ggml-cuda.cu:6034:24: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
/home/alex/.llamafile/ggml-cuda.cu:6034:24: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
14 warnings generated when compiling for gfx1032.
/home/alex/.llamafile/ggml-cuda.cu:408:1: warning: function declared 'noreturn' should not return [-Winvalid-noreturn]
}
^
/home/alex/.llamafile/ggml-cuda.cu:777:1: warning: function declared 'noreturn' should not return [-Winvalid-noreturn]
}
^
2 warnings generated when compiling for host.
link_cuda_dso: note: dynamically linking /home/alex/.llamafile/ggml-rocm.so
wizardcoder-python-34b-v1.0.Q5_K_M.llamafile: /usr/src/debug/hip-runtime-amd/clr-rocm-6.0.0/rocclr/os/os_posix.cpp:321: static void amd::Os::currentStackInfo(unsigned char**, size_t*): Assertion `Os::currentStackPtr() >= *base - *size && Os::currentStackPtr() < *base && "just checking"' failed.
cosmoaddr2line /home/alex/.local/bin/wizardcoder-python-34b-v1.0.Q5_K_M.llamafile 7fe973ea932c 7fe973e1a0e0 7fe973e1a0e0 7fe973e1a0e0 7fe973e1a0e0 7fe973e1a0e0 7fe973e1a0e0 7fe973e1a0e0 7fe973e1a0e0 7fe973e1a0e0 7fe973e1a0e0 7fe973e1a0e0 7fe973e1a0e0

0x00007fe973ea932c: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0
0x00007fe973e1a0e0: ?? ??:0

10008004-10008006 rw-pa-       3x automap 192kB w/ 64kB hole
10008008-10008011 rw-pa-      10x automap 640kB w/ 14gB hole
10040060-10098eec r--s-- 364'173x automap 22gB w/ 96tB hole
6fd00004-6fd0000c rw-paF       9x zipos 576kB w/ 64gB hole
6fe00004-6fe00004 rw-paF       1x g_fds 64kB
# 22gB total mapped memory
/home/alex/.local/bin/wizardcoder-python-34b-v1.0.Q5_K_M.llamafile -m wizardcoder-python-34b-v1.0.Q5_K_M.gguf -c 0 -ngl 9999 
Aborted (core dumped)

@Trubador

I got it working out of the box with an AMD Radeon 6800M mobile GPU :)
It seems to run nicely, about 10 times faster than on the CPU.
