Hi Hamel, you must have heard of llama.cpp. I saw your benchmarks in 03_inference.ipynb, but I couldn't find any mention of llama.cpp there. I believe it can run on the same GPU; I don't have a GPU that fancy, so I can't readily benchmark in the same way. TheBloke publishes models in llama.cpp's GGUF format: https://huggingface.co/TheBloke/Llama-2-7B-GGUF
Maybe you didn't consider it the same class of tool? But it can also run as a server, including an OpenAI-style HTTP API.
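To make that concrete, here's a minimal sketch of talking to that API, assuming a recent llama.cpp build where the bundled server exposes an OpenAI-compatible `/v1/chat/completions` endpoint (the model path, port, and prompt below are placeholders, not anything from your notebook):

```python
# Minimal sketch, not a definitive recipe: query a locally running
# llama.cpp server through its OpenAI-compatible HTTP API. Assumes a
# recent build whose server was started with GPU offload, e.g.
#   ./server -m llama-2-7b.Q4_K_M.gguf -ngl 99 --port 8080
# (model path and port are placeholders).
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 64,
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```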
I found these benchmarks, which show MLC ahead of llama.cpp, but I wonder whether llama.cpp had been set up to use the GPU correctly. It might be worthwhile comparing against the latest versions of both: mlc-ai/mlc-llm#15 (comment) and https://github.com/mlc-ai/llm-perf-bench
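If someone with a suitable GPU wants to sanity-check throughput against the server above, a rough sketch could look like this (not a rigorous benchmark, and it assumes the server's OpenAI-style response includes the usual `usage` token counts):

```python
# Rough tokens-per-second check against the same endpoint. Not a rigorous
# benchmark: a single request, no warmup, no averaging over runs. Assumes
# the response carries an OpenAI-style "usage" block with token counts.
import time
import requests

start = time.time()
resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "Write a haiku about GPUs."}],
        "max_tokens": 256,
    },
).json()
elapsed = time.time() - start

tokens = resp["usage"]["completion_tokens"]
print(f"{tokens} tokens in {elapsed:.2f}s ({tokens / elapsed:.1f} tok/s)")
```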
Anyway, thanks for your analysis, pivotal stuff! @hamelsmu