
Llama.cpp benchmarks #10

Closed
lukestanley opened this issue Oct 5, 2023 · 0 comments
lukestanley commented Oct 5, 2023

Hi Hamel, you must have heard of llama.cpp. I saw your benchmarks in 03_inference.ipynb, but I couldn't find any mention of llama.cpp there. I believe it can run on the same GPU. I don't have a GPU that fancy, so I can't readily benchmark it the same way. TheBloke publishes models in its GGUF format: https://huggingface.co/TheBloke/Llama-2-7B-GGUF
Maybe you didn't consider it the same class of tool? But it can also run as a server, including an OpenAI-style HTTP API.
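For reference, here's a minimal sketch of how one might query that OpenAI-style endpoint from Python, assuming a llama.cpp server is running locally on port 8080. The URL, port, and parameter values are illustrative assumptions, not taken from the notebook:

```python
import json
import urllib.request

# Assumed endpoint: a local llama.cpp server exposing an OpenAI-style
# chat completions route (port and path are illustrative).
API_URL = "http://localhost:8080/v1/chat/completions"

def build_chat_request(prompt, max_tokens=128, temperature=0.7):
    """Build an OpenAI-style chat completion payload."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

def query(prompt):
    """POST the payload to the local server and return the first reply."""
    body = json.dumps(build_chat_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        API_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        data = json.loads(resp.read())
    return data["choices"][0]["message"]["content"]

# The payload can be inspected without a running server:
payload = build_chat_request("Hello")
print(payload["messages"][0]["content"])  # prints "Hello"
```

Because the request shape mirrors OpenAI's API, existing OpenAI client code can often be pointed at the local server just by swapping the base URL.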
I found these benchmarks, which show MLC ahead of llama.cpp, but I wonder whether they were set up to use the GPU correctly. It might be worthwhile comparing against the latest version:
mlc-ai/mlc-llm#15 (comment)
https://github.com/mlc-ai/llm-perf-bench
Anyway, thanks for your analysis, pivotal stuff!