Hi Hamel, you must have heard of llama.cpp. I saw your benchmarks in 03_inference.ipynb, but I couldn't find any mention of llama.cpp there. I believe it can run on the same GPU; I don't have a GPU that fancy, so I can't readily benchmark in the same way. TheBloke publishes models in llama.cpp's GGUF format: https://huggingface.co/TheBloke/Llama-2-7B-GGUF
Maybe you didn't consider it the same class of tool? But it can also run as a server, including an OpenAI-style HTTP API.
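To make that concrete, here's a minimal sketch of talking to that API, assuming a recent llama.cpp build where the bundled server exposes an OpenAI-compatible `/v1/chat/completions` endpoint (the model path, port, and prompt below are placeholders, not anything from your notebook):

```python
# Minimal sketch, not a definitive recipe: query a locally running
# llama.cpp server through its OpenAI-compatible HTTP API. Assumes a
# recent build whose server was started with GPU offload, e.g.
#   ./server -m llama-2-7b.Q4_K_M.gguf -ngl 99 --port 8080
# (model path and port are placeholders).
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 64,
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```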
I found these benchmarks, which show MLC ahead of llama.cpp, but I wonder whether llama.cpp had been set up to use the GPU correctly. It might be worthwhile comparing against the latest versions of both: mlc-ai/mlc-llm#15 (comment) and https://github.com/mlc-ai/llm-perf-bench
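If someone with a suitable GPU wants to sanity-check throughput against the server above, a rough sketch could look like this (not a rigorous benchmark, and it assumes the server's OpenAI-style response includes the usual `usage` token counts):

```python
# Rough tokens-per-second check against the same endpoint. Not a rigorous
# benchmark: a single request, no warmup, no averaging over runs. Assumes
# the response carries an OpenAI-style "usage" block with token counts.
import time
import requests

start = time.time()
resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "Write a haiku about GPUs."}],
        "max_tokens": 256,
    },
).json()
elapsed = time.time() - start

tokens = resp["usage"]["completion_tokens"]
print(f"{tokens} tokens in {elapsed:.2f}s ({tokens / elapsed:.1f} tok/s)")
```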
Anyway, thanks for your analysis, pivotal stuff! @hamelsmu