
feat: llama.cpp gRPC C++ backend #1170

Merged: 23 commits merged into master on Oct 16, 2023
Conversation

mudler (Owner) commented Oct 13, 2023

Description

This PR fixes #1154, fixes #1017 and fixes #944

It provides a C++ gRPC server to use in place of the go-llama binding. Note that this is much more flexible: the Makefile now directly controls the commit/branch/tag of llama.cpp being built. There is of course plenty of room for optimization, but this provides the first steps.
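As a sketch of the pinning mechanism described above (variable and target names here are illustrative, not necessarily those used in the actual Makefile), the Makefile-controlled checkout could look like:

```makefile
# Hypothetical sketch: pin llama.cpp to a specific commit, branch, or tag.
# Variable and target names are illustrative only.
LLAMA_CPP_REPO?=https://github.com/ggerganov/llama.cpp
LLAMA_CPP_VERSION?=<commit-or-tag>

sources/llama.cpp:
	git clone $(LLAMA_CPP_REPO) sources/llama.cpp
	cd sources/llama.cpp && git checkout $(LLAMA_CPP_VERSION)
```

Bumping llama.cpp then becomes a one-line change (or a `make LLAMA_CPP_VERSION=... ` override) instead of waiting for the go-llama binding to catch up.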

There are still some TODOs, some of which are required for feature parity with the golang backend:

  • Expose lora parameters
  • Embeddings (feat parity)
  • Speculative sampling (feat parity)
  • Parallel sampling
  • Additionally expose things like logits bias and an infill endpoint
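The items above mostly translate into fields and RPCs the C++ server would need to expose over gRPC. A hypothetical fragment, to illustrate the shape of the interface (service, message, and field names here are made up for illustration and are not the project's actual backend proto):

```proto
// Illustrative sketch only; not the actual LocalAI proto definitions.
service Backend {
  rpc Predict(PredictOptions) returns (Reply) {}
  rpc Embedding(PredictOptions) returns (EmbeddingResult) {}
}

message PredictOptions {
  string prompt = 1;
  // Lora adapters in upstream llama.cpp now take both a path and a scale factor.
  string lora_adapter = 2;
  float lora_scale = 3;
  // Logit bias pairs (token id -> bias) that llama.cpp sampling supports.
  map<int32, float> logit_bias = 4;
}
```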

What supposedly should work:

Everything except loras, including CUDA/etc. support.

Next:

  • Merge it
  • Merge and unblock ci: GPU tests #1116 so we test with GPUs automatically too
  • Expose lora (it now also needs a scale factor)
  • Move the golang backends inside backends (next to the cpp one)
  • Move extra inside backends
  • Make it feature-parity with the current llama backend
  • Sunset the current llama backend in favor of this one
  • Rename llama-stable to llama-ggml to indicate support for the old ggml file format (prior to gguf)

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
mudler (Owner, Author) commented Oct 14, 2023

Reminder to myself: create issues for the follow-ups that are not tackled here.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
@mudler mudler force-pushed the llama_cpp_grpc branch 3 times, most recently from 17ca6d2 to 18547ae Compare October 15, 2023 13:25
@mudler mudler force-pushed the llama_cpp_grpc branch 3 times, most recently from 8bc90ab to c46cc0c Compare October 15, 2023 14:47
@mudler mudler added the enhancement New feature or request label Oct 16, 2023
@mudler mudler merged commit 1286942 into master Oct 16, 2023
@mudler mudler deleted the llama_cpp_grpc branch October 16, 2023 19:46