
feat: llama.cpp gRPC C++ backend #1170

Merged: 23 commits merged into master on Oct 16, 2023
Conversation

mudler (Owner) commented Oct 13, 2023

Description

This PR fixes #1154, fixes #1017 and fixes #944

It provides a C++ gRPC server to use in place of the go-llama binding. Note that this is much more flexible: the Makefile now directly controls the commit/branch/tag of llama.cpp being built. There is of course plenty of room for optimization, but this provides the first steps.
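As a sketch of the pinning mechanism described above (variable and target names here are illustrative, not necessarily those used in the actual Makefile), the Makefile-controlled checkout could look like:

```makefile
# Hypothetical sketch: pin llama.cpp to a specific commit, branch, or tag.
# Variable and target names are illustrative only.
LLAMA_CPP_REPO?=https://github.com/ggerganov/llama.cpp
LLAMA_CPP_VERSION?=<commit-or-tag>

sources/llama.cpp:
	git clone $(LLAMA_CPP_REPO) sources/llama.cpp
	cd sources/llama.cpp && git checkout $(LLAMA_CPP_VERSION)
```

Bumping llama.cpp then becomes a one-line change (or a `make LLAMA_CPP_VERSION=... ` override) instead of waiting for the go-llama binding to catch up.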

There are still some TODOs, some of which are required for feature parity with the golang backend:

  • Expose lora parameters
  • Embeddings (feat parity)
  • Speculative sampling (feat parity)
  • Parallel sampling
  • Additionally expose things like logits bias and an infill endpoint
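The items above mostly translate into fields and RPCs the C++ server would need to expose over gRPC. A hypothetical fragment, to illustrate the shape of the interface (service, message, and field names here are made up for illustration and are not the project's actual backend proto):

```proto
// Illustrative sketch only; not the actual LocalAI proto definitions.
service Backend {
  rpc Predict(PredictOptions) returns (Reply) {}
  rpc Embedding(PredictOptions) returns (EmbeddingResult) {}
}

message PredictOptions {
  string prompt = 1;
  // Lora adapters in upstream llama.cpp now take both a path and a scale factor.
  string lora_adapter = 2;
  float lora_scale = 3;
  // Logit bias pairs (token id -> bias) that llama.cpp sampling supports.
  map<int32, float> logit_bias = 4;
}
```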

What supposedly should work:

Everything except loras, including CUDA/etc. support.

Next:

  • Merge it
  • Merge and unblock ci: GPU tests #1116 so we test with GPUs automatically too
  • Expose lora (it now also needs a scale factor)
  • Move the golang backends inside backends (next to the cpp one)
  • Move extra inside backends
  • Make it feature-parity with the current llama backend
  • Sunset the current llama backend in favor of this one
  • Rename llama-stable to llama-ggml to indicate support for the old ggml file format (prior to gguf)

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
mudler (Owner, Author) commented Oct 14, 2023

Reminder to myself: create issues for the follow-ups that are not tackled here.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
@mudler mudler force-pushed the llama_cpp_grpc branch 3 times, most recently from 17ca6d2 to 18547ae Compare October 15, 2023 13:25
@mudler mudler force-pushed the llama_cpp_grpc branch 3 times, most recently from 8bc90ab to c46cc0c Compare October 15, 2023 14:47
@mudler mudler added the enhancement New feature or request label Oct 16, 2023
@mudler mudler merged commit 1286942 into master Oct 16, 2023
@mudler mudler deleted the llama_cpp_grpc branch October 16, 2023 19:46