chore: update llama.cpp repo url to ggml-org (#3857)
Signed-off-by: Wei Zhang <kweizh@tabbyml.com>
zwpaper authored Feb 17, 2025
1 parent 3f46c8f commit 55e4ab4
Showing 7 changed files with 8 additions and 8 deletions.
2 changes: 1 addition & 1 deletion .gitmodules
@@ -1,3 +1,3 @@
[submodule "crates/llama-cpp-server/llama.cpp"]
path = crates/llama-cpp-server/llama.cpp
-url = https://github.com/ggerganov/llama.cpp
+url = https://github.com/ggml-org/llama.cpp.git
2 changes: 1 addition & 1 deletion MODEL_SPEC.md
@@ -28,7 +28,7 @@ The **chat_template** field is optional. When it is present, it is assumed that

### ggml/

-This directory contains binary files used by the [llama.cpp](https://github.com/ggerganov/llama.cpp) inference engine.
+This directory contains binary files used by the [llama.cpp](https://github.com/ggml-org/llama.cpp) inference engine.
Tabby utilizes GGML for inference on `cpu`, `cuda` and `metal` devices.

Tabby saves GGUF model files in the format `model-{index}-of-{count}.gguf`, following the llama.cpp naming convention.
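
A minimal Rust sketch of that naming scheme, for illustration only. It assumes 1-based, zero-padded five-digit indices as in llama.cpp's split-file convention; the padding width is an assumption, not something the spec states.

```rust
// Illustrative sketch (not Tabby's code): produce shard file names in the
// `model-{index}-of-{count}.gguf` layout. The zero-padded five-digit index
// is assumed from llama.cpp's split convention.
fn shard_names(count: usize) -> Vec<String> {
    (1..=count)
        .map(|index| format!("model-{index:05}-of-{count:05}.gguf"))
        .collect()
}

fn main() {
    for name in shard_names(3) {
        // Prints model-00001-of-00003.gguf, model-00002-of-00003.gguf, ...
        println!("{name}");
    }
}
```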
2 changes: 1 addition & 1 deletion ci/package-win.sh
@@ -13,7 +13,7 @@ OUTPUT_NAME=${OUTPUT_NAME:-tabby_x86_64-windows-msvc-cuda117}
NAME=llama-${LLAMA_CPP_VERSION}-bin-win-${LLAMA_CPP_PLATFORM}
ZIP_FILE=${NAME}.zip

-curl https://github.com/ggerganov/llama.cpp/releases/download/${LLAMA_CPP_VERSION}/${ZIP_FILE} -L -o ${ZIP_FILE}
+curl https://github.com/ggml-org/llama.cpp/releases/download/${LLAMA_CPP_VERSION}/${ZIP_FILE} -L -o ${ZIP_FILE}
unzip ${ZIP_FILE} -d ${OUTPUT_NAME}

pushd ${OUTPUT_NAME}
4 changes: 2 additions & 2 deletions crates/http-api-bindings/src/embedding/llama.rs
@@ -15,7 +15,7 @@ pub struct LlamaCppEngine {
// Llama.cpp has updated the endpoint from `/embedding` to `/embeddings`,
// and wrapped both the response and embedding in an array from b4357.
//
-// Ref: https://github.com/ggerganov/llama.cpp/pull/10861
+// Ref: https://github.com/ggml-org/llama.cpp/pull/10861
before_b4356: bool,

client: reqwest::Client,
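
The comment above records a breaking change in llama.cpp's embedding API that the `before_b4356` flag accounts for. Below is a minimal sketch of how such a flag can steer both the endpoint path and the response parsing; the response types are assumptions for illustration, not the crate's actual code (requires the `serde` and `serde_json` crates).

```rust
// Sketch only: assumed response shapes, not the types used by LlamaCppEngine.
use serde::Deserialize;

// Pre-b4357 servers answer on `/embedding` with a single object.
#[derive(Deserialize)]
struct LegacyResponse {
    embedding: Vec<f32>,
}

// From b4357, `/embeddings` wraps the response in an array and nests the
// embedding one level deeper.
#[derive(Deserialize)]
struct CurrentResponse {
    embedding: Vec<Vec<f32>>,
}

fn endpoint_path(before_b4356: bool) -> &'static str {
    if before_b4356 { "/embedding" } else { "/embeddings" }
}

fn parse_embedding(before_b4356: bool, body: &str) -> serde_json::Result<Vec<f32>> {
    if before_b4356 {
        Ok(serde_json::from_str::<LegacyResponse>(body)?.embedding)
    } else {
        let responses: Vec<CurrentResponse> = serde_json::from_str(body)?;
        Ok(responses
            .into_iter()
            .next()
            .and_then(|r| r.embedding.into_iter().next())
            .unwrap_or_default())
    }
}
```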
@@ -70,7 +70,7 @@ impl Embedding for LlamaCppEngine {
//
// This serves as a temporary solution to attempt the request up to three times.
//
-// Track issue: https://github.com/ggerganov/llama.cpp/issues/11411
+// Track issue: https://github.com/ggml-org/llama.cpp/issues/11411
let strategy = ExponentialBackoff::from_millis(100).map(jitter).take(3);
let response = RetryIf::spawn(
strategy,
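
The retry code this hunk touches uses the `tokio-retry` crate. Here is a self-contained sketch of the same pattern — up to three attempts with jittered exponential backoff — where `flaky_request` and `TransientError` are placeholders standing in for the real embedding HTTP call, not Tabby's code.

```rust
// Sketch of the tokio-retry pattern shown above; placeholders only.
use tokio_retry::strategy::{jitter, ExponentialBackoff};
use tokio_retry::RetryIf;

#[derive(Debug)]
struct TransientError;

async fn flaky_request() -> Result<String, TransientError> {
    // Stand-in for the HTTP request to the embedding endpoint.
    Err(TransientError)
}

#[tokio::main]
async fn main() {
    // Exponential backoff starting at 100 ms, randomised with jitter,
    // capped at three attempts.
    let strategy = ExponentialBackoff::from_millis(100).map(jitter).take(3);

    let result = RetryIf::spawn(
        strategy,
        flaky_request,
        // Retry only when the error is considered transient.
        |_err: &TransientError| true,
    )
    .await;

    println!("{result:?}");
}
```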
2 changes: 1 addition & 1 deletion experimental/model-converter/update-llama-model.sh
@@ -13,7 +13,7 @@ if [ -z "${ACCESS_TOKEN}" ]; then
fi

prepare_llama_cpp() {
-git clone https://github.com/ggerganov/llama.cpp.git
+git clone https://github.com/ggml-org/llama.cpp.git
pushd llama.cpp

git checkout 6961c4bd0b5176e10ab03b35394f1e9eab761792
2 changes: 1 addition & 1 deletion website/docs/administration/model.md
@@ -6,7 +6,7 @@ You can configure how Tabby connects with LLM models by editing the `~/.tabby/co
- **Chat Model**: The Chat model is adept at producing conversational replies and is broadly compatible with OpenAI's standards.
- **Embedding Model**: The Embedding model is used to generate embeddings for text data, by default Tabby uses the `Nomic-Embed-Text` model.

-Each of the model types can be configured with either a local model or a remote model provider. For local models, Tabby will initiate a subprocess (powered by [llama.cpp](https://github.com/ggerganov/llama.cpp)) and connect to the model via an HTTP API. For remote models, Tabby will connect directly to the model provider's API.
+Each of the model types can be configured with either a local model or a remote model provider. For local models, Tabby will initiate a subprocess (powered by [llama.cpp](https://github.com/ggml-org/llama.cpp)) and connect to the model via an HTTP API. For remote models, Tabby will connect directly to the model provider's API.

Below is an example of how to configure the model settings in the `~/.tabby/config.toml` file:

2 changes: 1 addition & 1 deletion website/docs/references/models-http-api/llama.cpp.mdx
@@ -2,7 +2,7 @@ import Collapse from '@site/src/components/Collapse';

# llama.cpp

-[llama.cpp](https://github.com/ggerganov/llama.cpp/blob/master/examples/server/README.md#api-endpoints) is a popular C++ library for serving gguf-based models. It provides a server implementation that supports completion, chat, and embedding functionalities through HTTP APIs.
+[llama.cpp](https://github.com/ggml-org/llama.cpp/blob/master/examples/server/README.md#api-endpoints) is a popular C++ library for serving gguf-based models. It provides a server implementation that supports completion, chat, and embedding functionalities through HTTP APIs.
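
As a quick illustration of that HTTP surface, here is a minimal Rust client calling the server's OpenAI-compatible chat endpoint. The host, port, and payload are assumptions for the example, not values from the Tabby docs (requires `reqwest` with the `json` feature, `serde_json`, and `tokio`).

```rust
// Illustrative only: query a locally running llama.cpp server over HTTP.
use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), reqwest::Error> {
    let client = reqwest::Client::new();
    let body = client
        .post("http://localhost:8080/v1/chat/completions") // assumed local address
        .json(&json!({
            "messages": [{ "role": "user", "content": "Hello" }]
        }))
        .send()
        .await?
        .text()
        .await?;

    println!("{body}");
    Ok(())
}
```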

## Chat model

