chore: update llama.cpp repo url to ggml-org (#3857)
Signed-off-by: Wei Zhang <kweizh@tabbyml.com>
zwpaper authored Feb 17, 2025
1 parent 3f46c8f commit 55e4ab4
Showing 7 changed files with 8 additions and 8 deletions.
2 changes: 1 addition & 1 deletion .gitmodules
@@ -1,3 +1,3 @@
[submodule "crates/llama-cpp-server/llama.cpp"]
path = crates/llama-cpp-server/llama.cpp
-url = https://github.com/ggerganov/llama.cpp
+url = https://github.com/ggml-org/llama.cpp.git
2 changes: 1 addition & 1 deletion MODEL_SPEC.md
@@ -28,7 +28,7 @@ The **chat_template** field is optional. When it is present, it is assumed that

### ggml/

-This directory contains binary files used by the [llama.cpp](https://github.com/ggerganov/llama.cpp) inference engine.
+This directory contains binary files used by the [llama.cpp](https://github.com/ggml-org/llama.cpp) inference engine.
Tabby utilizes GGML for inference on `cpu`, `cuda` and `metal` devices.

Tabby saves GGUF model files in the format `model-{index}-of-{count}.gguf`, following the llama.cpp naming convention.
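
A minimal Rust sketch of that naming scheme, for illustration only. It assumes 1-based, zero-padded five-digit indices as in llama.cpp's split-file convention; the padding width is an assumption, not something the spec states.

```rust
// Illustrative sketch (not Tabby's code): produce shard file names in the
// `model-{index}-of-{count}.gguf` layout. The zero-padded five-digit index
// is assumed from llama.cpp's split convention.
fn shard_names(count: usize) -> Vec<String> {
    (1..=count)
        .map(|index| format!("model-{index:05}-of-{count:05}.gguf"))
        .collect()
}

fn main() {
    for name in shard_names(3) {
        // Prints model-00001-of-00003.gguf, model-00002-of-00003.gguf, ...
        println!("{name}");
    }
}
```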
2 changes: 1 addition & 1 deletion ci/package-win.sh
@@ -13,7 +13,7 @@ OUTPUT_NAME=${OUTPUT_NAME:-tabby_x86_64-windows-msvc-cuda117}
NAME=llama-${LLAMA_CPP_VERSION}-bin-win-${LLAMA_CPP_PLATFORM}
ZIP_FILE=${NAME}.zip

-curl https://github.com/ggerganov/llama.cpp/releases/download/${LLAMA_CPP_VERSION}/${ZIP_FILE} -L -o ${ZIP_FILE}
+curl https://github.com/ggml-org/llama.cpp/releases/download/${LLAMA_CPP_VERSION}/${ZIP_FILE} -L -o ${ZIP_FILE}
unzip ${ZIP_FILE} -d ${OUTPUT_NAME}

pushd ${OUTPUT_NAME}
4 changes: 2 additions & 2 deletions crates/http-api-bindings/src/embedding/llama.rs
@@ -15,7 +15,7 @@ pub struct LlamaCppEngine {
// Llama.cpp has updated the endpoint from `/embedding` to `/embeddings`,
// and wrapped both the response and embedding in an array from b4357.
//
-// Ref: https://github.com/ggerganov/llama.cpp/pull/10861
+// Ref: https://github.com/ggml-org/llama.cpp/pull/10861
before_b4356: bool,

client: reqwest::Client,
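
The comment above records a breaking change in llama.cpp's embedding API that the `before_b4356` flag accounts for. Below is a minimal sketch of how such a flag can steer both the endpoint path and the response parsing; the response types are assumptions for illustration, not the crate's actual code (requires the `serde` and `serde_json` crates).

```rust
// Sketch only: assumed response shapes, not the types used by LlamaCppEngine.
use serde::Deserialize;

// Pre-b4357 servers answer on `/embedding` with a single object.
#[derive(Deserialize)]
struct LegacyResponse {
    embedding: Vec<f32>,
}

// From b4357, `/embeddings` wraps the response in an array and nests the
// embedding one level deeper.
#[derive(Deserialize)]
struct CurrentResponse {
    embedding: Vec<Vec<f32>>,
}

fn endpoint_path(before_b4356: bool) -> &'static str {
    if before_b4356 { "/embedding" } else { "/embeddings" }
}

fn parse_embedding(before_b4356: bool, body: &str) -> serde_json::Result<Vec<f32>> {
    if before_b4356 {
        Ok(serde_json::from_str::<LegacyResponse>(body)?.embedding)
    } else {
        let responses: Vec<CurrentResponse> = serde_json::from_str(body)?;
        Ok(responses
            .into_iter()
            .next()
            .and_then(|r| r.embedding.into_iter().next())
            .unwrap_or_default())
    }
}
```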
@@ -70,7 +70,7 @@ impl Embedding for LlamaCppEngine {
//
// This serves as a temporary solution to attempt the request up to three times.
//
-// Track issue: https://github.com/ggerganov/llama.cpp/issues/11411
+// Track issue: https://github.com/ggml-org/llama.cpp/issues/11411
let strategy = ExponentialBackoff::from_millis(100).map(jitter).take(3);
let response = RetryIf::spawn(
strategy,
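
The retry code this hunk touches uses the `tokio-retry` crate. Here is a self-contained sketch of the same pattern — up to three attempts with jittered exponential backoff — where `flaky_request` and `TransientError` are placeholders standing in for the real embedding HTTP call, not Tabby's code.

```rust
// Sketch of the tokio-retry pattern shown above; placeholders only.
use tokio_retry::strategy::{jitter, ExponentialBackoff};
use tokio_retry::RetryIf;

#[derive(Debug)]
struct TransientError;

async fn flaky_request() -> Result<String, TransientError> {
    // Stand-in for the HTTP request to the embedding endpoint.
    Err(TransientError)
}

#[tokio::main]
async fn main() {
    // Exponential backoff starting at 100 ms, randomised with jitter,
    // capped at three attempts.
    let strategy = ExponentialBackoff::from_millis(100).map(jitter).take(3);

    let result = RetryIf::spawn(
        strategy,
        flaky_request,
        // Retry only when the error is considered transient.
        |_err: &TransientError| true,
    )
    .await;

    println!("{result:?}");
}
```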
2 changes: 1 addition & 1 deletion experimental/model-converter/update-llama-model.sh
@@ -13,7 +13,7 @@ if [ -z "${ACCESS_TOKEN}" ]; then
fi

prepare_llama_cpp() {
-git clone https://github.com/ggerganov/llama.cpp.git
+git clone https://github.com/ggml-org/llama.cpp.git
pushd llama.cpp

git checkout 6961c4bd0b5176e10ab03b35394f1e9eab761792
2 changes: 1 addition & 1 deletion website/docs/administration/model.md
@@ -6,7 +6,7 @@ You can configure how Tabby connects with LLM models by editing the `~/.tabby/co
- **Chat Model**: The Chat model is adept at producing conversational replies and is broadly compatible with OpenAI's standards.
- **Embedding Model**: The Embedding model is used to generate embeddings for text data, by default Tabby uses the `Nomic-Embed-Text` model.

-Each of the model types can be configured with either a local model or a remote model provider. For local models, Tabby will initiate a subprocess (powered by [llama.cpp](https://github.com/ggerganov/llama.cpp)) and connect to the model via an HTTP API. For remote models, Tabby will connect directly to the model provider's API.
+Each of the model types can be configured with either a local model or a remote model provider. For local models, Tabby will initiate a subprocess (powered by [llama.cpp](https://github.com/ggml-org/llama.cpp)) and connect to the model via an HTTP API. For remote models, Tabby will connect directly to the model provider's API.

Below is an example of how to configure the model settings in the `~/.tabby/config.toml` file:

2 changes: 1 addition & 1 deletion website/docs/references/models-http-api/llama.cpp.mdx
@@ -2,7 +2,7 @@ import Collapse from '@site/src/components/Collapse';

# llama.cpp

-[llama.cpp](https://github.com/ggerganov/llama.cpp/blob/master/examples/server/README.md#api-endpoints) is a popular C++ library for serving gguf-based models. It provides a server implementation that supports completion, chat, and embedding functionalities through HTTP APIs.
+[llama.cpp](https://github.com/ggml-org/llama.cpp/blob/master/examples/server/README.md#api-endpoints) is a popular C++ library for serving gguf-based models. It provides a server implementation that supports completion, chat, and embedding functionalities through HTTP APIs.
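
As a quick illustration of that HTTP surface, here is a minimal Rust client calling the server's OpenAI-compatible chat endpoint. The host, port, and payload are assumptions for the example, not values from the Tabby docs (requires `reqwest` with the `json` feature, `serde_json`, and `tokio`).

```rust
// Illustrative only: query a locally running llama.cpp server over HTTP.
use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), reqwest::Error> {
    let client = reqwest::Client::new();
    let body = client
        .post("http://localhost:8080/v1/chat/completions") // assumed local address
        .json(&json!({
            "messages": [{ "role": "user", "content": "Hello" }]
        }))
        .send()
        .await?
        .text()
        .await?;

    println!("{body}");
    Ok(())
}
```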

## Chat model

