Releases · xhedit/llama-cpp-conv
recook-metal
Update publish.yaml
recook-cu124
Update publish.yaml
recook-cu123
Update publish.yaml
recook-cu122
Update publish.yaml
recook-cu121
Update publish.yaml
recook
dev-metal
merge test (#2)

* feat: add support for KV cache quantization options (#1307)
  * add KV cache quantization options
    (https://github.com/abetlen/llama-cpp-python/discussions/1220,
    https://github.com/abetlen/llama-cpp-python/issues/1305)
  * Add ggml_type
  * Use ggml_type instead of string for quantization
  * Add server support
* fix: Changed local API doc references to hosted (#1317)
* chore: Bump version
* fix: last tokens passing to sample_repetition_penalties function (#1295)
* feat: Update llama.cpp
* fix: segfault when logits_all=False. Closes #1319
* feat: Binary wheels for CPU, CUDA (12.1 - 12.3), Metal (#1247)
  * Generate binary wheel index on release
  * Add total release downloads badge
  * Update download label
  * Use official cibuildwheel action
  * Add workflows to build CUDA and Metal wheels
  * Update generate index workflow
  * Update workflow name
* feat: Update llama.cpp
* chore: Bump version
* fix(ci): use correct script name
* docs: LLAMA_CUBLAS -> LLAMA_CUDA
* docs: Add docs explaining how to install pre-built wheels.
* docs: Rename cuBLAS section to CUDA
* fix(docs): incorrect tool_choice example (#1330)
* feat: Update llama.cpp
* fix: missing logprobs in response, incorrect response type for functionary, minor type issues. Closes #1328, #1314
* feat: Update llama.cpp
* fix: Always embed metal library. Closes #1332
* feat: Update llama.cpp
* chore: Bump version

Co-authored-by: Andrei Betlen <abetlen@gmail.com>
Co-authored-by: Limour <93720049+Limour-dev@users.noreply.github.com>
Co-authored-by: lawfordp2017 <lawfordp@gmail.com>
Co-authored-by: Yuri Mikhailov <bitsharp@gmail.com>
Co-authored-by: ymikhaylov <ymikhaylov@x5.ru>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
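Of the changes above, the KV cache quantization options (#1307) are the main user-facing API addition. Below is a minimal sketch of how they might be used, assuming the upstream llama-cpp-python constructor where #1307 added `type_k`/`type_v` parameters that take `ggml_type` values; the model path is a placeholder, and the exact parameter names may differ in this fork:

```python
# Hedged sketch: quantize the KV cache to q8_0 via the options added in #1307.
# Assumptions: Llama() accepts type_k/type_v as ggml_type integers, and
# "./model.gguf" stands in for any local GGUF model.
import llama_cpp
from llama_cpp import Llama

llm = Llama(
    model_path="./model.gguf",          # placeholder: path to a GGUF model
    type_k=llama_cpp.GGML_TYPE_Q8_0,    # quantization type for the K half of the cache
    type_v=llama_cpp.GGML_TYPE_Q8_0,    # quantization type for the V half of the cache
)
out = llm("Q: Name the planets in the solar system. A:", max_tokens=32)
print(out["choices"][0]["text"])
```

Passing a `ggml_type` enum value rather than a string is exactly the change described by the "Use ggml_type instead of string for quantization" bullet above.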
dev-cu123
(Release notes identical to dev-metal above.)
dev-cu122
(Release notes identical to dev-metal above.)
dev-cu121
(Release notes identical to dev-metal above.)