
llama : add llama_vocab, functions -> methods, naming #11110

Merged
merged 9 commits into master from gg/llama-refactor-7 on Jan 12, 2025

Conversation

@ggerganov ggerganov commented Jan 6, 2025

This PR refactors struct llama_model and struct llama_vocab related functionality. Moved the tensor data loading to src/llama-model.cpp; src/llama.cpp now contains primarily the graph build logic. struct llama_vocab is now public in the llama API and the respective calls use it instead of struct llama_model. Improved naming consistency in the public API.

Sub-PRs

API changes

Multiple naming changes in the llama_context and llama_model API. Old names have been deprecated.
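
For reference, "deprecated" here means the old declarations remain in the header but produce a compile-time warning. A rough sketch of the pattern (the exact macro usage and message wording in llama.h are an assumption):

    // old name kept as a deprecated alias pointing at the new one
    DEPRECATED(LLAMA_API int32_t llama_n_embd(const struct llama_model * model),
               "use llama_model_n_embd instead");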

  • Add struct llama_vocab to the API

  • llama_n_vocab() now accepts struct llama_vocab instead of struct llama_model

  • llama_sampler_init_dry() now accepts struct llama_vocab instead of struct llama_model

  • The tokenization API now accepts struct llama_vocab instead of struct llama_model (see the sketch after this list)

  • The sampler API now accepts struct llama_vocab instead of struct llama_model

  • Update API names for improved consistency:

    // before
    LLAMA_API int32_t llama_n_ctx_train(const struct llama_model * model);
    LLAMA_API int32_t llama_n_embd     (const struct llama_model * model);
    LLAMA_API int32_t llama_n_layer    (const struct llama_model * model);
    LLAMA_API int32_t llama_n_head     (const struct llama_model * model);

    LLAMA_API int32_t llama_n_vocab    (const struct llama_vocab * vocab);

    // after
    LLAMA_API int32_t llama_model_n_ctx_train(const struct llama_model * model);
    LLAMA_API int32_t llama_model_n_embd     (const struct llama_model * model);
    LLAMA_API int32_t llama_model_n_layer    (const struct llama_model * model);
    LLAMA_API int32_t llama_model_n_head     (const struct llama_model * model);

    LLAMA_API int32_t llama_vocab_n_tokens   (const struct llama_vocab * vocab);

   ...
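
For example, a tokenization call that previously took the model now takes the vocab obtained from it. A minimal sketch (includes omitted; buffer sizes and error handling are illustrative only):

    const struct llama_vocab * vocab = llama_model_get_vocab(model);

    const char * text = "Hello world";

    // llama_tokenize() now takes the vocab instead of the model
    llama_token tokens[64];
    int32_t n_tokens = llama_tokenize(vocab, text, (int32_t) strlen(text),
                                      tokens, 64,
                                      /*add_special*/ true, /*parse_special*/ false);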

Adapter API

Multiple naming changes in this API. Skipped the deprecation phase.

  • Rename struct llama_control_vector -> struct llama_adapter_cvec
  • Rename struct llama_lora_adapter -> struct llama_adapter_lora
  • llama_lora_adapter_[verb](ctx, ...) -> llama_[verb]_adapter_lora(ctx, ...) (see the sketch below)
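
As an illustration of the verb reordering (assuming the usual set/clear verbs; consult llama.h for the authoritative list):

    // before
    llama_lora_adapter_set  (ctx, adapter, scale);
    llama_lora_adapter_clear(ctx);

    // after
    llama_set_adapter_lora  (ctx, adapter, scale);
    llama_clear_adapter_lora(ctx);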

Migration instructions

Adapting user code to the changes is fairly straightforward:

  • Change functions to use the new names
  • Call llama_model_get_vocab(model) where the old API required llama_model and the new API requires llama_vocab (a short sketch follows)
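
A minimal migration sketch, assuming a loaded llama_model * model (the calls shown in comments are the pre-change names):

    // before:
    //   int32_t     n_vocab = llama_n_vocab  (model);
    //   llama_token bos     = llama_token_bos(model);

    // after:
    const struct llama_vocab * vocab = llama_model_get_vocab(model);

    int32_t     n_vocab = llama_vocab_n_tokens(vocab);
    llama_token bos     = llama_vocab_bos(vocab);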

@ggerganov ggerganov force-pushed the gg/llama-refactor-7 branch 3 times, most recently from 287e8c2 to 4d27597 Compare January 7, 2025 13:14
@ggerganov (Member Author)

I'm stumped by this error in the CI:

https://github.com/ggerganov/llama.cpp/actions/runs/12654124026/job/35261257507?pr=11110#step:8:355

Not sure what is causing it. @slaren Do you have any guess what could be the issue?


slaren commented Jan 7, 2025

Looks like a compiler bug, but I cannot reproduce it locally.

@github-actions github-actions bot added the devops label Jan 7, 2025

slaren commented Jan 7, 2025

I think it's safe to assume that this is a compiler bug. It only happens with clang on Windows, and only on release builds. Changing the generator from "ninja multi-config" to "ninja" fixes it for the arm build, but not for hip or sycl. The destructors that it complains about seem to be those of the maps in llama_vocab; it may be some failure while trying to inline the calls.

@ggerganov (Member Author)

Thanks!

@ggerganov ggerganov force-pushed the gg/llama-refactor-7 branch from 1d1f264 to a857dc5 Compare January 10, 2025 09:26
@github-actions github-actions bot added the testing, android, examples, python, and server labels Jan 10, 2025
@ggerganov ggerganov added the breaking change label Jan 10, 2025

ggerganov commented Jan 10, 2025

This should be ready to merge. These are 5 PRs refactoring llama_vocab and llama_model with the main goals being to make the implementation more decoupled and avoid intermediate _impl functions when we can have those as methods of the respective classes. The llama_vocab is now also exposed through the public API because I think long term it would be useful to have this separation between the model and the vocabulary.

This set of PRs, together with #11167 and #11174 all introduce API changes, so it would make sense to merge them together to avoid multiple breaking changes.

After this is merged, I will attempt a similar refactoring for llama_context and llama_kv_cache.

@ggerganov ggerganov requested a review from slaren January 10, 2025 09:48
@slaren (Member) left a comment:


Looks good, I think it is very good to settle on C++ OOP style rather than the current mix of C and C++. Some suggestions for further refactoring:

  • Move the code from llama_model_load_from_file to the llama_model constructor
  • Move llama_vocab::load to the constructor
  • A struct with private members is not very common; it might be better to use class
  • I don't see the point of declaring empty destructors; they can be removed or explicitly defaulted. Very few classes need a destructor.
  • At some point we should abstract everything needed to model an architecture into a single class, such that each architecture is a subclass of this class (a hypothetical sketch follows this list)
  • After that, llm_type should probably be removed entirely, and each architecture should have its own enum if needed, with a function to return the type as a string (which by default could be "<arch> <params>")
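
A hypothetical sketch of that last suggestion (not part of this PR; the names llm_arch_base and llm_arch_llama are made up for illustration):

    // each architecture would subclass a common base that owns everything
    // needed to model it (hparams handling, graph build, type naming, ...)
    struct llm_arch_base {
        virtual ~llm_arch_base() = default;

        // default could be "<arch> <params>", overridable per architecture
        virtual std::string type_name() const = 0;
    };

    struct llm_arch_llama : llm_arch_base {
        std::string type_name() const override { return "llama <params>"; }
    };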

@ggerganov ggerganov changed the title llama : functions -> methods llama : add llama_vocab, functions -> methods, naming Jan 12, 2025
@ggerganov ggerganov merged commit afa8a9e into master Jan 12, 2025
56 of 57 checks passed
@ggerganov ggerganov deleted the gg/llama-refactor-7 branch January 12, 2025 09:32
Review comments on the following change:

    LLAMA_LOG_WARN(
            "%s: Added a BOS token to the prompt as specified by the model but the prompt "
            "also starts with a BOS token. So now the final prompt starts with 2 BOS tokens. "
            "Are you sure this is what you want?\n", __FUNCTION__);
    }

    // before
    if (vocab.tokenizer_add_eos && output.size() >= 2 && *(output.end()-2) == vocab.special_eos_id) {
    // after
    if (vocab.get_add_bos()     && output.size() >= 2 && *(output.end()-2) == vocab.token_eos()) {
Collaborator:

    vocab.get_add_bos()

probably a typo, @ggerganov?

@ggerganov (Member Author) commented Jan 16, 2025:

It's ok - it means "get the add_bos flag". The "get" is necessary to disambiguate from the action of adding a BOS, vocab.add_bos().

Collaborator:

@ggerganov what I mean is that you are checking for EOS, but using vocab.get_add_bos(). Should it not be vocab.get_add_eos() instead?

@ggerganov (Member Author):

Oh yes. Thanks for spotting - fixing.
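
For completeness, the fix being referred to would presumably change the flag in the hunk above along these lines (a sketch, assuming llama_vocab exposes get_add_eos() alongside get_add_bos()):

    // check the add_eos flag when inspecting the trailing EOS token
    if (vocab.get_add_eos() && output.size() >= 2 && *(output.end()-2) == vocab.token_eos()) {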

tobiasvonderheidt added a commit to tobiasvonderheidt/hips that referenced this pull request Jan 31, 2025
tobiasvonderheidt added a commit to tobiasvonderheidt/hips that referenced this pull request Jan 31, 2025
tinglou pushed a commit to tinglou/llama.cpp that referenced this pull request Feb 13, 2025
* llama : functions -> methods (ggml-org#11110)

* llama : add struct llama_vocab to the API (ggml-org#11156)

ggml-ci

* hparams : move vocab params to llama_vocab (ggml-org#11159)

ggml-ci

* vocab : more pimpl (ggml-org#11165)

ggml-ci

* vocab : minor tokenization optimizations (ggml-org#11160)

ggml-ci

Co-authored-by: Diego Devesa <slarengh@gmail.com>

* lora : update API names (ggml-org#11167)

ggml-ci

* llama : update API names to use correct prefix (ggml-org#11174)

* llama : update API names to use correct prefix

ggml-ci

* cont

ggml-ci

* cont

ggml-ci

* minor [no ci]

* vocab : llama_vocab_add_[be]os -> llama_vocab_get_add_[be]os (ggml-org#11174)

ggml-ci

* vocab : llama_vocab_n_vocab -> llama_vocab_n_tokens (ggml-org#11174)

ggml-ci

---------

Co-authored-by: Diego Devesa <slarengh@gmail.com>
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Feb 26, 2025
Labels

  • android: Issues specific to Android
  • breaking change: Changes that break ABIs, APIs, file formats, or other forms of backwards compatibility.
  • devops: improvements to build systems and github actions
  • examples
  • python: python script changes
  • server
  • testing: Everything test related