
llama : add llama_vocab, functions -> methods, naming #11110

Merged
merged 9 commits into master from gg/llama-refactor-7 on Jan 12, 2025

Conversation

@ggerganov ggerganov commented Jan 6, 2025

This PR refactors struct llama_model and struct llama_vocab related functionality. Moved the tensor data loading to src/llama-model.cpp; src/llama.cpp now contains primarily the graph build logic. struct llama_vocab is now public in the llama API and the respective calls use it instead of struct llama_model. Improved naming consistency in the public API.

Sub-PRs

API changes

Multiple naming changes in the llama_context and llama_model API. Old names have been deprecated.
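
For reference, "deprecated" here means the old declarations remain in the header but produce a compile-time warning. A rough sketch of the pattern (the exact macro usage and message wording in llama.h are an assumption):

    // old name kept as a deprecated alias pointing at the new one
    DEPRECATED(LLAMA_API int32_t llama_n_embd(const struct llama_model * model),
               "use llama_model_n_embd instead");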

  • Add struct llama_vocab to the API

  • llama_n_vocab() now accepts struct llama_vocab instead of struct llama_model

  • llama_sampler_init_dry() now accepts struct llama_vocab instead of struct llama_model

  • The tokenization API now accepts struct llama_vocab instead of struct llama_model (see the sketch after this list)

  • The sampler API now accepts struct llama_vocab instead of struct llama_model

  • Update API names for improved consistency:

    // before
    LLAMA_API int32_t llama_n_ctx_train(const struct llama_model * model);
    LLAMA_API int32_t llama_n_embd     (const struct llama_model * model);
    LLAMA_API int32_t llama_n_layer    (const struct llama_model * model);
    LLAMA_API int32_t llama_n_head     (const struct llama_model * model);

    LLAMA_API int32_t llama_n_vocab    (const struct llama_vocab * vocab);

    // after
    LLAMA_API int32_t llama_model_n_ctx_train(const struct llama_model * model);
    LLAMA_API int32_t llama_model_n_embd     (const struct llama_model * model);
    LLAMA_API int32_t llama_model_n_layer    (const struct llama_model * model);
    LLAMA_API int32_t llama_model_n_head     (const struct llama_model * model);

    LLAMA_API int32_t llama_vocab_n_tokens   (const struct llama_vocab * vocab);

   ...
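
For example, a tokenization call that previously took the model now takes the vocab obtained from it. A minimal sketch (includes omitted; buffer sizes and error handling are illustrative only):

    const struct llama_vocab * vocab = llama_model_get_vocab(model);

    const char * text = "Hello world";

    // llama_tokenize() now takes the vocab instead of the model
    llama_token tokens[64];
    int32_t n_tokens = llama_tokenize(vocab, text, (int32_t) strlen(text),
                                      tokens, 64,
                                      /*add_special*/ true, /*parse_special*/ false);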

Adapter API

Multiple naming changes in this API. Skipped the deprecation phase.

  • Rename struct llama_control_vector -> struct llama_adapter_cvec
  • Rename struct llama_lora_adapter -> struct llama_adapter_lora
  • llama_lora_adapter_[verb](ctx, ...) -> llama_[verb]_adapter_lora(ctx, ...) (see the sketch below)
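
As an illustration of the verb reordering (assuming the usual set/clear verbs; consult llama.h for the authoritative list):

    // before
    llama_lora_adapter_set  (ctx, adapter, scale);
    llama_lora_adapter_clear(ctx);

    // after
    llama_set_adapter_lora  (ctx, adapter, scale);
    llama_clear_adapter_lora(ctx);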

Migration instructions

Adapting user code to the changes is fairly straightforward:

  • Change functions to use the new names
  • Call llama_model_get_vocab(model) where the old API required llama_model and the new API requires llama_vocab (a short sketch follows)
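
A minimal migration sketch, assuming a loaded llama_model * model (the calls shown in comments are the pre-change names):

    // before:
    //   int32_t     n_vocab = llama_n_vocab  (model);
    //   llama_token bos     = llama_token_bos(model);

    // after:
    const struct llama_vocab * vocab = llama_model_get_vocab(model);

    int32_t     n_vocab = llama_vocab_n_tokens(vocab);
    llama_token bos     = llama_vocab_bos(vocab);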

@ggerganov ggerganov force-pushed the gg/llama-refactor-7 branch 3 times, most recently from 287e8c2 to 4d27597 Compare January 7, 2025 13:14
@ggerganov (Member Author)

I'm stumped by this error in the CI:

https://github.com/ggerganov/llama.cpp/actions/runs/12654124026/job/35261257507?pr=11110#step:8:355

Not sure what is causing it. @slaren Do you have any guess what could be the issue?


slaren commented Jan 7, 2025

Looks like a compiler bug, but I cannot reproduce it locally.

@github-actions github-actions bot added the devops label Jan 7, 2025

slaren commented Jan 7, 2025

I think it's safe to assume that this is a compiler bug. It only happens with clang on Windows, and only on release builds. Changing the generator from "ninja multi-config" to "ninja" fixes it for the arm build, but not for hip or sycl. The destructors that it complains about seem to be those of the maps in llama_vocab; it may be some failure while trying to inline the calls.

@ggerganov (Member Author)

Thanks!

@ggerganov ggerganov force-pushed the gg/llama-refactor-7 branch from 1d1f264 to a857dc5 Compare January 10, 2025 09:26
@github-actions github-actions bot added the testing, android, examples, python, and server labels Jan 10, 2025
@ggerganov ggerganov added the breaking change label Jan 10, 2025

ggerganov commented Jan 10, 2025

This should be ready to merge. These are 5 PRs refactoring llama_vocab and llama_model with the main goals being to make the implementation more decoupled and avoid intermediate _impl functions when we can have those as methods of the respective classes. The llama_vocab is now also exposed through the public API because I think long term it would be useful to have this separation between the model and the vocabulary.

This set of PRs, together with #11167 and #11174 all introduce API changes, so it would make sense to merge them together to avoid multiple breaking changes.

After this is merged, I will attempt a similar refactoring for llama_context and llama_kv_cache.

@ggerganov ggerganov requested a review from slaren January 10, 2025 09:48
@slaren (Member) left a comment:


Looks good, I think it is very good to settle on C++ OOP style rather than the current mix of C and C++. Some suggestions for further refactoring:

  • Move the code from llama_model_load_from_file to the llama_model constructor
  • Move llama_vocab::load to the constructor
  • A struct with private members is not very common; it might be better to use class
  • I don't see the point of declaring empty destructors; they can be removed or explicitly defaulted. Very few classes need a destructor.
  • At some point we should abstract everything needed to model an architecture into a single class, such that each architecture is a subclass of this class (a hypothetical sketch follows this list)
  • After that, llm_type should probably be removed entirely, and each architecture should have its own enum if needed, with a function to return the type as a string (which by default could be "<arch> <params>")
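
A hypothetical sketch of that last suggestion (not part of this PR; the names llm_arch_base and llm_arch_llama are made up for illustration):

    // each architecture would subclass a common base that owns everything
    // needed to model it (hparams handling, graph build, type naming, ...)
    struct llm_arch_base {
        virtual ~llm_arch_base() = default;

        // default could be "<arch> <params>", overridable per architecture
        virtual std::string type_name() const = 0;
    };

    struct llm_arch_llama : llm_arch_base {
        std::string type_name() const override { return "llama <params>"; }
    };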

@ggerganov ggerganov changed the title llama : functions -> methods llama : add llama_vocab, functions -> methods, naming Jan 12, 2025
@ggerganov ggerganov merged commit afa8a9e into master Jan 12, 2025
56 of 57 checks passed
@ggerganov ggerganov deleted the gg/llama-refactor-7 branch January 12, 2025 09:32
Review comments on the following change:

    LLAMA_LOG_WARN(
            "%s: Added a BOS token to the prompt as specified by the model but the prompt "
            "also starts with a BOS token. So now the final prompt starts with 2 BOS tokens. "
            "Are you sure this is what you want?\n", __FUNCTION__);
    }

    // before
    if (vocab.tokenizer_add_eos && output.size() >= 2 && *(output.end()-2) == vocab.special_eos_id) {
    // after
    if (vocab.get_add_bos()     && output.size() >= 2 && *(output.end()-2) == vocab.token_eos()) {
Collaborator:

    vocab.get_add_bos()

probably a typo, @ggerganov?

@ggerganov (Member Author) commented Jan 16, 2025:

It's ok - it means "get the add_bos flag". The "get" is necessary to disambiguate from the action of adding a BOS, vocab.add_bos().

Collaborator:

@ggerganov what I mean is that you are checking for EOS, but using vocab.get_add_bos(). Should it not be vocab.get_add_eos() instead?

@ggerganov (Member Author):

Oh yes. Thanks for spotting - fixing.
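
For completeness, the fix being referred to would presumably change the flag in the hunk above along these lines (a sketch, assuming llama_vocab exposes get_add_eos() alongside get_add_bos()):

    // check the add_eos flag when inspecting the trailing EOS token
    if (vocab.get_add_eos() && output.size() >= 2 && *(output.end()-2) == vocab.token_eos()) {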

tobiasvonderheidt added a commit to tobiasvonderheidt/hips that referenced this pull request Jan 31, 2025
tobiasvonderheidt added a commit to tobiasvonderheidt/hips that referenced this pull request Jan 31, 2025
tinglou pushed a commit to tinglou/llama.cpp that referenced this pull request Feb 13, 2025
* llama : functions -> methods (ggml-org#11110)

* llama : add struct llama_vocab to the API (ggml-org#11156)

ggml-ci

* hparams : move vocab params to llama_vocab (ggml-org#11159)

ggml-ci

* vocab : more pimpl (ggml-org#11165)

ggml-ci

* vocab : minor tokenization optimizations (ggml-org#11160)

ggml-ci

Co-authored-by: Diego Devesa <slarengh@gmail.com>

* lora : update API names (ggml-org#11167)

ggml-ci

* llama : update API names to use correct prefix (ggml-org#11174)

* llama : update API names to use correct prefix

ggml-ci

* cont

ggml-ci

* cont

ggml-ci

* minor [no ci]

* vocab : llama_vocab_add_[be]os -> llama_vocab_get_add_[be]os (ggml-org#11174)

ggml-ci

* vocab : llama_vocab_n_vocab -> llama_vocab_n_tokens (ggml-org#11174)

ggml-ci

---------

Co-authored-by: Diego Devesa <slarengh@gmail.com>
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Feb 26, 2025
Labels

  • android: Issues specific to Android
  • breaking change: Changes that break ABIs, APIs, file formats, or other forms of backwards compatibility.
  • devops: improvements to build systems and github actions
  • examples
  • python: python script changes
  • server
  • testing: Everything test related