
tensor 'tok_embeddings.weight' has wrong size in model file #14

Closed
shaunabanana opened this issue Mar 11, 2023 · 1 comment
Labels
build Compilation issues

Comments

@shaunabanana
When trying to run the 13B model, the following output is given:

main: seed = 1678543550
llama_model_load: loading model from './models/13B/ggml-model-q4_0.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx   = 512
llama_model_load: n_embd  = 5120
llama_model_load: n_mult  = 256
llama_model_load: n_head  = 40
llama_model_load: n_layer = 40
llama_model_load: n_rot   = 128
llama_model_load: f16     = 2
llama_model_load: n_ff    = 13824
llama_model_load: ggml ctx size = 8559.49 MB
llama_model_load: memory_size =   800.00 MB, n_mem = 20480
llama_model_load: tensor 'tok_embeddings.weight' has wrong size in model file
main: failed to load model from './models/13B/ggml-model-q4_0.bin'

I have followed the commands in the readme to quantize the model, i.e.:

python3 convert-pth-to-ggml.py models/13B/ 1
./quantize ./models/13B/ggml-model-f16.bin   ./models/13B/ggml-model-q4_0.bin 2
./quantize ./models/13B/ggml-model-f16.bin.1 ./models/13B/ggml-model-q4_0.bin.1 2

I am using an M1 MacBook Pro. Any thoughts on how to resolve this issue?
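For context, the loader's complaint is a mismatch between the tensor size stored in the file and the size it expects from the model hyperparameters. A rough sketch of that kind of check, using the `n_vocab` and `n_embd` values printed in the log above (this mirrors the general idea, not the actual llama.cpp implementation):

```python
# Hypothetical size check, assuming the token-embedding tensor should
# hold n_vocab * n_embd elements (values taken from the log above).
n_vocab = 32000
n_embd = 5120

expected_elements = n_vocab * n_embd  # 163840000

def check_tensor(name: str, elements_in_file: int) -> None:
    """Raise if the tensor recorded in the model file has the wrong size."""
    if elements_in_file != expected_elements:
        raise ValueError(f"tensor '{name}' has wrong size in model file")

check_tensor("tok_embeddings.weight", expected_elements)  # passes
```

A stale `main`/`quantize` binary built against an older file format would compute a different expected size, which is consistent with the resolution below.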

@shaunabanana
Author

I just realized that I had been using binaries (`quantize` and `main`) compiled from a previous version. Recompiling solved the issue. Thank you for your awesome work!

@gjmulder gjmulder added the build Compilation issues label Mar 15, 2023
abetlen pushed a commit to abetlen/llama.cpp that referenced this issue Apr 10, 2023: "Update macOS, better instructions, streaming output"
SlyEcho pushed a commit to SlyEcho/llama.cpp that referenced this issue Jun 2, 2023: "Trim partial stopping strings when not streaming and move multibyte check."
chsasank pushed a commit to chsasank/llama.cpp that referenced this issue Dec 20, 2023: "Update LICENSE with our copyright notice; Update README.md; fix readme anchor"