llama-cli misbehaving (changed?) #12036

Open
0wwafa opened this issue Feb 23, 2025 · 2 comments

0wwafa commented Feb 23, 2025

I have a Colab notebook here to quantize and test models:
https://colab.research.google.com/drive/1TcyGL60GQzsxEHu-Xlos5u8bb_6SxMa3

The simple test has always been this cell:

prompt="""
Tell me the difference between thinking in humans and in LLMs.
"""
m=f'{model_name}.{q_type}.gguf'
!./build/bin/llama-cli --ignore-eos -c 4096 -m /content/$m -t $(nproc) -ngl 999 -p "User: Hi\nBot:Hi\nUser: {prompt}\nBot:"
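
For reference outside the notebook, a stand-alone shell equivalent of that cell — just a sketch, with the Python variables replaced by the concrete model filename that appears in the log below:

./build/bin/llama-cli --ignore-eos -c 4096 -m /content/gemma-2-Ifable-9B.q8_0.gguf -t $(nproc) -ngl 999 -p "User: Hi\nBot:Hi\nUser: Tell me the difference between thinking in humans and in LLMs.\nBot:"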

Usually, after initialization, the model starts answering (and then even continues on its own, which is fine).

Now (b4762) it instead does this:

build: 4762 (af7747c9) with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
main: llama backend init
main: load the model and apply lora adapter, if any
llama_model_loader: loaded meta data with 39 key-value pairs and 464 tensors from /content/gemma-2-Ifable-9B.q8_0.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = gemma2
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Gemma 2 Ifable 9B
llama_model_loader: - kv   3:                       general.organization str              = Ifable
llama_model_loader: - kv   4:                           general.basename str              = gemma-2-Ifable
llama_model_loader: - kv   5:                         general.size_label str              = 9B
llama_model_loader: - kv   6:                            general.license str              = gemma
llama_model_loader: - kv   7:                      general.dataset.count u32              = 1
llama_model_loader: - kv   8:                     general.dataset.0.name str              = Gutenberg Dpo v0.1
llama_model_loader: - kv   9:                  general.dataset.0.version str              = v0.1
llama_model_loader: - kv  10:             general.dataset.0.organization str              = Jondurbin
llama_model_loader: - kv  11:                 general.dataset.0.repo_url str              = https://huggingface.co/jondurbin/gute...
llama_model_loader: - kv  12:                      gemma2.context_length u32              = 8192
llama_model_loader: - kv  13:                    gemma2.embedding_length u32              = 3584
llama_model_loader: - kv  14:                         gemma2.block_count u32              = 42
llama_model_loader: - kv  15:                 gemma2.feed_forward_length u32              = 14336
llama_model_loader: - kv  16:                gemma2.attention.head_count u32              = 16
llama_model_loader: - kv  17:             gemma2.attention.head_count_kv u32              = 8
llama_model_loader: - kv  18:    gemma2.attention.layer_norm_rms_epsilon f32              = 0.000001
llama_model_loader: - kv  19:                gemma2.attention.key_length u32              = 256
llama_model_loader: - kv  20:              gemma2.attention.value_length u32              = 256
llama_model_loader: - kv  21:              gemma2.attn_logit_softcapping f32              = 50.000000
llama_model_loader: - kv  22:             gemma2.final_logit_softcapping f32              = 30.000000
llama_model_loader: - kv  23:            gemma2.attention.sliding_window u32              = 4096
llama_model_loader: - kv  24:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  25:                         tokenizer.ggml.pre str              = default
llama_model_loader: - kv  26:                      tokenizer.ggml.tokens arr[str,256000]  = ["<pad>", "<eos>", "<bos>", "<unk>", ...
llama_model_loader: - kv  27:                      tokenizer.ggml.scores arr[f32,256000]  = [-1000.000000, -1000.000000, -1000.00...
llama_model_loader: - kv  28:                  tokenizer.ggml.token_type arr[i32,256000]  = [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, ...
llama_model_loader: - kv  29:                tokenizer.ggml.bos_token_id u32              = 2
llama_model_loader: - kv  30:                tokenizer.ggml.eos_token_id u32              = 1
llama_model_loader: - kv  31:            tokenizer.ggml.unknown_token_id u32              = 3
llama_model_loader: - kv  32:            tokenizer.ggml.padding_token_id u32              = 0
llama_model_loader: - kv  33:               tokenizer.ggml.add_bos_token bool             = true
llama_model_loader: - kv  34:               tokenizer.ggml.add_eos_token bool             = false
llama_model_loader: - kv  35:                    tokenizer.chat_template str              = {{ '<bos>' }}{% if messages[0]['role'...
llama_model_loader: - kv  36:            tokenizer.ggml.add_space_prefix bool             = false
llama_model_loader: - kv  37:               general.quantization_version u32              = 2
llama_model_loader: - kv  38:                          general.file_type u32              = 7
llama_model_loader: - type  f32:  169 tensors
llama_model_loader: - type  f16:    1 tensors
llama_model_loader: - type q8_0:  294 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = Q8_0
print_info: file size   = 9.95 GiB (9.25 BPW) 
load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
load: special tokens cache size = 217
load: token to piece cache size = 1.6014 MB
print_info: arch             = gemma2
print_info: vocab_only       = 0
print_info: n_ctx_train      = 8192
print_info: n_embd           = 3584
print_info: n_layer          = 42
print_info: n_head           = 16
print_info: n_head_kv        = 8
print_info: n_rot            = 256
print_info: n_swa            = 4096
print_info: n_embd_head_k    = 256
print_info: n_embd_head_v    = 256
print_info: n_gqa            = 2
print_info: n_embd_k_gqa     = 2048
print_info: n_embd_v_gqa     = 2048
print_info: f_norm_eps       = 0.0e+00
print_info: f_norm_rms_eps   = 1.0e-06
print_info: f_clamp_kqv      = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale    = 0.0e+00
print_info: n_ff             = 14336
print_info: n_expert         = 0
print_info: n_expert_used    = 0
print_info: causal attn      = 1
print_info: pooling type     = 0
print_info: rope type        = 2
print_info: rope scaling     = linear
print_info: freq_base_train  = 10000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn  = 8192
print_info: rope_finetuned   = unknown
print_info: ssm_d_conv       = 0
print_info: ssm_d_inner      = 0
print_info: ssm_d_state      = 0
print_info: ssm_dt_rank      = 0
print_info: ssm_dt_b_c_rms   = 0
print_info: model type       = 9B
print_info: model params     = 9.24 B
print_info: general.name     = Gemma 2 Ifable 9B
print_info: vocab type       = SPM
print_info: n_vocab          = 256000
print_info: n_merges         = 0
print_info: BOS token        = 2 '<bos>'
print_info: EOS token        = 1 '<eos>'
print_info: EOT token        = 107 '<end_of_turn>'
print_info: UNK token        = 3 '<unk>'
print_info: PAD token        = 0 '<pad>'
print_info: LF token         = 227 '<0x0A>'
print_info: EOG token        = 1 '<eos>'
print_info: EOG token        = 107 '<end_of_turn>'
print_info: max token length = 48
load_tensors: loading model tensors, this can take a while... (mmap = true)
load_tensors: offloading 42 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 43/43 layers to GPU
load_tensors:   CPU_Mapped model buffer size = 10186.44 MiB
....................................................................................
llama_init_from_model: n_seq_max     = 1
llama_init_from_model: n_ctx         = 4096
llama_init_from_model: n_ctx_per_seq = 4096
llama_init_from_model: n_batch       = 2048
llama_init_from_model: n_ubatch      = 512
llama_init_from_model: flash_attn    = 0
llama_init_from_model: freq_base     = 10000.0
llama_init_from_model: freq_scale    = 1
llama_init_from_model: n_ctx_per_seq (4096) < n_ctx_train (8192) -- the full capacity of the model will not be utilized
llama_kv_cache_init: kv_size = 4096, offload = 1, type_k = 'f16', type_v = 'f16', n_layer = 42, can_shift = 1
llama_kv_cache_init:        CPU KV buffer size =  1344.00 MiB
llama_init_from_model: KV self size  = 1344.00 MiB, K (f16):  672.00 MiB, V (f16):  672.00 MiB
llama_init_from_model:        CPU  output buffer size =     0.98 MiB
llama_init_from_model:        CPU compute buffer size =   514.00 MiB
llama_init_from_model: graph nodes  = 1690
llama_init_from_model: graph splits = 1
common_init_from_params: added <eos> logit bias = -inf
common_init_from_params: added <end_of_turn> logit bias = -inf
common_init_from_params: setting dry_penalty_last_n to ctx_size = 4096
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
main: llama threadpool init, n_threads = 2
main: chat template is available, enabling conversation mode (disable it with -no-cnv)
main: chat template example:
<start_of_turn>user
You are a helpful assistant

Hello<end_of_turn>
<start_of_turn>model
Hi there<end_of_turn>
<start_of_turn>user
How are you?<end_of_turn>
<start_of_turn>model


system_info: n_threads = 2 (n_threads_batch = 2) / 2 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | LLAMAFILE = 1 | OPENMP = 1 | AARCH64_REPACK = 1 | 

main: interactive mode on.
sampler seed: 3895428166
sampler params: 
	repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
	dry_multiplier = 0.000, dry_base = 1.750, dry_allowed_length = 2, dry_penalty_last_n = 4096
	top_k = 40, top_p = 0.950, min_p = 0.050, xtc_probability = 0.000, xtc_threshold = 0.100, typical_p = 1.000, top_n_sigma = -1.000, temp = 0.800
	mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampler chain: logits -> logit-bias -> penalties -> dry -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist 
generate: n_ctx = 4096, n_batch = 2048, n_predict = -1, n_keep = 1

== Running in interactive mode. ==
 - Press Ctrl+C to interject at any time.
 - Press Return to return control to the AI.
 - To return control without starting a new line, end your input with '/'.
 - If you want to submit another line, end your input with '\'.


> 

Am I doing something wrong?

Note: if I use b4000, everything works as usual.
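
Side note: the b4762 log above prints "main: chat template is available, enabling conversation mode (disable it with -no-cnv)". If that message is accurate, appending the flag should restore the old one-shot behavior — an untested sketch:

# same invocation as above, with the auto-enabled conversation mode turned off
!./build/bin/llama-cli --ignore-eos -c 4096 -m /content/$m -t $(nproc) -ngl 999 -no-cnv -p "User: Hi\nBot:Hi\nUser: {prompt}\nBot:"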

0wwafa commented Feb 23, 2025

For reference, this is the output I get with the same model using b4000:

build: 4000 (c02e5ab2) with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
main: llama backend init
main: load the model and apply lora adapter, if any
llama_model_loader: loaded meta data with 39 key-value pairs and 464 tensors from /content/gemma-2-Ifable-9B.q8q4.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = gemma2
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Gemma 2 Ifable 9B
llama_model_loader: - kv   3:                       general.organization str              = Ifable
llama_model_loader: - kv   4:                           general.basename str              = gemma-2-Ifable
llama_model_loader: - kv   5:                         general.size_label str              = 9B
llama_model_loader: - kv   6:                            general.license str              = gemma
llama_model_loader: - kv   7:                      general.dataset.count u32              = 1
llama_model_loader: - kv   8:                     general.dataset.0.name str              = Gutenberg Dpo v0.1
llama_model_loader: - kv   9:                  general.dataset.0.version str              = v0.1
llama_model_loader: - kv  10:             general.dataset.0.organization str              = Jondurbin
llama_model_loader: - kv  11:                 general.dataset.0.repo_url str              = https://huggingface.co/jondurbin/gute...
llama_model_loader: - kv  12:                      gemma2.context_length u32              = 8192
llama_model_loader: - kv  13:                    gemma2.embedding_length u32              = 3584
llama_model_loader: - kv  14:                         gemma2.block_count u32              = 42
llama_model_loader: - kv  15:                 gemma2.feed_forward_length u32              = 14336
llama_model_loader: - kv  16:                gemma2.attention.head_count u32              = 16
llama_model_loader: - kv  17:             gemma2.attention.head_count_kv u32              = 8
llama_model_loader: - kv  18:    gemma2.attention.layer_norm_rms_epsilon f32              = 0.000001
llama_model_loader: - kv  19:                gemma2.attention.key_length u32              = 256
llama_model_loader: - kv  20:              gemma2.attention.value_length u32              = 256
llama_model_loader: - kv  21:              gemma2.attn_logit_softcapping f32              = 50.000000
llama_model_loader: - kv  22:             gemma2.final_logit_softcapping f32              = 30.000000
llama_model_loader: - kv  23:            gemma2.attention.sliding_window u32              = 4096
llama_model_loader: - kv  24:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  25:                         tokenizer.ggml.pre str              = default
llama_model_loader: - kv  26:                      tokenizer.ggml.tokens arr[str,256000]  = ["<pad>", "<eos>", "<bos>", "<unk>", ...
llama_model_loader: - kv  27:                      tokenizer.ggml.scores arr[f32,256000]  = [-1000.000000, -1000.000000, -1000.00...
llama_model_loader: - kv  28:                  tokenizer.ggml.token_type arr[i32,256000]  = [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, ...
llama_model_loader: - kv  29:                tokenizer.ggml.bos_token_id u32              = 2
llama_model_loader: - kv  30:                tokenizer.ggml.eos_token_id u32              = 1
llama_model_loader: - kv  31:            tokenizer.ggml.unknown_token_id u32              = 3
llama_model_loader: - kv  32:            tokenizer.ggml.padding_token_id u32              = 0
llama_model_loader: - kv  33:               tokenizer.ggml.add_bos_token bool             = true
llama_model_loader: - kv  34:               tokenizer.ggml.add_eos_token bool             = false
llama_model_loader: - kv  35:                    tokenizer.chat_template str              = {{ '<bos>' }}{% if messages[0]['role'...
llama_model_loader: - kv  36:            tokenizer.ggml.add_space_prefix bool             = false
llama_model_loader: - kv  37:               general.quantization_version u32              = 2
llama_model_loader: - kv  38:                          general.file_type u32              = 15
llama_model_loader: - type  f32:  169 tensors
llama_model_loader: - type q8_0:    1 tensors
llama_model_loader: - type q4_K:  252 tensors
llama_model_loader: - type q6_K:   42 tensors
llm_load_vocab: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
llm_load_vocab: special tokens cache size = 217
llm_load_vocab: token to piece cache size = 1.6014 MB
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = gemma2
llm_load_print_meta: vocab type       = SPM
llm_load_print_meta: n_vocab          = 256000
llm_load_print_meta: n_merges         = 0
llm_load_print_meta: vocab_only       = 0
llm_load_print_meta: n_ctx_train      = 8192
llm_load_print_meta: n_embd           = 3584
llm_load_print_meta: n_layer          = 42
llm_load_print_meta: n_head           = 16
llm_load_print_meta: n_head_kv        = 8
llm_load_print_meta: n_rot            = 256
llm_load_print_meta: n_swa            = 4096
llm_load_print_meta: n_embd_head_k    = 256
llm_load_print_meta: n_embd_head_v    = 256
llm_load_print_meta: n_gqa            = 2
llm_load_print_meta: n_embd_k_gqa     = 2048
llm_load_print_meta: n_embd_v_gqa     = 2048
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-06
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 0.0e+00
llm_load_print_meta: n_ff             = 14336
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 2
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn  = 8192
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: ssm_dt_b_c_rms   = 0
llm_load_print_meta: model type       = 9B
llm_load_print_meta: model ftype      = Q4_K - Medium
llm_load_print_meta: model params     = 9.24 B
llm_load_print_meta: model size       = 5.57 GiB (5.17 BPW) 
llm_load_print_meta: general.name     = Gemma 2 Ifable 9B
llm_load_print_meta: BOS token        = 2 '<bos>'
llm_load_print_meta: EOS token        = 1 '<eos>'
llm_load_print_meta: EOT token        = 107 '<end_of_turn>'
llm_load_print_meta: UNK token        = 3 '<unk>'
llm_load_print_meta: PAD token        = 0 '<pad>'
llm_load_print_meta: LF token         = 227 '<0x0A>'
llm_load_print_meta: EOG token        = 1 '<eos>'
llm_load_print_meta: EOG token        = 107 '<end_of_turn>'
llm_load_print_meta: max token length = 48
llm_load_tensors: offloading 42 repeating layers to GPU
llm_load_tensors: offloading output layer to GPU
llm_load_tensors: offloaded 43/43 layers to GPU
llm_load_tensors: CPU_Mapped model buffer size =  5700.31 MiB
.....................................................................................
llama_new_context_with_model: n_ctx      = 4096
llama_new_context_with_model: n_batch    = 2048
llama_new_context_with_model: n_ubatch   = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init:        CPU KV buffer size =  1344.00 MiB
llama_new_context_with_model: KV self size  = 1344.00 MiB, K (f16):  672.00 MiB, V (f16):  672.00 MiB
llama_new_context_with_model:        CPU  output buffer size =     0.98 MiB
llama_new_context_with_model:        CPU compute buffer size =   514.00 MiB
llama_new_context_with_model: graph nodes  = 1690
llama_new_context_with_model: graph splits = 1
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
main: llama threadpool init, n_threads = 2

system_info: n_threads = 2 (n_threads_batch = 2) / 2 | AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | AMX_INT8 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | RISCV_VECT = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | 

sampler seed: 172490367
sampler params: 
	repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
	dry_multiplier = 0.000, dry_base = 1.750, dry_allowed_length = 2, dry_penalty_last_n = -1
	top_k = 40, top_p = 0.950, min_p = 0.050, xtc_probability = 0.000, xtc_threshold = 0.100, typical_p = 1.000, temp = 0.800
	mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampler chain: logits -> logit-bias -> penalties -> dry -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist 
generate: n_ctx = 4096, n_batch = 2048, n_predict = -1, n_keep = 1

User: Hi
Bot:Hi
User: 
Tell me the difference between thinking in humans and in LLMs.

Bot:Here's a breakdown of the key differences between human thinking and how Large Language Models (LLMs) like me "think":

**Human Thinking:**

* **Biological & Subconscious:**  Rooted in complex neural networks in the brain, much of human thought is subconscious, emergent, and influenced by emotions, experiences, and bodily sensations.
* **Intuitive & Creative:** Humans excel at making leaps of logic,  
llama_perf_sampler_print:    sampling time =      36.64 ms /   116 runs   (    0.32 ms per token,  3166.11 tokens per second)
llama_perf_context_print:        load time =   30829.81 ms
llama_perf_context_print: prompt eval time =   14709.99 ms /    29 tokens (  507.24 ms per token,     1.97 tokens per second)
llama_perf_context_print:        eval time =   70962.65 ms /    86 runs   (  825.15 ms per token,     1.21 tokens per second)
llama_perf_context_print:       total time =   85905.09 ms /   115 tokens
Interrupted by user

DarkTyger commented:

# build llama.cpp from master
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release -j$(nproc)
# fetch a small Gemma model and run a plain prompt
wget https://huggingface.co/ggml-org/gemma-1.1-7b-it-Q4_K_M-GGUF/resolve/main/gemma-1.1-7b-it.Q4_K_M.gguf -O gemma-7b-q4.gguf
./build/bin/llama-cli -m gemma-7b-q4.gguf -c 1024 -p "Once upon a time"

Output:

main: llama threadpool init, n_threads = 6
main: chat template is available, enabling conversation mode (disable it with -no-cnv)
main: chat template example:
<start_of_turn>user
You are a helpful assistant

Hello<end_of_turn>
<start_of_turn>model
Hi there<end_of_turn>
<start_of_turn>user
How are you?<end_of_turn>
<start_of_turn>model


system_info: n_threads = 6 (n_threads_batch = 6) / 12 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | LLAMAFILE = 1 | OPENMP = 1 | AARCH64_REPACK = 1 | 

main: interactive mode on.
sampler seed: 3615269160
sampler params: 
	repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
	dry_multiplier = 0.000, dry_base = 1.750, dry_allowed_length = 2, dry_penalty_last_n = 1024
	top_k = 40, top_p = 0.950, min_p = 0.050, xtc_probability = 0.000, xtc_threshold = 0.100, typical_p = 1.000, top_n_sigma = -1.000, temp = 0.800
	mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampler chain: logits -> logit-bias -> penalties -> dry -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist 
generate: n_ctx = 1024, n_batch = 2048, n_predict = -1, n_keep = 1

== Running in interactive mode. ==
 - Press Ctrl+C to interject at any time.
 - Press Return to return control to the AI.
 - To return control without starting a new line, end your input with '/'.
 - If you want to submit another line, end your input with '\'.

>
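
The same hint shows up in this log ("disable it with -no-cnv"). A hedged variant of the command above with conversation mode turned off, which should give the old prompt-completion behavior:

./build/bin/llama-cli -m gemma-7b-q4.gguf -c 1024 -p "Once upon a time" -no-cnv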

Is the documentation outdated?

https://github.com/ggml-org/llama.cpp/blob/master/examples/main/README.md
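
One way to cross-check the README against the actual build (assuming the build steps above succeeded) is to ask the binary itself which conversation-related options it knows:

# list conversation-related flags of the locally built binary
./build/bin/llama-cli --help 2>&1 | grep -i conv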
