
terminate running deepseek models with gbnf grammars #4206

Closed · 54rt1n opened this issue Nov 24, 2023 · 5 comments

Comments


54rt1n commented Nov 24, 2023

Prerequisites

Running build b1557.

Expected Behavior

The model should generate output as normal, constrained by the grammar file. This appears to affect only DeepSeek models; Llama variants and Yi run fine.

Current Behavior

terminate called after throwing an instance of 'std::out_of_range'
what(): _Map_base::at

Environment and Context

AMD Ryzen 7 3700X 8-Core Processor
0a:00.0 VGA compatible controller: NVIDIA Corporation GA104 [GeForce RTX 3070] (rev a1)
Linux 6.2.0-36-generic #37~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC
NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2

Failure Information (for bugs)


Steps to Reproduce


The failing command:

./main -n -1 -c 8192 -ngl 0 --repeat_penalty 1.2 --color -i --mirostat 2 -m ../llama/gguf/deepseek-coder-6.7b-instruct.Q8_0.gguf --grammar-file grammar/any_text.gbnf --prompt Test

any_text.gbnf:

root ::= ([^\n]+ "\n")+
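For reference, this grammar accepts one or more non-empty lines, each terminated by a newline. A small sketch checking the same language with `std::regex` (the function name is illustrative, not part of llama.cpp):

```cpp
#include <regex>
#include <string>

// True when the string is one or more non-empty lines, each ending in '\n' --
// the same language as the GBNF rule: root ::= ([^\n]+ "\n")+
bool matches_any_text(const std::string &s) {
    static const std::regex any_text("([^\\n]+\\n)+");
    return std::regex_match(s, any_text); // whole-string match
}
```

So the grammar itself is about as permissive as they come, which suggests the crash is in grammar-to-token handling rather than in this particular rule.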

Failure Logs

The error happens immediately:

...
llm_load_print_meta: n_yarn_orig_ctx  = 16384
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: model type       = 7B
llm_load_print_meta: model ftype      = mostly Q8_0
llm_load_print_meta: model params     = 6.74 B
llm_load_print_meta: model size       = 6.67 GiB (8.50 BPW)
llm_load_print_meta: general.name   = deepseek-ai_deepseek-coder-6.7b-instruct
llm_load_print_meta: BOS token = 32013 '<|begin▁of▁sentence|>'
llm_load_print_meta: EOS token = 32021 '<|EOT|>'
llm_load_print_meta: PAD token = 32014 '<|end▁of▁sentence|>'
llm_load_print_meta: LF token  = 126 'Ä'
llm_load_tensors: ggml ctx size =    0.11 MiB
llm_load_tensors: using CUDA for GPU acceleration
llm_load_tensors: mem required  = 6830.87 MiB
llm_load_tensors: offloading 0 repeating layers to GPU
llm_load_tensors: offloaded 0/35 layers to GPU
llm_load_tensors: VRAM used: 0.00 MiB
...................................................................................................
llama_new_context_with_model: n_ctx      = 8192
llama_new_context_with_model: freq_base  = 100000.0
llama_new_context_with_model: freq_scale = 0.25
llama_new_context_with_model: kv self size  = 4096.00 MiB
llama_build_graph: non-view tensors processed: 740/740
llama_new_context_with_model: compute buffer total size = 555.07 MiB
llama_new_context_with_model: VRAM scratch buffer: 552.00 MiB
llama_new_context_with_model: total VRAM used: 552.00 MiB (model: 0.00 MiB, context: 552.00 MiB)

system_info: n_threads = 8 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 |
main: interactive mode on.
sampling:
        repeat_last_n = 64, repeat_penalty = 1.200, frequency_penalty = 0.000, presence_penalty = 0.000
        top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.800
        mirostat = 2, mirostat_lr = 0.100, mirostat_ent = 5.000
generate: n_ctx = 8192, n_batch = 512, n_predict = -1, n_keep = 0


== Running in interactive mode. ==
 - Press Ctrl+C to interject at any time.
 - Press Return to return control to LLaMa.
 - To return control without starting a new line, end your input with '/'.
 - If you want to submit another line, end your input with '\'.

Testterminate called after throwing an instance of 'std::out_of_range'
  what():  _Map_base::at
shroominic commented

I experienced the same issue.

maziyarpanahi commented

Me too; I cannot seem to convert and quantize https://huggingface.co/deepseek-ai/deepseek-math-7b-instruct

Tried on the latest main branch and it still fails with:

terminate called after throwing an instance of 'std::out_of_range'
  what():  _Map_base::at
Aborted (core dumped)


BattlehubCode commented Mar 3, 2024

Same here, on the latest main branch: main crashes after the first user prompt. I use:

./main -m "deepseek-coder-6.7b-instruct.Q5_K_S.gguf" --grammar-file "grammars/c.gbnf" --prompt "You are an AI programming assistant, utilizing the DeepSeek Coder model, and you only answer questions related to computer science.\n" --in-prefix "### Instruction:\n" --in-suffix "### Response:\n" -r "<|EOT|>\n" -i --interactive-first

ggerganov (Member) commented

Deepseek models are not supported at this time. See #5464


This issue was closed because it has been inactive for 14 days since being marked as stale.

5 participants