
terminate running deepseek models with gbnf grammars #4206

Closed · 54rt1n opened this issue Nov 24, 2023 · 5 comments

Comments


54rt1n commented Nov 24, 2023

Prerequisites

Running build b1557.

Expected Behavior

The model should generate output as normal, constrained by the grammar file. This appears to affect only DeepSeek models; Llama variants and Yi run fine.

Current Behavior

terminate called after throwing an instance of 'std::out_of_range'
what(): _Map_base::at

Environment and Context

AMD Ryzen 7 3700X 8-Core Processor
0a:00.0 VGA compatible controller: NVIDIA Corporation GA104 [GeForce RTX 3070] (rev a1)
Linux 6.2.0-36-generic #37~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC
NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2

Failure Information (for bugs)


Steps to Reproduce


The failing command:

./main -n -1 -c 8192 -ngl 0 --repeat_penalty 1.2 --color -i --mirostat 2 -m ../llama/gguf/deepseek-coder-6.7b-instruct.Q8_0.gguf --grammar-file grammar/any_text.gbnf --prompt Test

any_text.gbnf:

root ::= ([^\n]+ "\n")+
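For reference, this grammar accepts one or more non-empty lines, each terminated by a newline. A small sketch checking the same language with `std::regex` (the function name is illustrative, not part of llama.cpp):

```cpp
#include <regex>
#include <string>

// True when the string is one or more non-empty lines, each ending in '\n' --
// the same language as the GBNF rule: root ::= ([^\n]+ "\n")+
bool matches_any_text(const std::string &s) {
    static const std::regex any_text("([^\\n]+\\n)+");
    return std::regex_match(s, any_text); // whole-string match
}
```

So the grammar itself is about as permissive as they come, which suggests the crash is in grammar-to-token handling rather than in this particular rule.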

Failure Logs

The error happens immediately:

...
llm_load_print_meta: n_yarn_orig_ctx  = 16384
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: model type       = 7B
llm_load_print_meta: model ftype      = mostly Q8_0
llm_load_print_meta: model params     = 6.74 B
llm_load_print_meta: model size       = 6.67 GiB (8.50 BPW)
llm_load_print_meta: general.name   = deepseek-ai_deepseek-coder-6.7b-instruct
llm_load_print_meta: BOS token = 32013 '<|begin▁of▁sentence|>'
llm_load_print_meta: EOS token = 32021 '<|EOT|>'
llm_load_print_meta: PAD token = 32014 '<|end▁of▁sentence|>'
llm_load_print_meta: LF token  = 126 'Ä'
llm_load_tensors: ggml ctx size =    0.11 MiB
llm_load_tensors: using CUDA for GPU acceleration
llm_load_tensors: mem required  = 6830.87 MiB
llm_load_tensors: offloading 0 repeating layers to GPU
llm_load_tensors: offloaded 0/35 layers to GPU
llm_load_tensors: VRAM used: 0.00 MiB
...................................................................................................
llama_new_context_with_model: n_ctx      = 8192
llama_new_context_with_model: freq_base  = 100000.0
llama_new_context_with_model: freq_scale = 0.25
llama_new_context_with_model: kv self size  = 4096.00 MiB
llama_build_graph: non-view tensors processed: 740/740
llama_new_context_with_model: compute buffer total size = 555.07 MiB
llama_new_context_with_model: VRAM scratch buffer: 552.00 MiB
llama_new_context_with_model: total VRAM used: 552.00 MiB (model: 0.00 MiB, context: 552.00 MiB)

system_info: n_threads = 8 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 |
main: interactive mode on.
sampling:
        repeat_last_n = 64, repeat_penalty = 1.200, frequency_penalty = 0.000, presence_penalty = 0.000
        top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.800
        mirostat = 2, mirostat_lr = 0.100, mirostat_ent = 5.000
generate: n_ctx = 8192, n_batch = 512, n_predict = -1, n_keep = 0


== Running in interactive mode. ==
 - Press Ctrl+C to interject at any time.
 - Press Return to return control to LLaMa.
 - To return control without starting a new line, end your input with '/'.
 - If you want to submit another line, end your input with '\'.

Testterminate called after throwing an instance of 'std::out_of_range'
  what():  _Map_base::at
shroominic commented

I experienced the same issue.

maziyarpanahi commented

Me too; I cannot seem to convert and quantize https://huggingface.co/deepseek-ai/deepseek-math-7b-instruct

Tried on the latest main branch and it still fails with:

terminate called after throwing an instance of 'std::out_of_range'
  what():  _Map_base::at
Aborted (core dumped)


BattlehubCode commented Mar 3, 2024

Same here, on the latest main branch: main crashes after the first user prompt. I use:

./main -m "deepseek-coder-6.7b-instruct.Q5_K_S.gguf" --grammar-file "grammars/c.gbnf" --prompt "You are an AI programming assistant, utilizing the DeepSeek Coder model, and you only answer questions related to computer science.\n" --in-prefix "### Instruction:\n" --in-suffix "### Response:\n" -r "<|EOT|>\n" -i --interactive-first

ggerganov (Member) commented

Deepseek models are not supported at this time. See #5464


This issue was closed because it has been inactive for 14 days since being marked as stale.

5 participants