Req: Deepseek-coder-33b-instruct Prompt Template #1082

Closed
MSZ-MGS opened this issue Nov 10, 2023 · 9 comments · Fixed by #1083
Comments

@MSZ-MGS
Contributor

MSZ-MGS commented Nov 10, 2023

This seems to be a very promising LLM; please define its prompt template.

(Screenshot attached: Screenshot_20231111_023807_Chrome)

Website for more info:
https://deepseekcoder.github.io/

Link to huggingface:
https://huggingface.co/deepseek-ai/deepseek-coder-33b-instruct
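
For reference, the instruct prompt format described on the model card looks roughly like the sketch below (the system text is paraphrased and the helper name is made up; check the model card for the exact wording and stop token):

# Rough sketch of the DeepSeek-Coder instruct layout; the system text is
# paraphrased and build_deepseek_coder_prompt is a hypothetical helper name.
def build_deepseek_coder_prompt(instruction: str) -> str:
    system = "You are an AI programming assistant."  # paraphrased, not the exact card text
    return f"{system}\n### Instruction:\n{instruction}\n### Response:\n"

print(build_deepseek_coder_prompt("Write a Python function that reverses a string."))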

@pseudotensor
Collaborator

Sure, done. Thanks for the suggestion.

@MSZ-MGS
Contributor Author

MSZ-MGS commented Nov 11, 2023

Sure, done. Thanks for the suggestion.

Thank you @pseudotensor, but I receive the error below (attached text file).
Note: it worked with llama-cpp-python 0.2.14 using the same installation method from the h2oGPT docs.

What do you think? Does it make sense?
CoderError.txt (attached)

@pseudotensor
Collaborator

pseudotensor commented Nov 11, 2023

The HF model works for me.

For GGUF you have:

ERROR: byte not found in vocab: '
'
Windows fatal exception: access violation

This sounds like a bug in llama.cpp or llama_cpp_python in handling the file.

For me, when I run:

python generate.py --base_model=TheBloke/deepseek-coder-33B-instruct-GGUF --max_seq_len=4096 --max_new_tokens=2048 --prompt_type=deepseek_coder

and I get the same thing:

ERROR: byte not found in vocab: '
'
Fatal Python error: Segmentation fault

Current thread 0x00007f83baa29740 (most recent call first):
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/llama_cpp_cuda/llama_cpp.py", line 498 in llama_load_model_from_file
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/llama_cpp_cuda/llama.py", line 357 in __init__
  File "/home/jon/h2ogpt/src/gpt4all_llm.py", line 363 in validate_environment
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/langchain/load/serializable.py", line 97 in __init__
  File "/home/jon/h2ogpt/src/gpt4all_llm.py", line 175 in get_llm_gpt4all
  File "/home/jon/h2ogpt/src/gpt4all_llm.py", line 27 in get_model_tokenizer_gpt4all
  File "/home/jon/h2ogpt/src/gen.py", line 2043 in get_model
  File "/home/jon/h2ogpt/src/gen.py", line 1809 in get_model_retry
  File "/home/jon/h2ogpt/src/gen.py", line 1487 in main
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/fire/core.py", line 691 in _CallAndUpdateTrace
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/fire/core.py", line 475 in _Fire
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/fire/core.py", line 141 in Fire
  File "/home/jon/h2ogpt/src/utils.py", line 65 in H2O_Fire
  File "/home/jon/h2ogpt/generate.py", line 12 in entrypoint_main
  File "/home/jon/h2ogpt/generate.py", line 16 in <module>

So they probably fixed something.

But this issue is still open: abetlen/llama-cpp-python#840
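
To isolate whether the crash is in llama-cpp-python itself rather than in h2oGPT, a minimal load outside h2oGPT could look like the following sketch (the local GGUF path is hypothetical and assumes the file was downloaded from TheBloke's repo):

# Minimal sketch: load the GGUF directly with llama-cpp-python, outside h2oGPT.
# The model path is a hypothetical local file name.
from llama_cpp import Llama

llm = Llama(
    model_path="deepseek-coder-33b-instruct.Q4_K_M.gguf",  # hypothetical local path
    n_ctx=4096,
)
out = llm("### Instruction:\nWrite hello world in Python.\n### Response:\n",
          max_tokens=64)
print(out["choices"][0]["text"])

If this segfaults in the same way, the bug is below h2oGPT, in the llama.cpp / llama-cpp-python stack.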

@pseudotensor
Collaborator

I tried the latest wheel from jllllll for the same version, and also the latest 0.2.17, and it still fails in the same way. Which exact link did you use?

@MSZ-MGS
Contributor Author

MSZ-MGS commented Nov 11, 2023

I tried the latest wheel from jllllll for the same version, and also the latest 0.2.17, and it still fails in the same way. Which exact link did you use?

I compiled it using your commands, only changing the llama-cpp-python version:

pip uninstall -y llama-cpp-python
set LLAMA_CUBLAS=1
set CMAKE_ARGS=-DLLAMA_CUBLAS=on
set FORCE_CMAKE=1
pip install llama-cpp-python==0.2.14 --no-cache-dir --verbose
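
A quick way to confirm which version actually got picked up after the reinstall (a sketch):

# Check the installed llama-cpp-python version from Python.
import llama_cpp
print(llama_cpp.__version__)  # expected: 0.2.14 after the reinstall above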

@pseudotensor
Collaborator

pseudotensor commented Nov 11, 2023

Yes with https://github.com/jllllll/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda-0.2.14+cu118-cp310-cp310-manylinux_2_31_x86_64.whl

For whatever reason, that one doesn't fail in the same way. Instead I get an OOM:

CUDA error 2 at /home/runner/work/llama-cpp-python-cuBLAS-wheels/llama-cpp-python-cuBLAS-wheels/vendor/llama.cpp/ggml-cuda.cu:7624: out of memory
current device: 0

With 0.2.14 from jllllll, running:

CUDA_VISIBLE_DEVICES=0 python generate.py --base_model=TheBloke/deepseek-coder-6.7B-instruct-GGUF --max_seq_len=4096 --max_new_tokens=2048 --prompt_type=deepseek_coder

gives:

(screenshot of the working output attached)

So the OOM is expected on my 24GB board for the 33B model, I guess, but the vocab error is odd, and 0.2.14 from jllllll fixes it, or maybe recompiling it yourself fixes it.

I actually don't expect it's jllllll's fault, since 0.2.14 worked for me. I'm guessing the llama_cpp_python or llama.cpp teams are not stable in their code changes. I suspect jllllll is using the same build commands all the time.
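
As an aside, for the OOM on a single 24GB card with the 33B GGUF, the underlying llama-cpp-python knob for partial offload is n_gpu_layers; a sketch follows (layer count and path are illustrative, not tuned, and this is independent of h2oGPT's own flags):

# Sketch: offload only part of the 33B model to the GPU, keep the rest on CPU.
from llama_cpp import Llama

llm = Llama(
    model_path="deepseek-coder-33b-instruct.Q4_K_M.gguf",  # hypothetical local path
    n_ctx=4096,
    n_gpu_layers=40,  # illustrative; fewer layers than the full model
)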

@MSZ-MGS
Contributor Author

MSZ-MGS commented Nov 14, 2023

@pseudotensor llama-cpp-python 0.2.18 is working fine. Anything between that and 0.2.14 is not working.
Not sure if going to 0.2.18 will add any value; this is just for your information.

@pseudotensor
Collaborator

Thanks. 0.2.14 was messed up too; the responses were all wrong for GGUF models. 0.2.18 is back to normal, thanks.

@pseudotensor
Collaborator

0.2.18 is bad unless one builds it directly.

abetlen/llama-cpp-python#912
