diff --git a/examples/code_generation/codegen/README.md b/examples/code_generation/codegen/README.md
index ee842be55e3a..cf6497caa750 100644
--- a/examples/code_generation/codegen/README.md
+++ b/examples/code_generation/codegen/README.md
@@ -106,7 +106,7 @@ python codegen_server.py
 ##### Configuration parameters
 Configure the following parameters in codegen_server.py:
-- `model_name_or_path`: model name, defaults to "Salesforce/codegen-2B-mono"
+- `model_name_or_path`: model name, defaults to "Salesforce/codegen-350M-mono"
 - `device`: device to run on, defaults to "gpu"
 - `temperature`: decoding parameter temperature, defaults to 0.5
 - `top_k`: decoding parameter top_k, defaults to 10
@@ -114,7 +114,7 @@
 - `repetition_penalty`: repetition penalty for decoding, defaults to 1.0
 - `min_length`: minimum generated length, defaults to 0
 - `max_length`: maximum generated length, defaults to 16
-- `decode_strategy`: decoding strategy, defaults to "sampling"
+- `decode_strategy`: decoding strategy, defaults to "greedy_search"
 - `load_state_as_np`: load model weights as numpy arrays to save GPU memory, defaults to True
 - `use_faster`: whether to use FasterGeneration to speed up inference, defaults to True
 - `use_fp16_decoding`: whether to decode in fp16 to save GPU memory and speed up inference, defaults to True
@@ -165,7 +165,16 @@ print(result)
 - If you use FasterGeneration, set `use_faster=True` in [codegen_server.py](#configuration-parameters); the first inference run involves compilation and takes some time. FasterGeneration's environment requirements are described [here](https://github.com/PaddlePaddle/PaddleNLP/blob/develop/paddlenlp/ops/README.md#%E4%BD%BF%E7%94%A8%E7%8E%AF%E5%A2%83%E8%AF%B4%E6%98%8E).
 - To use a model you trained yourself, set `model_name_or_path` in [codegen_server.py](#configuration-parameters) to the local model path.
 - To reach the server from another machine, replace `127.0.0.1` above with the server's public IP.
-
+- If the message and error below appear, FasterGeneration failed to start and the cause needs to be investigated. Alternatively, set `use_faster=False` to run without FasterGeneration acceleration, at the cost of slower inference.
+```shell
+  FasterGeneration is not available, and the original version would be used instead.
+```
+```shell
+  RuntimeError: (NotFound) There are no kernels which are registered in the unsqueeze2 operator.
+  [Hint: Expected kernels_iter != all_op_kernels.end(), but received kernels_iter == all_op_kernels.end().] (at /home/Paddle/paddle/fluid/imperative/prepared_operator.cc:341)
+  [operator < unsqueeze2 > error]
+```
+- This code also works with the [fauxpilot](https://marketplace.visualstudio.com/items?itemName=Venthe.fauxpilot) plugin; thanks to [@linonetwo](https://github.com/linonetwo) for testing. In `settings.json`, configure "fauxpilot.server": "http://<server IP>:8978/v1/engines".

 ## Training customization
@@ -307,3 +316,4 @@ hello_world()
 ## References
 - Nijkamp, Erik, et al. "A conversational paradigm for program synthesis." arXiv preprint arXiv:2203.13474 (2022).
 - [https://github.com/features/copilot/](https://github.com/features/copilot/)
+- [https://github.com/AndPuQing/Papilot](https://github.com/AndPuQing/Papilot)
diff --git a/examples/code_generation/codegen/requirements.txt b/examples/code_generation/codegen/requirements.txt
index 37e5ae958c12..ae00f4799fa1 100644
--- a/examples/code_generation/codegen/requirements.txt
+++ b/examples/code_generation/codegen/requirements.txt
@@ -3,4 +3,5 @@ pydantic==1.9.1
 python-dotenv==0.20.0
 sse_starlette==0.10.3
 uvicorn==0.17.6
-openai==0.8.0
\ No newline at end of file
+openai==0.8.0
+regex==2022.6.2
\ No newline at end of file
diff --git a/faster_generation/README.md b/faster_generation/README.md
index 58cb2a6f4f99..dc156550f72e 100644
--- a/faster_generation/README.md
+++ b/faster_generation/README.md
@@ -43,25 +43,25 @@ FasterGeneration's high-performance decoding is clearly faster than the original generate method, and
 - torch version 1.10.0+cu113
 - transformers version 4.12.5
-**BART** (bart-base, batch_size=4, max_length=32)
+### **BART** (bart-base, batch_size=4, max_length=32)

-**GPT** (gpt2, batch_size=4, max_length=32)
+### **GPT** (gpt2, batch_size=4, max_length=32)

-**OPT** (opt, batch_size=4, max_length=32)
+### **OPT** (opt, batch_size=4, max_length=32)

-**CodeGen:**
+### **CodeGen:**
 * Environment and hyperparameters
   - Platform: Tesla V100-SXM2-32GB
   - CUDA 10.1
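Taken together, the README diff above changes two codegen_server.py defaults: the model drops from codegen-2B-mono to codegen-350M-mono, and the decode strategy moves from "sampling" to "greedy_search". As a quick reference, here is a minimal sketch of those defaults as a config object. The `DefaultConfig` name and class layout are illustrative assumptions; only the parameter names and default values come from the diff (the `top_p` line is elided there, so it is omitted here too).

```python
# Hypothetical summary of the codegen_server.py defaults listed in the README
# diff above. The class name and layout are assumptions for illustration only.
class DefaultConfig:
    model_name_or_path = "Salesforce/codegen-350M-mono"  # new default model
    device = "gpu"
    temperature = 0.5
    top_k = 10
    repetition_penalty = 1.0
    min_length = 0
    max_length = 16
    decode_strategy = "greedy_search"  # new default strategy
    load_state_as_np = True   # load weights as numpy arrays to save GPU memory
    use_faster = True         # enable FasterGeneration
    use_fp16_decoding = True  # decode in fp16 to save memory and time
```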
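The requirements.txt pin on openai==0.8.0 and the fauxpilot setting "fauxpilot.server": "http://<server IP>:8978/v1/engines" suggest the server exposes an OpenAI-compatible completions endpoint on port 8978. Below is a minimal client sketch under that assumption; the engine id "codegen" and the placeholder API key are guesses, not confirmed by the diff.

```python
import openai  # openai==0.8.0, as pinned in requirements.txt

# Point the client at the local codegen server instead of api.openai.com.
openai.api_key = "dummy"  # assumed: the local server ignores the key
openai.api_base = "http://127.0.0.1:8978/v1"

# Requests go to {api_base}/engines/{engine}/completions, matching the
# "/v1/engines" path in the fauxpilot note; the engine id is hypothetical.
result = openai.Completion.create(
    engine="codegen",
    prompt="def hello_world():",
    max_tokens=16,
)
print(result)
```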
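For the faster_generation benchmark section that closes the diff, the CodeGen numbers come from FasterGeneration-accelerated decoding. Below is a sketch of how such a call might look; the model name, the (ids, scores) return tuple, and the generate() keywords (`use_faster`, `use_fp16_decoding`, `decode_strategy`) are assumptions pieced together from the parameters named elsewhere in this diff, not the actual benchmark script.

```python
import paddle
from paddlenlp.transformers import CodeGenForCausalLM, CodeGenTokenizer

# Assumed benchmark-style setup; model name and keywords are illustrative.
tokenizer = CodeGenTokenizer.from_pretrained("Salesforce/codegen-350M-mono")
model = CodeGenForCausalLM.from_pretrained("Salesforce/codegen-350M-mono")
model.eval()

features = tokenizer(["def hello_world():"])
input_ids = paddle.to_tensor(features["input_ids"])

# FasterGeneration kernels are used when use_faster=True; the first call
# triggers compilation, so it is slower than subsequent ones.
output_ids, _ = model.generate(
    input_ids=input_ids,
    max_length=32,
    decode_strategy="greedy_search",
    use_faster=True,
    use_fp16_decoding=True,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```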