
Commit 61edffe

Author: will.yang (committed)

release v1.1.1

1 parent 71773f0, commit 61edffe

15 files changed, +229 −104 lines

README.md (+66 −11)

```diff
@@ -1,4 +1,5 @@
 # Description
+
 RKLLM software stack can help users to quickly deploy AI models to Rockchip chips. The overall framework is as follows:
 <center class="half">
 <div style="background-color:#ffffff;">
@@ -14,33 +15,87 @@
 - RKNPU kernel driver is responsible for interacting with NPU hardware. It has been open source and can be found in the Rockchip kernel code.

 # Support Platform
-- RK3588 Series
-- RK3576 Series
+
+- RK3588 Series
+- RK3576 Series

 # Support Models
-- [X] [LLAMA models](https://huggingface.co/meta-llama)
-- [X] [TinyLLAMA models](https://huggingface.co/TinyLlama)
-- [X] [Qwen models](https://huggingface.co/models?search=Qwen/Qwen)
-- [X] [Phi models](https://huggingface.co/models?search=microsoft/phi)
-- [X] [ChatGLM3-6B](https://huggingface.co/THUDM/chatglm3-6b/tree/103caa40027ebfd8450289ca2f278eac4ff26405)
-- [X] [Gemma models](https://huggingface.co/collections/google/gemma-2-release-667d6600fd5220e7b967f315)
-- [X] [InternLM2 models](https://huggingface.co/collections/internlm/internlm2-65b0ce04970888799707893c)
-- [X] [MiniCPM models](https://huggingface.co/collections/openbmb/minicpm-65d48bf958302b9fd25b698f)
+
+- [x] [LLAMA models](https://huggingface.co/meta-llama)
+- [x] [TinyLLAMA models](https://huggingface.co/TinyLlama)
+- [x] [Qwen models](https://huggingface.co/models?search=Qwen/Qwen)
+- [x] [Phi models](https://huggingface.co/models?search=microsoft/phi)
+- [x] [ChatGLM3-6B](https://huggingface.co/THUDM/chatglm3-6b/tree/103caa40027ebfd8450289ca2f278eac4ff26405)
+- [x] [Gemma models](https://huggingface.co/collections/google/gemma-2-release-667d6600fd5220e7b967f315)
+- [x] [InternLM2 models](https://huggingface.co/collections/internlm/internlm2-65b0ce04970888799707893c)
+- [x] [MiniCPM models](https://huggingface.co/collections/openbmb/minicpm-65d48bf958302b9fd25b698f)
+
+# Model Performance Benchmark
+
+| model          | dtype      | seqlen | max_context | new_tokens | TTFT(ms) | Tokens/s | memory(GB) | platform |
+|:---------------|:-----------|:------:|:-----------:|:----------:|:--------:|:--------:|:----------:|:--------:|
+| TinyLLAMA-1.1B | w4a16      | 64     | 320         | 256        | 345.00   | 21.10    | 0.77       | RK3576   |
+|                | w4a16_g128 | 64     | 320         | 256        | 410.00   | 18.50    | 0.80       | RK3576   |
+|                | w8a8       | 64     | 320         | 256        | 140.46   | 24.21    | 1.25       | RK3588   |
+|                | w8a8_g512  | 64     | 320         | 256        | 195.00   | 20.08    | 1.29       | RK3588   |
+| Qwen2-1.5B     | w4a16      | 64     | 320         | 256        | 512.00   | 14.40    | 1.75       | RK3576   |
+|                | w4a16_g128 | 64     | 320         | 256        | 550.00   | 12.75    | 1.76       | RK3576   |
+|                | w8a8       | 64     | 320         | 256        | 206.00   | 16.46    | 2.47       | RK3588   |
+|                | w8a8_g128  | 64     | 320         | 256        | 725.00   | 7.00     | 2.65       | RK3588   |
+| Phi-3-3.8B     | w4a16      | 64     | 320         | 256        | 975.00   | 6.60     | 2.16       | RK3576   |
+|                | w4a16_g128 | 64     | 320         | 256        | 1180.00  | 5.85     | 2.23       | RK3576   |
+|                | w8a8       | 64     | 320         | 256        | 516.00   | 7.44     | 3.88       | RK3588   |
+|                | w8a8_g512  | 64     | 320         | 256        | 610.00   | 6.13     | 3.95       | RK3588   |
+| ChatGLM3-6B    | w4a16      | 64     | 320         | 256        | 1168.00  | 4.62     | 3.86       | RK3576   |
+|                | w4a16_g128 | 64     | 320         | 256        | 1582.56  | 3.82     | 3.96       | RK3576   |
+|                | w8a8       | 64     | 320         | 256        | 800.00   | 4.95     | 6.69       | RK3588   |
+|                | w8a8_g128  | 64     | 320         | 256        | 2190.00  | 2.70     | 7.18       | RK3588   |
+| Gemma2-2B      | w4a16      | 64     | 320         | 256        | 628.00   | 8.00     | 3.63       | RK3576   |
+|                | w4a16_g128 | 64     | 320         | 256        | 776.20   | 7.40     | 3.63       | RK3576   |
+|                | w8a8       | 64     | 320         | 256        | 342.29   | 9.67     | 4.84       | RK3588   |
+|                | w8a8_g128  | 64     | 320         | 256        | 1055.00  | 5.49     | 5.14       | RK3588   |
+| InternLM2-1.8B | w4a16      | 64     | 320         | 256        | 475.00   | 13.30    | 1.59       | RK3576   |
+|                | w4a16_g128 | 64     | 320         | 256        | 572.00   | 11.95    | 1.62       | RK3576   |
+|                | w8a8       | 64     | 320         | 256        | 205.97   | 15.66    | 2.38       | RK3588   |
+|                | w8a8_g512  | 64     | 320         | 256        | 298.00   | 12.66    | 2.45       | RK3588   |
+| MiniCPM3-4B    | w4a16      | 64     | 320         | 256        | 1397.00  | 4.80     | 2.70       | RK3576   |
+|                | w4a16_g128 | 64     | 320         | 256        | 1645.00  | 4.39     | 2.80       | RK3576   |
+|                | w8a8       | 64     | 320         | 256        | 702.18   | 6.15     | 4.65       | RK3588   |
+|                | w8a8_g128  | 64     | 320         | 256        | 1691.00  | 3.42     | 5.06       | RK3588   |
+| llama3-8B      | w4a16      | 64     | 320         | 256        | 1607.98  | 3.60     | 5.63       | RK3576   |
+|                | w4a16_g128 | 64     | 320         | 256        | 2010.00  | 3.00     | 5.76       | RK3576   |
+|                | w8a8       | 64     | 320         | 256        | 1128.00  | 3.79     | 9.21       | RK3588   |
+|                | w8a8_g512  | 64     | 320         | 256        | 1281.35  | 3.05     | 9.45       | RK3588   |
+
+- This performance data was collected at the maximum CPU and NPU frequencies of each platform, using version 1.1.0.
+- The script for setting the frequencies is located in the scripts directory.

 # Download
+
 You can download the latest package, docker image, example, documentation, and platform-tool from [RKLLM_SDK](https://console.zbox.filez.com/l/RJJDmB), fetch code: rkllm

 # Note

-The modifications in version 1.1.0 are significant, making it incompatible with older version models. Please use the latest toolchain for model conversion and inference.
+- The modifications in version 1.1 are significant, making it incompatible with models converted by older versions. Please use the latest toolchain for model conversion and inference.
+- The supported Python versions are:
+  - Python 3.8
+  - Python 3.10
+- Latest version: [v1.1.1](https://github.com/airockchip/rknn-llm/releases/tag/release-v1.1.1)

 # RKNN Toolkit2
+
 If you want to deploy an additional AI model, we have introduced an SDK called RKNN-Toolkit2. For details, please refer to:

 https://github.com/airockchip/rknn-toolkit2

 # CHANGELOG
+
 ## v1.1.0
+
 - Support group-wise quantization (w4a16 group sizes of 32/64/128, w8a8 group sizes of 128/256/512).
 - Support joint inference with LoRA model loading.
 - Support storage and preloading of prompt cache.
```
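The dtype values in the benchmark table (w4a16, w4a16_g128, w8a8, w8a8_g512, and so on) are chosen at model-conversion time with the RKLLM-Toolkit. As a rough illustration only, a conversion flow might look like the sketch below; the class and method names (`RKLLM`, `load_huggingface`, `build`, `export_rkllm`) and parameter spellings are recalled from the SDK documentation and may differ between toolkit versions, so treat this as an outline and verify against the documentation shipped in the RKLLM_SDK package.

```python
# Hedged sketch of a model-conversion flow with RKLLM-Toolkit.
# Parameter names below are assumptions based on the SDK docs; verify
# against the documentation from the download link before relying on them.
from rkllm.api import RKLLM

llm = RKLLM()

# Load a Hugging Face model from a local directory.
ret = llm.load_huggingface(model='./Qwen2-1.5B')
assert ret == 0, 'model load failed'

# Build with group-wise quantization: 'w4a16_g128' means 4-bit weights,
# 16-bit activations, quantization group size 128 (cf. the benchmark table).
ret = llm.build(do_quantization=True,
                quantized_dtype='w4a16_g128',
                target_platform='rk3576')
assert ret == 0, 'build failed'

# Export the converted model for deployment on the board.
ret = llm.export_rkllm('./qwen2-1.5b-w4a16_g128.rkllm')
assert ret == 0, 'export failed'
```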

doc/Rockchip_RKLLM_SDK_CN_1.1.0.pdf (mode 100755 → 100644, −1.3 KB; binary file not shown)

doc/Rockchip_RKLLM_SDK_EN_1.1.0.pdf (mode 100755 → 100644, −1.3 KB; binary file not shown)

rkllm-runtime/examples/rkllm_server_demo/README.md (+4 −4)

````diff
@@ -8,8 +8,8 @@ Before running the demo, you need to prepare the following files:
 ### Build
 You can run the demo with a single command:
 ```bash
-# Usage: ./build_rkllm_server_flask.sh --workshop [RKLLM-Server Working Path] --model_path [Absolute Path of Converted RKLLM Model on Board] --platform [Target Platform: rk3588/rk3576] --npu_num [NPU Core Count] [--lora_model_path [Lora Model Path]] [--prompt_cache_path [Prompt Cache File Path]]
-./build_rkllm_server_flask.sh --workshop /user/data --model_path /user/data/model.rkllm --platform rk3588 --npu_num 3
+# Usage: ./build_rkllm_server_flask.sh --workshop [RKLLM-Server Working Path] --model_path [Absolute Path of Converted RKLLM Model on Board] --platform [Target Platform: rk3588/rk3576] [--lora_model_path [Lora Model Path]] [--prompt_cache_path [Prompt Cache File Path]]
+./build_rkllm_server_flask.sh --workshop /user/data --model_path /user/data/model.rkllm --platform rk3588
 ```
 ### Access with API
 After building the RKLLM-Server-Flask, you can use 'chat_api_flask.py' to access the RKLLM-Server-Flask and get the answers of RKLLM models.
@@ -20,8 +20,8 @@ Attention: you should check the IP address of the board with the 'ifconfig' command
 ### Build
 You can run the demo with a single command:
 ```bash
-# Usage: ./build_rkllm_server_gradio.sh --workshop [RKLLM-Server Working Path] --model_path [Absolute Path of Converted RKLLM Model on Board] --platform [Target Platform: rk3588/rk3576] --npu_num [NPU Core Count] [--lora_model_path [Lora Model Path]] [--prompt_cache_path [Prompt Cache File Path]]
-./build_rkllm_server_gradio.sh --workshop /user/data --model_path /user/data/model.rkllm --platform rk3588 --npu_num 3
+# Usage: ./build_rkllm_server_gradio.sh --workshop [RKLLM-Server Working Path] --model_path [Absolute Path of Converted RKLLM Model on Board] --platform [Target Platform: rk3588/rk3576] [--lora_model_path [Lora Model Path]] [--prompt_cache_path [Prompt Cache File Path]]
+./build_rkllm_server_gradio.sh --workshop /user/data --model_path /user/data/model.rkllm --platform rk3588
 ```
 ### Access the Server
 After running the demo, you can access the RKLLM-Server-Gradio in two ways:
````
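The reference client for the Flask server is chat_api_flask.py from the SDK. For a minimal standalone probe, something like the following sketch should work against a server of this kind; note that the port, route, and JSON payload layout here are assumptions, so check flask_server.py and chat_api_flask.py for the real values.

```python
# Minimal hedged client sketch for RKLLM-Server-Flask.
# The URL path, port, and payload layout below are assumptions; the
# authoritative client is chat_api_flask.py shipped with the SDK.
import requests

BOARD_IP = '192.168.1.100'  # replace with the board IP reported by 'ifconfig'
URL = f'http://{BOARD_IP}:8080/rkllm_chat'  # port and route assumed; verify in flask_server.py

payload = {
    'model': 'rkllm',
    'messages': [{'role': 'user', 'content': 'Hello, who are you?'}],
    'stream': False,
}

resp = requests.post(URL, json=payload, timeout=300)
resp.raise_for_status()
print(resp.json())
```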

rkllm-runtime/examples/rkllm_server_demo/build_rkllm_server_flask.sh (+4 −8)

```diff
@@ -3,16 +3,16 @@
 #*****************************************************************************************#
 # This script is an automated setup script for the RKLLM-Server-Flask service.
 # Users can run this script to automate the deployment of the RKLLM-Server-Flask service on a Linux board.
-# Usage: ./build_rkllm_server_flask.sh --workshop [RKLLM-Server Working Path] --model_path [Absolute Path of Converted RKLLM Model on Board] --platform [Target Platform: rk3588/rk3576] --npu_num [NPU Core Count] [--lora_model_path [Lora Model Path]] [--prompt_cache_path [Prompt Cache File Path]]
-# example: ./build_rkllm_server_flask.sh --workshop /user/data --model_path /user/data/model.rkllm --platform rk3588 --npu_num 3
+# Usage: ./build_rkllm_server_flask.sh --workshop [RKLLM-Server Working Path] --model_path [Absolute Path of Converted RKLLM Model on Board] --platform [Target Platform: rk3588/rk3576] [--lora_model_path [Lora Model Path]] [--prompt_cache_path [Prompt Cache File Path]]
+# example: ./build_rkllm_server_flask.sh --workshop /user/data --model_path /user/data/model.rkllm --platform rk3588
 #*****************************************************************************************#

 LORA_PATH=""
 PROMPT_FILE_PATH=""

 # Function to display help
 function show_help {
-    echo "Usage: ./build_rkllm_server_flask.sh --workshop [RKLLM-Server Working Path] --model_path [Absolute Path of Converted RKLLM Model on Board] --platform [Target Platform: rk3588/rk3576] --npu_num [NPU Core Count] [--lora_path [Lora Model Path]] [--prompt_cache_path [Prompt Cache File Path]]"
+    echo "Usage: ./build_rkllm_server_flask.sh --workshop [RKLLM-Server Working Path] --model_path [Absolute Path of Converted RKLLM Model on Board] --platform [Target Platform: rk3588/rk3576] [--lora_path [Lora Model Path]] [--prompt_cache_path [Prompt Cache File Path]]"
 }

 # Parse command-line options
@@ -30,10 +30,6 @@ while [[ $# -gt 0 ]]; do
             TARGET_PLATFORM="$2"
             shift 2
             ;;
-        --npu_num)
-            NPU_CORE_COUNT="$2"
-            shift 2
-            ;;
         --lora_model_path)
             LORA_PATH="$2"
             shift 2
@@ -93,7 +89,7 @@ cp ../../runtime/Linux/librkllm_api/aarch64/librkllmrt.so ./rkllm_server/lib/
 adb push ./rkllm_server $WORKING_PATH

 #################### Enter the board terminal and start the server service. ####################
-CMD="python3 flask_server.py --rkllm_model_path $MODEL_PATH --target_platform $TARGET_PLATFORM --num_npu_core $NPU_CORE_COUNT"
+CMD="python3 flask_server.py --rkllm_model_path $MODEL_PATH --target_platform $TARGET_PLATFORM"
 if [[ -n "$LORA_PATH" ]]; then
     CMD="$CMD --lora_model_path $LORA_PATH"
 fi
```
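The CMD line above shows the argument surface flask_server.py is expected to expose after this change: the --num_npu_core flag is gone, leaving the model path, the target platform, and the two optional paths. A hedged sketch of the corresponding parser follows; the shipped flask_server.py may name or validate these options differently.

```python
# Sketch of the flags implied by the launcher's CMD string. This mirrors
# what build_rkllm_server_flask.sh passes, not necessarily the exact
# parser in the shipped flask_server.py.
import argparse

parser = argparse.ArgumentParser(description='RKLLM Flask server (sketch)')
parser.add_argument('--rkllm_model_path', required=True,
                    help='absolute path of the converted .rkllm model on the board')
parser.add_argument('--target_platform', required=True,
                    choices=['rk3588', 'rk3576'],
                    help='target Rockchip platform')
# Optional flags the launcher appends only when provided:
parser.add_argument('--lora_model_path', default=None, help='LoRA model path')
parser.add_argument('--prompt_cache_path', default=None, help='prompt cache file path')
args = parser.parse_args()
# Note: --num_npu_core is no longer passed by the launcher; NPU core
# selection is presumably handled inside the runtime from this release on.
```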

rkllm-runtime/examples/rkllm_server_demo/build_rkllm_server_gradio.sh (+4 −8)

```diff
@@ -3,16 +3,16 @@
 #*****************************************************************************************#
 # This script is an automated setup script for the RKLLM-Server-Gradio service.
 # Users can run this script to automate the deployment of the RKLLM-Server-Gradio service on a Linux board.
-# Usage: ./build_rkllm_server_gradio.sh --workshop [RKLLM-Server Working Path] --model_path [Absolute Path of Converted RKLLM Model on Board] --platform [Target Platform: rk3588/rk3576] --npu_num [NPU Core Count] [--lora_model_path [Lora Model Path]] [--prompt_cache_path [Prompt Cache File Path]]
-# example: ./build_rkllm_server_gradio.sh --workshop /user/data --model_path /user/data/model.rkllm --platform rk3588 --npu_num 3
+# Usage: ./build_rkllm_server_gradio.sh --workshop [RKLLM-Server Working Path] --model_path [Absolute Path of Converted RKLLM Model on Board] --platform [Target Platform: rk3588/rk3576] [--lora_model_path [Lora Model Path]] [--prompt_cache_path [Prompt Cache File Path]]
+# example: ./build_rkllm_server_gradio.sh --workshop /user/data --model_path /user/data/model.rkllm --platform rk3588
 #*****************************************************************************************#

 LORA_PATH=""
 PROMPT_FILE_PATH=""

 # Function to display help
 function show_help {
-    echo "Usage: ./build_rkllm_server_gradio.sh --workshop [RKLLM-Server Working Path] --model_path [Absolute Path of Converted RKLLM Model on Board] --platform [Target Platform: rk3588/rk3576] --npu_num [NPU Core Count] [--lora_path [Lora Model Path]] [--prompt_cache_path [Prompt Cache File Path]]"
+    echo "Usage: ./build_rkllm_server_gradio.sh --workshop [RKLLM-Server Working Path] --model_path [Absolute Path of Converted RKLLM Model on Board] --platform [Target Platform: rk3588/rk3576] [--lora_path [Lora Model Path]] [--prompt_cache_path [Prompt Cache File Path]]"
 }

 # Parse command-line options
@@ -30,10 +30,6 @@ while [[ $# -gt 0 ]]; do
             TARGET_PLATFORM="$2"
             shift 2
             ;;
-        --npu_num)
-            NPU_CORE_COUNT="$2"
-            shift 2
-            ;;
         --lora_model_path)
             LORA_PATH="$2"
             shift 2
@@ -92,7 +88,7 @@ cp ../../runtime/Linux/librkllm_api/aarch64/librkllmrt.so ./rkllm_server/lib/
 adb push ./rkllm_server $WORKING_PATH

 #################### Enter the board terminal and start the server service. ####################
-CMD="python3 gradio_server.py --rkllm_model_path $MODEL_PATH --target_platform $TARGET_PLATFORM --num_npu_core $NPU_CORE_COUNT"
+CMD="python3 gradio_server.py --rkllm_model_path $MODEL_PATH --target_platform $TARGET_PLATFORM"

 if [[ -n "$LORA_PATH" ]]; then
     CMD="$CMD --lora_model_path $LORA_PATH"
```

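Besides the browser UI, the Gradio demo can be reached from Python. Since the demo's programmatic endpoints are not documented in this commit, a safe approach is to let gradio_client introspect the running server rather than guessing routes; the host and port below are placeholders (gradio_server.py sets the actual port).

```python
# Hedged sketch: discover the Gradio demo's API from Python instead of
# guessing endpoint names. Host and port are placeholders; check
# gradio_server.py for the port the server actually binds.
from gradio_client import Client

client = Client('http://192.168.1.100:8080/')  # board IP + assumed port
client.view_api()  # prints the endpoints and parameters the server exposes
```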