SmartServe.Server

Usage

Start a server:

./build/bin/server --model-folder model --host 127.0.0.1 --port 8080

  • --model-folder (optional): The model workspace folder. When a request specifies a model, the server searches this directory first. For example, with --model-folder model, models are looked up in ./model if that directory exists. Given a request such as:
    {
        "model": "llama3.1-8b-instruct",
        "prompt": "Hello"
    }
    the server first searches ./model/llama3.1-8b-instruct, then falls back to llama3.1-8b-instruct if the former does not exist.
  • --lib-folder (optional): The folder containing QNN libraries. Defaults to model-folder/qnn_libs.
  • --host (optional): The IP address the server listens on.
  • --port (optional): The port the server listens on.
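The model lookup order can be sketched in Python. This is only an illustration of the rule above, not the server's actual implementation, and resolve_model_path is a hypothetical name:

```python
from pathlib import Path

def resolve_model_path(model: str, model_folder: str = "model") -> Path:
    # Prefer <model-folder>/<model>; fall back to <model> itself
    # when that directory does not exist.
    candidate = Path(model_folder) / model
    return candidate if candidate.exists() else Path(model)
```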

Test the server with curl:

  • Completion
    curl --request POST \
        --url http://localhost:8080/completion \
        --header "Content-Type: application/json" \
        --data '{"prompt": "Once upon a time", "max_tokens": 128, "model": "model"}'
  • Chat
    curl --request POST \
        --url http://localhost:8080/v1/chat/completions \
        --header "Content-Type: application/json" \
        --data '{"messages": [{"role":"user", "content":"Once upon a time"}], "model": "llama3.1-8b-q8"}'
  • Streaming chat
    curl --request POST \
        --url http://localhost:8080/v1/chat/completions \
        --header "Content-Type: application/json" \
        --data '{"stream": true, "messages": [{"role":"user", "content":"Once upon a time"}], "model": "llama3.1-8b-q8"}'
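With "stream": true, the response arrives incrementally. Assuming OpenAI-style server-sent-events framing (a "data: " prefix per chunk and a [DONE] sentinel; check the API documentation for the exact format SmartServe.Server emits), the chunks can be parsed like this:

```python
import json

def parse_sse_chunks(raw: str):
    """Yield decoded JSON chunks from a streamed chat response body.

    Assumes OpenAI-style SSE framing: each event is a line starting
    with 'data: ', and the stream ends with 'data: [DONE]'.
    """
    for line in raw.splitlines():
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blank lines between events
        payload = line[len("data: "):]
        if payload == "[DONE]":
            return
        yield json.loads(payload)
```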

OpenAI API

SmartServe.Server supports part of the OpenAI API:

  • /v1/completions
  • /v1/chat/completions
  • /v1/models

For details, please refer to the API documentation