Sglang benchmark test #476
Restructure ./build_tools directory for integration tests, Move most export/startup functions for shortfin to utils
I am aware that this is still a draft; however, here are some things to consider while getting it into a shape that can be merged.
Move export/compile to conftest, Parametrize benchmark test
Some first comments. I think there was an agreement to add a new top-level folder @ScottTodd?
I moved the integration and benchmark tests out of |
Remove quotation marks
The token cannot be accessed from an outside PR / PR from a fork. Thus, if dropping the |
Gotcha, that makes sense. Removed |
I think this is fine to land and to iterate on as needed.
Description
Create a nightly workflow for the SGLang benchmark test that runs a Shortfin server and benchmarks it from SGLang using the bench_serving script.
bench_serving Invocations
The bench_serving script is run with various request-rate arguments:
--request-rate 1 --output-file <tmp_dir>/shortfin_10_1.jsonl
--request-rate 2 --output-file <tmp_dir>/shortfin_10_1.jsonl
--request-rate 4 --output-file <tmp_dir>/shortfin_10_1.jsonl
--request-rate 8 --output-file <tmp_dir>/shortfin_10_1.jsonl
--request-rate 16 --output-file <tmp_dir>/shortfin_10_1.jsonl
--request-rate 32 --output-file <tmp_dir>/shortfin_10_1.jsonl
After the test finishes running, we upload the HTML output from pytest to gh-pages. The subdirectory is set to ./llm/sglang, so the results should be accessible from the browser at /llm/sglang/index.html in gh-pages.
This also includes a refactor of the existing integration test. Most of the methods for downloading a model/tokenizer, exporting to MLIR, compiling to VMFB, and starting a Shortfin server have been moved to a utils.py file.
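For reviewers who want to see the request-rate sweep in one place, here is a minimal sketch of how the six argument sets from the description could be generated. It is not the code in this PR: the function name bench_serving_args and the per-rate output file name are hypothetical, and only the two flags shown in the description (--request-rate and --output-file) are used.

```python
from pathlib import Path

# Request rates listed in the PR description.
REQUEST_RATES = [1, 2, 4, 8, 16, 32]


def bench_serving_args(request_rate: int, tmp_dir: Path) -> list[str]:
    """Build the two flags shown in the description for a single run.

    Assumption: each run writes to its own output file, named after the
    request rate. The description prints shortfin_10_1.jsonl for every
    rate, so this naming scheme is hypothetical.
    """
    output_file = tmp_dir / f"shortfin_10_{request_rate}.jsonl"
    return [
        "--request-rate", str(request_rate),
        "--output-file", str(output_file),
    ]


if __name__ == "__main__":
    # Print one argument line per request rate; the directory is a placeholder.
    for rate in REQUEST_RATES:
        print(" ".join(bench_serving_args(rate, Path("/tmp/sglang_bench"))))
```

A list like REQUEST_RATES could also be passed to pytest.mark.parametrize so that each rate shows up as its own test case in the HTML report.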