feat: add streaming tool use #1884
base: main
Conversation
@abetlen The tests all pass, but the macOS ones were terminated after a timeout. I think this is due to a lack of CPU and/or memory resources on the CI runners, because the tests run fine on my macOS machine.
I would love to see this merged! Actually, there are quite a lot of good pull requests here that I would like to see merged, but this one is top priority!
Update: I rebased on the latest `main`.

(force-pushed from 2506581 to b4f8fde)
Worked well for me. Would you mind rebasing onto the latest commit to allow tool streaming with Qwen models?
This PR upgrades the `chatml-function-calling` chat handler with support for streaming tool use and fixes #1883, #1869, and #1756, among other improvements. A usage sketch of the streamed tool calls follows the change list below.

Changes:
1. General improvements:
   a. ✨ If no system message is supplied, add an empty system message to hold the tool metadata.
   b. ✨ Add function descriptions to the system message so that tool use is better informed (fixes #1869: chatml-function-calling not adding tool description to the prompt).
   c. ✨ Replace `print` statements relating to JSON grammars with `RuntimeWarning` warnings (see the warning sketch after this list).
   d. ✅ Add tests with fairly broad coverage of the different scenarios.
2. User-specified tool choice:
   a. ✨ Add support for more than one function call by making this a special case of "Automatic tool choice" with a single tool (subsumes #1503: Support parallel function calls with tool_choice).
3. Completions:
   a. ✨ Use the user-defined `stop` and `max_tokens`.
   b. 🐛 Replace incorrect use of the follow-up grammar with the user-defined grammar.
4. Automatic tool choice:
   a. ✨ Add support for streaming the function calls (fixes #1883: Feature request: add support for streaming tool use).
   b. ✨ Make tool calling more robust by giving the LLM an explicit way to terminate the tool calls by wrapping them in a `<function_calls></function_calls>` block.
   c. 🐛 Add the missing ":" stop token used to decide whether to continue with another tool call; its absence prevented parallel function calling (fixes #1756: chatml-function-calling chat format fails to generate multi calls to the same tool).
   d. ✨ Set temperature=0 when deciding whether to continue with another tool call, similar to the initial decision on whether to call a tool (a hypothetical sketch of this decision step follows the list).
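For reference, a minimal usage sketch of the streaming tool use this PR enables. The model path, tool schema, and prompt are placeholders, and the chunk handling assumes the OpenAI-compatible streaming shape that `create_chat_completion` returns:

```python
from llama_cpp import Llama

llm = Llama(model_path="model.gguf", chat_format="chatml-function-calling")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # placeholder tool for illustration
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# With stream=True, tool calls now arrive incrementally as OpenAI-style delta
# chunks instead of only appearing in the final, fully assembled response.
for chunk in llm.create_chat_completion(
    messages=[{"role": "user", "content": "What is the weather in Paris?"}],
    tools=tools,
    tool_choice="auto",
    stream=True,
):
    delta = chunk["choices"][0].get("delta", {})
    for tool_call in delta.get("tool_calls") or []:
        function = tool_call.get("function") or {}
        # The name typically arrives in the first chunk of a call; the
        # arguments are then streamed as incremental JSON fragments.
        print(function.get("name") or function.get("arguments") or "", end="", flush=True)
```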
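The warning sketch for item 1c: a minimal illustration of swapping `print` diagnostics for `RuntimeWarning` warnings via Python's standard `warnings` module. The message text is illustrative, not the handler's actual wording:

```python
import warnings

# Before this PR, grammar-related diagnostics were printed to stdout; now they
# are emitted as RuntimeWarning so callers can filter, log, or escalate them.
warnings.warn(
    "Failed to build a JSON grammar for the tool parameters; "
    "falling back to unconstrained generation.",
    RuntimeWarning,
)
```

Callers can then silence these with `warnings.filterwarnings("ignore", category=RuntimeWarning)` instead of losing them to stdout.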
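Finally, a hypothetical sketch of the follow-up decision in items 4c and 4d, reusing `llm` from the usage sketch above. This is not the handler's actual code: `prompt_so_far` is a stand-in for the rendered chat transcript, and the `functions.` prefix check is an assumption about how a further tool call would begin in the prompt format:

```python
# Assumed placeholder: the chat transcript rendered so far, ending right after
# a completed tool call inside the <function_calls> block.
prompt_so_far = "...rendered transcript..."

# Deterministically ask the model what comes next. Stopping at ":" (the stop
# token this PR adds) lets the handler detect a "functions.<name>" prefix
# without committing to that call's arguments.
next_text = llm.create_completion(
    prompt=prompt_so_far,
    temperature=0,   # deterministic, mirroring the initial tool-use decision
    stop=[":"],
    max_tokens=16,
)["choices"][0]["text"]

another_tool_call = next_text.lstrip().startswith("functions.")
```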