[Draft] Add math benchmarks #1570
base: master
Thanks @hallerite, great work! But the docstrings need to be polished; please refer to:
https://github.com/camel-ai/camel/blob/master/CONTRIBUTING.md#guideline-for-writing-docstrings
class GSM8KBenchmark(MathBenchmark):
    """Benchmark for evaluating ChatAgents on the GSM8K dataset from Hugging Face Hub."""
Suggested change:
-    """Benchmark for evaluating ChatAgents on the GSM8K dataset from Hugging Face Hub."""
+    r"""Benchmark for evaluating ChatAgents on the GSM8K dataset from Hugging Face Hub."""
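For illustration only, here is a minimal sketch of how the raw-docstring suggestion might look on a full class. It is not the code from this PR: the `MathBenchmark` stand-in and the `extract_final_answer` helper are hypothetical, and the `Args`/`Returns` layout is an assumption about the linked guideline. The one external fact relied on is that GSM8K reference solutions end with a `#### <answer>` line.

```python
import re
from typing import Optional


class MathBenchmark:
    r"""Hypothetical stand-in for the base class introduced in this PR."""


class GSM8KBenchmark(MathBenchmark):
    r"""Benchmark for evaluating ChatAgents on the GSM8K dataset.

    The dataset is loaded from the Hugging Face Hub. GSM8K reference
    solutions end with a line of the form ``#### <answer>``.
    """

    @staticmethod
    def extract_final_answer(solution: str) -> Optional[str]:
        r"""Return the text after the last ``####`` marker, if present.

        Hypothetical helper, shown only to illustrate docstring layout.

        Args:
            solution (str): A GSM8K-style reference solution.

        Returns:
            Optional[str]: The final answer with thousands separators
                and surrounding whitespace removed, or :obj:`None` if
                no marker is found.
        """
        # ``.`` does not match newlines, so each match stays on one line.
        matches = re.findall(r"####\s*(.+)", solution)
        if not matches:
            return None
        return matches[-1].strip().replace(",", "")
```

The `r` prefix keeps backslashes in the docstring literal, which matters once a docstring contains LaTeX or regex fragments.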
An example of an optimized docstring.
Can we add an example under the examples directory?
…oning data with thought process (Long Cot data)from deepseek R1 (#1532) Co-authored-by: “yifeng.wang” <“3038880699@qq.com;q:wqqgit config --global user.name “yifeng.wang”git config --global user.email “3038880699@qq.com> Co-authored-by: Wendong <w3ndong.fan@gmail.com> Co-authored-by: Wendong-Fan <133094783+Wendong-Fan@users.noreply.github.com>
Co-authored-by: Wendong-Fan <133094783+Wendong-Fan@users.noreply.github.com>
Co-authored-by: Wendong <w3ndong.fan@gmail.com>
Co-authored-by: Wendong-Fan <133094783+Wendong-Fan@users.noreply.github.com> Co-authored-by: Wendong <w3ndong.fan@gmail.com>
Co-authored-by: Wendong-Fan <133094783+Wendong-Fan@users.noreply.github.com> Co-authored-by: Wendong <w3ndong.fan@gmail.com>
…mel (#1493) Co-authored-by: 任信行 <renxinxing@renxinxingdeMacBook-Pro.local> Co-authored-by: Harry Ye <116691547+harryeqs@users.noreply.github.com> Co-authored-by: Wendong-Fan <133094783+Wendong-Fan@users.noreply.github.com>
…nto feat/benchmarks
of the run function
it doesn't exist
parse and evaluate the Agents Output
style checks
and fixed errors
pass itself as a directory
verify and added it to mypy overrides since it doesn't have a typing package
@apokryphosx added math-verify as a dependency, but since it is very new, it seems it cannot be resolved. Any idea what we should do? Without it, the benchmarks are much less powerful. cc: @Wendong-Fan
Description
This PR introduces a base class for math benchmarks and provides implementations for:
Motivation and Context
This PR addresses and closes #1510.
Types of Changes
What types of changes does your code introduce? Put an x in all the boxes that apply:
Implemented Tasks ✅
Checklist 📝
Please go over all the following points and put an x in the boxes that apply.
If you're unsure about any, feel free to ask!
Draft Status 🚧
Current Progress:
Next Steps: