NesTools: A Dataset for Evaluating Nested Tool Learning Abilities of Large Language Models

Code and data for our paper: NesTools: A Dataset for Evaluating Nested Tool Learning Abilities of Large Language Models [Paper].

News

[2025.01.13] Released the scripts and the remaining code.
[2025.01.09] Released the code for inference and evaluation.
[2025.01.08] Released the data and the code for data construction.

🔨 Preparations

$ git clone https://github.com/hhan1018/NesTools.git
$ cd NesTools
$ pip install -r requirements.txt

🍰 Get started

Our test data can be found in data/test_data.jsonl.
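
To get a first look at the test data, a minimal sketch like the one below may help. It assumes only that the file is standard JSON Lines (one JSON value per line), as the .jsonl extension suggests. The record schema is not documented here, so the code makes no assumptions about field names and simply reports which top-level keys occur. The helper names `load_jsonl` and `summarize_keys` are illustrative and are not part of this repository.

```python
import json
from collections import Counter
from pathlib import Path


def load_jsonl(path):
    """Read a JSON Lines file into a list of Python objects, skipping blank lines."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def summarize_keys(records):
    """Count how often each top-level key appears across dict records."""
    counts = Counter()
    for record in records:
        if isinstance(record, dict):
            counts.update(record.keys())
    return counts


if __name__ == "__main__":
    # Path taken from this README; run the script from the repository root.
    path = Path("data/test_data.jsonl")
    if path.exists():
        records = load_jsonl(path)
        print(f"{len(records)} records")
        print(dict(summarize_keys(records)))
    else:
        print(f"{path} not found; run this from the repository root")
```

The same helper should also work for inference/test_prompt.jsonl, again assuming it is plain JSON Lines.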

Data construction

To try our data construction method, follow these steps:

Set your API key and URL in data_construction/settings.py. You can also customize the in-context learning (ICL) examples in the same file.
Start the data construction:

python data_construction/main.py --refine

Build evaluation settings

Download gte-large-en-v1.5 [link] or another embedding model.
Modify the path of the embedding model in scripts/build.sh.
Start the process:

bash scripts/build.sh
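
Before running the build script, it can be useful to confirm that the embedding model path points at a usable local copy. The sketch below is a hypothetical pre-flight check, not part of this repository. It assumes the model was downloaded as a local directory in the usual Hugging Face layout, where a `config.json` file sits at the top level. The function name `check_model_dir` and the example path are placeholders.

```python
from pathlib import Path


def check_model_dir(model_dir):
    """Return a list of problems found with a local model directory (empty if none)."""
    path = Path(model_dir)
    if not path.is_dir():
        return [f"{path} is not a directory"]
    problems = []
    # Assumption: Hugging Face style checkpoints keep config.json at the top level.
    if not (path / "config.json").is_file():
        problems.append(f"{path} has no config.json")
    return problems


if __name__ == "__main__":
    # Placeholder path; use the same value you set in scripts/build.sh.
    for problem in check_model_dir("path/to/gte-large-en-v1.5"):
        print("warning:", problem)
```

An empty result only means the directory looks plausible; it does not guarantee that the build script can load the model.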

Inference

Note: Our test prompt can be found in inference/test_prompt.jsonl, which can be used for evaluation directly or as a reference.

Set your API key and URL in scripts/inference.sh.
Modify the model name and output path in scripts/inference.sh.
Start the inference process:

bash scripts/inference.sh

Evaluation

Modify the output path for storing model inference results in scripts/eval.sh.
Choose the command corresponding to the evaluation mode in scripts/eval.sh.
Start the evaluation process:

bash scripts/eval.sh

📝 Citation

If you find our work useful in your research, please cite our work:

@article{han2024nestools,
  title={NesTools: A Dataset for Evaluating Nested Tool Learning Abilities of Large Language Models},
  author={Han, Han and Zhu, Tong and Zhang, Xiang and Wu, Mengsong and Xiong, Hao and Chen, Wenliang},
  journal={arXiv preprint arXiv:2410.11805},
  year={2024}
}
