An adaptive self-improvement LLM agentic system for ML library development. We choose STeP as the target ASPL for next-generation RDA. Please run the following commands, in order, from the /PCL-lite folder.
(Optional)
pip install -r requirements.txt
Validate the test and reference YAMLs under /benchmark and /prompts:
./scripts/validate.sh
./scripts/prepare.sh
./experiments/single/run.sh
./experiments/agent/run.sh
./experiments/iterative/run.sh
./experiments/single-ws/run.sh
- We recommend changing the `BASE_PATH` in the `experiments` bash scripts to a folder that is not under git; otherwise, parallel sampling might be slowed down by more than 10x because of git logging (see the configuration sketch after this list).
- Users can change the `MODEL_NAME` in the `experiments` bash scripts to any supported model (the required environment variables can be exported as shown after this list):
| Model | API | Environment Variables |
|---|---|---|
| claude-3-5-sonnet-20241022 | Anthropic | ANTHROPIC_API_BASE, ANTHROPIC_API_KEY |
| gpt-4o-2024-11-20 | OpenAI | OPENAI_API_BASE, OPENAI_API_KEY |
| Meta-Llama-3-1-405B-Instruct-Turbo | TogetherAI | TOGETHER_API_BASE, TOGETHER_API_KEY |
| DeepSeek-V3 | DeepSeek-chat | DEEPSEEK_API_BASE, DEEPSEEK_API_KEY |
| Qwen2-5-Coder-32B-Instruct | TogetherAI | TOGETHER_API_BASE, TOGETHER_API_KEY |
- Since STeP is still a research prototype, we only publish the bmm tasks in the benchmark. `NUM_SAMPLES` and `TEMPERATURE` can be adjusted.
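The variables above are set near the top of the `experiments/*/run.sh` scripts. A minimal sketch of what such a script header might look like, assuming the usual shell-variable layout; the exact layout in the shipped scripts may differ, and all values below are illustrative:

```bash
# Illustrative header of an experiments bash script (e.g. experiments/single/run.sh).
# BASE_PATH should point to a directory that is NOT inside a git repository,
# so parallel sampling is not slowed down by git logging.
BASE_PATH="/tmp/pcl-lite-runs"           # non-git output folder (illustrative path)
MODEL_NAME="claude-3-5-sonnet-20241022"  # any model from the table above
NUM_SAMPLES=8                            # number of samples to draw (illustrative value)
TEMPERATURE=0.7                          # sampling temperature (illustrative value)
```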
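Before launching the experiment scripts, the environment variables listed in the table must be set for the chosen model. A minimal sketch, assuming a bash shell and that you already hold the relevant API credentials; the endpoint and key values are placeholders:

```bash
# Example: credentials for claude-3-5-sonnet-20241022 via the Anthropic API.
export ANTHROPIC_API_BASE="https://api.anthropic.com"  # replace if you use a custom endpoint
export ANTHROPIC_API_KEY="sk-ant-..."                   # your own key here

# For TogetherAI-hosted models (Meta-Llama-3-1-405B-Instruct-Turbo, Qwen2-5-Coder-32B-Instruct),
# export TOGETHER_API_BASE and TOGETHER_API_KEY instead.
```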
If you find this work useful, please cite it:
@article{zhang2025adaptive,
  title={Adaptive Self-improvement LLM Agentic System for ML Library Development},
  author={Zhang, Genghan and Liang, Weixin and Hsu, Olivia and Olukotun, Kunle},
  journal={arXiv preprint},
  year={2025},
  url={https://arxiv.org/abs/2502.02534},
}