# EnvBench: A Benchmark for Automated Environment Setup
This project automates the process of setting up development environments by analyzing project requirements and configuring the necessary tools and dependencies. It supports both Python and JVM-based projects.
Set up a virtual environment and install the dependencies using uv:

```bash
uv venv --python 3.12
source .venv/bin/activate
uv sync
```
To run the complete pipeline (inference and evaluation):

```bash
uv run envbench \
  -cn python-bash \
  llm@inference.agent=gpt-4o-mini \
  traj_repo_id=<your-hf-username>/<your-repo-name> \
  use_wandb=true
```

Here, `traj_repo_id` is the repository where trajectories are saved.
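Before running, you will likely need credentials for the model provider and for HuggingFace. A minimal sketch, assuming the standard `OPENAI_API_KEY` and `HF_TOKEN` environment variables are read (check the configs for the exact variable names the project expects):

```bash
# Hypothetical credentials setup: an OpenAI key for the gpt-4o-mini agent,
# and a HuggingFace token with write access for uploading trajectories.
export OPENAI_API_KEY=<your-openai-key>
export HF_TOKEN=<your-hf-token>
```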
Results are automatically uploaded to the provided trajectories repository on HuggingFace.
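To inspect the uploaded trajectories locally, you can pull them back from the Hub. A minimal sketch using the `huggingface_hub` CLI, assuming the trajectories are stored in a dataset repository (adjust `--repo-type` if not):

```bash
# Download the trajectories repository to a local directory.
huggingface-cli download <your-hf-username>/<your-repo-name> \
  --repo-type dataset \
  --local-dir ./trajectories
```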
For all configuration options, including the available agents and LLMs, see the Hydra configs in the `conf` directory.
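Since the entry point is a Hydra application, the standard Hydra flags should work for exploring the configuration; a sketch, assuming envbench exposes the usual Hydra CLI:

```bash
# Print the fully composed config for a run without executing it.
uv run envbench -cn python-bash --cfg job

# List the available config groups (agents, llms, etc.) and their options.
uv run envbench -cn python-bash --help
```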
If you want to run only the evaluation step of the pipeline, you can use the following command:

```bash
uv run envbench -cn python-bash skip_inference=true skip_processing=true run_name=<your-run-name>
```
Alternatively, take a look at `evaluation/main.py` for more details on how to run the evaluation step.
If you find our work helpful, please use the following citation:

```bibtex
@inproceedings{eliseeva2025envbench,
  title={EnvBench: A Benchmark for Automated Environment Setup},
  author={Aleksandra Eliseeva and Alexander Kovrigin and Ilia Kholkin and Egor Bogomolov and Yaroslav Zharov},
  booktitle={ICLR 2025 Third Workshop on Deep Learning for Code},
  year={2025},
  url={https://openreview.net/forum?id=izy1oaAOeX}
}
```
MIT. See the LICENSE file for details.