🌱⚙️ EnvBench

A Benchmark for Automated Environment Setup

Overview

This project automates the process of setting up development environments by analyzing project requirements and configuring the necessary tools and dependencies. It supports both Python and JVM-based projects.

Prerequisites

uv for dependency management
Docker for running isolated environments
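
Before going further, it can help to confirm that both tools are on your PATH. The snippet below is a minimal sketch; the check_tools helper is a hypothetical name introduced here for illustration and is not part of EnvBench.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of EnvBench): report whether each named
# tool is on PATH and return non-zero if any of them is missing.
check_tools() {
    local tool missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# The two prerequisites listed above.
check_tools uv docker || echo "Install the missing tools before running the benchmark."
```

This only checks that the executables can be found. It does not verify that the Docker daemon is running.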

Running the Benchmark

Setup

Set up a virtual environment and install dependencies using uv.

uv venv --python 3.12
source .venv/bin/activate
uv sync

Running the Pipeline

To run the complete pipeline (inference and evaluation):

# traj_repo_id is the HuggingFace repository where trajectories are saved
uv run envbench \
    -cn python-bash \
    llm@inference.agent=gpt-4o-mini \
    traj_repo_id=<your-hf-username>/<your-repo-name> \
    use_wandb=true

Results are automatically uploaded to the provided trajectories repository on HuggingFace.

For all configuration options, including different agents and LLMs, see the Hydra configs in the conf directory.
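
The value passed to -cn (python-bash above) is a Hydra config name, which normally corresponds to a YAML file without its extension. As a rough sketch, assuming the top-level configs sit directly in conf as .yaml files (an assumption about the layout, not something stated here), the available names could be listed like this. The list_config_names helper is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the config names (file names without .yaml)
# found directly in a directory. Assumes top-level configs live in conf/.
list_config_names() {
    local f dir="${1:-conf}"
    for f in "$dir"/*.yaml; do
        [ -e "$f" ] || continue   # skip the literal pattern when nothing matches
        basename "$f" .yaml
    done
}

list_config_names conf
```

If envbench behaves like a standard Hydra application, as the -cn flag suggests, Hydra's --cfg job flag should also print the composed configuration without running the pipeline.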

To run only the evaluation step, skip inference and processing:

uv run envbench -cn python-bash skip_inference=true skip_processing=true run_name=<your-run-name>

Alternatively, see evaluation/main.py for more details on how to run the evaluation step.

Implementation Details

Artifacts

Citation

If you find our work helpful, please use the following citation:

@inproceedings{
eliseeva2025envbench,
title={EnvBench: A Benchmark for Automated Environment Setup},
author={Aleksandra Eliseeva and Alexander Kovrigin and Ilia Kholkin and Egor Bogomolov and Yaroslav Zharov},
booktitle={ICLR 2025 Third Workshop on Deep Learning for Code},
year={2025},
url={https://openreview.net/forum?id=izy1oaAOeX}
}

License

MIT. See the LICENSE file.
