Demo scripts

Ask Anything

Ask any question about a given workspace

PYTHONPATH=. python scripts/run_ask.py \
  --workspace $(pwd)/benchmark/workspaces/OpenHands/39_Drug_Response_Prediction_SVM_GDSC_ML \
  --question "What does this workspace contain?"
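
To ask the same question of several workspaces, the command above can be wrapped in a small driver. The following is a minimal sketch, not part of the repository: it assumes every subdirectory of `benchmark/workspaces/OpenHands` is a workspace, and the helper names `build_ask_commands` and `run_all` are hypothetical. It uses only the `--workspace` and `--question` flags shown above and prints the commands by default instead of running them.

```python
import os
import shlex
import subprocess
import sys
from pathlib import Path


def build_ask_commands(workspaces_root, question):
    """Return one run_ask.py argument list per subdirectory of workspaces_root.

    Assumption: each subdirectory is a workspace; plain files are skipped.
    """
    root = Path(workspaces_root)
    commands = []
    for workspace in sorted(p for p in root.iterdir() if p.is_dir()):
        commands.append([
            sys.executable, "scripts/run_ask.py",  # current interpreter instead of `python`
            "--workspace", str(workspace),
            "--question", question,
        ])
    return commands


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    env = dict(os.environ, PYTHONPATH=".")  # mirrors `PYTHONPATH=.` in the command above
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, env=env, check=True)


if __name__ == "__main__":
    root = Path.cwd() / "benchmark" / "workspaces" / "OpenHands"
    if root.is_dir():
        run_all(build_ask_commands(root, "What does this workspace contain?"))
```

Run it from the repository root so that the relative `scripts/run_ask.py` path resolves, and pass `dry_run=False` to `run_all` once the printed commands look right.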

Agent-as-a-Judge

With collected trajectories or development logs (gray-box setting)

PYTHONPATH=. python scripts/run_aaaj.py \
  --developer_agent "OpenHands" \
  --setting "gray_box" \
  --planning "comprehensive (no planning)" \
  --benchmark_dir $(pwd)/benchmark

Without trajectories or development logs (black-box setting)

PYTHONPATH=. python scripts/run_aaaj.py \
  --developer_agent "OpenHands" \
  --setting "black_box" \
  --planning "efficient (no planning)" \
  --benchmark_dir $(pwd)/benchmark

Without trajectories or development logs, using planning to decide the actions of Agent-as-a-Judge (black-box setting)

PYTHONPATH=. python scripts/run_aaaj.py \
  --developer_agent "OpenHands" \
  --setting "black_box" \
  --planning "planning" \
  --benchmark_dir $(pwd)/benchmark
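
Several Agent-as-a-Judge configurations can be run back to back with a short driver. This is a sketch under stated assumptions rather than a script shipped with the repository: the names `CONFIGURATIONS`, `build_aaaj_command`, and `run_configurations` are hypothetical, and the `(setting, planning)` pairs are copied from the first two commands in this section. Only flags that appear in those commands are used, and the driver prints the commands unless `dry_run` is disabled.

```python
import os
import shlex
import subprocess
import sys
from pathlib import Path

# (setting, planning) pairs copied from the first two commands in this section.
CONFIGURATIONS = [
    ("gray_box", "comprehensive (no planning)"),
    ("black_box", "efficient (no planning)"),
]


def build_aaaj_command(setting, planning, developer_agent="OpenHands", benchmark_dir=None):
    """Build the run_aaaj.py argument list for one configuration."""
    # Default mirrors `$(pwd)/benchmark` from the commands in this section.
    benchmark_dir = Path(benchmark_dir) if benchmark_dir else Path.cwd() / "benchmark"
    return [
        sys.executable, "scripts/run_aaaj.py",  # current interpreter instead of `python`
        "--developer_agent", developer_agent,
        "--setting", setting,
        "--planning", planning,
        "--benchmark_dir", str(benchmark_dir),
    ]


def run_configurations(configurations, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    env = dict(os.environ, PYTHONPATH=".")  # mirrors `PYTHONPATH=.` in the commands
    for setting, planning in configurations:
        cmd = build_aaaj_command(setting, planning)
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, env=env, check=True)


if __name__ == "__main__":
    run_configurations(CONFIGURATIONS)
```

Because `check=True` is passed to `subprocess.run`, a failing configuration stops the sequence instead of silently continuing to the next one.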

Statistics

Get the statistics of the projects

PYTHONPATH=. python scripts/run_statistics.py \
  --benchmark_dir $(pwd)/benchmark \
  --developer_agent OpenHands
