Demo scripts

Ask Anything

  1. Ask any question about a given workspace
PYTHONPATH=. python scripts/run_ask.py \
  --workspace $(pwd)/benchmark/workspaces/OpenHands/39_Drug_Response_Prediction_SVM_GDSC_ML \
  --question "What does this workspace contain?"
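To ask several questions of the same workspace, the command above can be assembled from Python. This is a minimal sketch, not part of the repository: `build_ask_command` is a hypothetical helper, the second question is only an illustration, and the sketch assumes it is run from the repository root with `PYTHONPATH=.` set as in the command above. It uses only the `--workspace` and `--question` flags shown here, and it prints the commands rather than running them.

```python
import shlex
import sys
from pathlib import Path


def build_ask_command(workspace, question):
    """Return the argv list for one scripts/run_ask.py call (hypothetical helper)."""
    # The command above passes an absolute path via $(pwd), so resolve it here too.
    workspace = Path(workspace).resolve()
    return [
        sys.executable, "scripts/run_ask.py",
        "--workspace", str(workspace),
        "--question", question,
    ]


WORKSPACE = "benchmark/workspaces/OpenHands/39_Drug_Response_Prediction_SVM_GDSC_ML"
QUESTIONS = [
    "What does this workspace contain?",
    "Which files implement the model training?",  # illustrative question, not from the README
]

for q in QUESTIONS:
    # Print a copy-pasteable shell command for each question.
    print(shlex.join(build_ask_command(WORKSPACE, q)))
```

Each printed line can be pasted into a shell, or the argv list can be passed to `subprocess.run` once the workspace exists locally.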

Agent-as-a-Judge

  1. Use the collected trajectories or development logs (gray-box setting)
PYTHONPATH=. python scripts/run_aaaj.py \
  --developer_agent "OpenHands" \
  --setting "gray_box" \
  --planning "comprehensive (no planning)" \
  --benchmark_dir $(pwd)/benchmark
  2. Without trajectories or development logs (black-box setting)
PYTHONPATH=. python scripts/run_aaaj.py \
  --developer_agent "OpenHands" \
  --setting "black_box" \
  --planning "efficient (no planning)" \
  --benchmark_dir $(pwd)/benchmark
  3. Without trajectories or development logs, using planning to decide the actions of Agent-as-a-Judge (black-box setting)
PYTHONPATH=. python scripts/run_aaaj.py \
  --developer_agent "OpenHands" \
  --setting "black_box" \
  --planning "planning" \
  --benchmark_dir $(pwd)/benchmark
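The three configurations differ only in their `--setting` and `--planning` values, so they can be driven from one loop. The following is a rough sketch under stated assumptions: `build_aaaj_command`, `run_all`, and `CONFIGS` are hypothetical names, the third pair assumes the black-box setting named in its description, and the script is assumed to be launched from the repository root. By default it only prints the commands (`dry_run=True`).

```python
import os
import shlex
import subprocess
import sys
from pathlib import Path

# (setting, planning) pairs for the three cases described above.
CONFIGS = [
    ("gray_box", "comprehensive (no planning)"),
    ("black_box", "efficient (no planning)"),
    ("black_box", "planning"),  # assumption: black-box, as the description states
]


def build_aaaj_command(setting, planning, benchmark_dir, developer_agent="OpenHands"):
    """Return the argv list for one scripts/run_aaaj.py call (hypothetical helper)."""
    return [
        sys.executable, "scripts/run_aaaj.py",
        "--developer_agent", developer_agent,
        "--setting", setting,
        "--planning", planning,
        "--benchmark_dir", str(benchmark_dir),
    ]


def run_all(repo_root=".", dry_run=True):
    """Print each command; execute it as well when dry_run is False."""
    repo_root = Path(repo_root).resolve()
    # Mirrors the PYTHONPATH=. prefix of the shell commands.
    env = {**os.environ, "PYTHONPATH": str(repo_root)}
    commands = [build_aaaj_command(s, p, repo_root / "benchmark") for s, p in CONFIGS]
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, cwd=repo_root, env=env, check=True)
    return commands


run_all(dry_run=True)
```

Passing `dry_run=False` runs the configurations one after another and stops at the first failure because of `check=True`.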

Statistics

  1. Get the statistics of the projects
PYTHONPATH=. python scripts/run_statistics.py \
  --benchmark_dir $(pwd)/benchmark \
  --developer_agent OpenHands