Make web browser agent example a CLI script #416
@aymeric-roucher I wonder if it's helium or Qwen2VL but I can't seem to even execute
Fixes #416
@aymeric-roucher is there any other option you'd like me to add to argparse?
@@ -64,7 +86,7 @@ def save_screenshot(step_log: ActionStep, agent: CodeAgent) -> None:
 # Initialize driver and agent
 chrome_options = webdriver.ChromeOptions()
 chrome_options.add_argument("--force-device-scale-factor=1")
-chrome_options.add_argument("--window-size=1000,1300")
+chrome_options.add_argument("--window-size=1000,1350")
Why this change? (Out of curiosity.)
parser.add_argument(
    "--model-id",
    type=str,
    default="gpt-4o",
Made GPT-4o the default here while waiting for Qwen-2.5-VL!
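For anyone following along, here is a minimal sketch of how the `--model-id` option could be wired into a standalone parser. Only `--model-id`, its `str` type, and the `gpt-4o` default come from the diff above; the `parse_args` helper, the positional `prompt` argument, and the help strings are assumptions for illustration and may not match the script in this PR.

```python
import argparse


def parse_args(argv=None):
    """Parse CLI arguments (hypothetical helper, not taken from the PR)."""
    parser = argparse.ArgumentParser(
        description="Run a web browser agent from the command line."
    )
    # Assumed positional argument: the diff above does not show it.
    parser.add_argument(
        "prompt",
        type=str,
        nargs="?",
        default="",
        help="Instruction for the agent (assumed argument).",
    )
    # This option and its default are the ones shown in the diff.
    parser.add_argument(
        "--model-id",
        type=str,
        default="gpt-4o",
        help="Model to use.",
    )
    # argv=None makes argparse fall back to sys.argv[1:].
    return parser.parse_args(argv)
```

Calling `parse_args(["--model-id", "some-model"])` yields `args.model_id == "some-model"`, since argparse maps the dash in the flag name to an underscore in the attribute name.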
print(f"Error handling selector {selector}: {str(e)}")
continue
return "Modals closed"
webdriver.ActionChains(driver).send_keys(Keys.ESCAPE).perform()
Vastly simplified this tool thanks to this comment on our blog post: just using Escape works without heavy engineering!
😍
agent.run(search_request + helium_instructions)
agent.python_executor("from helium import *", agent.state)
Found this trick to avoid any potential errors: it forces the imports to run at the start!
this is very cool! thank you!
This PR turns the computer use example into a CLI script for easier general use, and adds HfApiModel as a provider since we'll have a longer-context Qwen2VL soon.