
Make web browser agent example a CLI script #416

Merged: 7 commits merged on Jan 30, 2025


@merveenoyan (Contributor) commented Jan 29, 2025

This PR turns the computer use example into a CLI script for easier, more general use, and adds HfApiModel as a provider since we'll have a longer-context Qwen2VL soon.

@merveenoyan (Contributor Author)

@aymeric-roucher I wonder if it's helium or Qwen2VL, but I can't even seem to execute go_to here 🧐👀

@merveenoyan merveenoyan marked this pull request as ready for review January 29, 2025 15:26
@merveenoyan (Contributor Author)

fixes #416

@merveenoyan (Contributor Author)

@aymeric-roucher is there any other option you'd like me to add to argparse?

@@ -64,7 +86,7 @@ def save_screenshot(step_log: ActionStep, agent: CodeAgent) -> None:
# Initialize driver and agent
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--force-device-scale-factor=1")
-chrome_options.add_argument("--window-size=1000,1300")
+chrome_options.add_argument("--window-size=1000,1350")
Collaborator:

Why this change? (out of curiosity)

parser.add_argument(
"--model-id",
type=str,
default="gpt-4o",
Collaborator:

Made GPT-4o default here while waiting for Qwen-2.5-VL!
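For readers curious how such a CLI can be wired up, here is a minimal standard-library sketch of the argument parsing. Only the `--model-id` flag and its `gpt-4o` default come from the diff above. The optional positional `prompt`, the `DEFAULT_PROMPT` text, and the `--model-type` flag (with `HfApiModel` as one choice, since this PR adds it as a provider) are assumptions for illustration, not necessarily the merged script's exact interface.

```python
import argparse

# Hypothetical default task; the real example script defines its own search request.
DEFAULT_PROMPT = "Describe the front page of example.com"


def parse_arguments(argv=None):
    """Parse CLI options for a web browser agent example (illustrative sketch)."""
    parser = argparse.ArgumentParser(
        description="Run a web browser agent with a chosen model."
    )
    # Assumption: an optional positional prompt that falls back to DEFAULT_PROMPT.
    parser.add_argument("prompt", type=str, nargs="?", default=DEFAULT_PROMPT)
    # Hypothetical provider switch; HfApiModel is the provider added in this PR.
    parser.add_argument(
        "--model-type",
        type=str,
        default="OpenAIServerModel",
        choices=["OpenAIServerModel", "HfApiModel"],
    )
    # From the PR diff: GPT-4o is the default while waiting for Qwen-2.5-VL.
    parser.add_argument("--model-id", type=str, default="gpt-4o")
    return parser.parse_args(argv)


if __name__ == "__main__":
    print(parse_arguments())
```

Passing `argv` explicitly (instead of always reading `sys.argv`) keeps the parser easy to exercise from tests.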

print(f"Error handling selector {selector}: {str(e)}")
continue
return "Modals closed"
webdriver.ActionChains(driver).send_keys(Keys.ESCAPE).perform()
Collaborator:

Vastly simplified this tool thanks to this comment on our blog post: just using Escape works without heavy engineering!

@merveenoyan (Contributor Author):

😍


agent.run(search_request + helium_instructions)
agent.python_executor("from helium import *", agent.state)
Collaborator:

Found this trick to avoid any potential errors: forces running the imports at start!

@merveenoyan (Contributor Author):

this is very cool! thank you!
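The import trick relies on the executor's state (here `agent.state`) persisting between calls: running `from helium import *` once up front makes helium's names available to every later code snippet the model writes. As a rough analogy only (smolagents' executor is a restricted interpreter, not plain `exec`), the same effect can be reproduced with the standard library by seeding a shared namespace. The helper `run_snippet` is hypothetical, and `math` stands in for `helium` so the sketch runs without a browser.

```python
# Analogy only: a shared dict plays the role of agent.state.
state: dict = {}


def run_snippet(code: str, namespace: dict) -> None:
    """Execute a code snippet against a persistent namespace (hypothetical helper)."""
    exec(code, namespace)


# Seed the namespace once, like agent.python_executor("from helium import *", agent.state).
run_snippet("from math import *", state)

# Later snippets can use the star-imported names without importing them again.
run_snippet("result = sqrt(16) + floor(2.7)", state)
print(state["result"])  # 6.0
```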

@aymeric-roucher aymeric-roucher merged commit dcbbe44 into main Jan 30, 2025
3 of 4 checks passed
@aymeric-roucher aymeric-roucher changed the title Make computer use a CLI script Make web browser agent a CLI script Jan 30, 2025
@aymeric-roucher aymeric-roucher changed the title Make web browser agent a CLI script Make web browser agent example a CLI script Jan 30, 2025
@merveenoyan: This comment was marked as resolved.

2 participants