
Refactor prompts #502

Merged
merged 15 commits into main, Feb 6, 2025
Conversation

aymeric-roucher
Collaborator

No description provided.

@aymeric-roucher aymeric-roucher changed the title Share prompts Refactor prompts Feb 5, 2025
@aymeric-roucher
Collaborator Author

@albertvillanova do you have pointers about that failed test? It's linked to MagicMock; I don't really have experience with these.
https://github.com/huggingface/smolagents/actions/runs/13163295123/job/36737183697?pr=502

@albertvillanova
Member

Let me have a look! 😉

@albertvillanova (Member) left a comment:

This will fix the MagicMock issue in the test.

@albertvillanova (Member) left a comment:

However there is another issue:

  • MultiStepAgent does not implement initialize_system_prompt
  • Therefore: self.system_prompt = None
  • self.system_prompt.strip() will raise an error
AttributeError: 'NoneType' object has no attribute 'strip'
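The failure mode can be reproduced in isolation. A minimal sketch with toy class names (not the real smolagents classes), assuming `system_prompt` is populated from an `initialize_system_prompt` hook that the base class leaves unimplemented:

```python
class ToyMultiStepAgent:
    """Mimics a base class that does not implement initialize_system_prompt."""

    def initialize_system_prompt(self):
        return None  # base class defines no prompt

    @property
    def system_prompt(self):
        return self.initialize_system_prompt()


class ToyCodeAgent(ToyMultiStepAgent):
    """Subclasses override the hook and return a real string, so .strip() is safe."""

    def initialize_system_prompt(self):
        return "You are a helpful coding agent.\n"


try:
    ToyMultiStepAgent().system_prompt.strip()
except AttributeError as e:
    print(e)  # 'NoneType' object has no attribute 'strip'

print(ToyCodeAgent().system_prompt.strip())
```

Calling `.strip()` on the base class's `None` prompt raises exactly the `AttributeError` quoted above, while any subclass that returns a string is fine.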

@aymeric-roucher
Collaborator Author

Oh, and by the way @albertvillanova, here's the explanation for the removal of the single_step option: it was only introduced for a benchmark and is not really useful, since single-step agents struggle to solve real tasks, and it introduced lots of overhead with a dedicated prompt: thus the removal!

aymeric-roucher and others added 3 commits February 5, 2025 21:15
Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
@aymeric-roucher
Collaborator Author

@albertvillanova regarding your comment above on system_prompt being None: I've removed the .strip() from the offending line, WDYT?

@albertvillanova
Member

albertvillanova commented Feb 6, 2025

@albertvillanova regarding your comment above on system_prompt being None: I've removed the .strip() from the offending line, WDYT?

Does it make sense to return "text": None?

[Message(role=MessageRole.SYSTEM, content=[{"type": "text", "text": None}])]
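One way to avoid emitting `"text": None` would be to build the system message conditionally. A hypothetical sketch (illustrative function name, not the actual smolagents fix):

```python
def build_system_messages(system_prompt):
    """Return the system-message list, skipping it entirely when no prompt exists."""
    if system_prompt is None:
        return []  # no system message at all, instead of {"text": None}
    return [
        {
            "role": "system",
            "content": [{"type": "text", "text": system_prompt.strip()}],
        }
    ]


print(build_system_messages(None))    # []
print(build_system_messages("  Hi  "))
```

This keeps the downstream message payload free of `None` text fields while still stripping real prompts.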

@albertvillanova (Member) left a comment:

To fix the failing test.

@albertvillanova (Member) left a comment:

You removed managed_agent_prompt.

aymeric-roucher and others added 5 commits February 6, 2025 14:01
Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
def check_always_fails(final_answer, agent_memory):
    assert False, "Error raised in check"

agent = CodeAgent(model=fake_code_model, tools=[], final_answer_checks=[check_always_fails])
@aymeric-roucher (Collaborator, Author) commented Feb 6, 2025:

@albertvillanova I've created this new logic with final_answer_checks to force the agent to keep running while validation does not pass. It's quite handy for validation logic like "output should be a list", or "have an LLM verify your output against a pre-defined validation prompt and parse that LLM's output to find whether the test passes".
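A minimal sketch of the first kind of check mentioned above ("output should be a list"); the function name here is illustrative:

```python
def check_output_is_list(final_answer, agent_memory):
    # final_answer_checks callables receive the candidate answer and the
    # agent's memory; raising (e.g. via assert) makes the agent keep running.
    assert isinstance(final_answer, list), "Output should be a list!"
    return True


check_output_is_list([1, 2, 3], agent_memory=None)  # passes silently
try:
    check_output_is_list("not a list", agent_memory=None)
except AssertionError as e:
    print(e)  # Output should be a list!
```

The check would then be passed as `final_answer_checks=[check_output_is_list]` when constructing the agent, following the pattern shown in the test snippet above.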

@aymeric-roucher (Collaborator, Author) commented:

Example of such a check for a task that involves making a plot:

import os

from PIL import Image

# Helper imports: OpenAIServerModel, encode_image_base64 and make_image_url
# ship with smolagents; the exact module path may vary by release.
from smolagents import OpenAIServerModel
from smolagents.models import encode_image_base64, make_image_url


def check_reasoning_and_plot(final_answer, agent_memory):
    multimodal_model = OpenAIServerModel("gpt-4o")
    filepath = "saved_map.png"
    assert os.path.exists(filepath), "Make sure to save the plot under saved_map.png!"
    image = Image.open(filepath)
    prompt = (
        f"Here is a user-given task and the agent steps: {agent_memory.get_succinct_steps()}. "
        "Now here is the plot that was made. "
        "Please check that the reasoning process and plot are correct: do they correctly answer the given task? "
        "First list 3 reasons why yes/no, then write your final decision: PASS in caps lock if it is satisfactory, FAIL if it is not. "
        "Don't be harsh: if the plot mostly solves the task, it should pass. "
        "But if any data was hallucinated/invented, you should refuse it."
    )
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": make_image_url(encode_image_base64(image))}},
            ],
        }
    ]
    output = multimodal_model(messages).content
    print("Feedback: ", output)
    if "FAIL" in output:
        raise Exception(output)
    return True

@albertvillanova (Member) left a comment:

Thanks! Let's start sharing agents!

@aymeric-roucher aymeric-roucher merged commit 8ba036b into main Feb 6, 2025
4 checks passed