
Save raw responses for base models #284

Closed

Conversation

matthewcarbone

Currently, it seems that the full response metadata is not saved anywhere, either in the agent logs or in the model. I could be missing something, but from the way the response is parsed here, it seems that all of this information (e.g. content filter results) is lost during the calls. I would very much like to access this information as an extra safety layer. In addition, I think it would probably be good to allow users to access this, at least at the model level.

I've implemented this very simple change, which does seem to work in my local testing! Really nothing too crazy, and I'm happy to modify the PR in whatever way makes sense if the maintainers feel it needs additions/changes. Now, you can access the model's raw responses via model.raw_responses. I think it might make more sense to implement at the agent level, but I figure it's a start.

Added a raw_responses attribute to the smolagents models. This list is appended to at every call with the model's response object.
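
For example, with this change in place, the stored responses can be inspected after a run roughly like this (a sketch only; the fields available on each response object depend on which API backs the model, so the usage fields below are an assumption):

result = agent.run("Why is the sky blue?")

# Each entry in raw_responses is the unmodified response object returned by
# the underlying client for one LLM call (OpenAI-style object assumed here).
for response in model.raw_responses:
    print(response.model)                    # model name reported by the API
    print(response.usage.prompt_tokens)      # input tokens for this call
    print(response.usage.completion_tokens)  # output tokens for this call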
@aymeric-roucher
Collaborator

@matthewcarbone the LLM outputs are already all saved at the agent level under ActionStep.llm_output! Indeed storing them at the agent level as we do makes more sense IMO.

@matthewcarbone
Author

matthewcarbone commented Jan 21, 2025

@aymeric-roucher checking llm_output gives me None in my case; I've put the code below for convenience:

from smolagents import AzureOpenAIServerModel, MultiStepAgent

model = AzureOpenAIServerModel(
    model_id=...,
    api_key=api_key,
    api_version=...,
    azure_endpoint=base_url,
)
agent = MultiStepAgent(tools=[], model=model, add_base_tools=False, max_steps=1)
result = agent.run("Why is the sky blue?")

(see #282). agent.logs[-1].llm_output gives None in that case.

Also, I'm not specifically talking about the text output; I'm talking about the metadata from the call, which, in my case, is accessed via response.response_metadata and looks something like this:

{'token_usage': {'completion_tokens': 193,
  'prompt_tokens': 4903,
  'total_tokens': 5096,
  'completion_tokens_details': None,
  'prompt_tokens_details': None},
 'model_name': ...,
 'system_fingerprint': ...,
 'prompt_filter_results': [{'prompt_index': 0,
   'content_filter_results': {'hate': {'filtered': False, 'severity': 'safe'},
    'jailbreak': {'filtered': False, 'detected': False},
    'self_harm': {'filtered': False, 'severity': 'safe'},
    'sexual': {'filtered': False, 'severity': 'safe'},
    'violence': {'filtered': False, 'severity': 'safe'}}}],
 'finish_reason': 'stop',
 'logprobs': None,
 'content_filter_results': {'hate': {'filtered': False, 'severity': 'safe'},
  'protected_material_code': {'filtered': False, 'detected': False},
  'protected_material_text': {'filtered': False, 'detected': False},
  'self_harm': {'filtered': False, 'severity': 'safe'},
  'sexual': {'filtered': False, 'severity': 'safe'},
  'violence': {'filtered': False, 'severity': 'safe'}}}

It does not appear that the full response is logged anywhere!
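
Having access to this would let me add a simple extra safety check, roughly along these lines (a sketch only; the keys match the response_metadata dict above and will differ for other providers):

# Hypothetical safety check against the metadata shown above.
meta = response.response_metadata
filters = meta.get("content_filter_results", {})

# Collect any category the content filter actually filtered.
flagged = [name for name, result in filters.items() if result.get("filtered")]
if flagged:
    print(f"Content filter triggered for: {flagged}")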

@matthewcarbone
Author

@aymeric-roucher it also occurred to me that we could turn the other attributes (last_input_token_count and last_output_token_count) into properties, as those results are already contained in the new raw_responses attribute.

Let me know if you're on board, or if you feel I'm missing something!
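
Roughly what I have in mind, as a sketch (the usage field names assume an OpenAI-style response object and would need adapting per backend):

class Model:
    def __init__(self):
        # Proposed in this PR: every raw response object gets appended here.
        self.raw_responses = []

    @property
    def last_input_token_count(self):
        # Derived from the stored responses instead of tracked separately.
        if not self.raw_responses:
            return None
        return self.raw_responses[-1].usage.prompt_tokens

    @property
    def last_output_token_count(self):
        if not self.raw_responses:
            return None
        return self.raw_responses[-1].usage.completion_tokens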

aymeric-roucher mentioned this pull request on Jan 22, 2025
@matthewcarbone
Author

@aymeric-roucher moving the discussion back here so as to not bog down the separate discussion in #270 😄

> You're right @matthewcarbone! Then we also need a good json serializer with different levels to be able to store logs with the option to go to high detail or lower level detail.

I certainly agree, but I think the first step is to store everything at "maximum fidelity". The issue of serializing at different levels of fidelity could be challenging since the response format of every LLM API could be different, right?

How do you want to proceed here? I can try to modify this PR accordingly, and store things at the agent level, but that might be difficult to do without breaking changes at this stage. Thoughts?
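
On the differing response formats, a best-effort serializer might be enough to start with, something like this (purely illustrative, not existing smolagents code):

import json

def serialize_raw_response(response):
    # Best-effort "maximum fidelity" dump of an arbitrary provider response.
    for attempt in (
        lambda r: r.model_dump(),  # pydantic v2 objects (e.g. openai>=1.x clients)
        lambda r: r.to_dict(),     # SDKs that expose to_dict()
        lambda r: dict(r),         # plain mappings
    ):
        try:
            return attempt(response)
        except (AttributeError, TypeError):
            continue
    return str(response)           # last resort: lossy string repr

# Usage: json.dumps(serialize_raw_response(response), indent=2, default=str)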

@matthewcarbone
Author

@aymeric-roucher not trying to poke too much, but this repo is being developed at a breakneck pace and I just don't want this to get lost 😄

Any further thoughts on this? I think it would probably be easiest to store things at the model.py level for now, since storing one level higher would probably require some breaking changes.

@clefourrier
Member

Hey @matthewcarbone, I'm working on improving our logging system (to have a separate logger, etc.) and including your idea of "storing everything".

@matthewcarbone
Author

@clefourrier is there anything I can do to contribute? I'm strongly interested in learning the system here.

Also feel free to close this PR if you feel it conflicts with the changes you're going to make. No need for it to clog up the open PRs list 👍

@clefourrier
Member

I'll probably ask you to take a look once it feels nice if you've got the bandwidth :)

@matthewcarbone
Author

Sure, sounds good to me! Feel free to close this if you wish. If there's currently an open PR or issue discussing this, please do point me to it!

@clefourrier
Member

It's here but still a WIP.

@albertvillanova
Member

It seems the discussion is now in:

I'm closing this PR in favor of that. Please feel free to reopen if you think this could add something different. And thank you!
