
Save raw responses for base models #284

Closed

Conversation

matthewcarbone

Currently, it seems that the full response metadata is not saved anywhere, either in the agent logs or in the model. I could be missing something, but from the way the response is parsed here, it seems that all of this information (e.g. content filter results) is lost during the calls. I would very much like to access this information as an extra safety layer. In addition, I think it would probably be good to allow users to access this, at least at the model level.

I've implemented this very simple change, which does seem to work in my local testing! Really nothing too crazy, and I'm happy to modify the PR in whatever way makes sense if the maintainers feel it needs additions/changes. Now, you can access the model's raw responses via model.raw_responses. I think it might make more sense to implement at the agent level, but I figure it's a start.

Added a raw_responses attribute to the smolagents models. This list is appended to at every call with the model's response object.
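
For example, with this change in place, the stored responses can be inspected after a run roughly like this (a sketch only; the fields available on each response object depend on which API backs the model, so the usage fields below are an assumption):

result = agent.run("Why is the sky blue?")

# Each entry in raw_responses is the unmodified response object returned by
# the underlying client for one LLM call (OpenAI-style object assumed here).
for response in model.raw_responses:
    print(response.model)                    # model name reported by the API
    print(response.usage.prompt_tokens)      # input tokens for this call
    print(response.usage.completion_tokens)  # output tokens for this call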
@aymeric-roucher
Collaborator

@matthewcarbone the LLM outputs are already all saved at the agent level under ActionStep.llm_output! Indeed storing them at the agent level as we do makes more sense IMO.

@matthewcarbone
Author

matthewcarbone commented Jan 21, 2025

@aymeric-roucher checking llm_output gives me None in my case; I've put the code below for convenience:

from smolagents import AzureOpenAIServerModel, MultiStepAgent

model = AzureOpenAIServerModel(
    model_id=...,
    api_key=api_key,
    api_version=...,
    azure_endpoint=base_url,
)
agent = MultiStepAgent(tools=[], model=model, add_base_tools=False, max_steps=1)
result = agent.run("Why is the sky blue?")

(see #282). agent.logs[-1].llm_output gives None in that case.

Also, I'm not specifically talking about the text output; I'm talking about the metadata from the call, which, in my case, is accessed via response.response_metadata and looks something like this:

{'token_usage': {'completion_tokens': 193,
  'prompt_tokens': 4903,
  'total_tokens': 5096,
  'completion_tokens_details': None,
  'prompt_tokens_details': None},
 'model_name': ...,
 'system_fingerprint': ...,
 'prompt_filter_results': [{'prompt_index': 0,
   'content_filter_results': {'hate': {'filtered': False, 'severity': 'safe'},
    'jailbreak': {'filtered': False, 'detected': False},
    'self_harm': {'filtered': False, 'severity': 'safe'},
    'sexual': {'filtered': False, 'severity': 'safe'},
    'violence': {'filtered': False, 'severity': 'safe'}}}],
 'finish_reason': 'stop',
 'logprobs': None,
 'content_filter_results': {'hate': {'filtered': False, 'severity': 'safe'},
  'protected_material_code': {'filtered': False, 'detected': False},
  'protected_material_text': {'filtered': False, 'detected': False},
  'self_harm': {'filtered': False, 'severity': 'safe'},
  'sexual': {'filtered': False, 'severity': 'safe'},
  'violence': {'filtered': False, 'severity': 'safe'}}}

It does not appear that the full response is logged anywhere!
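
Having access to this would let me add a simple extra safety check, roughly along these lines (a sketch only; the keys match the response_metadata dict above and will differ for other providers):

# Hypothetical safety check against the metadata shown above.
meta = response.response_metadata
filters = meta.get("content_filter_results", {})

# Collect any category the content filter actually filtered.
flagged = [name for name, result in filters.items() if result.get("filtered")]
if flagged:
    print(f"Content filter triggered for: {flagged}")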

@matthewcarbone
Author

@aymeric-roucher it also occurred to me that we could turn the other attributes (last_input_token_count and last_output_token_count) into properties, as those results are already contained in the new raw_responses attribute.

Let me know if you're on board, or if you feel I'm missing something!
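
Roughly what I have in mind, as a sketch (the usage field names assume an OpenAI-style response object and would need adapting per backend):

class Model:
    def __init__(self):
        # Proposed in this PR: every raw response object gets appended here.
        self.raw_responses = []

    @property
    def last_input_token_count(self):
        # Derived from the stored responses instead of tracked separately.
        if not self.raw_responses:
            return None
        return self.raw_responses[-1].usage.prompt_tokens

    @property
    def last_output_token_count(self):
        if not self.raw_responses:
            return None
        return self.raw_responses[-1].usage.completion_tokens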

aymeric-roucher mentioned this pull request on Jan 22, 2025
@matthewcarbone
Author

@aymeric-roucher moving the discussion back here so as to not bog down the separate discussion in #270 😄

> You're right @matthewcarbone! Then we also need a good json serializer with different levels to be able to store logs with the option to go to high detail or lower level detail.

I certainly agree, but I think the first step is to store everything at "maximum fidelity". The issue of serializing at different levels of fidelity could be challenging since the response format of every LLM API could be different, right?

How do you want to proceed here? I can try to modify this PR accordingly, and store things at the agent level, but that might be difficult to do without breaking changes at this stage. Thoughts?
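
On the differing response formats, a best-effort serializer might be enough to start with, something like this (purely illustrative, not existing smolagents code):

import json

def serialize_raw_response(response):
    # Best-effort "maximum fidelity" dump of an arbitrary provider response.
    for attempt in (
        lambda r: r.model_dump(),  # pydantic v2 objects (e.g. openai>=1.x clients)
        lambda r: r.to_dict(),     # SDKs that expose to_dict()
        lambda r: dict(r),         # plain mappings
    ):
        try:
            return attempt(response)
        except (AttributeError, TypeError):
            continue
    return str(response)           # last resort: lossy string repr

# Usage: json.dumps(serialize_raw_response(response), indent=2, default=str)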

@matthewcarbone
Author

@aymeric-roucher not trying to poke too much, but this repo is being developed at a breakneck pace and I just don't want this to get lost 😄

Any further thoughts on this? I think it would probably be easiest to store things at the model.py level for now, since storing one level higher would probably require some breaking changes.

@clefourrier
Member

Hey @matthewcarbone, I'm working on improving our logging system (to have a separate logger, etc.) and including your idea of "storing everything".

@matthewcarbone
Author

@clefourrier is there anything I can do to contribute? I'm strongly interested in learning the system here.

Also feel free to close this PR if you feel it conflicts with the changes you're going to make. No need for it to clog up the open PRs list 👍

@clefourrier
Member

I'll probably ask you to take a look once it feels nice if you've got the bandwidth :)

@matthewcarbone
Author

Sure, sounds good to me! Feel free to close this if you wish. If there's currently an open PR or issue discussing this, please do point me to it!

@clefourrier
Member

It's here but still a WIP.

@albertvillanova
Member

It seems the discussion is now in:

I'm closing this PR in favor of that. Please feel free to reopen if you think this could add something different. And thank you!
