Weird tool call arguments, resulting in UnexpectedModelBehaviour / validation error #81

Open · intellectronica opened this issue Nov 21, 2024 · 11 comments

@intellectronica

See https://github.com/intellectronica/pydantic-ai-experiments/blob/main/scratch.ipynb

Traceback (most recent call last):
  File "/usr/local/python/3.12.1/lib/python3.12/site-packages/pydantic_ai/_result.py", line 189, in validate
    result = self.type_adapter.validate_json(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/python/3.12.1/lib/python3.12/site-packages/pydantic/type_adapter.py", line 425, in validate_json
    return self.validator.validate_json(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pydantic_core._pydantic_core.ValidationError: 2 validation errors for Question
reflection
  Field required [type=missing, input_value={'_': {'reflection': "The...n': 'Is it an animal?'}}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.10/v/missing
question
  Field required [type=missing, input_value={'_': {'reflection': "The...n': 'Is it an animal?'}}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.10/v/missing

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/python/3.12.1/lib/python3.12/site-packages/pydantic_ai/agent.py", line 654, in _handle_model_response
    result_data = result_tool.validate(call)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/python/3.12.1/lib/python3.12/site-packages/pydantic_ai/_result.py", line 203, in validate
    raise ToolRetryError(m) from e
pydantic_ai._result.ToolRetryError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/python/3.12.1/lib/python3.12/site-packages/pydantic_ai/agent.py", line 181, in run
    either = await self._handle_model_response(model_response, deps)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/python/3.12.1/lib/python3.12/site-packages/pydantic_ai/agent.py", line 657, in _handle_model_response
    self._incr_result_retry()
  File "/usr/local/python/3.12.1/lib/python3.12/site-packages/pydantic_ai/agent.py", line 751, in _incr_result_retry
    raise exceptions.UnexpectedModelBehavior(
pydantic_ai.exceptions.UnexpectedModelBehavior: Exceeded maximum retries (1) for result validation

See the prefixed "_"? It's not there in earlier calls. Possibly a hallucination.
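
To make the mismatch concrete: the arguments the model returned are wrapped in an extra "_" key, while the Question result tool expects the fields at the top level (values abbreviated from the error message above):

# What the model sent back (abbreviated) - note the spurious "_" wrapper:
bad_args = {'_': {'reflection': 'The ...', 'question': 'Is it an animal?'}}

# What the Question schema actually expects:
good_args = {'reflection': 'The ...', 'question': 'Is it an animal?'}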

I think this can be avoided with strict mode. It would be great to have it as an option for OpenAI calls.
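
For reference, this is roughly what strict mode looks like when calling the OpenAI Chat Completions API directly, outside pydantic-ai; the tool and field names below are illustrative, not pydantic-ai internals. Strict mode requires "strict": True on the function, every property listed in "required", and "additionalProperties": False:

from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model='gpt-4o',
    messages=[{'role': 'user', 'content': 'Ask the next question'}],
    tools=[{
        'type': 'function',
        'function': {
            'name': 'final_result',  # illustrative tool name
            'description': 'Return the next question',
            'strict': True,  # server-side schema enforcement
            'parameters': {
                'type': 'object',
                'properties': {
                    'reflection': {'type': 'string'},
                    'question': {'type': 'string'},
                },
                'required': ['reflection', 'question'],
                'additionalProperties': False,
            },
        },
    }],
)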

@intellectronica (Author)

@samuelcolvin ^^^^^

@samuelcolvin (Member)

Weird, not sure what's going on. I ran your code and it worked first time.

I ran it directly as a script, and it worked fine:

from enum import Enum
from textwrap import dedent
from typing import List

from pydantic import BaseModel, Field
from pydantic_ai import Agent, CallContext


class Question(BaseModel):
    reflection: str = Field(..., description='Considering the questions and answers so far, what are things we can ask next?')
    question: str = Field(..., description='The question to ask the other player')


asking_agent = Agent('openai:gpt-4o', result_type=Question)


@asking_agent.system_prompt
async def asking_agent_system_prompt(ctx: CallContext[List]) -> str:
    turns = ctx.deps
    prompt = dedent(f"""
        You are playing a game of 20 questions.
        You are trying to guess the object the other player is thinking of.
        In each turn, you can ask a yes or no question.
        The other player will answer with "yes", "no".
    """).strip()
    if len(turns) > 0:
        prompt += f"\nHere are the questions you have asked so far and the answers you have received:\n"
        prompt += '\n'.join([' * ' + turn for turn in turns])
    return prompt


class Answer(str, Enum):
    YES = 'yes'
    NO = 'no'
    YOU_WIN = 'you win'


class AnswerResponse(BaseModel):
    reflection: str = Field(..., description=(
        'Considering the question, what is the answer? '
        'Is it "yes" or "no"? Or did they guess the '
        'object and the answer is "you win"?'))
    answer: Answer = Field(..., description='The answer to the question - "yes", "no", or "you win"')


answering_agent = Agent('openai:gpt-4o', result_type=AnswerResponse)


@answering_agent.system_prompt
async def answering_agent_system_prompt(ctx: CallContext[str]) -> str:
    prompt = dedent(f"""
        You are playing a game of 20 questions.
        The other player is trying to guess the object you are thinking of.
        The object you are thinking of is: {ctx.deps}.
        Answer with "yes" or "no", or "you win" if the other player has guessed the object.
    """).strip()
    return prompt


def twenty_questions(mystery_object):
    turns = []
    while True:
        # Ask the next question, passing the transcript of previous turns as deps.
        question = asking_agent.run_sync('Ask the next question', deps=turns).data.question
        # Answer it, passing the mystery object as deps.
        answer = answering_agent.run_sync(question, deps=mystery_object).data.answer.value
        if answer == Answer.YOU_WIN:
            print('You Win!')
            break
        elif len(turns) >= 20:
            print('You Lose!')
            break
        else:
            turns.append(f'{question} - {answer}')
            print(f'{len(turns)}. QUESTION: {question}\nANSWER: {answer}\n')


twenty_questions('a cat')

output:

1. QUESTION: Is it something commonly found indoors?
ANSWER: yes

2. QUESTION: Does it use electricity?
ANSWER: no

3. QUESTION: Is it used for storage?
ANSWER: no

4. QUESTION: Is it used for entertainment purposes?
ANSWER: no

5. QUESTION: Is it used for cleaning?
ANSWER: no

6. QUESTION: Is it a piece of furniture?
ANSWER: no

7. QUESTION: Is it used for writing or drawing?
ANSWER: no

8. QUESTION: Is it used for personal grooming or hygiene?
ANSWER: no

9. QUESTION: Is it used in the kitchen?
ANSWER: no

10. QUESTION: Is it related to health or safety?
ANSWER: no

11. QUESTION: Is it used for decoration?
ANSWER: no

12. QUESTION: Is it used for organizing?
ANSWER: no

13. QUESTION: Is it used for communication?
ANSWER: no

14. QUESTION: Is it used for comfort or relaxation?
ANSWER: yes

15. QUESTION: Is it something you can wear indoors?
ANSWER: no

16. QUESTION: Is it something you can sit or lie on?
ANSWER: no

17. QUESTION: Is it something you can hold or carry? 
ANSWER: yes

18. QUESTION: Is it a textile item like a pillow or a blanket?
ANSWER: no

19. QUESTION: Is it used to provide warmth?
ANSWER: no

20. QUESTION: Is it something you use to hold or support things?
ANSWER: no

You Lose!

@intellectronica (Author) commented Nov 21, 2024 via email

@samuelcolvin (Member)

Thanks, yup I'll look into it.

@sydney-runkle added the "bug" label on Dec 5, 2024
@samuelcolvin removed the "bug" label on Dec 18, 2024
@samuelcolvin (Member)

@sydney-runkle we should add strict to model settings.

@sydney-runkle (Contributor)

@samuelcolvin, strict in what sense? Like for pydantic validation?

@intellectronica (Author) commented Dec 20, 2024 via email

@intellectronica (Author)

Here's how I make it work, with a subclass of OpenAIModel and a redefinition of the static method that prepares the function calls: https://gist.github.com/intellectronica/9b190aca94bf4372c4b08e8b016922ec

Not sure how this could be passed via model_settings, since they're not available at this point.
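
For anyone else landing here, a minimal sketch of that workaround, assuming the static method is called _map_tool_definition and that the tool definition exposes name, description, and parameters_json_schema; those names may differ between pydantic-ai versions, so check the OpenAIModel source of the version you have installed:

from pydantic_ai.models.openai import OpenAIModel


class StrictOpenAIModel(OpenAIModel):
    @staticmethod
    def _map_tool_definition(f):  # method name assumed, see note above
        return {
            'type': 'function',
            'function': {
                'name': f.name,
                'description': f.description,
                'strict': True,  # opt in to OpenAI's strict schema enforcement
                'parameters': f.parameters_json_schema,  # attribute name assumed
            },
        }

Note that OpenAI's strict mode also requires the parameters schema itself to list every property as required and to set additionalProperties to false, so the generated JSON schema may need adjusting as well.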

@dmontagu (Contributor) commented Jan 2, 2025

We could change the relevant method from being a staticmethod to being an instance method which has access to the settings. Probably makes sense to do that either way for the sake of overriding.

That said, I'll note that strict mode has some unfortunate limitations — at the very least, it seems to be incompatible with types that have additionalProperties in the JSON schema, such as dict[str, int], even when it isn't at the top level.

Because of that, I don't think we can use strict mode by default, though I do think it should be possible to plumb it through as an opt-in parameter if you don't need/want to use JSON schemas with additionalProperties.

Would it work for you to turn it on always (via model configuration)? If not, it may be possible to make this work on a per-tool basis, though in that case there would be some awkward API decisions to be made. (I.e., do we add a strict option to Tool, despite not necessarily being a "standard" thing across models?)

(I'll also note that we explicitly confirmed that you can get reasonable responses from OpenAI making use of JSON schemas with additionalProperties, at least for appropriate prompts, despite getting an error message about it when you turn strict mode on.)
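
To make the additionalProperties limitation concrete, here is a small sketch of the schema Pydantic generates for a dict[str, int] field; strict mode only accepts "additionalProperties": false, so a schema like this gets rejected:

from pydantic import BaseModel


class Scores(BaseModel):
    # Free-form mappings are described via additionalProperties in JSON schema.
    points: dict[str, int]


print(Scores.model_json_schema())
# Produces (roughly):
# {'properties': {'points': {'additionalProperties': {'type': 'integer'},
#                            'title': 'Points', 'type': 'object'}},
#  'required': ['points'], 'title': 'Scores', 'type': 'object'}
# OpenAI strict mode rejects this, since it only allows 'additionalProperties': false.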

@intellectronica (Author)

  • Agree that strict shouldn't be the default.
  • I think it's worth having it available as a setting for Agent, Tool, and Run, though, for models that support it. In my experience it can provide meaningful improvements in reliability, resulting in code that is focused on the task, rather than on guardrails, checks, and retries. In my own work and when advising other developers I tend to start without strict but introduce it (including making the necessary changes to the structure) if generating the desired structure proves unreliable. It is also useful in very small projects where you're not going to invest in comprehensive evals and just want to get a result you can rely on.
  • Given that it's quite easy to get it to work with just a little bit of custom code I don't think having it as a built-in setting is high priority.

@dmontagu (Contributor) commented Jan 3, 2025

I think it's a reasonable idea to add strict as a field on ModelSettings (which will work for Agent and Agent.run) and as a keyword argument to Agent.tool.

Note that we'll probably need to add some way to do error handling if we get an error response from the model due to it failing (server-side) to generate compatible JSON. I'm not sure what such errors will look like in general, and obviously it's likely to be somewhat model-specific. (Potentially closely related, we should add better error handling of "refusal" responses.) I don't think that necessarily needs to be a requirement in an initial implementation of this functionality though.

PR welcome, or we'll get to it eventually.
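
In the meantime, callers can at least catch the failure shown in the original traceback themselves; a small sketch using the asking_agent defined earlier in this thread:

from pydantic_ai.exceptions import UnexpectedModelBehavior

try:
    result = asking_agent.run_sync('Ask the next question', deps=[])
except UnexpectedModelBehavior as exc:
    # Raised once the model has exhausted its retries producing
    # arguments that fail result validation (as in the traceback above).
    print(f'Giving up: {exc}')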
