
Allow flatten_messages_as_text to be pushed via LiteLLMModel to fix Ollama integration #406

Merged
2 commits merged into huggingface:main on Feb 3, 2025

Conversation

sysradium
Contributor

@sysradium commented Jan 28, 2025

If a structured message is sent to ollama using:

model = LiteLLMModel(
    model_id="ollama_chat/mistral",
    api_base="http://127.0.0.1:11434",
    num_ctx=8192,
)

I receive the following error:

Error in generating model output:
litellm.APIConnectionError: Ollama_chatException - Client error '400 Bad Request' for url 'http://localhost:11434/api/chat'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/400

The problem is that by default a structured message is sent, while Ollama expects it to be flat (a sketch of both message shapes follows at the end of this comment).
So to fix that, I decided to use the existing message flattening:

model = LiteLLMModel(
    model_id="ollama_chat/mistral",
    api_base="http://127.0.0.1:11434",
    num_ctx=8192,
    flatten_messages_as_text=True,  # now you can pass this when using Ollama
)

Maybe not the best solution, but it worked for me.
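
For context, here is a rough sketch of the two message shapes involved (an illustrative assumption about the payloads, not code taken from this PR):

# Illustration only: the structured shape sent by default vs. the flattened
# shape that Ollama's /api/chat endpoint accepts.

structured_message = {
    "role": "user",
    # content as a list of typed parts, as VLM chat templates expect
    "content": [{"type": "text", "text": "Hello!"}],
}

flattened_message = {
    "role": "user",
    # content as a plain string
    "content": "Hello!",
}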

@sysradium marked this pull request as ready for review January 28, 2025 23:22
@SixK

SixK commented Jan 29, 2025

Tested the fix and it seems OK.
Thanks for the fix.

Contributor

@merveenoyan left a comment


It seems a bit odd to me to add this as a class attribute. This argument and its related utilities were introduced because LLM and VLM chat templates differ in transformers, and the argument indicates how to format messages before passing them to the model.
Anyhow, if this completely fixes things for both LLMs and VLMs, I'd rather not make it a class attribute but simply always pass True.

But it's up to @aymeric-roucher to decide.

@sysradium
Contributor Author

@merveenoyan I was hoping it could be injected via kwargs here: https://github.com/huggingface/smolagents/blob/main/src/smolagents/models.py#L683
But there was no way to manipulate them.

I don't know why I didn't make it private (self._flatten_messages_as_text), but even if I did, I agree that it is a bit awkward. I also wasn't sure whether hardcoding the value to True would break anything.

@aymeric-roucher
Collaborator

aymeric-roucher commented Jan 29, 2025

The key distinction here, and the reason for the errors, is that VLMs need a list in "content" (like [{"type": "text", "text": "Hello!"}, {"type": "image", "image": base64_image}]) while LLMs need a string in "content".

We solved this issue for TransformersModel by auto-detecting the model type, checking whether it has a .processor attribute. Then, if the model is an LLM (self.is_vlm is False), the parameter flatten_messages_as_text is always set to True.

Maybe there could be a way to do this for Ollama models as well? We could even perform this detection by checking whether the Transformers version of the model has a tokenizer or a processor attribute, but loading the Transformers model is heavy, so I'm not sure we can easily check for the existence of a processor.
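
For illustration, a minimal sketch of what such processor-based detection could look like (the helper name should_flatten_messages is hypothetical, and this still requires the model id to resolve on the Hugging Face Hub, which an Ollama id like ollama_chat/mistral does not):

from transformers import AutoProcessor


def should_flatten_messages(model_id: str) -> bool:
    """Heuristic sketch: flatten for text-only models, keep structured content for VLMs."""
    try:
        processor = AutoProcessor.from_pretrained(model_id)
    except Exception:
        # No processor config could be resolved: assume a plain LLM.
        return True
    # Text-only checkpoints typically resolve to a tokenizer here, which has no
    # image_processor attribute; multimodal processors usually expose one.
    return not hasattr(processor, "image_processor")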

@sysradium force-pushed the fix-ollama-integration branch from 59bb664 to 4b54dfa on January 29, 2025
@sysradium
Contributor Author

sysradium commented Jan 29, 2025

@aymeric-roucher I think it might be a bit too error-prone if we add detection of this sort. At least I don't see a reliable way of doing it :/

So either we go with a separate Ollama model class (#368) or with the hack we have in this PR.

@RolandJAAI
Contributor

RolandJAAI commented Jan 29, 2025

@aymeric-roucher This fixes most Ollama-related issues for non-vision models (tested e.g. #264, which works), but using VLMs with Ollama and actually passing images does not seem to work with either value of flatten_messages_as_text. I tested both endpoints, ollama_chat/llava and ollama/llava, with the smolagents dev version and this fix, and got different errors depending on the endpoint and the value of flatten_messages_as_text, but no success, while Claude works fine. @sysradium does it work for you with images?

Since, as currently released, Ollama does not work at all without this fix (ollama_chat throws a 400; ollama/ works for messages but the tools fail), you could even do if "ollama" in self.model_id to decide whether to set self.flatten_messages_as_text=True, which would make the hack complete and prevent more issues from being raised. An Ollama model class might be a good idea to get rid of the hack next, or a PR in LiteLLM.

@aymeric-roucher
Collaborator

Thank you @RolandJAAI for the additional context. I agree with the proposed solution: if "ollama" in self.model_id, set flatten_messages_as_text to True.

@touseefahmed96
Contributor

@aymeric-roucher so there is no need for a separate Ollama model class (#368)?

@sysradium force-pushed the fix-ollama-integration branch from 4b54dfa to ef286ef on February 2, 2025
@sysradium
Contributor Author

@RolandJAAI I updated the PR to check the model_id. I just decided to use startswith instead of in (see the sketch below).
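
For illustration, a minimal sketch of that default (attribute and parameter names here are assumptions and may differ from the merged code):

class LiteLLMModel:
    def __init__(self, model_id: str, **kwargs):
        self.model_id = model_id
        # startswith rather than `in`: only ids using the Ollama provider
        # prefix ("ollama/..." or "ollama_chat/...") trigger flattening,
        # not any model id that merely contains the substring "ollama".
        self.flatten_messages_as_text = model_id.startswith("ollama")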

@merveenoyan
Contributor

@RolandJAAI can you open a reproducible issue and assign it to me? I'll look into it.

@albertvillanova
Member

albertvillanova commented Feb 3, 2025

Thinking about this issue and how it has appeared for several models in the past, I would suggest:

  • propose some automatic fixes under the hood, so users don't need to worry about this issue
    • as we did for vision/text transformers models
    • as we could also do for ollama models now
  • but at the same time, always allow users to customize the flatten_messages_as_text variable
    • this will be helpful for other models in the future that might be problematic
    • users always have a workaround (setting its value manually) instead of having to wait for the next smolagents patch release

What do you think?

@sysradium
Contributor Author

sysradium commented Feb 3, 2025

@albertvillanova a combined option could be:

def __init__(
    self,
    # ...
    _flatten_messages_as_text: Optional[bool] = None,
):
    self._flatten_messages_as_text = (
        _flatten_messages_as_text
        if _flatten_messages_as_text is not None
        else self.model_id.startswith("ollama")
    )

def __call__(
    self,
    messages,
    # ...
    **kwargs,
) -> ChatMessage:
    import litellm
    completion_kwargs = self._prepare_completion_kwargs(
        messages=messages,
        # ...
        convert_images_to_image_urls=True,
        flatten_messages_as_text=self._flatten_messages_as_text,
        custom_role_conversions=self.custom_role_conversions,
        **kwargs,
    )

That would let the user override it; otherwise autodetection based on the model_id is used. This could also be extracted so that users can override it in a subclass (using a property or a function call):

def __call__(self, messages, **kwargs) -> ChatMessage:
    import litellm
    completion_kwargs = self._prepare_completion_kwargs(
        messages=messages,
        # ...
        convert_images_to_image_urls=True,
        flatten_messages_as_text=self._flatten_messages_as_text,
        **kwargs,
    )

@property
def _flatten_messages_as_text(self) -> bool:
    return (
        self._force_message_flattening
        if self._force_message_flattening is not None
        else self.model_id.startswith("ollama")
    )

@touseefahmed96
Contributor

touseefahmed96 commented Feb 3, 2025

@albertvillanova Wouldn't a separate OllamaModel class be a good option? #368

Member

@albertvillanova left a comment


Thanks for addressing this. Let's merge it as is for a patch release and leave a more robust solution for a subsequent PR.

@albertvillanova merged commit 6d72ea7 into huggingface:main on Feb 3, 2025
3 checks passed
@albertvillanova
Member

@touseefahmed96, in principle, as a general rule, I would say a dedicated class might be necessary only for different model frameworks, and not for each model ID, so that we avoid having too many model classes. But let's continue this discussion in your PR!

@albertvillanova linked an issue on Feb 3, 2025 that may be closed by this pull request
@tolkienist42

Just updated to 1.6.0 with

pip install smolagents --upgrade

I also updated litellm the same way, and updated to the latest ollama.

Now if I use ollama_chat it will initially connect, but all the functionality is broken and it throws a bunch of errors. In the end, the only way I can get it to work is to get rid of the '_chat' part.

So basically, it's now a little bit improved, but I wouldn't consider this bug fixed? Unless I am doing something super wrong.

@SixK

SixK commented Feb 4, 2025

1.6.0 is more than a week old.
The latest version, 1.7.0, was released 4 days ago.
The fix is 17 hours old, so it may be released in 1.8.0.

@sysradium
Contributor Author

sysradium commented Feb 4, 2025

@tolkienist42 my bugfix hasn't made it to the release yet: v1.7.0...6d72ea7

@sysradium deleted the fix-ollama-integration branch on February 4, 2025
Successfully merging this pull request may close these issues.

v1.5.0 not working with ollama