
Automatically Feed Artifacts In Task Memory To LLM #1432

Closed
shhlife opened this issue Dec 12, 2024 · 11 comments


shhlife commented Dec 12, 2024

Describe the bug
I'm trying to use the FileManagerTool with an agent, telling it to load an image file. The image file exists, and I can load it with an ImageLoader, but I'd like to use the FileManagerTool to be more flexible and give the agent the ability to load files as it needs.

When I tell it to load the file I get the following message:

[12/13/24 05:24:37] INFO     ToolkitTask d27fb0c70c214443a9387ab5d85420a6
                             Input: Describe this file: images/banana_felt.png
[12/13/24 05:24:38] INFO     ToolkitTask d27fb0c70c214443a9387ab5d85420a6
                             Output: I am sorry, I cannot describe the image as I do not have the ability to work with  
                             image files. However, I can load the file if you want me to.

The FileManagerTool does have the ability to load other types of files besides text:

    loaders: dict[str, loaders.BaseLoader] = field(
        default=Factory(
            lambda self: {
                "application/pdf": loaders.PdfLoader(file_manager_driver=self.file_manager_driver),
                "text/csv": loaders.CsvLoader(file_manager_driver=self.file_manager_driver),
                "text": loaders.TextLoader(file_manager_driver=self.file_manager_driver),
                "image": loaders.ImageLoader(file_manager_driver=self.file_manager_driver),
                "application/octet-stream": BlobLoader(file_manager_driver=self.file_manager_driver),
            },
            takes_self=True,
        ),
    )
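For illustration, the loader lookup above keys on MIME types with a primary-type fallback (e.g. `image/png` falls back to the `image` key). Here is a minimal, hypothetical sketch of that dispatch logic using only the standard library — this is not Griptape's actual code, and `LOADER_KEYS` / `resolve_loader_key` are names invented for this example:

```python
# Hypothetical sketch (not Griptape's implementation) of resolving a file
# path to one of the loader keys shown in the excerpt above.
import mimetypes

LOADER_KEYS = ["application/pdf", "text/csv", "text", "image", "application/octet-stream"]

def resolve_loader_key(path: str) -> str:
    """Return the most specific matching loader key for a file path."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None:
        return "application/octet-stream"  # unknown type -> blob fallback
    if mime in LOADER_KEYS:                # exact match, e.g. application/pdf
        return mime
    primary = mime.split("/")[0]           # fall back to the primary type
    return primary if primary in LOADER_KEYS else "application/octet-stream"

print(resolve_loader_key("images/banana_felt.png"))  # image/png -> "image"
```

So a `.png` path should resolve to the `ImageLoader`, which is why the report above expects the tool to handle image files.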

To Reproduce

import os

from griptape.drivers import GooglePromptDriver
from griptape.structures import Agent
from griptape.tools import FileManagerTool

GOOGLE_API_KEY = os.environ["GOOGLE_API_KEY"]

agent = Agent(
    prompt_driver=GooglePromptDriver(
        api_key=GOOGLE_API_KEY, model="gemini-2.0-flash-exp"
    ),
    stream=True,
    tools=[FileManagerTool()],
)


agent.run("Describe this file: images/banana_felt.png")

Expected behavior
I expect the agent to be able to load and interpret files based on the capabilities of the model.

@shhlife shhlife added the bug label Dec 12, 2024
@collindutter (Member)

This is working as intended given the current implementation. You would need to use Task Memory (FileManagerTool(off_prompt=True)) and ImageQueryTool(off_prompt=False) (or future FileQueryTool).

I'm going to update this issue as an enhancement to make this possible without a secondary query tool.

@collindutter collindutter changed the title The FileManagerTool doesn't appear to be loading image files properly Automatically Feed Artifacts In Task Memory To LLM Dec 12, 2024
@collindutter (Member)

For future: this might be relevant to some of the Meta Memory refactors. CC @vasinov

@collindutter (Member)

collindutter commented Dec 12, 2024

Oh wait...we do support this! But only some models support it. Last I checked, Claude was the only model that could take an image input from a Tool. This works for me:

from griptape.drivers import AnthropicPromptDriver
from griptape.structures import Agent
from griptape.tools import FileManagerTool

agent = Agent(
    prompt_driver=AnthropicPromptDriver(model="claude-3-5-sonnet-20240620"),
    stream=True,
    tools=[FileManagerTool()],
)


agent.run("Describe this file: assets/mountain.jpg")

I can look into gemini flash but this might be out of our hands.

@collindutter (Member)

Neither OpenAI nor Gemini appears to support images coming from Tools, which means the best solution is to wire up the two steps with a Pipeline or Workflow. For instance:

from griptape.structures import Pipeline
from griptape.tasks import PromptTask, ToolTask
from griptape.tools import FileManagerTool

agent = Pipeline(
    tasks=[
        ToolTask(tool=FileManagerTool(), id="file"),
        PromptTask(lambda task: task.parent_outputs["file"]),
    ],
)


agent.run("Describe this file: assets/mountain.jpg")

@shhlife (Author)

shhlife commented Dec 12, 2024

woah.. so.. this is how I could "chat" with it..

import os

from dotenv import load_dotenv
from griptape.drivers import GooglePromptDriver
from griptape.structures import Pipeline
from griptape.tasks import PromptTask, ToolTask
from griptape.tools import FileManagerTool
from griptape.utils import Chat

load_dotenv()
GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")
agent = Pipeline(
    tasks=[
        ToolTask(tool=FileManagerTool(), id="file"),
        PromptTask(
            lambda task: task.parent_outputs["file"],
            prompt_driver=GooglePromptDriver(
                api_key=GOOGLE_API_KEY, model="gemini-2.0-flash-exp", stream=True
            ),
        ),
    ],
)


Chat(agent).start()

that.. actually kind of works.. I hadn't thought about chatting with a pipeline.

this is .. strange.. it obviously requires that any conversation I have with it involves looking up file information. Would switching ToolTask to ToolkitTask allow for a more natural conversation?

is there a better way?

@collindutter (Member)

Yes, I think you might prefer this pattern.

@shhlife (Author)

shhlife commented Dec 12, 2024

yeah, it's a better pattern - left a comment there about my lack of master lego builder status :)

@collindutter (Member)

@shhlife can we close this issue? Sounds like there is still some work regarding making patterns more discoverable, but I'd prefer to make that a separate issue.

@shhlife (Author)

shhlife commented Dec 16, 2024

That's so interesting that Claude can take an image input from a tool, but OpenAI can't.. but it does seem to be reading the image..

[12/17/24 07:29:27] INFO     Subtask ef18d3c8171e41ff91b074975a905370
                             Actions: [
                               {
                                 "tag": "call_o64ooCesRO2fzi6T9haZKyFP",
                                 "name": "FileManagerTool",
                                 "path": "load_files_from_disk",
                                 "input": {
                                   "values": {
                                     "paths": [
                                       "sample_files/user.png"
                                     ]
                                   }
                                 }
                               }
                             ]
                    INFO     Subtask ef18d3c8171e41ff91b074975a905370
                             Response: Image, format: png, size: 1182120 bytes
[12/17/24 07:29:28] INFO     ToolkitTask ce27e267d434487a8ccf3e7374576c0d
                             Output: The file `sample_files/user.png` is an image in PNG 
                             format with a size of 1,182,120 bytes.

It looks like the output is right - it read it.
so does it have to do with how we pass data back from subtasks?

@collindutter (Member)

collindutter commented Dec 16, 2024

The only thing openai saw was Image, format: png, size: 1182120 bytes. When I update OpenAiChatPromptDriver to behave how AnthropicPromptDriver does, I get:

{'error': {'message': "Invalid 'messages[3]'. Image URLs are only allowed for messages with role 'user', but this message with role 'tool' contains an image URL.", 'type': 'invalid_request_error', 'param': 'messages[3]', 'code': 'invalid_value'}}

More context on the topic: https://community.openai.com/t/returning-image-as-result-of-function-call-to-gpt-4-turbo/714903
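The error above boils down to a message-shape restriction: image content parts are only accepted in `user`-role messages, so an image produced by a tool cannot be sent back inside the `role='tool'` result. A common workaround (sketched below with plain dicts, no API call; `tool_result_messages` is a name invented for this illustration) is to put a text placeholder in the tool message and re-send the image as a follow-up user turn:

```python
# Illustrative sketch of the restriction: the tool message carries only
# text, while the image itself goes in a subsequent 'user' message.
def tool_result_messages(tool_call_id: str, image_url: str) -> list[dict]:
    return [
        # Tool result: text only, since images are rejected in this role.
        {
            "role": "tool",
            "tool_call_id": tool_call_id,
            "content": "Image loaded; see the next user message.",
        },
        # The image is attached to a 'user' message instead.
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        },
    ]

msgs = tool_result_messages("call_123", "data:image/png;base64,...")
print(msgs[1]["role"])  # -> user
```

This is essentially what the Pipeline approach earlier in the thread does structurally: the tool step's output is handed to a fresh prompt turn rather than embedded in the tool response.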

@shhlife (Author)

shhlife commented Dec 16, 2024

@collindutter that context was super helpful, thank you! Yeah, let's hope OpenAI allows images in tool calls, and also close this issue & make a separate issue about discoverability.

cheers!

@shhlife shhlife closed this as completed Dec 16, 2024