
Automatically Feed Artifacts In Task Memory To LLM #1432

Closed
shhlife opened this issue Dec 12, 2024 · 11 comments


shhlife commented Dec 12, 2024

Describe the bug
I'm trying to use the FileManagerTool with an agent, telling it to load an image file. The image file exists, and I can load it with an ImageLoader, but I'd like to use the FileManagerTool to be more flexible and give the agent the ability to load files as it needs.

When I tell it to load the file I get the following message:

[12/13/24 05:24:37] INFO     ToolkitTask d27fb0c70c214443a9387ab5d85420a6
                             Input: Describe this file: images/banana_felt.png
[12/13/24 05:24:38] INFO     ToolkitTask d27fb0c70c214443a9387ab5d85420a6
                             Output: I am sorry, I cannot describe the image as I do not have the ability to work with  
                             image files. However, I can load the file if you want me to.

The FileManagerTool does have the ability to load other types of files besides text:

    loaders: dict[str, loaders.BaseLoader] = field(
        default=Factory(
            lambda self: {
                "application/pdf": loaders.PdfLoader(file_manager_driver=self.file_manager_driver),
                "text/csv": loaders.CsvLoader(file_manager_driver=self.file_manager_driver),
                "text": loaders.TextLoader(file_manager_driver=self.file_manager_driver),
                "image": loaders.ImageLoader(file_manager_driver=self.file_manager_driver),
                "application/octet-stream": BlobLoader(file_manager_driver=self.file_manager_driver),
            },
            takes_self=True,
        ),
    )
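For illustration, the loader lookup above keys on MIME types with a primary-type fallback (e.g. `image/png` falls back to the `image` key). Here is a minimal, hypothetical sketch of that dispatch logic using only the standard library — this is not Griptape's actual code, and `LOADER_KEYS` / `resolve_loader_key` are names invented for this example:

```python
# Hypothetical sketch (not Griptape's implementation) of resolving a file
# path to one of the loader keys shown in the excerpt above.
import mimetypes

LOADER_KEYS = ["application/pdf", "text/csv", "text", "image", "application/octet-stream"]

def resolve_loader_key(path: str) -> str:
    """Return the most specific matching loader key for a file path."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None:
        return "application/octet-stream"  # unknown type -> blob fallback
    if mime in LOADER_KEYS:                # exact match, e.g. application/pdf
        return mime
    primary = mime.split("/")[0]           # fall back to the primary type
    return primary if primary in LOADER_KEYS else "application/octet-stream"

print(resolve_loader_key("images/banana_felt.png"))  # image/png -> "image"
```

So a `.png` path should resolve to the `ImageLoader`, which is why the report above expects the tool to handle image files.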

To Reproduce

import os

from griptape.drivers import GooglePromptDriver
from griptape.structures import Agent
from griptape.tools import FileManagerTool

GOOGLE_API_KEY = os.environ["GOOGLE_API_KEY"]

agent = Agent(
    prompt_driver=GooglePromptDriver(
        api_key=GOOGLE_API_KEY, model="gemini-2.0-flash-exp"
    ),
    stream=True,
    tools=[FileManagerTool()],
)


agent.run("Describe this file: images/banana_felt.png")

Expected behavior
I expect the agent to be able to load and interpret files based on the capabilities of the model.

@shhlife shhlife added the bug label Dec 12, 2024
@collindutter (Member)

This is working as intended given the current implementation. You would need to use Task Memory (FileManagerTool(off_prompt=True)) and ImageQueryTool(off_prompt=False) (or future FileQueryTool).

I'm going to update this issue as an enhancement to make this possible without a secondary query tool.

@collindutter collindutter changed the title The FileManagerTool doesn't appear to be loading image files properly Automatically Feed Artifacts In Task Memory To LLM Dec 12, 2024
@collindutter (Member)

For future: this might be relevant to some of the Meta Memory refactors. CC @vasinov

@collindutter (Member)

collindutter commented Dec 12, 2024

Oh wait...we do support this! But only some models support it. Last I checked, Claude was the only model that could take an image input from a Tool. This works for me:

from griptape.drivers import AnthropicPromptDriver
from griptape.structures import Agent
from griptape.tools import FileManagerTool

agent = Agent(
    prompt_driver=AnthropicPromptDriver(model="claude-3-5-sonnet-20240620"),
    stream=True,
    tools=[FileManagerTool()],
)


agent.run("Describe this file: assets/mountain.jpg")

I can look into gemini flash but this might be out of our hands.

@collindutter (Member)

Neither OpenAI nor Gemini appears to support images coming from Tools, which means the best solution is to wire up the two steps with a Pipeline or Workflow. For instance:

from griptape.structures import Pipeline
from griptape.tasks import PromptTask, ToolTask
from griptape.tools import FileManagerTool

agent = Pipeline(
    tasks=[
        ToolTask(tool=FileManagerTool(), id="file"),
        PromptTask(lambda task: task.parent_outputs["file"]),
    ],
)


agent.run("Describe this file: assets/mountain.jpg")

@shhlife (Author)

shhlife commented Dec 12, 2024

woah.. so.. this is how I could "chat" with it..

import os

from dotenv import load_dotenv
from griptape.drivers import GooglePromptDriver
from griptape.structures import Pipeline
from griptape.tasks import PromptTask, ToolTask
from griptape.tools import FileManagerTool
from griptape.utils import Chat

load_dotenv()
GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")
agent = Pipeline(
    tasks=[
        ToolTask(tool=FileManagerTool(), id="file"),
        PromptTask(
            lambda task: task.parent_outputs["file"],
            prompt_driver=GooglePromptDriver(
                api_key=GOOGLE_API_KEY, model="gemini-2.0-flash-exp", stream=True
            ),
        ),
    ],
)


Chat(agent).start()

that.. actually kind of works.. I hadn't thought about chatting with a pipeline.

this is .. strange.. it obviously requires that any conversation I have with it involves looking up file information. Would switching ToolTask to ToolkitTask allow for a more natural conversation?

is there a better way?

@collindutter (Member)

Yes, I think you might prefer this pattern.

@shhlife (Author)

shhlife commented Dec 12, 2024

yeah, it's a better pattern - left a comment there about my lack of master lego builder status :)

@collindutter (Member)

@shhlife can we close this issue? Sounds like there is still some work regarding making patterns more discoverable, but I'd prefer to make that a separate issue.

@shhlife (Author)

shhlife commented Dec 16, 2024

That's so interesting that Claude can take an image input from a tool, but OpenAI can't.. but it does seem to be reading the image..

[12/17/24 07:29:27] INFO     Subtask ef18d3c8171e41ff91b074975a905370
                             Actions: [
                               {
                                 "tag": "call_o64ooCesRO2fzi6T9haZKyFP",
                                 "name": "FileManagerTool",
                                 "path": "load_files_from_disk",
                                 "input": {
                                   "values": {
                                     "paths": [
                                       "sample_files/user.png"
                                     ]
                                   }
                                 }
                               }
                             ]
                    INFO     Subtask ef18d3c8171e41ff91b074975a905370
                             Response: Image, format: png, size: 1182120 bytes
[12/17/24 07:29:28] INFO     ToolkitTask ce27e267d434487a8ccf3e7374576c0d
                             Output: The file `sample_files/user.png` is an image in PNG 
                             format with a size of 1,182,120 bytes.

It looks like the output is right - it read it.
so does it have to do with how we pass data back from subtasks?

@collindutter (Member)

collindutter commented Dec 16, 2024

The only thing openai saw was Image, format: png, size: 1182120 bytes. When I update OpenAiChatPromptDriver to behave how AnthropicPromptDriver does, I get:

{'error': {'message': "Invalid 'messages[3]'. Image URLs are only allowed for messages with role 'user', but this message with role 'tool' contains an image URL.", 'type': 'invalid_request_error', 'param': 'messages[3]', 'code': 'invalid_value'}}

More context on the topic: https://community.openai.com/t/returning-image-as-result-of-function-call-to-gpt-4-turbo/714903
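The error above boils down to a message-shape restriction: image content parts are only accepted in `user`-role messages, so an image produced by a tool cannot be sent back inside the `role='tool'` result. A common workaround (sketched below with plain dicts, no API call; `tool_result_messages` is a name invented for this illustration) is to put a text placeholder in the tool message and re-send the image as a follow-up user turn:

```python
# Illustrative sketch of the restriction: the tool message carries only
# text, while the image itself goes in a subsequent 'user' message.
def tool_result_messages(tool_call_id: str, image_url: str) -> list[dict]:
    return [
        # Tool result: text only, since images are rejected in this role.
        {
            "role": "tool",
            "tool_call_id": tool_call_id,
            "content": "Image loaded; see the next user message.",
        },
        # The image is attached to a 'user' message instead.
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        },
    ]

msgs = tool_result_messages("call_123", "data:image/png;base64,...")
print(msgs[1]["role"])  # -> user
```

This is essentially what the Pipeline approach earlier in the thread does structurally: the tool step's output is handed to a fresh prompt turn rather than embedded in the tool response.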

@shhlife (Author)

shhlife commented Dec 16, 2024

@collindutter that context was super helpful, thank you! Yeah, let's hope OpenAI allows images in tool calls, and also close this issue & make a separate issue about discoverability.

cheers!

@shhlife shhlife closed this as completed Dec 16, 2024