
[WIP] Adding support for Q&A Agents and environments #6

Open
wants to merge 3 commits into
base: main


danikhan632

Motivation for this:
I want the ability to fine-tune LLMs on more than just discrete action spaces such as Blackjack or similar games, so a few additions were made.

  • llm_eval.py: since the action space is essentially free text, we need an actual reward system. I've seen similar implementations that are keyword-based; however, I settled on using another (larger) LLM, giving it the task, the agent's current state, and the goal state, and asking it whether the agent is getting closer to the goal state, along with feedback for the agent. OpenAI function calling or GBNF is used to produce structured JSON, from which a numeric reward for a given state is generated.
  • critic_server: a llama.cpp server that fills the critic role described above. It uses llama.cpp grammars to reliably generate JSON, similar to function calling. I would recommend using a larger model as the critic. This is entirely optional, though; by default the OpenAI API is used.
  • (WIP) QA Agent & Env: an agent that uses llm_eval; it is given questions from orca-math 200k and asked to solve them.
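
To make the critic idea above concrete, here is a minimal sketch of how a structured JSON reply could be turned into a numeric reward. It is not the code in llm_eval.py: the names `REWARD_SCHEMA`, `build_critic_prompt`, and `evaluate_state`, the `reward`/`feedback` field names, and the [-1, 1] reward range are all assumptions for illustration. The critic is passed in as a plain callable so the same function could wrap either the OpenAI API or the llama.cpp critic server.

```python
import json
from typing import Callable, Tuple

# Hypothetical schema for the critic's reply (field names are assumptions).
# Something like this could be supplied as a function-calling schema or
# converted to a GBNF grammar for llama.cpp.
REWARD_SCHEMA = {
    "type": "object",
    "properties": {
        "reward": {"type": "number", "minimum": -1, "maximum": 1},
        "feedback": {"type": "string"},
    },
    "required": ["reward", "feedback"],
}


def build_critic_prompt(task: str, state: str, goal: str) -> str:
    """Assemble the text shown to the critic model."""
    return (
        f"Task: {task}\n"
        f"Agent's current state: {state}\n"
        f"Goal state: {goal}\n"
        "Is the agent getting closer to the goal state? Reply with JSON "
        'of the form {"reward": <number in [-1, 1]>, "feedback": "<text>"}.'
    )


def evaluate_state(
    critic: Callable[[str], str], task: str, state: str, goal: str
) -> Tuple[float, str]:
    """Ask the critic for a judgement and return (reward, feedback)."""
    raw = critic(build_critic_prompt(task, state, goal))
    try:
        data = json.loads(raw)
        reward = float(data["reward"])
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        # Malformed reply: fall back to a neutral reward.
        return 0.0, "critic reply could not be parsed"
    # Clamp in case the critic ignores the requested range.
    reward = max(-1.0, min(1.0, reward))
    return reward, str(data.get("feedback", ""))


# Usage with a stub critic standing in for a real model call.
def stub_critic(prompt: str) -> str:
    return '{"reward": 0.5, "feedback": "closer"}'


reward, feedback = evaluate_state(stub_critic, "solve 2 + 2", "agent wrote 3", "4")
```

The neutral fallback keeps a single bad critic reply from crashing a training run, although with grammar-constrained generation that path should rarely be hit.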

In Progress things:
I'm also working on letting LLMs from the transformers library produce structured JSON, similar to llama.cpp. For example, someone could build a strategy game with a gym env, and the agent could reliably emit JSON-formatted actions to play the game without having to resort to truncating outputs and similar workarounds.
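
For context, the following sketch shows the environment-side half of that idea: extracting and validating a JSON action from raw model text. It is not constrained decoding itself, only the kind of post-hoc parsing an env needs when the model's output is unconstrained. The `move` field and the `ALLOWED_MOVES` set are hypothetical, invented for a made-up strategy game.

```python
import json
from typing import Optional

# Hypothetical action vocabulary for an illustrative strategy game.
ALLOWED_MOVES = {"build", "attack", "wait"}


def parse_action(text: str) -> Optional[dict]:
    """Return the first valid JSON action found in `text`, or None."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and
            # ignores any trailing text after it.
            obj, _ = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict):
            move = obj.get("move")
            if isinstance(move, str) and move in ALLOWED_MOVES:
                return obj
        # Try the next opening brace, if any.
        start = text.find("{", start + 1)
    return None


# Usage: the model wrapped its action in extra chatter.
action = parse_action('Sure! {"move": "attack", "target": 3} Good luck.')
```

With grammar-constrained generation the model could only ever emit strings that already satisfy the action format, so the scanning loop above would reduce to a single `json.loads` call.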

Would love to get feedback on all of this.


taliu02 commented Mar 20, 2024

I think this has legitimate use cases, but there are a lot of improvements that need to be made, especially for the observation space, which does seem to work. You could try using a text observation space, but I'm not sure whether that would be sampled properly.


2 participants