[WIP] Adding support for Q&A Agents and environments #6

danikhan632 · 2024-03-19T17:34:28Z

Motivation for this:
I want to be able to fine-tune LLMs on more than just discrete action spaces, such as Blackjack or other games with a fixed set of moves. To support this, I made a few additions:

- llm_eval.py: since the action space is essentially free-form text, we need an actual reward system. I've seen similar implementations that use keyword matching, but I settled on using another (larger) LLM as a critic. It is given the task, the agent's current state, and the goal state, and is asked whether the agent is getting closer to the goal state and to give the agent feedback. OpenAI function calling (or a GBNF grammar) is used to produce structured JSON containing a numeric reward for a given state.
- critic_server: a llama.cpp server that can fill the critic role described above. It uses llama.cpp grammars to reliably generate JSON, similar to function calling. I would recommend using a larger model as the critic. This is entirely optional; by default the OpenAI API is used.
- (WIP) QA Agent & Env: an agent that uses llm_eval. It is given questions from orca-math 200k and asked to solve them.
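A minimal sketch of how a critic-based reward like the one in llm_eval.py could be wired into a Q&A environment, assuming the critic (OpenAI API or the llama.cpp critic_server) is wrapped as a plain callable that takes a prompt and returns a JSON string such as `{"reward": 0.5, "feedback": "..."}`. The names `build_critic_prompt`, `parse_critic_reply`, `QAEnv`, and `fake_critic`, and the JSON keys `reward` and `feedback`, are hypothetical and not taken from the PR. Even though function calling or a grammar should force valid JSON, the parser still validates the reply and clamps the reward to [-1, 1] as a defensive fallback.

```python
from __future__ import annotations

import json
import math
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical critic interface: prompt in, raw JSON string out. In practice this
# would wrap the OpenAI API or a llama.cpp server; here it is just a callable.
Critic = Callable[[str], str]


def build_critic_prompt(task: str, state: str, goal: str) -> str:
    """Ask the (larger) critic LLM to judge progress toward the goal state."""
    return (
        "You are grading an agent.\n"
        f"Task: {task}\n"
        f"Agent's current state: {state}\n"
        f"Goal state: {goal}\n"
        'Reply only with JSON: {"reward": <number in [-1, 1]>, "feedback": <string>}'
    )


def parse_critic_reply(reply: str) -> tuple[float, str]:
    """Extract (reward, feedback); fall back to a neutral 0.0 on malformed output."""
    try:
        data = json.loads(reply)
        reward = float(data["reward"])
        feedback = str(data.get("feedback", ""))
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        return 0.0, "critic returned malformed JSON"
    if not math.isfinite(reward):
        return 0.0, "critic returned a non-finite reward"
    # Clamp so a misbehaving critic cannot produce unbounded rewards.
    return max(-1.0, min(1.0, reward)), feedback


@dataclass
class QAEnv:
    """Single-step Q&A environment, loosely following Gymnasium's reset/step shape."""

    problems: list[tuple[str, str]]  # (question, reference answer) pairs
    critic: Critic
    _idx: int = field(default=0, init=False)

    def _current(self) -> tuple[str, str]:
        return self.problems[self._idx % len(self.problems)]

    def reset(self) -> tuple[str, dict]:
        question, _ = self._current()
        return question, {}  # the observation is the question text

    def step(self, answer: str) -> tuple[str, float, bool, bool, dict]:
        question, reference = self._current()
        reply = self.critic(build_critic_prompt(question, answer, reference))
        reward, feedback = parse_critic_reply(reply)
        self._idx += 1  # the next reset() serves the next problem
        # One answer per question, so every step terminates the episode.
        return question, reward, True, False, {"feedback": feedback}


def fake_critic(prompt: str) -> str:
    # Stand-in for the real critic so the sketch runs offline.
    return '{"reward": 0.8, "feedback": "Correct method; double-check the arithmetic."}'


env = QAEnv([("What is 12 * 7?", "84")], fake_critic)
obs, _ = env.reset()
_, reward, terminated, truncated, info = env.step("84")
print(reward, terminated)  # → 0.8 True
```

Keeping the critic behind a single callable means the same environment can switch between the OpenAI API and a local llama.cpp server without changes to the reward logic.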

In progress:
I'm also working on letting LLMs from the transformers library produce structured JSON, similar to what llama.cpp grammars allow. For example, someone could build a strategy game as a Gym environment, and the agent could reliably emit its moves in a JSON format to play the game, without having to resort to truncating outputs and similar workarounds.
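The core idea behind grammar-constrained generation (as in llama.cpp grammars) is to mask out, at every decoding step, each token that cannot extend the output into a valid string. The sketch below is a simplified stand-in, assuming the valid actions are few enough to enumerate as compact JSON strings and using a toy character-level scorer instead of a real model; `constrained_decode`, `allowed_next_chars`, `toy_scores`, and the action strings are hypothetical. A transformers version would apply the same masking inside generation, for example through a custom `LogitsProcessor` or the `prefix_allowed_tokens_fn` argument of `generate`, and would have to handle multi-character tokens.

```python
from __future__ import annotations

from typing import Callable, Sequence

# Hypothetical scorer standing in for model logits: given the text generated so
# far and the candidate next characters, return one score per candidate.
ScoreFn = Callable[[str, Sequence[str]], Sequence[float]]


def allowed_next_chars(prefix: str, valid_outputs: Sequence[str]) -> list[str]:
    """Characters that keep `prefix` on track to become at least one valid output."""
    return sorted({
        s[len(prefix)]
        for s in valid_outputs
        if s.startswith(prefix) and len(s) > len(prefix)
    })


def constrained_decode(valid_outputs: Sequence[str], score_fn: ScoreFn) -> str:
    """Greedy character-level decoding with invalid continuations masked out.

    Assumes no valid output is a proper prefix of another (true for a fixed set
    of compact JSON objects, which always end in a closing brace).
    """
    if not valid_outputs:
        raise ValueError("need at least one valid output")
    out = ""
    while out not in valid_outputs:
        candidates = allowed_next_chars(out, valid_outputs)
        scores = score_fn(out, candidates)
        # The "mask": only allowed characters compete; ties go to the first one.
        best = max(range(len(candidates)), key=lambda i: scores[i])
        out += candidates[best]
    return out


ACTIONS = ['{"move": "north"}', '{"move": "south"}', '{"build": "farm"}']


def toy_scores(prefix: str, candidates: Sequence[str]) -> list[float]:
    # This toy "model" wants to write the invalid action '{"move": "west"}';
    # characters matching that string score higher than anything else.
    wanted = '{"move": "west"}'
    target = wanted[len(prefix)] if len(prefix) < len(wanted) else ""
    return [1.0 if c == target else 0.0 for c in candidates]


result = constrained_decode(ACTIONS, toy_scores)
print(result)  # → {"move": "north"}
```

Unconstrained, the toy model would have produced the invalid move "west"; with masking it is forced onto a valid action, so no output truncation or post-hoc parsing repair is needed.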

I would love to get feedback on all of this.

taliu02 · 2024-03-20T17:11:00Z

I think this has legitimate use cases, but there are a lot of improvements that need to be made, especially for the observation space, which doesn't seem to work yet. You could try using a text observation space, but I'm not sure whether that would be sampled properly.
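Gymnasium's built-in `spaces.Text` samples random strings over a character set, which is likely the sampling concern with a text observation space: a random string is not a meaningful question. One option, sketched under the assumption that the environment only needs `sample()` and `contains()` from its observation space, is a hypothetical `DatasetTextSpace` that samples real questions from the dataset while still accepting any string up to a length limit. It does not subclass Gymnasium's `Space`, so it would need adapting before use with Gymnasium wrappers or vectorized environments.

```python
from __future__ import annotations

import random
from typing import Sequence


class DatasetTextSpace:
    """Hypothetical text observation space that samples from a pool of real texts."""

    def __init__(self, texts: Sequence[str], max_length: int = 2048,
                 seed: int | None = None) -> None:
        if not texts:
            raise ValueError("need at least one text to sample from")
        self.texts = list(texts)
        self.max_length = max_length
        self._rng = random.Random(seed)  # seeded for reproducible sampling

    def sample(self) -> str:
        # Draw an actual question rather than a random character string.
        return self._rng.choice(self.texts)

    def contains(self, x: object) -> bool:
        # Any string within the length limit is a valid observation.
        return isinstance(x, str) and len(x) <= self.max_length


space = DatasetTextSpace(["What is 12 * 7?", "Solve 2x + 3 = 11 for x."], seed=0)
obs = space.sample()
print(space.contains(obs))  # → True
```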

WIP QA env

ad08a7e

danikhan632 and others added 2 commits November 24, 2024 20:43

added reasoning model

aeb6473

add CoT finetuning

9411e1f
