Releases: google-deepmind/onetwo
v0.2.1
- Backends
  - Multimodal: Enable support for sending arbitrary content chunk types to Gemini models, including video, audio, PDFs, vision embeddings, more image formats, and other chunk types.
  - Formatting: Define a `ConcatFormatter` that simply concatenates the contents of a list of chat messages while ignoring the roles. When applied to certain styles of prompt, this provides a way to reuse the same prompt across both chat-style LLMs (using `formatter=formatting.FormatterName.API`) and plain-text LLMs (using `formatter=formatting.FormatterName.CONCAT`). A sketch of the idea follows this list.
  - Automatic retry: Implement a generic retry mechanism for use with arbitrary LLMs, which can be used, for example, to automatically retry upon receipt of a rate-limiting error.
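
  As an illustration of the `ConcatFormatter` idea (not OneTwo's actual implementation; the message shape shown is an assumption, only the two formatter names are quoted from the notes above):

  ```python
  def concat_format(messages: list[dict[str, str]]) -> str:
      """Concatenates message contents, ignoring roles (sketch only)."""
      return ''.join(m['content'] for m in messages)

  messages = [
      {'role': 'system', 'content': 'You are concise.\n'},
      {'role': 'user', 'content': 'Name one prime number.'},
  ]
  # Chat-style LLM: send `messages` as-is (formatter=formatting.FormatterName.API).
  # Plain-text LLM: send concat_format(messages), i.e. one concatenated string
  # (formatter=formatting.FormatterName.CONCAT).
  prompt = concat_format(messages)
  ```
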
- Core
  - Tracing: Apply comprehensive tracing to all calls to built-in methods (`generate_text`, `chat`, `instruct`, `score_text`, `select`, etc.) of the standard LLM backends, so that we no longer depend on the use of Jinja templates for tracing. A sketch of the pattern follows this list.
  - Chat: Improve support for chat operations throughout the OneTwo codebase, including adding chat support to composables and ensuring that caching works robustly for chat messages that contain multimodal content.
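
  A minimal sketch of the tracing pattern (in-memory log with hypothetical names; OneTwo's actual tracing machinery differs):

  ```python
  import functools
  import time

  TRACE: list[dict] = []  # simple in-memory trace log, for illustration only

  def traced(stage: str):
      """Records arguments, result, and timing of each call to a built-in method."""
      def decorator(fn):
          @functools.wraps(fn)
          def wrapper(*args, **kwargs):
              start = time.monotonic()
              result = fn(*args, **kwargs)
              TRACE.append({'stage': stage, 'args': args, 'kwargs': kwargs,
                            'result': result, 'elapsed_s': time.monotonic() - start})
              return result
          return wrapper
      return decorator

  @traced('generate_text')
  def generate_text(prompt: str) -> str:
      return 'stub reply'  # stand-in for a real backend call
  ```
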
- Agents
  - Error handling: Improve error handling in `PythonPlanningAgent`, including providing a way to configure inside of each `Tool` definition which types of errors are recoverable. Tool-generated error messages for potentially recoverable errors are surfaced to the LLM so that it can retry with adjusted syntax, while an irrecoverable error automatically terminates the agent quickly. A sketch follows this list.
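
  A sketch of the recoverable-error idea (the field and function names here are hypothetical, not the actual `Tool` API):

  ```python
  import dataclasses
  from typing import Callable

  @dataclasses.dataclass
  class Tool:
      name: str
      run: Callable[[str], str]
      # Error types the LLM may be able to fix by adjusting its syntax.
      recoverable_errors: tuple[type[BaseException], ...] = (ValueError, SyntaxError)

  def call_tool(tool: Tool, arg: str) -> str:
      try:
          return tool.run(arg)
      except tool.recoverable_errors as e:
          # Surface the message to the LLM so it can retry with adjusted syntax.
          return f'ERROR ({type(e).__name__}): {e}'
      # Any other exception propagates, terminating the agent run immediately.
  ```
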
- Standard library
  - Chat: Re-implement the standard components such as `ReActAgent`, `PythonPlanningAgent`, and chain-of-thought components to use chat operations and to improve performance on the latest generations of chat-tuned models.
  - Multimodal: Add support for multimodal inputs in `ReActAgent` and `PythonPlanningAgent`.
- Evaluation
  - LLM critic: Re-implement `naive_evaluation_critic` using chat operations and with a parser that is robust to more diverse reply formats, including reply formats commonly output by Gemini 1.5 models. A sketch of such a parser follows this list.
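
  A sketch of a reply parser tolerant to diverse formats (illustration only, not OneTwo's actual parser):

  ```python
  import re

  def parse_verdict(reply: str) -> bool | None:
      """Extracts a yes/no verdict from free-form critic replies.

      Accepts e.g. 'Yes.', '**Answer:** yes', or 'Final verdict: NO',
      taking the last match so earlier reasoning text is ignored.
      Returns None if no verdict is found.
      """
      hits = re.findall(r'\b(yes|no)\b', reply.lower())
      return hits[-1] == 'yes' if hits else None
  ```
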
- Visualization
  - Improve `HTMLRenderer` to ensure that strings are properly escaped before rendering and to robustly handle a broader range of data types, including graceful fallbacks for images and other large byte objects. A sketch of the approach follows this list.
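
  A sketch of the escaping-and-fallback behavior (illustration only):

  ```python
  import base64
  import html

  def render_value(value) -> str:
      """Renders a Python value as HTML, escaping strings and falling back
      gracefully for images and other large byte objects."""
      if isinstance(value, str):
          return f'<span>{html.escape(value)}</span>'
      if isinstance(value, bytes):
          if value.startswith(b'\x89PNG'):  # recognizable image: inline it
              b64 = base64.b64encode(value).decode('ascii')
              return f'<img src="data:image/png;base64,{b64}"/>'
          return f'<em>&lt;{len(value)} bytes&gt;</em>'  # opaque bytes: summarize
      return f'<span>{html.escape(repr(value))}</span>'
  ```
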
- Documentation
  - Update the tutorial colab to support the latest Gemini and OpenAI models and to illustrate best practices for chat semantics and multimodal support. Includes, among other things, new sections illustrating multimodal `ReAct` and `PythonPlanning` agents.
- Other
  - Move commonly used utility functions (e.g., for cache management) from the tutorial colab into a `colab_utils` library to facilitate reuse in other colabs.
  - Various bug fixes and incremental improvements to the `GeminiAPI`, `OpenAIAPI`, and `VertexAIAPI` backends, multi-threading support, and Jinja templates.
v0.2.0
- Backends
  - VertexAI: Add VertexAI chat support.
  - Space healing: Add token/space healing options to built-in functions, including proper support for space healing in `llm.generate_text` and `llm.chat` of `GeminiAPI`. A rough sketch of the idea follows this list.
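
  A rough sketch of what space healing does (simplified to whitespace; the real mechanism operates at the token level):

  ```python
  def generate_with_space_healing(generate_fn, prompt: str) -> str:
      """Avoids the poor token boundary caused by a trailing space by
      generating from the stripped prompt and trimming any duplicated
      separator from the completion."""
      stripped = prompt.rstrip(' ')
      completion = generate_fn(stripped)
      removed = prompt[len(stripped):]
      if removed and completion.startswith(removed):
          completion = completion[len(removed):]
      return completion
  ```
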
- Core
  - Caching: Enable loading from multiple cache files, while merging the contents. This is useful, for example, when collaborating in a group: each person can save to a personal cache file while loading from both their own cache and their teammates'.
  - Retries: Implement a generic `with_retry` decorator that automatically retries a given function with exponential backoff when an exception occurs, and enable this for the `GeminiAPI` and `OpenAIAPI` backends. A sketch follows this list.
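
  A sketch of the exponential-backoff idea behind `with_retry` (the decorator name is from the notes above, but the signature shown is an assumption):

  ```python
  import functools
  import random
  import time

  def with_retry(max_retries: int = 3, base_delay: float = 1.0,
                 retry_on: tuple[type[BaseException], ...] = (Exception,)):
      """Retries the wrapped function with exponential backoff plus jitter."""
      def decorator(fn):
          @functools.wraps(fn)
          def wrapper(*args, **kwargs):
              for attempt in range(max_retries + 1):
                  try:
                      return fn(*args, **kwargs)
                  except retry_on:
                      if attempt == max_retries:
                          raise  # out of retries: surface the error
                      time.sleep(base_delay * 2 ** attempt + random.random())
          return wrapper
      return decorator
  ```
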
- Standard library
  - Chain-of-thought: Define a library of helper functions and data structures for implementing chain-of-thought [Wei, et al., 2023] strategies, including off-the-shelf implementations of several commonly-used approaches, and add a corresponding section to the tutorial colab. Variants illustrated include the following (a sketch of one variant appears after the list):
    - Chain-of-thought implemented using a prompt template alone (with 2 calls).
    - Chain-of-thought implemented using a prompt template (1 call) + answer parser.
    - Few-shot chain-of-thought.
    - Few-shot exemplars represented as data, so as to be reusable across different styles of prompt template.
    - Few-shot chain-of-thought with different exemplars specified for each question (e.g., for dynamic exemplar selection).
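
    A minimal sketch of the "prompt template + answer parser" variant (the template wording and answer marker are illustrative, not the library's):

    ```python
    COT_TEMPLATE = "Q: {question}\nA: Let's think step by step."

    def parse_answer(reply: str) -> str:
        """Extracts the final answer from a chain-of-thought reply."""
        lowered = reply.lower()
        marker = 'the answer is'
        if marker in lowered:
            return reply[lowered.rindex(marker) + len(marker):].strip(' .')
        return reply.strip().splitlines()[-1]  # fall back to the last line

    def chain_of_thought(generate_fn, question: str) -> str:
        # Single LLM call, then parse the answer out of the reasoning text.
        return parse_answer(generate_fn(COT_TEMPLATE.format(question=question)))
    ```
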
  - Self-consistency: Define a generic implementation of self-consistency [Wang, et al., 2023] and add a corresponding section to the tutorial colab. In this implementation, we reformulate self-consistency as a meta-strategy that wraps some underlying strategy that outputs a single answer (typically via some kind of reasoning path or other intermediate steps) and converts it into a strategy that outputs a marginal distribution over possible answers (marginalizing over the intermediate steps). The marginal distribution is estimated via repeated sampling from the underlying strategy. Supported variations include the following (a sketch appears after the list):
    - Self-consistency over chain-of-thought (as in the original paper).
    - Self-consistency over a multi-step prompting strategy (e.g., ReAct).
    - Self-consistency over a multi-arg strategy (e.g., Retrieval QA).
    - Self-consistency over diverse parameterizations of the underlying strategy (e.g., with samples taken using different choices of few-shot exemplars).
    - Self-consistency over diverse underlying strategies.
    - Self-consistency with answer normalization applied during bucketization.
    - Self-consistency with weighted voting.
    - Evaluation based on the consensus answer alone.
    - Evaluation based on the full answer distribution (e.g., accuracy@k).
    - Evaluation taking into account a representative reasoning path.
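
    A minimal sketch of the meta-strategy (function name and signature are illustrative, not OneTwo's API):

    ```python
    import collections

    def self_consistency(strategy, question, num_samples: int = 8,
                         normalize=str.strip, weight=lambda answer: 1.0):
        """Wraps a single-answer strategy into a marginal answer distribution.

        Samples the underlying strategy repeatedly, buckets answers after
        normalization (optionally with weighted voting), and returns the
        consensus answer together with the full distribution.
        """
        votes = collections.Counter()
        for _ in range(num_samples):
            answer = normalize(strategy(question))
            votes[answer] += weight(answer)
        total = sum(votes.values())
        distribution = {a: v / total for a, v in votes.items()}
        consensus = max(distribution, key=distribution.get)
        return consensus, distribution
    ```
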
- Evaluation
  - Add a new `agent_evaluation` library, which is similar to the existing `evaluation` library, but automatically packages the results of the evaluation run in a standardized `EvaluationSummary` object, with options to include detailed debugging information for each example. This can be used for evaluating arbitrary prompting strategies, but contains particular optimizations for agents.
  - Add a library for writing an `EvaluationSummary` to disk. A sketch of the shape of such an object follows this list.
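
  A sketch of what such a summary object might look like (the fields and writer shown are assumptions, not the actual `EvaluationSummary` schema):

  ```python
  import dataclasses
  import json
  from typing import Any

  @dataclasses.dataclass
  class EvaluationSummary:
      metrics: dict[str, float]
      # Optional per-example records with detailed debugging information.
      per_example: list[dict[str, Any]] = dataclasses.field(default_factory=list)

  def write_summary(summary: EvaluationSummary, path: str) -> None:
      with open(path, 'w') as f:
          json.dump(dataclasses.asdict(summary), f, indent=2)
  ```
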
- Visualization
  - Update `HTMLRenderer` to support rendering of `EvaluationSummary` objects, to render structured Python objects in an expandable/collapsible form, and to allow specification of custom renderers for other data types.
- Documentation
  - Add sections to the tutorial colab on chain-of-thought, self-consistency, and swapping backends.
- Other
  - Various other bug fixes and incremental improvements to the `VertexAIAPI` backend, `ReActAgent`, caching, composables, and handling of multimodal content chunks.
v0.1.1
- Add support for Vertex AI models, on top of the Gemini, Gemma, and OpenAI models that were supported in the initial v0.1.0 release.
- Add an `HTMLRenderer` library for rendering an `ExecutionResult`, `ExperimentResult`, or `ExperimentSummary` as a block of HTML suitable for interactive display in colab.
- Various bug fixes and incremental improvements to `ReActAgent`, `PythonPlanningAgent`, tracing, and handling of multimodal content chunks.
v0.1.0
- Initial release including support for three kinds of models:
  - `GeminiAPI`: remote connection to any of the models supported by the Gemini API.
  - `Gemma`: possibility to load and use open-weights Gemma models.
  - `OpenAIAPI`: remote connection to any of the models supported by the OpenAI API.
- OneTwo core includes support for asynchronous execution, batching, caching, tracing, sequential and parallel flows, prompt templating via Jinja2 or composables, and multimodal support.
- OneTwo standard library includes off-the-shelf implementations of two popular tool-use strategies in the form of a `ReActAgent` and a `PythonPlanningAgent`, along with a `PythonSandbox` API, simple autorater critics, and a generic `BeamSearch` implementation that can be composed with arbitrary underlying agents for producing multi-trajectory strategies such as Tree-of-Thoughts. A sketch of the beam-search idea follows this list.
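
  A generic sketch of the beam-search idea (not the `BeamSearch` API itself; the callbacks shown are assumptions):

  ```python
  def beam_search(initial_state, expand, score, beam_width: int = 3,
                  max_steps: int = 4):
      """Beam search over agent states.

      `expand(state)` yields candidate successor states (e.g., alternative
      next steps sampled from an underlying agent) and `score(state)` ranks
      them; keeping only the top `beam_width` candidates at each step is the
      backbone of multi-trajectory strategies such as Tree-of-Thoughts.
      """
      beam = [initial_state]
      for _ in range(max_steps):
          candidates = [s for state in beam for s in expand(state)]
          if not candidates:
              break
          beam = sorted(candidates, key=score, reverse=True)[:beam_width]
      return max(beam, key=score)
  ```
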