
Commit 146da17

Added Documentation for Verbosity and Aggregation
1 parent 012800a commit 146da17

File tree

1 file changed, +25 −2 lines changed


README.md

+25 −2
@@ -15,26 +15,49 @@ GroqEval is a powerful and easy-to-use evaluation framework designed specifically
## Getting Started

Installation
To install GroqEval, simply use pip:
```bash
pip install groqeval
```

Initialising an evaluator.
To begin using GroqEval, you need to initialize an evaluator with your API key:
```python
from groqeval import GroqEval
evaluator = GroqEval(api_key=API_KEY)
```
The evaluator is the central component that orchestrates the initialization and execution of various metrics.

You can create metric instances with the evaluator, either with the default behaviour or with verbosity enabled:
```python
# Default Behaviour
metrics = evaluator(metric_name, **kwargs)

# Verbosity Enabled
metrics = evaluator(metric_name, verbose=True, **kwargs)
```

Three additional keyword arguments form the basis of evaluation: context, prompt, and output. Their usage varies by metric and is detailed in the respective sections for each metric.
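For illustration, here is a minimal sketch of passing these inputs when constructing a metric; which arguments a given metric actually requires, and their exact types, are assumptions here rather than confirmed API (see each metric's section below):

```python
# Illustrative sketch only: the argument shapes below (a list of strings for
# context, plain strings for prompt and output) are assumptions, and which
# of them a metric requires varies by metric.
metrics = evaluator(
    metric_name,  # a metric identifier, as in the snippets above
    context=["GroqEval is an evaluation framework for LLM outputs."],
    prompt="What is GroqEval?",
    output="GroqEval is a framework for evaluating the outputs of language models.",
)
```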

Once the metric class is initialized with the inputs, you can obtain the score by calling the `score()` function:
```python
metrics.score()
```

By default, the `score` function aggregates with the average for relevance-type metrics and the maximum for metrics such as bias and toxicity. You can also pass a custom aggregation function to `score`; it should accept a list of integers and return an integer or float:
```python
from typing import List, Union

def custom_function(scores: List[int]) -> Union[int, float]:
    # Define your custom aggregation here; for example, average the scores.
    return sum(scores) / len(scores)

metrics.score(aggregation=custom_function)
```
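Any callable with that signature should work the same way; for instance, a sketch using the median as a more outlier-robust aggregate (`median_score` is just an illustrative name, not part of the library):

```python
import statistics
from typing import List, Union

def median_score(scores: List[int]) -> Union[int, float]:
    # Aggregate the per-statement scores with the median instead of the mean.
    return statistics.median(scores)

metrics.score(aggregation=median_score)
```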

To list all available metrics offered by GroqEval:
```python
>>> evaluator.list_metrics()
['AnswerRelevance', 'Bias', 'ContextRelevance', 'Faithfulness', 'Hallucination', 'Toxicity']
```

This section provides an overview of how to set up and use GroqEval. For detailed usage and calculation methods of each metric, refer to the respective metric sections below.

## Answer Relevance
The Answer Relevance metric evaluates how accurately and closely a language model's responses align with the specific query or prompt provided. Each part of the output, recognized as a coherent statement, is scored for its relevance to the original question, helping to gauge the utility and appropriateness of the model's responses.
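As a rough end-to-end sketch following the Getting Started pattern above (the metric name string and the keyword arguments shown are assumptions, not a confirmed interface):

```python
# Hypothetical sketch: scoring answer relevance for a single prompt/output pair.
# "answer_relevance" and the prompt/output kwargs are assumed, not confirmed API.
answer_relevance = evaluator(
    "answer_relevance",
    prompt="What are the benefits of regular exercise?",
    output="Regular exercise improves cardiovascular health, mood, and sleep quality.",
)
print(answer_relevance.score())
```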
