Three additional keyword arguments form the basis of evaluation: context, prompt, and output. Their usage varies by metric and is detailed in the respective sections for each metric.
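For illustration, here is a rough sketch of initializing a metric with these arguments. The `GroqEval` entry point, the `context_relevance` metric name, and the exact keyword arguments are assumptions based on the pattern described here, so adjust them to the setup shown earlier in this guide:

```python
from groqeval import GroqEval  # assumed entry point

# Assumed initialization pattern; replace with the setup shown in the
# installation/getting-started section of this guide.
evaluator = GroqEval(api_key="YOUR_GROQ_API_KEY")

# Example: a context-relevance style metric that takes a retrieved
# context and the original prompt. Other metrics use `output` instead
# of (or in addition to) `context`.
metrics = evaluator(
    "context_relevance",
    context=["Paris is the capital and largest city of France."],
    prompt="What is the capital of France?",
)
```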
Once the metric class is initialized with the inputs, you can obtain the score by calling the `score()` function:
```python
metrics.score()
```
By default, the `score` function aggregates the individual statement scores using the average for relevance-type metrics and the maximum for metrics like bias and toxicity. You can pass a custom aggregation function to `score`; it should accept a list of integers and return a float or integer value:
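As a sketch, a custom aggregation could be passed like this. The keyword name `aggregation` is an assumption based on the description above; any callable that maps a list of integers to a number should fit:

```python
# Sketch only: the `aggregation` keyword is assumed, not confirmed.
# The callable receives the list of integer statement scores and
# must return a float or int.
def median_score(scores: list[int]) -> float:
    ordered = sorted(scores)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return float(ordered[mid])
    return (ordered[mid - 1] + ordered[mid]) / 2

metrics.score(aggregation=median_score)
```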
This section provides an overview of how to set up and use GroqEval. For detailed usage and calculation methods of each metric, refer to the respective metric sections below.
## Answer Relevance
The Answer Relevance metric evaluates how accurately and closely the responses of a language model align with the specific query or prompt provided. The output is decomposed into coherent statements, and each statement is scored for its relevance to the original question, helping to gauge the utility and appropriateness of the model's responses.
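As an illustrative sketch only (the metric name and keyword arguments follow the assumed pattern above and are not taken from this section), scoring answer relevance might look like:

```python
# Illustrative sketch; names follow the assumed pattern above.
answer_relevance = evaluator(
    "answer_relevance",
    prompt="Why does the sky appear blue?",
    output=(
        "The sky looks blue because air molecules scatter shorter "
        "wavelengths of sunlight more strongly (Rayleigh scattering)."
    ),
)
print(answer_relevance.score())
```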