-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathresults.txt
23 lines (16 loc) · 1.9 KB
/
results.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
what is this file ? it reports the specific tasks succeded (on first try) by the different models tried.
the tasks used for evaluation are the 400 training tasks. how to link an id to a task name ? see the file task_no_to_name.json on the repo
format :
model + custom setup if specified (range of evaluated tasks, temperature T) : list of tasks that the model completed successfully
gpt-4 (0->99, T=1 (openai default)): 6, 14, 16, 19, 25, 33, 34, 37, 41, 42, 43, 44, 54, 57, 58, 59, 69, 72, 79, 85, 95
gpt-4 random alphabet (0->99, T=1) (['al', 'hello', 'rl', 'rf', 'was', 'once', 'hi', 'word', 'ape', 'ml']) : 6, 14, 19, 33, 34, 37, 41, 44, 54, 57, 59, 69, 72, 79, 85, 86, 95
gpt-4-turbo (0->99, T=0) : 14, 15, 16, 19, 31, 33, 37, 41, 44, 49, 54, 59, 69, 72, 76, 79, 85, 95
gpt-3.5-turbo (0->99, T=0): 6, 14, 33, 34, 37, 41, 59, 72, 79, 85, 89
gpt-3.5-turbo-instruct (0-> 399, T=0): 6, 19, 33, 34, 41, 54, 59, 72, 79, 85, 103, 104, 117, 123, 126, 156, 160, 174, 176, 179, 197, 204, 217, 221, 226, 232, 246, 247, 251, 255, 259, 267, 283, 296, 299, 312, 331, 346, 364, 392, 393
gpt-3.5-turbo-instruct transposed (0-> 199, T=0): 6, 14, 19, 31, 41, 58, 59, 72, 85, 103, 104, 123, 129, 156, 160, 175, 197, 198
gpt-3.5-turbo-instruct mixed letters alphabet (0-> 99, T=0): 6, 16, 19, 33, 34, 41, 59, 72, 85, 99
gpt-3.5-turbo-instruct transposed AND mixed letters alphabet (0-> 99, T=0): 6, 14, 33, 41, 59, 69, 72, 79, 85
text-davinci-003 (0->399, T=1): 6, 14, 19, 31, 33, 34, 37, 41, 44, 58, 59, 72, 79, 85, 99, 103, 104, 117, 123, 126, 155, 156, 160, 174, 175, 179, 186, 197, 199, 201, 203, 221, 226, 238, 246, 247, 255, 259, 267, 283, 296, 299, 303, 312, 323, 331, 334, 343, 354, 364, 367, 370, 373, 392, 393, 399
text-davinci-002 (0->199, T=1): 6, 14, 16, 31, 33, 34, 41, 58, 59, 72, 79, 85, 104, 117, 118, 123, 156, 174, 179, 186, 197
llama-70b-chat (0->99, T=0): (4 tasks, dont have the specific tasks)
llama-70b-chat (0->99, T=0): (none)