Leaderboard

The full leaderboard is hosted on the Hugging Face leaderboard.

  • Validation Performance is the average score across the validation benchmarks.
  • Final performance will be assessed at the end of the competition on a hidden test set, which may or may not be correlated with Validation performance.
  • Higher values are better.
| Rank | 🤖 Model / Submission Name | ⭐ Validation Performance |
|------|----------------------------|---------------------------|
| 1 | BVD_Mega | 54.5 |
| 2 | wcf_lar | 52.2 |
| 3 | readapt_median | 48.4 |
| 4 | kobeni | 46.7 |
| 5 | lore_route | 45.8 |
| 6 | shira_llama3_8b_it_algo0 | 44.9 |
| 7 | basic_merge_00 | 44.7 |
| 8 | llama_base_fc | 44.2 |
| 9 | llama_base_qa | 44.2 |
| 10 | shira_qw2_7b_it_algo0 | 44.1 |
| 11 | llama_merge2 | 42.7 |
| 12 | mistral_avg_exp_04 | 42.2 |
| 13 | shira_mtl7b_0_2_algo0 | 42.0 |
| 14 | mistral_avg_exp_05 | 41.4 |
| 15 | mistral_avg_exp_07 | 41.3 |
| 16 | mistral_avg_exp_06 | 41.2 |
| 17 | cdutr_AqQ3 | 41.1 |
| 18 | shira_ft5_algo0 | 40.8 |
| 19 | shira_ft5xl_algo0 | 40.8 |
| 20 | yi15_exp | 40.7 |
| 21 | yi15_exp | 39.9 |
| 22 | llama_avg | 38.5 |
| 23 | llama_avg (Baseline) | 38.4 |
| 24 | knovel_test | 38.4 |
| 25 | abc | 38.1 |
| 26 | flan_t5_avg | 38.0 |
| 27 | llama_optimized | 38.0 |
| 28 | Fbaseline | 38.0 |
| 29 | flan_t5_weights | 37.7 |
| 30 | flan_t5_avg_lora | 37.6 |
| 31 | cdutr_pi5c | 37.2 |
| 32 | my_t5_avg | 37.1 |
| 33 | deepseek_exp | 36.5 |
| 34 | shira_algo_k00 | 29.5 |
| 35 | SLM | 26.0 |
| 36 | llama_avg | 18.8 |

Updated on 08/19/2024. The full leaderboard is hosted on the Hugging Face leaderboard.
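
The notes above say that Validation Performance is averaged and that higher values are better. As a rough illustration only, the sketch below assumes an unweighted mean over per-benchmark scores, rounded to one decimal place, and ranks submissions in descending order. The benchmark names, submission names, and scores are hypothetical, and the organizers' actual aggregation may differ.

```python
from statistics import mean


def validation_performance(benchmark_scores: dict[str, float]) -> float:
    """Unweighted mean of per-benchmark scores, rounded to one decimal.

    Assumption: the leaderboard average is a plain mean; the real
    aggregation used by the organizers is not specified in this page.
    """
    if not benchmark_scores:
        raise ValueError("at least one benchmark score is required")
    return round(mean(list(benchmark_scores.values())), 1)


def rank_submissions(entries: list[tuple[str, float]]) -> list[tuple[int, str, float]]:
    """Sort (name, score) pairs so that higher scores come first."""
    ordered = sorted(entries, key=lambda entry: entry[1], reverse=True)
    return [(rank, name, score) for rank, (name, score) in enumerate(ordered, start=1)]


# Hypothetical per-benchmark scores for one submission (not real data).
example_scores = {"benchmark_a": 50.0, "benchmark_b": 40.0, "benchmark_c": 45.0}
print(validation_performance(example_scores))  # 45.0

# Hypothetical submissions, ranked from best to worst.
for rank, name, score in rank_submissions([("method_x", 41.0), ("method_y", 45.0)]):
    print(rank, name, score)
```

Ties keep their input order because Python's sort is stable, so equal scores can still appear at consecutive ranks.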

How to submit your merging method

  • Start from our starter code template LLM-Merging and build your own merging method.
  • Please submit the whole repository. After modifying the files, package the repository into a tarball with the command:
tar -cvf llm-merging.tar LLM-Merging
  • Submit your tar file using this form

  • Please submit a report describing your merging method to our OpenReview LMC 2024 page, following the standard NeurIPS format template. There are no strict restrictions on the report, but we suggest keeping it to at most 4 pages. All submitted reports will be publicly accessible on our website.
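
For teams that prefer to script the packaging step, here is a minimal sketch that mirrors `tar -cvf llm-merging.tar LLM-Merging` using Python's standard `tarfile` module and then lists the archive contents as a sanity check before uploading. The helper name `make_submission_tarball` is hypothetical and not part of the starter code; the directory and archive names are taken from the command above.

```python
import tarfile
from pathlib import Path


def make_submission_tarball(repo_dir: str = "LLM-Merging",
                            out_path: str = "llm-merging.tar") -> list[str]:
    """Pack repo_dir into an uncompressed tar archive and return its member names."""
    repo = Path(repo_dir)
    if not repo.is_dir():
        raise FileNotFoundError(f"repository directory not found: {repo_dir}")

    # Mode "w" writes an uncompressed archive, matching `tar -cvf`.
    with tarfile.open(out_path, "w") as archive:
        # arcname keeps the top-level folder name inside the archive.
        archive.add(str(repo), arcname=repo.name)

    # Re-open the archive to confirm what was actually written.
    with tarfile.open(out_path, "r") as archive:
        return archive.getnames()


# Only runs when the starter repository is present in the current directory.
if Path("LLM-Merging").is_dir():
    for member in make_submission_tarball():
        print(member)
```

Listing the members before submitting helps catch an archive that is missing the modified files or that accidentally wraps the repository in an extra directory.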

Note:

  • Each team’s submission will be evaluated at most once per day. Evaluation frequency will increase as the deadline approaches.
  • An automatic submission method is coming soon.