Implement Mixture of Quantized Experts (MoQE) #747

EricLBuehler · 2024-09-03T20:43:51Z

Expose the ability to quantize only the expert layers, leaving the attention and gating layers unquantized. According to the authors of the paper, this makes it easy to quantize the experts to 3 bits, and even to 2 bits.
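
For illustration only, here is a minimal sketch of what expert-only quantization means. It is not the code from this PR and not the mistral.rs API: `LayerKind`, `QuantPolicy`, `fake_quantize`, `should_quantize`, and `apply_policy` are all hypothetical names, and the quantizer is a simple symmetric round-to-nearest scheme chosen for brevity rather than the scheme the paper or the library uses.

```rust
/// Hypothetical layer categories in a mixture-of-experts model.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LayerKind {
    Attention,
    Gate,
    Expert,
}

/// Hypothetical quantization policy (an assumption, not the mistral.rs API).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum QuantPolicy {
    /// Quantize every layer.
    All,
    /// MoQE-style: quantize only the expert layers.
    ExpertsOnly,
}

/// Symmetric round-to-nearest quantization to `bits` bits.
/// Returns the dequantized values so the rounding error is easy to inspect.
pub fn fake_quantize(weights: &[f32], bits: u32) -> Vec<f32> {
    assert!((2..=8).contains(&bits), "bits must be in 2..=8");
    // Largest positive integer level, e.g. 3 for 3-bit and 1 for 2-bit.
    let qmax = ((1i32 << (bits - 1)) - 1) as f32;
    let max_abs = weights.iter().fold(0.0f32, |m, w| m.max(w.abs()));
    if max_abs == 0.0 {
        return weights.to_vec();
    }
    let scale = max_abs / qmax;
    weights
        .iter()
        .map(|w| (w / scale).round().clamp(-qmax - 1.0, qmax) * scale)
        .collect()
}

/// Decide whether a layer of the given kind is quantized under `policy`.
pub fn should_quantize(kind: LayerKind, policy: QuantPolicy) -> bool {
    match policy {
        QuantPolicy::All => true,
        QuantPolicy::ExpertsOnly => kind == LayerKind::Expert,
    }
}

/// Quantize the weights if the policy selects this layer; otherwise copy them unchanged.
pub fn apply_policy(
    kind: LayerKind,
    weights: &[f32],
    bits: u32,
    policy: QuantPolicy,
) -> Vec<f32> {
    if should_quantize(kind, policy) {
        fake_quantize(weights, bits)
    } else {
        weights.to_vec()
    }
}
```

Under `QuantPolicy::ExpertsOnly`, attention and gate weights pass through untouched, while expert weights are rounded to the requested bit width.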

github-actions · 2024-09-03T20:44:56Z

Code Metrics Report

===============================================================================
 Language            Files        Lines         Code     Comments       Blanks
===============================================================================
 C Header                2           35           28            0            7
 Dockerfile              1           34           25            0            9
 Happy                   1          442          369            0           73
 JSON                   12          105          104            0            1
 Python                 46         2018         1718           62          238
 TOML                   20          596          536            2           58
 YAML                    2           21           19            2            0
-------------------------------------------------------------------------------
 Jupyter Notebooks       4            0            0            0            0
 |- Markdown             2           77           32           31           14
 |- Python               2          196          169            1           26
 (Total)                            273          201           32           40
-------------------------------------------------------------------------------
 Markdown               30         2080            0         1580          500
 |- BASH                 5          101           98            0            3
 |- JSON                 1           12           12            0            0
 |- Python               5           92           82            0           10
 |- Rust                 7          441          395           22           24
 |- TOML                 2           75           63            0           12
 (Total)                           2801          650         1602          549
-------------------------------------------------------------------------------
 Rust                  202        62743        56960         1148         4635
 |- Markdown           103          950           13          885           52
 (Total)                          63693        56973         2033         4687
===============================================================================
 Total                 321        68074        59759         2794         5521
===============================================================================

Implement mixture of quantized experts

6d6e74c

EricLBuehler merged commit 5fb7fbf into master Sep 3, 2024
12 checks passed

EricLBuehler deleted the phi3.5_moe_moqe branch September 3, 2024 21:02
