Implement Mixture of Quantized Experts (MoQE) #747

Merged
EricLBuehler merged 1 commit into master from phi3.5_moe_moqe
Sep 3, 2024

Conversation

EricLBuehler
Owner

https://arxiv.org/abs/2310.02410

Expose the ability to quantize only the experts, leaving the attention and gating layers unquantized. According to the authors of the paper, this easily enables quantization to 3-bit, and even to 2-bit.
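To illustrate the idea of quantizing only the experts, here is a minimal, self-contained Python sketch. It is not the code from this PR. The tensor names (`block_sparse_moe.experts.N.w1.weight` and so on), the helper names, and the symmetric round-to-nearest quantizer are all assumptions chosen for illustration; the paper and the actual implementation may use a different scheme.

```python
# Illustrative sketch of expert-only quantization (MoQE-style selection).
# All names below are hypothetical; this is not the mistral.rs implementation.

def is_expert_weight(name: str) -> bool:
    """Assumed naming rule: expert weights contain '.experts.' in their name."""
    return ".experts." in name and name.endswith(".weight")


def quantize_row(row, bits):
    """Symmetric round-to-nearest quantization of one row to `bits` bits.

    Returns integer codes in [-qmax, qmax] and the per-row scale.
    """
    qmax = 2 ** (bits - 1) - 1  # 3 for 3-bit, 1 for 2-bit
    max_abs = max(abs(x) for x in row)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(x / scale))) for x in row]
    return codes, scale


def dequantize_row(codes, scale):
    """Map integer codes back to floating-point values."""
    return [c * scale for c in codes]


def apply_moqe(weights, bits=3):
    """Quantize-then-dequantize expert weights; pass everything else through.

    `weights` maps tensor names to matrices stored as lists of rows.
    Attention and gating tensors are returned untouched.
    """
    out = {}
    for name, matrix in weights.items():
        if is_expert_weight(name):
            out[name] = [dequantize_row(*quantize_row(row, bits)) for row in matrix]
        else:
            out[name] = matrix
    return out
```

Calling `apply_moqe(weights, bits=3)` on a dictionary of named matrices would leave entries such as a `self_attn.q_proj.weight` or `block_sparse_moe.gate.weight` tensor unchanged and round only the expert matrices to the low-bit grid.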


github-actions bot commented Sep 3, 2024

Code Metrics Report
  ===============================================================================
 Language            Files        Lines         Code     Comments       Blanks
===============================================================================
 C Header                2           35           28            0            7
 Dockerfile              1           34           25            0            9
 Happy                   1          442          369            0           73
 JSON                   12          105          104            0            1
 Python                 46         2018         1718           62          238
 TOML                   20          596          536            2           58
 YAML                    2           21           19            2            0
-------------------------------------------------------------------------------
 Jupyter Notebooks       4            0            0            0            0
 |- Markdown             2           77           32           31           14
 |- Python               2          196          169            1           26
 (Total)                            273          201           32           40
-------------------------------------------------------------------------------
 Markdown               30         2080            0         1580          500
 |- BASH                 5          101           98            0            3
 |- JSON                 1           12           12            0            0
 |- Python               5           92           82            0           10
 |- Rust                 7          441          395           22           24
 |- TOML                 2           75           63            0           12
 (Total)                           2801          650         1602          549
-------------------------------------------------------------------------------
 Rust                  202        62743        56960         1148         4635
 |- Markdown           103          950           13          885           52
 (Total)                          63693        56973         2033         4687
===============================================================================
 Total                 321        68074        59759         2794         5521
===============================================================================
  

EricLBuehler merged commit 5fb7fbf into master on Sep 3, 2024
12 checks passed
EricLBuehler deleted the phi3.5_moe_moqe branch on September 3, 2024 21:02