Fix phi 3 conversion #8262
Conversation
Is this all that's needed? I thought LongRoPE wasn't supported. |
LongRoPE was added in #7225 |
Is that LongRoPE? I thought it was supporting YaRN with a long factor; I could be way off base. I guess my question is: with this change, does it actually work as it should, or does it just make conversion work? |
(I'm a bit confused because I'm on the fork and |
@bartowski1182 I ran a perplexity test yesterday (forgot to save the result), but the result seems good. @ann-brown Maybe a problem with your git checkout. I converted |
@ann-brown branch |
It seems that the chat template has also been changed to support a system prompt. |
Actually, we have (accidentally) supported system messages from the beginning: https://github.com/ggerganov/llama.cpp/blob/f8d6a23804f3798ff2869da68c1223b618df09ec/src/llama.cpp#L19897-L19905 This is because the Phi-3 template is mostly the same as the Zephyr template, so we copied the code from Zephyr, which already had system message support. |
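For illustration, the Phi-3 chat format with a system turn looks roughly like this Zephyr-style template; a minimal sketch (the exact markers should be checked against the model's tokenizer_config.json, so treat the details as assumptions):

```python
def format_phi3(messages):
    """Render messages in the Phi-3/Zephyr-style chat format (sketch).

    Assumes the <|role|> ... <|end|> markers used by Phi-3; the
    authoritative template lives in the model's tokenizer config.
    """
    out = []
    for m in messages:  # each m: {"role": "system"|"user"|"assistant", "content": str}
        out.append(f"<|{m['role']}|>\n{m['content']}<|end|>\n")
    out.append("<|assistant|>\n")  # prompt the model to answer
    return "".join(out)

print(format_phi3([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]))
```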
@bartowski1182 Could you confirm if this fix works on your side? I'll merge when someone can confirm that. Thanks. |
Sure, I'll check it out. Did your perplexity test go past 4k tokens? I'll try to validate the model's coherency at ~10k+ tokens just in case, too. |
@bartowski1182 No, I didn't test beyond `-c 4096` (my laptop is a potato as always; have to wait until next month for the new rig). Thanks for testing that. |
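For reference, the perplexity (ppl) being discussed is the exponential of the mean negative log-likelihood over the evaluated tokens; a minimal sketch of the final reduction (llama.cpp's tool additionally evaluates the text in chunks of the context size):

```python
import math

def perplexity(token_logprobs):
    """Perplexity from per-token log-probabilities (natural log)."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# e.g. three tokens the model found fairly likely:
print(perplexity([-0.2, -1.1, -0.5]))  # ~1.82
```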
Btw, the 128k "mini" version should work fine with contexts bigger than 4k. It's only from the "small" variant onward that they use sliding window and blocksparse attention, which will break on llama.cpp from 2k tokens. |
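To see why a missing sliding-window implementation breaks long contexts: each token is supposed to attend only to a fixed window of recent tokens, so a runtime that applies a plain causal mask instead diverges once positions fall outside the window. A minimal mask sketch (illustrative only; the window size is a stand-in):

```python
import numpy as np

def sliding_window_mask(n_tokens: int, window: int) -> np.ndarray:
    """Boolean attention mask where token i attends to tokens (i - window, i]."""
    i = np.arange(n_tokens)[:, None]
    j = np.arange(n_tokens)[None, :]
    return (j <= i) & (j > i - window)

# With a 2k window, position 2500 must no longer see position 0,
# which a plain causal mask would still allow.
print(sliding_window_mask(4, window=2).astype(int))
```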
It doesn't seem to work with contexts above 4096. ppl with … |
That's kind of what I expected; this isn't a LongRoPE implementation, it's just allowing it to fall through to the old RoPE if it's set to longrope. |
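A hedged sketch of what that fall-through could look like on the conversion side (hypothetical helper and metadata key names, not the actual convert-hf-to-gguf.py code): the new type name is accepted wherever the old one was handled, without adding a real LongRoPE implementation.

```python
def map_rope_scaling(config: dict) -> dict:
    """Map a HF rope_scaling config to converter metadata (sketch).

    Phi-3's config renamed the scaling type "su" to "longrope";
    treating both names the same lets conversion proceed, but does
    not by itself implement LongRoPE at inference time.
    """
    rope = config.get("rope_scaling") or {}
    rope_type = rope.get("type")
    if rope_type in ("su", "longrope"):  # fall through to the existing handling
        return {
            "rope.scaling.type": "none",          # hypothetical key
            "rope.long_factors": rope.get("long_factor"),
            "rope.short_factors": rope.get("short_factor"),
        }
    raise ValueError(f"unsupported rope_scaling type: {rope_type}")
```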
Thanks for the test. Hmm, I'm looking at the commit on HF; the only change to the file is a new function: https://huggingface.co/microsoft/Phi-3-mini-128k-instruct/blob/main/configuration_phi3.py#L187 Probably there are other changes under the hood. |
It's likely because it was trained with whatever the longrope method is, so while this may extend the context, it does so in the wrong way and the model will get confused. |
It is likely a bug in … |
Ooh, that's interesting... Good find. Where is that change being made, sorry? |
@bartowski1182 in llama.cpp, |
I think the problem is that perplexity sets |
Ah, so it's more than just a conversion change; it'll need an update to the runtime as well? Does it work with this conversion change + the new llama.cpp change, or does it need a different conversion AND llama.cpp changes? |
I think this is fine as is. It will work on programs that set |
You're right @slaren, when running with |
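For context on the long/short factor discussion (the comments above are truncated, so the exact mechanism here is an assumption): the Phi-3 128k config ships two sets of RoPE frequency factors, and the factors to apply are chosen by comparing the configured context size against original_max_position_embeddings. A sketch using the Hugging Face config keys:

```python
import json

def select_rope_factors(config: dict, n_ctx: int) -> list:
    """Pick long or short RoPE frequency factors (sketch).

    Uses the keys from Phi-3-mini-128k-instruct's config.json; the
    actual selection point in llama.cpp is C++ and may key off
    different values.
    """
    rope = config["rope_scaling"]  # has "long_factor", "short_factor", "type"
    original = config["original_max_position_embeddings"]  # 4096 for Phi-3 mini
    return rope["long_factor"] if n_ctx > original else rope["short_factor"]

with open("config.json") as f:
    cfg = json.load(f)
print(len(select_rope_factors(cfg, n_ctx=16384)))
```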
A small note is that on the Hugging Face model, they simply renamed `su` to `longrope`. I'm merging this PR now. |
I'm still suspicious, but I'll run some tests and report back if there's anything off; maybe I'm just naturally too suspicious, though. |
Will the "old" Phi-3s continue to run and convert properly with this change? Also, thank you for the diligence; I think it really helps catch subtle issues like these. |
It looks good to me. Tried to convert to Q8_0 from f32.
4k ppl: …
8k ppl: …
16k ppl: …
Tried a simple test and got … Not brilliant, but alright for a small model. |
@bartowski1182 I did a custom test based on a 16k+ token input (the first 2133 lines of the NetHack Guidebook as a single message), and this updated version of Phi-3-mini-128k-instruct, freshly converted with llama.cpp b3291 (Q8_0 quant), seems to be working pretty well. Launched with |
You are correct; I meant to update my comment. I did a test where I fed it some 20k tokens and it replied coherently, so my suspicions are completely unjustified. I appreciate you doing a good thorough test to prove it, thank you!! |
Hi Bartowski, any update on Phi 3 Small? |
Reflect the change in the latest update: https://huggingface.co/microsoft/Phi-3-mini-128k-instruct/commit/10d25dfa1593265daf4a3bac573ab76ccf61d60f
The only changes are:
- `longrope` (the `rope_scaling` type is renamed from `su` to `longrope`)
Related to #6849 (comment)
Example result after conversion: …