docs should mention limitation of sbert #8
Comments
Thanks, that's a good tip - I've added it to the "usage" section.
I'm sharing my own 'rolling' sbert script to avoid clipping the sentences. It's seemingly functional but not very elegant; a class would be better of course, but I just hope it helps someone.
edit: fixed the code :/
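A minimal, hypothetical sketch of that rolling-window idea (not the original script, which isn't reproduced here): slide a token window over the text, embed each window, and pool the results instead of letting the model silently clip everything past its limit. The window/stride values and model name are illustrative placeholders.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

def rolling_embed(text, model, window=256, stride=128, pooling="max"):
    # Measure windows in model tokens, not characters.
    tokens = model.tokenizer.encode(text, add_special_tokens=False)
    if len(tokens) <= window:
        return model.encode(text)
    # Decode each overlapping token window back to text and embed it.
    chunks = [model.tokenizer.decode(tokens[i:i + window])
              for i in range(0, len(tokens), stride)]
    embs = model.encode(chunks)  # shape: (n_windows, dim)
    pooled = embs.max(axis=0) if pooling == "max" else embs.mean(axis=0)
    # L2-normalize so the pooled vector stays usable for cosine similarity.
    return pooled / np.linalg.norm(pooled)

# model = SentenceTransformer("all-MiniLM-L6-v2")
# vec = rolling_embed(very_long_text, model)
```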
I like the idea of that "rolling window". Is it *the* way to do it, or is it just an alternative to clipping - i.e. how well does it work?
Keep in mind that my implementation is pretty naive and can certainly be vastly optimized, but the idea is there. An implementation for langchain can be found here. A non-langchain implementation in a code I'm using regularly can be found here.

I don't know whether I stumbled on *the* way by chance, but probably not. In my example I did a maxpooling, but I could have done a meanpooling instead. That also brings up the question of L1 vs L2 if doing a normalization. One could also think about an exponential decay of the importance of each new chunk of text, etc.

In my experience, maxpooling and meanpooling both seem to work fine. More tests with proper metrics would be needed to find out which is best, and whether the enhancement is not just a placebo that degrades results.
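To make those options concrete, here's a tiny illustration of the pooling, normalization, and decay choices mentioned above (the array shape and the 0.9 decay factor are arbitrary placeholders):

```python
import numpy as np

embs = np.random.rand(5, 384)  # stand-in for 5 per-window embeddings

max_pooled = embs.max(axis=0)    # keep the strongest activation per dimension
mean_pooled = embs.mean(axis=0)  # average across windows instead

# L2 vs L1 normalization of the pooled vector:
l2 = max_pooled / np.linalg.norm(max_pooled, ord=2)  # unit Euclidean length
l1 = max_pooled / np.linalg.norm(max_pooled, ord=1)  # abs components sum to 1

# Exponentially decaying the importance of later windows:
weights = 0.9 ** np.arange(len(embs))
decayed = (embs * weights[:, None]).sum(axis=0) / weights.sum()
```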
Hi,
I encounter time and time again people disappointed by the effectiveness of the sentence-transformers models. Usually the reason is that the models have a very short "max sequence length" (the default model's is 256) and everything after that is silently clipped.
Given that this happens silently, I think most people are not aware of it. And the multilingual models have even shorter limits!
I brought that up several times here on langchain and there too.
So I think it would be good to mention this in the README.md.
And if anyone is down for writing a simple wrapper that does a rolling average/maxpooling/whateverpooling of the input instead of clipping it, that would be awesome! That would be a workaround that can't possibly be worse than just clipping the input, right?
Cheers and llm is great!
(related to simonw/llm#220)