ai.djl.nn.core.Embedding embedding matrix changes during optimization #2351
Comments
Given that embedding is a block, have you tried freezing its parameters?
I already added a parameter freeze as a workaround, and it certainly works.
Just to clarify, the following is your workaround, right?
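(The snippet quoted in this comment did not survive extraction. The following is only a minimal sketch of such a freeze-based workaround, assuming DJL's Block.freezeParameters(boolean) method, not the reporter's actual code:)

import ai.djl.nn.Block;

public final class EmbeddingFreezeWorkaround {

    /**
     * Freezes every parameter of the given block (e.g. an Embedding) so the
     * optimizer no longer updates the embedding matrix during training.
     */
    public static void freezeEmbedding(Block embedding) {
        // freezeParameters(true) sets requiresGrad = false on each parameter
        // the block owns, so gradient collection skips them.
        embedding.freezeParameters(true);
    }
}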
OK. Then the issue is the default behaviour of Embedding. Changing the default behaviour will affect other existing applications, so we will probably need to evaluate that effect first.
No. Currently I'm using freeze() in my class that extends Embedding. But in my opinion, an embedding matrix with gradients enabled is not "default behaviour", it's a bug. An embedding is a way to represent scalar events as vectors: the set of input events is limited, and the mapping between scalar and vector is always one-to-one. If someone used the current "behaviour" as is, they would get different result vectors for a single scalar event, as if it were two different input events.
Thank you for raising this discussion. The main reason we had it training was that the convention across DJL is that all blocks are created unfrozen. I discussed this issue with some of the others, and we think you raise a good point. So, in #2360 I changed the behavior: blocks created with initialization remain unfrozen, while blocks created with pre-trained data (like this case) are frozen.
Description
ai.djl.nn.core.Embedding implements a simple lookup table that stores embeddings of a fixed dictionary and size.
The embedding matrix is stored as a Parameter:
protected Parameter embedding;
This parameter is created with the flag requireGrad = true (the default for all parameters), so the contents of the embedding matrix change on every optimization step. As a result, the embedding function returns different results for equal inputs after each optimization step.
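For reference, the parameter construction inside Embedding looks roughly like this (a paraphrased sketch, not verbatim DJL source):

// Paraphrased sketch of the parameter creation inside Embedding's constructor:
embedding = addParameter(
        Parameter.builder()
                .setName("embedding")
                .setType(Parameter.Type.WEIGHT)
                .build());
// Because the builder defaults requiresGrad to true, the engine records
// gradients for the matrix and the optimizer updates it on every step.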
Expected Behavior
The embedding matrix parameter should be created with the flag requireGrad = false.
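A sketch of the proposed change, assuming Parameter.Builder exposes an optRequiresGrad option (the exact builder method name is an assumption):

embedding = addParameter(
        Parameter.builder()
                .setName("embedding")
                .setType(Parameter.Type.WEIGHT)
                .optRequiresGrad(false) // assumed option: exclude the matrix from gradient updates
                .build());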