
ai.djl.nn.core.Embedding embedding matrix changes during optimization #2351

Closed
tipame opened this issue Jan 27, 2023 · 5 comments · Fixed by #2360
Labels
bug Something isn't working

Comments

@tipame
Contributor

tipame commented Jan 27, 2023

Description

ai.djl.nn.core.Embedding implements a simple lookup table that stores embeddings of a fixed dictionary and size.
The embedding matrix is stored as a Parameter:
protected Parameter embedding;
This parameter is created with the flag requiresGrad = true (the default for all parameters), so the content of the embedding matrix changes on every optimization step. As a result, the embedding function returns different results for the same input after each optimization step.
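
For reference, the parameter appears to be constructed roughly like this (a sketch reconstructed from the builder call shown further down; without an optRequiresGrad call the flag defaults to true):

embedding = addParameter(
        Parameter.builder()
                .setName("embedding")
                .setType(Parameter.Type.WEIGHT)
                .build()); // requiresGrad defaults to true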

Expected Behavior

The embedding matrix parameter should be created with the flag requiresGrad = false.

tipame added the bug label on Jan 27, 2023
@KexinFeng
Contributor

Given that embedding is a block, have you tried the ai/djl/nn/Block.java:300 freezeParameters() method to solve this issue? If it can be solved that way, then the remaining question is how to set the default behaviour of embedding.
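
For example, something along these lines should keep the embedding matrix fixed during training (a minimal sketch; embeddingBlock is a placeholder for whatever Embedding instance the model uses, and the boolean overload of freezeParameters is an assumption here):

void freezeEmbedding(ai.djl.nn.Block embeddingBlock) {
    // mark every parameter of the block, including the embedding matrix,
    // as not requiring gradients, so the optimizer skips it
    embeddingBlock.freezeParameters(true);
}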

@tipame
Contributor Author

tipame commented Jan 31, 2023

I have already added parameter freezing as a workaround, and it certainly works.
In my opinion, the parameter should be created with requiresGrad = false:

embedding = addParameter(
        Parameter.builder()
                .setName("embedding")
                .setType(Parameter.Type.WEIGHT)
                .optRequiresGrad(false)
                .build());

@KexinFeng
Contributor

KexinFeng commented Jan 31, 2023

Just to clarify, the following is your workaround, right?

embedding = addParameter(
        Parameter.builder()
                .setName("embedding")
                .setType(Parameter.Type.WEIGHT)
                .optRequiresGrad(false)
                .build());

OK. Then the remaining question is how to set the default behaviour of embedding. Changing the default behaviour will affect other existing applications, so we will probably need to evaluate that effect first.

@tipame
Contributor Author

tipame commented Feb 1, 2023

No - currently I'm using freeze() in my class that extends Embedding. But in my opinion an enabled gradient on the embedding matrix is not a "default behaviour", it's a bug. Embedding is a way to represent scalar events as vectors. The set of input events is limited, and the mapping between scalar and vector is always one to one. If someone used the current "behaviour" as is, they would get different result vectors for a single scalar event, as if it were two different input events.
Training the embedding matrix may also degrade the initial random distribution, so there could end up being duplicated (equal) vectors representing different input events.
The PyTorch embedding implementation uses a static (frozen) embedding matrix.

@zachgk
Contributor

zachgk commented Feb 1, 2023

Thank you for raising this discussion. The main reason we had it trainable was that the convention across DJL was for all blocks to be created unfrozen. I discussed this issue with some of the others, and we think you raise a good point. So, in #2360 I changed the behavior: blocks created with initialization stay unfrozen, while blocks created with pre-trained data (like this case) are created frozen.
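
With that change, anyone who still wants to fine-tune such a pre-trained block should be able to unfreeze it explicitly (a sketch using the freezeParameters method discussed above; embeddingBlock is a placeholder name):

embeddingBlock.freezeParameters(false); // re-enable gradients only if fine-tuning the embedding is desired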
