ai.djl.nn.core.Embedding embedding matrix changes during optimization #2351
Comments
Given that embedding is a block, have you tried freezing its parameters?
I already added a parameter freeze as a workaround, and it certainly works.
Just to clarify, the following is your workaround, right?
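(The snippet quoted in this comment did not survive extraction. The following is only a minimal sketch of such a freeze-based workaround, assuming DJL's Block.freezeParameters(boolean) method, not the reporter's actual code:)

import ai.djl.nn.Block;

public final class EmbeddingFreezeWorkaround {

    /**
     * Freezes every parameter of the given block (e.g. an Embedding) so the
     * optimizer no longer updates the embedding matrix during training.
     */
    public static void freezeEmbedding(Block embedding) {
        // freezeParameters(true) sets requiresGrad = false on each parameter
        // the block owns, so gradient collection skips them.
        embedding.freezeParameters(true);
    }
}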
OK. Then the issue is the default behaviour of Embedding. Changing the default behaviour will affect other existing applications, so we will probably need to evaluate that effect first.
No. Currently I'm using freeze() in my class that extends Embedding. But in my opinion, an embedding matrix with gradients enabled is not "default behaviour", it's a bug. An embedding is a way to represent scalar events as vectors: the set of input events is limited, and the mapping between scalar and vector is always one-to-one. If someone used the current "behaviour" as is, they would get different result vectors for a single scalar event, as if it were two different input events.
Thank you for raising this discussion. The main reason we had it training was that the convention across DJL is that all blocks are created unfrozen. I discussed this issue with some of the others, and we think you raise a good point. So, in #2360 I changed the behavior: blocks created with initialization remain unfrozen, while blocks created with pre-trained data (like this case) are frozen.
Description
ai.djl.nn.core.Embedding implements a simple lookup table that stores embeddings of a fixed dictionary and size.
The embedding matrix is stored as a Parameter:
protected Parameter embedding;
This parameter is created with the flag requireGrad = true (the default for all parameters), so the contents of the embedding matrix change on every optimization step. As a result, the embedding function returns different results for equal inputs after each optimization step.
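For reference, the parameter construction inside Embedding looks roughly like this (a paraphrased sketch, not verbatim DJL source):

// Paraphrased sketch of the parameter creation inside Embedding's constructor:
embedding = addParameter(
        Parameter.builder()
                .setName("embedding")
                .setType(Parameter.Type.WEIGHT)
                .build());
// Because the builder defaults requiresGrad to true, the engine records
// gradients for the matrix and the optimizer updates it on every step.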
Expected Behavior
The embedding matrix parameter should be created with the flag requireGrad = false.
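A sketch of the proposed change, assuming Parameter.Builder exposes an optRequiresGrad option (the exact builder method name is an assumption):

embedding = addParameter(
        Parameter.builder()
                .setName("embedding")
                .setType(Parameter.Type.WEIGHT)
                .optRequiresGrad(false) // assumed option: exclude the matrix from gradient updates
                .build());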