
Creates standard for PreTrained behavior #2360

Merged · 1 commit merged into deepjavalibrary:master on Feb 3, 2023

Conversation

zachgk (Contributor) commented Feb 1, 2023

This changes the standard DJL behavior for pre-trained blocks: they now start out with frozen parameters. The change has been applied to the embeddings.

Previously this behavior applied only to PyTorch; it now applies to all engines. However, I did leave a carveout for loaded models. It adds a boolean "wasLoaded" so that if you load a model and then create a Trainer directly from it, it will not be frozen. If you load a model and then append some new layers to it (as several of our examples do), the pre-trained part will need to be unfrozen to retrain it.

fixes #2351
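
For illustration, here is a minimal sketch of the two cases described above: training a loaded model directly (covered by the "wasLoaded" carveout) versus appending new layers to a pre-trained block, where the pre-trained part now starts out frozen. This is not code from the PR; the Criteria setup is hypothetical, and `Block#freezeParameters(boolean)` is assumed from DJL's transfer-learning usage.

```java
import ai.djl.Model;
import ai.djl.nn.Block;
import ai.djl.nn.SequentialBlock;
import ai.djl.nn.core.Linear;
import ai.djl.repository.zoo.Criteria;
import ai.djl.repository.zoo.ZooModel;
import ai.djl.training.DefaultTrainingConfig;
import ai.djl.training.Trainer;
import ai.djl.training.loss.Loss;

public class PreTrainedFreezeSketch {

    public static void main(String[] args) throws Exception {
        // Hypothetical criteria; a real model-zoo filter or URL would go here.
        Criteria<Object, Object> criteria =
                Criteria.builder().setTypes(Object.class, Object.class).build();

        try (ZooModel<Object, Object> loaded = criteria.loadModel()) {
            // Case 1: create a Trainer directly from the loaded model.
            // Under the "wasLoaded" carveout, its parameters are not left frozen here.
            try (Trainer trainer = loaded.newTrainer(
                    new DefaultTrainingConfig(Loss.softmaxCrossEntropyLoss()))) {
                // fine-tune the whole model as before ...
            }

            // Case 2: append new layers to the pre-trained block.
            // The pre-trained part starts out frozen, so unfreeze it to retrain it.
            Block base = loaded.getBlock();
            SequentialBlock transfer = new SequentialBlock()
                    .add(base)
                    .add(Linear.builder().setUnits(10).build());
            base.freezeParameters(false); // assumed API for unfreezing parameters

            try (Model model = Model.newInstance("transfer")) {
                model.setBlock(transfer);
                // create a Trainer for `model` and train as usual ...
            }
        }
    }
}
```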

@codecov-commenter

Codecov Report

Base: 72.08% // Head: 74.37% // This PR increases project coverage by +2.28% 🎉

Coverage data is based on head (f118ce9) compared to base (bb5073f).
Patch coverage: 74.70% of modified lines in pull request are covered.


Additional details and impacted files
@@             Coverage Diff              @@
##             master    #2360      +/-   ##
============================================
+ Coverage     72.08%   74.37%   +2.28%     
- Complexity     5126     6817    +1691     
============================================
  Files           473      670     +197     
  Lines         21970    29599    +7629     
  Branches       2351     3073     +722     
============================================
+ Hits          15838    22013    +6175     
- Misses         4925     6086    +1161     
- Partials       1207     1500     +293     
| Impacted Files | Coverage Δ |
|---|---|
| api/src/main/java/ai/djl/modality/cv/Image.java | 69.23% <ø> (-4.11%) ⬇️ |
| ...rc/main/java/ai/djl/modality/cv/MultiBoxPrior.java | 76.00% <ø> (ø) |
| ...rc/main/java/ai/djl/modality/cv/output/Joints.java | 71.42% <ø> (ø) |
| .../main/java/ai/djl/modality/cv/output/Landmark.java | 100.00% <ø> (ø) |
| ...main/java/ai/djl/modality/cv/output/Rectangle.java | 72.41% <0.00%> (ø) |
| ...i/djl/modality/cv/translator/BigGANTranslator.java | 21.42% <0.00%> (-5.24%) ⬇️ |
| .../modality/cv/translator/ImageFeatureExtractor.java | 0.00% <0.00%> (ø) |
| .../ai/djl/modality/cv/translator/YoloTranslator.java | 27.77% <0.00%> (+18.95%) ⬆️ |
| ...ain/java/ai/djl/modality/cv/util/NDImageUtils.java | 67.10% <0.00%> (+7.89%) ⬆️ |
| api/src/main/java/ai/djl/modality/nlp/Decoder.java | 63.63% <ø> (ø) |

... and 616 more



@zachgk zachgk merged commit cba412d into deepjavalibrary:master Feb 3, 2023
@zachgk zachgk deleted the preTrained branch February 3, 2023 17:58
zachgk added a commit to zachgk/djl that referenced this pull request Feb 14, 2023
In deepjavalibrary#2360, the behavior for pre-trained models was changed to freeze their parameters. However, freezing the parameters on MXNet seems to cause a significant performance regression for training. This removes those changes as a temporary workaround until a deeper investigation can take place.
frankfliu added a commit that referenced this pull request Feb 14, 2023
* Remove performance issues from freezing MXNet


Co-authored-by: Frank Liu <frankfliu2000@gmail.com>
Successfully merging this pull request may close these issues.

ai.djl.nn.core.Embedding embedding matrix changes during optimization