Creates standard for PreTrained behavior #2360
Merged
This changes the standard for DJL behavior with pretrained blocks: they now start out with frozen parameters. This has been applied to the embeddings.
Freezing was previously applied only to PyTorch, but as of now applies to all models. However, I did leave a carveout for loaded models: a new boolean "wasLoaded" ensures that if you load a model and then create a Trainer directly from it, its parameters will not be frozen. If you load a model and then append some new layers to it (as several of our examples do), the pretrained portion will need to be explicitly unfrozen to retrain it.
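To make the two paths concrete, here is a minimal sketch, assuming DJL's Criteria/ZooModel loading API and Block.freezeParameters(boolean); the artifact id, input/output types, and layer size are placeholder assumptions, not tied to a specific model zoo entry:

```java
import ai.djl.modality.Classifications;
import ai.djl.modality.cv.Image;
import ai.djl.nn.Block;
import ai.djl.nn.SequentialBlock;
import ai.djl.nn.core.Linear;
import ai.djl.repository.zoo.Criteria;
import ai.djl.repository.zoo.ZooModel;

public class FreezeExample {

    public static void main(String[] args) throws Exception {
        // Placeholder criteria for illustration only.
        Criteria<Image, Classifications> criteria =
                Criteria.builder()
                        .setTypes(Image.class, Classifications.class)
                        .optArtifactId("resnet") // hypothetical artifact id
                        .build();

        try (ZooModel<Image, Classifications> model = criteria.loadModel()) {
            // Case 1: create a Trainer directly from the loaded model. Because
            // the model "wasLoaded", its parameters are not frozen and training
            // behaves as before.

            // Case 2: reuse the loaded block as a pretrained feature extractor
            // inside a new block. Under the new standard it starts frozen, so
            // only the appended layers would train.
            Block pretrained = model.getBlock();
            SequentialBlock newBlock =
                    new SequentialBlock()
                            .add(pretrained)
                            .add(Linear.builder().setUnits(10).build());

            // To also retrain the pretrained layers, unfreeze them explicitly
            // before building a Trainer around the new block.
            pretrained.freezeParameters(false);
        }
    }
}
```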
fixes #2351