Feature Request: distinguish parameters that require grad from those that don't for PyTorch models #51
Comments
Hi @2catycm, can you clarify what you mean by this?
Hi, thanks for your reply. Sorry my description was not clear. In deep learning we usually compute the gradient of all model parameters w.r.t. the loss, in order to train the model with optimizers like SGD. But in transfer learning and parameter-efficient fine-tuning, not all parameters need to be modified: some can be frozen to preserve the knowledge learned on previous tasks and prevent catastrophic forgetting.

If we train all the parameters, it is called full fine-tuning. If we only train part of the model, for example only the biases (BitFit), only the LayerNorm layers (LN-Tuning), or newly added modules that are the only trainable parts (LoRA, Adapter, Prompt Tuning), it is called parameter-efficient fine-tuning. Since models like LLMs are very large, fine-tuning a pretrained model on many downstream tasks means saving the model's modifications many times; if each modification is only partial, it saves a lot of storage.

This is why it becomes a useful feature when visualizing the model before training. It is really helpful to see which parts of the model in our training recipe are frozen (don't require grad and don't require storage) and which parts are trainable (require grad). Another model visualization library called …
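For reference, a minimal sketch of the freezing step being described, in plain PyTorch. The model here is a stand-in example, not the library under discussion; the BitFit-style rule (train only biases) is just one of the recipes mentioned above:

```python
import torch
import torch.nn as nn

# Stand-in model; any pretrained network would work the same way.
model = nn.Sequential(
    nn.Linear(16, 32),
    nn.LayerNorm(32),
    nn.Linear(32, 4),
)

# BitFit-style freezing: only bias parameters stay trainable.
for name, param in model.named_parameters():
    param.requires_grad = name.endswith("bias")

# Only the trainable subset goes to the optimizer and needs to be saved.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=1e-2)
```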
Thanks for the clarification! Added the requires_grad info to the rendered summary of parameters and other torch tensors.
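For anyone landing here, a generic way to verify the same information without any visualization library (this is not the library's actual rendering, just a plain-PyTorch check; the model is a hypothetical example):

```python
import torch.nn as nn

# Hypothetical model with a mix of frozen and trainable parameters.
model = nn.Linear(16, 4)
model.weight.requires_grad = False  # freeze the weight, keep the bias trainable

# Per-parameter summary of shape and requires_grad status.
for name, param in model.named_parameters():
    status = "trainable" if param.requires_grad else "frozen"
    print(f"{name:8s} {str(tuple(param.shape)):10s} {status}")

n_total = sum(p.numel() for p in model.parameters())
n_train = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable: {n_train}/{n_total} parameters")
```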