-
Notifications
You must be signed in to change notification settings - Fork 55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MNT: Suggestion how to simplify serialization of funcs #320
MNT: Suggestion how to simplify serialization of funcs #320
Conversation
Description It looks like the state we save for functions contains unnecessary information (the class name, which is not used) and duplicate information (the module name appears twice). This change gets rid of that unnecessary information. This also allows to remove the need for ufunc_get_state, which can now be replaced by function_get_state. Comment Maybe I'm missing something and there is a reason for the previous structure, but if there is, there should be a test to check whatever is covered by that, or at the very least a comment to explain. Btw. can't remember who wrote this, could be my fault :)
@skops-dev/maintainers ready for review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice cleanup
skops/io/_numpy.py
Outdated
@@ -200,13 +201,9 @@ def _construct(self): | |||
# get_state method for them here. The load is the same as other functions. | |||
def ufunc_get_state(obj: Any, save_context: SaveContext) -> dict[str, Any]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did you want to remove this then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah yes, good catch, done
We finally got here. This simplification means users with a new skops version won't be able to load a model dumped with an older skops version. So we need to increase the file format version with this, and somehow route loading to the right methods. The "easiest" way would be to have a separate folder/module for each version, I'm not sure how we can do better. |
Aha, yes, totally right.
I assume that is to avoid cluttering the code with It would be nice if we could automatically dispatch not only on type but also protocol. Then we could have different versions for, say, |
The comparison here is with adding a version to the dispatcher sounds like a very good idea to me, do you think you could come up with a prototype for that? |
Hmm, yes and no. Of course, we're positioning ourselves to replace pickle for this purpose. But if a user wants to load a 5 year old sklearn model dumped with skops, they would either be on an unsupported sklearn version or they would load with a new sklearn version, which gives no guarantees that the model still works. I think it would provide tremendous value if we could give long guarantees, longer than sklearn would. But I think we should allow ourselves a "breaking" release (skops 1.0) as we should expect the format to still change quite a lot.
Not sure, given the problems I mentioned above, I'll think about it. |
Taking the changes from skops-dev#320 to how functions are persisted and adding them here. Therefore, this PR supersedes skops-dev#320. That change is added here because it is the perfect test case for the update route of the skops protocol.
Closed in favor of #322 |
Description
It looks like the state we save for functions contains unnecessary information (the class name, which is not used) and duplicate information (the module name appears twice). This change gets rid of that unnecessary information.
This also allows to remove the need for
ufunc_get_state
, which can now be replaced byfunction_get_state
.Comment
Maybe I'm missing something and there is a reason for the previous structure, but if there is, there should be a test to check whatever is covered by that, or at the very least a comment to explain.
Btw. can't remember who wrote this, could be my fault :)