-
Notifications
You must be signed in to change notification settings - Fork 55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH Dealing with skops persistence protocol updates #322
ENH Dealing with skops persistence protocol updates #322
Conversation
This PR is a POC that is intended as a basis for discussing how we can deal with updating the skops persistence protocol. The problem in general occurs if we make a change to the state that is stored with an object. For instance, we could store an additional field. When we try to load old state without that key, there would be an error. Therefore, we have to somehow adjust our loading code to deal with this possibility. First of all, we want to avoid having a bunch of conditionals in our loading code that checks what protocol is used and then does this or that thing. This approach would become messy and error prone quite quickly, because we intend to support old protocols for a very long time. This proposal here would allow to write a clean implementation of the new deserialization code without regard for backwards compatibility. The old code, however, would not be deleted but instead moved to a different module and registered with its old protocol number. Then, while loading, we would check the protocol stored in the schema and use the old code if there is a match for it. If there is no match, we assume we can safely use the current protocol instead. This is just a draft and I haven't tested it on a real example. If we agree that this is the way forward, I would expand on it and test it properly. But first we should agree that this is the way to go and check if there are no problems with this approach. Below are the instructions with how to deal with a change in protocol: Every time that a backwards incompatible change to the skops format is made for the first time within a release, the protocol should be bumped to the next higher number. The old version of the Node, which knows how to deal with the old state, should be preserved and registered. Let's give an example: - There is a BC breaking change in FunctionNode. - Since it's the first BC breaking change in the skops format in this release, bump skops.io._protocol.PROTOCOL (this file) from version X to Y. - Move the old FunctionNode code into 'skops/io/old/_general_vX.py', where 'X' is the old protocol. - Register the _general_vX.FunctionNode in NODE_TYPE_MAPPING inside of _persist.py. Now, if a user loads a FunctionNode state with version X using skops with version Y, the old code will be used instead of the new one. For all other node types, if there is no loader for version X, skops will automatically use version Y instead.
Not sure why pre-commit didn't catch that...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other than adding some docs in the _general_v0.py
and some tests to check for old protocol, I think this looks perfect.
One way to go about that would be that I wrap #320 into this PR and have the old + new |
adding #320 here sounds good to me. |
Taking the changes from skops-dev#320 to how functions are persisted and adding them here. Therefore, this PR supersedes skops-dev#320. That change is added here because it is the perfect test case for the update route of the skops protocol.
@skops-dev/maintainers Despite the best efforts of codecov and rtd, I managed to get CI green. Ready for review. |
PROTOCOL = 0 | ||
|
||
|
||
class FunctionNode(Node): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should actually this this one, shouldn't we? we're not at the moment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this message is a bit cryptic :P
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lol, I meant to write, "we should actually test this one, and we're not at the moment", as in, test this particular upgrade for this particular node type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah okay. I added a test, basically the same as for the old version, minus the downgrading. Strictly speaking, this was already tested indirectly through other tests, which is why there was no test to begin with, but it was easy enough to add.
Also, add "# pragma: no cover" where it makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I replied to your comments, please check again.
PROTOCOL = 0 | ||
|
||
|
||
class FunctionNode(Node): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this message is a bit cryptic :P
Test for old and new protocol version are basically identical.
I'm feeling much better about our persistence now :) |
This PR is a POC that is intended as a basis for discussing how we can deal with updating the skops persistence protocol.This is a working implementation that shows one way to deal with updates in the skops persistence protocol in a backwards compatible manner that should, hopefully, scale.
As an example, the update to how functions are persisted is rolled into this PR to show that it works. Therefore, this PR supersedes #320.
Problem description
The problem in general occurs if we make a change to the state that is stored with an object. For instance, we could store an additional field. When we try to load old state without that key, there would be an error. Therefore, we have to somehow adjust our loading code to deal with this possibility.
Rejected solution
We want to avoid having a bunch of conditionals in our loading code that checks what protocol is used and then does this or that thing. This approach would become messy and error prone quite quickly, because we intend to support old protocols for a very long time.
Proposed solution
This proposal here would allow to write a clean implementation of the new deserialization code without regard for backwards compatibility. The old code, however, would not be deleted but instead moved to a different module and registered with its old protocol number. Then, while loading, we would check the protocol stored in the schema and use the old code if there is a match for it. If there is no match, we assume we can safely use the current protocol instead. For this to work, the key used for mapping the state to the loader is changed to be the tuple
(loader, protocol)
, instead of just theloader
, as has been the case so far. This is a hopefully transparent way to "dispatch" on both the loader and the protocol.This is just a draft and I haven't tested it on a real example. If we agree that this is the way forward, I would expand on it and test it properly. But first we should agree that this is the way to go and check if there are no problems with this approach.
Description of the mechanism for changing the protocol
Below are the instructions with how to deal with a change in protocol:
Every time that a backwards incompatible change to the skops format is made for the first time within a release, the protocol should be bumped to the next higher number. The old version of the Node, which knows how to deal with the old state, should be preserved and registered. Let's give an example:
FunctionNode
.skops.io._protocol.PROTOCOL
from version X to Y.FunctionNode
code intoskops/io/old/_general_vX.py
, where 'X' is the old protocol._general_vX.FunctionNode
inNODE_TYPE_MAPPING
inside of_persist.py
(on top of registering the current version).test_persist_old.py
that shows that the old state can still be loaded.Now, if a user loads a
FunctionNode
state with version X using skops with version Y, the old code will be used instead of the new one. For all other node types, if there is no loader for version X, skops will automatically use version Y instead.@skops-dev/maintainers RFC