-
Notifications
You must be signed in to change notification settings - Fork 55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Added CLI command to update skops files #343
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for working on this.
Before going further, we need a few more things: tests and documentation (see docs/persistence.rst
).
IMO it would also be nice to have a feature to check the protocol of the loaded file and if it's the same as the current protocol, to skip the conversion. If it's too complicated, we can also leave this open for a future PR, though.
I finally moved this PR from "Draft" to "ready for review". 🙂 You can see that, for now, I have not implemented any extra checks that we discussed here and in the issue. I wanted to make sure I’m on the right track before adding anything else. The interfaces of the functions are now slightly different from the ones in |
I did a quick pass and it looks good to me overall. @adrinjalali would you be fine with this PR as is, i.e. no extra checks etc.? |
I'm happy as is, without the checks, but we should figure out the checks separately, we've needed them in a few places now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot, this is a very clean implementation, well done.
There are only a few minor code comments from my side, please take a look.
Regarding the more general points, I have two suggestions (not must haves):
-
How about making it possible to call the command without
-o
to just overwrite the existing file? I.e.skops update my_model.skops
would write the new, updated skops file tomy_model.skops
. -
There is no test for the actual work that is being done. So e.g. if the
_update_file
function would just create a copy, the tests would still pass fine. I know that creating a file in an old skops format to test the updating feature is not trivial. Maybe you could come up with something clever? Say, create the skops file, manually set the protocol to an older version, then checking after the update that the protocol is the current version now. WDYT?
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
Thank you very much for your review!
I actually did that initially, but then I thought that if we did that by default, it could be a negative surprise for certain users that want to keep the original file. Another idea was to attach a prefix or a suffix to the file name provided as input if no
You are right! I added the protocol check inside the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for implementing the feedback. Clever way of patching the protocol.
I think there is a bug with one of the log messages, please take a look at my comment. Otherwise, I just left some comments on how to potentially tighten the tests.
I actually did that initially, but then I thought that if we did that by default, it could be a negative surprise for certain users that want to keep the original file. Another idea was to attach a prefix or a suffix to the file name provided as input if no
output-file
is provided. For example:my-model.skops
could be updated by default tomy-model-updated.skops
. In the end, this is a design decision; I'm happy to do what you think is best :)
Good points. My intuition for being okay with allowing to overwrite the existing file is that other terminal commands also allow to just overwrite existing files without warning (say, mv
). Adding a prefix/suffix would be a way to prevent accidents but has its own complications (what if my-model-updated.skops
already exists?). @adrinjalali wanna break the tie?
re: overwriting the existing file I think kinda what only print info about the diffs by default, no change in any file, no new files created
WDYT? |
Hmm, how exactly would that diff look like for a skops file?
I can get behind the idea to allow inplace but requiring an extra argument. Not sure about |
we can start by reporting if there'd be any change at all or not, we can improve the diff maybe later if needed. It could be something like the diff of the json schema, and reporting any other saved file (like numpy files) if they're changed.
Input is usually just an argument, w/o a parameter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fixes. I missed the test for the error, my bad. There is a merge conflict, apart from that, the PR is ready to be merged.
I'm gonna wait for this one before pushing out a release :) |
Hi both so sorry for the delay, I have been very busy lately. 🥲 I checked all the remaining comments and fixed the conflicts. Let me know if anything else should be updated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good, thanks a lot for all your work. I don't have any blockers for merging, just two comments, please take a look. If the general thought is that they don't require changes, I'm fine with you merging @adrinjalali
skops/cli/_update.py
Outdated
) | ||
return None | ||
|
||
dump(input_model, output_file) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an extreme edge case, but I wanted to bring it up: If the user cancels the process during the dump
ing operation, could they end up with a broken file? Since we write to the zip archive progressively, I think it could happen. Sometimes, I would start a command and see I did a mistake and would cancel it, so it's not impossible.
Now when the user uses inplace, they could even bust their original file, ending up with no working version. It is an extreme edge case for this to happen, not sure if we want to take measures against it. @adrinjalali WDYT?
To prevent this behavior, we could write the output to a temp file and then move it to output_file
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think is quite major, and you're right. We should simply create a temp file beside the existing file, and then replace the existing file with what's created if conversion is successful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi both :) is this what you had in mind? 3a30721
parser_subgroup.add_argument( | ||
"--inplace", | ||
help="Update and overwrite the input file in place.", | ||
action="store_true", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That behavior would be fine. It would also be acceptable IMO to update inplace by default, since it would only be done if the current version is higher, but leaving it like this is also good.
Just one question: Right now, if using the defaults (no change to log level), a user would typically not get feedback at all, right? Do we want the default behavior to be not do anything, nor message anything?
skops/cli/_update.py
Outdated
tmp_output_file = f"{output_file}.tmp" | ||
logger.debug(f"Writing updated skops file to temporary path: {tmp_output_file}") | ||
dump(input_model, tmp_output_file) | ||
logger.debug(f"Moving updated skops file to output path: {output_file}") | ||
os.replace(tmp_output_file, output_file) | ||
logger.info(f"Updated skops file written to {output_file}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we probably don't need so much debug logging.
This should use a temp file from (https://docs.python.org/3/library/tempfile.html), and in a context manager to protect from random OS failures. Otherwise LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed it now. I used tempfile.TemporaryDirectory
because I understood that tempfile.NamedTemporaryFile
cannot be moved when opened on Windows ("Whether the name can be used to open the file a second time, while the named temporary file is still open, varies across platforms (it can be so used on Unix; it cannot on Windows).").
Am I overcomplicating this? 🤔
The issues on the CI are numpy / persistence related. Merging. |
@BenjaminBossan somehow here it shows you've requested changes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all the changes, this LGTM now.
Done now! 🤞 |
There is still a small issue with SciPy, that it shouldn't be caused by this PR |
Made a separate PR in an attempt to fix the issues with the |
The failing tests are unrelated to this PR, so I'll merge. |
Reference Issues/PRs
Closes #333
What does this implement/fix? Explain your changes.
Taking inspiration from the implementation from
skops convert
, this adds theupdate
command to the CLI.It allows to update a
skops
file by simply 'loading' it and 'dumping' it again.I have also added MacOS files in the gitignore.