-
Notifications
You must be signed in to change notification settings - Fork 55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH Add performance test + GH action to run it weekly #289
ENH Add performance test + GH action to run it weekly #289
Conversation
Out of curiosity, I did some profiling of the script (with slight modifications) using scalene to see if there are any bottlenecks in our persistence code. The result was not too enlightening, not sure if really everything that's relevant is being shown. At the very least, there doesn't appear to be a single function call that takes a big chunk of time. I have attached a screenshot (can't upload the html here unfortunately). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is quite nice!
Could you maybe add a bit of comments/basic docstrings to the methods in the python file? It took me a while the first time reading it to figure out what's doing what. Otherwise LGTM.
time_pickle = timeit.timeit(run_pickle, number=number) / number | ||
time_skops = timeit.timeit(run_skops, number=number) / number |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I probably would take the median rather than the mean here, but that probably complicates things.
This should hopefully make it easier to understand what the functions are doing. Also had to update pre-commit dependencies because I was running into this issue: pre-commit/pre-commit#2782 There seems to have been something wrong with its dependencies. Pretty sure it's unrelated to this PR.
@adrinjalali I added docstrings and type annotations. This should hopefully make it easier to understand what the functions Also had to update pre-commit dependencies because I was running into
There seems to have been something wrong with its dependencies. Pretty |
Resolves #275
Ready for review @skops-dev/maintainers
Description
This PR adds a GitHub action that is run on a weekly basis on Sunday. For all
tested sklearn estimators, it persists them once with pickle and once with skops,
and then records the timing difference. If the absolute difference is too large,
an error is raised and we can take a look at what went wrong.
For convenience, that 10 slowest estimators are printed to stdout. That way, we
can find out problematic candidates by looking at the log.
The persistence of an estimator is considered to be too slow when it takes more
than 1 sec longer to persist with skops than with pickle.
Comments
If someone has a good idea how to test that the GH action is specified correctly
without actually merging it, let me know. For reference, this action is based on
https://github.com/skorch-dev/skorch/pull/904/files, which works.
On top of running the action, we could also add a badge to the README. This
would show at a glance if the last run failed, but it's not strictly necessary.
Results
Running the script locally, I found these results:
So the slowest absolute difference is 0.24 sec, which I think is more than
acceptable. Relative differences can be quite large (100x), which is maybe not
too surprising, given that pickle is highly optimized and skops performs some
extra checks.
According to the criterion described above, none of the estimators is
unacceptably slow, so the action would pass successfully.