Improve startup time of workers #85

lars-reimann · 2024-04-13T22:12:15Z

During evaluation, we observed delays ranging from 3s to 20s+ between a program message and the first values coming back, regardless of the size of the data. For better usability, this time should be reduced as much as possible, since this

delays the table view showing up
and later causes another delay for the profiling.

lars-reimann · 2024-04-17T08:59:42Z

A few imports in the library constitute most of the delay:

(Measure-Command { python -c "import seaborn" }).TotalSeconds: 1.21s
(Measure-Command { python -c "import torch" }).TotalSeconds: 1.24s
(Measure-Command { python -c "import torchvision" }).TotalSeconds: 1.90s (also loads torch)
(Measure-Command { python -c "import openpyxl" }).TotalSeconds: 0.26s

We should get major improvements regarding startup by:

Lazily loading our own classes in __init__ files (apipkg). Why is this not the default behavior in Python?!
Lazily importing external libraries. Why is this not the default behavior in Python?!

Unfortunately, many libraries currently import all their stuff into a global package, so if we use anything from, say seaborn, there will still be the 1.21s delay. This can only be fixed by the libraries themselves.

Regarding the runner, we will need to have a primed process ready that can handle the execution of the next pipeline. Ideally, we would also keep those processes alive and reset them after pipeline execution is done, so we don't have the startup delay again and again.

WinPlay02 · 2024-04-17T13:41:16Z

Ideally, we would also keep those processes alive and reset them after pipeline execution is done, so we don't have the startup delay again and again.

@lars-reimann This is somewhat supported, as metapath (for loading our pipelines) is already reset after a pipeline ends. The process could theoretically be reused, but this behavior would not guarantee correctness in all cases. States from external libraries (like seaborn) however, are not reset.

Regarding the runner, we will need to have a primed process ready that can handle the execution of the next pipeline.

I'll try something out for tomorrow, maybe something like that can be achieved.

lars-reimann · 2024-04-17T13:42:33Z

The library improvements are already in Safe-DS/Library#624. But that will only get us so far. It would be great if you could take a look.

lars-reimann · 2024-04-17T14:46:35Z

There's also a new release of the library now, containing your hash functions and your other improvements, as well as the lazy imports.

Closes partially #85 ### Summary of Changes Require the latest version of `safe-ds`.

## [0.11.0](v0.10.0...v0.11.0) (2024-04-17) ### Features * bump `safe-ds` to `v0.21.0` ([#86](#86)) ([d780822](d780822)), closes [#85](#85) * memoization improvements ([#81](#81)) ([6bc2288](6bc2288)), closes [#44](#44)

Closes #85 ### Summary of Changes - Use a process pool to keep started processes waiting - The max. amount of pipeline processes is now set to `4`. - Reuse started processes. This should be correct, as the same pipeline process cannot be used by multiple pipelines at the same time. As the `metapath` is reset to remove the custom generated Safe-DS pipeline code, only global library imports (and settings) should remain. If this is a concern, `maxtasksperchild` can be set to `1`, in which case pipeline processes are not reused. - Reuse shared memory location for saving placeholders, if the memoization infrastructure has added such a location to the object being saved --------- Co-authored-by: megalinter-bot <129584137+megalinter-bot@users.noreply.github.com> Co-authored-by: Lars Reimann <mail@larsreimann.com>

## [0.12.0](v0.11.0...v0.12.0) (2024-04-22) ### Features * handle list of filenames in `absolute_path` and `file_mtime` ([#89](#89)) ([50d831f](50d831f)), closes [#88](#88) * prepare and pool processes ([#87](#87)) ([e5e7011](e5e7011)), closes [#85](#85)

lars-reimann · 2024-04-22T18:27:34Z

🎉 This issue has been resolved in version 0.12.0 🎉

The release is available on:

v0.12.0
GitHub release

Your semantic-release bot 📦🚀

lars-reimann added the performance 🏃 Speed things up label Apr 13, 2024

lars-reimann added this to DSL Apr 13, 2024

github-project-automation bot moved this to Backlog in DSL Apr 13, 2024

lars-reimann mentioned this issue Apr 17, 2024

feat: bump safe-ds to v0.21.0 #86

Merged

lars-reimann added a commit that referenced this issue Apr 17, 2024

feat: bump safe-ds to v0.21.0 (#86)

d780822

Closes partially #85 ### Summary of Changes Require the latest version of `safe-ds`.

WinPlay02 mentioned this issue Apr 17, 2024

feat: prepare and pool processes #87

Merged

lars-reimann closed this as completed in #87 Apr 21, 2024

github-project-automation bot moved this from Backlog to ✔️ Done in DSL Apr 21, 2024

lars-reimann added the released Included in a release label Apr 22, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve startup time of workers #85

Improve startup time of workers #85

lars-reimann commented Apr 13, 2024 •

edited

Loading

lars-reimann commented Apr 17, 2024 •

edited

Loading

WinPlay02 commented Apr 17, 2024 •

edited

Loading

lars-reimann commented Apr 17, 2024 •

edited

Loading

lars-reimann commented Apr 17, 2024

lars-reimann commented Apr 22, 2024

Improve startup time of workers #85

Improve startup time of workers #85

Comments

lars-reimann commented Apr 13, 2024 • edited Loading

lars-reimann commented Apr 17, 2024 • edited Loading

WinPlay02 commented Apr 17, 2024 • edited Loading

lars-reimann commented Apr 17, 2024 • edited Loading

lars-reimann commented Apr 17, 2024

lars-reimann commented Apr 22, 2024

lars-reimann commented Apr 13, 2024 •

edited

Loading

lars-reimann commented Apr 17, 2024 •

edited

Loading

WinPlay02 commented Apr 17, 2024 •

edited

Loading

lars-reimann commented Apr 17, 2024 •

edited

Loading