-
Notifications
You must be signed in to change notification settings - Fork 186
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(connect): Rust ray exec #3666
feat(connect): Rust ray exec #3666
Conversation
CodSpeed Performance ReportMerging #3666 will improve performances by 49.77%Comparing Summary
Benchmarks breakdown
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally looks fine to me. Could you also update the spark connect tests to allow them to run with the Ray runner + add that to the CI test matrix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we isolate the Python dependency to just the Ray runner and the connect_start
Python function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, the spark connect code pretty much has a hard dependency on python at this point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the reason is that there's no non python entrypoint for spark connect, so it doesn't make much sense to try to feature flag when it can only be run from python.
I can follow up with adding this in in another PR. I'm not sure if this'll be straightforward or not yet. |
depends on #3666 see here for proper diff universalmind303/Daft@rust-ray-exec...universalmind303:Daft:error-messages
depends on #3666 see here for proper diff universalmind303/Daft@error-messages...connect_distinct
Description
you can now specify the runner you want to use via native spark config
Note for reviewers
so i had to do a bit of refactoring to get this to work, mostly in how the show string works. The actual ray implementation is isolated within the new
daft-ray-execution
lib, and it's just a wrapper around our existing python code. The idea with putting it in it's own lib is that it creates a better abstraction and if we want to later port more of that code into rust, it'll be a lot easier.also a few small drivebys that were bugging me while working on this
warn!
's todebug!
's as it was cluttering the output on every command.PlanIds
to actually reflects what it does, aResponseBuilder
.