Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(connect): Rust ray exec #3666

Merged
merged 16 commits into from
Jan 14, 2025

Conversation

universalmind303
Copy link
Contributor

@universalmind303 universalmind303 commented Jan 10, 2025

Description

you can now specify the runner you want to use via native spark config

from daft.daft import connect_start
from pyspark.sql import SparkSession

server = connect_start()
url = f"sc://localhost:{server.port()}"


daft_spark = SparkSession.builder.appName("DaftConnectExample").remote(url).getOrCreate()

daft_spark.conf.set("daft.runner", "ray")
# or use native
# daft_spark.conf.set("daft.runner", "native")

df1 = daft_spark.read.parquet("~/datasets/tpcds/sf10/customer.parquet")
df1.limit(10).show()

Note for reviewers

so i had to do a bit of refactoring to get this to work, mostly in how the show string works. The actual ray implementation is isolated within the new daft-ray-execution lib, and it's just a wrapper around our existing python code. The idea with putting it in it's own lib is that it creates a better abstraction and if we want to later port more of that code into rust, it'll be a lot easier.

also a few small drivebys that were bugging me while working on this

  • change warn!'s to debug!'s as it was cluttering the output on every command.
  • refactor PlanIds to actually reflects what it does, a ResponseBuilder.
  • the error output for unsupported relations was nasty, so i simplified it here and here

@github-actions github-actions bot added the feat label Jan 10, 2025
Copy link

codspeed-hq bot commented Jan 10, 2025

CodSpeed Performance Report

Merging #3666 will improve performances by 49.77%

Comparing universalmind303:rust-ray-exec (f3849b5) with main (c932ec9)

Summary

⚡ 1 improvements
✅ 26 untouched benchmarks

Benchmarks breakdown

Benchmark main universalmind303:rust-ray-exec Change
test_show[100 Small Files] 24 ms 16 ms +49.77%

Copy link

codecov bot commented Jan 10, 2025

Codecov Report

Attention: Patch coverage is 58.96861% with 183 lines in your changes missing coverage. Please review.

Project coverage is 77.81%. Comparing base (c932ec9) to head (f3849b5).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
src/daft-connect/src/translation/logical_plan.rs 21.90% 82 Missing ⚠️
src/daft-ray-execution/src/lib.rs 0.00% 48 Missing ⚠️
src/daft-connect/src/execute.rs 82.29% 37 Missing ⚠️
src/daft-connect/src/response_builder.rs 66.66% 7 Missing ⚠️
src/daft-connect/src/lib.rs 61.53% 5 Missing ⚠️
src/daft-connect/src/translation/expr.rs 50.00% 2 Missing ⚠️
src/daft-connect/src/translation/datatype.rs 0.00% 1 Missing ⚠️
...t/src/translation/logical_plan/read/data_source.rs 66.66% 1 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3666      +/-   ##
==========================================
- Coverage   78.06%   77.81%   -0.26%     
==========================================
  Files         728      728              
  Lines       89967    89876      -91     
==========================================
- Hits        70236    69938     -298     
- Misses      19731    19938     +207     
Files with missing lines Coverage Δ
src/daft-connect/src/display.rs 54.44% <ø> (-37.37%) ⬇️
src/daft-connect/src/session.rs 100.00% <100.00%> (ø)
...daft-connect/src/translation/logical_plan/range.rs 100.00% <100.00%> (+35.29%) ⬆️
.../daft-connect/src/translation/logical_plan/read.rs 73.68% <100.00%> (ø)
src/daft-connect/src/translation/schema.rs 100.00% <100.00%> (ø)
src/daft-micropartition/src/partitioning.rs 56.94% <ø> (ø)
src/daft-micropartition/src/python.rs 67.21% <ø> (ø)
src/daft-scheduler/src/scheduler.rs 92.98% <ø> (ø)
src/daft-connect/src/translation/datatype.rs 12.66% <0.00%> (ø)
...t/src/translation/logical_plan/read/data_source.rs 81.08% <66.66%> (ø)
... and 6 more

... and 7 files with indirect coverage changes

Copy link
Member

@kevinzwang kevinzwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks fine to me. Could you also update the spark connect tests to allow them to run with the Ray runner + add that to the CI test matrix?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we isolate the Python dependency to just the Ray runner and the connect_start Python function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, the spark connect code pretty much has a hard dependency on python at this point.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the reason is that there's no non python entrypoint for spark connect, so it doesn't make much sense to try to feature flag when it can only be run from python.

@universalmind303
Copy link
Contributor Author

Could you also update the spark connect tests to allow them to run with the Ray runner + add that to the CI test matrix?

I can follow up with adding this in in another PR. I'm not sure if this'll be straightforward or not yet.

@universalmind303 universalmind303 merged commit 0e03303 into Eventual-Inc:main Jan 14, 2025
40 of 41 checks passed
universalmind303 added a commit that referenced this pull request Jan 15, 2025
@universalmind303 universalmind303 deleted the rust-ray-exec branch January 23, 2025 06:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants