feat(connect): Rust ray exec #3666

universalmind303 · 2025-01-10T18:02:34Z

Description

You can now specify the runner you want to use via native Spark config:

from daft.daft import connect_start
from pyspark.sql import SparkSession

server = connect_start()
url = f"sc://localhost:{server.port()}"


daft_spark = SparkSession.builder.appName("DaftConnectExample").remote(url).getOrCreate()

daft_spark.conf.set("daft.runner", "ray")
# or use native
# daft_spark.conf.set("daft.runner", "native")

df1 = daft_spark.read.parquet("~/datasets/tpcds/sf10/customer.parquet")
df1.limit(10).show()
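
The last commit on this branch also adds a `daft.runner.ray.address` config key next to `daft.runner`. Below is a minimal sketch of how the two keys might be set together. `runner_conf` and `apply_conf` are hypothetical helpers (not part of Daft or PySpark), and the exact format Daft expects for the address value is an assumption. `apply_conf` only relies on the session exposing `conf.set(key, value)`, as a `SparkSession` does.

```python
from typing import Optional

# Runner names taken from the example in the PR description.
SUPPORTED_RUNNERS = ("ray", "native")


def runner_conf(runner: str, ray_address: Optional[str] = None) -> dict:
    """Build the config entries that select a runner (hypothetical helper)."""
    if runner not in SUPPORTED_RUNNERS:
        raise ValueError(f"unsupported runner: {runner!r}")
    if ray_address is not None and runner != "ray":
        raise ValueError("ray_address only applies to the ray runner")
    conf = {"daft.runner": runner}
    if ray_address is not None:
        # Key name comes from the commit message; the value format is assumed.
        conf["daft.runner.ray.address"] = ray_address
    return conf


def apply_conf(session, conf: dict) -> None:
    """Apply each entry through session.conf.set, e.g. on a SparkSession."""
    for key, value in conf.items():
        session.conf.set(key, value)
```

With the session from the example above, this could be called as `apply_conf(daft_spark, runner_conf("ray", ray_address="ray://localhost:10001"))`, where the address is a placeholder.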

Note for reviewers

I had to do a bit of refactoring to get this to work, mostly in how the show string works. The actual Ray implementation is isolated within the new daft-ray-execution lib, and it's just a wrapper around our existing Python code. The idea behind putting it in its own lib is that it creates a better abstraction, and if we later want to port more of that code into Rust, it'll be a lot easier.
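
The isolation described here can be pictured as a small dispatch layer: the session only stores a runner name, and a separate module owns the runner-specific execution. The sketch below is illustrative Python only, not the Rust code in this PR. `Session`, `RUNNERS`, `run_plan`, and the two stand-in runners are hypothetical names, and falling back to "native" when no runner is set is an assumption.

```python
from typing import Callable, Dict

# Hypothetical registry: each runner is hidden behind the same call signature,
# so adding or porting a runner does not touch the session code.
RUNNERS: Dict[str, Callable[[str], str]] = {
    "native": lambda plan: f"[native] {plan}",
    "ray": lambda plan: f"[ray] {plan}",  # stand-in for the Ray wrapper
}


class Session:
    """Holds per-session config, mirroring conf.set("daft.runner", ...)."""

    def __init__(self) -> None:
        self.conf: Dict[str, str] = {}

    def set(self, key: str, value: str) -> None:
        self.conf[key] = value


def run_plan(session: Session, plan: str) -> str:
    """Look up the configured runner and hand the plan to it."""
    name = session.conf.get("daft.runner", "native")  # assumed default
    if name not in RUNNERS:
        raise ValueError(f"unknown runner: {name!r}")
    return RUNNERS[name](plan)
```

Keeping the lookup in one place means the rest of the session code never needs to know which runner is active.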

Also, a few small drive-bys that were bugging me while working on this:

Changed warn!'s to debug!'s, as they were cluttering the output on every command.
Refactored PlanIds into a ResponseBuilder, so the name actually reflects what it does.
The error output for unsupported relations was nasty, so I simplified it here and here.

codspeed-hq · 2025-01-10T18:15:08Z

CodSpeed Performance Report

Merging #3666 will improve performances by 49.77%

Comparing universalmind303:rust-ray-exec (f3849b5) with main (c932ec9)

Summary

⚡ 1 improvements
✅ 26 untouched benchmarks

Benchmarks breakdown

	Benchmark	`main`	`universalmind303:rust-ray-exec`	Change
⚡	`test_show[100 Small Files]`	24 ms	16 ms	+49.77%

codecov · 2025-01-10T21:44:20Z

Codecov Report

Attention: Patch coverage is 58.96861% with 183 lines in your changes missing coverage. Please review.

Project coverage is 77.81%. Comparing base (c932ec9) to head (f3849b5).
Report is 1 commits behind head on main.

Files with missing lines	Patch %	Lines
src/daft-connect/src/translation/logical_plan.rs	21.90%	82 Missing ⚠️
src/daft-ray-execution/src/lib.rs	0.00%	48 Missing ⚠️
src/daft-connect/src/execute.rs	82.29%	37 Missing ⚠️
src/daft-connect/src/response_builder.rs	66.66%	7 Missing ⚠️
src/daft-connect/src/lib.rs	61.53%	5 Missing ⚠️
src/daft-connect/src/translation/expr.rs	50.00%	2 Missing ⚠️
src/daft-connect/src/translation/datatype.rs	0.00%	1 Missing ⚠️
...t/src/translation/logical_plan/read/data_source.rs	66.66%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3666      +/-   ##
==========================================
- Coverage   78.06%   77.81%   -0.26%     
==========================================
  Files         728      728              
  Lines       89967    89876      -91     
==========================================
- Hits        70236    69938     -298     
- Misses      19731    19938     +207

Files with missing lines	Coverage Δ
src/daft-connect/src/display.rs	`54.44% <ø> (-37.37%)`	⬇️
src/daft-connect/src/session.rs	`100.00% <100.00%> (ø)`
...daft-connect/src/translation/logical_plan/range.rs	`100.00% <100.00%> (+35.29%)`	⬆️
.../daft-connect/src/translation/logical_plan/read.rs	`73.68% <100.00%> (ø)`
src/daft-connect/src/translation/schema.rs	`100.00% <100.00%> (ø)`
src/daft-micropartition/src/partitioning.rs	`56.94% <ø> (ø)`
src/daft-micropartition/src/python.rs	`67.21% <ø> (ø)`
src/daft-scheduler/src/scheduler.rs	`92.98% <ø> (ø)`
src/daft-connect/src/translation/datatype.rs	`12.66% <0.00%> (ø)`
...t/src/translation/logical_plan/read/data_source.rs	`81.08% <66.66%> (ø)`
... and 6 more

... and 7 files with indirect coverage changes

kevinzwang

Generally looks fine to me. Could you also update the spark connect tests to allow them to run with the Ray runner + add that to the CI test matrix?

src/daft-connect/src/execute.rs

kevinzwang · 2025-01-14T01:00:43Z

src/daft-connect/src/lib.rs

Could we isolate the Python dependency to just the Ray runner and the connect_start Python function?

No, the Spark Connect code pretty much has a hard dependency on Python at this point.

The reason is that there's no non-Python entrypoint for Spark Connect, so it doesn't make much sense to feature-flag it when it can only be run from Python.

src/daft-connect/src/translation/logical_plan.rs

universalmind303 · 2025-01-14T15:46:06Z

Could you also update the spark connect tests to allow them to run with the Ray runner + add that to the CI test matrix?

I can follow up by adding this in another PR. I'm not sure yet whether it'll be straightforward.

depends on #3666 see here for proper diff universalmind303/Daft@rust-ray-exec...universalmind303:Daft:error-messages

depends on #3666 see here for proper diff universalmind303/Daft@error-messages...connect_distinct

universalmind303 added 9 commits January 8, 2025 11:10

wip

6fc29c9

wip

6ef13bc

wip

445b0d6

Merge branch 'main' of https://github.com/Eventual-Inc/Daft into rust-ray-exec

5114ef6

wip

f2c4074

wip

8535db9

wip

1ade876

Merge branch 'main' of https://github.com/Eventual-Inc/Daft into rust-ray-exec

3ac05d9

ray runner for connect

0a1c028

github-actions bot added the feat label Jan 10, 2025

universalmind303 added 5 commits January 10, 2025 12:46

fix compile feature checks

1784fec

machete

67762b0

fix compile feature checks

8959c2f

fix compile feature checks

4b83883

fix compile feature checks

eb477e8

universalmind303 added 2 commits January 10, 2025 15:51

Merge branch 'main' of https://github.com/Eventual-Inc/Daft into rust-ray-exec

42ebb47

add config var for "daft.runner.ray.address"

f3849b5

universalmind303 requested review from raunakab and kevinzwang January 13, 2025 16:50

This was referenced Jan 13, 2025

chore(connect): better error propagation & handling #3675

Merged

feat(connect): distinct + sort #3677

Merged

kevinzwang reviewed Jan 14, 2025

universalmind303 merged commit 0e03303 into Eventual-Inc:main Jan 14, 2025
40 of 41 checks passed

universalmind303 added a commit that referenced this pull request Jan 15, 2025

chore(connect): better error propagation & handling (#3675)

809e411

depends on #3666 see here for proper diff universalmind303/Daft@rust-ray-exec...universalmind303:Daft:error-messages

universalmind303 added a commit that referenced this pull request Jan 15, 2025

feat(connect): distinct + sort (#3677)

34d2036

depends on #3666 see here for proper diff universalmind303/Daft@error-messages...connect_distinct

universalmind303 deleted the rust-ray-exec branch January 23, 2025 06:35
