Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(iceberg): Adds support for read_iceberg with metadata_location to Daft-SQL #3701

Merged
merged 4 commits into from
Jan 17, 2025

Conversation

rchowell
Copy link
Contributor

This PR adds support for read_iceberg given a metadata location, and it adds this to Daft-SQL.

Note

This was tested manually because PyIceberg StaticTable has a bug where the filepaths in the iceberg metadata are resolved relative to the caller's current working directory rather than relative to the metadata file. This will need to be fixed in PyIceberg, but I wanted to close this out for now since it still works if the paths are right.

Future Additions

Testing

read_iceberg

Copy link

codspeed-hq bot commented Jan 17, 2025

CodSpeed Performance Report

Merging #3701 will degrade performances by 32.65%

Comparing rchowell/read_iceberg (c9ac983) with main (a7dd7ef)

Summary

❌ 1 regressions
✅ 26 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark main rchowell/read_iceberg Change
test_show[100 Small Files] 16 ms 23.7 ms -32.65%

@rchowell rchowell enabled auto-merge (squash) January 17, 2025 20:15
@rchowell rchowell merged commit ecc2970 into main Jan 17, 2025
40 of 41 checks passed
@rchowell rchowell deleted the rchowell/read_iceberg branch January 17, 2025 20:36
Copy link

codecov bot commented Jan 17, 2025

Codecov Report

Attention: Patch coverage is 5.20833% with 91 lines in your changes missing coverage. Please review.

Project coverage is 77.58%. Comparing base (6b302af) to head (c9ac983).
Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
src/daft-sql/src/table_provider/read_iceberg.rs 0.00% 34 Missing ⚠️
src/daft-scan/src/builder.rs 0.00% 29 Missing ⚠️
src/daft-sql/src/table_provider/read_deltalake.rs 0.00% 24 Missing ⚠️
src/daft-scan/src/storage_config.rs 0.00% 3 Missing ⚠️
daft/io/_iceberg.py 75.00% 1 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3701      +/-   ##
==========================================
- Coverage   77.94%   77.58%   -0.37%     
==========================================
  Files         725      727       +2     
  Lines       90952    92081    +1129     
==========================================
+ Hits        70895    71442     +547     
- Misses      20057    20639     +582     
Files with missing lines Coverage Δ
src/daft-sql/src/functions.rs 85.41% <ø> (ø)
src/daft-sql/src/table_provider/mod.rs 96.77% <100.00%> (+42.22%) ⬆️
daft/io/_iceberg.py 87.87% <75.00%> (-2.13%) ⬇️
src/daft-scan/src/storage_config.rs 84.44% <0.00%> (-6.04%) ⬇️
src/daft-sql/src/table_provider/read_deltalake.rs 0.00% <0.00%> (ø)
src/daft-scan/src/builder.rs 33.95% <0.00%> (-3.38%) ⬇️
src/daft-sql/src/table_provider/read_iceberg.rs 0.00% <0.00%> (ø)

... and 25 files with indirect coverage changes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants