-
Notifications
You must be signed in to change notification settings - Fork 454
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: stats column binary_column has unsupported type binary #3146
base: main
Are you sure you want to change the base?
Conversation
ACTION NEEDED delta-rs follows the Conventional Commits specification for release automation. The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification. |
32f7ca2
to
6434a27
Compare
6434a27
to
2ea21f4
Compare
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #3146 +/- ##
==========================================
- Coverage 72.08% 72.08% -0.01%
==========================================
Files 134 134
Lines 43360 43360
Branches 43360 43360
==========================================
- Hits 31258 31255 -3
+ Misses 10085 10082 -3
- Partials 2017 2023 +6 ☔ View full report in Codecov by Sentry. |
2ea21f4
to
615c3a5
Compare
@omkar-foss are you still working on a fix :)? |
@ion-elgreco I was looking into it but haven't yet committed a fix as such. Looking into the UC issue right now, so you can pick this one up if you'd like :) |
Signed-off-by: Omkar P <45419097+omkar-foss@users.noreply.github.com>
615c3a5
to
152b6de
Compare
I checked the delta lake protocol, and it looks like the stats are processed only for numeric columns types and stored in this structure: https://github.com/delta-io/delta/blob/master/PROTOCOL.md?plain=1#L1942-L1955 Since minValues and maxValues cannot be computed for BINARY columns, one option would be that we keep BINARY columns excluded from column statistics altogether. Another option could be that we support BINARY columns and only compute numRecords and nullCount for them. minValues and maxValues will be unavailable. @ion-elgreco any thoughts on which way we should go? My inclination is towards keeping them excluded. |
@omkar-foss let's exclude it then yeah, maybe add a warm trace when such a column was set |
Yes sure, I'll update this PR accordingly. Thank you :) |
Description
Handle
BINARY
column appropriately when specified indelta.dataSkippingStatsColumns
while writing to a delta table.Related Issue(s)
delta.dataSkippingStatsColumns
not working #2854Related PR(s)
Documentation
N/A