[Data]: Categorizer fails with non uniform distributions #50792
Traceback:
What happened + What you expected to happen
When using ray.data.preprocessors.Categorizer on a non-uniformly distributed column, we get a failure on fit():
pyarrow.lib.ArrowTypeError: struct fields don't match or are in the wrong order
If the column is distributed uniformly, the Categorizer works fine. If we use a sample of the dataset to fit, the Categorizer also works fine.
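One possible explanation, which this report does not confirm, is that per-block value counts end up with different sets of keys when the column is skewed, and those key sets no longer line up when the blocks are combined. The sketch below illustrates that idea with the Python standard library only. It does not call Ray or PyArrow, and the helper per_block_keys, the block size, and the sample data are all hypothetical.

```python
from collections import Counter


def per_block_keys(values, block_size):
    """Split values into fixed-size blocks and return, for each block,
    the sorted tuple of categories that appear in it (hypothetical helper)."""
    blocks = [values[i:i + block_size] for i in range(0, len(values), block_size)]
    return [tuple(sorted(Counter(block))) for block in blocks]


# Uniform column: every block of 4 rows contains all four categories.
uniform = ["a", "b", "c", "d"] * 4

# Skewed column: "a" dominates, the rare categories appear only at the end.
skewed = ["a"] * 13 + ["b", "c", "d"]

uniform_keys = per_block_keys(uniform, block_size=4)
skewed_keys = per_block_keys(skewed, block_size=4)

# A column is "consistent" here if every block saw the same set of categories.
uniform_consistent = len(set(uniform_keys)) == 1
skewed_consistent = len(set(skewed_keys)) == 1

print("uniform blocks:", uniform_keys, "consistent:", uniform_consistent)
print("skewed blocks: ", skewed_keys, "consistent:", skewed_consistent)
```

If this assumption holds, it would also be consistent with fitting on a small sample succeeding, since a sample that fits in a single block has only one key set to reconcile.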
Versions / Dependencies
python==3.12.8
ray==2.41.0
pandas==2.2.3
numpy==1.26.4
pyarrow==18.1.0
Reproduction script
Issue Severity
Medium: It is a significant difficulty but I can work around it.