-
Notifications
You must be signed in to change notification settings - Fork 655
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FIX-#3896: Change Series.values to default to to_numpy(). #3979
FIX-#3896: Change Series.values to default to to_numpy(). #3979
Conversation
Signed-off-by: jeffreykennethli <jkli@ponder.io>
Codecov Report
@@ Coverage Diff @@
## master #3979 +/- ##
===========================================
- Coverage 85.45% 47.20% -38.25%
===========================================
Files 199 187 -12
Lines 16531 15892 -639
===========================================
- Hits 14126 7502 -6624
- Misses 2405 8390 +5985
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the PR description, you mention wanting to default to pandas only when the Series
is empty - doesn't this default to pandas in all occasions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did some digging, and it looks like the error comes from this line, which expects non-empty Dataframes.
If we remove the call to super
the error is fixed, and we are defaulting to to_numpy
like @jeffreykennethli suggested, which seems to be what pandas intends in the future as well.
modin/pandas/series.py
Outdated
def values(df): | ||
return df.values | ||
|
||
return self._default_to_pandas(values) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return self._default_to_pandas(values) | |
return self.to_numpy().flatten() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. I also removed the flatten() since presumably Series will already be 1-D, and flatten() is not necessary. Let me know if this is not a safe assumption.
Signed-off-by: jeffreykennethli <jkli@ponder.io>
@@ -543,7 +543,7 @@ def values(self): # noqa: RT01, D200 | |||
""" | |||
Return Series as ndarray or ndarray-like depending on the dtype. | |||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
was considering putting a:
UserWarning: 'Series.values` defaulting to to_numpy().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not done for DataFrame.values.
Having this message for both DataFrame.values and Series.values is probably not super useful, except in the case where we're operating on an empty DataFrame/Series and we're already surfacing a
UserWarning: 'DataFrame.to_numpy` for empty DataFrame defaulting to pandas implementation.
in which case it will be confusing why DataFrame/Series using to_numpy and defaulting to pandas when the user is calling .values? (and the "values defaulting to to_numpy()" message would clarify).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quick thing about the tests, but once that's addressed LGTM!
Signed-off-by: jeffreykennethli <jkli@ponder.io>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Signed-off-by: jeffreykennethli jkli@ponder.io
What do these changes do?
Modify Series.values to default to Series.to_numpy(), which conveniently handles the empty dataframe/series case from #3896.
In pandas/base.py, notice that if an attr is callable and self.empty, we default to pandas like in Series.to_numpy() which would fix our empty Series case from #3896.
Because to_numpy() and .values are supposed to have identical behavior (see pandas.Series.values and this PR), we can safely make Series.values call Series.to_numpy().
flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
git commit -s
docs/developer/architecture.rst
is up-to-date