-
-
Notifications
You must be signed in to change notification settings - Fork 18.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: support casting from bool array to EA Integer dtype #25265
BUG: support casting from bool array to EA Integer dtype #25265
Conversation
Fixes pandas-dev#25211. Cast boolean array to int before casting to EA Integer dtype.
Codecov Report
@@ Coverage Diff @@
## master #25265 +/- ##
==========================================
- Coverage 91.93% 42.13% -49.8%
==========================================
Files 166 166
Lines 52187 52189 +2
==========================================
- Hits 47978 21991 -25987
- Misses 4209 30198 +25989
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #25265 +/- ##
==========================================
- Coverage 91.68% 91.67% -0.01%
==========================================
Files 174 174
Lines 50704 50706 +2
==========================================
- Hits 46488 46486 -2
- Misses 4216 4220 +4
Continue to review full report at Codecov.
|
@vladserkoff : Don't worry about the Codecov. Windows failure looks weird though. |
this needs a test, pls merge master |
@jreback, I've added a test. Casting boolean to integer as |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. can you add a whatsnew note, bug fixes, numeric is ok
|
Regarding pandas/pandas/core/arrays/integer.py Lines 190 to 191 in d0acd20
If I change the condition to elif is_bool_dtype(values) and (is_integer_dtype(dtype) or dtype is None) then all the test in TestComparisonOps.test_compare_[array|scalar] fail with following errors:
self = <pandas.tests.extension.test_integer.TestComparisonOps object at 0x1c1c3687b8>
s = 0 1
1 2
2 3
3 4
4 5
5 6
6 7
7 8
8 NaN
9 10
10 11
11 1...91
91 92
92 93
93 94
94 95
95 96
96 97
97 NaN
98 99
99 100
Length: 100, dtype: UInt64
op = <built-in function gt>
other = 0 1
1 1
2 1
3 1
4 1
5 1
6 1
7 1
8 1
9 1
10 1
11 1
12 1
13 1
14 ... 1
89 1
90 1
91 1
92 1
93 1
94 1
95 1
96 1
97 1
98 1
99 1
Length: 100, dtype: int64
op_name = '__gt__', exc = None
def _check_op(self, s, op, other, op_name, exc=NotImplementedError):
if exc is None:
result = op(s, other)
expected = s.combine(other, op)
> self.assert_series_equal(result, expected)
E AssertionError: Attributes are different
E
E Attribute "dtype" are different
E [left]: bool
E [right]: Int64 |
yeah i think passing an explict |
@jreback, not sure what you mean by taking a stab here 🙂 |
@vladserkoff in fixing passing |
@jreback, I've tried to fix this but with no success. The issue lies here and I'm not sure how to handle this: pandas/pandas/tests/arrays/test_integer.py Lines 362 to 365 in 7c24ea8
_compare_other in turn calls pandas/pandas/tests/extension/base/ops.py Lines 29 to 33 in 7c24ea8
while s.combine is a method in which Lines 2644 to 2656 in 7c24ea8
method _from_sequence internally calls integer_array without specifying dtype. When passed boolean array it just fails to convert it to integer, and everything is okay as expected.Cant understand why call integer_array on an array with no intention to actually cast it to integer.
|
@vladserkoff ok, can you create an issue for this and we can deal with it separately. |
see my comment |
Hello @vladserkoff! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2019-05-14 10:40:34 UTC |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
code looks good
pandas/core/arrays/integer.py
Outdated
@@ -187,6 +187,9 @@ def coerce_to_array(values, dtype, mask=None, copy=False): | |||
raise TypeError("{} cannot be converted to an IntegerDtype".format( | |||
values.dtype)) | |||
|
|||
elif is_bool_dtype(values) and (is_integer_dtype(dtype) or dtype is None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a 1-line comment here on acceptable dtypes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
elif is_bool_dtype(values) and (is_integer_dtype(dtype) or dtype is None): | |
# convert boolean to int if target dtype is integer or unspecified | |
elif is_bool_dtype(values) and (is_integer_dtype(dtype) or dtype is None): |
Ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking more for the reason we need to check here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
more about the reason you need to check, and when/how dtype is None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about this? Couldn't fit it into one line, though.
elif is_bool_dtype(values) and (is_integer_dtype(dtype) or dtype is None): | |
if the array is boolean, we first want to convert it to int to continue execution, | |
if the target dtype is unspecified Int64 would be used later. | |
elif is_bool_dtype(values) and (is_integer_dtype(dtype) or dtype is None): |
pandas/core/arrays/integer.py
Outdated
@@ -187,6 +187,9 @@ def coerce_to_array(values, dtype, mask=None, copy=False): | |||
raise TypeError("{} cannot be converted to an IntegerDtype".format( | |||
values.dtype)) | |||
|
|||
elif is_bool_dtype(values) and (is_integer_dtype(dtype) or dtype is None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking more for the reason we need to check here
ping when updated |
@jreback, pls check out my latest commit, the main idea here in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this solution looks good, see my comments though
can you merge master |
pandas/core/arrays/integer.py
Outdated
@@ -187,6 +187,9 @@ def coerce_to_array(values, dtype, mask=None, copy=False): | |||
raise TypeError("{} cannot be converted to an IntegerDtype".format( | |||
values.dtype)) | |||
|
|||
elif is_bool_dtype(values) and (is_integer_dtype(dtype) or dtype is None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
more about the reason you need to check, and when/how dtype is None
@@ -187,6 +187,9 @@ def coerce_to_array(values, dtype, mask=None, copy=False): | |||
raise TypeError("{} cannot be converted to an IntegerDtype".format( | |||
values.dtype)) | |||
|
|||
elif is_bool_dtype(values) and (is_integer_dtype(dtype) or dtype is None): | |||
values = np.array(values, dtype=int, copy=copy) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
np.asarray?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's like in other places in the code, e.g.
pandas/pandas/core/arrays/integer.py
Line 179 in f2e2eac
values = np.array(values, copy=copy) |
can you merge master and i'll have a look |
Hey, @jreback. Before I proceed, let me clear one thing a bit. The main difficulty here is handling the case when pandas/pandas/tests/extension/base/ops.py Lines 31 to 33 in f2e2eac
If s is an Integer EA, this call to s.combine will subsequently call integer_array and it should not convert boolean array to ints.What I propose is that we don't convert to int if it is not explicitly stated by specifying dtype, as I first suggested. This way we address this specific issue I was trying to resolve, meaning handling the cases like this one pd.Series([True, False, ...], dtype=pd.Int64()) , AND not breaking any other cases.Does it sound okay to you? Otherwise I don't seem to be qualified enough to fix this in a general way :) |
@vladserkoff ok let's try what you suggest above and see |
also pls merge master |
@jreback, done. Checks are ok. |
thanks @vladserkoff happy to have an issue about |
Cast boolean array to int before casting to EA Integer dtype.
git diff upstream/master -u -- "*.py" | flake8 --diff