Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: support casting from bool array to EA Integer dtype #25265

Merged
merged 22 commits into from
May 14, 2019
Merged
Show file tree
Hide file tree
Changes from 20 commits
Commits
Show all changes
22 commits
Select commit Hold shift + click to select a range
0c97445
BUG: support casting from bool array to EA Integer dtype
vladserkoff Feb 11, 2019
b3e1d76
only cast bool to int if target dtype is interger type
vladserkoff Feb 12, 2019
ed61191
Merge branch 'master' into integer-array-from-bool
vladserkoff Mar 20, 2019
c80d4cd
add test to #25265
vladserkoff Mar 20, 2019
9c2877a
fix missing parameter to the test to #25265
vladserkoff Mar 20, 2019
697852d
pass dtype to integer_array to the test to #25265
vladserkoff Mar 20, 2019
c9c4d87
add another test case
vladserkoff Mar 20, 2019
c313cca
add note to watsnew
vladserkoff Mar 20, 2019
d0acd20
Merge branch 'master' into integer-array-from-bool
vladserkoff Mar 20, 2019
7c24ea8
Fix passing result_dtype as string
vladserkoff Mar 20, 2019
c18cf28
Merge branch 'master' into integer-array-from-bool
vladserkoff Mar 25, 2019
54431e4
possible fix for integer_array with dtype=None
vladserkoff Mar 25, 2019
986548f
fix pep8
vladserkoff Mar 25, 2019
c21412c
add test cases with dtype=None
vladserkoff Mar 26, 2019
775114a
fix tests
vladserkoff Mar 26, 2019
cb207aa
fix pep8
vladserkoff Mar 26, 2019
e6655bc
quickfix failing s.combine on extention array
vladserkoff Apr 10, 2019
96fa493
fix for infering dtype in failing s.combine on extention array
vladserkoff Apr 11, 2019
089a72a
fix for infering dtype in failing s.combine on extention array
vladserkoff Apr 11, 2019
f2e2eac
Merge branch 'master' into integer-array-from-bool
vladserkoff Apr 22, 2019
99a6678
Merge branch 'master' into integer-array-from-bool
vladserkoff May 14, 2019
3bce8a1
revert to c18cf28
vladserkoff May 14, 2019
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion doc/source/whatsnew/v0.25.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -298,8 +298,9 @@ Numeric
- Bug in :meth:`Series.divmod` and :meth:`Series.rdivmod` which would raise an (incorrect) ``ValueError`` rather than return a pair of :class:`Series` objects as result (:issue:`25557`)
- Raises a helpful exception when a non-numeric index is sent to :meth:`interpolate` with methods which require numeric index. (:issue:`21662`)
- Bug in :meth:`~pandas.eval` when comparing floats with scalar operators, for example: ``x < -0.1`` (:issue:`25928`)
- Fixed bug where casting all-boolean array to integer extension array failed (:issue:`25211`)
-
-


Conversion
^^^^^^^^^^
Expand Down
3 changes: 3 additions & 0 deletions pandas/core/arrays/integer.py
Original file line number Diff line number Diff line change
Expand Up @@ -187,6 +187,9 @@ def coerce_to_array(values, dtype, mask=None, copy=False):
raise TypeError("{} cannot be converted to an IntegerDtype".format(
values.dtype))

elif is_bool_dtype(values) and (is_integer_dtype(dtype) or dtype is None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a 1-line comment here on acceptable dtypes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
elif is_bool_dtype(values) and (is_integer_dtype(dtype) or dtype is None):
# convert boolean to int if target dtype is integer or unspecified
elif is_bool_dtype(values) and (is_integer_dtype(dtype) or dtype is None):

Ok?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking more for the reason we need to check here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more about the reason you need to check, and when/how dtype is None

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about this? Couldn't fit it into one line, though.

Suggested change
elif is_bool_dtype(values) and (is_integer_dtype(dtype) or dtype is None):
if the array is boolean, we first want to convert it to int to continue execution,
if the target dtype is unspecified Int64 would be used later.
elif is_bool_dtype(values) and (is_integer_dtype(dtype) or dtype is None):

values = np.array(values, dtype=int, copy=copy)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

np.asarray?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's like in other places in the code, e.g.

values = np.array(values, copy=copy)


elif not (is_integer_dtype(values) or is_float_dtype(values)):
raise TypeError("{} cannot be converted to an IntegerDtype".format(
values.dtype))
Expand Down
9 changes: 7 additions & 2 deletions pandas/core/series.py
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,7 @@
from pandas.util._decorators import Appender, Substitution, deprecate
from pandas.util._validators import validate_bool_kwarg

from pandas.core.dtypes.cast import infer_dtype_from
from pandas.core.dtypes.common import (
_is_unorderable_exception, ensure_platform_int, is_bool,
is_categorical_dtype, is_datetime64_dtype, is_datetimelike, is_dict_like,
Expand Down Expand Up @@ -2638,15 +2639,19 @@ def combine(self, other, func, fill_value=None):
elif is_extension_array_dtype(self.values):
# The function can return something of any type, so check
# if the type is compatible with the calling EA.
non_na_values = [v for v in new_values if notna(v)]
if non_na_values:
new_dtype, _ = infer_dtype_from(non_na_values, pandas_dtype=True)
else:
new_dtype = self.dtype
try:
new_values = self._values._from_sequence(new_values)
new_values = self._values._from_sequence(new_values, dtype=new_dtype)
except Exception:
# https://github.com/pandas-dev/pandas/issues/22850
# pandas has no control over what 3rd-party ExtensionArrays
# do in _values_from_sequence. We still want ops to work
# though, so we catch any regular Exception.
pass

return self._constructor(new_values, index=new_index, name=new_name)

def combine_first(self, other):
Expand Down
15 changes: 15 additions & 0 deletions pandas/tests/arrays/test_integer.py
Original file line number Diff line number Diff line change
Expand Up @@ -614,6 +614,21 @@ def test_to_integer_array_float():
assert result.dtype == Int64Dtype()


@pytest.mark.parametrize(
'bool_values, int_values, target_dtype, expected_dtype',
[([False, True], [0, 1], Int64Dtype(), Int64Dtype()),
([False, True], [0, 1], 'Int64', Int64Dtype()),
([False, True, np.nan], [0, 1, np.nan], Int64Dtype(), Int64Dtype()),
([False, True], [0, 1], None, Int64Dtype()),
([False, True, np.nan], [0, 1, np.nan], None, Int64Dtype())])
def test_to_integer_array_bool(bool_values, int_values, target_dtype,
expected_dtype):
result = integer_array(bool_values, dtype=target_dtype)
assert result.dtype == expected_dtype
expected = integer_array(int_values, dtype=target_dtype)
tm.assert_extension_array_equal(result, expected)


@pytest.mark.parametrize(
'values, to_dtype, result_dtype',
[
Expand Down
6 changes: 4 additions & 2 deletions pandas/tests/extension/base/methods.py
Original file line number Diff line number Diff line change
Expand Up @@ -163,13 +163,15 @@ def test_combine_add(self, data_repeated):
expected = pd.Series(
orig_data1._from_sequence([a + b for (a, b) in
zip(list(orig_data1),
list(orig_data2))]))
list(orig_data2))],
dtype=s1.dtype))
self.assert_series_equal(result, expected)

val = s1.iloc[0]
result = s1.combine(val, lambda x1, x2: x1 + x2)
expected = pd.Series(
orig_data1._from_sequence([a + val for a in list(orig_data1)]))
orig_data1._from_sequence([a + val for a in list(orig_data1)],
dtype=s1.dtype))
self.assert_series_equal(result, expected)

def test_combine_first(self, data):
Expand Down