BUG: Series.combine() fails with ExtensionArray inside of Series #21183

Dr-Irv · 2018-05-23T18:39:32Z

closes BUG: Series.combine() fails with ExtensionArray inside of Series #20825
closes Series.combine() with a scalar only works if function is compatible with (vec, scalar) operation #21248
tests added / passed
- extension/category/test_categorical.py:test_combine
- extension/decimal/test_decimal.py:test_combine
- series/test_combine_concat.py:test_combine_scalar
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry
- modified whatsnew.v0.24.0.txt

This is a split of #20889 to just fix the combine issue.

jschendel · 2018-05-23T19:13:29Z

doc/source/whatsnew/v0.23.1.txt

@@ -47,6 +47,7 @@ Bug Fixes
 ~~~~~~~~~

 - tab completion on :class:`Index` in IPython no longer outputs deprecation warnings (:issue:`21125`)
+- Series.combine() no longer fails with ExtensionArray inside of Series (:issue:`20825`)


Series.combine() --> :meth:`Series.combine`

ExtensionArray --> :class:`~pandas.api.extensions.ExtensionArray`

Series --> :class:`Series`

jschendel · 2018-05-23T19:14:00Z

pandas/core/series.py

+            except TypeError:
+                result = NotImplemented
+            except Exception as e:
+                raise e


I think this last except block is redundant, since non-TypeError exceptions will be raised regardless of if it that block is present or not.

jschendel · 2018-05-23T19:14:32Z

pandas/core/series.py

        if isinstance(other, Series):
            new_index = self.index.union(other.index)
            new_name = ops.get_op_result_name(self, other)
-            new_values = np.empty(len(new_index), dtype=self.dtype)
+            new_values = []
            for i, idx in enumerate(new_index):


It looks like the enumerate is no longer necessary since i isn't used, so you should be able to directly iterate over new_index.

jschendel · 2018-05-23T19:23:01Z

pandas/tests/extension/category/test_categorical.py

@@ -157,3 +160,13 @@ def test_value_counts(self, all_data, dropna):

 class TestCasting(base.BaseCastingTests):
    pass
+
+
+def test_combine():


Should this be moved to be within the TestMethods class? And can you reference the GitHub issue number in a comment?

codecov · 2018-05-23T19:32:07Z

Codecov Report

Merging #21183 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #21183      +/-   ##
==========================================
+ Coverage   91.85%   91.85%   +<.01%     
==========================================
  Files         153      153              
  Lines       49549    49573      +24     
==========================================
+ Hits        45512    45537      +25     
+ Misses       4037     4036       -1

Flag	Coverage Δ
#multiple	`90.25% <100%> (ø)`	⬆️
#single	`41.85% <7.14%> (-0.03%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/series.py	`94.17% <100%> (+0.05%)`	⬆️
pandas/core/reshape/melt.py	`97.34% <0%> (ø)`	⬆️
pandas/core/indexes/datetimes.py	`95.8% <0%> (+0.02%)`	⬆️
pandas/io/formats/style.py	`96.12% <0%> (+0.08%)`	⬆️
pandas/core/dtypes/missing.py	`92.52% <0%> (+0.57%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0c65c57...4ca28b2. Read the comment docs.

TomAugspurger · 2018-05-23T20:21:28Z

pandas/core/series.py

            new_name = self.name
+
+        if (self_is_ext and self.values.is_sequence_of_dtype(new_values)):
+            new_values = self._values._from_sequence(new_values)


Under what conditions is self.values.is_sequence_of_dtype(new_values) false?

combine takes a func as an argument. The func may not necessarily return objects of the ExtensionDtype. For example, consider a logical operator. So here self.values is the original ExtensionArray, but new_values is the result of applying the func element by element, and those individual results aren't necessarily of the corresponding ExtensionDtype.

Thanks.

How important is it to allow coercion of output type? The previous code certainly considered dtype-preserving functions to be the expected case, since the pre-allocated new_dtype had the same dtype as self. Only if necessary was it coerced.

Without having studied the uses of combine, can we say "extension arrays are not allowed to upcast in combine?". i.e. Series[ExtensionArray].combine(Series[Any]) -> Series[ExtensionArray]. That doesn't sound quiet right...

Anyway, my aversion is to having to perform a full scan of the data just to determine the dtype. That's what types are for :) Can we ask the user to provide an output_dtype argument?

Consider the following (using the implementation in this PR):

In [1]: import pandas as pd ...: from pandas.tests.extension.decimal.array import DecimalArray, make_dat ...: a ...: In [2]: da1= make_data() ...: da2= make_data() ...: da1[:5], da2[:5] ...: Out[2]: ([Decimal('0.9100374922132099531069115982973016798496246337890625'), Decimal('0.559003605365540945371094494475983083248138427734375'), Decimal('0.31951366993722529752375294265220873057842254638671875'), Decimal('0.238154945978455767630066475248895585536956787109375'), Decimal('0.5851317119327676952167394119896925985813140869140625')], [Decimal('0.12525543165059660477567149428068660199642181396484375'), Decimal('0.68213474143447905273518472313298843801021575927734375'), Decimal('0.29982507800002611286771525556105189025402069091796875'), Decimal('0.297029189226840184545608281041495501995086669921875'), Decimal('0.5969224093736389402664599401759915053844451904296875')]) In [3]: s1 = pd.Series(DecimalArray(da1)) ...: s2 = pd.Series(DecimalArray(da2)) ...: pd.DataFrame({'s1':s1, 's2':s2}).head() ...: Out[3]: s1 s2 0 0.91003749221320995310691159829730167984962463... 0.12525543165059660477567149428068660199642181... 1 0.55900360536554094537109449447598308324813842... 0.68213474143447905273518472313298843801021575... 2 0.31951366993722529752375294265220873057842254... 0.29982507800002611286771525556105189025402069... 3 0.23815494597845576763006647524889558553695678... 0.29702918922684018454560828104149550199508666... 4 0.58513171193276769521673941198969259858131408... 0.59692240937363894026645994017599150538444519... In [4]: s1.dtype, s2.dtype Out[4]: (<pandas.tests.extension.decimal.array.DecimalDtype at 0x15bc5ef1048>, <pandas.tests.extension.decimal.array.DecimalDtype at 0x15bc5ef1048>) In [5]: cores = s1.combine(s2, lambda x1, x2: x1 if x1 < x2 else x2) ...: cores.head() ...: Out[5]: 0 0.12525543165059660477567149428068660199642181... 1 0.55900360536554094537109449447598308324813842... 2 0.29982507800002611286771525556105189025402069... 3 0.23815494597845576763006647524889558553695678... 4 0.58513171193276769521673941198969259858131408... dtype: decimal In [6]: clres = s1.combine(s2, lambda x1, x2: (x1 < x2)) ...: clres.head() ...: Out[6]: 0 False 1 True 2 False 3 True 4 True dtype: bool

Note that with the implementation as in this PR, we get a Series of dtype decimal since the results are all Decimal objects, and we get a boolean Series when the results are logical.

The previous behavior would product object, which is not what you want. Given a sequence s, if you do pd.Series(s), the internals of pandas look at the sequence s to determine the dtype. But those internals won't pick things up correctly for things that are ExtensionDtype. So somehow, we need to know to call ExtensionArray._from_sequence() to get the result of combine into the right type.

The implementation is already doing this element-by-element, so we are doing a full scan of both the left and right arrays. This is an extra scan on the result.

We can add an output_dtype argument to combine, but then we'd be changing the API of combine and would need to decide what the default value of that parameter is.

I would rather not do this here at all, prefering instead to dispatch to the EA itself for a .combine() method, this will be much simpler.

Instead of introducing the is_sequence_of_dtype check, an alternative might be to try: .. except _from_sequence()?

I would prefer not to add combine logic to the extension array

@jorisvandenbossche My most recent commit has this change you suggested and removes is_sequence_of_dtype. It required an extra check to see if it is categorical. I also added a test which revealed a couple of other bugs.

In terms of whether combine should be in the EA class or in series.py, you guys will have to decide on that.

TomAugspurger · 2018-05-23T20:21:50Z

pandas/tests/test_window.py

@@ -41,7 +41,7 @@ def win_types(request):
    return request.param


-@pytest.fixture(params=['kaiser', 'gaussian', 'general_gaussian', 'slepian'])
+@pytest.fixture(params=['kaiser', 'gaussian', 'general_gaussian'])


Merge conflict?

Yes, something went wrong with git. I didn't touch that code, but did a rebase, and I think it pulled that in.

TomAugspurger · 2018-05-23T20:23:16Z

pandas/util/testing.py

@@ -1118,10 +1118,12 @@ def assert_extension_array_equal(left, right):
    right_na = right.isna()
    assert_numpy_array_equal(left_na, right_na)

-    left_valid = left[~left_na].astype(object)
-    right_valid = right[~right_na].astype(object)
+    if len(left_na) > 0 and len(right_na) > 0:


Why the changes here? left_na is an array of booleans, so it'll always be the same length as left.

If left has length zero, then the statement that follows, i.e.

left_valid = left[~left_na].astype(object)

fails. There was some test case that was failing without this change (but I don't remember which one, and it may have been in the ops stuff). I will leave this change out and see if this fix for combine still works.

Given that this works

In [9]: pd.Series()[np.array([], dtype='bool')].astype(object) Out[9]: Series([], dtype: object)

then we should ensure that Series[extension_array] works too. If you find a reproducible example, could you make a new issue?

I took this code out of this PR. It's not needed here. When we get to the ops stuff, I'll have to reintroduce it, and figure out the boundary case that required this particular case.

TomAugspurger · 2018-05-23T20:24:34Z

pandas/util/testing.py

@@ -1224,6 +1226,9 @@ def assert_series_equal(left, right, check_dtype=True,
        left = pd.IntervalIndex(left)
        right = pd.IntervalIndex(right)
        assert_index_equal(left, right, obj='{obj}.index'.format(obj=obj))
+    elif (is_extension_array_dtype(left) and not is_categorical_dtype(left) and


is the not is_categorical_dtype needed?

This was left over from the ops stuff, so I have removed it.

pep8speaks · 2018-05-23T21:50:32Z

Hello @Dr-Irv! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on June 07, 2018 at 15:45 Hours UTC

jreback · 2018-05-24T11:53:07Z

doc/source/whatsnew/v0.23.1.txt

@@ -47,6 +47,7 @@ Bug Fixes
 ~~~~~~~~~

 - tab completion on :class:`Index` in IPython no longer outputs deprecation warnings (:issue:`21125`)
+- :meth:`Series.combine()` no longer fails with :class:`~pandas.api.extensions.ExtensionArray` inside of :class:`Series` (:issue:`20825`)


move to 0.24

I've moved it to 0.24

jreback · 2018-05-24T11:54:36Z

pandas/core/series.py

            new_name = self.name
+
+        if (self_is_ext and self.values.is_sequence_of_dtype(new_values)):
+            new_values = self._values._from_sequence(new_values)


I would rather not do this here at all, prefering instead to dispatch to the EA itself for a .combine() method, this will be much simpler.

Dr-Irv · 2018-05-24T15:01:58Z

@jreback You write:

I would rather not do this here at all, prefering instead to dispatch to the EA itself for a .combine() method, this will be much simpler.

Are you suggesting that there is a default implementation of .combine() in ExtensionArray that can be overridden by implementers of the EA subclass, and that Series.combine() dispatch to that? If so, that sounds reasonable to me, and I can make that change.

jreback · 2018-05-24T22:30:07Z

Are you suggesting that there is a default implementation of .combine() in ExtensionArray that can be overridden by implementers of the EA subclass, and that Series.combine() dispatch to that? If so, that sounds reasonable to me, and I can make that change.

no I would suggest the default impl is NotImplemented, this is getting too messy here with all of these if/thens

TomAugspurger · 2018-05-26T18:54:04Z

pandas/core/series.py

        name = ops.get_op_result_name(self, other)
+
+        if (is_extension_array_dtype(this_vals) or


Stepping back a bit here, why does ExtensionArray get a fallback to [func(a, b) for a, b in zip(...)], when ndarray doesn't? This confuses the expected signature for func: does it that two arrays a and b, or two scalars?

Indeed.
And I also don't really understand how this is related to combine, as _binop does not seem to be used there. I think _binop is for implementing the flex arithmetic ops like .add() etc. But for that I think we should do this together with the actual ops change, and probably end up with the same conclusion (no explicit element-wise fallback but rely on the ExtensionArray implementation to do this)

agree here. I am puzzled why this change at all. If the extension array supports ops this would work as is, if not it would raise a TypeError. So not sure what you are trying to do here.

My mistake. When I separated out this PR from #20889, I should have put back the old _binop. Fixed in my next commit.

TomAugspurger · 2018-05-26T18:56:09Z

pandas/core/series.py

@@ -2227,20 +2245,36 @@ def combine(self, other, func, fill_value=np.nan):
        Series.combine_first : Combine Series values, choosing the calling
            Series's values first
        """
+        self_is_ext = is_extension_array_dtype(self.values)
+        if fill_value is None:
+            if self_is_ext:


I thikn na_value_for_dtype is appropriate here, probably with compat=False.

I made that change.

jorisvandenbossche · 2018-05-28T07:48:45Z

pandas/core/series.py

-                new_values = func(self._values, other)
+            if not self_is_ext:
+                with np.errstate(all='ignore'):
+                    new_values = func(self._values, other)


I also don't really understand (but this is related with the current implementation, not your changes) why we don't do it element-wise here (no loop over the values as is the case if other is Series).

For me, this seems like a bug in the current implementation

@jorisvandenbossche You're correct. I created a new issue #21248 . I will fix that here.

jreback · 2018-05-29T00:51:08Z

doc/source/whatsnew/v0.24.0.txt

@@ -174,6 +174,7 @@ Reshaping
 Other
 ^^^^^

+- :meth:`Series.combine()` no longer fails with :class:`~pandas.api.extensions.ExtensionArray` inside of :class:`Series` (:issue:`20825`)


be positive. say it works!

jreback · 2018-05-29T00:53:35Z

pandas/core/series.py

        name = ops.get_op_result_name(self, other)
+
+        if (is_extension_array_dtype(this_vals) or


agree here. I am puzzled why this change at all. If the extension array supports ops this would work as is, if not it would raise a TypeError. So not sure what you are trying to do here.

jreback · 2018-05-29T00:54:32Z

pandas/core/series.py

            new_name = self.name
+
+        if self_is_ext and not is_categorical_dtype(self.values):


I am puzzled by all of these element-by-element operations. Is there a reason you don't simply define .combine on an EA array?

@jreback combine is by definition an element-by-element operation. We do exactly the same for other dtypes, see 10 lines above.
This check here only tries to convert the scalars back to an ExtensionArray.

jorisvandenbossche

This looks much cleaner!

jorisvandenbossche · 2018-05-30T19:06:48Z

pandas/core/series.py

            new_name = self.name
+
+        if self_is_ext and not is_categorical_dtype(self.values):


The not is_categorical_dtype(self.values) signals again that we need a way to pass more information on the dtype to _from_sequence. xref #20747, but not for this PR!

sounds good to me

pls don't use self_is_ext simply use is_extension_type and write this out so its readable

if is_categorical_dtype(...): pass elif is_extension_type(...): ....

why is the try/except here? that is very odd

I changed the if test as you suggested.

@jreback wrote:

why is the try/except here? that is very odd

The idea is to first try to coerce to the same type as the extension dtype, and if that doesn't work, just call the regular constructor for the Series.

jorisvandenbossche · 2018-05-30T19:09:20Z

pandas/tests/extension/decimal/test_decimal.py

@@ -138,6 +138,22 @@ def test_value_counts(self, all_data, dropna):

        tm.assert_series_equal(result, expected)

+    def test_combine(self):


can you put this in the base class? (and then if needed skip it for the json tests)

And make use of the fixtures? (so passing all_data as an argument to the test function, instead of calling DecimalArray(make_data()) in the test function

That took a bit of work to do, since I need two vectors that are different. Found discussion at pytest-dev/pytest#2703 about how to do it.

that's clever .. :-)

jorisvandenbossche · 2018-05-31T16:31:25Z

pandas/tests/extension/conftest.py

+    """Return different versions of data for count times"""
+    def gen(count):
+        for _ in range(count):
+            yield NotImplemented


NotImplemented -> NotImplementedError

TomAugspurger · 2018-06-02T21:07:34Z

pandas/core/series.py

@@ -2227,20 +2229,31 @@ def combine(self, other, func, fill_value=np.nan):
        Series.combine_first : Combine Series values, choosing the calling
            Series's values first
        """
+        self_is_ext = is_extension_array_dtype(self.values)
+        if fill_value is None:
+            fill_value = na_value_for_dtype(self.dtype, False)


Use the keyword for False so it’s clearer what’s being specified.

OK, that is done.

TomAugspurger · 2018-06-02T21:11:03Z

Lgtm otherwise.

jreback

mostly minor comments. looks pretty good. this is a nice target changed.

jreback · 2018-06-05T21:23:52Z

pandas/core/series.py

            new_name = self.name
+
+        if self_is_ext and not is_categorical_dtype(self.values):


pls don't use self_is_ext simply use is_extension_type and write this out so its readable

if is_categorical_dtype(...): pass elif is_extension_type(...): ....

why is the try/except here? that is very odd

jreback · 2018-06-05T21:24:21Z

pandas/tests/extension/category/test_categorical.py

@@ -154,6 +163,10 @@ class TestMethods(base.BaseMethodsTests):
    def test_value_counts(self, all_data, dropna):
        pass

+    @pytest.mark.skip(reason="combine add for categorical not supported")
+    def test_combine_add(self, data_repeated):
+        pass


does it raise? if so pls tests that

This is interesting. Since combine by definition in the docs does an element by element operation, if you do something like:

In [1]: import pandas as pd In [2]: cat1 = pd.Categorical(values=["one","two","three","three","two","one"], ...: categories=["one","two","three"], ordered=True) In [3]: s1 = pd.Series(cat1) In [4]: s1 Out[4]: 0 one 1 two 2 three 3 three 4 two 5 one dtype: category Categories (3, object): [one < two < three] In [5]: s1.combine("abc", lambda x,y : x+y)

In pandas 0.23.0, this fails. But the binary operation is defined on an element by element basis, so the new implementation will return:

Out[4]: 0 oneabc 1 twoabc 2 threeabc 3 threeabc 4 twoabc 5 oneabc dtype: object

IMHO, this is correct, because of what combine is supposed to do. So I will test that.

jreback · 2018-06-05T21:25:14Z

pandas/tests/extension/base/methods.py

@@ -103,3 +103,36 @@ def test_factorize_equivalence(self, data_for_grouping, na_sentinel):

        tm.assert_numpy_array_equal(l1, l2)
        self.assert_extension_array_equal(u1, u2)
+
+    def test_combine_le(self, data_repeated):


can you give a 1-liner explaining what this is testing. the name of the test is uninformative.

jreback · 2018-06-05T21:27:07Z

pandas/core/series.py

@@ -2208,6 +2208,8 @@ def combine(self, other, func, fill_value=np.nan):
        func : function
            Function that takes two scalars as inputs and return a scalar
        fill_value : scalar value
+            The default specifies to use the appropriate NaN value for
+            the underlying dtype of the Series


does this need a versionchanged?

There should be no change in behaviour for normal Series I think, as the na_value_for_dtypewill give NaN/NaT (which was the default before). It's only for extension arrays that it might give another value, depending on what the extension array defined its missing value to be.

I agree with @jorisvandenbossche

jreback · 2018-06-05T21:27:34Z

pandas/core/series.py

            new_name = self.name
+


can you put a comment on what is going on here

jreback · 2018-06-05T21:29:29Z

@Dr-Irv also can you make sure that the cases in your OP are covered here (I think so, but pls verify)

Dr-Irv · 2018-06-06T15:14:13Z

@jreback wrote:

@Dr-Irv also can you make sure that the cases in your OP are covered here (I think so, but pls verify)

Here you go:

In [1]: import pandas as pd

In [2]: from pandas.tests.extension.decimal.array import DecimalArray, make_dat
   ...: a

In [3]: da1= make_data()

In [4]: da2 = make_data()

In [5]: s1 = pd.Series(DecimalArray(da1))

In [6]: s2 = pd.Series(DecimalArray(da2))

In [7]: s1.combine(s2, lambda x1, x2: x1 if x1 < x2 else x2)
Out[7]:
0     0.08098492909619514623642544393078424036502838...
1     0.38231358539972493115755014514434151351451873...
2     0.28683621751796128940270591556327417492866516...
3     0.68315691040978210324396968644578009843826293...
4     0.37188279135446988821200875463546253740787506...
5     0.29391070580371847498213355720508843660354614...
6     0.00281662212378319676275850724778138101100921...
7     0.68424426414243522120983698187046684324741363...
8     0.82298917655718306640721948497230187058448791...
9       0.104520052984668154749670065939426422119140625
10    0.36541026835682677287309161329176276922225952...
11    0.15450295421108961591016850434243679046630859375
12    0.28227045216414337058807859648368321359157562...
13    0.07428041682799846334717130957869812846183776...
14    0.76112360264534761888910452398704364895820617...
15    0.31991420667129122357152937183855101466178894...
16    0.43808559346706621440148410329129546880722045...
17    0.81185756018222499097447553140227682888507843...
18    0.18755848298291555309447176114190369844436645...
19    0.45064358633799639353156862853211350739002227...
20    0.01908941180337764276231382609694264829158782...
21    0.05994588216042351369594598509138450026512145...
22    0.45670931996552022180679841767414472997188568...
23    0.21696297680411980035586338999564759433269500...
24    0.77792921984939356061516946283518336713314056...
25    0.10732191741882657343154505724669434130191802...
26    0.26112839619732719498301776184234768152236938...
27    0.78917786400767853116633432364324107766151428...
28    0.11041422426646729793020540455472655594348907...
29    0.48501200782261966182318246865179389715194702...
                            ...
70    0.01573854854098000188855621672701090574264526...
71    0.44412972763803593156950455522746779024600982...
72    0.25386583135624463114510263039846904575824737...
73    0.18052436704533048050791421701433137059211730...
74    0.01768961191491325024571779067628085613250732...
75    0.19783353177328266703227654943475499749183654...
76    0.64275305832443418996291484290850348770618438...
77    0.56556945132078839666434078026213683187961578...
78    0.45692610540309075428666574225644581019878387...
79    0.54208092770195082099604633185663260519504547...
80    0.04475845682326173857745743589475750923156738...
81    0.02586933176340200368770183558808639645576477...
82    0.38495809393040258949980625402531586587429046...
83    0.09473826636927540345567422264139167964458465...
84    0.09800918702868577359055279885069467127323150...
85    0.06037114142095312274705065647140145301818847...
86    0.54181647584734216049895394462510012090206146...
87    0.39820409063224215806542360951425507664680480...
88    0.08573080156333134915769278450170531868934631...
89    0.31841090690861806322686788917053490877151489...
90    0.05991365875060672419039065061951987445354461...
91    0.20527392489879703330046822884469293057918548...
92    0.68151285114935533648861110123107209801673889...
93    0.33294634130492617440921776506002061069011688...
94    0.50598008087709389624109235228388570249080657...
95    0.31112635294260559959411693853326141834259033...
96    0.15173983856659867264227159466827288269996643...
97    0.12529127428765418628131556033622473478317260...
98    0.15472130908820258543556747099501080811023712...
99    0.36042221705147114985123835140257142484188079...
Length: 100, dtype: decimal

In [8]: cat1 = pd.Categorical(values=["one","two","three","three","two","one"],
   ...:
   ...:    ...:  categories=["one","two","three"], ordered=True)
   ...:    ...: cat2 = pd.Categorical(values=["three","two","one","one","two","
   ...: three"],
   ...:    ...:  categories=["one","two","three"], ordered=True)
   ...:    ...: s1 = pd.Series(cat1)
   ...:    ...: s2 = pd.Series(cat2)
   ...:    ...: s1, s2
   ...:
Out[8]:
(0      one
 1      two
 2    three
 3    three
 4      two
 5      one
 dtype: category
 Categories (3, object): [one < two < three], 0    three
 1      two
 2      one
 3      one
 4      two
 5    three
 dtype: category
 Categories (3, object): [one < two < three])

In [9]:  s1.combine(s2, lambda x1, x2: x1 <= x2)
Out[9]:
0     True
1     True
2    False
3    False
4     True
5     True
dtype: bool

In [10]: s = pd.Series([i*10 for i in range(5)])

In [11]: s.combine(22, lambda x,y: min(x,y))
Out[11]:
0     0
1    10
2    20
3    22
4    22
dtype: int64

In [12]: pd.__version__
Out[12]: '0.24.0.dev0+76.g2a21117c7.dirty'

jreback · 2018-06-06T18:30:44Z

i mean can u verify that u have test coverage for those cases

Dr-Irv · 2018-06-06T19:06:22Z

@jreback wrote:

i mean can u verify that u have test coverage for those cases

The new tests subsume those cases. Should I explicitly include those cases?

jorisvandenbossche · 2018-06-07T09:59:33Z

The new tests subsume those cases. Should I explicitly include those cases?

As long the case is part of the tests you added, that is fine.

jorisvandenbossche · 2018-06-07T10:05:26Z

Regarding combine now working for categorical data while in 0.23.0 it failed (#21183 (comment)), I think that is fine.
At least that is the consequence of catching the TypeError possibly raised by ExtensionArray._from_sequence; which allows the output has another dtype as in the input from combine.

jreback

looks good. a question.

jreback · 2018-06-07T11:35:49Z

pandas/core/series.py

+        if is_categorical_dtype(self.values):
+            pass
+        elif is_extension_array_dtype(self.values):
+            try:


so what kind of op hits the type error here? I find this a bit disconcerting that you need to catch a TypeError? is this a case that the combine op returns a result which is not an extension type (e.g. say its an int or someting), is that the reason? if so pls indicate via a comment.

@jreback Here is an example:

In [1]: import pandas as pd In [2]: from pandas.tests.extension.decimal.array import DecimalArray, make_dat ...: a In [3]: s = pd.Series(DecimalArray(make_data())).head() In [4]: s.combine(0.5, lambda x, y: x < y) Out[4]: 0 False 1 False 2 False 3 False 4 False dtype: bool

Since the function passed to combine is an arbitrary function, it could return a result of any type, which may not be the type that will fit in the EA.

I'll add a comment.

jreback · 2018-06-07T11:36:29Z

doc/source/whatsnew/v0.24.0.txt

@@ -181,6 +181,7 @@ Reshaping
 Other
 ^^^^^

-
+- :meth:`Series.combine()` works correctly with :class:`~pandas.api.extensions.ExtensionArray` inside of :class:`Series` (:issue:`20825`)


can you move the ExtensionArray to a sub-section (create a new one) as lots of changes in EA (you can make a new bug fix section).

I just made a bug fix section

jreback · 2018-06-07T11:36:40Z

doc/source/whatsnew/v0.24.0.txt

@@ -181,6 +181,7 @@ Reshaping
 Other
 ^^^^^

-
+- :meth:`Series.combine()` works correctly with :class:`~pandas.api.extensions.ExtensionArray` inside of :class:`Series` (:issue:`20825`)
+- :meth:`Series.combine()` with scalar argument now works for any function type (:issue:`21248`)


this one can go in reshaping bug fixes.

or into EA bug fixes

In EA bug fixes

jreback · 2018-06-07T11:38:03Z

doc/source/whatsnew/v0.24.0.txt

@@ -181,6 +181,7 @@ Reshaping
 Other
 ^^^^^

-
+- :meth:`Series.combine()` works correctly with :class:`~pandas.api.extensions.ExtensionArray` inside of :class:`Series` (:issue:`20825`)
+- :meth:`Series.combine()` with scalar argument now works for any function type (:issue:`21248`)


or into EA bug fixes

jreback · 2018-06-08T11:35:01Z

thanks @Dr-Irv nice patch!

thanks for the patience. non-trivial things take more time :-D

…das-dev#21183)

BUG: Series.combine() fails with ExtensionArray inside of Series

7469ca9

Dr-Irv mentioned this pull request May 23, 2018

ENH: Support operators for ExtensionArray #20889

Closed

4 tasks

jschendel added Bug ExtensionArray Extending pandas with custom dtypes or arrays. labels May 23, 2018

jschendel reviewed May 23, 2018

View reviewed changes

TomAugspurger reviewed May 23, 2018

View reviewed changes

Dr-Irv added 2 commits May 23, 2018 17:16

Merge remote-tracking branch 'upstream/master' into issue20825

bbb6640

Fixes as requested by pandas maintainers

339b23a

Fix PEP8 issues

61a09e7

Dr-Irv mentioned this pull request May 24, 2018

BUG: DecimalArray and JSONArray that are empty return incorrect results for isna() #21189

Closed

jreback requested changes May 24, 2018

View reviewed changes

Dr-Irv added 3 commits May 24, 2018 11:54

Merge remote-tracking branch 'upstream/master' into issue20825

d862e83

get rid of is_sequence_of_dtype

4c925fc

move whatsnew from 0.23.1 to 0.24

27480ac

TomAugspurger reviewed May 26, 2018

View reviewed changes

jorisvandenbossche reviewed May 28, 2018

View reviewed changes

jreback requested changes May 29, 2018

View reviewed changes

Dr-Irv added 2 commits May 29, 2018 10:26

Merge remote-tracking branch 'upstream/master' into issue20825

f96372e

remove binop changes and make it work right with scalar

677fe18

Dr-Irv mentioned this pull request May 30, 2018

ENH: Support ExtensionArray operators via a mixin #21261

Merged

4 tasks

jorisvandenbossche reviewed May 30, 2018

View reviewed changes

Dr-Irv added 2 commits May 30, 2018 17:58

Move test_combine tests to base class

9fceee7

Fix lint issue

1010cb5

jorisvandenbossche reviewed May 31, 2018

View reviewed changes

Dr-Irv added 2 commits May 31, 2018 15:03

Change NotImplemented to NotImplementedError

aceea9f

Merge remote-tracking branch 'upstream/master' into issue20825

79506ac

jorisvandenbossche approved these changes May 31, 2018

View reviewed changes

TomAugspurger reviewed Jun 2, 2018

View reviewed changes

put in keyword arg for na_value_for_dtype

0e4720b

jreback requested changes Jun 5, 2018

View reviewed changes

jreback added this to the 0.24.0 milestone Jun 5, 2018

Merge remote-tracking branch 'upstream/master' into issue20825

2a21117

Dr-Irv added 2 commits June 6, 2018 11:16

Merge branch 'master' of github.com:Dr-Irv/pandas into issue20825

e08f832

cosmetic changes. fix test_add for category

d3ed2c7

jreback requested changes Jun 7, 2018

View reviewed changes

Doc changes

4ca28b2

jreback approved these changes Jun 8, 2018

View reviewed changes

jreback merged commit 7f6ea67 into pandas-dev:master Jun 8, 2018

Dr-Irv deleted the issue20825 branch June 8, 2018 13:53

daminisatya pushed a commit to daminisatya/pandas that referenced this pull request Jun 8, 2018

BUG: Series.combine() fails with ExtensionArray inside of Series (pan…

1027a1c

…das-dev#21183)

david-liu-brattle-1 pushed a commit to david-liu-brattle-1/pandas that referenced this pull request Jun 18, 2018

BUG: Series.combine() fails with ExtensionArray inside of Series (pan…

98008d9

…das-dev#21183)

vladserkoff mentioned this pull request Apr 3, 2019

BUG: support casting from bool array to EA Integer dtype #25265

Merged

3 tasks

		name = ops.get_op_result_name(self, other)

		if (is_extension_array_dtype(this_vals) or

		new_name = self.name

		if self_is_ext and not is_categorical_dtype(self.values):

		@@ -138,6 +138,22 @@ def test_value_counts(self, all_data, dropna):

		tm.assert_series_equal(result, expected)

		def test_combine(self):

BUG: Series.combine() fails with ExtensionArray inside of Series #21183

BUG: Series.combine() fails with ExtensionArray inside of Series #21183

Conversation

Dr-Irv commented May 23, 2018 • edited by jreback Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

codecov bot commented May 23, 2018 • edited Loading

Codecov Report

Choose a reason for hiding this comment

Choose a reason for hiding this comment

TomAugspurger May 23, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Dr-Irv May 23, 2018 • edited Loading

Choose a reason for hiding this comment

pep8speaks commented May 23, 2018 • edited Loading

Comment last updated on June 07, 2018 at 15:45 Hours UTC

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Dr-Irv commented May 24, 2018

jreback commented May 24, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jorisvandenbossche left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

TomAugspurger commented Jun 2, 2018

jreback left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jreback commented Jun 5, 2018

Dr-Irv commented Jun 6, 2018

jreback commented Jun 6, 2018

Dr-Irv commented May 23, 2018 •

edited by jreback

Loading

codecov bot commented May 23, 2018 •

edited

Loading

TomAugspurger May 23, 2018 •

edited

Loading

Dr-Irv May 23, 2018 •

edited

Loading

pep8speaks commented May 23, 2018 •

edited

Loading