Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: regression in error raised by idxmin/idxmax for extension dtypes #32749

Closed
jorisvandenbossche opened this issue Mar 16, 2020 · 11 comments · Fixed by #37924
Closed

BUG: regression in error raised by idxmin/idxmax for extension dtypes #32749

jorisvandenbossche opened this issue Mar 16, 2020 · 11 comments · Fixed by #37924
Labels
Bug Error Reporting Incorrect or improved errors from pandas ExtensionArray Extending pandas with custom dtypes or arrays. Regression Functionality that used to work in a prior pandas version
Milestone

Comments

@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented Mar 16, 2020

With pandas 1.0.1:

In [7]: from pandas.tests.extension.decimal import DecimalArray, make_data 

In [8]: s = pd.Series(DecimalArray(make_data()[:5]))  

In [9]: s 
Out[9]: 
0    Decimal: 0.25866656130631138221787068687262944...
1    Decimal: 0.11511905452780368808163302674074657...
2    Decimal: 0.15301679241167220890673661415348760...
3    Decimal: 0.41125672759464626526693109553889371...
4    Decimal: 0.46391316685725048074573351186700165...
dtype: decimal

In [10]: s.idxmin()   
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-10-5060aab05d12> in <module>
----> 1 s.idxmin()

~/miniconda3/envs/pandas10/lib/python3.8/site-packages/pandas/core/series.py in idxmin(self, axis, skipna, *args, **kwargs)
   2037         """
   2038         skipna = nv.validate_argmin_with_skipna(skipna, args, kwargs)
-> 2039         i = nanops.nanargmin(com.values_from_object(self), skipna=skipna)
   2040         if i == -1:
   2041             return np.nan

~/miniconda3/envs/pandas10/lib/python3.8/site-packages/pandas/core/nanops.py in _f(*args, **kwargs)
     62             if any(self.check(obj) for obj in obj_iter):
     63                 f_name = f.__name__.replace("nan", "")
---> 64                 raise TypeError(
     65                     f"reduction operation '{f_name}' not allowed for this dtype"
     66                 )

TypeError: reduction operation 'argmin' not allowed for this dtype

Now with master you get:

In [4]: s.idxmin() 
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-4-7e62303a9d43> in <module>
----> 1 s.idxmin()

~/scipy/pandas/pandas/core/series.py in idxmin(self, axis, skipna, *args, **kwargs)
   1988         """
   1989         skipna = nv.validate_argmin_with_skipna(skipna, args, kwargs)
-> 1990         i = nanops.nanargmin(self._values, skipna=skipna)
   1991         if i == -1:
   1992             return np.nan

~/scipy/pandas/pandas/core/nanops.py in _f(*args, **kwargs)
     69             try:
     70                 with np.errstate(invalid="ignore"):
---> 71                     return f(*args, **kwargs)
     72             except ValueError as e:
     73                 # we want to transform an object array

~/scipy/pandas/pandas/core/nanops.py in nanargmin(values, axis, skipna, mask)
    943     """
    944     values, mask, dtype, _, _ = _get_values(
--> 945         values, True, fill_value_typ="+inf", mask=mask
    946     )
    947     result = values.argmin(axis)

~/scipy/pandas/pandas/core/nanops.py in _get_values(values, skipna, fill_value, fill_value_typ, mask)
    311         # promote if needed
    312         else:
--> 313             values, _ = maybe_upcast_putmask(values, mask, fill_value)
    314 
    315     # return a platform independent precision dtype

~/scipy/pandas/pandas/core/dtypes/cast.py in maybe_upcast_putmask(result, mask, other)
    278     """
    279     if not isinstance(result, np.ndarray):
--> 280         raise ValueError("The result input must be a ndarray.")
    281     if not is_scalar(other):
    282         # We _could_ support non-scalar other, but until we have a compelling

ValueError: The result input must be a ndarray.

@jbrockmendel this is from no longer using values_from_object, but passing _values to the nanops function, while this expects to always receive an ndarray

@jbrockmendel
Copy link
Member

sounds like a 2-liner to check for non-ndarray and raise the old exception?

@jorisvandenbossche
Copy link
Member Author

Maybe not necessarily, you might have EAs that can be converted to numerical ndarrays?

@jbrockmendel
Copy link
Member

you might have EAs that can be converted to numerical ndarrays?

Should values_for_argsort work for that?

@jorisvandenbossche
Copy link
Member Author

jorisvandenbossche commented Mar 17, 2020

I suppose so? I suppose if argsort works on the values, also argmin/argmax should do the expected thing?

@simonjayhawkins simonjayhawkins added Error Reporting Incorrect or improved errors from pandas Regression Functionality that used to work in a prior pandas version ExtensionArray Extending pandas with custom dtypes or arrays. labels Apr 25, 2020
@mroeschke mroeschke added the Bug label May 11, 2020
@TomAugspurger
Copy link
Contributor

I'd like to fix this by having Series.idxmin() eventually call .array._reduce("argmin"). However, there's a potential issue with how we've defined argmin on our arrays for missing values.

In [52]: pd.array([1, 2, None], dtype="float64")._reduce("argmin", skipna=False)  # PandasArray
Out[52]: <NA>

In [53]: pd.array([1, 2, None], dtype="Int64")._reduce("argmin", skipna=False)
Out[53]: -1

IMO, _reduce should return the correct result, which in this case is I think NA (under the rule that skipna propagates NA).

@jorisvandenbossche
Copy link
Member Author

However, there's a potential issue with how we've defined argmin on our arrays for missing values.

@TomAugspurger see also #33941, #33942

@simonjayhawkins
Copy link
Member

moved off 1.1.1 milestone (scheduled for this week) as no PRs to fix in the pipeline

@simonjayhawkins simonjayhawkins modified the milestones: 1.1.2, 1.1.3 Sep 7, 2020
@simonjayhawkins
Copy link
Member

moved off 1.1.2 milestone (scheduled for this week) as no PRs to fix in the pipeline

simonjayhawkins added a commit to simonjayhawkins/pandas that referenced this issue Sep 7, 2020
@simonjayhawkins
Copy link
Member

first bad commit: [8c38283] CLN: avoid values_from_object in Series (#32426)

@simonjayhawkins
Copy link
Member

moved off 1.1.3 milestone (overdue) as no PRs to fix in the pipeline

@simonjayhawkins simonjayhawkins modified the milestones: 1.1.4, 1.1.5 Oct 29, 2020
@simonjayhawkins
Copy link
Member

moved off 1.1.4 milestone (scheduled for release tomorrow) as no PRs to fix in the pipeline

@jreback jreback modified the milestones: 1.1.5, Contributions Welcome Nov 25, 2020
@jorisvandenbossche jorisvandenbossche modified the milestones: Contributions Welcome, 1.3 Dec 29, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Error Reporting Incorrect or improved errors from pandas ExtensionArray Extending pandas with custom dtypes or arrays. Regression Functionality that used to work in a prior pandas version
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants