Implement min, argmin, max, argmax on ExtensionArrays? #24382

Closed
TomAugspurger opened this issue Dec 21, 2018 · 4 comments · Fixed by #27801
Labels
ExtensionArray Extending pandas with custom dtypes or arrays.

Comments

@TomAugspurger
Contributor

We already have ExtensionArray.argsort as part of the EA interface, so we should be able to do min, argmin, max, and argmax as a composition of argsort and __getitem__. Do we want to do this?
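To make the proposed composition concrete, here is a minimal sketch. `ToyArray` and the four helper functions are hypothetical names used only for illustration, not pandas API; the sketch assumes a non-empty array with no missing values and relies only on `argsort` and `__getitem__`.

```python
class ToyArray:
    """Hypothetical stand-in for an ExtensionArray (not a pandas class)."""

    def __init__(self, values):
        self._values = list(values)

    def __len__(self):
        return len(self._values)

    def __getitem__(self, i):
        return self._values[i]

    def argsort(self):
        # Positions that would sort the values in ascending order (stable sort).
        return sorted(range(len(self._values)), key=self._values.__getitem__)


def argmin(arr):
    # The first position in sorted order holds the smallest value.
    return arr.argsort()[0]


def argmax(arr):
    # The last position in sorted order holds the largest value.
    return arr.argsort()[-1]


def minimum(arr):
    return arr[argmin(arr)]


def maximum(arr):
    return arr[argmax(arr)]


arr = ToyArray([3, 1, 2])
print(argmin(arr), minimum(arr))  # 1 1
print(argmax(arr), maximum(arr))  # 0 3
```

Two caveats with this sketch: it sorts the whole array, so it costs O(n log n) where a direct scan would be O(n), and with tied maxima a stable sort makes `argsort()[-1]` pick the last of the equal values, whereas `numpy.argmax` returns the first.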

@TomAugspurger TomAugspurger added the ExtensionArray Extending pandas with custom dtypes or arrays. label Dec 21, 2018
@jreback
Contributor

jreback commented Dec 21, 2018

I think this is a nice idea; it shows a nice benefit of the shared EA infrastructure.

@TomAugspurger
Contributor Author

TomAugspurger commented Dec 21, 2018 via email

@makbigc
Contributor

makbigc commented May 9, 2019

When nan is involved, argsort behaves in two different ways.

In each of the examples below, the element at index 1 is nan.

  1. Putting the nan at the beginning
In [27]: arr = integer_array([1, np.nan, 2])                                    

In [28]: arr                                                                    
Out[28]: 
<IntegerArray>
[1, NaN, 2]
Length: 3, dtype: Int64

In [29]: arr.argsort()                                                          
Out[29]: array([1, 0, 2])
  2. Putting the nan at the end
In [20]: arr1 = integer_array([1, np.nan, 0], dtype='uint8')                    

In [21]: arr1                                                                   
Out[21]: 
<IntegerArray>
[1, NaN, 0]
Length: 3, dtype: UInt8

In [22]: arr1.argsort()                                                         
Out[22]: array([2, 0, 1])

In [23]: idx = pd.Index([1, np.nan, 2])                                         

In [24]: arr = idx.array                                                        

In [25]: arr                                                                    
Out[25]: 
<PandasArray>
[1.0, nan, 2.0]
Length: 3, dtype: float64

In [26]: arr.argsort()                                                          
Out[26]: array([0, 2, 1])

Should we standardize where nan is placed? _values_for_argsort returns a different value for nan in different EAs.
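One way such a standard could look is to always put missing values last, regardless of the underlying dtype. The following is only a sketch of that idea on plain Python lists; `argsort_na_last` is a hypothetical helper, not a pandas function, and it treats only float nan as missing.

```python
import math


def argsort_na_last(values):
    """Return sorting positions with nan positions placed at the end."""
    # Assumption: a value is "missing" only if it is a float nan.
    is_na = [isinstance(v, float) and math.isnan(v) for v in values]
    valid = [i for i, na in enumerate(is_na) if not na]
    missing = [i for i, na in enumerate(is_na) if na]
    # Sort only the valid positions, then append the missing ones.
    valid.sort(key=lambda i: values[i])
    return valid + missing


print(argsort_na_last([1, float("nan"), 2]))  # [0, 2, 1]
print(argsort_na_last([1, float("nan"), 0]))  # [2, 0, 1]
```

Because the missing positions are handled separately from the sort, the result does not depend on which sentinel value a particular array uses internally for nan.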

@jorisvandenbossche
Member

@makbigc there are indeed still some inconsistencies there ... I opened #21801 for that some time ago. Can you move your comment there?
