
BUG: series.eq(other) does not equal series == other when the series contain pd.NA #36941

Open
2 of 3 tasks
connesy opened this issue Oct 7, 2020 · 6 comments
Labels
Bug, Numeric Operations (Arithmetic, Comparison, and Logical operations)

Comments

@connesy
Contributor

connesy commented Oct 7, 2020

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Note: Please read this guide detailing how to provide the necessary information for us to reproduce your bug.

Code Sample, a copy-pastable example

# Using pd.NA:
>>> import pandas as pd
>>> df = pd.DataFrame(data=[[1, 1], [2, 2], [3, 3], [pd.NA, pd.NA]], columns=['a','b'])
>>> df.a == df.b
0     True
1     True
2     True
3    False
dtype: bool
>>> df.a.eq(df.b)
0    False
1    False
2    False
3    False
dtype: bool

# Using np.nan instead of pd.NA:
>>> import numpy as np
>>> df_np = pd.DataFrame(data=[[1, 1], [2, 2], [3, 3], [np.nan, np.nan]], columns=['a','b'])
>>> df_np.a.eq(df_np.b)
0     True
1     True
2     True
3    False
dtype: bool

Problem description

The documentation of Series.eq says it is "Equivalent to series == other", so I would expect df.a == df.b to yield the same result as df.a.eq(df.b). It doesn't when the series contain pd.NA.

Expected Output

df.a.eq(df.b) gives the same result as df.a == df.b even when the series contain pd.NA.

Output of pd.show_versions()

INSTALLED VERSIONS

commit : db08276
python : 3.8.5.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-48-generic
Version : #52-Ubuntu SMP Thu Sep 10 10:58:49 UTC 2020
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.1.3
numpy : 1.19.1
pytz : 2020.1
dateutil : 2.8.1
pip : 20.2.1
setuptools : 49.2.1.post20200802
Cython : None
pytest : 6.0.1
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : 1.3.1
lxml.etree : 4.5.1
html5lib : None
pymysql : 0.10.0
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.15.0
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : 0.7.4
fastparquet : None
gcsfs : None
matplotlib : 3.3.0
numexpr : None
odfpy : None
openpyxl : 3.0.4
pandas_gbq : None
pyarrow : 1.0.0
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.5.2
sqlalchemy : 1.3.18
tables : None
tabulate : 0.8.7
xarray : 0.15.1
xlrd : 1.2.0
xlwt : None
numba : 0.48.0

@connesy added the Bug and Needs Triage (issue that has not been reviewed by a pandas team member) labels Oct 7, 2020
@connesy changed the title from "BUG:" to "BUG: series.eq(other) does not equal series == other when the series contain pd.NA" Oct 7, 2020
@TomAugspurger
Contributor

You're using object dtype. Try with the nullable integer dtype: https://pandas.pydata.org/docs/user_guide/integer_na.html
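A minimal sketch of that suggestion, assuming a pandas version with nullable integer support (1.0+). It rebuilds the report's data with pd.array(..., dtype="Int64") instead of object dtype and compares both spellings. With this dtype both return a nullable "boolean" Series in which the missing row stays <NA> rather than being forced to False.

```python
import pandas as pd

# Same values as the report, but stored in the nullable "Int64" extension
# dtype instead of object dtype, so pd.NA is a proper missing value.
df = pd.DataFrame(
    {
        "a": pd.array([1, 2, 3, pd.NA], dtype="Int64"),
        "b": pd.array([1, 2, 3, pd.NA], dtype="Int64"),
    }
)

via_operator = df.a == df.b   # comparison through the == operator
via_method = df.a.eq(df.b)    # comparison through the flex method

# Both results use the nullable "boolean" dtype; the last row is <NA>.
print(via_operator)
print(via_method)
```

Note that this sidesteps the object-dtype code path rather than changing it, which is the point raised in the reply below.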

@connesy
Contributor Author

connesy commented Oct 7, 2020

That doesn't change the fact that df.a.eq(df.b) doesn't equal df.a == df.b, as the documentation says it should?

@dsaxton
Member

dsaxton commented Oct 7, 2020

"This function allows two Series or DataFrames to be compared against each other to see if they have the same shape and elements. NaNs in the same location are considered equal. The column headers do not need to have the same type, but the elements within the columns must be the same dtype."

The documentation doesn't say that they're the same, and this is the behavior you'd get using np.nan as well. Maybe the docs could be updated to reflect that?

@dsaxton added the Docs label and removed the Bug and Needs Triage (issue that has not been reviewed by a pandas team member) labels Oct 7, 2020
@dsaxton changed the title from "BUG: series.eq(other) does not equal series == other when the series contain pd.NA" to "DOC: series.eq(other) does not equal series == other when the series contain pd.NA" Oct 7, 2020
@jorisvandenbossche
Member

Ideally you wouldn't use object dtype, but I would still say that this is a bug.

I suppose this comes from the following numpy behaviour:

In [32]: df.a.values
Out[32]: array([1, 2, 3, <NA>], dtype=object)

In [33]: df.a.values == df.b.values
/home/joris/miniconda3/envs/dev/bin/ipython:1: DeprecationWarning: elementwise comparison failed; this will raise an error in the future.
  #!/home/joris/miniconda3/envs/dev/bin/python
Out[33]: False

We can't control numpy, but we should still ensure that this incorrect scalar False is not propagated into every element of the resulting Series.
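To see why the whole array comparison collapses, it helps to look at what each element pair returns. This sketch uses a hypothetical helper, elementwise_eq, that compares object values one pair at a time in plain Python. The last pair gives pd.NA rather than a bool, and pd.NA refuses conversion to bool. Presumably that TypeError is what makes numpy give up on the elementwise comparison and return the single False seen above; that link is an inference, not something taken from numpy's source here.

```python
import pandas as pd

def elementwise_eq(left, right):
    """Hypothetical helper: compare two object-dtype sequences pair by pair,
    keeping whatever each element's __eq__ returns."""
    return [x == y for x, y in zip(left, right)]

a = pd.Series([1, 2, 3, pd.NA], dtype=object)
b = pd.Series([1, 2, 3, pd.NA], dtype=object)

pairs = elementwise_eq(a, b)
print(pairs)  # first three entries are True, the last one is <NA>

# pd.NA cannot be coerced to True/False, so any loop that needs a plain
# bool for every element fails at this point.
try:
    bool(pairs[-1])
except TypeError as exc:
    print("TypeError:", exc)
```

Keeping the per-element results, as this loop does, is one way to avoid turning a single ambiguous element into a False for every row.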

@dsaxton
Member

dsaxton commented Oct 7, 2020

Silly me, I was looking at the docstring for equals. Ignore everything I just said.
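For anyone else who mixes the two up: the docstring quoted above belongs to Series.equals, which returns one bool for the whole object and treats NaNs in the same location as equal. Series.eq is an element-wise comparison like ==, where NaN != NaN. A small sketch with float data (np.nan, not pd.NA) shows the difference.

```python
import numpy as np
import pandas as pd

s1 = pd.Series([1.0, 2.0, np.nan])
s2 = pd.Series([1.0, 2.0, np.nan])

# equals: one bool for the whole Series; NaNs in the same position
# count as equal, so this is True.
whole = s1.equals(s2)

# eq: an element-wise boolean Series; NaN never compares equal,
# so the last element is False.
per_element = s1.eq(s2)

print(whole)
print(per_element)
```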

@dsaxton added the Bug label and removed the Docs label Oct 7, 2020
@dsaxton changed the title from "DOC: series.eq(other) does not equal series == other when the series contain pd.NA" to "BUG: series.eq(other) does not equal series == other when the series contain pd.NA" Oct 7, 2020
@jbrockmendel added the Numeric Operations (Arithmetic, Comparison, and Logical operations) label Oct 12, 2020
@lastcodestanding

The documentation has a note "Mismatched indices will be unioned together. NaN values are considered different (i.e. NaN != NaN)." So, I think this is intended.
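Both halves of that note can be checked with plain int/float data. In this sketch, eq aligns two Series with different labels on the union of their indexes, and any label missing on one side becomes NaN, which never compares equal. By contrast, the == operator does not align at all; it raises for Series that are not identically labeled, so eq and == already differ in this respect. Note that this only accounts for False in the missing row, not for the False values the report shows in rows 0 to 2.

```python
import numpy as np
import pandas as pd

left = pd.Series([1, 2], index=["x", "y"])
right = pd.Series([2, 3], index=["y", "z"])

# eq aligns on the union of the indexes (x, y, z); the missing side becomes
# NaN before comparing, and NaN != NaN.
aligned = left.eq(right)
print(aligned)  # x -> False, y -> True, z -> False

# NaN against NaN in the same position is also False.
nan_case = pd.Series([np.nan]).eq(pd.Series([np.nan]))
print(nan_case)

# The == operator does not align; it requires identically-labeled Series.
try:
    left == right
except ValueError as exc:
    print("ValueError:", exc)
```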

Projects
None yet
Development

No branches or pull requests

6 participants