Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pandas DataFrame.index property has wrong return type #387

Closed
Dr-Irv opened this issue Sep 17, 2020 · 9 comments
Closed

pandas DataFrame.index property has wrong return type #387

Dr-Irv opened this issue Sep 17, 2020 · 9 comments
Assignees
Labels
fixed in next version (main) A fix has been implemented and will appear in an upcoming version typestub Issue relating to our bundled type stubs

Comments

@Dr-Irv
Copy link

Dr-Irv commented Sep 17, 2020

Environment data

  • Language Server version: 2020.9.5
  • OS and version: Windows 10
  • Python version (& distribution if applicable, e.g. Anaconda): Anaconda python 3.7.5

Expected behaviour

No error

Actual behaviour

df = pd.DataFrame.from_records([[1,2,3], [4,5,6]], columns=["a", "b", "c"])
dfmi = df.set_index(["a", "b"])
mi: pd.MultiIndex = dfmi.index

Above reports

Expression of type "Index[int]" cannot be assigned to declared type "MultiIndex"
  "Index[int]" is incompatible with "MultiIndex"

This seems to fix it:

diff frame.pyi.ORIG frame.pyi
2778c2780
<     def index(self) -> Index[int]: ...
---
>     def index(self) -> MultiIndex: ...
@jakebailey jakebailey added the bug Something isn't working label Sep 17, 2020
@github-actions github-actions bot removed the triage label Sep 17, 2020
@Dr-Irv
Copy link
Author

Dr-Irv commented Sep 17, 2020

Also an issue with pd.Series as well, and this seems to fix it:

diff series.pyi.ORIG series.pyi
11a12
> from pandas.core.indexes.api import MultiIndex
187c188
<     def index(self) -> Index: ...
---
>     def index(self) -> MultiIndex: ...

@gramster
Copy link
Member

Seems like the result should be a union of Index or MultiIndex in this case. You set the index to a multiindex so your case makes sense, but before your call to set_index it would have just been a regular Index, no?

@gramster
Copy link
Member

This one is going to be problematic. If we say it is an index, then we will flag an error when it is assigned to MultiIndex, but the same would happen if we use Union[Index, MultiIndex]. As MultiIndex is a subtype of Index, it seems like the right thing here is to return an Index type, unless this is always a multiindex which I don't believe it is. I don't see a good solution.

@Dr-Irv
Copy link
Author

Dr-Irv commented Sep 19, 2020

I thought I tried a return type of Index and it didn't work. I don't understand why as MultiIndex is a subtype of Index. Maybe that's a different issue with pylance??

@Dr-Irv
Copy link
Author

Dr-Irv commented Sep 19, 2020

So here is code that breaks if you type it as Index

df = pd.DataFrame.from_records([[1, 2, 3], [4, 5, 6]], columns=["a", "b", "c"])
dfmi = df.set_index(["a", "b"])
levels = df.index.levels

In this case levels is only available on MultiIndex type, so there is naturally a typing error reported.

I think you're better off with MultiIndex as a return type in the stub because if you're using this in a method chain, then no error would be reported. I often write code where you know that the sequence of operations on a DataFrame has made the index of that frame a MultiIndex and then you depend on MultiIndex methods. The example above of set_index() is like that.

In my example above, if you use MultiIndex as the declared type in the stub, and then have:

ind : pd.Index = dfmi.index

then pylance doesn't complain.

On the other hand, if you knew your index was another subtype of Index (e.g., RangeIndex, IntervalIndex, etc.), then you'd have an issue if you used a method of one of those types, but I'm guessing you're less likely to use that in a method chain.

@gramster
Copy link
Member

MultiIndex is a subtype of Index: https://github.com/pandas-dev/pandas/blob/c33c3c03c227d53139d6b9a2b6bc1dc1f9400542/pandas/core/indexes/multi.py#L175

I think the correct thing here is to return Index. You can declare your variable as a MultiIndex and then use a # type: ignore comment on that line to silence the type check.

@Dr-Irv
Copy link
Author

Dr-Irv commented Sep 20, 2020

Yes, I get that. The issue is that an operation such as DataFrame.set_index() can change the type of DataFrame.index to MultiIndex .

If you end up releasing with the Index type (which should be changed from the current Index[int]), then the # type: ignore will be the only way to go

@judej judej added typestub Issue relating to our bundled type stubs and removed bug Something isn't working labels Sep 21, 2020
@gramster
Copy link
Member

Alternatively you can use typing.cast, which is more portable.

https://docs.python.org/3/library/typing.html#typing.cast

@gramster gramster added the fixed in next version (main) A fix has been implemented and will appear in an upcoming version label Sep 22, 2020
@jakebailey
Copy link
Member

This issue has been fixed in version 2020.9.6, which we've just released. You can find the changelog here: https://github.com/microsoft/pylance-release/blob/master/CHANGELOG.md#202096-23-september-2020

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fixed in next version (main) A fix has been implemented and will appear in an upcoming version typestub Issue relating to our bundled type stubs
Projects
None yet
Development

No branches or pull requests

4 participants