-
-
Notifications
You must be signed in to change notification settings - Fork 18.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
REF: groupby Series selection with as_index=False #50744
Conversation
pandas/tests/groupby/test_groupby.py
Outdated
# GH#?? - Columns after selection shouldn't retain names | ||
expected.columns.names = [None] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a change in behavior; currently .groupby(..., as_index=False)["B"]
retains the columns' name. We could still preserve the index name here, but it'd be a bit of a hack to add on. By not preserving the name, we make df["A"].groupby(..., as_index=False)
behave the same as df.groupby(..., as_index=False)["A"]
, which I think is desirable.
But would appreciate others thoughts here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I suppose in theory it would be nice if both operations kept the column name since the result always contains portion of the old column object?
But in this case I suppose it's minor and there's an argument that a groupby result produces a "new column"
results.append(new_res) | ||
is_groupby = isinstance(obj, (DataFrameGroupBy, SeriesGroupBy)) | ||
context_manager: ContextManager | ||
if is_groupby: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kinda unfortunate that groupby knowledge is leaking in pandas/core/apply.py
, guessing there's no way to avoid this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. I don't have thoughts on how to improve this just yet, but after removing obj_with_exclusions (we're almost there I think), I plan to try to cleanup the paths through groupby.agg, etc. Without a refactor, we need this because other objects don't have as_index.
Thanks @rhshadrach |
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.Currently
.groupby(..., as_index=False)['column']
gives back a DataFrameGroupBy This is becauseSeriesGroupBy
doesn't implementas_index=False
. This gives rise to having aDataFrameGroupBy
object withself.obj
being a Series. It also might be a bit surprising to users that this gives them back a DataFrameGroupBy (though this is probably rarely noticed).Eliminating this case simplifies the states groupby code must consider. It also is a step toward #46944.
There should be no changes in behavior from a user perspective except getting back a SeriesGroupBy and the one noted in the comments below.
cc @jbrockmendel