Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DEPR: SeriesGroupBy.agg with dict argument #50684

Closed
rhshadrach opened this issue Jan 11, 2023 · 6 comments · Fixed by #52268
Closed

DEPR: SeriesGroupBy.agg with dict argument #50684

rhshadrach opened this issue Jan 11, 2023 · 6 comments · Fixed by #52268
Labels
Apply Apply, Aggregate, Transform, Map Deprecate Functionality to remove in pandas Groupby

Comments

@rhshadrach
Copy link
Member

rhshadrach commented Jan 11, 2023

Edit: Instead of implementing the as_index=True case mentioned below, the as_index=False case should be deprecated, and the docs to SeriesGroupBy.agg should be updated. See the discussion below for details.

According to #15931 (comment), this was deprecated in 0.20.0 and raises on main now:

df = pd.DataFrame({"a": [1, 1, 2], "b": [3, 4, 5]})
gb = df.groupby("a", as_index=True)["b"]
result = gb.agg({"c": "sum"})
print(result)
# pandas.errors.SpecificationError: nested renamer is not supported

However, the docs say SeriesGroupBy.agg supports dict arguments. Also, when as_index=False it works

df = pd.DataFrame({"a": [1, 1, 2], "b": [3, 4, 5]})
gb = df.groupby("a", as_index=False)["b"]
result = gb.agg({"c": "sum"})
print(result)
#    a  c
# 0  1  7
# 1  2  5

This is because when as_index=False, using __getitem__ with "b" still returns a DataFrameGroupBy.

Assuming the implementation in this case isn't difficult, I'm thinking the easiest way forward is to support dictionaries in SeriesGroupBy.agg.

cc @jreback, @jorisvandenbossche

@rhshadrach rhshadrach added Groupby Apply Apply, Aggregate, Transform, Map API - Consistency Internal Consistency of API/Behavior labels Jan 11, 2023
@rhshadrach
Copy link
Member Author

Also #9052

@mroeschke
Copy link
Member

Hmm doesn't this mean dictionary inputs have different interpretations when __getitem__ is called?

I thought normally, a dictionary input e.g. {"b": sum} would mean "for axis label b apply the sum aggregation" while above example appears to make {"b": sum} mean "apply sum to the selected Series and name the result b"

@rhshadrach
Copy link
Member Author

rhshadrach commented Jan 12, 2023

Yes - DataFrameGroupBy.agg and SeriesGroupBy.agg will have different semantics. I'm not in favor for it also because users can just rename after the groupby. However, this partially works (with as_index=False) and this inconsistency is holding up further work on #46944.

I think we can resolve the inconsistency by enabling it for as_index=True in 2.0 since that isn't a breaking change, then deprecate passing a dict to SeriesGroupBy properly in 2.0. The other two options I see are less desirable:

  • Deprecate without enabling for as_index=True; this would make work on CLN/DOC: _selected_obj vs _obj_with_exclusions #46944 harder, to the point where I would probably shelf it until 3.0. It also leaves us with an inconsistent API for 2.x.
  • Raise when as_index=False in 2.0. This would be a backwards incompatible change without deprecation that I'd rather avoid.

Thoughts?

@mroeschke
Copy link
Member

I see. I would be okay with enabling it for 2.0 and hopefully deprecating passing dicts in agg when it's a SeriesGroupby in the same release so users know to transition.

@rhshadrach
Copy link
Member Author

Makes sense. I'll add the deprecation to the linked PR.

@rhshadrach
Copy link
Member Author

rhshadrach commented Jan 17, 2023

@mroeschke - I overlooked a better solution. We can maintain the current behavior and move forward with progress toward #46944. This is the approach I took in #50744. I think I overlooked this because it's odd to do if self.as_index: raise ValueError when the code would otherwise work fine.

I've reworked this issue to be about the remaining deprecation.

@rhshadrach rhshadrach changed the title API: groupby(...).agg with dict argument DEPR: SeriesGroupBy.agg with dict argument Jan 17, 2023
@rhshadrach rhshadrach added Deprecate Functionality to remove in pandas and removed API - Consistency Internal Consistency of API/Behavior Bug labels Jan 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Apply Apply, Aggregate, Transform, Map Deprecate Functionality to remove in pandas Groupby
Projects
None yet
2 participants