
Support built-in names for Pandas GroupBy-Agg operations in Thicket's GroupBy #238

Open
ilumsden opened this issue Feb 20, 2025 · 0 comments
Labels
area-thicket Issues and PRs involving Thicket's core Thicket datastructure and associated classes priority-normal Normal priority issues and PRs type-feature Requests for new features or PRs which implement new features


@ilumsden
Collaborator

In docstrings and docs, we refer users to pandas for documentation on aggregation functions. Despite this, we do not currently support one of the most common ways pandas lets users specify an aggregation function: by its string name.

For example, to compute a mean during aggregation, we currently require users to pass a NumPy function:

gb = thicket_obj.groupby(...)
gb.agg(numpy.mean)

By contrast, the far more common idiom for a pandas GroupBy-Aggregate is to pass the function's name as a string:

df.groupby(...).agg("mean")
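For reference, here is a minimal pandas-only sketch of the string forms pandas accepts: a single name, a list of names, and a per-column dict. The `node`/`time` columns are made-up illustration data, and no Thicket objects are involved.

```python
import pandas as pd

# Made-up example data: two groups ("a" and "b") with one numeric column.
df = pd.DataFrame({
    "node": ["a", "a", "b", "b"],
    "time": [1.0, 3.0, 2.0, 6.0],
})

# Pass the aggregation by name; pandas dispatches to its own GroupBy.mean.
by_name = df.groupby("node").agg("mean")

# String names also work inside a list (one output column per function)...
multi = df.groupby("node").agg(["mean", "max"])

# ...and as values in a per-column dict.
per_col = df.groupby("node").agg({"time": "sum"})
```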

For consistency with pandas, our GroupBy.agg method should accept string inputs as well.
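A minimal sketch of what accepting strings could look like, assuming a wrapper that holds a pandas `DataFrameGroupBy`. `GroupBySketch` is a hypothetical name used only for illustration; Thicket's actual GroupBy is structured differently. The point is simply that string names are forwarded to pandas unchanged instead of being rejected.

```python
import pandas as pd


class GroupBySketch:
    """Hypothetical wrapper for illustration only; NOT Thicket's GroupBy."""

    def __init__(self, df: pd.DataFrame, by):
        self._grouped = df.groupby(by)

    def agg(self, func):
        # Accept the same shapes pandas does: a string name, a callable,
        # a list of either, or a dict mapping column -> any of these.
        if isinstance(func, (str, list, dict)) or callable(func):
            # Strings are passed through untouched so pandas can resolve
            # them to its built-in GroupBy methods (e.g. "mean").
            return self._grouped.agg(func)
        raise TypeError(f"unsupported aggregation spec: {func!r}")
```

Validating the spec here, rather than letting pandas fail deep inside the call, gives users an error that points at the wrapper's own API.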

Beyond consistency, there are two other reasons to do this:

  1. The logic behind a pandas mean (or similar operation) and a NumPy mean (or equivalent operation) is not the same. Current versions of pandas work around this by internally detecting NumPy functions passed to agg and replacing them with their pandas equivalents.
  2. Future versions of pandas (i.e., 3.0) will no longer replace NumPy functions with their pandas equivalents. Passing numpy.mean instead of "mean" will therefore have consequences (e.g., for performance): the two will behave differently, and the NumPy functions may not produce correct output.
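If Thicket wants to keep accepting NumPy callables after pandas stops substituting them, one option is to translate a few well-known NumPy reductions into pandas names before calling pandas. The helper `normalize_agg` and the table `_NUMPY_TO_PANDAS` are hypothetical, and the mapping is deliberately small; it is similar in spirit to, but not a copy of, pandas' internal substitution.

```python
import numpy as np
import pandas as pd

# Hypothetical mapping: only reductions whose pandas defaults we are
# comfortable substituting. Not part of Thicket or pandas.
_NUMPY_TO_PANDAS = {
    np.mean: "mean",
    np.sum: "sum",
    np.min: "min",
    np.max: "max",
    np.median: "median",
}


def normalize_agg(func):
    """Recursively rewrite known NumPy reductions to pandas string names."""
    if isinstance(func, str):
        return func  # already a pandas name
    if isinstance(func, list):
        return [normalize_agg(f) for f in func]
    if isinstance(func, dict):
        return {col: normalize_agg(f) for col, f in func.items()}
    try:
        return _NUMPY_TO_PANDAS.get(func, func)  # unknown callables pass through
    except TypeError:  # unhashable callables cannot be dict keys
        return func


df = pd.DataFrame({"node": ["a", "a", "b"], "time": [1.0, 3.0, 5.0]})
# np.mean is rewritten to "mean" before pandas ever sees it.
result = df.groupby("node").agg(normalize_agg(np.mean))
```

np.std and np.var are intentionally left out of the mapping: NumPy defaults to ddof=0 while pandas defaults to ddof=1, so rewriting one into the other would silently change results.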
@ilumsden ilumsden added area-thicket Issues and PRs involving Thicket's core Thicket datastructure and associated classes priority-normal Normal priority issues and PRs type-feature Requests for new features or PRs which implement new features labels Feb 20, 2025