Feature request/Question: do not drop extra classes (and attributes) with functions group_by, summarise and so #6973

Closed
catalamarti opened this issue Dec 5, 2023 · 4 comments

Comments

@catalamarti

We are building several packages on top of the dplyr + dbplyr infrastructure (we are very grateful for it), and we define classes such as 'generated_cohort_set', 'cdm_reference', 'cdm_table', 'codelist', and so on.

One problem we are facing is that some functions (group_by, summarise, ...) drop these classes (see reprex).
I guess this is on purpose, but I am wondering why, and whether preserving them is something that could be considered in the future.

Here are the packages, if you are curious: https://cran.r-project.org/web/packages/CDMConnector/index.html, https://cran.r-project.org/web/packages/DrugUtilisation/index.html, https://cran.r-project.org/web/packages/PatientProfiles/index.html, https://cran.r-project.org/web/packages/IncidencePrevalence/index.html, ...

x <- dplyr::tibble(a = 1)
class(x) <- c("my_class", class(x))
class(x)
#> [1] "my_class"   "tbl_df"     "tbl"        "data.frame"
x |> dplyr::mutate(b = 1) |> class()
#> [1] "my_class"   "tbl_df"     "tbl"        "data.frame"
x |> dplyr::group_by(a) |> class()
#> [1] "grouped_df" "tbl_df"     "tbl"        "data.frame"

Created on 2023-12-05 with reprex v2.0.2

FYI @edward-burn @ablack3

@MSHelm

MSHelm commented Jan 16, 2024

@catalamarti I ran into the same issue before. There is some documentation on how to extend tibbles here: https://dplyr.tidyverse.org/reference/dplyr_extending.html
It also states that, for example, dplyr::group_by and dplyr::ungroup drop attributes and classes.

Unfortunately, custom attributes are dropped even if they don't depend on the rows or columns, contrary to what is documented in the vignette.

I am currently writing a short post on how I ended up solving it and will link it here once I am done.

@edward-burn

Just wondering if there is any potential resolution to this, @hadley. The page linked above, https://dplyr.tidyverse.org/reference/dplyr_extending.html, also says "These functions are a stop-gap measure", so I'm not sure whether to incorporate them in packages that depend on dplyr, or whether the better approach (at least in the short term) is to create a method for every dplyr verb to handle the situations above.

@DavisVaughan
Member

DavisVaughan commented Apr 19, 2024

  • group_by() creates a fundamentally different type of data structure, and we have no way of knowing if it is compatible with your class, so we have to drop it. If you want to support a grouped data frame structure, you can write an S3 method for group_by(). It is typically easier to use something like mutate(.by =), though: that preserves your class and lets you do the grouped operation, so you don't have to worry about the grouped_df class at all, because it never exists in that workflow.
  • summarise() similarly builds off the data from group_data(), which is always a bare tibble or bare data frame. In the same vein as group_by(), we don't know if the summarized table (which has a very different structure than the original one) is still compatible with your class, so we drop it. You'd also need an S3 method for this.

This is documented here https://dplyr.tidyverse.org/reference/dplyr_extending.html and here https://dplyr.tidyverse.org/reference/summarise.html#value
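To illustrate the mutate(.by =) route mentioned above, here is a minimal sketch. It assumes dplyr >= 1.1.0 (where `.by` was introduced); `my_class`, `g`, `v`, and `total` are hypothetical names that mirror the reprex at the top of this issue.

```r
library(dplyr)

# Hypothetical subclass, built the same way as in the reprex above.
x <- tibble(g = c("a", "a", "b"), v = c(1, 2, 10))
class(x) <- c("my_class", class(x))

# Grouped computation without ever creating a grouped_df:
# `.by` groups the data only for the duration of this one call.
res <- x |> mutate(total = sum(v), .by = g)

class(res)   # "my_class" is still part of the class vector
res$total    # per-group sums, in the original row order
```

Because no grouped_df is ever created, there is no group_by() method to write for the subclass in this workflow.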

tsibble is an example of a tibble subclass that has support for custom grouped data frames and a custom summarise method, if you want to look at that. They are also a good example of how dplyr can't know if the result of summarise() is valid for your class or not. In some cases the result is still a tsibble, in other cases they return a bare tibble. https://github.com/tidyverts/tsibble
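The "sometimes still valid, sometimes a bare tibble" behaviour can be sketched with a small S3 method. This is not how tsibble implements it; it is only an illustration under stated assumptions: a hypothetical class `my_class` whose single invariant is that a column named `id` exists, and a hypothetical constructor `new_my_class()`.

```r
library(dplyr)

# Hypothetical class: the only invariant is that a column `id` exists.
new_my_class <- function(x) {
  stopifnot(is.data.frame(x), "id" %in% names(x))
  class(x) <- c("my_class", setdiff(class(x), "my_class"))
  x
}

# summarise() method: delegate to dplyr, then restore the class only
# when the invariant still holds; otherwise return the bare tibble.
summarise.my_class <- function(.data, ...) {
  out <- NextMethod()
  if ("id" %in% names(out)) new_my_class(out) else out
}

# Register explicitly so dispatch also works outside a package
# (a package would normally register the method via its NAMESPACE).
registerS3method("summarise", "my_class", summarise.my_class,
                 envir = asNamespace("dplyr"))

x <- new_my_class(tibble(id = c(1, 1, 2), v = c(1, 2, 10)))

kept    <- x |> summarise(v = sum(v), .by = id)  # `id` survives
dropped <- x |> summarise(v = sum(v))            # `id` is gone

inherits(kept, "my_class")
inherits(dropped, "my_class")
```

The method, not dplyr, decides whether the result is still a valid instance, which is the knowledge dplyr lacks when it drops the class.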

@MSHelm

MSHelm commented May 6, 2024

@catalamarti Took some time to write my article due to a lot of things going on, but if it still helps you, here it is: https://www.bio-ai.org/blog/extending-tibbles/
