Feature request/Question: do not drop extra classes (and attributes) with functions group_by, summarise and so #6973

catalamarti · 2023-12-05T17:10:59Z

We are building some packages on top of the dplyr + dbplyr infrastructure (we are very grateful for it), and we have defined some classes of our own, such as 'generated_cohort_set', 'cdm_reference', 'cdm_table', 'codelist' and so on.

One problem we are facing is that some functions (group_by, summarise, ...) drop these classes (see reprex).
I guess this is on purpose, but I am wondering why, and whether preserving the classes is something that could be considered for the future.

Here are the packages, if you are curious: https://cran.r-project.org/web/packages/CDMConnector/index.html, https://cran.r-project.org/web/packages/DrugUtilisation/index.html, https://cran.r-project.org/web/packages/PatientProfiles/index.html, https://cran.r-project.org/web/packages/IncidencePrevalence/index.html ...

x <- dplyr::tibble(a = 1)
class(x) <- c("my_class", class(x))
class(x)
#> [1] "my_class"   "tbl_df"     "tbl"        "data.frame"
x |> dplyr::mutate(b = 1) |> class()
#> [1] "my_class"   "tbl_df"     "tbl"        "data.frame"
x |> dplyr::group_by(a) |> class()
#> [1] "grouped_df" "tbl_df"     "tbl"        "data.frame"

Created on 2023-12-05 with reprex v2.0.2

FYI @edward-burn @ablack3

MSHelm · 2024-01-16T10:20:58Z

@catalamarti I was facing the same issue before and there is some documentation on how to extend tibbles here: https://dplyr.tidyverse.org/reference/dplyr_extending.html
There they also state that for example dplyr::group_by and dplyr::ungroup do drop attributes and classes.

Unfortunately, if you have custom attributes, they are dropped even if they don't depend on the rows or columns, contrary to what is documented on the vignette.

I am currently writing a short post on how I ended up solving it and will link it here once I am done.

edward-burn · 2024-04-18T17:37:45Z

Just wondering if there is any potential resolution to this, @hadley. The page linked above, https://dplyr.tidyverse.org/reference/dplyr_extending.html, also says "These functions are a stop-gap measure", so I'm not sure whether to rely on them in packages that depend on dplyr, or whether the better approach (at least in the short term) is to create a method for every dplyr verb to handle the situations above.

DavisVaughan · 2024-04-19T15:03:48Z

group_by() creates a fundamentally different type of data structure, and we have no way of knowing whether it is compatible with your class, so we have to drop it. If you want to support a grouped data frame structure, you can write an S3 method for group_by(). It is typically easier to use something like mutate(.by =), though: that preserves your class and lets you do the grouped operation, so you never have to worry about the grouped_df class, because it never exists in that workflow.
summarise() similarly builds off the data from group_data(), which is always a bare tibble or bare data frame. In the same vein as group_by(), we don't know whether the summarised table (which has a very different structure than the original one) is still compatible with your class, so we drop it. You'd also need an S3 method for this.

This is documented here https://dplyr.tidyverse.org/reference/dplyr_extending.html and here https://dplyr.tidyverse.org/reference/summarise.html#value
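
To illustrate the mutate(.by =) route, here is a minimal sketch that reuses the "my_class" subclass from the reprex at the top of the thread. The column names `g` and `v` are made up for the example, and the `.by` argument assumes dplyr 1.1.0 or later.

```r
library(dplyr)

# Hypothetical subclass, built the same way as in the original reprex
x <- tibble(g = c("a", "a", "b"), v = c(1, 2, 10))
class(x) <- c("my_class", class(x))

# Per-group computation without ever creating a grouped_df
out <- x |> mutate(total = sum(v), .by = g)

inherits(out, "my_class")    # the custom class is still attached
inherits(out, "grouped_df")  # no grouped_df was involved
```

The result keeps one row per input row, so this covers grouped mutate-style work only. Collapsing to one row per group still goes through summarise(), which returns a bare tibble unless you write a method for it.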

tsibble is an example of a tibble subclass that has support for custom grouped data frames and a custom summarise method, if you want to look at that. They are also a good example of how dplyr can't know if the result of summarise() is valid for your class or not. In some cases the result is still a tsibble, in other cases they return a bare tibble. https://github.com/tidyverts/tsibble
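
A rough sketch of what such S3 methods could look like follows. It is not how tsibble implements them. The helper `restore_my_class()` and the rule that a "my_class" table must keep an `id` column are both invented for the example, to show a method deciding for itself whether the result is still valid for the class.

```r
library(dplyr)

# Hypothetical rule for this sketch: a "my_class" table must keep an `id` column.
restore_my_class <- function(out) {
  class(out) <- setdiff(class(out), "my_class")
  if ("id" %in% names(out)) {
    class(out) <- c("my_class", class(out))
  }
  out
}

# Let dplyr do the grouping, then put the class back in front of grouped_df
group_by.my_class <- function(.data, ...) {
  out <- NextMethod()
  restore_my_class(out)
}

# Let dplyr summarise, then restore the class only if the result is still valid
summarise.my_class <- function(.data, ...) {
  out <- NextMethod()
  restore_my_class(out)
}

x <- tibble(id = c(1, 1, 2), v = c(1, 2, 10))
class(x) <- c("my_class", class(x))

by_id   <- x |> group_by(id)
kept    <- by_id |> summarise(total = sum(v))  # `id` survives, class restored
dropped <- x |> summarise(total = sum(v))      # no `id`, stays a bare tibble
```

In a package, these methods would be registered through the NAMESPACE (for example with roxygen2's @export tag) rather than defined at top level. Other verbs that rebuild the table, such as ungroup(), would need the same treatment.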

MSHelm · 2024-05-06T10:23:45Z

@catalamarti Took some time to write my article due to a lot of things going on, but if it still helps you, here it is: https://www.bio-ai.org/blog/extending-tibbles/

DavisVaughan closed this as completed Apr 19, 2024
