-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Partial support for across()
in mutate()
and summarise()
#296
Conversation
Thanks, this looks good. I'll play with it a bit before merging. IIUC, the main argument for not adding it to the |
1eb360c
to
896315d
Compare
896315d
to
137d1fa
Compare
5eb3aef
to
9743558
Compare
Thanks. I added tests, and have an extension ready that I'll open a new PR for. |
across()
in mutate()
and summarise()
across()
in mutate()
and summarise()
Unfortunately I don't think we can expect downstream dependencies to adhere to this convention at this point? Interesting point about requiring |
Joint work with @lionel-
This PR adds partial support for inlining of
across()
inmutate()
andsummarise()
. If we can successfully inline theacross()
call, then we don't have to fall back to a dplyr implementation.It currently has the following restrictions:
You can select multiple
cols
, i.e.across(c(a, b),)
But you can only supply 1 function, i.e.
across(cols, fn)
across(cols, list(fn1 = fn1, fn2 = fn2))
across(cols, list(fn = fn))
Inlined lambdas can be expanded like
across(cols, \(x) mean(x, na.rm = TRUE))
fn <- \(x) mean(x, na.rm = TRUE)
followed byacross(cols, fn)
Using
.names
is supported, but not particularly useful right now with just 1 functionUsing
.unpack
forces a fallback (like dplyr)Using the
...
forces a fallback (like dplyr)In the future on the dplyr side we'd like to fully deprecate the usage of
...
inacross()
, which currently accepts extra arguments for the.fns
and is very confusing. After it is deprecated fully we can instead allow it to accept multiple functions, rather than using a list, so you can do:across(cols, mean, median, \(x) mean(x, na.rm = TRUE))
. This is when duckplyr could more easily start accepting >1 functions.In the details section below, there are a number of examples about when we do vs don't expand.
Some rationale for why we got here:
Why can't we inline across() in dplyr's generics directly?
Method dispatch makes this hard, S3 signatures currently look like this:
We could try something like:
But how do we know to not apply
across_expand()
to.method_specific
? It gets captured in the...
of the generic when the user calls something like:We would end up transforming all incoming arguments to quosures, even things like
.keep
that are not meant to be evaluated witheval_tidy()
.So why not export
dplyr::partial_across_expand()
for methods to call manually?We could try and make the following approach work:
This gets around the
...
problem from above, because we've matched the dots to the specific S3 method at this point. But there are a number of reasons we don't necessarily want to do this right now:across()
support, so it is also possible our "partial" support might not capture all the use cases they already support - in which case they'd still need their own version.dots
would likely be"dplyr_quosures"
which are currently internal dplyr details that we are free to change.Since we don't have enough experience with this new variant yet, we currently want duckplyr to be the only consumer for now. We can revisit exporting it from dplyr once we get a little experience with it and do a cross comparison between duckplyr, dbplyr, and dtplyr to see if we can find some common utility that everyone can happily use.