-
Notifications
You must be signed in to change notification settings - Fork 372
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add more splatting and appending of arguments in by #1620
Conversation
I added the method for Unless my interpretation is mistaken, which is likely is. |
A small comment (I have not analyzed the code yet): we want |
I think this should be done by |
You need |
Just to make sure I understand you correctly. Do you mean what I have recommended - to have |
Okay I added a method for I changed the |
This needs more work. I'm going to see if I can get this to work while also allowing (non-vector) keyword arguments. |
@@ -338,7 +338,8 @@ function combine(f::Any, gd::GroupedDataFrame) | |||
end | |||
end | |||
combine(gd::GroupedDataFrame, f::Any) = combine(f, gd) | |||
combine(gd::GroupedDataFrame, f::Pair...) = combine(f, gd) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you remove this method, I guess you can also drop Tuple{Vararg{Pair}}
from the signature of _combine
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That comment still applies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes. It still works without this. It's deleted
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It turns out it is needed. combine(f, (:c => sum, :d => mean))
breaks without it.
I thought
combine(gd::GroupedDataFrame, f::Union{Pair, AbstractVector{<:Pair}}...)
would capture it, but it does not. If we wanted to avoid the dispatch we could add a method for combine
which collects a tuple of pairs into a Vector
.
I don't think so. Why would it be needed? |
Okay my strategy for this will be 4 methods for each
Here I take the non-keyword arguments and collect them into a named tuple somehow, meaning I move the the autogeneration of variable names to this stage, also making sure I don't overwrite names from the Does that make sense? |
I have implemented the above strategy, but only for Rather than splatting a vector of I have a big commented out block that will handle the re-naming in the case of duplicate names, but I need to put more work into understanding the logic of it and what I need to change. The code originally came from the Any feedback is welcome. |
Makes sense, but note you'll have to handle the case where a single |
a43f53c
to
28a4fa2
Compare
Rebased and revived this PR. However in the process the |
src/groupeddataframe/grouping.jl
Outdated
combine(identity, gd) | ||
else | ||
combine(values(f), gd) | ||
# handle pairs on their own so we can iterate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's really too much code for a small feature IMHO. Is there any way to make this simpler? We don't have to allow mixing keyword and positional arguments.
If we don't mix positional and keyword arguments than things become simple again, and I doubt people will miss them. I would prefer to either close this or do a force-push of master to start anew. It would be nice to allow splatting for multiple arrays in a function call. Then we wouldn't have to worry about supporting vector arguments at all. However this isn't allowed, right?
|
d5301c3
to
359bf31
Compare
I have reset this a) make a function with |
Okay this PR should be ready for a review! I promise its a lot less to review than last time. I want to deprecate one method, and thats |
But tuple is lighter than vector so why would we want to prefer vectors over tuples? (but this is a corner case anyway - probably @nalimilan has a stronger opinion here what is right, because he spent many more hours than me on this).
I could not find the code with the deprecation (but you probably can add it later - first we should decide if we want to drop support for it).
Did you rather mean Finally - maybe you can add to tests the original reason for this PR, i.e. something like |
thanks, I will add those tests, and try to think of a good failing test.
I have not deprecated it and it works now, but is only type-specified at the I want to deprecate it because if we allow tuple, then we might as well have multiple tuples, along with tuple and scalar... |
AFAIK |
Oh! sorry. yes that was my mistake. |
I now have the naming working with my (somewhat hacky) solution. This solution seems more elegant than making the type of our I also added some tests that catch what went wrong. |
I think you can just replace |
That's a good suggestion, and I have incorporated it. Things should be good now. |
|
Thanks for the catch. I've updated it. |
Can this be merged even though some of the coverage tests didn't pass? |
Coverage passes. The problem is that it needs to be merged with master again (sorry for inconvenience). |
I think things are correct now? |
Thank you for the update. Let us wait for @nalimilan for a final review, as he is the "groupby" master 😄 (he is off for a few days, but we can go back to this PR very soon). |
@nalimilan - how do you see we should proceed with this (given 1.0 release plans?) |
Sorry for the delay. There are still open discussions above. Can you comment on or close them as appropriate? At least the one about the |
incol = gd.parent[!, first(p)] | ||
idx = gd.idx[gd.starts] | ||
outcol = agg(incol, gd) | ||
return idx, outcol | ||
else | ||
fun = do_f(last(p)) | ||
if p isa Pair{<:ColumnIndex} | ||
|
||
if p isa Pair && first(p) isa ColumnIndex |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if p isa Pair && first(p) isa ColumnIndex | |
if p isa Pair && first(p) isa ColumnIndex |
Someone posted on discuss here. I need to finish this PR |
Would this PR avoid recompilation? (I have posted an answer how the problem on Discourse can be solved using API we have now) |
No, but it would have avoided that complicated named tuple expression with repeated pairs to the same functions. |
This PR might have to be dropped or significantly reviewed if we go for |
yeah i dropped the ball on this PR, and I apologize for that. This PR would fall into place really easily if we implemented #2016 and some infrastructure to handle this kind of thing. For example ideally
Should both work via some generic code that is not unique to One benefit of Milan's suggestion to avoid named tuples is that it reduces the gymnastics with arguments, support for which were dropped from this PR because it would have added a lot of boiler-plate code for getting all the different argument formats to play nice. |
Note that I didn't really suggest dropping keyword arguments, rather adding another possible syntax. |
This is how I understood this. My point is that it has to be rethought how broadcasting would come into play in |
I am closing it as it is now implemented (except for broadcasting of |
@pdeffebach - thank you for raising this functionality. I think it helped a lot to improve the package. |
This is a first attempt at resolving #1615
It just gets around this by assuming that if you have many different arguments we can
vcat
them together and pass it off tocombine
.combine
has a pretty good infrastructure for sorting out arguments so I don't think this needs to happen atby
.