-
Notifications
You must be signed in to change notification settings - Fork 372
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
require AbstractVector from subset selectors #2744
Conversation
I classify this PR as a bug, since I understand we agree that the current behavior is unintended and confusing. |
NEWS.md
Outdated
|
||
* patch v1.0.3: make sure `subset` checks if the passed condition function | ||
returns a vector of values | ||
([#XXXX](https://github.com/JuliaData/DataFrames.jl/pull/XXXX)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not this qualifies for a patch release as this may break existing workflows, right?
([#XXXX](https://github.com/JuliaData/DataFrames.jl/pull/XXXX)) | |
([#2744](https://github.com/JuliaData/DataFrames.jl/pull/2744)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The question is how we qualify this change. Since we added this feature in 1.0 release only I would prefer to say this is a bug (i.e. not intended functionality) so we patch it. Otherwise we should deprecate it and it could be changed only in 2.0 release.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah in theory no breaking changes should be introduced until 2.0. In practice there are lots of grey areas like this one. Just like Julia allows "minor changes" across minor versions, I think we can do this kind of thing given that the risk of breakage is limited. But this PR doesn't fix any obviously incorrect behavior nor allows doing something that currently fails -- it's just an improvement to limit confusion. Better keep disruption in patch releases to the strict minimum IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So in summary do we go for 1.1.0 or 1.0.3 release with this? I understand you feel 1.1.0 would be better. Right? I am OK with this (I guess this should not cause package update issues).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I think 1.1.0 would be better. Not sure whether it's simpler to merge the PR and create a release-1.1 branch, or whether we should keep it open for a while.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am OK to switch to 1.1.0 but I would make the release as soon as possible. I do not want people using the functionality we want to remove.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. I admit it's a bit weird to tag 1.1 so quickly, but in practice that's not a problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes - it is weird, but it is not a problem so I would go for it (and explain it in my blog on Friday 😄).
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
src/abstractdataframe/subset.jl
Outdated
@@ -77,7 +92,8 @@ Each argument passed in `args` can be either a single column selector or a | |||
described for [`select`](@ref). | |||
|
|||
Note that as opposed to [`filter`](@ref) the `subset` function works on whole | |||
columns (or all rows in groups for `GroupedDataFrame`). | |||
columns and always makes a subset of rows (or all rows in groups for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does "makes a subset of rows" mean? Also it's not mentioned for subset!
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is intended to mean that selection happens row-wise always (not whole groups) - this was the original reason why this PR is made. In particular this is a crucial difference between subset
and filter
for GroupedDataFrame
(as the latter works per group). Could you please propose a better wording to convey this message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Then maybe this?
works on whole columns (and selects rows within groups for `GroupedDataFrame`
rather than whole groups).
And I'd mention that the result should be a vector of Boolean in the first paragraph as that's really an essential information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK - added
src/abstractdataframe/subset.jl
Outdated
If you want to filter rows of an `AbstractDataFrame` or groups of | ||
`GroupedDataFrame` by a function that returns `Bool` value use [`filter!`](@ref) | ||
or [`filter`](@ref) (note that [`filter!`](@ref) is not supported for | ||
`GroupedDataFrame`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sync this with subset
docstring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will sync once we finalize subset
docstring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
NEWS.md
Outdated
* patch v1.0.3: make sure `subset` checks if the passed condition function | ||
returns a vector of values |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update this if bumping to 1.1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated and added more explanations. Fortunately the change we make will now throw an error so it will be easy to catch.
NEWS.md
Outdated
|
||
* patch v1.0.3: make sure `subset` checks if the passed condition function | ||
returns a vector of values | ||
([#XXXX](https://github.com/JuliaData/DataFrames.jl/pull/XXXX)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. I admit it's a bit weird to tag 1.1 so quickly, but in practice that's not a problem.
Thank you! I will make a 1.1. release now |
Fixes #2740
It was easier than I thought in the end. I have not implemented the
ByGroup
idea (as one can usefilter
for this), but we can discuss it here.Some other minor things I have changed: