-
Notifications
You must be signed in to change notification settings - Fork 372
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add check if input is a table before trying alternative constructors #2341
Conversation
Fixes JuliaData/CSV.jl#702. A DataFrame constructor was removed from CSV.jl in its latest release (0.7.6) as part of new deprecations for decoupling the two packages. What I forgot was that we were relying on that constructor because of an ambiguity in the generic DataFrame constructor fallback that mostly uses Tables.jl to try and turn anything into a DataFrame. The problem is "mostly"; one of the checks is if the input is an `AbstractVector` of `AbstractVectors`, which it turns out `CSV.File` is! This is because it's considered an `AbstractVector` of `Row`s, which themselves are `<: AbstractVector`. So we were trying to basically take each row of a `CSV.File` and treat each row as a column in that fallback constructor. While it was proposed to put the DataFrame constructor back in CSV.jl for now; that's pretty unsatisfying because we know we're going to remove the DataFrames deprecation at some point, and when we do, we want `DataFrame(CSV.File())` to work out of the box. The fix here is just adding an additional check up front in the fallback constructor if we already know the input is `Tables.istable(x)`. If so, we'll avoid these extra corner case checks and just let the Tables.jl machinery do the work.
Thank you for the fix. Can you please add a test of this? I have no problem with having CSV.jl as a dependency in extras for tests in particular and we have docs/src/assets/iris.csv data set to use for tests. In this way we could ensure that a standard test set for DataFrames.jl checks if CSV.jl works correctly with DataFrames.jl (ideally - put as many tests as needed to cover all scenarios how things can be passed between the packages). Would this be OK with you? Thank you! |
Quick correction:
This actually isn't accurate (my brain must have figured it out while I was sleeping, because I woke up thinking, "wait! this shouldn't be an issue!"). The real issue is that I still like the fix in place here, but we may consider wanting to add an additional check that the input also has a length > 0. |
Ok, added a test; I didn't add CSV.jl as a test dependency for now as this is a smaller issue than I originally thought (only affects empty |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added some more tests now and it looks good. Thank you. I will merge and backport it when CI passes.
Thank you! |
Fixes JuliaData/CSV.jl#702. A DataFrame
constructor was removed from CSV.jl in its latest release (0.7.6) as
part of new deprecations for decoupling the two packages. What I forgot
was that we were relying on that constructor because of an ambiguity in
the generic DataFrame constructor fallback that mostly uses Tables.jl to
try and turn anything into a DataFrame. The problem is "mostly"; one of
the checks is if the input is an
AbstractVector
ofAbstractVectors
,which it turns out
CSV.File
is! This is because it's considered anAbstractVector
ofRow
s, which themselves are<: AbstractVector
. Sowe were trying to basically take each row of a
CSV.File
and treat eachrow as a column in that fallback constructor.
While it was proposed to put the DataFrame constructor back in CSV.jl
for now; that's pretty unsatisfying because we know we're going to
remove the DataFrames deprecation at some point, and when we do, we
want
DataFrame(CSV.File())
to work out of the box.The fix here is just adding an additional check up front in the fallback
constructor if we already know the input is
Tables.istable(x)
. If so,we'll avoid these extra corner case checks and just let the Tables.jl
machinery do the work.