Rework complete_cases(!) functions and add dropnull(!) (#6)
Deprecate complete_cases() in favor of completecases(), and complete_cases!() in favor of dropnull!(). Add a dropnull() variant.

Also change completecases() to return a BitArray instead of an Array{Bool}.
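
The difference between the two return types can be illustrated with Base Julia alone. This is a minimal sketch, not code from this commit: `dense`, `packed`, and `rows` are hypothetical names, and no DataFrames functions are called.

```julia
# Two ways to build an all-true mask of length 4.
dense  = fill(true, 4)   # Vector{Bool}: one byte per element
packed = trues(4)        # BitVector (a BitArray{1}): one bit per element

# Mark the second "row" as incomplete in both masks.
dense[2] = false
packed[2] = false

# Both masks hold the same logical values and both work as boolean indexes.
rows = ["a", "b", "c", "d"]
@assert dense == packed
@assert rows[dense] == rows[packed] == ["a", "c", "d"]

# Only the concrete container type differs.
@assert dense isa Array{Bool}
@assert packed isa BitArray
@assert !(packed isa Array{Bool})
```

Code that only indexes with the mask should behave the same with either type; code that dispatches on or asserts `Array{Bool}` would need updating.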
cjprybol authored and quinnj committed Sep 2, 2017
1 parent 8c86dc0 commit 25316e4
Showing 7 changed files with 58 additions and 16 deletions.
5 changes: 3 additions & 2 deletions docs/src/lib/utilities.md
@@ -11,9 +11,10 @@ Pages = ["utilities.md"]
```@docs
eltypes
head
-complete_cases
-complete_cases!
+completecases
describe
+dropnull
+dropnull!
dump
names!
nonunique
4 changes: 3 additions & 1 deletion src/DataFrames.jl
@@ -10,6 +10,7 @@ module DataFrames

using Reexport
@reexport using StatsBase
+import NullableArrays: dropnull, dropnull!
@reexport using NullableArrays
@reexport using CategoricalArrays
using GZip
@@ -55,10 +56,11 @@ export @~,
colwise,
combine,
completecases,
-completecases!,
setcontrasts!,
deleterows!,
describe,
+dropnull,
+dropnull!,
eachcol,
eachrow,
eltypes,
50 changes: 40 additions & 10 deletions src/abstractdataframe/abstractdataframe.jl
@@ -25,8 +25,9 @@ The following are normally implemented for AbstractDataFrames:
* [`tail`](@ref) : last `n` rows
* `convert` : convert to an array
* `NullableArray` : convert to a NullableArray
-* [`complete_cases`](@ref) : indexes of complete cases (rows with no nulls)
-* [`complete_cases!`](@ref) : remove rows with nulls
+* [`completecases`](@ref) : boolean vector of complete cases (rows with no nulls)
+* [`dropnull`](@ref) : remove rows with null values
+* [`dropnull!`](@ref) : remove rows with null values in-place
* [`nonunique`](@ref) : indexes of duplicate rows
* [`unique!`](@ref) : remove duplicate rows
* `similar` : a DataFrame with similar columns as `d`
@@ -447,31 +448,60 @@ completecases(df::AbstractDataFrame)
* `::Vector{Bool}` : indexes of complete cases
-See also [`complete_cases!`](@ref).
+See also [`dropnull`](@ref) and [`dropnull!`](@ref).
**Examples**
```julia
df = DataFrame(i = 1:10, x = rand(10), y = rand(["a", "b", "c"], 10))
df[[1,4,5], :x] = Nullable()
df[[9,10], :y] = Nullable()
-complete_cases(df)
+completecases(df)
```
"""
function completecases(df::AbstractDataFrame)
-    res = fill(true, size(df, 1))
+    res = trues(size(df, 1))
     for i in 1:size(df, 2)
         _nonnull!(res, df[i])
     end
     res
end

"""
-Delete rows with null values.
+Remove rows with null values.
```julia
-completecases!(df::AbstractDataFrame)
+dropnull(df::AbstractDataFrame)
+```
+**Arguments**
+* `df` : the AbstractDataFrame
+**Result**
+* `::AbstractDataFrame` : the updated copy
+See also [`completecases`](@ref) and [`dropnull!`](@ref).
+**Examples**
+```julia
+df = DataFrame(i = 1:10, x = rand(10), y = rand(["a", "b", "c"], 10))
+df[[1,4,5], :x] = Nullable()
+df[[9,10], :y] = Nullable()
+dropnull(df)
+```
+"""
+dropnull(df::AbstractDataFrame) = deleterows!(copy(df), find(!, completecases(df)))
+
+"""
+Remove rows with null values in-place.
+```julia
+dropnull!(df::AbstractDataFrame)
```
**Arguments**
@@ -482,19 +512,19 @@ completecases!(df::AbstractDataFrame)
* `::AbstractDataFrame` : the updated version
-See also [`complete_cases`](@ref).
+See also [`dropnull`](@ref) and [`completecases`](@ref).
**Examples**
```julia
df = DataFrame(i = 1:10, x = rand(10), y = rand(["a", "b", "c"], 10))
df[[1,4,5], :x] = Nullable()
df[[9,10], :y] = Nullable()
-complete_cases!(df)
+dropnull!(df)
```
"""
-completecases!(df::AbstractDataFrame) = deleterows!(df, find(!, completecases(df)))
+dropnull!(df::AbstractDataFrame) = deleterows!(df, find(!, completecases(df)))

function Base.convert(::Type{Array}, df::AbstractDataFrame)
convert(Matrix, df)
3 changes: 3 additions & 0 deletions src/deprecated.jl
@@ -15,6 +15,9 @@ import Base: keys, values, insert!
@deprecate pool categorical
@deprecate pool! categorical!

+@deprecate complete_cases! dropnull!
+@deprecate complete_cases completecases
+
@deprecate sub(df::AbstractDataFrame, rows) view(df, rows)

@deprecate stackdf stackdf
2 changes: 1 addition & 1 deletion src/statsmodels/formula.jl
@@ -245,7 +245,7 @@ end

## Default NULL handler. Others can be added as keyword arguments
function null_omit(df::DataFrame)
-    cc = complete_cases(df)
+    cc = completecases(df)
     df[cc,:], cc
end

8 changes: 7 additions & 1 deletion test/data.jl
@@ -53,7 +53,13 @@ module TestData
@test size(df6, 2) == 3

#test_group("null handling")
-@test nrow(df5[complete_cases(df5), :]) == 3
+@test nrow(df5[completecases(df5), :]) == 3
+@test nrow(dropnull(df5)) == 3
+returned = dropnull(df4)
+@test df4 == returned && df4 !== returned
+@test nrow(dropnull!(df5)) == 3
+returned = dropnull!(df4)
+@test df4 == returned && df4 === returned

#test_context("SubDataFrames")

2 changes: 1 addition & 1 deletion test/formula.jl
@@ -382,7 +382,7 @@ module TestFormula
d[:x1m] = NullableArray(Nullable{Int}[5, 6, Nullable(), 7])
mf = ModelFrame(y ~ x1m, d)
mm = ModelMatrix(mf)
-@test isequal(NullableArray(mm.m[:, 2]), d[complete_cases(d), :x1m])
+@test isequal(NullableArray(mm.m[:, 2]), d[completecases(d), :x1m])
@test mm.m == ModelMatrix{sparsetype}(mf).m

## Same variable on left and right side
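
The copy versus in-place contract that the new tests in test/data.jl check (`!==` for `dropnull`, `===` for `dropnull!`) can be sketched with a plain vector standing in for the table. Everything here is an assumption for illustration: `rows`, `complete`, `dropped_copy`, and `dropped_inplace!` are hypothetical names, and `deleteat!` on a vector stands in for `deleterows!` on a data frame.

```julia
# Hypothetical stand-ins: a vector for the table rows and a mask like the
# one completecases returns.
rows = [10, 20, 30, 40]
complete = trues(4)
complete[2] = false

# Positions of incomplete rows, written without find/findall so the sketch
# does not depend on the Julia version.
incomplete = collect(1:length(rows))[.!complete]

# Copying variant: the input is left untouched.
dropped_copy(v, idx) = deleteat!(copy(v), idx)
# In-place variant: the input itself is shortened and returned.
dropped_inplace!(v, idx) = deleteat!(v, idx)

out = dropped_copy(rows, incomplete)
@assert out == [10, 30, 40]
@assert rows == [10, 20, 30, 40] && out !== rows

out = dropped_inplace!(rows, incomplete)
@assert rows == [10, 30, 40] && out === rows
```

Computing the index list once and passing it to both variants mirrors how `dropnull` and `dropnull!` in this commit share the same `completecases` mask and differ only in whether they copy first.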
