
add drop to a case class #212

Merged: 1 commit into master, Dec 8, 2017
Conversation

frosforever (Contributor):

Connects to #163 and #209

Similar to #208 adds support for dropping a column from a dataset with greater than 22 columns by returning a specified case class instead of a tuple.
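A hedged sketch of the call shape being proposed at this point in the PR (the `x.drop[U]('w)` form mentioned later in the review); the case classes and column names here are illustrative, not from the PR diff:

```scala
import frameless.TypedDataset

// Illustrative types; in practice the source type may have more than
// 22 fields, which is what rules out returning a tuple.
case class Wide(i: Int, j: Int, k: Boolean)
case class Narrow(i: Int, j: Int)

// Drop column `k` from a TypedDataset[Wide], materializing the result
// as the case class Narrow rather than a tuple (requires a Spark
// session and the frameless dependency, so shown as a sketch):
// val wide: TypedDataset[Wide] = TypedDataset.create(Wide(1, 1, false) :: Nil)
// val narrow: TypedDataset[Narrow] = wide.drop[Narrow]('k)
```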

@codecov-io commented Nov 29, 2017

Codecov Report

Merging #212 into master will increase coverage by 0.12%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #212      +/-   ##
==========================================
+ Coverage   95.86%   95.99%   +0.12%     
==========================================
  Files          42       42              
  Lines         847      848       +1     
  Branches       11       11              
==========================================
+ Hits          812      814       +2     
+ Misses         35       34       -1
Impacted Files Coverage Δ
...ataset/src/main/scala/frameless/TypedDataset.scala 93.89% <100%> (+0.04%) ⬆️
...c/main/scala/frameless/TypedDatasetForwarded.scala 54.16% <0%> (+4.16%) ⬆️

Last update 88b20c7...0fab834.

@imarios (Contributor) left a comment:

Looks really good! Let's discuss the comments a bit and we can merge.

*
* @see [[frameless.TypedDataset.dropApply#apply]]
*/
def drop[U] = new dropApply[U]
imarios (Contributor):
Looks kind of strange that the types are documented here, but they are not defined here ...

frosforever (Contributor, Author):

Agreed that it looks strange, but it's the same thing that I did with withColumn:

/** Adds a column to a Dataset so long as the specified output type, `U`, has
* an extra column from `T` that has type `A`.
*
* @example
* {{{
* case class X(i: Int, j: Int)
* case class Y(i: Int, j: Int, k: Boolean)
* val f: TypedDataset[X] = TypedDataset.create(X(1,1) :: X(1,1) :: X(1,10) :: Nil)
* val fNew: TypedDataset[Y] = f.withColumn[Y](f('j) === 10)
* }}}
* @param ca The typed column to add
* @param i0 TypeEncoder for output type U
* @param i1 TypeEncoder for added column type A
* @param i2 the LabelledGeneric derived for T
* @param i3 the LabelledGeneric derived for U
* @param i4 proof no fields have been removed
* @param i5 diff from T to U
* @param i6 keys from newFields
* @param i7 the one and only new key
* @param i8 the one and only new field enforcing the type of A exists
* @param i9 the keys of U
* @param iA allows for traversing the keys of U
* @tparam U the output type
* @tparam A The added column type
* @tparam TRep shapeless' record representation of T
* @tparam URep shapeless' record representation of U
* @tparam UKeys the keys of U as an HList
* @tparam NewFields the added fields to T to get U
* @tparam NewKeys the keys of NewFields as an HList
* @tparam NewKey the first, and only, key in NewKey
*
* @see [[frameless.TypedDataset.withColumnApply#apply]]
*/
def withColumn[U] = new withColumnApply[U]

The reason for doing so is to get the nice Scaladoc in IDEs etc. at the method call site that the user will see. I don't have strong feelings either way but originally moved it due to the suggestion in #208 (comment).

Whatever the decision, we should be consistent and update both this and withColumn.

check(prop[SQLDate] _)
check(prop[Option[X1[Boolean]]] _)
test("remove column in the middle") {
val f: TypedDataset[X] = TypedDataset.create(X(1,1, false) :: X(1,1, false) :: X(1,10, false) :: Nil)
imarios (Contributor):

minor: spacing seems inconsistent. (same for the other X literals)

*/
def drop[U] = new dropApply[U]

class dropApply[U] {
imarios (Contributor):
dropApply can have a regular class name DropApply.

frosforever (Contributor, Author):
sure, I'll update

as well then

@frosforever

@imarios Thanks once again for the review!

Let me know how to proceed regarding #212 (comment)

@imarios

imarios commented Dec 6, 2017

@frosforever I had this thought after finishing the report: Isn't x.drop[U]('w) a more verbose version of x.project[U]? Actually, with project you can drop multiple columns at once instead of having to chain multiple .drops. What do you think? Are there cases where you would prefer to use drop over project?

@frosforever

@imarios Ha that's an excellent point!
I would say there are two reasons to have drop, but neither is imperative.

  1. It can better convey intention. Adding a column, filtering on it, and then dropping it (often done with ranking to get the top rows by some category) is a common enough pattern that it is better expressed using withColumn and then drop. Depending on how Named intermediate Datasets #188 goes, this can be even more useful if HLists or Records are the intermediate type.

  2. Improve coverage of available spark functions. A user coming from spark expects to see drop.

What do you think?
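The add-filter-drop pattern in point 1 might look roughly like this (a hedged sketch; the types, column names, and predicate are illustrative, with the withColumn call shape taken from the Scaladoc example quoted above):

```scala
import frameless.TypedDataset

// Tag each row with a helper column, filter on it, then drop the
// helper again so the result has the original shape.
case class Score(name: String, points: Int)
case class Flagged(name: String, points: Int, top: Boolean)

// Sketch of the pipeline (requires a Spark session to actually run):
// val scores: TypedDataset[Score] = TypedDataset.create(
//   Score("a", 10) :: Score("b", 3) :: Nil)
// val flagged: TypedDataset[Flagged] =
//   scores.withColumn[Flagged](scores('points) >= 10) // add helper column
// val best: TypedDataset[Score] =
//   flagged.filter(flagged('top) === true).drop[Score]('top) // drop it again
```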

@imarios

imarios commented Dec 7, 2017

I think keeping drop for consistency with Spark makes sense. How about we keep drop[U] as an alias of project[U]? That is, def drop[U] = project[U]. Someone may complain that with project you may not actually drop anything, but I think that's unlikely to happen accidentally. project has the added benefit that you can drop more than one column; and as a bonus it lets you also reorder fields on the resulting class. What do you think?
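A hedged sketch of the suggested alias: the implicit parameter shape is an assumption about how frameless resolves project, and the case classes are illustrative. It shows project's two bonuses from the comment above: dropping several columns at once and reordering the remaining fields.

```scala
// Sketch of the alias, assuming project is driven by an implicit
// SmartProject[T, U] instance (an assumption about its signature):
// def drop[U](implicit projector: SmartProject[T, U]): TypedDataset[U] =
//   project[U]

case class Wide(a: Int, b: String, c: Boolean, d: Long)
case class Narrow(d: Long, a: Int) // drops b and c, and puts d before a

// Both calls would then be equivalent:
// val viaProject: TypedDataset[Narrow] = wide.project[Narrow]
// val viaDrop:    TypedDataset[Narrow] = wide.drop[Narrow]
```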

@frosforever

@imarios agreed. Updated PR.

@imarios imarios self-requested a review December 8, 2017 03:02
@imarios imarios merged commit f249172 into typelevel:master Dec 8, 2017