add withColumn to a case class #208

frosforever · 2017-11-16T19:56:51Z

Connects to #207

Allows the addition of a column similar to withColumn

frameless/dataset/src/main/scala/frameless/TypedDataset.scala

Line 616 in daca214

    
           def withColumn[A: TypedEncoder, H <: HList, FH <: HList, Out](ca: TypedColumn[T, A])(

but returning a case class rather than tuple. This is especially useful if your starting type has > 22 columns.

I'd love feedback on the general approach, and a better name for the method.

One thing that's still missing is the ability to add multiple columns in one go rather than requiring a case class representation for each new column added (see #188). I can take a stab at that if this is deemed reasonable.

codecov-io · 2017-11-16T21:44:33Z

Codecov Report

Merging #208 into master will increase coverage by 0.03%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #208      +/-   ##
==========================================
+ Coverage   95.72%   95.76%   +0.03%     
==========================================
  Files          39       39              
  Lines         796      802       +6     
  Branches       13       12       -1     
==========================================
+ Hits          762      768       +6     
  Misses         34       34

Impacted Files	Coverage Δ
...et/src/main/scala/frameless/ops/SmartProject.scala	`100% <ø> (ø)`	⬆️
...ataset/src/main/scala/frameless/TypedDataset.scala	`93.7% <100%> (+0.31%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2709c1d...336586b. Read the comment docs.

OlivierBlanvillain

LGTM otherwise, thanks a lot for the PR!

OlivierBlanvillain · 2017-11-16T22:00:25Z

dataset/src/main/scala/frameless/TypedDataset.scala

+
+  class withColumnApply[U] {
+    /**
+      *  Adds a column to a Dataset so long as the specified output type, `U`, has


I would move this documentation to the method and leave the class undocumented, supposingly users are never going to see this type in IDE autocomplete so that should be fine.

OlivierBlanvillain · 2017-11-16T22:05:29Z

dataset/src/main/scala/frameless/TypedDataset.scala

+    * @see [[frameless.TypedDataset.withColumnApply#apply]]
+    * @tparam U the output type
+    */
+  def withColumnTyped[U] = new withColumnApply[U]


Is scalac happy if you call this withColumn? I wouldn't mind using the same name...

sadly it's not. withColumn would be the obvious choice. 2nd hardest thing in CS. I'm open to naming suggestions

It might actually be better reversed: withColumn & withColumnTupled?

Oh interesting. I think that makes sense, though returning tuples does seem to be the first class returned result in ex select etc.
I wonder if the same should be done there or if a different first class result (eg Record) should be chosen.

OlivierBlanvillain · 2017-11-16T22:08:54Z

Adding multiple columns at once sounds like a reasonable addition! I could probably implemented using shapeless.ProductArgs similarly to selectMany.

frosforever · 2017-11-16T22:42:31Z

@OlivierBlanvillain Thanks for the review! I was indeed thinking of using ProductArgs for N columns.

However, it seems like this is very much related to #188. While I understand the desire to hide the shapeless record complexity to the user, it seems like it would make things a lot easier if record types were returned instead.

imarios · 2017-11-18T03:00:35Z

Hey @frosforever , was testing this great stuff out and found the following:

case class X(i: Int, j: Int)
case class Y(i: Int, ji: Int, k: Boolean)

val f: TypedDataset[X] = TypedDataset.create(X(1,1) :: X(1,1) :: X(1,10) :: Nil)

val fNew: TypedDataset[Y] = f.withColumn[Y](f('j) === 10)
org.apache.spark.sql.AnalysisException: Cannot resolve column name "k" among (i, j, ji);
  at org.apache.spark.sql.Dataset$$anonfun$resolve$1.apply(Dataset.scala:216)
  at org.apache.spark.sql.Dataset$$anonfun$resolve$1.apply(Dataset.scala:216)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.sql.Dataset.resolve(Dataset.scala:215)
  at org.apache.spark.sql.Dataset.col(Dataset.scala:1105)
  at frameless.TypedDataset$withColumnApply$$anonfun$10.apply(TypedDataset.scala:710)
  at frameless.TypedDataset$withColumnApply$$anonfun$10.apply(TypedDataset.scala:710)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at scala.collection.immutable.List.foreach(List.scala:392)
  at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
  at scala.collection.immutable.List.map(List.scala:296)
  at frameless.TypedDataset$withColumnApply.apply(TypedDataset.scala:710)
  ... 42 elided

frosforever · 2017-11-18T03:07:34Z

@imarios good catch! Completely missed that. I'll fix that when I get a chance. Thanks!

frosforever · 2017-11-18T23:20:57Z

@imarios thanks again for the review. This should fix the issue you found.

OlivierBlanvillain · 2017-11-21T08:47:50Z

Hmm I tried resolving conflicts from the github UI but it looks like something went wrong...

imarios · 2017-11-23T17:28:02Z

Hi @frosforever ! Do you need any help resolving the conflicts? I think resolving conflicts and rebase/squash and we are good to go. thank you for adding this, looks great!

rename existing withColumn to withColumnTupled

frosforever · 2017-11-23T21:51:26Z

@imarios sorry for the delay. All resolved

frosforever force-pushed the i#207-withColumn-arity22 branch from aec0ab7 to 5266c88 Compare November 16, 2017 21:44

OlivierBlanvillain approved these changes Nov 16, 2017

View reviewed changes

OlivierBlanvillain reviewed Nov 16, 2017

View reviewed changes

frosforever force-pushed the i#207-withColumn-arity22 branch from 5266c88 to 835b774 Compare November 16, 2017 22:39

frosforever added 2 commits November 23, 2017 16:43

fix slightly confusing documentation on smartproject

dd16e60

add withColumn to case class

336586b

rename existing withColumn to withColumnTupled

frosforever force-pushed the i#207-withColumn-arity22 branch from 604e472 to 336586b Compare November 23, 2017 21:50

imarios merged commit 1bd7e57 into typelevel:master Nov 25, 2017

frosforever deleted the i#207-withColumn-arity22 branch November 25, 2017 22:03

frosforever mentioned this pull request Nov 29, 2017

add drop to a case class #212

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add withColumn to a case class #208

add withColumn to a case class #208

frosforever commented Nov 16, 2017 •

edited

Loading

codecov-io commented Nov 16, 2017 •

edited

Loading

OlivierBlanvillain left a comment

OlivierBlanvillain Nov 16, 2017

OlivierBlanvillain Nov 16, 2017

frosforever Nov 16, 2017

OlivierBlanvillain Nov 16, 2017

frosforever Nov 16, 2017

OlivierBlanvillain commented Nov 16, 2017

frosforever commented Nov 16, 2017

imarios commented Nov 18, 2017

frosforever commented Nov 18, 2017

frosforever commented Nov 18, 2017

OlivierBlanvillain commented Nov 21, 2017

imarios commented Nov 23, 2017

frosforever commented Nov 23, 2017

add withColumn to a case class #208

add withColumn to a case class #208

Conversation

frosforever commented Nov 16, 2017 • edited Loading

codecov-io commented Nov 16, 2017 • edited Loading

Codecov Report

OlivierBlanvillain left a comment

Choose a reason for hiding this comment

OlivierBlanvillain Nov 16, 2017

Choose a reason for hiding this comment

OlivierBlanvillain Nov 16, 2017

Choose a reason for hiding this comment

frosforever Nov 16, 2017

Choose a reason for hiding this comment

OlivierBlanvillain Nov 16, 2017

Choose a reason for hiding this comment

frosforever Nov 16, 2017

Choose a reason for hiding this comment

OlivierBlanvillain commented Nov 16, 2017

frosforever commented Nov 16, 2017

imarios commented Nov 18, 2017

frosforever commented Nov 18, 2017

frosforever commented Nov 18, 2017

OlivierBlanvillain commented Nov 21, 2017

imarios commented Nov 23, 2017

frosforever commented Nov 23, 2017

frosforever commented Nov 16, 2017 •

edited

Loading

codecov-io commented Nov 16, 2017 •

edited

Loading