-
Notifications
You must be signed in to change notification settings - Fork 138
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add withColumn to a case class #208
add withColumn to a case class #208
Conversation
aec0ab7
to
5266c88
Compare
Codecov Report
@@ Coverage Diff @@
## master #208 +/- ##
==========================================
+ Coverage 95.72% 95.76% +0.03%
==========================================
Files 39 39
Lines 796 802 +6
Branches 13 12 -1
==========================================
+ Hits 762 768 +6
Misses 34 34
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM otherwise, thanks a lot for the PR!
|
||
class withColumnApply[U] { | ||
/** | ||
* Adds a column to a Dataset so long as the specified output type, `U`, has |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would move this documentation to the method and leave the class undocumented, supposingly users are never going to see this type in IDE autocomplete so that should be fine.
* @see [[frameless.TypedDataset.withColumnApply#apply]] | ||
* @tparam U the output type | ||
*/ | ||
def withColumnTyped[U] = new withColumnApply[U] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is scalac happy if you call this withColumn
? I wouldn't mind using the same name...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sadly it's not. withColumn
would be the obvious choice. 2nd hardest thing in CS. I'm open to naming suggestions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might actually be better reversed: withColumn
& withColumnTupled
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh interesting. I think that makes sense, though returning tuples does seem to be the first class returned result in ex select
etc.
I wonder if the same should be done there or if a different first class result (eg Record
) should be chosen.
Adding multiple columns at once sounds like a reasonable addition! I could probably implemented using |
5266c88
to
835b774
Compare
@OlivierBlanvillain Thanks for the review! I was indeed thinking of using However, it seems like this is very much related to #188. While I understand the desire to hide the shapeless record complexity to the user, it seems like it would make things a lot easier if record types were returned instead. |
Hey @frosforever , was testing this great stuff out and found the following: case class X(i: Int, j: Int)
case class Y(i: Int, ji: Int, k: Boolean)
val f: TypedDataset[X] = TypedDataset.create(X(1,1) :: X(1,1) :: X(1,10) :: Nil)
val fNew: TypedDataset[Y] = f.withColumn[Y](f('j) === 10)
org.apache.spark.sql.AnalysisException: Cannot resolve column name "k" among (i, j, ji);
at org.apache.spark.sql.Dataset$$anonfun$resolve$1.apply(Dataset.scala:216)
at org.apache.spark.sql.Dataset$$anonfun$resolve$1.apply(Dataset.scala:216)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.Dataset.resolve(Dataset.scala:215)
at org.apache.spark.sql.Dataset.col(Dataset.scala:1105)
at frameless.TypedDataset$withColumnApply$$anonfun$10.apply(TypedDataset.scala:710)
at frameless.TypedDataset$withColumnApply$$anonfun$10.apply(TypedDataset.scala:710)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.immutable.List.foreach(List.scala:392)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.immutable.List.map(List.scala:296)
at frameless.TypedDataset$withColumnApply.apply(TypedDataset.scala:710)
... 42 elided |
@imarios good catch! Completely missed that. I'll fix that when I get a chance. Thanks! |
@imarios thanks again for the review. This should fix the issue you found. |
Hmm I tried resolving conflicts from the github UI but it looks like something went wrong... |
Hi @frosforever ! Do you need any help resolving the conflicts? I think resolving conflicts and rebase/squash and we are good to go. thank you for adding this, looks great! |
rename existing withColumn to withColumnTupled
604e472
to
336586b
Compare
@imarios sorry for the delay. All resolved |
Connects to #207
Allows the addition of a column similar to
withColumn
frameless/dataset/src/main/scala/frameless/TypedDataset.scala
Line 616 in daca214
I'd love feedback on the general approach, and a better name for the method.
One thing that's still missing is the ability to add multiple columns in one go rather than requiring a case class representation for each new column added (see #188). I can take a stab at that if this is deemed reasonable.