Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add withColumn to a case class #208

Merged
merged 2 commits into from
Nov 25, 2017

Conversation

frosforever
Copy link
Contributor

@frosforever frosforever commented Nov 16, 2017

Connects to #207

Allows the addition of a column similar to withColumn

def withColumn[A: TypedEncoder, H <: HList, FH <: HList, Out](ca: TypedColumn[T, A])(
but returning a case class rather than tuple. This is especially useful if your starting type has > 22 columns.

I'd love feedback on the general approach, and a better name for the method.

One thing that's still missing is the ability to add multiple columns in one go rather than requiring a case class representation for each new column added (see #188). I can take a stab at that if this is deemed reasonable.

@frosforever frosforever force-pushed the i#207-withColumn-arity22 branch from aec0ab7 to 5266c88 Compare November 16, 2017 21:44
@codecov-io
Copy link

codecov-io commented Nov 16, 2017

Codecov Report

Merging #208 into master will increase coverage by 0.03%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #208      +/-   ##
==========================================
+ Coverage   95.72%   95.76%   +0.03%     
==========================================
  Files          39       39              
  Lines         796      802       +6     
  Branches       13       12       -1     
==========================================
+ Hits          762      768       +6     
  Misses         34       34
Impacted Files Coverage Δ
...et/src/main/scala/frameless/ops/SmartProject.scala 100% <ø> (ø) ⬆️
...ataset/src/main/scala/frameless/TypedDataset.scala 93.7% <100%> (+0.31%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2709c1d...336586b. Read the comment docs.

Copy link
Contributor

@OlivierBlanvillain OlivierBlanvillain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM otherwise, thanks a lot for the PR!


class withColumnApply[U] {
/**
* Adds a column to a Dataset so long as the specified output type, `U`, has
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would move this documentation to the method and leave the class undocumented, supposingly users are never going to see this type in IDE autocomplete so that should be fine.

* @see [[frameless.TypedDataset.withColumnApply#apply]]
* @tparam U the output type
*/
def withColumnTyped[U] = new withColumnApply[U]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is scalac happy if you call this withColumn? I wouldn't mind using the same name...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sadly it's not. withColumn would be the obvious choice. 2nd hardest thing in CS. I'm open to naming suggestions

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might actually be better reversed: withColumn & withColumnTupled?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh interesting. I think that makes sense, though returning tuples does seem to be the first class returned result in ex select etc.
I wonder if the same should be done there or if a different first class result (eg Record) should be chosen.

@OlivierBlanvillain
Copy link
Contributor

Adding multiple columns at once sounds like a reasonable addition! I could probably implemented using shapeless.ProductArgs similarly to selectMany.

@frosforever frosforever force-pushed the i#207-withColumn-arity22 branch from 5266c88 to 835b774 Compare November 16, 2017 22:39
@frosforever
Copy link
Contributor Author

@OlivierBlanvillain Thanks for the review! I was indeed thinking of using ProductArgs for N columns.

However, it seems like this is very much related to #188. While I understand the desire to hide the shapeless record complexity to the user, it seems like it would make things a lot easier if record types were returned instead.

@imarios
Copy link
Contributor

imarios commented Nov 18, 2017

Hey @frosforever , was testing this great stuff out and found the following:

case class X(i: Int, j: Int)
case class Y(i: Int, ji: Int, k: Boolean)

val f: TypedDataset[X] = TypedDataset.create(X(1,1) :: X(1,1) :: X(1,10) :: Nil)

val fNew: TypedDataset[Y] = f.withColumn[Y](f('j) === 10)
org.apache.spark.sql.AnalysisException: Cannot resolve column name "k" among (i, j, ji);
  at org.apache.spark.sql.Dataset$$anonfun$resolve$1.apply(Dataset.scala:216)
  at org.apache.spark.sql.Dataset$$anonfun$resolve$1.apply(Dataset.scala:216)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.sql.Dataset.resolve(Dataset.scala:215)
  at org.apache.spark.sql.Dataset.col(Dataset.scala:1105)
  at frameless.TypedDataset$withColumnApply$$anonfun$10.apply(TypedDataset.scala:710)
  at frameless.TypedDataset$withColumnApply$$anonfun$10.apply(TypedDataset.scala:710)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at scala.collection.immutable.List.foreach(List.scala:392)
  at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
  at scala.collection.immutable.List.map(List.scala:296)
  at frameless.TypedDataset$withColumnApply.apply(TypedDataset.scala:710)
  ... 42 elided

@frosforever
Copy link
Contributor Author

@imarios good catch! Completely missed that. I'll fix that when I get a chance. Thanks!

@frosforever
Copy link
Contributor Author

@imarios thanks again for the review. This should fix the issue you found.

@OlivierBlanvillain
Copy link
Contributor

Hmm I tried resolving conflicts from the github UI but it looks like something went wrong...

@imarios
Copy link
Contributor

imarios commented Nov 23, 2017

Hi @frosforever ! Do you need any help resolving the conflicts? I think resolving conflicts and rebase/squash and we are good to go. thank you for adding this, looks great!

@frosforever frosforever force-pushed the i#207-withColumn-arity22 branch from 604e472 to 336586b Compare November 23, 2017 21:50
@frosforever
Copy link
Contributor Author

@imarios sorry for the delay. All resolved

@imarios imarios merged commit 1bd7e57 into typelevel:master Nov 25, 2017
@frosforever frosforever deleted the i#207-withColumn-arity22 branch November 25, 2017 22:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants