-
Notifications
You must be signed in to change notification settings - Fork 138
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TypedColumn#year and LocalDateTime generator #228
Conversation
Codecov Report
@@ Coverage Diff @@
## master #228 +/- ##
==========================================
+ Coverage 96.95% 96.95% +<.01%
==========================================
Files 52 52
Lines 853 854 +1
Branches 8 8
==========================================
+ Hits 827 828 +1
Misses 26 26
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's possible for a String
to not parse out a year. eg:
val ds = TypedDataset.create(Seq(X1("hello")))
ds.select(year(ds('a))).show().run()
/*
+----+
| _1|
+----+
|null|
+----+
*/
* | ||
* apache/spark | ||
*/ | ||
def year[T](col: TypedColumn[T, String]): TypedColumn[T, Int] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably TypedColumn[T, Option[Int]]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree here. This should fail when there is years to be extracted. Is Spark returns null in this case, then this should be encoded as @frosforever suggests. Should be a trivial change.
.toList | ||
def dateTimeStringFuncProp[A: Encoder](strFunc: TypedColumn[X1[String], String] => TypedColumn[X1[String], A], | ||
sparkFunc: Column => Column): Prop = | ||
forAll { values: List[JavaLocalDateTime] => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if it's not a nicely formatted date string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typesafely cover the case of malformed date strings.
@Avasil are you still having issues with the code? Can you paste some example error stack traces and the code to reproduce it here? I can try to help with that. |
@imarios def year[T](col: TypedColumn[T, String]): TypedColumn[T, Option[Int]] = {
new TypedColumn[T, Option[Int]](untyped.year(col.untyped))
} val ds = TypedDataset.create(Seq(X1("hello")))
val ds2 = TypedDataset.create(Seq(X1("2015-02-12")))
ds.select(year(ds('a))).show().run() // 1
ds.dataset.select(untyped.year($"a")).show() // 2
ds2.dataset.select(untyped.year($"a")).show() // 3
ds2.select(year(ds('a))).show().run() // 4 It fails in the runtime on 4:
|
Hey @Avasil I'm on mobile at the moment and can't test this, but I think what you've got should work. Is it possible that |
@frosforever oh man, that's exactly it, thank you! Still, can anyone point to me to the code responsible for this "conversion"? As I understand it and @imarios mentioned on gitter, it is encoded the same as |
@Avasil This looks fine to me, anything missing? (sorry I polluted your diff with reformatting) |
@OlivierBlanvillain I think everything is there :) |
Alright, LGTM on my side, @imarios do you want to add something? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM too
Hi @Avasil, LGTM, we just have to resolve the merge conflicts and we are good to go. |
# Conflicts: # dataset/src/test/scala/frameless/functions/NonAggregateFunctionsTests.scala # dataset/src/test/scala/frameless/package.scala
Now it's failing on |
@Avasil is there still an issue with this PR? |
@imarios Not that I'm aware of |
@Avasil I wanted to merge your PR before merging the unification changes. I then forgotten and merged the other one first ... Do you think you can resolve the conflicts? It might also be good for you to see how the landscape has changed a bit after the unification changes. Just to bring you up to speed, before, all these functions where valid for projected columns, but you couldn't use them during aggregation, say, |
# Conflicts: # dataset/src/main/scala/frameless/functions/NonAggregateFunctions.scala # dataset/src/test/scala/frameless/functions/NonAggregateFunctionsTests.scala
@imarios But I have one suggestion about |
@OlivierBlanvillain In user code it's ok: val test: TypedColumn[X1[String], Option[Int]] = year(ds[String]('a))
// or
val test: UntypedExpression[X1[String]] = year(ds[String]('a)) If you choose Different story when checking out functions in the sources. I don't think that's too big of an issue though |
Ok good. |
@Avasil, do you see any conflicts on your side? |
@imarios |
Rebase gives me issues but squash and merge looks good. I will wait until master is done checking the other PR and I will merge this soon. Thanks! |
Connects to #164
It implements
year
method but I also addedlocalDateTime
generator to tests, I think it might be useful for testing functions which needs correct date as string.