
[SPARK-21110][SQL] Structs, arrays, and other orderable datatypes should be usable in inequalities #18818

Closed
wants to merge 8 commits

Conversation

Contributor

@aray aray commented Aug 2, 2017

What changes were proposed in this pull request?

Allows BinaryComparison operators to work on any data type that actually supports ordering as verified by TypeUtils.checkForOrderingExpr instead of relying on the incomplete list TypeCollection.Ordered (which is removed by this PR).

How was this patch tested?

Updated unit tests to cover structs and arrays.
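
For illustration only (this snippet is not part of the PR itself, and the session setup is assumed), the kind of query this change enables — note that MyStruct is the case class from the test build report below:

```scala
// Hypothetical sketch: with this change, orderable complex types can be
// used with <, <=, >, >= in addition to = and <=>.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local").getOrCreate()
import spark.implicits._

case class MyStruct(a: Long, b: String)
val df = Seq((MyStruct(1L, "b"), MyStruct(2L, "a"))).toDF("s1", "s2")

// Structs compare field by field, left to right.
// Before this PR, an inequality on struct columns failed type checking.
df.selectExpr("s1 < s2").show()
```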

@SparkQA

SparkQA commented Aug 2, 2017

Test build #80161 has finished for PR 18818 at commit 0d1fd56.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class MyStruct(a: Long, b: String)

@SparkQA

SparkQA commented Aug 2, 2017

Test build #80162 has finished for PR 18818 at commit ec8dc95.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 2, 2017

Test build #80170 has finished for PR 18818 at commit caf74bf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

@nchammas nchammas left a comment


Not my area of expertise, but happy to see the changeset is very small. Maybe @HyukjinKwon, @viirya, or @gatorsmile can provide a more thorough review here.

@aray - Does the example from the original feature request work with this PR?

* Types that can be ordered/compared. In the long run we should probably make this a trait
* that can be mixed into each data type, and perhaps create an `AbstractDataType`.
*/
// TODO: Should we consolidate this with RowOrdering.isOrderable?
Contributor


Just curious: Do we need to do anything with RowOrdering.isOrderable given the change in this PR?

Contributor Author


Nope. RowOrdering.isOrderable (which is what TypeUtils.checkForOrderingExpr uses) returns true on a strict superset of this type collection, since it also works for complex types that need recursive checks.
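
A simplified sketch (not the actual Spark source; the type hierarchy is pared down for illustration) of how such a recursive orderability check works:

```scala
// Illustrative model of RowOrdering.isOrderable: atomic types are
// orderable, arrays/structs are orderable iff their children are,
// and maps are not orderable at all.
sealed trait DataType
case object IntegerType extends DataType
case object StringType extends DataType
case object NullType extends DataType
case object MapType extends DataType
case class ArrayType(elementType: DataType) extends DataType
case class StructType(fieldTypes: Seq[DataType]) extends DataType

def isOrderable(dt: DataType): Boolean = dt match {
  case NullType | IntegerType | StringType => true
  case ArrayType(et)   => isOrderable(et)          // recurse into the element type
  case StructType(fts) => fts.forall(isOrderable)  // recurse into every field
  case _               => false                    // e.g. MapType
}
```

Because structs and arrays nest arbitrarily deep, no finite type collection can enumerate the orderable types — which is why the recursive predicate is the right check.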

@@ -453,6 +453,14 @@ case class Or(left: Expression, right: Expression) extends BinaryOperator with P

abstract class BinaryComparison extends BinaryOperator with Predicate {

override def inputType: AbstractDataType = AnyDataType
Member


For LessThan-like operators, we should not accept AnyDataType, right?

Contributor Author


Right, but the real type check happens below in checkInputDataTypes.

Member


This will confuse other readers. Could you revert this change?

Contributor Author


We have to define inputType because BinaryComparison extends BinaryOperator. Previously, the inputType declared by the LessThan-like operators was a subset of what they could actually support. This PR fixes that, but since the supported types cannot be finitely specified as a type collection (there are countably infinitely many legal StructTypes), we need to declare a superset of what is actually supported as inputType and then do the real recursive check in checkInputDataTypes. This is much like how the EqualTo and EqualNullSafe operators were previously implemented; this PR just moves that logic up into BinaryComparison, since it is really the same for the equality and inequality operators.

Did that answer your concerns?
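
A sketch of the pattern being described, paraphrased rather than copied from the PR (names mirror Spark's catalyst internals but the body is illustrative): inputType advertises the permissive superset, and the recursive check does the real work.

```scala
// Hypothetical paraphrase of the BinaryComparison type-checking pattern.
abstract class BinaryComparison extends BinaryOperator with Predicate {
  // Advertise a superset: every type is nominally accepted here, because
  // the orderable types (arbitrarily nested structs/arrays) cannot be
  // enumerated as a finite TypeCollection.
  override def inputType: AbstractDataType = AnyDataType

  override def checkInputDataTypes(): TypeCheckResult =
    super.checkInputDataTypes() match {
      case TypeCheckResult.TypeCheckSuccess =>
        // The real, recursive check: rejects non-orderable types such as
        // maps, accepts structs/arrays whose children are all orderable.
        TypeUtils.checkForOrderingExpr(left.dataType, this.getClass.getSimpleName)
      case failure => failure
    }
}
```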

Member


We need code comments explaining this; given that assumption, inputType might otherwise be misused in the future.

In addition, this PR has a pretty critical change. We really need to check all the new data types we support in this PR. https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ordering.scala#L90-L97

Could you write comprehensive test coverage for this?

@SparkQA

SparkQA commented Aug 8, 2017

Test build #80416 has finished for PR 18818 at commit c4f43e9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

Could you also add NULL to the test cases?

@@ -465,7 +475,7 @@ abstract class BinaryComparison extends BinaryOperator with Predicate {
}
}

-  protected lazy val ordering = TypeUtils.getInterpretedOrdering(left.dataType)
+  protected lazy val ordering: Ordering[Any] = TypeUtils.getInterpretedOrdering(left.dataType)
Member


TypeUtils.getInterpretedOrdering accepts fewer types than RowOrdering.isOrderable: only AtomicType, ArrayType, and StructType.

NullType and UserDefinedType can cause problems.

Member


Since ordering is lazily accessed, and null inputs never lead these predicates to access it, NullType should be safe. We should add a related test.

Contributor Author


addressed in cc2f3ec

@SparkQA

SparkQA commented Aug 14, 2017

Test build #80644 has finished for PR 18818 at commit cc2f3ec.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@aray
Contributor Author

aray commented Aug 14, 2017

retest this please

@SparkQA

SparkQA commented Aug 15, 2017

Test build #80647 has finished for PR 18818 at commit cc2f3ec.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@aray
Contributor Author

aray commented Aug 15, 2017

@viirya @gatorsmile I have addressed your comments; could you take another look?

@aray
Contributor Author

aray commented Aug 31, 2017

ping @viirya @gatorsmile

@@ -582,6 +582,7 @@ class CodegenContext {
case array: ArrayType => genComp(array, c1, c2) + " == 0"
case struct: StructType => genComp(struct, c1, c2) + " == 0"
case udt: UserDefinedType[_] => genEqual(udt.sqlType, c1, c2)
case NullType => "true"
Member


Is this required? Will it be covered by any test?

BTW, the value should be false.

Member


I found the test case, but it is not affected by the value we generate here, since it runs under nullSafeCodeGen.

However, we should still return false when evaluating null = null.

Contributor Author


Yes, codegen fails without this. I had originally made the value false, but when I noticed that the codegen for comparison (https://github.com/aray/spark/blob/cc2f3eca28ee6b9faa87853568205307567827cc/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala#L606) returns 0, I changed it to be consistent. Happy to change it back, though.
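
For context, a toy sketch (illustrative only, not Spark's actual codegen) of why the literal generated for NullType is unobservable in that test: under the nullSafeCodeGen pattern, the comparison body is only evaluated when both sides are non-null, so null inputs short-circuit before the body runs.

```scala
// Toy model of the nullSafeCodeGen pattern: the body is skipped
// whenever either input is null, producing a null result (SQL
// three-valued logic), so a NullType = NullType comparison never
// reaches the generated equality body.
def nullSafeEval(left: Option[Any], right: Option[Any])
                (body: (Any, Any) => Any): Option[Any] =
  for { l <- left; r <- right } yield body(l, r)

// null = null short-circuits to null; the body is never called.
assert(nullSafeEval(None, None)((a, b) => a == b).isEmpty)
// 1 = 1 evaluates the body normally.
assert(nullSafeEval(Some(1), Some(1))((a, b) => a == b).contains(true))
```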

@gatorsmile
Member

LGTM except one comment. Thanks for working on it!

@SparkQA

SparkQA commented Aug 31, 2017

Test build #81294 has finished for PR 18818 at commit 6e01186.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

Thanks! Merging to master.
