[SPARK-21654][SQL] Complement SQL predicates expression description #18869

viirya · 2017-08-07T08:44:39Z

What changes were proposed in this pull request?

SQL predicates don't have complete expression description. This patch goes to complement the description by adding arguments, examples.

This change also adds related test cases for the SQL predicate expressions.

How was this patch tested?

Existing tests. And added predicate test.

viirya · 2017-08-07T08:45:12Z

cc @HyukjinKwon @gatorsmile Please help to take a look when you have time. Thanks.

SparkQA · 2017-08-07T08:49:04Z

Test build #80332 has finished for PR 18869 at commit abfa867.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-08-07T11:33:12Z

Test build #80334 has finished for PR 18869 at commit 9895843.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

HyukjinKwon · 2017-08-07T12:11:31Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala

 @ExpressionDescription(
-  usage = "expr1 _FUNC_ expr2 - Returns true if `expr1` is greater than or equal to `expr2`.")
+  usage = "expr1 _FUNC_ expr2 - Returns true if `expr1` is greater than `expr2`.",


Looks there is a typo, missing or equal to.

Thanks. Fixed.

gatorsmile · 2017-08-07T20:23:30Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala

+  usage = "expr1 _FUNC_ expr2 - Returns true if `expr1` equals `expr2`, or false otherwise.",
+  arguments = """
+    Arguments:
+      * expr1, expr2 - the two expressions must be same type.


Regarding the type issues, could you check whether this statement is true? I think we can support different types.

SELECT 1 = '1'

This will return true.

Also, could you add a test case?

Right. The rule ImplicitTypeCasts will specially do type casting for BinaryOperator.

I've modified the statement.

gatorsmile · 2017-08-07T20:25:40Z

Could you check all of binary predicates you are trying to modify? Thanks!

…cases.

viirya · 2017-08-09T03:35:05Z

@gatorsmile I've checked the binary predicates in this change. Related test cases are added.

HyukjinKwon · 2017-08-09T04:32:09Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala

+  usage = "expr1 _FUNC_ expr2 - Returns true if `expr1` is less than `expr2`.",
+  arguments = """
+    Arguments:
+      * expr1, expr2 - the two expressions must be same type or can be casted to a common type, and must be a type that can be ordered/compared.


@viirya, BTW, I think we could also do this as below:

- * expr1, expr2 - the two expressions must be same type or can be casted to a common type, and must be a type that can be ordered/compared. + * expr1, expr2 - the two expressions must be same type or can be casted to a common type, + and must be a type that can be ordered/compared.

Just double checked it renders fine in the doc as well.

Thanks. Fixed.

gatorsmile · 2017-08-09T06:06:12Z

@viirya Could you hold this PR at first? There is a related PR #18818 that is changing the underlying semantics. We also need to update the description after that PR. Also, welcome to review that PR too.

SparkQA · 2017-08-09T06:16:46Z

Test build #80435 has finished for PR 18869 at commit 1369fd5.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

viirya · 2017-08-09T06:17:29Z

@gatorsmile Sure. Let's wait after #18818. I'll look at it too.

SparkQA · 2017-08-09T07:04:51Z

Test build #80440 has finished for PR 18869 at commit b64c9e6.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-08-09T07:04:51Z

Test build #80439 has finished for PR 18869 at commit bca2b0b.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

viirya · 2017-09-01T03:53:52Z

@gatorsmile @HyukjinKwon #18818 is merged now. Can this PR go ahead?

As what #18818 did is to allow structs, arrays to be input expression for predicates, currently looks like we don't have description to say those types are not supported.

viirya · 2017-09-01T03:55:10Z

retest this please.

gatorsmile · 2017-09-01T05:42:12Z

map is not supported, right?

viirya · 2017-09-01T06:08:09Z

@gatorsmile Right. Isn't too verbose if we describe map is not supported in each description?

SparkQA · 2017-09-01T06:30:12Z

Test build #81306 has finished for PR 18869 at commit b64c9e6.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

… Equal/EqualNullSafe..

viirya · 2017-09-01T08:29:10Z

Previously Equal and EqualNullSafe don't state that the type must can be used in equality comparison. Now added the note.

SparkQA · 2017-09-01T10:55:31Z

Test build #81311 has finished for PR 18869 at commit 099c671.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

HyukjinKwon · 2017-09-01T16:08:50Z

@gatorsmile, I understand the previously discussed point, the documentation should be correct, tested well and in details. However, if I observed this correctly, this argument descriptions have not been well filled up for the last year (since I proposed Arguments description) - if I checked correctly, we have three 3 instances fixed, replace, rlike and like, out of 266 instances.

I mean, practically, it sounds a bit demanding to me. (In this way, for the perfectness, I think we should also require to add test cases with all the possible types and check if there is any duplicated test too). I wonder if we really should add the examples about all the cases we assume users might be confused of, and describe it that precisely for the current state, given the information did not exist before.

Of course, it is important and it should be correct but I have been thinking of incrementally improving this, rather than requesting the perfectly correct documentation for PRs, practically to roll it better.

What do you think about minimising the test cases and rather trying to describe it a bit more in an abstract, short and general manner for now? We could ask additional test cases for additional information in the followup PRs.

For this PR, it looks good enough to me (to be honest, the examples and test cases look too much) but I believe I need your opinion and I think we should have a compromised position.

gatorsmile · 2017-09-01T16:38:01Z

sql/core/src/test/resources/sql-tests/inputs/predicate-functions.sql

@@ -0,0 +1,42 @@
+-- In


Could we move these to PredicateSuite and then we can use checkEvaluation?

Actually I feel the test coverage might be out of the scope of this PR a little bit. This PR is for the expression description. If you don't insist we add test here together, I'd like to submit other PR to improve the test coverage.

If we do not have a test case, how can we know the description is correct?

Let me explain it a little bit more. No need to add more and more test cases in this PR. We need to ensure anything claimed in the descriptions is well tested; if no test case exists, we need to add it into this PR. It should include both negative and positive test cases. For example, map is not supported; and then this needs to be documented and tested.

Ideally, the test cases should be added along with the codes which defines the semantics or the code changes such as #18818 that changes the underlying semantics.

For example, I saw there are tests added to test map is not supported by #18818:

spark/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/ExpressionTypeCheckingSuite.scala

Lines 112 to 122 in cba69ae

assertError(EqualTo('mapField, 'mapField), "EqualTo does not support ordering on type MapType")

assertError(EqualNullSafe('mapField, 'mapField),

"EqualNullSafe does not support ordering on type MapType")

assertError(LessThan('mapField, 'mapField),

"LessThan does not support ordering on type MapType")

assertError(LessThanOrEqual('mapField, 'mapField),

"LessThanOrEqual does not support ordering on type MapType")

assertError(GreaterThan('mapField, 'mapField),

"GreaterThan does not support ordering on type MapType")

assertError(GreaterThanOrEqual('mapField, 'mapField),

"GreaterThanOrEqual does not support ordering on type MapType")

More tests are always good and not harm, I agreed. If the necessary tests are not added before, we should add and improve them.

gatorsmile · 2017-09-01T16:52:20Z

If @viirya were a new contributor, I am fine to merge it.

In contrast, @viirya is experienced and knew the code base pretty well. Thus, I hope we can make our docs and test coverage much better.

Below is the doc I am comparing. You can easily see the gap we have. The test cases we have are so few. We will see more bugs when adding new test cases. https://www.ibm.com/support/knowledgecenter/en/SSEPGG_11.1.0/com.ibm.db2.luw.sql.ref.doc/doc/r0000746.html

viirya · 2017-09-02T05:14:17Z

sql/core/src/test/resources/sql-tests/inputs/predicate-functions.sql

+select 1.0 = '1';
+
+-- GreaterThan
+select 1 > '1';


The following test cases are intended to test the end-to-end comparison between different types. It doesn't make much sense to re-write them with checkEvaluation. We'd have something like checkEvaluation(GreaterThan(Cast(...), ...), true) with manually added Cast. We have unit tests against Cast and GreaterThan individually.

For the test cases that trigger implicit type casting, we can keep them here.

viirya · 2017-09-02T05:17:36Z

sql/core/src/test/resources/sql-tests/inputs/predicate-functions.sql

+select 1 in(1, 2, 3);
+select 1 in(2, 3, 4);
+select named_struct('a', 1, 'b', 2) in(named_struct('a', 1, 'b', 1), named_struct('a', 1, 'b', 3));
+select named_struct('a', 1, 'b', 2) in(named_struct('a', 1, 'b', 2), named_struct('a', 1, 'b', 3));


For In, I've locally complemented its test coverage in PredicateSuite and use checkEvaluation. I'd prefer to have it in a following PR if you don't insist to add it in this PR.

Looks like the existing test for In is not well covered. We should improve it. I will commit the tests I added locally.

SparkQA · 2017-09-03T05:19:40Z

Test build #81347 has finished for PR 18869 at commit a8bb457.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-09-03T06:14:47Z

Test build #81348 has finished for PR 18869 at commit 54c71e1.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-09-03T06:17:21Z

Test build #81349 has finished for PR 18869 at commit 444c64d.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2017-09-03T06:18:48Z

sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/PredicateSuite.scala

+  test("IN with different types") {
+    def testWithRandomDataGeneration(dataType: DataType, nullable: Boolean): Unit = {
+      val dataGen = RandomDataGenerator.forType(dataType, nullable = nullable)
+      if (dataGen.isDefined) {


If the data type is not supported, it will silently skip the test. We can do something like

val maybeDataGen = RandomDataGenerator.forType(dataType, nullable = nullable) val dataGen = maybeDataGen.getOrElse(fail(s"Failed to create data generator for type $dataType"))

Although I already filter out the unsupported data types when constructing atomicTypes, this is no harm. So ok for me. I will update it.

gatorsmile · 2017-09-03T06:21:58Z

Could you also add negative test cases for map?

viirya · 2017-09-03T06:40:55Z

@gatorsmile As I said in #18869 (comment), we already have tests in ExpressionTypeCheckingSuite to check the predicates don't support map type. Do we need to add more negative test cases for map?

SparkQA · 2017-09-03T12:46:37Z

Test build #81356 has finished for PR 18869 at commit ec9199a.

This patch fails SparkR unit tests.
This patch merges cleanly.
This patch adds no public classes.

HyukjinKwon · 2017-09-03T13:04:10Z

retest this please

SparkQA · 2017-09-03T15:14:24Z

Test build #81359 has finished for PR 18869 at commit ec9199a.

This patch fails PySpark unit tests.
This patch merges cleanly.
This patch adds no public classes.

viirya · 2017-09-03T15:23:50Z

retest this please.

SparkQA · 2017-09-03T18:10:11Z

Test build #81362 has finished for PR 18869 at commit ec9199a.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

viirya · 2017-09-04T03:51:29Z

@HyukjinKwon @gatorsmile Any more comments on this? The added tests should be enough.

gatorsmile · 2017-09-04T04:54:58Z

LGTM

gatorsmile · 2017-09-04T04:55:32Z

Thanks! Merging to master.

viirya · 2017-09-04T04:56:44Z

Thanks @HyukjinKwon @gatorsmile

viirya force-pushed the SPARK-21654 branch from abfa867 to 7bf800b Compare August 7, 2017 08:57

Complement SQL predicates document.

9895843

viirya force-pushed the SPARK-21654 branch from 7bf800b to 9895843 Compare August 7, 2017 09:01

HyukjinKwon reviewed Aug 7, 2017

View reviewed changes

gatorsmile reviewed Aug 7, 2017

View reviewed changes

More clear statement on binary comparison expressions. Add some test …

1369fd5

…cases.

HyukjinKwon reviewed Aug 9, 2017

View reviewed changes

viirya added 2 commits August 9, 2017 04:36

Adjust line break.

bca2b0b

Remove unnecessary scalastyle off comments.

b64c9e6

Merge remote-tracking branch 'upstream/master' into SPARK-21654

d8f1479

Note that only accept the type can be used for equality comparison in…

099c671

… Equal/EqualNullSafe..

gatorsmile reviewed Sep 1, 2017

View reviewed changes

viirya commented Sep 2, 2017

View reviewed changes

viirya force-pushed the SPARK-21654 branch from a8bb457 to 54c71e1 Compare September 3, 2017 03:31

Improve test case for In.

444c64d

viirya force-pushed the SPARK-21654 branch from 54c71e1 to 444c64d Compare September 3, 2017 03:38

gatorsmile reviewed Sep 3, 2017

View reviewed changes

Add negative test for map type.

ec9199a

asfgit closed this in 9f30d92 Sep 4, 2017

viirya deleted the SPARK-21654 branch December 27, 2023 18:34

	assertError(EqualTo('mapField, 'mapField), "EqualTo does not support ordering on type MapType")
	assertError(EqualNullSafe('mapField, 'mapField),
	"EqualNullSafe does not support ordering on type MapType")
	assertError(LessThan('mapField, 'mapField),
	"LessThan does not support ordering on type MapType")
	assertError(LessThanOrEqual('mapField, 'mapField),
	"LessThanOrEqual does not support ordering on type MapType")
	assertError(GreaterThan('mapField, 'mapField),
	"GreaterThan does not support ordering on type MapType")
	assertError(GreaterThanOrEqual('mapField, 'mapField),
	"GreaterThanOrEqual does not support ordering on type MapType")

[SPARK-21654][SQL] Complement SQL predicates expression description #18869

[SPARK-21654][SQL] Complement SQL predicates expression description #18869

Conversation

viirya commented Aug 7, 2017 • edited Loading

What changes were proposed in this pull request?

How was this patch tested?

viirya commented Aug 7, 2017

SparkQA commented Aug 7, 2017

SparkQA commented Aug 7, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gatorsmile commented Aug 7, 2017

viirya commented Aug 9, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gatorsmile commented Aug 9, 2017 • edited Loading

SparkQA commented Aug 9, 2017

viirya commented Aug 9, 2017

SparkQA commented Aug 9, 2017

SparkQA commented Aug 9, 2017

viirya commented Sep 1, 2017 • edited Loading

viirya commented Sep 1, 2017

gatorsmile commented Sep 1, 2017

viirya commented Sep 1, 2017

SparkQA commented Sep 1, 2017

viirya commented Sep 1, 2017

SparkQA commented Sep 1, 2017

HyukjinKwon commented Sep 1, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

viirya Sep 3, 2017 • edited Loading

Choose a reason for hiding this comment

gatorsmile commented Sep 1, 2017

viirya Sep 2, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

viirya Sep 2, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

SparkQA commented Sep 3, 2017

SparkQA commented Sep 3, 2017

SparkQA commented Sep 3, 2017

gatorsmile Sep 3, 2017 • edited Loading

Choose a reason for hiding this comment

viirya Sep 3, 2017 • edited Loading

Choose a reason for hiding this comment

gatorsmile commented Sep 3, 2017

viirya commented Sep 3, 2017

SparkQA commented Sep 3, 2017

HyukjinKwon commented Sep 3, 2017

SparkQA commented Sep 3, 2017

viirya commented Sep 3, 2017

SparkQA commented Sep 3, 2017

viirya commented Sep 4, 2017

gatorsmile commented Sep 4, 2017

gatorsmile commented Sep 4, 2017

viirya commented Sep 4, 2017

viirya commented Aug 7, 2017 •

edited

Loading

gatorsmile commented Aug 9, 2017 •

edited

Loading

viirya commented Sep 1, 2017 •

edited

Loading

viirya Sep 3, 2017 •

edited

Loading

viirya Sep 2, 2017 •

edited

Loading

viirya Sep 2, 2017 •

edited

Loading

gatorsmile Sep 3, 2017 •

edited

Loading

viirya Sep 3, 2017 •

edited

Loading