Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-21654][SQL] Complement SQL predicates expression description #18869

Closed
wants to merge 8 commits into from

Conversation

viirya
Copy link
Member

@viirya viirya commented Aug 7, 2017

What changes were proposed in this pull request?

SQL predicates don't have complete expression description. This patch goes to complement the description by adding arguments, examples.

This change also adds related test cases for the SQL predicate expressions.

How was this patch tested?

Existing tests. And added predicate test.

@viirya
Copy link
Member Author

viirya commented Aug 7, 2017

cc @HyukjinKwon @gatorsmile Please help to take a look when you have time. Thanks.

@SparkQA
Copy link

SparkQA commented Aug 7, 2017

Test build #80332 has finished for PR 18869 at commit abfa867.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 7, 2017

Test build #80334 has finished for PR 18869 at commit 9895843.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ExpressionDescription(
usage = "expr1 _FUNC_ expr2 - Returns true if `expr1` is greater than or equal to `expr2`.")
usage = "expr1 _FUNC_ expr2 - Returns true if `expr1` is greater than `expr2`.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks there is a typo, missing or equal to.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Fixed.

usage = "expr1 _FUNC_ expr2 - Returns true if `expr1` equals `expr2`, or false otherwise.",
arguments = """
Arguments:
* expr1, expr2 - the two expressions must be same type.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the type issues, could you check whether this statement is true? I think we can support different types.

SELECT 1 = '1'

This will return true.

Also, could you add a test case?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. The rule ImplicitTypeCasts will specially do type casting for BinaryOperator.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've modified the statement.

@gatorsmile
Copy link
Member

Could you check all of binary predicates you are trying to modify? Thanks!

@viirya
Copy link
Member Author

viirya commented Aug 9, 2017

@gatorsmile I've checked the binary predicates in this change. Related test cases are added.

usage = "expr1 _FUNC_ expr2 - Returns true if `expr1` is less than `expr2`.",
arguments = """
Arguments:
* expr1, expr2 - the two expressions must be same type or can be casted to a common type, and must be a type that can be ordered/compared.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@viirya, BTW, I think we could also do this as below:

-      * expr1, expr2 - the two expressions must be same type or can be casted to a common type, and must be a type that can be ordered/compared.
+      * expr1, expr2 - the two expressions must be same type or can be casted to a common type,
+          and must be a type that can be ordered/compared.

Just double checked it renders fine in the doc as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Fixed.

@gatorsmile
Copy link
Member

gatorsmile commented Aug 9, 2017

@viirya Could you hold this PR at first? There is a related PR #18818 that is changing the underlying semantics. We also need to update the description after that PR. Also, welcome to review that PR too.

@SparkQA
Copy link

SparkQA commented Aug 9, 2017

Test build #80435 has finished for PR 18869 at commit 1369fd5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member Author

viirya commented Aug 9, 2017

@gatorsmile Sure. Let's wait after #18818. I'll look at it too.

@SparkQA
Copy link

SparkQA commented Aug 9, 2017

Test build #80440 has finished for PR 18869 at commit b64c9e6.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 9, 2017

Test build #80439 has finished for PR 18869 at commit bca2b0b.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member Author

viirya commented Sep 1, 2017

@gatorsmile @HyukjinKwon #18818 is merged now. Can this PR go ahead?

As what #18818 did is to allow structs, arrays to be input expression for predicates, currently looks like we don't have description to say those types are not supported.

@viirya
Copy link
Member Author

viirya commented Sep 1, 2017

retest this please.

@gatorsmile
Copy link
Member

map is not supported, right?

@viirya
Copy link
Member Author

viirya commented Sep 1, 2017

@gatorsmile Right. Isn't too verbose if we describe map is not supported in each description?

@SparkQA
Copy link

SparkQA commented Sep 1, 2017

Test build #81306 has finished for PR 18869 at commit b64c9e6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member Author

viirya commented Sep 1, 2017

Previously Equal and EqualNullSafe don't state that the type must can be used in equality comparison. Now added the note.

@SparkQA
Copy link

SparkQA commented Sep 1, 2017

Test build #81311 has finished for PR 18869 at commit 099c671.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

@gatorsmile, I understand the previously discussed point, the documentation should be correct, tested well and in details. However, if I observed this correctly, this argument descriptions have not been well filled up for the last year (since I proposed Arguments description) - if I checked correctly, we have three 3 instances fixed, replace, rlike and like, out of 266 instances.

I mean, practically, it sounds a bit demanding to me. (In this way, for the perfectness, I think we should also require to add test cases with all the possible types and check if there is any duplicated test too). I wonder if we really should add the examples about all the cases we assume users might be confused of, and describe it that precisely for the current state, given the information did not exist before.

Of course, it is important and it should be correct but I have been thinking of incrementally improving this, rather than requesting the perfectly correct documentation for PRs, practically to roll it better.

What do you think about minimising the test cases and rather trying to describe it a bit more in an abstract, short and general manner for now? We could ask additional test cases for additional information in the followup PRs.

For this PR, it looks good enough to me (to be honest, the examples and test cases look too much) but I believe I need your opinion and I think we should have a compromised position.

@@ -0,0 +1,42 @@
-- In
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we move these to PredicateSuite and then we can use checkEvaluation?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I feel the test coverage might be out of the scope of this PR a little bit. This PR is for the expression description. If you don't insist we add test here together, I'd like to submit other PR to improve the test coverage.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do not have a test case, how can we know the description is correct?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me explain it a little bit more. No need to add more and more test cases in this PR. We need to ensure anything claimed in the descriptions is well tested; if no test case exists, we need to add it into this PR. It should include both negative and positive test cases. For example, map is not supported; and then this needs to be documented and tested.

Copy link
Member Author

@viirya viirya Sep 3, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, the test cases should be added along with the codes which defines the semantics or the code changes such as #18818 that changes the underlying semantics.

For example, I saw there are tests added to test map is not supported by #18818:

assertError(EqualTo('mapField, 'mapField), "EqualTo does not support ordering on type MapType")
assertError(EqualNullSafe('mapField, 'mapField),
"EqualNullSafe does not support ordering on type MapType")
assertError(LessThan('mapField, 'mapField),
"LessThan does not support ordering on type MapType")
assertError(LessThanOrEqual('mapField, 'mapField),
"LessThanOrEqual does not support ordering on type MapType")
assertError(GreaterThan('mapField, 'mapField),
"GreaterThan does not support ordering on type MapType")
assertError(GreaterThanOrEqual('mapField, 'mapField),
"GreaterThanOrEqual does not support ordering on type MapType")

More tests are always good and not harm, I agreed. If the necessary tests are not added before, we should add and improve them.

@gatorsmile
Copy link
Member

If @viirya were a new contributor, I am fine to merge it.

In contrast, @viirya is experienced and knew the code base pretty well. Thus, I hope we can make our docs and test coverage much better.

Below is the doc I am comparing. You can easily see the gap we have. The test cases we have are so few. We will see more bugs when adding new test cases. https://www.ibm.com/support/knowledgecenter/en/SSEPGG_11.1.0/com.ibm.db2.luw.sql.ref.doc/doc/r0000746.html

select 1.0 = '1';

-- GreaterThan
select 1 > '1';
Copy link
Member Author

@viirya viirya Sep 2, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following test cases are intended to test the end-to-end comparison between different types. It doesn't make much sense to re-write them with checkEvaluation. We'd have something like checkEvaluation(GreaterThan(Cast(...), ...), true) with manually added Cast. We have unit tests against Cast and GreaterThan individually.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the test cases that trigger implicit type casting, we can keep them here.

select 1 in(1, 2, 3);
select 1 in(2, 3, 4);
select named_struct('a', 1, 'b', 2) in(named_struct('a', 1, 'b', 1), named_struct('a', 1, 'b', 3));
select named_struct('a', 1, 'b', 2) in(named_struct('a', 1, 'b', 2), named_struct('a', 1, 'b', 3));
Copy link
Member Author

@viirya viirya Sep 2, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For In, I've locally complemented its test coverage in PredicateSuite and use checkEvaluation. I'd prefer to have it in a following PR if you don't insist to add it in this PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the existing test for In is not well covered. We should improve it. I will commit the tests I added locally.

@SparkQA
Copy link

SparkQA commented Sep 3, 2017

Test build #81347 has finished for PR 18869 at commit a8bb457.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 3, 2017

Test build #81348 has finished for PR 18869 at commit 54c71e1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 3, 2017

Test build #81349 has finished for PR 18869 at commit 444c64d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

test("IN with different types") {
def testWithRandomDataGeneration(dataType: DataType, nullable: Boolean): Unit = {
val dataGen = RandomDataGenerator.forType(dataType, nullable = nullable)
if (dataGen.isDefined) {
Copy link
Member

@gatorsmile gatorsmile Sep 3, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the data type is not supported, it will silently skip the test. We can do something like

      val maybeDataGen = RandomDataGenerator.forType(dataType, nullable = nullable)
      val dataGen = maybeDataGen.getOrElse(fail(s"Failed to create data generator for type $dataType"))

Copy link
Member Author

@viirya viirya Sep 3, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although I already filter out the unsupported data types when constructing atomicTypes, this is no harm. So ok for me. I will update it.

@gatorsmile
Copy link
Member

Could you also add negative test cases for map?

@viirya
Copy link
Member Author

viirya commented Sep 3, 2017

@gatorsmile As I said in #18869 (comment), we already have tests in ExpressionTypeCheckingSuite to check the predicates don't support map type. Do we need to add more negative test cases for map?

@SparkQA
Copy link

SparkQA commented Sep 3, 2017

Test build #81356 has finished for PR 18869 at commit ec9199a.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

retest this please

@SparkQA
Copy link

SparkQA commented Sep 3, 2017

Test build #81359 has finished for PR 18869 at commit ec9199a.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member Author

viirya commented Sep 3, 2017

retest this please.

@SparkQA
Copy link

SparkQA commented Sep 3, 2017

Test build #81362 has finished for PR 18869 at commit ec9199a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member Author

viirya commented Sep 4, 2017

@HyukjinKwon @gatorsmile Any more comments on this? The added tests should be enough.

@gatorsmile
Copy link
Member

LGTM

@gatorsmile
Copy link
Member

Thanks! Merging to master.

@viirya
Copy link
Member Author

viirya commented Sep 4, 2017

Thanks @HyukjinKwon @gatorsmile

@asfgit asfgit closed this in 9f30d92 Sep 4, 2017
@viirya viirya deleted the SPARK-21654 branch December 27, 2023 18:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants