-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-21654][SQL] Complement SQL predicates expression description #18869
Conversation
cc @HyukjinKwon @gatorsmile Please help to take a look when you have time. Thanks. |
Test build #80332 has finished for PR 18869 at commit
|
Test build #80334 has finished for PR 18869 at commit
|
@ExpressionDescription( | ||
usage = "expr1 _FUNC_ expr2 - Returns true if `expr1` is greater than or equal to `expr2`.") | ||
usage = "expr1 _FUNC_ expr2 - Returns true if `expr1` is greater than `expr2`.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks there is a typo, missing or equal to
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Fixed.
usage = "expr1 _FUNC_ expr2 - Returns true if `expr1` equals `expr2`, or false otherwise.", | ||
arguments = """ | ||
Arguments: | ||
* expr1, expr2 - the two expressions must be same type. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the type issues, could you check whether this statement is true? I think we can support different types.
SELECT 1 = '1'
This will return true.
Also, could you add a test case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. The rule ImplicitTypeCasts
will specially do type casting for BinaryOperator
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've modified the statement.
Could you check all of binary predicates you are trying to modify? Thanks! |
@gatorsmile I've checked the binary predicates in this change. Related test cases are added. |
usage = "expr1 _FUNC_ expr2 - Returns true if `expr1` is less than `expr2`.", | ||
arguments = """ | ||
Arguments: | ||
* expr1, expr2 - the two expressions must be same type or can be casted to a common type, and must be a type that can be ordered/compared. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@viirya, BTW, I think we could also do this as below:
- * expr1, expr2 - the two expressions must be same type or can be casted to a common type, and must be a type that can be ordered/compared.
+ * expr1, expr2 - the two expressions must be same type or can be casted to a common type,
+ and must be a type that can be ordered/compared.
Just double checked it renders fine in the doc as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Fixed.
Test build #80435 has finished for PR 18869 at commit
|
@gatorsmile Sure. Let's wait after #18818. I'll look at it too. |
Test build #80440 has finished for PR 18869 at commit
|
Test build #80439 has finished for PR 18869 at commit
|
@gatorsmile @HyukjinKwon #18818 is merged now. Can this PR go ahead? As what #18818 did is to allow structs, arrays to be input expression for predicates, currently looks like we don't have description to say those types are not supported. |
retest this please. |
|
@gatorsmile Right. Isn't too verbose if we describe map is not supported in each description? |
Test build #81306 has finished for PR 18869 at commit
|
… Equal/EqualNullSafe..
Previously |
Test build #81311 has finished for PR 18869 at commit
|
@gatorsmile, I understand the previously discussed point, the documentation should be correct, tested well and in details. However, if I observed this correctly, this argument descriptions have not been well filled up for the last year (since I proposed I mean, practically, it sounds a bit demanding to me. (In this way, for the perfectness, I think we should also require to add test cases with all the possible types and check if there is any duplicated test too). I wonder if we really should add the examples about all the cases we assume users might be confused of, and describe it that precisely for the current state, given the information did not exist before. Of course, it is important and it should be correct but I have been thinking of incrementally improving this, rather than requesting the perfectly correct documentation for PRs, practically to roll it better. What do you think about minimising the test cases and rather trying to describe it a bit more in an abstract, short and general manner for now? We could ask additional test cases for additional information in the followup PRs. For this PR, it looks good enough to me (to be honest, the examples and test cases look too much) but I believe I need your opinion and I think we should have a compromised position. |
@@ -0,0 +1,42 @@ | |||
-- In |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we move these to PredicateSuite
and then we can use checkEvaluation
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I feel the test coverage might be out of the scope of this PR a little bit. This PR is for the expression description. If you don't insist we add test here together, I'd like to submit other PR to improve the test coverage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we do not have a test case, how can we know the description is correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me explain it a little bit more. No need to add more and more test cases in this PR. We need to ensure anything claimed in the descriptions is well tested; if no test case exists, we need to add it into this PR. It should include both negative and positive test cases. For example, map is not supported; and then this needs to be documented and tested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally, the test cases should be added along with the codes which defines the semantics or the code changes such as #18818 that changes the underlying semantics.
For example, I saw there are tests added to test map is not supported by #18818:
Lines 112 to 122 in cba69ae
assertError(EqualTo('mapField, 'mapField), "EqualTo does not support ordering on type MapType") | |
assertError(EqualNullSafe('mapField, 'mapField), | |
"EqualNullSafe does not support ordering on type MapType") | |
assertError(LessThan('mapField, 'mapField), | |
"LessThan does not support ordering on type MapType") | |
assertError(LessThanOrEqual('mapField, 'mapField), | |
"LessThanOrEqual does not support ordering on type MapType") | |
assertError(GreaterThan('mapField, 'mapField), | |
"GreaterThan does not support ordering on type MapType") | |
assertError(GreaterThanOrEqual('mapField, 'mapField), | |
"GreaterThanOrEqual does not support ordering on type MapType") |
More tests are always good and not harm, I agreed. If the necessary tests are not added before, we should add and improve them.
If @viirya were a new contributor, I am fine to merge it. In contrast, @viirya is experienced and knew the code base pretty well. Thus, I hope we can make our docs and test coverage much better. Below is the doc I am comparing. You can easily see the gap we have. The test cases we have are so few. We will see more bugs when adding new test cases. https://www.ibm.com/support/knowledgecenter/en/SSEPGG_11.1.0/com.ibm.db2.luw.sql.ref.doc/doc/r0000746.html |
select 1.0 = '1'; | ||
|
||
-- GreaterThan | ||
select 1 > '1'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The following test cases are intended to test the end-to-end comparison between different types. It doesn't make much sense to re-write them with checkEvaluation
. We'd have something like checkEvaluation(GreaterThan(Cast(...), ...), true)
with manually added Cast
. We have unit tests against Cast
and GreaterThan
individually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the test cases that trigger implicit type casting, we can keep them here.
select 1 in(1, 2, 3); | ||
select 1 in(2, 3, 4); | ||
select named_struct('a', 1, 'b', 2) in(named_struct('a', 1, 'b', 1), named_struct('a', 1, 'b', 3)); | ||
select named_struct('a', 1, 'b', 2) in(named_struct('a', 1, 'b', 2), named_struct('a', 1, 'b', 3)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For In
, I've locally complemented its test coverage in PredicateSuite
and use checkEvaluation
. I'd prefer to have it in a following PR if you don't insist to add it in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like the existing test for In
is not well covered. We should improve it. I will commit the tests I added locally.
Test build #81347 has finished for PR 18869 at commit
|
Test build #81348 has finished for PR 18869 at commit
|
Test build #81349 has finished for PR 18869 at commit
|
test("IN with different types") { | ||
def testWithRandomDataGeneration(dataType: DataType, nullable: Boolean): Unit = { | ||
val dataGen = RandomDataGenerator.forType(dataType, nullable = nullable) | ||
if (dataGen.isDefined) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the data type is not supported, it will silently skip the test. We can do something like
val maybeDataGen = RandomDataGenerator.forType(dataType, nullable = nullable)
val dataGen = maybeDataGen.getOrElse(fail(s"Failed to create data generator for type $dataType"))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although I already filter out the unsupported data types when constructing atomicTypes
, this is no harm. So ok for me. I will update it.
Could you also add negative test cases for map? |
@gatorsmile As I said in #18869 (comment), we already have tests in |
Test build #81356 has finished for PR 18869 at commit
|
retest this please |
Test build #81359 has finished for PR 18869 at commit
|
retest this please. |
Test build #81362 has finished for PR 18869 at commit
|
@HyukjinKwon @gatorsmile Any more comments on this? The added tests should be enough. |
LGTM |
Thanks! Merging to master. |
Thanks @HyukjinKwon @gatorsmile |
What changes were proposed in this pull request?
SQL predicates don't have complete expression description. This patch goes to complement the description by adding arguments, examples.
This change also adds related test cases for the SQL predicate expressions.
How was this patch tested?
Existing tests. And added predicate test.