[SPARK-21110][SQL] Structs, arrays, and other orderable datatypes should be usable in inequalities #18818
Conversation
Test build #80161 has finished for PR 18818 at commit
Test build #80162 has finished for PR 18818 at commit
Test build #80170 has finished for PR 18818 at commit
Not my area of expertise, but happy to see the changeset is very small. Maybe @HyukjinKwon, @viirya, or @gatorsmile can provide a more thorough review here.
@aray - Does the example from the original feature request work with this PR?
* Types that can be ordered/compared. In the long run we should probably make this a trait
* that can be mixed into each data type, and perhaps create an `AbstractDataType`.
*/
// TODO: Should we consolidate this with RowOrdering.isOrderable?
Just curious: do we need to do anything with `RowOrdering.isOrderable` given the change in this PR?
Nope. `RowOrdering.isOrderable` (which is used by `TypeUtils.checkForOrderingExpr`) returns true on a strict superset of this type collection, as it works for complex types that need recursive checks.
@@ -453,6 +453,14 @@ case class Or(left: Expression, right: Expression) extends BinaryOperator with P

abstract class BinaryComparison extends BinaryOperator with Predicate {

  override def inputType: AbstractDataType = AnyDataType
For `LessThan`-like operators, we should not accept `AnyDataType`, right?
Right, but the real type check is below in `checkInputDataTypes`.
This will confuse other readers. Could you revert it?
We have to define `inputType` because it extends `BinaryOperator`. Previously, the `inputType` declared by the `LessThan`-like operators was a subset of what they could actually support. This PR fixes that, but since the supported types cannot be finitely specified as a type collection (there are countably infinitely many legal `StructType`s), we need to declare a superset of what is actually supported for `inputType` and then do the real recursive check in `checkInputDataTypes`. This is much like how the `EqualTo` and `EqualNullSafe` operators were previously implemented; this PR just moves that logic up to `BinaryComparison`, as it's really the same for equality and inequality operators.
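For illustration, here is a minimal, self-contained sketch of that two-level pattern. The names (`SimpleType`, `LessThanLike`, etc.) are hypothetical stand-ins, not the actual catalyst classes: the static declaration stays maximally permissive, while the precise recursive orderability check happens in a `checkInputDataTypes`-style method.

```scala
// Hypothetical sketch of the two-level check described above; simplified stand-ins,
// not Spark's actual catalyst implementation.
object OrderabilitySketch {
  sealed trait SimpleType
  case object SimpleInt extends SimpleType
  case object SimpleMap extends SimpleType // stands in for a non-orderable type
  final case class SimpleArray(elem: SimpleType) extends SimpleType
  final case class SimpleStruct(fields: Seq[SimpleType]) extends SimpleType

  // Orderability cannot be written as a finite type collection (structs nest arbitrarily),
  // so it has to be a recursive predicate.
  def isOrderable(t: SimpleType): Boolean = t match {
    case SimpleInt            => true
    case SimpleMap            => false
    case SimpleArray(e)       => isOrderable(e)
    case SimpleStruct(fields) => fields.forall(isOrderable)
  }

  final case class LessThanLike(leftType: SimpleType, rightType: SimpleType) {
    // Statically we can only declare "accept anything"...
    val inputType: String = "AnyDataType"

    // ...and the precise, recursive check runs once the concrete operand types are known.
    def checkInputDataTypes(): Either[String, Unit] =
      if (leftType != rightType) Left(s"differing types: $leftType vs $rightType")
      else if (!isOrderable(leftType)) Left(s"$leftType does not support ordering")
      else Right(())
  }

  def main(args: Array[String]): Unit = {
    // A struct of orderable fields passes; a map-typed comparison is rejected.
    println(LessThanLike(SimpleStruct(Seq(SimpleInt)), SimpleStruct(Seq(SimpleInt))).checkInputDataTypes())
    println(LessThanLike(SimpleMap, SimpleMap).checkInputDataTypes())
  }
}
```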
Did that answer your concerns?
We have to write code comments explaining this; given this assumption, `inputType` might be misused in the future.

In addition, this PR makes a pretty critical change. We really need to check all the new data types supported in this PR: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ordering.scala#L90-L97

Could you write comprehensive test coverage for this?
Test build #80416 has finished for PR 18818 at commit
Also add
@@ -465,7 +475,7 @@ abstract class BinaryComparison extends BinaryOperator with Predicate {
    }
  }

  protected lazy val ordering = TypeUtils.getInterpretedOrdering(left.dataType)
  protected lazy val ordering: Ordering[Any] = TypeUtils.getInterpretedOrdering(left.dataType)
The types accepted by `TypeUtils.getInterpretedOrdering` are a strict subset of those allowed by `RowOrdering.isOrderable`: it only accepts `AtomicType`, `ArrayType`, and `StructType`. `NullType` and `UserDefinedType` can cause problems.
As `ordering` is lazily accessed, and any nulls don't lead us to access it in those predicates, `NullType` should be safe. We should add a related test.
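To make that concrete, here is a small hypothetical sketch (not Spark's `BinaryComparison`) of why a lazily constructed ordering is never forced when both inputs are null, so a `NullType` column cannot trigger the unsupported-type path:

```scala
// Hypothetical sketch: the ordering is lazy and the null check runs first,
// so for NullType inputs (always null) the ordering is never initialized.
object LazyOrderingSketch {
  final class LessThanLike(orderingAvailable: Boolean) {
    private var orderingForced = false

    // Forcing this would fail for types without an interpreted ordering (e.g. NullType).
    private lazy val ordering: Ordering[Any] = {
      orderingForced = true
      if (!orderingAvailable) throw new IllegalArgumentException("no interpreted ordering")
      Ordering.fromLessThan[Any]((a, b) => a.hashCode < b.hashCode) // stand-in ordering
    }

    // Null-safe evaluation: if either side is null the result is SQL NULL (None here)
    // and the lazy ordering is never touched.
    def eval(left: Any, right: Any): Option[Boolean] =
      if (left == null || right == null) None
      else Some(ordering.lt(left, right))

    def wasOrderingForced: Boolean = orderingForced
  }

  def main(args: Array[String]): Unit = {
    val cmp = new LessThanLike(orderingAvailable = false)
    println(cmp.eval(null, null))  // None -- the NullType case
    println(cmp.wasOrderingForced) // false: the unsupported ordering was never built
  }
}
```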
addressed in cc2f3ec
Test build #80644 has finished for PR 18818 at commit
retest this please
Test build #80647 has finished for PR 18818 at commit
@viirya @gatorsmile I have addressed your comments; could you take another look?
ping @viirya @gatorsmile
@@ -582,6 +582,7 @@ class CodegenContext {
      case array: ArrayType => genComp(array, c1, c2) + " == 0"
      case struct: StructType => genComp(struct, c1, c2) + " == 0"
      case udt: UserDefinedType[_] => genEqual(udt.sqlType, c1, c2)
      case NullType => "true"
Is this required? Will it be covered by any test?
BTW, the value should be `false`.
I found the test case, but the test case is not affected by the value we generate here since it is under `nullSafeCodeGen`.

However, we should still return `false` when doing `null = null`.
Yea, codegen fails without this. I had originally made the value `false`, but when I noticed that the codegen for comparison (https://github.com/aray/spark/blob/cc2f3eca28ee6b9faa87853568205307567827cc/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala#L606) returned `0`, I changed it to be consistent. Happy to change it back though.
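For reference, a tiny hypothetical model of the shielding behaviour discussed here (simplified Scala, not the actual generated Java): under a null-safe wrapper the per-type equality result is only consulted when both inputs are non-null, so for `NullType` (always null) the emitted literal is never observed.

```scala
// Hypothetical model of null-safe equality evaluation; None plays the role of SQL NULL.
object NullSafeEqualSketch {
  def nullSafeEqual(left: Any, right: Any)(typeEqual: (Any, Any) => Boolean): Option[Boolean] =
    if (left == null || right == null) None // short-circuit: typeEqual never runs
    else Some(typeEqual(left, right))

  def main(args: Array[String]): Unit = {
    // For a NullType column both sides are always null, so whichever constant the
    // per-type comparison would produce ("true" for consistency with genComp's 0,
    // or "false") is unreachable here.
    println(nullSafeEqual(null, null)((_, _) => sys.error("never reached"))) // None
    println(nullSafeEqual(1, 1)((a, b) => a == b))                           // Some(true)
  }
}
```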
LGTM except one comment. Thanks for working on it!
Test build #81294 has finished for PR 18818 at commit
Thanks! Merging to master.
What changes were proposed in this pull request?

Allows `BinaryComparison` operators to work on any data type that actually supports ordering, as verified by `TypeUtils.checkForOrderingExpr`, instead of relying on the incomplete list `TypeCollection.Ordered` (which is removed by this PR).

How was this patch tested?

Updated unit tests to cover structs and arrays.
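As a usage illustration (a sketch assuming a Spark build that includes this change; the SQL snippets are examples, not taken from the PR itself), inequalities over structs and arrays should now pass analysis and evaluate:

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: structs and arrays used with inequality operators.
object StructInequalityExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("struct-inequality")
      .getOrCreate()

    // Structs are compared field by field, arrays element by element.
    spark.sql("SELECT named_struct('a', 1, 'b', 'x') < named_struct('a', 2, 'b', 'y')").show()
    spark.sql("SELECT array(1, 2) <= array(1, 3)").show()

    spark.stop()
  }
}
```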