-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
replace TypeSignature::String with TypeSignature::Coercible for starts_with #14812
Conversation
Hi, @jayzhan211. I encountered some problems when trying to discard After a simple direct replacement, the original test failed. The specific test failure items are shown in this https://github.com/apache/datafusion/actions/runs/13457031878/job/37603204917?pr=14812. I think the problem may be caused by the need to modify the logic of this part of the code in this function https://github.com/zjregee/datafusion/blob/replace-type-signature/datafusion/expr/src/type_coercion/functions.rs#L603-L636, but I am not sure, and hope to get some help and suggestions. |
datafusion/datafusion/functions/src/string/starts_with.rs Lines 98 to 138 in 9ca09cf
datafusion/datafusion/physical-expr/src/expressions/like.rs Lines 159 to 179 in 9ca09cf
The problem seems like you need to cast |
9c7ce7d
to
0a84fdc
Compare
Hi, @jayzhan211. I'm sorry to bother you again. Before I start replacing other functions, I'd like to get some confirmation from you that my current implementation and handling are as expected. Thanks a lot! datafusion/datafusion/functions/src/string/starts_with.rs Lines 98 to 138 in 9ca09cf
After some experimentation, I ended up not handling the type conversion in The above is my superficial understanding. If there is anything that does not meet expectations, I hope to be corrected. |
@zjregee How about wrapping with Expr::Cast? The logic should be something similar to this let data_type_a = complex_expr.get_data_type();
let data_type_b = complex_expr.get_data_type();
if data_type_a != data_type_b {
target type = determine_common_type(data_type_a, data_type_b)
Expr::Cast(complex_expr, target type)
} |
datafusion/datafusion/optimizer/src/analyzer/type_coercion.rs Lines 391 to 415 in 6c5f214
This is how types coerced in like expression, I guess we might need a similar coerce logic
|
@jayzhan211. Got this, thank you very much for your help. In addition, some functions will also have similar problems that need to be solved during the replacement process. Maybe we can split it into several PRs. I will try to complete all of this in the next few days. |
sure |
I think this PR is ready for review now. cc: @jayzhan211 |
use datafusion_macros::user_doc; | ||
|
||
/// Returns true if string starts with prefix. | ||
/// starts_with('alphabet', 'alph') = 't' | ||
pub fn starts_with(args: &[ArrayRef]) -> Result<ArrayRef> { | ||
let result = arrow::compute::kernels::comparison::starts_with(&args[0], &args[1])?; | ||
let arg0 = arrow::compute::kernels::cast::cast(&args[0], args[1].data_type())?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think better to use string_coercion
and binary_to_string_coercion
@@ -127,7 +137,7 @@ impl ScalarUDFImpl for StartsWithFunc { | |||
|
|||
return Ok(ExprSimplifyResult::Simplified(Expr::Like(Like { | |||
negated: false, | |||
expr: Box::new(args[0].clone()), | |||
expr: Box::new(cast(args[0].clone(), scalar_value.data_type())), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think better to use string_coercion and binary_to_string_coercion
@@ -619,13 +619,13 @@ query TT | |||
explain select * from foo where starts_with(column1, 'f'); | |||
---- | |||
logical_plan | |||
01)Filter: foo.column1 LIKE Utf8View("f%") | |||
02)--TableScan: foo projection=[column1], partial_filters=[foo.column1 LIKE Utf8View("f%")] | |||
01)Filter: CAST(foo.column1 AS Utf8) LIKE Utf8("f%") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, casting array is less efficient, since you only has 2 characters "f%" on right hand side.
Even more, instead of calling make_scalar_function
, we can wrap scalar case to arrow::Scalar and call arrow::compute::kernels::comparison::starts_with
directly.
It accepts Datum
pub fn starts_with(left: &dyn Datum, right: &dyn Datum) -> Result<BooleanArray, ArrowError> {
like_op(Op::StartsWith, left, right)
}
4cc2188
to
9c37c65
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍🏻
9c37c65
to
4df8fbe
Compare
Hi, @jayzhan211. I resubmitted the PR, and now we not only don't have redundant CAST, but also reduce the use of redundant CAST in some cases. I think the current treatment is appropriate, thank you very much for your help! |
12e5d4d
to
81e82b0
Compare
81e82b0
to
fefc503
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍🏻
Thanks @zjregee |
Which issue does this PR close?
TypeSignature::String
withTypeSignature::Coercible
#14759.Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?