Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: add ArrayAndElementAndOptionalIndex for proper casting in array_position #9233

Merged
merged 9 commits into from
Feb 19, 2024
2 changes: 1 addition & 1 deletion datafusion/expr/src/built_in_function.rs
Original file line number Diff line number Diff line change
Expand Up @@ -965,7 +965,7 @@ impl BuiltinScalarFunction {
BuiltinScalarFunction::ArrayNdims => Signature::any(1, self.volatility()),
BuiltinScalarFunction::ArrayDistinct => Signature::any(1, self.volatility()),
BuiltinScalarFunction::ArrayPosition => {
Signature::variadic_any(self.volatility())
Signature::array_and_element_and_optional_index(self.volatility())
}
BuiltinScalarFunction::ArrayPositions => {
Signature::array_and_element(self.volatility())
Expand Down
15 changes: 14 additions & 1 deletion datafusion/expr/src/signature.rs
Original file line number Diff line number Diff line change
Expand Up @@ -91,7 +91,7 @@ pub enum TypeSignature {
/// DataFusion attempts to coerce all argument types to match the first argument's type
///
/// # Examples
/// Given types in signature should be coericible to the same final type.
/// Given types in signature should be coercible to the same final type.
/// A function such as `make_array` is `VariadicEqual`.
///
/// `make_array(i32, i64) -> make_array(i64, i64)`
Expand Down Expand Up @@ -133,6 +133,7 @@ pub enum ArrayFunctionSignature {
/// The first argument's list dimension should be one dimension less than the second argument's list dimension.
ElementAndArray,
ArrayAndIndex,
ArrayAndElementAndOptionalIndex,
}

impl std::fmt::Display for ArrayFunctionSignature {
Expand All @@ -141,6 +142,9 @@ impl std::fmt::Display for ArrayFunctionSignature {
ArrayFunctionSignature::ArrayAndElement => {
write!(f, "array, element")
}
ArrayFunctionSignature::ArrayAndElementAndOptionalIndex => {
write!(f, "array, element, [index]")
}
ArrayFunctionSignature::ElementAndArray => {
write!(f, "element, array")
}
Expand Down Expand Up @@ -292,6 +296,15 @@ impl Signature {
volatility,
}
}
/// Specialized Signature for Array functions with an optional index
pub fn array_and_element_and_optional_index(volatility: Volatility) -> Self {
Signature {
type_signature: TypeSignature::ArraySignature(
ArrayFunctionSignature::ArrayAndElementAndOptionalIndex,
),
volatility,
}
}
/// Specialized Signature for ArrayPrepend and similar functions
pub fn element_and_array(volatility: Volatility) -> Self {
Signature {
Expand Down
28 changes: 28 additions & 0 deletions datafusion/expr/src/type_coercion/functions.rs
Original file line number Diff line number Diff line change
Expand Up @@ -130,6 +130,31 @@ fn get_valid_types(
_ => Ok(vec![vec![]]),
}
}
fn array_append_and_optional_index(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

array_element_and_optional_index seems a better name

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also in 5086f31

current_types: &[DataType],
) -> Result<Vec<Vec<DataType>>> {
// make sure there's at least 2 arguments
if !(current_types.len() == 2 || current_types.len() == 3) {
return Ok(vec![vec![]]);
}

let first_two_types = &current_types[0..2];
let valid_types = array_append_or_prepend_valid_types(first_two_types, true)?;

let valid_types_with_index = valid_types
Copy link
Contributor

@jayzhan211 jayzhan211 Feb 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can optionally append valid_types_with_index if the length is 3, otherwise, return here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Addressed in 5086f31.

.iter()
.map(|t| {
let mut t = t.clone();
t.push(DataType::Int64);
t
})
.collect::<Vec<_>>();

let mut valid_types = valid_types;
valid_types.extend(valid_types_with_index);

Ok(valid_types)
}
fn array_and_index(current_types: &[DataType]) -> Result<Vec<Vec<DataType>>> {
if current_types.len() != 2 {
return Ok(vec![vec![]]);
Expand Down Expand Up @@ -184,6 +209,9 @@ fn get_valid_types(
ArrayFunctionSignature::ArrayAndElement => {
return array_append_or_prepend_valid_types(current_types, true)
}
ArrayFunctionSignature::ArrayAndElementAndOptionalIndex => {
return array_append_and_optional_index(current_types)
}
ArrayFunctionSignature::ArrayAndIndex => {
return array_and_index(current_types)
}
Expand Down
10 changes: 10 additions & 0 deletions datafusion/sqllogictest/test_files/array.slt
Original file line number Diff line number Diff line change
Expand Up @@ -2603,6 +2603,16 @@ select array_position(arrow_cast(make_array([1, 2, 3], [4, 5, 6], [5, 5, 5], [4,
----
2 2

query I
SELECT array_position(arrow_cast([5, 2, 3, 4, 5], 'List(Int32)'), 5)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the record, this test fails on main like this:

Error during planning: array_position received incompatible types: '[Int32, Int64]'.

----
1

query I
SELECT array_position(arrow_cast([5, 2, 3, 4, 5], 'List(Int32)'), 5, 2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we please add a few more tests:

  1. For LargeList
  2. Error cases (e.g. for types that are still not compatible like array_position(arrow_cast([5, 2, 3, 4, 5], 'List(Int32)'), 'foo')

Copy link
Contributor Author

@tshauck tshauck Feb 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback... LargeList is added and works as expected. For array_position(arrow_cast([5, 2, 3, 4, 5], 'List(Int32)'), 'foo'), this actually doesn't fail right now on this branch and returns NULL:

❯ SELECT array_position(arrow_cast([5, 2, 3, 4, 5], 'List(Int32)'), 'foo') IS NULL;
+----------------------------------------------------------------------------------------------+
| array_position(make_array(Int64(5),Int64(2),Int64(3),Int64(4),Int64(5)),Utf8("foo")) IS NULL |
+----------------------------------------------------------------------------------------------+
| true                                                                                         |
+----------------------------------------------------------------------------------------------+
1 row in set. Query took 0.004 seconds.

Because this PR is leveraging the append's casting functionality, this seems to be the current behavior on main for append:

arrow-datafusion/datafusion-cli main ➜ ./target/debug/datafusion-cli 
DataFusion CLI v35.0.0
❯ SELECT array_append(arrow_cast([5, 2, 3, 4, 5], 'List(Int32)'), 'foo');
+------------------------------------------------------------------------------------+
| array_append(make_array(Int64(5),Int64(2),Int64(3),Int64(4),Int64(5)),Utf8("foo")) |
+------------------------------------------------------------------------------------+
| [5, 2, 3, 4, 5, foo]                                                               |
+------------------------------------------------------------------------------------+
1 row in set. Query took 0.028 seconds.

I guess as I user I'd think if the array_append case would work, I'd also expect array_position to work like that. FWIW duckdb works for both array_append and array_position([1,2,3], 'foo') it just returns 0 rather than NULL.

Anyways, happy to go either way, just thought I'd bring it up before delving further.

Copy link
Contributor

@jayzhan211 jayzhan211 Feb 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the second case, we return NULL which is compatible with Postgres behavior, so I think it is acceptable. You can just add the test case to make sure we got NULL

query I
select array_position(arrow_cast([5, 2, 3, 4, 5], 'List(Int32)'), 'foo')
----
NULL

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good... added in 4ec6289

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the tests and doing the background research

----
5

# list_position scalar function #5 (function alias `array_position`)
query III
select list_position(['h', 'e', 'l', 'l', 'o'], 'l'), list_position([1, 2, 3, 4, 5], 5), list_position([1, 1, 1], 1);
Expand Down
Loading