Extend lambda support for ClickHouse and DuckDB dialects #1686

Merged
merged 7 commits into from
Jan 31, 2025

Contributor

@gstvg gstvg commented Jan 29, 2025

Extends the lambda support that landed in #1257 to more dialects (but not Snowflake, see #1273).

Returns true for supports_lambda_functions for the ClickHouse, DuckDB and Generic dialects
Adds supports_parensless_lambda_functions to Dialect
Returns true for supports_parensless_lambda_functions for the ClickHouse, Databricks and DuckDB dialects, but not for Generic, to avoid a conflict with the Postgres JSON syntax

This is a breaking change: to parse a parensless lambda such as x -> x + 1, a dialect must now also return true from supports_parensless_lambda_functions.
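
For readers skimming the description, the two flags can be pictured with a small self-contained sketch. This is not the sqlparser-rs code: the trait below is a simplified stand-in for the crate's Dialect trait, the dialect structs are bare placeholders, and accepts_parensless_lambda is a hypothetical helper added only for illustration. It mirrors the PR as first proposed; the review discussion further down ends with the second flag removed and lambdas disabled for the Generic dialect.

```rust
// Simplified stand-in for the parser's `Dialect` trait. Only the two
// capability flags from this PR description are modeled (assumption:
// both default to `false`, so dialects opt in explicitly).
trait Dialect {
    /// Whether lambda expressions are parsed at all.
    fn supports_lambda_functions(&self) -> bool {
        false
    }

    /// Whether the unparenthesized form `x -> x + 1` is accepted.
    fn supports_parensless_lambda_functions(&self) -> bool {
        false
    }
}

// Placeholder dialect types for this sketch.
struct DuckDbDialect;
struct GenericDialect;

impl Dialect for DuckDbDialect {
    fn supports_lambda_functions(&self) -> bool {
        true
    }
    fn supports_parensless_lambda_functions(&self) -> bool {
        true
    }
}

impl Dialect for GenericDialect {
    fn supports_lambda_functions(&self) -> bool {
        true
    }
    // The parensless flag keeps its default (`false`) so that `x -> y`
    // can still be read as the Postgres JSON operator.
}

/// Hypothetical helper: a parensless lambda is accepted only when
/// both flags are enabled for the dialect.
fn accepts_parensless_lambda(dialect: &dyn Dialect) -> bool {
    dialect.supports_lambda_functions() && dialect.supports_parensless_lambda_functions()
}
```

Under these assumptions, accepts_parensless_lambda(&DuckDbDialect) is true and accepts_parensless_lambda(&GenericDialect) is false, which is the breaking behavior the description refers to: a dialect that only enabled the first flag no longer parses x -> x + 1 as a lambda.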

@@ -13285,3 +13285,98 @@ fn test_trailing_commas_in_from() {
"SELECT 1, 2 FROM (SELECT * FROM t1), (SELECT * FROM t2)",
);
}

#[test]
fn test_lambdas() {
Contributor Author

Moved from databricks tests

@@ -83,69 +83,6 @@ fn test_databricks_exists() {
);
}

#[test]
Contributor Author

Moved into common tests

/// ```sql
/// SELECT transform(array(1, 2, 3), x -> x + 1); -- returns [2,3,4]
/// ```
fn supports_parensless_lambda_functions(&self) -> bool {
Contributor

If I understand correctly, the idea behind introducing this method is to avoid the Generic dialect's syntax conflict caused by its pg JSON syntax support? If so, I'm thinking it could make more sense to turn off supports_lambda_functions for the Generic dialect instead. The idea with that dialect is that it gets feature support by default only if there is no conflicting syntax. So if it's not expected that a dialect supports (x) -> y but not x -> y, then maybe the Generic dialect shouldn't support lambdas after all.

Contributor Author

You got it right. I think your point makes sense and I'm fine with turning off supports_lambda_functions for the Generic dialect, but should supports_parensless_lambda_functions be removed from the Dialect trait too? I want to run DataFusion with support for both lambdas (even if with limited syntax) and the pg JSON syntax from datafusion-functions-json, and supports_parensless_lambda_functions allows me to use a custom dialect to do so. In the worst case, I can run without the pg syntax and support only direct function calls (json_get, etc.).

cc @samuelcolvin in case you have interest in this too

Contributor

I don't have much context here; personally, I think we'll want to switch off lambdas.

Contributor

hmm, given expressions x->y or (x)->y: if a dialect supports both the pg json -> operator and the lambda syntax then it seems to suggest an ambiguous grammar (it doesn't seem like there's a way to tell what either expression should be parsed into)?

My thinking was indeed to potentially turn off lambda support for the Generic dialect and remove supports_parensless_lambda_functions. The latter sounds like we'd be introducing a behavior to the parser that is only relevant in certain combinations, covered neither by any of the parser's supported dialects nor by a SQL spec for reference, which feels like a slippery slope.
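
To make the ambiguity concrete, here is a minimal toy sketch, not the sqlparser-rs parser. The Expr enum, the parse_arrow function and its lambdas_enabled switch are all hypothetical names invented for illustration, and the input is restricted to the exact shapes a -> b and (a) -> b. The point it shows is that the text alone does not decide the result; only an external switch does.

```rust
// Toy AST for this sketch only (hypothetical, not the crate's AST).
#[derive(Debug, PartialEq)]
enum Expr {
    Ident(String),
    /// `x -> body` read as a lambda with parameter `x`.
    Lambda { param: String, body: Box<Expr> },
    /// `left -> right` read as the pg JSON access operator.
    JsonArrow { left: Box<Expr>, right: Box<Expr> },
}

/// Parses input of the shape `a -> b` or `(a) -> b`.
/// `lambdas_enabled` stands in for a dialect flag: the same text
/// produces a different tree depending on its value.
fn parse_arrow(input: &str, lambdas_enabled: bool) -> Option<Expr> {
    let (lhs, rhs) = input.split_once("->")?;
    let lhs = lhs.trim();
    // `(x)` is both a parenthesized lambda parameter list and an
    // ordinary nested expression, so the parentheses do not help.
    let name = lhs
        .strip_prefix('(')
        .and_then(|s| s.strip_suffix(')'))
        .unwrap_or(lhs)
        .trim();
    let rhs = rhs.trim();
    if name.is_empty() || rhs.is_empty() {
        return None;
    }
    let right = Box::new(Expr::Ident(rhs.to_string()));
    Some(if lambdas_enabled {
        Expr::Lambda { param: name.to_string(), body: right }
    } else {
        Expr::JsonArrow {
            left: Box::new(Expr::Ident(name.to_string())),
            right,
        }
    })
}
```

In this toy, parse_arrow("(x) -> y", true) and parse_arrow("(x) -> y", false) return different trees for identical input, which is why a single dialect cannot sensibly enable both readings at once.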

Contributor

Agreed. The lambda syntax looks identical to Pg JSON lookup syntax with a completely different meaning.

Contributor Author

Agreed. supports_parensless_lambda_functions is removed and lambda support is restricted to ClickHouse, Databricks and DuckDB. I had forgotten about the nested expression form (expr), so yes, this conflicted with pg; it just wasn't caught by any test. Plain wrong on my part. Thanks 🙏

@gstvg gstvg changed the title from "Extend lambda support for ClickHouse, DuckDB and Generic dialects" to "Extend lambda support for ClickHouse and DuckDB dialects" on Jan 30, 2025
Contributor Author

gstvg commented Jan 30, 2025

Looks like the CI errors are related to #1693

Contributor

@iffyio iffyio left a comment

LGTM! Thanks @gstvg and @samuelcolvin!
cc @alamb

@iffyio iffyio merged commit aeaafbe into apache:main Jan 31, 2025
9 checks passed
Vedin pushed a commit to Embucket/datafusion-sqlparser-rs that referenced this pull request Feb 3, 2025
Vedin added a commit to Embucket/datafusion-sqlparser-rs that referenced this pull request Feb 3, 2025
3 participants