Extend lambda support for ClickHouse and DuckDB dialects #1686

Merged
merged 7 commits into from
Jan 31, 2025
Changes from 2 commits
4 changes: 3 additions & 1 deletion src/ast/mod.rs
@@ -1045,7 +1045,9 @@ pub enum Expr {
/// param -> expr | (param1, ...) -> expr
/// ```
///
- /// See <https://docs.databricks.com/en/sql/language-manual/sql-ref-lambda-functions.html>.
+ /// [ClickHouse](https://clickhouse.com/docs/en/sql-reference/functions#higher-order-functions---operator-and-lambdaparams-expr-function)
+ /// [Databricks](https://docs.databricks.com/en/sql/language-manual/sql-ref-lambda-functions.html)
+ /// [DuckDb](https://duckdb.org/docs/sql/functions/lambda.html)
Lambda(LambdaFunction),
}

9 changes: 9 additions & 0 deletions src/dialect/clickhouse.rs
@@ -70,4 +70,13 @@ impl Dialect for ClickHouseDialect {
fn supports_dictionary_syntax(&self) -> bool {
true
}

/// See https://clickhouse.com/docs/en/sql-reference/functions#higher-order-functions---operator-and-lambdaparams-expr-function
fn supports_lambda_functions(&self) -> bool {
true
}

fn supports_parensless_lambda_functions(&self) -> bool {
true
}
}
4 changes: 4 additions & 0 deletions src/dialect/databricks.rs
@@ -51,6 +51,10 @@ impl Dialect for DatabricksDialect {
true
}

fn supports_parensless_lambda_functions(&self) -> bool {
true
}

// https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-qry-select.html#syntax
fn supports_select_wildcard_except(&self) -> bool {
true
9 changes: 9 additions & 0 deletions src/dialect/duckdb.rs
@@ -65,6 +65,15 @@ impl Dialect for DuckDbDialect {
true
}

/// See https://duckdb.org/docs/sql/functions/lambda.html
fn supports_lambda_functions(&self) -> bool {
true
}

fn supports_parensless_lambda_functions(&self) -> bool {
true
}

// DuckDB is compatible with PostgreSQL syntax for this statement,
// although not all features may be implemented.
fn supports_explain_with_utility_options(&self) -> bool {
4 changes: 4 additions & 0 deletions src/dialect/generic.rs
@@ -84,6 +84,10 @@ impl Dialect for GenericDialect {
true
}

fn supports_lambda_functions(&self) -> bool {
true
}

fn allow_extract_custom(&self) -> bool {
true
}
11 changes: 10 additions & 1 deletion src/dialect/mod.rs
@@ -340,12 +340,21 @@ pub trait Dialect: Debug + Any {
/// Returns true if the dialect supports lambda functions, for example:
///
/// ```sql
- /// SELECT transform(array(1, 2, 3), x -> x + 1); -- returns [2,3,4]
+ /// SELECT transform(array(1, 2, 3), (x) -> x + 1); -- returns [2,3,4]
/// ```
fn supports_lambda_functions(&self) -> bool {
false
}

/// Returns true if the dialect supports lambda functions without parentheses for a single argument, for example:
///
/// ```sql
/// SELECT transform(array(1, 2, 3), x -> x + 1); -- returns [2,3,4]
/// ```
fn supports_parensless_lambda_functions(&self) -> bool {
Contributor

If I understand correctly, the idea behind introducing this method is to avoid a syntax conflict in the Generic dialect caused by its pg json syntax support? If so, I'm thinking it could make more sense to turn off supports_lambda_functions for the Generic dialect instead. The idea with that dialect is that it gets feature support by default only if there is no conflicting syntax. So if it's not expected that a dialect supports (x) -> y but not x -> y, then maybe the Generic dialect shouldn't support lambdas after all.

Contributor Author

You got it right. I think your point makes sense and I'm fine with turning off supports_lambda_functions for the Generic dialect, but should supports_parensless_lambda_functions be removed from the Dialect trait too? I want to run DataFusion with support for both lambdas (even if with limited syntax) and the pg json syntax from datafusion-functions-json, and supports_parensless_lambda_functions lets me use a custom dialect to do so. In the worst case I can run without the pg syntax and support only direct function calls (json_get, etc.).

cc @samuelcolvin in case you have interest in this too
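
The custom-dialect idea described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the sqlparser API: the `Dialect` trait here is a toy stand-in that only mirrors the two method names from this diff, and `JsonAndLambdaDialect` and `bare_arrow_meaning` are hypothetical names introduced for the example.

```rust
// Toy stand-in for a dialect trait; this is NOT the sqlparser API,
// it only mirrors the two method names discussed in this thread.
trait Dialect {
    /// Lambdas with parenthesized parameters, e.g. `(x) -> x + 1`.
    fn supports_lambda_functions(&self) -> bool {
        false // off unless a dialect opts in
    }

    /// Single-parameter lambdas without parentheses, e.g. `x -> x + 1`.
    fn supports_parensless_lambda_functions(&self) -> bool {
        false
    }
}

/// Hypothetical custom dialect: parenthesized lambdas are enabled, while a
/// bare `x -> y` is left to the pg json operator.
struct JsonAndLambdaDialect;

impl Dialect for JsonAndLambdaDialect {
    fn supports_lambda_functions(&self) -> bool {
        true
    }
    // supports_parensless_lambda_functions keeps its default (false).
}

/// Describes how a bare `x -> y` would be read under the given flags.
fn bare_arrow_meaning(d: &dyn Dialect) -> &'static str {
    if d.supports_parensless_lambda_functions() {
        "lambda"
    } else {
        "json access"
    }
}

fn main() {
    let d = JsonAndLambdaDialect;
    println!("(x) -> y lambdas enabled: {}", d.supports_lambda_functions());
    println!("bare x -> y is read as: {}", bare_arrow_meaning(&d));
}
```

Note that this split only settles the bare `x -> y` form. As the replies that follow point out, `(x) -> y` is also valid pg json syntax, so the two flags alone do not remove the ambiguity.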

Contributor

I don't have much context here, personally I think we'll want to switch off lambdas.

Contributor

Hmm, given the expressions x->y or (x)->y: if a dialect supports both the pg json -> operator and the lambda syntax, that seems to imply an ambiguous grammar (there doesn't seem to be a way to tell what either expression should be parsed into)?

My thinking was indeed to potentially turn off lambdas for the generic dialect and remove supports_parensless_lambda_functions. The latter sounds like we'd be introducing a parser behavior that is only relevant in certain combinations, covered neither by any of the parser's supported dialects nor by a SQL spec for reference, which intuitively feels like a slippery slope.
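
To make the ambiguity concrete, here is a small self-contained sketch. It is a toy, not sqlparser's parser: the `Expr` enum, `parse_arrow`, and the `arrow_is_lambda` flag are hypothetical names chosen for illustration. The point it tries to show is that the same input `x -> y` yields two different trees, and only an external flag, never the input itself, picks between them.

```rust
#[derive(Debug, PartialEq)]
enum Expr {
    Ident(String),
    /// pg-json-style access: `left -> right`.
    JsonArrow(Box<Expr>, Box<Expr>),
    /// Lambda: `param -> body`.
    Lambda { param: String, body: Box<Expr> },
}

/// Parses exactly `<name> -> <name>`. `arrow_is_lambda` stands in for a
/// dialect flag; nothing in the input itself selects a branch.
fn parse_arrow(input: &str, arrow_is_lambda: bool) -> Option<Expr> {
    let mut parts = input.split("->").map(str::trim);
    let (left, right) = (parts.next()?, parts.next()?);
    // Reject anything that is not exactly one arrow with two operands.
    if parts.next().is_some() || left.is_empty() || right.is_empty() {
        return None;
    }
    let rhs = Box::new(Expr::Ident(right.to_string()));
    Some(if arrow_is_lambda {
        Expr::Lambda { param: left.to_string(), body: rhs }
    } else {
        Expr::JsonArrow(Box::new(Expr::Ident(left.to_string())), rhs)
    })
}

fn main() {
    // Same text, two incompatible readings.
    println!("{:?}", parse_arrow("x -> y", true));
    println!("{:?}", parse_arrow("x -> y", false));
}
```

A dialect that wanted both readings at once would need some rule beyond the token stream to choose, which is the concern raised in the comment above.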

Contributor

Agreed. The lambda syntax looks identical to Pg JSON lookup syntax with a completely different meaning.

Contributor Author

Agreed. supports_parensless_lambda_functions is removed and lambda support is restricted to ClickHouse, Databricks and DuckDB. I had actually forgotten about the nested expr (expr), so yes, this conflicted with pg; it just wasn't caught by any test. Plain wrong on my part. Thanks 🙏

false
}

/// Returns true if the dialect supports method calls, for example:
///
/// ```sql
2 changes: 1 addition & 1 deletion src/parser/mod.rs
@@ -1268,7 +1268,7 @@ impl<'a> Parser<'a> {
value: self.parse_introduced_string_value()?,
})
}
- Token::Arrow if self.dialect.supports_lambda_functions() => {
+ Token::Arrow if self.dialect.supports_parensless_lambda_functions() => {
self.expect_token(&Token::Arrow)?;
Ok(Expr::Lambda(LambdaFunction {
params: OneOrManyWithParens::One(w.clone().into_ident(w_span)),
95 changes: 95 additions & 0 deletions tests/sqlparser_common.rs
@@ -13285,3 +13285,98 @@ fn test_trailing_commas_in_from() {
"SELECT 1, 2 FROM (SELECT * FROM t1), (SELECT * FROM t2)",
);
}

#[test]
fn test_lambdas() {
Contributor Author

Moved from databricks tests

let dialects = all_dialects_where(|d| d.supports_lambda_functions());

#[rustfmt::skip]
let sql = concat!(
"SELECT array_sort(array('Hello', 'World'), ",
"(p1, p2) -> CASE WHEN p1 = p2 THEN 0 ",
"WHEN reverse(p1) < reverse(p2) THEN -1 ",
"ELSE 1 END)",
);
pretty_assertions::assert_eq!(
SelectItem::UnnamedExpr(call(
"array_sort",
[
call(
"array",
[
Expr::Value(Value::SingleQuotedString("Hello".to_owned())),
Expr::Value(Value::SingleQuotedString("World".to_owned()))
]
),
Expr::Lambda(LambdaFunction {
params: OneOrManyWithParens::Many(vec![Ident::new("p1"), Ident::new("p2")]),
body: Box::new(Expr::Case {
operand: None,
conditions: vec![
Expr::BinaryOp {
left: Box::new(Expr::Identifier(Ident::new("p1"))),
op: BinaryOperator::Eq,
right: Box::new(Expr::Identifier(Ident::new("p2")))
},
Expr::BinaryOp {
left: Box::new(call(
"reverse",
[Expr::Identifier(Ident::new("p1"))]
)),
op: BinaryOperator::Lt,
right: Box::new(call(
"reverse",
[Expr::Identifier(Ident::new("p2"))]
))
}
],
results: vec![
Expr::Value(number("0")),
Expr::UnaryOp {
op: UnaryOperator::Minus,
expr: Box::new(Expr::Value(number("1")))
}
],
else_result: Some(Box::new(Expr::Value(number("1"))))
})
})
]
)),
dialects.verified_only_select(sql).projection[0]
);

dialects.verified_expr(
"map_zip_with(map(1, 'a', 2, 'b'), map(1, 'x', 2, 'y'), (k, v1, v2) -> concat(v1, v2))",
);
dialects.verified_expr("transform(array(1, 2, 3), (x) -> x + 1)");
}

#[test]
fn test_parensless_lambdas() {
let dialects = all_dialects_where(|d| d.supports_parensless_lambda_functions());

pretty_assertions::assert_eq!(
call(
"transform",
[
call(
"array",
[
Expr::Value(Value::Number("1".to_owned(), false)),
Expr::Value(Value::Number("2".to_owned(), false)),
Expr::Value(Value::Number("3".to_owned(), false)),
]
),
Expr::Lambda(LambdaFunction {
params: OneOrManyWithParens::One(Ident::new("x")),
body: Box::new(Expr::BinaryOp {
left: Box::new(Expr::Identifier(Ident::new("x"))),
op: BinaryOperator::Plus,
right: Box::new(Expr::Value(Value::Number("1".to_owned(), false)))
})
})
]
),
dialects.verified_expr("transform(array(1, 2, 3), x -> x + 1)")
);
}
63 changes: 0 additions & 63 deletions tests/sqlparser_databricks.rs
@@ -83,69 +83,6 @@ fn test_databricks_exists() {
);
}

#[test]
Contributor Author

Moved into common tests

fn test_databricks_lambdas() {
#[rustfmt::skip]
let sql = concat!(
"SELECT array_sort(array('Hello', 'World'), ",
"(p1, p2) -> CASE WHEN p1 = p2 THEN 0 ",
"WHEN reverse(p1) < reverse(p2) THEN -1 ",
"ELSE 1 END)",
);
pretty_assertions::assert_eq!(
SelectItem::UnnamedExpr(call(
"array_sort",
[
call(
"array",
[
Expr::Value(Value::SingleQuotedString("Hello".to_owned())),
Expr::Value(Value::SingleQuotedString("World".to_owned()))
]
),
Expr::Lambda(LambdaFunction {
params: OneOrManyWithParens::Many(vec![Ident::new("p1"), Ident::new("p2")]),
body: Box::new(Expr::Case {
operand: None,
conditions: vec![
Expr::BinaryOp {
left: Box::new(Expr::Identifier(Ident::new("p1"))),
op: BinaryOperator::Eq,
right: Box::new(Expr::Identifier(Ident::new("p2")))
},
Expr::BinaryOp {
left: Box::new(call(
"reverse",
[Expr::Identifier(Ident::new("p1"))]
)),
op: BinaryOperator::Lt,
right: Box::new(call(
"reverse",
[Expr::Identifier(Ident::new("p2"))]
))
}
],
results: vec![
Expr::Value(number("0")),
Expr::UnaryOp {
op: UnaryOperator::Minus,
expr: Box::new(Expr::Value(number("1")))
}
],
else_result: Some(Box::new(Expr::Value(number("1"))))
})
})
]
)),
databricks().verified_only_select(sql).projection[0]
);

databricks().verified_expr(
"map_zip_with(map(1, 'a', 2, 'b'), map(1, 'x', 2, 'y'), (k, v1, v2) -> concat(v1, v2))",
);
databricks().verified_expr("transform(array(1, 2, 3), x -> x + 1)");
}

#[test]
fn test_values_clause() {
let values = Values {
10 changes: 5 additions & 5 deletions tests/sqlparser_postgres.rs
@@ -2749,7 +2749,7 @@ fn test_json() {
);

let sql = "SELECT params -> 'name' FROM events";
- let select = pg().verified_only_select(sql);
+ let select = pg_and_generic().verified_only_select(sql);
assert_eq!(
SelectItem::UnnamedExpr(Expr::BinaryOp {
left: Box::new(Expr::Identifier(Ident::new("params"))),
@@ -2760,7 +2760,7 @@ fn test_json() {
);

let sql = "SELECT info -> 'items' ->> 'product' FROM orders";
- let select = pg().verified_only_select(sql);
+ let select = pg_and_generic().verified_only_select(sql);
assert_eq!(
SelectItem::UnnamedExpr(Expr::BinaryOp {
left: Box::new(Expr::BinaryOp {
@@ -2778,7 +2778,7 @@ fn test_json() {

// the RHS can be a number (array element access)
let sql = "SELECT obj -> 42";
- let select = pg().verified_only_select(sql);
+ let select = pg_and_generic().verified_only_select(sql);
assert_eq!(
SelectItem::UnnamedExpr(Expr::BinaryOp {
left: Box::new(Expr::Identifier(Ident::new("obj"))),
@@ -2790,7 +2790,7 @@ fn test_json() {

// the RHS can be an identifier
let sql = "SELECT obj -> key";
- let select = pg().verified_only_select(sql);
+ let select = pg_and_generic().verified_only_select(sql);
assert_eq!(
SelectItem::UnnamedExpr(Expr::BinaryOp {
left: Box::new(Expr::Identifier(Ident::new("obj"))),
@@ -2802,7 +2802,7 @@ fn test_json() {

// -> operator has lower precedence than arithmetic ops
let sql = "SELECT obj -> 3 * 2";
- let select = pg().verified_only_select(sql);
+ let select = pg_and_generic().verified_only_select(sql);
assert_eq!(
SelectItem::UnnamedExpr(Expr::BinaryOp {
left: Box::new(Expr::Identifier(Ident::new("obj"))),