Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support underscore separators in numbers for Clickhouse. Fixes #1659 #1677

Merged
merged 9 commits into from
Jan 28, 2025

Conversation

graup
Copy link
Contributor

@graup graup commented Jan 22, 2025

Fixes #1659

Adds support for underscore separators in numeric values by allowing them to appear among digits during tokenization.

This comes out of the parser as a Value with the same string content. I considered removing the underscore during parsing but since this project errs more on the side of "no semantics" I think it's maybe better to just keep it.

Not sure that additional test on the AST level is really needed, it basically just tests that the parser doesn't do anything with or throw on these expressions.

@graup graup marked this pull request as draft January 22, 2025 12:50
@graup graup marked this pull request as ready for review January 22, 2025 13:10
Copy link
Contributor

@iffyio iffyio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @graup! Left some minor comments, this looks good to me overall!

@graup
Copy link
Contributor Author

graup commented Jan 23, 2025

Thanks for the review @iffyio, I have applied your suggestions.

Copy link
Contributor

@iffyio iffyio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks @graup!
cc @alamb

@iffyio
Copy link
Contributor

iffyio commented Jan 23, 2025

@graup could you take a look at the ci failures when you get a chance?

@graup
Copy link
Contributor Author

graup commented Jan 23, 2025

Thanks, should be fixed. (I found the helpful HINT: use test_utils::number 😄)

@iffyio
Copy link
Contributor

iffyio commented Jan 24, 2025

@graup there seems to a failing test on the PR

@graup
Copy link
Contributor Author

graup commented Jan 24, 2025

Yeah, seems like this number behavior is still failing under the bignumbers feature. I can reproduce the failing test locally with cargo test --all-features. I don't know enough to fix this sadly :/

src/tokenizer.rs Outdated

s += &peeking_take_while(chars, |ch| {
ch.is_ascii_digit()
|| self.dialect.supports_numeric_literal_underscores() && ch == '_'
Copy link
Contributor

@hansott hansott Jan 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if starts or ends with _?

I think ClickHouse rejects that: https://github.com/ClickHouse/ClickHouse/blob/master/src/Common/StringUtils.h#L171-L182

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be worth having a test to demonstrate the behavior with a trailing underscore

@mwylde
Copy link

mwylde commented Jan 24, 2025

This is great! Underscores are also supported in Postgres 16 (https://www.postgresql.org/docs/16/release-16.html#RELEASE-16-DATATYPES), so maybe it could be added to the postgres dialect as well?

@graup
Copy link
Contributor Author

graup commented Jan 24, 2025

Thanks for the comments! Seems Postgres (patch) is using the same syntax rule as Clickhouse (code). I'll fix this up on Monday.

In the meantime, if anyone has an idea how to address the test failure, I'd love some help with that.

Maybe it's better to drop the underscores in the parsing stage, after all?

@graup
Copy link
Contributor Author

graup commented Jan 27, 2025

I fixed the test and implemented the syntax logic (one underscore at time, must be followed by another digit). We could consider throwing a syntax error (as the engines that supports this do), but on the other hand, we also didn't throw an error before – I have added a test that verifies the previous behavior).

Copy link
Contributor

@iffyio iffyio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @graup! One minor edit on the doc formatting otherwise I think this looks good!

@iffyio iffyio merged commit 269967a into apache:main Jan 28, 2025
9 checks passed
iffyio added a commit to validio-io/sqlparser-rs that referenced this pull request Jan 28, 2025
Minor follow up from apache#1677
Support was added for both Clickhouse and Postgres but the
test only covered Clickhouse. This move the test to common
and to use the `all_dialects_where` for auto test coverage.
@graup graup deleted the support-underscore-numbers branch January 31, 2025 10:50
Vedin pushed a commit to Embucket/datafusion-sqlparser-rs that referenced this pull request Feb 3, 2025
Vedin pushed a commit to Embucket/datafusion-sqlparser-rs that referenced this pull request Feb 3, 2025
Vedin added a commit to Embucket/datafusion-sqlparser-rs that referenced this pull request Feb 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Clickhouse: Support underscore in numbers
4 participants