-
Notifications
You must be signed in to change notification settings - Fork 583
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support underscore separators in numbers for Clickhouse. Fixes #1659 #1677
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @graup! Left some minor comments, this looks good to me overall!
Thanks for the review @iffyio, I have applied your suggestions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@graup could you take a look at the ci failures when you get a chance? |
Thanks, should be fixed. (I found the helpful |
@graup there seems to a failing test on the PR |
Yeah, seems like this number behavior is still failing under the bignumbers feature. I can reproduce the failing test locally with |
src/tokenizer.rs
Outdated
|
||
s += &peeking_take_while(chars, |ch| { | ||
ch.is_ascii_digit() | ||
|| self.dialect.supports_numeric_literal_underscores() && ch == '_' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if starts or ends with _
?
I think ClickHouse rejects that: https://github.com/ClickHouse/ClickHouse/blob/master/src/Common/StringUtils.h#L171-L182
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could be worth having a test to demonstrate the behavior with a trailing underscore
This is great! Underscores are also supported in Postgres 16 (https://www.postgresql.org/docs/16/release-16.html#RELEASE-16-DATATYPES), so maybe it could be added to the postgres dialect as well? |
Thanks for the comments! Seems Postgres (patch) is using the same syntax rule as Clickhouse (code). I'll fix this up on Monday. In the meantime, if anyone has an idea how to address the test failure, I'd love some help with that. Maybe it's better to drop the underscores in the parsing stage, after all? |
# Conflicts: # src/dialect/postgresql.rs
I fixed the test and implemented the syntax logic (one underscore at time, must be followed by another digit). We could consider throwing a syntax error (as the engines that supports this do), but on the other hand, we also didn't throw an error before – I have added a test that verifies the previous behavior). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @graup! One minor edit on the doc formatting otherwise I think this looks good!
Minor follow up from apache#1677 Support was added for both Clickhouse and Postgres but the test only covered Clickhouse. This move the test to common and to use the `all_dialects_where` for auto test coverage.
…apache#1659 (apache#1677)" This reverts commit 8fb7a02.
Fixes #1659
Adds support for underscore separators in numeric values by allowing them to appear among digits during tokenization.
This comes out of the parser as a Value with the same string content. I considered removing the underscore during parsing but since this project errs more on the side of "no semantics" I think it's maybe better to just keep it.
Not sure that additional test on the AST level is really needed, it basically just tests that the parser doesn't do anything with or throw on these expressions.