Allow numeric tokens containing 'e' that aren't exponents be passed to proc macros #111615
Comments
Chiming in that this is also absolutely a problem for our project (github.com/makepad/makepad), and we would really appreciate a fix! All proc-macro embedded UI systems for Rust need this.
This error comes out at
macro_rules! print_token {
    ($x:tt) => { println!("{}", stringify!($x)) }
}
fn main() {
    print_token!("hello"_);
}
Output:
This does not seem easy to fix: we would need to change the compiler to report this kind of error after macro expansion.
I have a partial implementation that works for most literals involving an 'e', including CSS colours of the form
From an implementation point of view it shouldn't be hard, but the change will need sign-off from the lang team. The relevant section of the Reference will need changing; in particular the "Reserved forms similar to number literals" section as well as the SUFFIX_NO_E nonterminal in the grammar. Working through all this will take some time. @chenyukang
Integers with arbitrary suffixes are allowed as inputs to proc macros. A number of real-world crates use this capability in interesting ways, as seen in rust-lang#103872. For example:
- Suffixes representing units, such as `8bits`, `100px`, `20ns`, `30GB`
- CSS hex colours such as `#7CFC00` (LawnGreen)
- UUIDs, e.g. `785ada2c-f2d0-11fd-3839-b3104db0cb68`

The hex cases may be surprising.
- `#7CFC00` is tokenized as a `#` followed by a `7` integer with a `CFC00` suffix.
- `785ada2c` is tokenized as a `785` integer with an `ada2c` suffix.
- `f2d0` is tokenized as an identifier.
- `3839` is tokenized as an integer literal.

A proc macro will immediately stringify such tokens and reparse them itself, and so won't care that the token types vary. All suffixes must be consumed by the proc macro, of course; the only suffixes allowed after macro expansion are the numeric ones like `u8`, `i32`, and `f64`.

Currently there is an annoying inconsistency in how integer literal suffixes are handled, which is that no suffix starting with `e` is allowed, because that is interpreted as a float literal with an exponent. For example:
- Units: `1eV` and `1em`
- CSS colours: `#90EE90` (LightGreen)
- UUIDs: `785ada2c-f2d0-11ed-3839-b3104db0cb68`

In each case, a sequence of digits followed by an 'e' or 'E' followed by a letter results in an "expected at least one digit in exponent" error. This is an annoying inconsistency in general, and a problem in practice. It's likely that some users haven't realized this inconsistency because they've gotten lucky and never used a token with an 'e' that causes problems. Other users *have* noticed; it's causing problems when embedding DSLs into proc macros, as seen in rust-lang#111615, where the CSS colours case is causing problems for two different UI frameworks (Slint and Makepad).

We can do better. This commit changes the lexer so that, when it hits a possible exponent, it looks ahead and only produces an exponent if a valid one is present. Otherwise, it produces a non-exponent form, which may be a single token (e.g. `1eV`) or multiple tokens (e.g. `1e+a`).

Consequences of this:
- All the proc macro problem cases mentioned above are fixed.
- The "expected at least one digit in exponent" error is no longer possible. A few tests that only worked in the presence of that error have been removed.
- The lexer requires unbounded lookahead due to the presence of '_' chars in exponents. E.g. to distinguish `1e+_______3` (a float literal with exponent) from `1e+_______a` (previously invalid, but now tokenised as `1e`, `+`, `_______a`).

This is a backwards compatible language change: all existing valid programs will be treated in the same way, and some previously invalid programs will become valid.

The tokens chapter of the language reference (https://doc.rust-lang.org/reference/tokens.html) will need changing to account for this. In particular, the "Reserved forms similar to number literals" section will need updating, and grammar rules involving the SUFFIX_NO_E nonterminal will need adjusting.

Fixes rust-lang#111615.
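To make the lookahead rule in the commit message concrete, here is a minimal sketch in Rust. It is not the rustc lexer: `exponent_len` and `number_token_len` are hypothetical helpers, and the sketch only handles decimal digits, an optional exponent, and an ASCII suffix (no `.` fractions, no `0x`/`0b` prefixes). It assumes the input starts with a digit.

```rust
/// Hypothetical helper: byte length of a valid exponent at the start of
/// `rest` (e.g. `e3`, `E+__3`), or `None` if no valid exponent is present.
fn exponent_len(rest: &str) -> Option<usize> {
    let b = rest.as_bytes();
    let at = |i: usize| b.get(i).copied();
    // An exponent must begin with `e` or `E`.
    if !matches!(at(0), Some(b'e' | b'E')) {
        return None;
    }
    let mut i = 1;
    // Optional sign.
    if matches!(at(i), Some(b'+' | b'-')) {
        i += 1;
    }
    // Unbounded lookahead: skip any number of leading underscores.
    while at(i) == Some(b'_') {
        i += 1;
    }
    // A valid exponent needs at least one digit at this point.
    if !matches!(at(i), Some(c) if c.is_ascii_digit()) {
        return None;
    }
    // Consume the remaining digits and underscores of the exponent.
    while matches!(at(i), Some(c) if c.is_ascii_digit() || c == b'_') {
        i += 1;
    }
    Some(i)
}

/// Hypothetical helper: byte length of the numeric token at the start of
/// `input`, using the "only produce an exponent if a valid one is present" rule.
fn number_token_len(input: &str) -> usize {
    let b = input.as_bytes();
    let mut i = 0;
    // Leading decimal digits (with `_` separators).
    while i < b.len() && (b[i].is_ascii_digit() || b[i] == b'_') {
        i += 1;
    }
    // Commit to an exponent only when the lookahead found a valid one.
    if let Some(n) = exponent_len(&input[i..]) {
        i += n;
    }
    // Whatever identifier-like characters follow are treated as the suffix.
    while i < b.len() && (b[i].is_ascii_alphanumeric() || b[i] == b'_') {
        i += 1;
    }
    i
}
```

Under these assumptions, `number_token_len("1eV")` covers the whole input as a single token, while `number_token_len("1e+a")` stops after `1e`, leaving `+` and `a` to be lexed as separate tokens.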
This comment was marked as resolved.
Exponents with leading underscores, such as `1e_3`, `1E+__3`, `1e-_________3_3`:
- They are ugly and never used in practice. (The test suite and compiler code have no examples of them.)
- They don't match normal decimal literals. (You can't write `x = _3;`.)
- They complicate attempts to allow integers with suffixes beginning with `e`, such as `1em` (currently disallowed, but desired in rust-lang#111615). When given a char sequence like `1e`, the lexer must decide whether what follows the `e` is a decimal integer (in which case it's a float with exponent) or something else (in which case it's an integer with a suffix). But unbounded char lookahead is required to get past the possibly unlimited number of leading underscores.

Disallowing the leading underscores reduces the lookahead to two: one for a possible `+`/`-`, and then one more for a digit or non-digit.
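The two-character lookahead can be sketched as follows. This is only an illustration of the rule as stated in the commit message, not compiler code; `starts_exponent` is a hypothetical name, and `rest` is assumed to be the input immediately after the leading digits.

```rust
/// Hypothetical helper: decides whether `rest` begins a float exponent,
/// assuming leading underscores in exponents are disallowed.
/// After the `e`/`E`, at most two more characters are inspected.
fn starts_exponent(rest: &str) -> bool {
    let mut it = rest.chars();
    // The candidate exponent must begin with `e` or `E`.
    if !matches!(it.next(), Some('e' | 'E')) {
        return false;
    }
    match it.next() {
        // First lookahead char is a sign: the second must be a digit.
        Some('+' | '-') => matches!(it.next(), Some(c) if c.is_ascii_digit()),
        // No sign: the first lookahead char itself must be a digit.
        Some(c) => c.is_ascii_digit(),
        None => false,
    }
}
```

With this rule, `e3` and `e-10` are recognised as exponents, while `em`, `e_3`, and `e+_3` are not, so the lexer never has to scan past a run of underscores to make the decision.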
I'm going to re-nominate this for lang. To the best of my knowledge, the primary tradeoff here is that it would prevent us from ever defining any integer suffix that starts with
That seems like a potentially reasonable restriction, and I don't think it will arise in practice.
This does not follow from the proposal. There would be no issue with defining such a suffix. Any restrictions to it just need to be applied after macro expansion. For example, exactly the way that
Deferring the restrictions until after macro expansion would work, that's true. And since we already do that in some cases, it's reasonable to expect that we'll do it in future cases as well.
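As a rough illustration of what "applying the restriction after macro expansion" could look like, here is a small sketch. It is an assumption-laden model, not how rustc is structured: `split_suffix` and `check_after_expansion` are hypothetical names, the suffix list contains only the stable built-in numeric suffixes, and the input is assumed to be the text of a decimal integer literal that the lexer has already classified as "digits plus suffix".

```rust
/// Stable built-in numeric suffixes (listed here as an assumption of the sketch).
const BUILTIN_SUFFIXES: &[&str] = &[
    "u8", "u16", "u32", "u64", "u128", "usize",
    "i8", "i16", "i32", "i64", "i128", "isize",
    "f32", "f64",
];

/// Hypothetical helper: splits the text of a decimal integer literal into
/// its digits (with `_` separators) and its suffix.
fn split_suffix(lit: &str) -> (&str, &str) {
    let end = lit
        .find(|c: char| !(c.is_ascii_digit() || c == '_'))
        .unwrap_or(lit.len());
    lit.split_at(end)
}

/// Hypothetical post-expansion check: literals that survive expansion may
/// only carry a built-in suffix; anything else is rejected at this stage.
fn check_after_expansion(lit: &str) -> Result<(), String> {
    let (_, suffix) = split_suffix(lit);
    if suffix.is_empty() || BUILTIN_SUFFIXES.iter().any(|s| *s == suffix) {
        Ok(())
    } else {
        Err(format!("invalid suffix `{}` for number literal", suffix))
    }
}
```

In this model the lexer accepts `1em` and hands it to macros unchanged, and the literal is only rejected if it is still present once expansion has finished. A future built-in suffix would simply be added to the list.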
Although hex strings were the focus of the original comment, a more general solution would be ideal, allowing cases like this to be supported as well:
#131656 should close this issue as fixed.
I tried this code:
I expected to see this happen: It should compile and print the token verbatim. If unknown/invalid suffixes are OK, invalid floats should be too, because the macro DSL may use them for something else.
Instead, this happened: Compilation error:
(tested with Rust 1.69 as well as current nightly; this has always been a problem)
Context
This was discussed with @nnethercote and others last week, and I wanted to make sure there is a record of it.
The use case is making Slint's color literals work in the `slint!` macro DSL. Currently one can write `color: #0e9` but not `color: #0ea` in the `slint!` macro. The Makepad DSL has to work around the same issue.
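For reference, the cases that already work can be reproduced with a small `macro_rules!` wrapper around `stringify!`. The macro name `token_text` is made up for this sketch; it only shows that a literal with an unknown suffix, or a float literal with a valid exponent, reaches the macro as a single token.

```rust
/// Hypothetical macro: returns the source text of a single token tree.
macro_rules! token_text {
    ($x:tt) => {
        stringify!($x)
    };
}

/// Unit-like suffixes pass through untouched because the macro consumes them.
fn unit_examples() -> [&'static str; 2] {
    [token_text!(100px), token_text!(20ns)]
}

/// `0e9` is a valid float literal (zero with exponent 9), which is why
/// `#0e9` already works as a colour while `#0ea` does not.
fn colour_digits_example() -> &'static str {
    token_text!(0e9)
}
```

Replacing `0e9` with `0ea` in the last function reproduces the compilation error described in this report on compilers without the fix.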