refactor(ast): squash nested enums #3115
Conversation
You always teach me things that I don't understand, so ahh ... shippit?
CodSpeed Performance Report: Merging #3115 will not alter performance.
Force-pushed from c5d511c to 6baa202
No perf gain for the time being. Please promise me that I'll see a huge perf gain in the future ❤️
This introduces a learning curve on the AST, I'll check with ... I like the changes to user-land code (simpler), but it introduces a barrier for downstream tool authors, so we'll need to at least put a mental image somewhere to show what it expands to. And again ... we probably need to start codegen-ing the AST at some point 😭
Perf is very slightly better as a few types have less indirection, but most of the AST remains unchanged. The purpose of this change isn't to unlock performance gains - it's just replicating what the compiler already does internally. But it does unlock features - ways of manipulating the AST outside of what Rust gives you natively.
Yes, you're right, I should add a lengthy comment in ... I have documented how the macro expands in `src/ast/macros.rs`, but you're right, the comments don't give the wider context or justification.
How do you mean "barrier for downstream tool authors"?
I agree it'd be good to start codegen-ing, as an alternative to macros without the compile-time cost. But is the AST the best candidate? It might have been a quick way to build out the AST types originally, but since it's all built now, and the AST is not undergoing much churn, I'm not sure what there is to gain now. On this PR, what could be good is to generate the ...
And thanks for considering accepting this. It's taken me 4 x 14-hour days to put this together, and I was worried you'd completely hate it!
```diff
@@ -41,8 +41,10 @@ impl<'a> GatherNodeParts<'a> for MemberExpression<'a> {
 impl<'a> GatherNodeParts<'a> for AssignmentTarget<'a> {
     fn gather<F: FnMut(Atom<'a>)>(&self, f: &mut F) {
         match self {
-            AssignmentTarget::SimpleAssignmentTarget(t) => t.gather(f),
-            AssignmentTarget::AssignmentTargetPattern(_) => {}
+            match_simple_assignment_target!(Self) => {
```
The only part I don't like about this is these match patterns; using this many macros gives me C PTSD. It is a bit less hygienic than what I would prefer. Otherwise, it looks good to me.
@overlookmotel Is there any specific case that these match macros solve? I'm having a hard time figuring out what we are trying to achieve with this syntax sugar. I think `x.is_simple_assignment_target()` is a much better interface to work with, since we won't have to know an obscure macro. In some cases we might want to write the variants manually instead of using `is_*` methods, but the user should decide, and I believe if they understand the `@inherit` then they would have no issues using it.

On another note, please consider using codegen not as a tool that makes writing code simpler, but as a tool for documentation, since most of the time I find myself reading the source code instead of documentation. That is my biggest issue with complex macros: they make following the code a bit obfuscated.
Having the definition at my fingertips is a superpower even AI tools wouldn't match. Comments get outdated, but the code is always valid. Seeing a macro definition when I'm looking for the code is almost always a turn-off for me, since I know it won't be simple code and I'll have to spend a few minutes just figuring out the macro and building a mental model of what it looks like at runtime.

This isn't really true for small macros, but as the macro complexity increases, so does the time a new developer needs to build a good mental model when using it.
Thanks for reviewing.

Yes, it's not ideal. There is no syntax for matching against multiple variants without either a macro, or writing out all the variants by hand. In the case of `Expression`, obviously the latter is impractical.

The alternative which does work is:

```rs
match self {
    _ if self.is_simple_assignment_target() => { }
    /* ... */
}
```

But the problem with that is it doesn't count as an exhaustive match.

I also suspect the macro (which expands to all the variants) may help the compiler to optimize the match, as that syntax allows the match arms to be re-ordered. So I used it everywhere, even where exhaustive matching isn't required and `_ if self.is_simple_assignment_target()` would work.
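As a minimal, self-contained sketch of the macro-as-pattern technique being discussed (the `Expr` type and `match_literal!` macro here are invented for illustration, much simpler than Oxc's real `match_expression!`): a declarative macro can expand to an or-pattern covering a group of variants, so the `match` stays exhaustive without listing every variant by hand.

```rust
// A toy enum standing in for an AST type (hypothetical, for illustration).
#[derive(Debug)]
enum Expr {
    Number(f64),
    String(String),
    Ident(String),
}

// Expands to an or-pattern covering the "literal" subset of variants.
macro_rules! match_literal {
    ($ty:ident) => {
        $ty::Number(_) | $ty::String(_)
    };
}

fn describe(e: &Expr) -> &'static str {
    // Exhaustive match with no `_ => {}` catch-all: if a new variant is
    // added to `Expr` later, the compiler flags this match as incomplete.
    match e {
        match_literal!(Expr) => "literal",
        Expr::Ident(_) => "identifier",
    }
}

fn main() {
    assert_eq!(describe(&Expr::Number(1.0)), "literal");
    assert_eq!(describe(&Expr::Ident("x".into())), "identifier");
    println!("ok");
}
```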
I tried to find a better way, but the only thing I could think of was a `match!` macro:

```rs
// Current
match self {
    match_expression!(Self) => self.to_expression().do_stuff(),
}

// With `match!` macro
match!(self, {
    Self::@Expression(expr) => expr.do_stuff(),
})
```

But I thought that was too much magic. Do you think the current syntax is so bad that we should reconsider that option?
Ah, I see, now I get your reasoning behind this.

I would love to see the performance difference between this macro and having the `is_*` methods with `#[inline(always)]`; I'm hoping that the compiler treats them the same after inlining the function call.

By the way, great work! There is a lot of thought behind this.
The above `match!` example would expand to:

```rs
match self {
    _temp @ match_expression!(Self) => {
        let expr = _temp.to_expression();
        expr.do_stuff()
    }
}
```
> The above `match!` example would expand to:
>
> ```rs
> match self {
>     _temp @ match_expression!(Self) => {
>         let expr = _temp.to_expression();
>         expr.do_stuff()
>     }
> }
> ```

FYI, I have no problem with the syntax of the first example (without `match!`). What I'm trying to say is that if the performance penalty of using methods over macros is small, we might as well do it that way, to avoid having these highly specialized match macros all over the place.
The `match!` macro idea is growing on me! Perhaps if the naming was more explicit, it would be less magical?

```rs
match_with_subtypes!(prop_key, {
    PropertyKey::StaticIdentifier(id) => id.do_stuff(),
    PropertyKey::PrivateIdentifier(id) => id.do_stuff(),
    PropertyKey::@subtype::Expression(expr) => expr.do_stuff(),
})
```

I know this goes in the opposite direction of your original comment - that you want fewer macros, not more! I'm just thinking of being able to write tight, expressive code without boilerplate. This at least looks more like a normal `match` expression.
> I would love to see the performance difference between this macro and having the `is_*` methods with `#[inline(always)]`; I'm hoping that the compiler treats them the same after inlining the function call.

I haven't tried it. Feel free to measure if you like. `_ if self.is_expression()` might be just as fast (even without `#[inline(always)]` - the compiler is very aggressive in inlining small functions anyway). Finding out would mostly just be find-and-replace.

However, `_ if self.is_expression()` doesn't work in an exhaustive match, which personally I think is a major downside. When your `match` has no catch-all `_ => {}` arm, it guarantees you've handled every case. If another variant is added to the enum later, then when using exhaustive matching, the compiler will flag everywhere you need to handle this new case. Not so if using `_ if self.is_expression()`.
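To illustrate the guard-arm limitation (a minimal sketch with invented types, not Oxc code): the compiler cannot see through a method call in a match guard, so the guard form compiles only with a catch-all arm, which is exactly what defeats exhaustiveness checking.

```rust
// Toy enum standing in for an AST type (hypothetical, for illustration).
enum Value {
    Small(u8),
    Big(u64),
}

impl Value {
    fn is_small(&self) -> bool {
        matches!(self, Value::Small(_))
    }
}

fn size_class(v: &Value) -> &'static str {
    match v {
        _ if v.is_small() => "small",
        Value::Big(_) => "big",
        // Required: the compiler can't prove `is_small()` covers
        // `Value::Small`, so without this catch-all the match is rejected
        // as non-exhaustive - and a variant added later goes unnoticed here.
        _ => unreachable!(),
    }
}

fn main() {
    assert_eq!(size_class(&Value::Small(1)), "small");
    assert_eq!(size_class(&Value::Big(2)), "big");
    println!("ok");
}
```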
There are 2 parts to these macros. One side is the generation, which is done with macros like `inherit_variants!`. I strongly think these should be done with codegen, or at least a proc macro - yes, although it is slower to build, it is more friendly with tools. As it is right now, it messes up the LSP, and rust-analyzer features like jump-to-definition, documentation, and completion suggestions are broken in a bunch of scenarios.

As for the match macros, I think we can live with or without them. Most cases can be written manually (except the `Statement` and `Expression` enums, which are huge), so maybe we can limit the scope of these to bigger types?

To be honest, I think the match macros can be used without any issues (other than the learning curve) as long as we are generating the AST, so we can have the code and tooling around it. If we generate everything, then we only have the match macros to worry about, and they don't do `@inherit`/`@subtype` magic - they are just copying the boilerplate for you, which is a totally valid use-case in my opinion. If that's not the right use for a macro, then what is, after all?
I've marked this as unresolved just so we can see and don't forget these considerations.
Force-pushed from 6baa202 to 25da804
Ah I see. Yes, not good. If
OK, great. If the intent is to merge, it would be ideal if you'd be able to do that as soon as possible. This PR touches so much of the codebase that it will very quickly end up with merge conflicts. I promise to follow up in the next few days with:

Does that work? (I am nervous to hit the "Merge" button myself, as this is such a wide-reaching change)
Thanks Boshen! Really happy to have this merged.
Add more docs for AST type enum inheritance and the `inherit_variants!` macro. This covers the changes made in #3115.
OK, this is a big one...

I have done this as part of work on Traversable AST, but I believe it has wider benefits, so thought better to spin it off into its own PR.

## What this PR does

This PR squashes all nested AST enum types (oxc-project#2685). e.g.:

Previously:

```rs
pub enum Statement<'a> {
    BlockStatement(Box<'a, BlockStatement<'a>>),
    /* ...other Statement variants... */
    Declaration(Declaration<'a>),
}

pub enum Declaration<'a> {
    VariableDeclaration(Box<'a, VariableDeclaration<'a>>),
    /* ...other Declaration variants... */
}
```

After this PR:

```rs
#[repr(C, u8)]
pub enum Statement<'a> {
    BlockStatement(Box<'a, BlockStatement<'a>>) = 0,
    /* ...other Statement variants... */
    VariableDeclaration(Box<'a, VariableDeclaration<'a>>) = 32,
    /* ...other Declaration variants... */
}

#[repr(C, u8)]
pub enum Declaration<'a> {
    VariableDeclaration(Box<'a, VariableDeclaration<'a>>) = 32,
    /* ...other Declaration variants... */
}
```

All `Declaration`'s variants are combined into `Statement`, but the `Declaration` type still exists.

As both types are `#[repr(C, u8)]`, and the discriminants are aligned, a `Declaration` can be transmuted to a `Statement` at zero cost.

This is the same thing as oxc-project#2847, but here applied to *all* nested enums in the AST, and with improved helper methods.

No enums increase in size, and a few get smaller. Indirection is reduced for some types (this removes multiple levels of boxing).

## Why?

1. It is a prerequisite for Traversable AST (oxc-project#2987).
2. It would help a lot with AST Transfer (oxc-project#2409) - it solves the only remaining blocker for this.
3. It is a step closer to making the whole AST `#[repr(C)]`.

## Why is it a good thing for the AST to be `#[repr(C)]`?

Oxc's direction appears to be increasingly to build up control over the fundamental primitives we use, in order to unlock performance and features. We have our own allocator, our own custom implementations for `Box` and `Vec`, our own `IndexVec` (TBC). The AST is the central building block of Oxc, and taking control of its memory layout feels like a step in this same direction.

Oxc has a major advantage over other similar libraries in that it keeps all the AST data in an arena. This opens the door to treating the AST either as Rust types or as *pure data* (just bytes). That data can be moved around and manipulated beyond what Rust natively allows.

However, to enable that, the types need to be well-specified, with completely stable layouts. `#[repr(C)]` is the only tool Rust provides to do this.

Once the types are `#[repr(C)]`, various features become possible:

1. Cheap transfer of the AST across boundaries without ser/deser - the property used by AST Transfer.
2. Having multiple versions of the AST (standard, read-only, traversable), which can be converted to one another at zero cost via transmute - the property used by the Traversable AST scheme.
3. Caching AST data on disk (oxc-project#3079) or transferring it across the network.
4. Stuff we haven't thought of yet!

Allowing the AST to be treated as pure data will likely unlock other "next level" features further down the track (caching for "edge bundling" comes to mind).

## The problem with `#[repr(C)]`

It's not *required* to squash nested enums to make the AST `#[repr(C)]`. But the problem with `#[repr(C)]` is that it disables some compiler optimizations. Without `#[repr(C)]`, the compiler squashes enums itself in some cases (which is how `Statement` is currently 16 bytes). But making the types `#[repr(C)]` as they are currently disables this optimization.

So this PR essentially makes explicit what the compiler is already doing - and in fact goes a bit further with the optimization than the compiler is able to, in squashing 3 or 4 layers of nested enums (the compiler only does up to 2 layers).

## Implementation

One enum "inheriting" variants from another is implemented with the `inherit_variants!` macro.

```rs
inherit_variants! {
    #[repr(C, u8)]
    pub enum Statement<'a> {
        BlockStatement(Box<'a, BlockStatement<'a>>),
        /* ...other Statement variants... */
        // `Declaration` variants added here by `inherit_variants!` macro
        @inherit Declaration
        // `ModuleDeclaration` variants added here by `inherit_variants!` macro
        @inherit ModuleDeclaration
    }
}
```

The macro is *fairly* lightweight, and I think the above is quite easy to understand. No proc macros.

The macro also implements utility methods for converting between enums e.g. `Statement::as_declaration`. These methods are all zero-cost (essentially transmutes).

New patterns for dealing with nested enums are introduced:

Creation:

```rs
// Old
let stmt = Statement::Declaration(Declaration::VariableDeclaration(var_decl));
// New
let stmt = Statement::VariableDeclaration(var_decl);
```

Conversion:

```rs
// Old
let stmt = Statement::Declaration(decl);
// New
let stmt = Statement::from(decl);
```

Testing:

```rs
// Old
if matches!(stmt, Statement::Declaration(_)) { }
if matches!(stmt, Statement::ModuleDeclaration(m) if m.is_import()) { }
// New
if stmt.is_declaration() { }
if matches!(stmt, Statement::ImportDeclaration(_)) { }
```

Branching:

```rs
// Old
if let Statement::Declaration(decl) = &stmt { decl.do_stuff() };
// New
if let Some(decl) = stmt.as_declaration() { decl.do_stuff() };
```

Matching:

```rs
// Old
match stmt {
    Statement::Declaration(decl) => visitor.visit(decl),
}
// New (exhaustive match)
match stmt {
    match_declaration!(Statement) => visitor.visit(stmt.to_declaration()),
}
// New (alternative)
match stmt {
    _ if stmt.is_declaration() => visitor.visit(stmt.to_declaration()),
}
```

New syntax has pluses and minuses vs the old. `match` syntax is worse, but when working with a deeply nested enum, the code is much nicer - it's shorter and easier to read.

This PR removes 200 lines from the linter with changes like this: https://github.com/oxc-project/oxc/pull/3115/files#diff-dc417ff57352da6727a760ec6dee22de6816f8231fb69dbef1bf05d478699103L92-R95

```diff
-        let AssignmentTarget::SimpleAssignmentTarget(simple_assignment_target) =
-            &assignment_expr.left
-        else {
-            return;
-        };
-        let SimpleAssignmentTarget::AssignmentTargetIdentifier(ident) =
-            simple_assignment_target
+        let AssignmentTarget::AssignmentTargetIdentifier(ident) = &assignment_expr.left
         else {
             return;
         };
```
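As a minimal, self-contained illustration of the discriminant-alignment trick described above (the `Statement`/`Declaration` types here are invented stand-ins with `u32` payloads, not Oxc's real definitions): because both enums are `#[repr(C, u8)]` and the shared variant has the same discriminant and payload layout in both, converting one to the other is a plain transmute.

```rust
use std::mem;

// Parent enum: variants carry explicit discriminants (hypothetical values).
#[allow(dead_code)]
#[repr(C, u8)]
#[derive(Debug, PartialEq)]
enum Statement {
    Block(u32) = 0,
    VariableDeclaration(u32) = 32, // "inherited" from Declaration
}

// Child enum: its variant reuses the same discriminant as in `Statement`.
#[repr(C, u8)]
enum Declaration {
    VariableDeclaration(u32) = 32,
}

impl From<Declaration> for Statement {
    fn from(decl: Declaration) -> Statement {
        // Sound only because both enums are `#[repr(C, u8)]`, the same size,
        // and the shared variant has an identical payload and discriminant.
        unsafe { mem::transmute(decl) }
    }
}

fn main() {
    // Zero-cost conversion: the bytes are reinterpreted, not rewritten.
    let stmt: Statement = Declaration::VariableDeclaration(7).into();
    assert_eq!(stmt, Statement::VariableDeclaration(7));
    println!("ok");
}
```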