Auto-generate more of the AST representation #15655

dcreager · 2025-01-21T21:41:51Z

In #15544 we added a script to auto-generate large parts of the Rust data model that we use to store the AST of parsed Python code. That script consumes a TOML file, which describes all of the possible syntax nodes (e.g., StmtIf, ExprBinOp) and any groups those nodes belong to (e.g., Stmt, Expr). The details of each syntax node are still defined manually in Rust.

We could go further with auto-generation, and there is prior art that we could build on: rust-analyzer uses ungrammar, while Python itself uses ASDL (asdl, parser, codegen). This would eliminate even more tedious hand-written Rust code: not just the struct/enum definitions themselves, but also things like the visit_source_order methods for each syntax node. It would also allow us to experiment more easily with other internal representations for the parsed AST, such as using IndexVec to store the syntax node content (as alluded to in #12419 (comment)).
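As a rough sketch of what that could look like, the snippet below generates both a struct definition and a visit_source_order method from a per-node field list. Everything here is hypothetical: the `Field` record, the `BIN_OP_FIELDS` sample, the `Visitor` trait name, and the `visit_expr`/`visit_operator` method names are assumptions chosen for illustration, not the existing Rust API.

```python
from dataclasses import dataclass


@dataclass
class Field:
    name: str       # Rust field name
    rust_type: str  # Rust type of the field
    visit: str      # hypothetical visitor method to call for this field


# Hypothetical description of one syntax node, fields listed in source order.
BIN_OP_FIELDS = [
    Field("left", "Box<Expr>", "visit_expr"),
    Field("op", "Operator", "visit_operator"),
    Field("right", "Box<Expr>", "visit_expr"),
]


def generate_node(name: str, fields: list[Field]) -> str:
    """Emit a Rust struct plus a visit_source_order method for one node."""
    out = [f"pub struct {name} {{"]
    out += [f"    pub {f.name}: {f.rust_type}," for f in fields]
    out += ["}", "", f"impl {name} {{"]
    out.append("    pub fn visit_source_order<V: Visitor>(&self, visitor: &mut V) {")
    # Fields are visited in the order they are declared in the description.
    out += [f"        visitor.{f.visit}(&self.{f.name});" for f in fields]
    out += ["    }", "}"]
    return "\n".join(out)


if __name__ == "__main__":
    print(generate_node("ExprBinOp", BIN_OP_FIELDS))
```

Because the struct and the visitor come from the same field list, they cannot drift apart, and a different storage layout could be tried by changing only the generator.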


Glyphack · 2025-02-13T20:49:22Z

Hey, I'd like to help with this and the linked issue.
First I went through the original PR and tried resolving one of the comments to see how the code generation works (#16144).

I'll look into the other files to see what else we can auto-generate with the current information.

I want to continue with the implementation, and I'd appreciate your help with a few questions:

Is the goal to use ASDL for generating the AST node structs and enums? If I'm not mistaken, using ASDL would mean we no longer need ast.toml. Should we still keep ast.toml?
Is the plan to use ungrammar only for generating visit_source_order for nodes, or is ungrammar mentioned for some other purpose?

Update: I decided to extend the TOML file you created. I borrowed some names from ASDL to make it possible to generate the AST nodes.

dcreager added the "internal" (An internal refactor or improvement) and "help wanted" (Contributions especially welcome) labels Jan 21, 2025

dcreager mentioned this issue Jan 21, 2025

Consider a more data-oriented AST representation #15657

Open

Glyphack mentioned this issue Feb 20, 2025

Auto generate ast expression nodes #16285

Open
