Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add support for multiple trees per file #51

Merged
merged 23 commits into from
Feb 24, 2024

Conversation

giovannidisiena
Copy link
Contributor

@giovannidisiena giovannidisiena commented Dec 15, 2023

WIP attempt at handling multiple trees per .tree file. Currently, it appears I have broken some of the existing tests but I'm not yet sure how/why.

Steps taken:

  • Find all roots in a .tree by adding a new TokenKind::Root.
  • Parse & translate each AST individually.
  • Combiner: Check that all roots belong to the same contract, i.e. Contract for each root Contract.function is the same.
  • Combiner: Merge all the HIRs by collecting the function nodes and appending them to the root contract node.
  • Combine and report any errors.
  • Add tests for all of the above.
  • Add roots check to bulloak check - this is done by hir::translate when check::context::Context is created since check mostly works with parse trees derived from HIRs.

@alexfertel it would be great to get some initial feedback here, when you get some time please, to make sure I'm not completely barking up the wrong tree or making too much of a mess. I have left a few follow-up comments for both my and your benefit - I'm sure there's lots here that could be improved or otherwise written in a more idiomatic fashion so please do let me know where that is the case as I'm definitely still getting to grips with Rust!

Partially closes #50.

@alexfertel
Copy link
Owner

alexfertel commented Dec 15, 2023

This looks very promising! Thanks for such an effort.

Reading the code made me realize that we might be able to do something simpler: What if we parallelized the inner phases? That is, we currently roughly have the following compiler pipeline:

.tree -> &str -> scaffold entrypoint -> tokenize & parse -> semantic analysis -> HIR -> PT -> emit 

The above works for one (1) tree. Would you be willing to try feeding each separate tree to the pipeline? You would only touch two points in the pipeline:

.tree -> &str -> split input -> scaffold entrypoint -> ... -> HIR -> Combine -> PT -> emit
                 ^^^^^^^^^^^                                         ^^^^^^^

Which ends up looking like:

                                scaffold entrypoint -> ... -> Hir 
                              /                                   \
.tree -> &str -> split input -> scaffold entrypoint -> ... -> Hir -> Combine -> PT -> emit
                              \                                   /
                                scaffold entrypoint -> ... -> Hir 

Where the split is a simple:

let trees = tree.split("\n\n").collect::<Vec<_>>();

And the Combine phase looks similar to what you have.

If this makes sense to you I think it would simplify a lot. Wdyt?

On the other hand, I think the combine step is a good place for verifying that all the trees have the same contract name 👌🏻

Copy link
Owner

@alexfertel alexfertel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few small comments answering @follow-up.

If you have any questions, please do reach out. This is a great feature, and I'm happy to answer whatever!

@giovannidisiena
Copy link
Contributor Author

Thanks @alexfertel, the review was very helpful - I have just pushed a new commit with the simplified pipeline. I have also identified the cause of most of the existing test failures due to some slight modifications I made to the use of identifier mode in syntax::tokenizer, which I believe to be correct but have downstream effects, so if you could take a look at the remaining updated follow-up comments and provide your thoughts please that would be awesome.

Copy link
Owner

@alexfertel alexfertel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heyo! This is indeed a step forward, thanks for taking the time.

I gave it a closer look now since I think that we can go forward with this design. Lmk if you have any questions!

Comment on lines 45 to 48
return Err(format!(
"Contract name mismatch: expected '{}', found '{}'",
first_contract_name, root.contract_name
));
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good first step, but we probably will want to report more information with this error, like the exact line where it happened, so we'll need to add custom Combiner errors, like what Parser et al have.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alexfertel, I believe I have made some progress on this front, although I'm not quite sure what additional information we want to report and how. For example, I am having some difficulty working out how we can get the span of the original text from the HIR (more context on L125 of combiner.rs). Some input here would be great!

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am having some difficulty working out how we can get the span of the original text from the HIR

Ah, yes, this is a limitation of the current design, since a HIR doesn't really map to a tree, this is something to figure out. No worries, we don't have to tackle this here since it is unrelated and uphill.

@@ -140,7 +140,8 @@ impl Tokenizer {
Self {
pos: Cell::new(Position::new(0, 1, 1)),
// Starts as `true` because the first token must always be a contract name.
identifier_mode: Cell::new(true),
// identifier_mode: Cell::new(true), // @follow-up - this is correct, but it breaks the tests because of invalid . character in root
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The .s in a root should be valid now, since I am calling sanitize on contract names. We have multiple tests for this, but the main one that covers this functionality you can find here:

fn scaffolds_trees() {

Which, scaffolds this file:

https://github.com/alexfertel/bulloak/blob/c38e7795c6a018019ed2e2e0712868b5db0202be/tests/scaffold/basic.tree

Which contains a contract name with a ..

Copy link
Contributor Author

@giovannidisiena giovannidisiena Dec 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alexfertel, based on the latest commit, all existing tests pass. The issue above occurs when changing the reset function to set identifier_mode to true. The rationale here that, given identifier_mode is set to true by default since:

The first token must always be a contract name, which has to be a valid Solidity identifier.

it should also be true here as we don't want this state to be changed to false when reset is called within tokenize; however, since the contract name is not sanitized before the tokenization step then the tests fail with IdentifierCharInvalid('.').

Do you agree with this assessment? If so, how should we handle this?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you were right; it's fine not to treat contract names as identifiers since we sanitize them later. Lmk if you find any bugs with this behavior.

let identifier = get_contract_name_from_identifier(&text);
let accumulated_identifier = contract_definition.identifier.clone();
if identifier != accumulated_identifier {
let span = Span::default(); // @follow-up - how can we get the span from the HIR? Is it even necessary? This would be easier to do with verification of the AST. One option is to use the index of the HIR in the vector of HIRs since we know the identifier is the start of a given tree.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is a good idea, but I'm hesitant to commit until I think more about it. We can report with the empty span for now.

@alexfertel
Copy link
Owner

I don't have much time right now to review, but I will do so soon after New Year's!

Copy link

codecov bot commented Jan 14, 2024

Codecov Report

Attention: Patch coverage is 98.38275% with 6 lines in your changes are missing coverage. Please review.

Project coverage is 92.5%. Comparing base (02c171e) to head (8f80e68).

Additional details and impacted files
Files Coverage Δ
src/check/rules/structural_match.rs 96.1% <ø> (ø)
src/check/violation.rs 88.4% <100.0%> (+0.2%) ⬆️
src/error.rs 91.3% <100.0%> (+0.6%) ⬆️
src/hir/mod.rs 100.0% <100.0%> (ø)
src/hir/translator.rs 98.3% <100.0%> (ø)
src/scaffold/emitter.rs 97.9% <100.0%> (-0.1%) ⬇️
src/scaffold/mod.rs 87.9% <100.0%> (+2.6%) ⬆️
src/scaffold/modifiers.rs 90.8% <ø> (ø)
src/syntax/tokenizer.rs 94.6% <100.0%> (-0.1%) ⬇️
src/utils.rs 99.2% <100.0%> (+4.6%) ⬆️
... and 1 more

@alexfertel
Copy link
Owner

Heyo @giovannidisiena ! Sorry about the delay, I've been very busy. Finally had some time to look at this.

Very good job! This is basically what I had in mind. What's left is testing, which I'll leave in your hands. I'd expect unit tests + integration tests in the tests folder.

I pushed a couple of commits: one with a few structural changes that are basically nits from me and another one fixing what you brought up -- you were right that identifier_mode should start as false. This was a bug given that we sanitize contract names now.

One more thing we need to add to the TODO list is that we need to handle trees that are more than \n\n apart. I think it's reasonable to just ignore any whitespace around and between the \n\n.

Thank you very much, excited about merging this PR!

@giovannidisiena
Copy link
Contributor Author

Hey @alexfertel, that's not a problem at all. I'm also very busy for the foreseeable so will have to attempt to chip away at the testing as and when possible.

Brilliant, thank you for the assist there, and yes agreed regarding the additional whitespace.

Thank you very much, excited about merging this PR!

Likewise!

@alexfertel
Copy link
Owner

Hey @giovannidisiena, I just wanted to check how we're feeling about this one. I'd really like this not to get stale, so if you're not up for it I'm happy to take over!

@giovannidisiena
Copy link
Contributor Author

Absolutely @alexfertel, thanks for checking in. I may get some time toward the end of this week, so will keep you updated; otherwise, it may be best for you to take over since the next opportunity I have will likely be toward the end of March/April, which is obviously not ideal for keeping this from getting stale. I'll let you know how it goes!

Copy link
Contributor Author

@giovannidisiena giovannidisiena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just looking for some clarification on this. Additionally, both check and scaffold are currently erroring with "a Corner must be the last child" – I'd thought this wouldn't be an issue since we are splitting the trees before forwarding on to the parser, but for some reason it is, and I haven't yet been able to track down the cause. If you have any ideas that would be great:)

@giovannidisiena
Copy link
Contributor Author

@alexfertel I think that's almost everything you asked for now, except Combiner unit tests that I will add. The only other thing I am unsure about is whether we need to handle any violation cases to be fixed when running bulloak check --fix?

Copy link
Owner

@alexfertel alexfertel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking awesome! I want to do some checks myself locally, which I'll do as soon as I get some time, hopefully today. Thanks for all this work ❤️

@giovannidisiena
Copy link
Contributor Author

This is looking awesome! I want to do some checks myself locally, which I'll do as soon as I get some time, hopefully today. Thanks for all this work ❤️

Great! And thanks for all the help ❤️

@alexfertel
Copy link
Owner

alexfertel commented Feb 22, 2024

Okay, local checks done!

any violation cases to be fixed when running bulloak check --fix

Nothing comes to mind right now, but any work there can be done in a different PR, it doesn't need to be a part of this one. If you think of anything feel free to open an issue.

I think what's missing are the Combiner unit tests, and then I think we need to check the README and update it to reflect the new functionality. Does that make sense to you?

@giovannidisiena
Copy link
Contributor Author

Nothing comes to mind right now, but any work there can be done in a different PR, it doesn't need to be a part of this one. If you think of anything feel free to open an issue.

The only thing I can think of at the moment is perhaps fixing the contract name mismatch by applying the first identifier to all subsequent roots, but I'm happy to open it as a separate issue if you think it's not worth tackling right now.

@alexfertel
Copy link
Owner

The only thing I can think of at the moment is perhaps fixing the contract name mismatch by applying the first identifier to all subsequent roots, but I'm happy to open it as a separate issue if you think it's not worth tackling right now.

Ah, you're right. Yeah, I think we can tackle it separately. If you feel like working on it, don't let me hold you back, though. I'm happy either way.

@giovannidisiena
Copy link
Contributor Author

Ah, you're right. Yeah, I think we can tackle it separately. If you feel like working on it, don't let me hold you back, though. I'm happy either way.

Okay, no problem; I'll see if I have time left over after the Combiner tests and README are done 👍

@giovannidisiena
Copy link
Contributor Author

giovannidisiena commented Feb 22, 2024

@alexfertel, Combiner tests have been added, and the corresponding logic has been updated accordingly. I'm not sure I'll get around to refactoring the structural match violation logic, although if you are willing to provide some pointers then that could speed things up and I may be able to take a crack at it.

Copy link
Owner

@alexfertel alexfertel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! 🚀

@alexfertel
Copy link
Owner

I'm not sure I'll get around to refactoring the structural match violation logic

That's okay, that part of the code is not polished and needs lots of love. I'm merging this in!

When you're happy with this, feel free to change it to ready for review.

@giovannidisiena giovannidisiena marked this pull request as ready for review February 23, 2024 10:05
@alexfertel alexfertel changed the title WIP: Handle multiple trees per .tree file feat: add support for multiple trees per file Feb 23, 2024
@giovannidisiena
Copy link
Contributor Author

@alexfertel just realised I haven't updated the README, doing that now.

@alexfertel
Copy link
Owner

I forgot about that as well, thank you!

@giovannidisiena
Copy link
Contributor Author

@alexfertel While writing up an example for the README, I noticed that the resulting Solidity test function names did not include the root identifier function name, so I have also added that and refactored the Combiner. May be worth another review to make sure you're still happy with it.

@alexfertel
Copy link
Owner

Ah, I see, yeah that makes sense. I'll take a better look tomorrow morn, and if everything looks good, I'll merge 👍🏻

Copy link
Owner

@alexfertel alexfertel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! 🚀

@alexfertel alexfertel merged commit 72bcfcd into alexfertel:main Feb 24, 2024
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

feature request: add support for different targeting behaviours
2 participants