-
Notifications
You must be signed in to change notification settings - Fork 55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MNT Refactor trusted #338
MNT Refactor trusted #338
Conversation
See discussion: https://discord.com/channels/879548962464493619/1047505653603774584/1085550033207836783 Right now, all nodes are always initialized with trusted=False, so they are not "aware" of what is trusted and what isn't (which means the argument might as well not exist). This refactor would pass the actual trusted argument the user provides to the nodes. As a consequence, the constructed tree is aware of what nodes are considered trusted or not. Some unit tests are still failing, so this is WIP.
RFC @skops-dev/maintainers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor comments, otherwise I like the change.
skops/io/_general.py
Outdated
} | ||
self.func_name = state["content"]["func"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why move this out of children
? you have previously argued they should all be in there, and I agree with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My argument was that all children
values should be specific types: Node
s, list of nodes, etc. This is a str, hence I moved it out. The problem with having str here is that when get_unsafe_set
is called, this fails. We just haven't covered this case, which is why it didn't come up. I can move it back into children
, but then we need to change get_unsafe_type
.
(I still plan to do a PR to test the types of children
more rigorously, this may not be the last bug)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe then we want to make sure it's a string? like, cast it to str
maybe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With #340 merged, we now allow str
to be set as a child (with no further check), so I reverted this change.
- make all but 1st arg to visualize kw only - don't set default for trusted in get_tree - use PRIMITIVE_TYPE_NAMES
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I addressed your points, please review again.
skops/io/_general.py
Outdated
} | ||
self.func_name = state["content"]["func"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My argument was that all children
values should be specific types: Node
s, list of nodes, etc. This is a str, hence I moved it out. The problem with having str here is that when get_unsafe_set
is called, this fails. We just haven't covered this case, which is why it didn't come up. I can move it back into children
, but then we need to change get_unsafe_type
.
(I still plan to do a PR to test the types of children
more rigorously, this may not be the last bug)
Two measures to harden the auditing (a little bit): - Type annotate the Node's children to prevent setting invalid types. - Change all the tests that use loads to only load trusted types instead of using trusted=True The latter is importent because when setting trusted=True, the whole machinery of checking types is not executed, so any bugs that may be contained there will not be revealed. In particular, this shows that for persisting methods, we had a child with a str type and that would raise an error, i.e. loading method types was not possible for users who passed trusted!=True. Additional changes As a consequence of the last point, the auditing code has been changed to accept str as type. Alternatively, we can make the change explained here: skops-dev#338 (comment) i.e. not storing the method name in children. Another "victim" of this change is that the so far dead code of checking for primitive types inside of get_unsafe_set has been removed. This code was supposed to check if the type is a primitive type but it was defective. get_module(child) would raise an error if an instance of the type would be passed. We could theoretically fix that code, but it would still be dead code because primitive types are stored as json. Another small change is to exclude the code in skops/io/old from mypy checks. Otherwise, we would have to update its type signatures if signatures in the persistence code change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Otherwise LGTM.
skops/io/_general.py
Outdated
} | ||
self.func_name = state["content"]["func"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe then we want to make sure it's a string? like, cast it to str
maybe?
See discussion:
https://discord.com/channels/879548962464493619/1047505653603774584/1085550033207836783
Description
Right now, all nodes are always initialized with
trusted=False
, so they are not "aware" of what is trusted and what isn't (which means the argument might as well not exist). This refactor would pass the actualtrusted
argument the user provides to the nodes. As a consequence, the constructed tree is aware of what nodes are considered trusted or not.As a consequence, when calling
node.get_unsafe_set
, the returned set takes thetrusted
argument into consideration. Before this change, it would return all untrusted types, even if the user explicitly trusted them via thetrusted
argument. As a consequence,audit_tree
now no longer requires thetrusted
argument, as the tree already is aware.A nice benefit of this change is that now,
visualize
can take atrusted
argument, which is passed to the tree and is taken into account when visiting the nodes and determining if they can be trusted.