No download? No build dir? No problem! #305
No download? No build dir? No problem!
Yay! 🥳
nextstrain/cli/argparse.py
@@ -78,7 +79,7 @@ def register_commands(parser, commands):
     for cmd in commands:
         subparser = cmd.register_parser(subparsers)
-        subparser.set_defaults( __command__ = cmd )
+        subparser.set_defaults( __command__ = cmd, __parser__ = weakref.proxy(subparser) )
I'm reading through the weakref docs, but would like to understand the choice to use it here.
Weak references are used with automatic reference counting systems for memory management. Reference counting is a limited form of garbage collection (GC), though many folks use GC only to mean something different, like mark-and-sweep systems. Reference counting systems track how many times (the refcount) a thing (e.g. a value/object) is referred to (e.g. by variables), and when that count drops to 0 the system knows it can destroy the thing and return its memory to the available pool for use by something else.
A problem arises with cyclical references because refcounts never drop to 0 for affected objects. This results in memory leaks because affected values aren't ever destroyed. Consider an object A that stores a reference to itself in its properties/attributes; the refcount of A will never be less than 1 even when nothing else outside of A refers to it. In practice, I think some reference counting with additional bookkeeping can identify simple cycles (e.g. A → A → A → …) but not more complex/indirect ones (e.g. A → B → A → B → …).
A weak reference is one which doesn't increase the reference count, which means it doesn't ensure the value stays in existence, and thus can be used to break cyclical references and avoid memory leaks.
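To make the difference concrete, here is a minimal sketch, assuming CPython, that switches off the cycle collector so only reference counting is at work. The `Node` class and all variable names are hypothetical and exist only for illustration.

```python
import gc
import weakref


class Node:
    """Hypothetical class, for illustration only."""


gc.disable()  # leave only reference counting in play
try:
    strong = Node()
    strong.me = strong                  # strong self-reference: a cycle
    strong_probe = weakref.ref(strong)  # lets us check liveness later

    weak = Node()
    weak.me = weakref.proxy(weak)       # weak self-reference: no cycle
    weak_probe = weakref.ref(weak)

    del strong, weak                    # drop the only outside references

    # The cycle keeps the first object's refcount at 1, so it survives.
    strong_alive = strong_probe() is not None
    # The second object's refcount reached 0, so it was destroyed.
    weak_alive = weak_probe() is not None
finally:
    gc.enable()

print(strong_alive, weak_alive)
```

The weak self-reference does not count towards the refcount, so deleting the last outside name is enough to free that object.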
Here, I'm assuming that subparser.set_defaults() ends up storing the passed values in itself somewhere so it can use it later when parse_args() is called. Since I'm passing in subparser to itself and expecting it to store what I pass, I'm assuming a cyclical reference will be created and that I should break that with weakref. That said, the Nextstrain CLI processes are not very long-lived and the argument parsing code is called once, maybe twice, but not repeatedly many times, so the consequence of a cyclical reference is probably very low. I probably could omit weakening and not worry about it! Maybe I should not worry.
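The assumption about set_defaults() can be checked with a small standalone sketch. The `demo` program name and `build` subcommand are hypothetical stand-ins, not the actual Nextstrain CLI setup.

```python
import argparse

parser = argparse.ArgumentParser(prog="demo")
subparsers = parser.add_subparsers()
subparser = subparsers.add_parser("build")

# Hand the subparser a (strong) reference to itself, as in the diff above
# but without the weakref.proxy() wrapper.
subparser.set_defaults(__parser__=subparser)

opts = parser.parse_args(["build"])

# The parsed options carry the subparser that handled the arguments...
print(opts.__parser__ is subparser)

# ...and the subparser keeps the default on itself, which forms the cycle.
print(subparser.get_default("__parser__") is subparser)
```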
This got me thinking about it some more, and I remembered that Python (well, CPython) also complements reference-counting with a traversal-based garbage collector. This is specifically to handle the case of cyclical references. So I guess I really shouldn't worry about it.
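A rough illustration of that collector at work on an indirect cycle (A → B → A), again assuming CPython and using a hypothetical `Node` class:

```python
import gc
import weakref


class Node:
    """Hypothetical class, for illustration only."""


gc.disable()  # prevent an automatic collection from interfering
try:
    a, b = Node(), Node()
    a.other, b.other = b, a   # indirect cycle: a -> b -> a
    probe = weakref.ref(a)

    del a, b
    # Reference counting alone can't free the pair.
    alive_after_del = probe() is not None

    gc.collect()              # explicit collections still run while disabled
    # The traversal-based collector finds and frees the unreachable cycle.
    alive_after_collect = probe() is not None
finally:
    gc.enable()

print(alive_after_del, alive_after_collect)
```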
If you want to play around with this to get a sense of it, try this patch:
diff --git a/nextstrain/cli/__init__.py b/nextstrain/cli/__init__.py
index 5c1b110..4f388df 100644
--- a/nextstrain/cli/__init__.py
+++ b/nextstrain/cli/__init__.py
@@ -32,6 +32,15 @@ def run(args):
     parser = make_parser()
     opts = parser.parse_args(args)
+    # Removing "parser" which holds references to the "subparser" referred to
+    # weakly by "opts.__parser__".
+    del parser
+
+    # Trigger GC now, before we use opts.__parser__ below. This will render
+    # the weakref invalid.
+    import gc
+    gc.collect()
+
     try:
         return opts.__command__.run(opts)
alone and in combination with this one:
diff --git a/nextstrain/cli/argparse.py b/nextstrain/cli/argparse.py
index fa7b035..c654c0d 100644
--- a/nextstrain/cli/argparse.py
+++ b/nextstrain/cli/argparse.py
@@ -79,7 +79,7 @@ def register_commands(parser, commands):
     for cmd in commands:
         subparser = cmd.register_parser(subparsers)
-        subparser.set_defaults( __command__ = cmd, __parser__ = weakref.proxy(subparser) )
+        subparser.set_defaults( __command__ = cmd, __parser__ = subparser )
         # Ensure all subparsers format like the top-level parser
         subparser.formatter_class = parser.formatter_class
Also try commenting out/deleting the del parser line in the first patch. Automatic reference counting isn't guaranteed to destroy objects immediately upon the refcount going to 0, which is why the first patch triggers GC immediately.
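For anyone who would rather not patch the CLI, the same experiment can be approximated in a standalone script. This is only a sketch assuming CPython; `make_opts()`, `usage_still_available()`, and the `demo`/`build` names are hypothetical and not part of the Nextstrain CLI.

```python
import argparse
import gc
import weakref


def make_opts(weak):
    """Parse args with a weak or strong __parser__ default (hypothetical helper)."""
    parser = argparse.ArgumentParser(prog="demo")
    subparsers = parser.add_subparsers()
    subparser = subparsers.add_parser("build")

    target = weakref.proxy(subparser) if weak else subparser
    subparser.set_defaults(__parser__=target)

    # On return, the local names "parser" and "subparser" go away, which
    # plays the role of "del parser" in the first patch.
    return parser.parse_args(["build"])


def usage_still_available(opts):
    """Force a collection, then try to use the stored parser."""
    gc.collect()
    try:
        opts.__parser__.format_usage()
        return True
    except ReferenceError:
        # The weak proxy's target has been destroyed.
        return False


print(usage_still_available(make_opts(weak=False)))
print(usage_still_available(make_opts(weak=True)))
```

With the strong reference, the parsed options keep the subparser alive on their own. With the weak proxy, nothing does once the parsers are collected.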
In any case, I'm going to remove the weakref usage as the traversal-based GC should make it unnecessary (and consequences are very low anyway due to short process lifetimes). Thanks for getting me to think more about this beyond my initial "oh, let's not create a reference cycle here"!
(Bookmarking this to read and play with more later. Thank you for the in depth explanation 🌟)
…or message
Normally argparse does this when it detects a usage error, but this lets us handle other cases it doesn't in a similar way. Similar to how we track which command class the args parse to, we track the command parser the args parse to so we can emit appropriate usage information. For consistency with argparse, the exit status of a usage error is 2 instead of 1, but the distinction is unlikely to be actually useful in practice.
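The pattern this commit message describes might look roughly like the following sketch. It is not the actual Nextstrain CLI code: the `UsageError` exception, the `copy` command, and the `run()` wiring are all hypothetical, chosen to show a misuse argparse cannot detect by itself.

```python
import argparse
import sys


class UsageError(Exception):
    """Hypothetical exception for misuse that argparse can't detect."""


def copy(opts):
    """Hypothetical command body."""
    if opts.src == opts.dst:
        raise UsageError("source and destination must differ")
    return 0


def run(args):
    parser = argparse.ArgumentParser(prog="demo")
    subparsers = parser.add_subparsers()

    subparser = subparsers.add_parser("copy")
    subparser.add_argument("src")
    subparser.add_argument("dst")

    # Track both the command and the parser the args resolve to.
    subparser.set_defaults(__command__=copy, __parser__=subparser)

    opts = parser.parse_args(args)

    try:
        return opts.__command__(opts)
    except UsageError as error:
        # Emit the usage of the specific command, then the message, and
        # return 2 to match argparse's own exit status for usage errors.
        opts.__parser__.print_usage(sys.stderr)
        print(f"error: {error}", file=sys.stderr)
        return 2


if __name__ == "__main__":
    sys.exit(run(sys.argv[1:]))
```

The sketch assumes a subcommand is always given. Without one, `opts` would lack both attributes.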
This allows usages which just want to check job status/logs to stop passing in a meaningless/unused directory.
1fc0a0e to 7979f6f
Make the nextstrain build pathogen <directory> optional when --attach + --no-download are used
This allows usages which just want to check job status/logs to stop passing in a meaningless/unused directory.
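One way such an optional positional could be expressed with argparse is sketched below. Only the `--attach` and `--no-download` flags and the `<directory>` argument come from the commit message. Everything else is an assumption made for illustration: that `--attach` takes a job id, the `download` destination name, and the helper functions. None of it is the real nextstrain build parser.

```python
import argparse


def make_build_parser():
    """Hypothetical stand-in for the build command's parser."""
    parser = argparse.ArgumentParser(prog="nextstrain build")
    # Assumption: --attach takes an identifier for an existing job.
    parser.add_argument("--attach", metavar="<job-id>")
    parser.add_argument("--no-download", dest="download", action="store_false")
    # nargs="?" makes the positional optional; it is None when omitted.
    parser.add_argument("directory", nargs="?", metavar="<directory>")
    return parser


def directory_required(opts):
    """The directory is only dispensable when attaching without downloading."""
    return not (opts.attach and not opts.download)


status_only = make_build_parser().parse_args(["--attach", "abc123", "--no-download"])
print(status_only.directory, directory_required(status_only))
```

When `directory_required()` is true and no directory was given, the command could report a usage error instead of proceeding.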
Testing