-
-
Notifications
You must be signed in to change notification settings - Fork 2.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
introduce per-file timing-stats #12639
Conversation
When profiling mypy over a large codebase, it can be useful to know which files are slowest to typecheck. Gather per-file timing stats and expose them through a new (hidden) command line switch
baa880e
to
e191a32
Compare
This comment has been minimized.
This comment has been minimized.
1 similar comment
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, I've wanted to have this feature myself when debugging performance issues.
Can you add at least one integration test that ensures that mypy can be run with the flag successfully? Possibly also check the format of the generated file. Obviously the timings aren't repeatable, so they would have to be ignored.
Also, the builds were failing.
Done. I added a way to specify test output files via regex to achieve that.
Yeah, I didn't realize mypy still supported py3.6 |
This comment has been minimized.
This comment has been minimized.
1 similar comment
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, just a few optional comments.
mypy/build.py
Outdated
with self.wrap_context(): | ||
self.type_checker().check_first_pass() | ||
self.time_spent_us += int((time.perf_counter() - t0) * 1e6) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we do int((time.perf_counter() - t0) * 1e6)
a lot, it could be a bit cleaner to refactor (most of) it to a helper function. E.g. something like this:
def time_us() -> int:
return int(time.perf_counter() * 1e6)
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, that seems fine. That can then be refactored to use perf_counter_ns
whenever support for py3.6 is dropped
Dump timing stats for each file in the given graph | ||
""" | ||
with open(path, 'w') as f: | ||
for k in sorted(graph.keys()): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about sorting by the time spent in file instead of the module name? It will be easy to search by module name within the stats dump, but sorting by the time is slightly more involved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A straightforward sort -n -k2
on the output can give you time-sorting from the current format.
I originally thought about the module order being nice to get diffs between runs but the jitter is going to be high enough that such a diff would probably need a little extra massaging (e.g only show changes above a certain % threshold) to actually be helpful.
TL;DR I don't think the sort order makes much of a difference in practice so I would be fine going with either order, or even letting it be unsorted. In follow-up commit we can introduce a script similar to go's benchstat
to get more value out of the raw data.
This comment has been minimized.
This comment has been minimized.
edeed8d
to
caddce9
Compare
This comment has been minimized.
This comment has been minimized.
1 similar comment
This comment has been minimized.
This comment has been minimized.
caddce9
to
e2ce6c7
Compare
According to mypy_primer, this change has no effect on the checked open source code. 🤖🎉 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Looks good.
cool feature! |
On Thu, Apr 21, 2022, 7:54 AM Jukka Lehtosalo ***@***.***> wrote:
***@***.**** commented on this pull request.
Cool, I've wanted to have this feature myself when debugging performance
issues.
Can you add at least one integration test that ensures that mypy can be
run with the flag successfully? Possibly also check the format of the
generated file. Obviously the timings aren't repeatable, so they would have
to be ignored.
will do
Also, the builds were failing.
yeah,I didn't realize mypy still supported running under 3.6. I will need
to use a different timing API
… ------------------------------
In mypy/build.py
<#12639 (comment)>:
> @@ -1808,6 +1810,9 @@ class State:
fine_grained_deps_loaded = False
+ # Cumulative time spent on this file (for profiling stats)
Include the unit of time in the name of the variable (e.g. time_spent_ns).
------------------------------
In mypy/build.py
<#12639 (comment)>:
> @@ -3091,6 +3120,8 @@ def process_graph(graph: Graph, manager: BuildManager) -> None:
manager.log("No fresh SCCs left in queue")
+
+
These extra empty lines seem redundant.
—
Reply to this email directly, view it on GitHub
<#12639 (review)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABX3CSN56YWIOZNXXKVPOG3VGFT2ZANCNFSM5T6P2QDA>
.
You are receiving this because you authored the thread.Message ID:
***@***.***>
|
@huguesb did you comment on the wrong PR? For what it's worth we dropped 3.6 support. |
uh, that's very weird. I did not intentionally send that... it looks relevant to the history of this PR but not after its closure, and I don't even see this message as a "sent" item my email client... |
Maybe some email queue in GitHub had a very long delay :) |
When profiling mypy over a large codebase, it can be useful
to know which files are slowest to typecheck.
Gather per-file timing stats and expose them through a new
(hidden) command line switch