Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

introduce per-file timing-stats #12639

Merged
merged 5 commits into from
Apr 29, 2022
Merged

introduce per-file timing-stats #12639

merged 5 commits into from
Apr 29, 2022

Conversation

huguesb
Copy link
Contributor

@huguesb huguesb commented Apr 21, 2022

When profiling mypy over a large codebase, it can be useful
to know which files are slowest to typecheck.

Gather per-file timing stats and expose them through a new
(hidden) command line switch

When profiling mypy over a large codebase, it can be useful
to know which files are slowest to typecheck.

Gather per-file timing stats and expose them through a new
(hidden) command line switch
@github-actions

This comment has been minimized.

1 similar comment
@github-actions

This comment has been minimized.

Copy link
Collaborator

@JukkaL JukkaL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, I've wanted to have this feature myself when debugging performance issues.

Can you add at least one integration test that ensures that mypy can be run with the flag successfully? Possibly also check the format of the generated file. Obviously the timings aren't repeatable, so they would have to be ignored.

Also, the builds were failing.

@huguesb
Copy link
Contributor Author

huguesb commented Apr 22, 2022

Can you add at least one integration test that ensures that mypy can be run with the flag successfully? Possibly also check the format of the generated file. Obviously the timings aren't repeatable, so they would have to be ignored.

Done. I added a way to specify test output files via regex to achieve that.

Also, the builds were failing.

Yeah, I didn't realize mypy still supported py3.6
Fixed

@huguesb huguesb mentioned this pull request Apr 22, 2022
@github-actions

This comment has been minimized.

1 similar comment
@github-actions

This comment has been minimized.

@huguesb huguesb requested a review from JukkaL April 26, 2022 04:03
Copy link
Collaborator

@JukkaL JukkaL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just a few optional comments.

mypy/build.py Outdated
with self.wrap_context():
self.type_checker().check_first_pass()
self.time_spent_us += int((time.perf_counter() - t0) * 1e6)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we do int((time.perf_counter() - t0) * 1e6) a lot, it could be a bit cleaner to refactor (most of) it to a helper function. E.g. something like this:

def time_us() -> int:
    return int(time.perf_counter() * 1e6)

What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, that seems fine. That can then be refactored to use perf_counter_ns whenever support for py3.6 is dropped

Dump timing stats for each file in the given graph
"""
with open(path, 'w') as f:
for k in sorted(graph.keys()):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about sorting by the time spent in file instead of the module name? It will be easy to search by module name within the stats dump, but sorting by the time is slightly more involved.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A straightforward sort -n -k2 on the output can give you time-sorting from the current format.

I originally thought about the module order being nice to get diffs between runs but the jitter is going to be high enough that such a diff would probably need a little extra massaging (e.g only show changes above a certain % threshold) to actually be helpful.

TL;DR I don't think the sort order makes much of a difference in practice so I would be fine going with either order, or even letting it be unsorted. In follow-up commit we can introduce a script similar to go's benchstat to get more value out of the raw data.

@github-actions

This comment has been minimized.

@huguesb huguesb force-pushed the pr-timing-stats branch 2 times, most recently from edeed8d to caddce9 Compare April 27, 2022 20:17
@github-actions

This comment has been minimized.

1 similar comment
@github-actions

This comment has been minimized.

@github-actions
Copy link
Contributor

According to mypy_primer, this change has no effect on the checked open source code. 🤖🎉

Copy link
Collaborator

@JukkaL JukkaL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Looks good.

@JukkaL JukkaL merged commit d48d548 into python:master Apr 29, 2022
@huguesb huguesb deleted the pr-timing-stats branch April 29, 2022 18:48
@97littleleaf11
Copy link
Collaborator

cool feature!

@huguesb
Copy link
Contributor Author

huguesb commented Oct 11, 2022 via email

@JelleZijlstra
Copy link
Member

@huguesb did you comment on the wrong PR? For what it's worth we dropped 3.6 support.

@huguesb
Copy link
Contributor Author

huguesb commented Oct 11, 2022

uh, that's very weird. I did not intentionally send that... it looks relevant to the history of this PR but not after its closure, and I don't even see this message as a "sent" item my email client...

@JelleZijlstra
Copy link
Member

Maybe some email queue in GitHub had a very long delay :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants