-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT: Add some convenience classes for node count JIT metrics, and automatic dumping from compShutdown #92112
Conversation
- Add a NodeCounts class that can be used to easily record the count of node types seen - Add a DumpOnShutdown class that can be used to easily register either a Histogram class or NodeCounts class to be dumped as part of Compiler::compShutdown
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch Issue Details
For example, I was investigating #91862 and wanted to know what types of nodes we usually see in the index/length positions of --- a/src/coreclr/jit/optimizer.cpp
+++ b/src/coreclr/jit/optimizer.cpp
@@ -8849,6 +8849,11 @@ void Compiler::AddModifiedElemTypeAllContainingLoops(unsigned lnum, CORINFO_CLAS
}
}
+NodeCounts s_rangeCheckLengthNodes;
+NodeCounts s_rangeCheckIndexNodes;
+DumpOnShutdown d1("BOUNDS_CHECK length nodes", &s_rangeCheckLengthNodes);
+DumpOnShutdown d2("BOUNDS_CHECK index nodes", &s_rangeCheckIndexNodes);
+
//------------------------------------------------------------------------------
// optRemoveRangeCheck : Given an indexing node, mark it as not needing a range check.
//
@@ -8878,6 +8883,9 @@ GenTree* Compiler::optRemoveRangeCheck(GenTreeBoundsChk* check, GenTree* comma,
(check != nullptr && check->OperIs(GT_BOUNDS_CHECK) && comma == nullptr));
noway_assert(check->OperIs(GT_BOUNDS_CHECK));
+ s_rangeCheckIndexNodes.record(check->GetIndex()->OperGet());
+ s_rangeCheckLengthNodes.record(check->GetArrayLength()->OperGet());
+
GenTree* tree = comma != nullptr ? comma : check;
#ifdef DEBUG After that, a superpmi.exe replay over libraries_tests.run outputs the following when it finishes: Loaded 674679 Jitted 674679 FailedCompile 0 Excluded 0 Missing 0
Total time: 565472.612700ms
BOUNDS_CHECK length nodes
ARR_LENGTH : 51990
LCL_VAR : 47159
CNS_INT : 6751
COMMA : 9
LCL_FLD : 5
CAST : 4
BOUNDS_CHECK index nodes
LCL_VAR : 62221
CNS_INT : 42139
COMMA : 623
CAST : 460
ADD : 397
RSH : 72
NEG : 5
UDIV : 1
|
cc @dotnet/jit-contrib PTAL @AndyAyersMS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks nice. Would be useful to have some comments for the classes showing how to use them.
entry.Dumpable = dumpable; | ||
break; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add an assert that the DumpOnShutdownEntry table isn't all full?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do.
src/coreclr/jit/utils.cpp
Outdated
void NodeCounts::record(genTreeOps oper) | ||
{ | ||
assert(oper < GT_COUNT); | ||
m_counts[oper]++; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this do an interlocked add?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe that would depend on whether this class is instantiated as global, or as a member of Compiler?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, for good measure it should be interlocked add. NodeCounts
is meant to be used as a static field (like Histogram
) so you can collect stats for multiple compilations, so it can be racing with other threads, although I expect we mainly use it through SPMI where it won't due to the way parallelization works there.
Looks like Histogram
should also be changed to use an interlocked add.
For example, I was investigating #91862 and wanted to know what types of nodes we usually see in the index/length positions of
BOUNDS_CHECK
insideoptRemoveRangeCheck
. I made the following change:After that, a superpmi.exe replay over libraries_tests.run outputs the following when it finishes: