Skip to content

Commit

Permalink
JIT: Add a pass of early liveness and use it for forward sub and last…
Browse files Browse the repository at this point in the history
…-use copy elision for implicit byrefs (#79346)

This runs a pass of liveness right after local morph and uses it for forward sub and to omit copies of structs when passed as implicit byrefs at their last use.

Fix #76069
Fix #75206
Fix #65025
Fix #9839

This PR introduces the following new JIT invariants:

* When optimizing, local morph will now thread all locals into a tree list accessed by Statement::LocalsTreeList. This tree list is kept valid starting from local morph and ending with forward sub. There is no memory impact of this since we reuse the GenTree::gtPrev and GenTree::gtNext fields.
* Early liveness information (GTF_VAR_DEATH and the promoted struct death vars map) is kept valid (sound) starting from early liveness and ending with morph.
  There are asserts that the tree list is up to date when it is accessed. This is done through a new member fgNodeThreading that replaces the preexisting fgStmtListThreaded and keeps information about what the current kind of node threading is.

The benefits are large, -2 MB on win-x64 collections (-0.85% on libraries.pmi that only has optimized contexts), with a number of regressions as expected when removing locals.
The improvements primarily come from the omission of copies for implicit byrefs, so the benefits on platforms with fewer implicit byrefs is smaller, but the forward sub change alone is still very impactful (e.g. -300K on linux-x64).

The throughput impact is around 1% in optimized contexts and below 0.1% in unoptimized contexts, the latter due to local morph needing to check if it should be threading nodes.
  • Loading branch information
jakobbotsch authored Jan 11, 2023
1 parent 7c265c3 commit db717e3
Show file tree
Hide file tree
Showing 29 changed files with 1,585 additions and 270 deletions.
9 changes: 9 additions & 0 deletions src/coreclr/jit/assertionprop.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -3651,6 +3651,15 @@ GenTree* Compiler::optCopyAssertionProp(AssertionDsc* curAssertion,
tree->SetLclNum(copyLclNum);
tree->SetSsaNum(copySsaNum);

// Copy prop and last-use copy elision happens at the same time in morph.
// This node may potentially not be a last use of the new local.
//
// TODO-CQ: It is probably better to avoid doing this propagation if we
// would otherwise omit an implicit byref copy since this propagation will
// force us to create another copy anyway.
//
tree->gtFlags &= ~GTF_VAR_DEATH;

#ifdef DEBUG
if (verbose)
{
Expand Down
4 changes: 3 additions & 1 deletion src/coreclr/jit/block.h
Original file line number Diff line number Diff line change
Expand Up @@ -542,6 +542,8 @@ enum BasicBlockFlags : unsigned __int64
BBF_BACKWARD_JUMP_SOURCE = MAKE_BBFLAG(41), // Block is a source of a backward jump
BBF_HAS_MDARRAYREF = MAKE_BBFLAG(42), // Block has a multi-dimensional array reference

BBF_RECURSIVE_TAILCALL = MAKE_BBFLAG(43), // Block has recursive tailcall that may turn into a loop

// The following are sets of flags.

// Flags that relate blocks to loop structure.
Expand All @@ -562,7 +564,7 @@ enum BasicBlockFlags : unsigned __int64
// For example, the top block might or might not have BBF_GC_SAFE_POINT,
// but we assume it does not have BBF_GC_SAFE_POINT any more.

BBF_SPLIT_LOST = BBF_GC_SAFE_POINT | BBF_HAS_JMP | BBF_KEEP_BBJ_ALWAYS | BBF_CLONED_FINALLY_END,
BBF_SPLIT_LOST = BBF_GC_SAFE_POINT | BBF_HAS_JMP | BBF_KEEP_BBJ_ALWAYS | BBF_CLONED_FINALLY_END | BBF_RECURSIVE_TAILCALL,

// Flags gained by the bottom block when a block is split.
// Note, this is a conservative guess.
Expand Down
21 changes: 19 additions & 2 deletions src/coreclr/jit/compiler.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -4391,8 +4391,8 @@ void Compiler::compCompile(void** methodCodePtr, uint32_t* methodCodeSize, JitFl

// Enable the post-phase checks that use internal logic to decide when checking makes sense.
//
activePhaseChecks =
PhaseChecks::CHECK_EH | PhaseChecks::CHECK_LOOPS | PhaseChecks::CHECK_UNIQUE | PhaseChecks::CHECK_PROFILE;
activePhaseChecks = PhaseChecks::CHECK_EH | PhaseChecks::CHECK_LOOPS | PhaseChecks::CHECK_UNIQUE |
PhaseChecks::CHECK_PROFILE | PhaseChecks::CHECK_LINKED_LOCALS;

// Import: convert the instrs in each basic block to a tree based intermediate representation
//
Expand Down Expand Up @@ -4604,10 +4604,23 @@ void Compiler::compCompile(void** methodCodePtr, uint32_t* methodCodeSize, JitFl
//
DoPhase(this, PHASE_STR_ADRLCL, &Compiler::fgMarkAddressExposedLocals);

if (opts.OptimizationEnabled())
{
fgNodeThreading = NodeThreading::AllLocals;
}

// Do an early pass of liveness for forward sub and morph. This data is
// valid until after morph.
//
DoPhase(this, PHASE_EARLY_LIVENESS, &Compiler::fgEarlyLiveness);

// Run a simple forward substitution pass.
//
DoPhase(this, PHASE_FWD_SUB, &Compiler::fgForwardSub);

// Locals tree list is no longer kept valid.
fgNodeThreading = NodeThreading::None;

// Apply the type update to implicit byref parameters; also choose (based on address-exposed
// analysis) which implicit byref promotions to keep (requires copy to initialize) or discard.
//
Expand Down Expand Up @@ -4750,6 +4763,8 @@ void Compiler::compCompile(void** methodCodePtr, uint32_t* methodCodeSize, JitFl
//
DoPhase(this, PHASE_SET_BLOCK_ORDER, &Compiler::fgSetBlockOrder);

fgNodeThreading = NodeThreading::AllTrees;

// At this point we know if we are fully interruptible or not
if (opts.OptimizationEnabled())
{
Expand Down Expand Up @@ -4942,6 +4957,8 @@ void Compiler::compCompile(void** methodCodePtr, uint32_t* methodCodeSize, JitFl
Rationalizer rat(this); // PHASE_RATIONALIZE
rat.Run();

fgNodeThreading = NodeThreading::LIR;

// Here we do "simple lowering". When the RyuJIT backend works for all
// platforms, this will be part of the more general lowering phase. For now, though, we do a separate
// pass of "final lowering." We must do this before (final) liveness analysis, because this creates
Expand Down
90 changes: 63 additions & 27 deletions src/coreclr/jit/compiler.h
Original file line number Diff line number Diff line change
Expand Up @@ -1454,13 +1454,14 @@ extern const char* PhaseEnums[];
// clang-format off
enum class PhaseChecks : unsigned int
{
CHECK_NONE = 0,
CHECK_IR = 1 << 0, // ir flags, etc
CHECK_UNIQUE = 1 << 1, // tree node uniqueness
CHECK_FG = 1 << 2, // flow graph integrity
CHECK_EH = 1 << 3, // eh table integrity
CHECK_LOOPS = 1 << 4, // loop table integrity
CHECK_PROFILE = 1 << 5, // profile data integrity
CHECK_NONE = 0,
CHECK_IR = 1 << 0, // ir flags, etc
CHECK_UNIQUE = 1 << 1, // tree node uniqueness
CHECK_FG = 1 << 2, // flow graph integrity
CHECK_EH = 1 << 3, // eh table integrity
CHECK_LOOPS = 1 << 4, // loop table integrity
CHECK_PROFILE = 1 << 5, // profile data integrity
CHECK_LINKED_LOCALS = 1 << 6, // check linked list of locals
};

inline constexpr PhaseChecks operator ~(PhaseChecks a)
Expand Down Expand Up @@ -1860,6 +1861,16 @@ struct RichIPMapping
DebugInfo debugInfo;
};

// Current kind of node threading stored in GenTree::gtPrev and GenTree::gtNext.
// See fgNodeThreading for more information.
enum class NodeThreading
{
None,
AllLocals, // Locals are threaded (after local morph when optimizing)
AllTrees, // All nodes are threaded (after gtSetBlockOrder)
LIR, // Nodes are in LIR form (after rationalization)
};

/*
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Expand Down Expand Up @@ -1910,6 +1921,7 @@ class Compiler
friend class LIR;
friend class ObjectAllocator;
friend class LocalAddressVisitor;
friend struct Statement;
friend struct GenTree;
friend class MorphInitBlockHelper;
friend class MorphCopyBlockHelper;
Expand Down Expand Up @@ -2789,7 +2801,7 @@ class Compiler
// is #of nodes in subtree) of "tree" is greater than "limit".
// (This is somewhat redundant with the "GetCostEx()/GetCostSz()" fields, but can be used
// before they have been set.)
bool gtComplexityExceeds(GenTree** tree, unsigned limit);
bool gtComplexityExceeds(GenTree* tree, unsigned limit);

GenTree* gtReverseCond(GenTree* tree);

Expand Down Expand Up @@ -4448,30 +4460,47 @@ class Compiler
bool fgRemoveRestOfBlock; // true if we know that we will throw
bool fgStmtRemoved; // true if we remove statements -> need new DFA

// There are two modes for ordering of the trees.
// - In FGOrderTree, the dominant ordering is the tree order, and the nodes contained in
// each tree and sub-tree are contiguous, and can be traversed (in gtNext/gtPrev order)
// by traversing the tree according to the order of the operands.
// - In FGOrderLinear, the dominant ordering is the linear order.

enum FlowGraphOrder
{
FGOrderTree,
FGOrderLinear
};
// There are two modes for ordering of the trees.
// - In FGOrderTree, the dominant ordering is the tree order, and the nodes contained in
// each tree and sub-tree are contiguous, and can be traversed (in gtNext/gtPrev order)
// by traversing the tree according to the order of the operands.
// - In FGOrderLinear, the dominant ordering is the linear order.
FlowGraphOrder fgOrder;

// The following are boolean flags that keep track of the state of internal data structures

bool fgStmtListThreaded; // true if the node list is now threaded
bool fgCanRelocateEHRegions; // true if we are allowed to relocate the EH regions
bool fgEdgeWeightsComputed; // true after we have called fgComputeEdgeWeights
bool fgHaveValidEdgeWeights; // true if we were successful in computing all of the edge weights
bool fgSlopUsedInEdgeWeights; // true if their was some slop used when computing the edge weights
bool fgRangeUsedInEdgeWeights; // true if some of the edgeWeight are expressed in Min..Max form
weight_t fgCalledCount; // count of the number of times this method was called
// This is derived from the profile data
// or is BB_UNITY_WEIGHT when we don't have profile data
// The following are flags that keep track of the state of internal data structures

// Even in tree form (fgOrder == FGOrderTree) the trees are threaded in a
// doubly linked lists during certain phases of the compilation.
// - Local morph threads all locals to be used for early liveness and
// forward sub when optimizing. This is kept valid until after forward sub.
// The first local is kept in Statement::GetRootNode()->gtNext and the last
// local in Statement::GetRootNode()->gtPrev. fgSequenceLocals can be used
// to (re-)sequence a statement into this form, and
// Statement::LocalsTreeList for range-based iteration. The order must
// match tree order.
//
// - fgSetBlockOrder threads all nodes. This is kept valid until LIR form.
// In this form the first node is given by Statement::GetTreeList and the
// last node is given by Statement::GetRootNode(). fgSetStmtSeq can be used
// to (re-)sequence a statement into this form, and Statement::TreeList for
// range-based iteration. The order must match tree order.
//
// - Rationalization links all nodes into linear form which is kept until
// the end of compilation. The first and last nodes are stored in the block.
NodeThreading fgNodeThreading;
bool fgCanRelocateEHRegions; // true if we are allowed to relocate the EH regions
bool fgEdgeWeightsComputed; // true after we have called fgComputeEdgeWeights
bool fgHaveValidEdgeWeights; // true if we were successful in computing all of the edge weights
bool fgSlopUsedInEdgeWeights; // true if their was some slop used when computing the edge weights
bool fgRangeUsedInEdgeWeights; // true if some of the edgeWeight are expressed in Min..Max form
weight_t fgCalledCount; // count of the number of times this method was called
// This is derived from the profile data
// or is BB_UNITY_WEIGHT when we don't have profile data

#if defined(FEATURE_EH_FUNCLETS)
bool fgFuncletsCreated; // true if the funclet creation phase has been run
Expand Down Expand Up @@ -4724,6 +4753,8 @@ class Compiler
GenTreeLclVarCommon* lclVarNode);
bool fgComputeLifeLocal(VARSET_TP& life, VARSET_VALARG_TP keepAliveVars, GenTree* lclVarNode);

GenTree* fgTryRemoveDeadStoreEarly(Statement* stmt, GenTreeLclVarCommon* dst);

void fgComputeLife(VARSET_TP& life,
GenTree* startNode,
GenTree* endNode,
Expand Down Expand Up @@ -5419,6 +5450,7 @@ class Compiler
void fgDebugCheckLinks(bool morphTrees = false);
void fgDebugCheckStmtsList(BasicBlock* block, bool morphTrees);
void fgDebugCheckNodeLinks(BasicBlock* block, Statement* stmt);
void fgDebugCheckLinkedLocals();
void fgDebugCheckNodesUniqueness();
void fgDebugCheckLoopTable();
void fgDebugCheckSsa();
Expand Down Expand Up @@ -5835,6 +5867,8 @@ class Compiler

bool byrefStatesMatchGcHeapStates; // True iff GcHeap and ByrefExposed memory have all the same def points.

PhaseStatus fgEarlyLiveness();

void fgMarkUseDef(GenTreeLclVarCommon* tree);

void fgBeginScopeLife(VARSET_TP* inScope, VarScopeDsc* var);
Expand Down Expand Up @@ -5926,11 +5960,12 @@ class Compiler
void fgMarkDemotedImplicitByRefArgs();

PhaseStatus fgMarkAddressExposedLocals();
void fgMarkAddressExposedLocals(Statement* stmt);
void fgSequenceLocals(Statement* stmt);

PhaseStatus fgForwardSub();
bool fgForwardSubBlock(BasicBlock* block);
bool fgForwardSubStatement(Statement* statement);
void fgForwardSubUpdateLiveness(GenTree* newSubListFirst, GenTree* newSubListLast);

// The given local variable, required to be a struct variable, is being assigned via
// a "lclField", to make it masquerade as an integral type in the ABI. Make sure that
Expand Down Expand Up @@ -9049,6 +9084,8 @@ XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

bool fgLocalVarLivenessDone; // Note that this one is used outside of debug.
bool fgLocalVarLivenessChanged;
bool fgIsDoingEarlyLiveness;
bool fgDidEarlyLiveness;
bool compLSRADone;
bool compRationalIRForm;

Expand Down Expand Up @@ -10991,7 +11028,6 @@ class GenTreeVisitor
if (TVisitor::UseExecutionOrder && node->IsReverseOp())
{
assert(node->AsMultiOp()->GetOperandCount() == 2);

result = WalkTree(&node->AsMultiOp()->Op(2), node);
if (result == fgWalkResult::WALK_ABORT)
{
Expand Down
1 change: 1 addition & 0 deletions src/coreclr/jit/compphases.h
Original file line number Diff line number Diff line change
Expand Up @@ -43,6 +43,7 @@ CompPhaseNameMacro(PHASE_UPDATE_FINALLY_FLAGS, "Update finally target flag
CompPhaseNameMacro(PHASE_COMPUTE_PREDS, "Compute preds", false, -1, false)
CompPhaseNameMacro(PHASE_EARLY_UPDATE_FLOW_GRAPH, "Update flow graph early pass", false, -1, false)
CompPhaseNameMacro(PHASE_STR_ADRLCL, "Morph - Structs/AddrExp", false, -1, false)
CompPhaseNameMacro(PHASE_EARLY_LIVENESS, "Early liveness", false, -1, false)
CompPhaseNameMacro(PHASE_FWD_SUB, "Forward Substitution", false, -1, false)
CompPhaseNameMacro(PHASE_MORPH_IMPBYREF, "Morph - ByRefs", false, -1, false)
CompPhaseNameMacro(PHASE_PROMOTE_STRUCTS, "Morph - Promote Structs", false, -1, false)
Expand Down
2 changes: 1 addition & 1 deletion src/coreclr/jit/earlyprop.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -650,7 +650,7 @@ bool Compiler::optIsNullCheckFoldingLegal(GenTree* tree,
// until we get to the indirection or process the statement root.
GenTree* previousTree = nullCheckTree;
GenTree* currentTree = nullCheckTree->gtNext;
assert(fgStmtListThreaded);
assert(fgNodeThreading == NodeThreading::AllTrees);
while (canRemoveNullCheck && (currentTree != tree) && (currentTree != nullptr))
{
if ((*nullCheckParent == nullptr) && currentTree->TryGetUse(nullCheckTree))
Expand Down
4 changes: 3 additions & 1 deletion src/coreclr/jit/fgbasic.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -87,10 +87,12 @@ void Compiler::fgInit()
#endif // DEBUG

fgLocalVarLivenessDone = false;
fgIsDoingEarlyLiveness = false;
fgDidEarlyLiveness = false;

/* Statement list is not threaded yet */

fgStmtListThreaded = false;
fgNodeThreading = NodeThreading::None;

// Initialize the logic for adding code. This is used to insert code such
// as the code that raises an exception when an array range check fails.
Expand Down
Loading

0 comments on commit db717e3

Please sign in to comment.