Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Profiler] Avoid Recursive ThreadStoreLock in Profiling Thread Enumerator #110548

Conversation

mdh1418
Copy link
Member

@mdh1418 mdh1418 commented Dec 9, 2024

Fixes #110062

Profiling Enumerators look to acquire the ThreadStoreLock.
In release config, re-acquiring the ThreadStoreLock and releasing it in ProfilerThreadEnum::Init will cause problems if the callback invoking EnumThread has logic that depends on the ThreadStoreLock being held.
For example, invoking EnumThread within RuntimeSuspendFinished will violate the expectation that the ThreadStoreLock is held until RestartEE is called, demonstrated in #110062

This PR aims to avoid recursively acquiring the ThreadStoreLock by expanding the known scnearios where the profiling thread enumerator shouldn't acquire the ThreadStoreLock.

@mdh1418 mdh1418 changed the title [Profiler] Avoid Recursive ThreadStoreLock [Profiler] Avoid Recursive ThreadStoreLock in Profiling Thread Enumerator Dec 9, 2024
@mdh1418 mdh1418 marked this pull request as ready for review December 9, 2024 20:18
@jkotas
Copy link
Member

jkotas commented Dec 9, 2024

Do we need a test for this?

Profiling Enumerators look to acquire the ThreadStoreLock.
In release config, re-acquiring the ThreadStoreLock and
releasing it in ProfilerThreadEnum::Init will cause
problems if the callback invoking EnumThread has logic
that depends on the ThreadStoreLock being held.
To avoid recursively acquiring the ThreadStoreLock,
expand the condition when the profiling thread enumerator
shouldn't acquire the ThreadStoreLock.
There was a potential race condition when setting the flag
before suspending and resetting the flag after restarting.
For example, if the thread restarting runtime is preempted
right after resuming runtime, the flag could remain unset
by the time another thread looks to suspend runtime, which
would see that the flag as set.
@mdh1418 mdh1418 force-pushed the profiler_thread_enum_avoid_recursive_thread_store_lock branch from cd6f6f5 to d56d8bb Compare December 10, 2024 21:06
Copy link
Member

@davmason davmason left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mdh1418
Copy link
Member Author

mdh1418 commented Dec 11, 2024

Given that this test should run quickly when it passes, and its expected failure case is when it hangs/deadlocks, is there a straightforward way to set a quick timeout for this test instead of hitting CI's timeouts? And is there a way to hit such a timeout in a local run via corerun? @davmason @jkotas

@davmason
Copy link
Member

The problem with timeouts is you can't check them in because we run a lot of tests in various modes (GCStress, JITStress, etc) that can make them take 10+ minutes. For this kind of thing what I have done in the past is make the timeout short locally and run it in a loop on your dev machine to build confidence, but leave the timeout unchanged in the tree.

@mdh1418 mdh1418 merged commit a390024 into dotnet:main Dec 11, 2024
96 checks passed
@mdh1418 mdh1418 deleted the profiler_thread_enum_avoid_recursive_thread_store_lock branch December 11, 2024 21:35
@mdh1418
Copy link
Member Author

mdh1418 commented Dec 12, 2024

/backport to release/9.0-staging

Copy link
Contributor

Started backporting to release/9.0-staging: https://github.com/dotnet/runtime/actions/runs/12303767691

hez2010 pushed a commit to hez2010/runtime that referenced this pull request Dec 14, 2024
…ator (dotnet#110548)

* [Profiler] Avoid Recursive ThreadStoreLock

Profiling Enumerators look to acquire the ThreadStoreLock.
In release config, re-acquiring the ThreadStoreLock and
releasing it in ProfilerThreadEnum::Init will cause
problems if the callback invoking EnumThread has logic
that depends on the ThreadStoreLock being held.
To avoid recursively acquiring the ThreadStoreLock,
expand the condition when the profiling thread enumerator
shouldn't acquire the ThreadStoreLock.

* [Profiler] Change order to set fProfilerRequestedRuntimeSuspend

There was a potential race condition when setting the flag
before suspending and resetting the flag after restarting.
For example, if the thread restarting runtime is preempted
right after resuming runtime, the flag could remain unset
by the time another thread looks to suspend runtime, which
would see that the flag as set.

* [Profiler][Tests] Add unit test for EnumThreads during suspension

* [Profiler][Tests] Fixup EnumThreads test
@github-actions github-actions bot locked and limited conversation to collaborators Jan 12, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Deadlock in GC when attached profiler calls ICorProfilerInfo4::EnumThreads using .NET9 runtime
4 participants