Remove cached RW mapping when the corresponding RX one is released #82841
Conversation
When an RX mapping in the ExecutableAllocator is released and there is an RW mapping cached for it, it incorrectly stays in the cache. So when the same virtual address range that was just released gets reserved again and a request for an RW mapping comes in, the cached RW mapping is used. But the new RX mapping most likely has a different offset in the underlying file mapping, so the cached RW mapping is unrelated to the new RX mapping. Using it results either in overwriting a different block of memory or in silently dropping what was written.

This was discovered when investigating a GC reliability framework crash. It turned out to also be the culprit behind issue dotnet#75244, which I was unable to reproduce and which kept occurring in the CI, on x86 only, for quite some time.

In addition to that fix, I've found an off-by-one condition in ExecutableAllocator::FindRWBlock and fixed it. I've also noticed that in UnlockedLoaderHeap::UnlockedReservePages, if the allocation of the LoaderHeapBlock failed, we would incorrectly leave the data block on the m_pRangeList; I've fixed that too.

Close dotnet#75244
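To make the failure mode concrete, here is a minimal, self-contained C++ sketch of a cache of RW views keyed by the RX address range. It is not the actual ExecutableAllocator code; the RWCache type and its Add/Find/OnReleaseRX members are hypothetical names used only for illustration. It shows the eviction-on-release fix and an exclusive-upper-bound range check of the kind the FindRWBlock off-by-one fix concerns.

```cpp
// Hypothetical sketch of the stale RW-mapping cache problem and its fix.
// Not the real ExecutableAllocator; names and layout are invented.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

struct RWEntry
{
    uintptr_t rxStart;    // start of the RX range this RW view belongs to
    size_t    size;       // size of the mapped block
    size_t    fileOffset; // offset of the block in the underlying file mapping
};

class RWCache
{
    std::map<uintptr_t, RWEntry> m_entries; // keyed by RX start address

public:
    void Add(const RWEntry& e) { m_entries[e.rxStart] = e; }

    // Range check with an exclusive upper bound; using <= here would be the
    // kind of off-by-one that matches the address one past the end of a block.
    const RWEntry* Find(uintptr_t address) const
    {
        for (const auto& kv : m_entries)
        {
            const RWEntry& e = kv.second;
            if (address >= e.rxStart && address < e.rxStart + e.size)
                return &e;
        }
        return nullptr;
    }

    // The fix: when an RX mapping is released, drop its cached RW view.
    // Otherwise a later reservation reusing the same virtual address range,
    // but backed by a different file offset, would pick up the stale RW view
    // and writes would land in the wrong block or be lost.
    void OnReleaseRX(uintptr_t rxStart) { m_entries.erase(rxStart); }
};

int main()
{
    RWCache cache;
    cache.Add({0x10000, 0x1000, /* fileOffset */ 0});

    // The RX range is released and the same virtual range is reserved again,
    // now backed by a different offset in the file mapping.
    cache.OnReleaseRX(0x10000);
    assert(cache.Find(0x10000) == nullptr); // stale RW view is gone
    cache.Add({0x10000, 0x1000, /* fileOffset */ 0x2000});

    assert(cache.Find(0x10000 + 0xFFF) != nullptr);  // inside the block
    assert(cache.Find(0x10000 + 0x1000) == nullptr); // one past the end
    return 0;
}
```

Without the OnReleaseRX step, a lookup for the re-reserved range would still return the entry recorded for the old file offset, so writes through that stale view would go to the wrong block, which is the corruption described above.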
c847589 to ba4592c (Compare)
cc: @Maoni0 this is the fix for the GC reliability framework crash.
nice! thank you @janvorli!
/azp run runtime-coreclr outerloop
Azure Pipelines successfully started running 1 pipeline(s).
Thanks!
/backport to release/7.0
Started backporting to release/7.0: https://github.com/dotnet/runtime/actions/runs/4324991438
@janvorli backporting to release/7.0 failed, the patch most likely resulted in conflicts:
$ git am --3way --ignore-whitespace --keep-non-patch changes.patch
Applying: Remove cached RW mapping when the corresponding RX one is released
Using index info to reconstruct a base tree...
M src/coreclr/utilcode/executableallocator.cpp
M src/coreclr/utilcode/loaderheap.cpp
Falling back to patching base and 3-way merge...
Auto-merging src/coreclr/utilcode/loaderheap.cpp
CONFLICT (content): Merge conflict in src/coreclr/utilcode/loaderheap.cpp
Auto-merging src/coreclr/utilcode/executableallocator.cpp
CONFLICT (content): Merge conflict in src/coreclr/utilcode/executableallocator.cpp
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 Remove cached RW mapping when the corresponding RX one is released
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
Error: The process '/usr/bin/git' failed with exit code 128
Please backport manually!
@janvorli an error occurred while backporting to release/7.0, please check the run log for details! Error: git am failed, most likely due to a merge conflict.