-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RyuJIT drops unused volatile reads. #6280
Comments
It doesn't happen if the result of volatile read is put in an observable location. |
If it happens in stack space, did it really happen? (Understand the motivation is for a barrier) |
Is it ever observable if an unused volatile read is removed? ECMA surely does not say they can't be dropped because it's not detectable that they are dropped. Coalescing also is fine because it's not detectable. It just looks like the two reads happened without any interleaved writes which was possible before. So even if ECMA says that that must refer to the abstract machine, not to the generated x86. Under the as-if rule this is OK. |
The rationale provided by ECMA references mapped hardware registers and as such it's pretty clear that there's no such thing an "unused volatile read". |
I did not know that ECMA assigned hardware port semantics to volatile! This seems like a bad hybrid of C++ volatile and memory barriers. ECMA should be changed to not support hardware registers in memory and only assign barrier semantics to volatile. .NET is not being used in kernel mode. That's unrealistic. Indeed, I.12.6.7 makes this optimization illegal. I think this is a design defect in ECMA. This should be changed and the JIT should ignore that since adhering to that paragraph can't possibly do any good. This is not a breaking change since hardware registers never were accessible to user mode. Nobody could have made use of this rule. |
More or less. It's what VC++ does with volatile and is an extension of the C++ standard.
I'm not sure what actual problem that would solve. |
It would allow the JIT to coalesce volatile accesses. Currently, that is illegal (although I think the JIT should just not comply). C++ compilers already do this for |
Even if the ECMA memory model only dealt with barrier semantics, the removal of a volatile read is potentially observable. For example, given the code in the original post, if the load is removed, the compiler is free to reorder the non-volatile write to |
Yes, the barrier must stay but the load can go. For example, it can be a dummy load from a stack location. Ideally, the compiler had an internal representation for a partial barrier. Code gen could then pick the cheapest instructions for that. |
Yes, that's a good point.
I'm not terribly familiar with the intricacies of user-mode drivers, but this statement only holds so long as there is no way for a user-mode program to access device memory directly. If such memory has ever been accessible to a user-mode .NET program (e.g. via a call to MmMapLockedPagesSpecifyCache in a cooperating kernel-mode driver), then allowing the removal of unused volatile reads may break programs that expect them to be side-effecting. |
Coalesce as in "eliminate redundant loads"? Again, I'm not sure what actual problem that would solve. Also, I don't remember seeing a C++ compiler doing that. Which is not surprising considering that it's not exactly trivial to do so correctly. |
Here are some resources that I liked:
Other compilers are starting to build understanding about (partial) barriers into their engines. I'm personally not sure that time is well spent for .NET since .NET code usually does not spend much time with such low level primitives. But it's useful to understand what's possible and what approaches are being taken by others. |
@GSPP these are good resources; thanks for sharing them! Regarding the original issue: the likelihood of changes to ECMA or a decision to disregard the semantics it prescribes is remote. A more realistic (but not necessarily likely) path forward for enabling some of these transformations is a new set of APIs similar to the |
Exaggerating a bit those resources can be summed up by the following quote from N4455:
That "being discussed" discussion is a link to a LLVM bug created some 3 years ago. After more than a year someone replied and asked for a use case. Nothing happened since then. I'm not saying that there aren't any such optimizations that are useful but it's very difficult to justify changing an existing spec based on what at the moment looks more like speculation. As already pointed out by @pgavlin above this would be better handled by adding new APIs, similar to what C++ did with |
And even on x86 where stores aren't reordered you can still run into issues if a load is removed because that load might have been there to prevent load/store reordering which x86 does allow. Consider Intel's load/store reordering example:
It is possible to get
The added load can't be reordered with the previous store because they access the same location. It also can't be reordered with the existing load because x86 (supposedly) doesn't reorder loads. Now, imagine what happens if the JIT comes in and eliminates the added load because That said, fixing JIT's volatile load elimination doesn't actually fix this. But that's another story, at least then JIT will follow Intel's MM even if that doesn't accurately match .NET's MM. |
I believe this can be closed. ECMA reserves the possibility of using device memory where reads may have nontrivial sideeffects, but a possibility of such scenario in combination with GC seems very doubtful. For the regular case we should treat effects of volatile reads as once we read, we may observe a different value and not as once we read, a bulb lights up on the wall. With the above said, the JIT is right. Unused volatile reads can be dropped. |
Per memory document it is permitted to drop unused reads (volatile or ordinary) NB: coalescing adjacent volatile reads is not permitted, but dropping unused ones is ok. |
If the following program is compiled with optimizations enabled (e.g. the
/o+
flag is provided on thecsc
command line), RyuJIT will drop the volatile read ofa
:The IL as reported by a JIT dump is:
The generated assembly is:
From the dump, it appears that the importer drops the volatile
ldsfld
on the floor when it imports thepop
instruction. This is not correct per the ECMA spec, which states that volatile operations may not be removed or coalesced.Thanks to @omariom for reporting the original issue in #6257.
category:correctness
theme:volatile
skill-level:expert
cost:medium
The text was updated successfully, but these errors were encountered: