You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
for @vnen as discussed previously on IRC, and maybe even @reduz can review as well, as it somewhat related to his variadic template encoder commit
@vnen This gdscript code MAY crash if you try it in a test project. Create a Sprite2D and attach this simple script:
extends Sprite2D
func _ready():
self.set_process(true)
func _process(_delta):
var a = get_texture().get_width()
print(a)
Note** That if you change var a = get_texture().get_width() to var a = self.get_texture().get_width() there will be no more crash. This is because the compiler will generate a call-ret opcode in the second case instead of the (hopefully faster) ptr-call-ret opcode.
The root cause is the ptr call to get_texture(). get_texture() is defined in Sprite2D as Ref<Texture2D> get_texture() const; which is pretty commmon.
So far, I've bisected this issue to this commit here: d8b2209 and more notably to the introduction of the (hopefully faster) OPCODE_CALL_PTRCALL_OBJECT opcode.
The root cause is access to uninitialized memory in the gdscript_vm, notably for OPCODE_CALL_PTR opcodes. While this codepath works for every object type in Variant, including variants that wrap a raw Object*. It doesn't work for Ref<> variant objects.
The crash will be somewhat hidden and unlikely to happen because the memory read is within the unref() function of Reference.h. Which checks reference for null before de-referencing the pointer and executing code. Whether or not reference == null in this context is going to depend on what arbitrary memory is on the stack in gdscript_vm when processing OPCODE_CALL_PTRCALL_OBJECT. If it happens to be zeros, no code in unref() will execute and we won't have a bad access.
Below is the call stack from gdscript "get_texture()" running through the gdscript_vm, then through @reduz variadic binders, eventually to call to Ref<> unref().
Note the PtrToArg encoder on the stack frame below. More notably the left had side of the expression casts the passed in 'const void *p_ptr', and dereferences it.
The de-reference in combination with the assignment execute the operator= on the dereferenced Ref<> which eventually gets to calling unref(). See top 3 lines in the first posted stack-trace (line numbers maybe off by a couple in my screenshot).
Now, where does this all come from? Well it all comes down the how the gdscript_vm handles that particular opcode on it's own stack. This relevant opcode execution in the gdscript_vm:
VariantInternal::initialize(ret, Variant::OBJECT) ends up clearing out anything in this Variant* due to eventually calling clear(). As per @reduz refactored variant constructors variants with OBJECT types need a de-init, which if the object is a reference it eventually deletes the in _clear_internal() memdelete(reference);
So in summary:
gdscript_vm: processing OPCODE_CALL_PTRCALL_OBJECT: GET_INSTRUCTION_ARG(ret, argc + 1); (ret is uninitialized memory). The ptrcall in the vm eventually dereferences this address through the encoder (PtrToArg)
So to fix this there's a bunch of different solutions ranging from a new ptr opcde for objects that are references, not just Object*. We could fix or modify the encoder if that's potentially possible. Rework the Ref<> assignment= operator. Initialize stack memory in the vm differently, etc.
But I wanna just bring up how difficult this bug is to identify in the first place. In fact, I was EASILY able to reproduce this bug by a slight modification. unref() in Ref<> is not thread safe. I can't actually use Ref<> in highly multi-threaded code. The Ref<> class isn't suited for highly multi-threaded environments. I simply put a lock in unref() by adding a mutex as class property in Ref<>. Well, haha, locking random, uninitialized memory as a mutex REALLY, REALLY doesn't work and will easily crash every time (as opposed to only crashing when comparing reference to null fails).
Memory issues REALLY suck, and on IRC I already learned @clayjohn dealt with a uninitialized memory issue a while back that took a few hours off his life lol. reading unintialized memory, using afterfree'd, freeing memory that's not yours, all these things can lead to really hard-to-track down bugs later, because the corruption creation point and the corruption discovery point may be in VERY different places. This is basically the whole reason why a slew of c++ sanitizers are being developed (asan, tsan, ubsan, and now msan (memory sanitizer)).
I think due to the complexity of the codebase, and how tightly coupled some of the components are (Ref, Variant, Object, GDScript VM, etc), the godot project REALLY need to be using all of these tools to their full capacity.
Godot has an internally maintained scripting language, and a VM. Just due to the nature of an internal VM, godot is going to be dancing around memory safety in ways that's not trivial to guarantee memory safety or correctness.
I see CI does build and test a build with asan on linux, but I think godot CI should do more. I'd really love to see msan in there as well. I'll send another PR with an option to add msan to the existing project sanitizers on llvm (asan, tsan, and ubsan).
The text was updated successfully, but these errors were encountered:
I tried to use memory sanitizer around half year ago, but it produced ~2GB log when only opening project manager.
This probably can be partially fixed when #43636 will be closed.
Even with the most aggressive C/C++ flags, Rust finds more potentially incorrect code, non-initialized variables etc.
I'd like to see Rust's code in Godot, but it would take a lot of work and it would also cause a lot of trouble
I'm going to give msan another whirl on linux. The un-initialized member check wouldn't have helped in this case, as its much more nasty, but i'm curious if msan would have caught it.
for @vnen as discussed previously on IRC, and maybe even @reduz can review as well, as it somewhat related to his variadic template encoder commit
@vnen This gdscript code MAY crash if you try it in a test project. Create a Sprite2D and attach this simple script:
Note** That if you change
var a = get_texture().get_width()
tovar a = self.get_texture().get_width()
there will be no more crash. This is because the compiler will generate a call-ret opcode in the second case instead of the (hopefully faster) ptr-call-ret opcode.The root cause is the ptr call to get_texture().
get_texture()
is defined in Sprite2D asRef<Texture2D> get_texture() const;
which is pretty commmon.So far, I've bisected this issue to this commit here: d8b2209 and more notably to the introduction of the (hopefully faster) OPCODE_CALL_PTRCALL_OBJECT opcode.
The root cause is access to uninitialized memory in the gdscript_vm, notably for OPCODE_CALL_PTR opcodes. While this codepath works for every object type in Variant, including variants that wrap a raw Object*. It doesn't work for Ref<> variant objects.
The crash will be somewhat hidden and unlikely to happen because the memory read is within the unref() function of Reference.h. Which checks
reference
for null before de-referencing the pointer and executing code. Whether or not reference == null in this context is going to depend on what arbitrary memory is on the stack in gdscript_vm when processing OPCODE_CALL_PTRCALL_OBJECT. If it happens to be zeros, no code in unref() will execute and we won't have a bad access.Below is the call stack from gdscript "get_texture()" running through the gdscript_vm, then through @reduz variadic binders, eventually to call to Ref<> unref().
Note the PtrToArg encoder on the stack frame below. More notably the left had side of the expression casts the passed in 'const void *p_ptr', and dereferences it.
data:image/s3,"s3://crabby-images/f9d38/f9d38f5c67f353822ac9f8d47091d191cbfb90d2" alt="Screen Shot 2020-11-28 at 11 52 50 AM"
The de-reference in combination with the assignment execute the operator= on the dereferenced Ref<> which eventually gets to calling unref(). See top 3 lines in the first posted stack-trace (line numbers maybe off by a couple in my screenshot).
Now, where does this all come from? Well it all comes down the how the gdscript_vm handles that particular opcode on it's own stack. This relevant opcode execution in the gdscript_vm:
VariantInternal::initialize(ret, Variant::OBJECT) ends up clearing out anything in this Variant* due to eventually calling clear(). As per @reduz refactored variant constructors variants with OBJECT types need a de-init, which if the object is a reference it eventually deletes the in _clear_internal()
memdelete(reference);
So in summary:
gdscript_vm: processing OPCODE_CALL_PTRCALL_OBJECT:
GET_INSTRUCTION_ARG(ret, argc + 1);
(ret is uninitialized memory). The ptrcall in the vm eventually dereferences this address through the encoder (PtrToArg)So to fix this there's a bunch of different solutions ranging from a new ptr opcde for objects that are references, not just Object*. We could fix or modify the encoder if that's potentially possible. Rework the Ref<> assignment= operator. Initialize stack memory in the vm differently, etc.
But I wanna just bring up how difficult this bug is to identify in the first place. In fact, I was EASILY able to reproduce this bug by a slight modification.
unref()
in Ref<> is not thread safe. I can't actually use Ref<> in highly multi-threaded code. The Ref<> class isn't suited for highly multi-threaded environments. I simply put a lock in unref() by adding a mutex as class property in Ref<>. Well, haha, locking random, uninitialized memory as a mutex REALLY, REALLY doesn't work and will easily crash every time (as opposed to only crashing when comparing reference to null fails).Lastly I just REALLY want to suggest we add a memory sanitizer build to CI and a default godot project somehow... See: https://clang.llvm.org/docs/MemorySanitizer.html
Memory issues REALLY suck, and on IRC I already learned @clayjohn dealt with a uninitialized memory issue a while back that took a few hours off his life lol. reading unintialized memory, using afterfree'd, freeing memory that's not yours, all these things can lead to really hard-to-track down bugs later, because the corruption creation point and the corruption discovery point may be in VERY different places. This is basically the whole reason why a slew of c++ sanitizers are being developed (asan, tsan, ubsan, and now msan (memory sanitizer)).
I think due to the complexity of the codebase, and how tightly coupled some of the components are (Ref, Variant, Object, GDScript VM, etc), the godot project REALLY need to be using all of these tools to their full capacity.
Godot has an internally maintained scripting language, and a VM. Just due to the nature of an internal VM, godot is going to be dancing around memory safety in ways that's not trivial to guarantee memory safety or correctness.
I see CI does build and test a build with asan on linux, but I think godot CI should do more. I'd really love to see msan in there as well. I'll send another PR with an option to add msan to the existing project sanitizers on llvm (asan, tsan, and ubsan).
The text was updated successfully, but these errors were encountered: