Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

all: automatically move (some) referenced objects to heap #9873

Merged
merged 33 commits into from
Apr 25, 2021

Conversation

UweKrueger
Copy link
Member

@UweKrueger UweKrueger commented Apr 25, 2021

Creating references of stack objects is problematic. Example:

// Example 1
struct Qwe {
mut:
    a f64
}

fn f() &Qwe {
    q := Qwe{
        a: 12.5
    }
    return &q
}

fn g() f64 {
    c := 12.75
    return c
}

fn main() {
    x := f()
    assert x.a == 12.5 // pass
    y := g()
    assert x.a == 12.5 // fail
}

Here the first assertion passes and second assertion fails, because x points to a stack object that is overwritten by the call to g(). (The exact behaviour depends on the compiler and the optimization being used.)

There are several possible approaches to handle this problem:

  1. Put every object to the heap whose address leaves the function. This is what Go does. Unfortunately, this forces objects to the heap, that are only "borrowed" as in this example:
    // Example 2
    fn (st &St) pr() {
        println(st.val)
    }

    fn main() {
        x := St{val: 12}
        // x is forced to heap because its address leaves `main()`
        x.pr()
    }
  1. Generate an error message when the address of a (potentially?) stack allocated object is returned by a function. This seems simple at first glace but can become complicated. E.g. should pr() in Example 2 be allowed? Or, what happens if a reference is stored in a struct that is returned by value. Or if a function returns the address of a mut or & argument.
  2. Introduce a sophisticated ownership/borrow model to trace the object's path thru the program. This way promotion to heap can be avoided when the object is only borrowed to a called function/method. That is what Rust does. It works excellent but the language becomes, well, Rust.
  3. Assume an object is only borrowed when passed as reference to a function/method and move it to the heap only when referenced otherwise (outside unsafe). A called function assumes that a(n) argument/receiver passed as mut or & might be stack allocated (unless the object type has been declared as [heap]). Thus, the function is not allowed to use this reference (outside unsafe) for other purposes than accessing the object. This is the approach that is implemented by this PR.

With this (solution 4.) both assertions pass in Example 1 and x remains on stack in Example 2.

@@ -80,7 +80,7 @@ pub fn (mut ctx CancelContext) err() IError {

pub fn (ctx CancelContext) value(key string) ?voidptr {
if key == cancel_context_key {
return voidptr(&ctx)
return voidptr(unsafe { &ctx })
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, that is very likely a bug 🤔

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so, too. I've just added what was needed to make it compile. This case needs further investigation - at least the compiler now flags such cases.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ulises-jeremias could you please have a look at this? Here the address of a (stack allocated) value parameter is returned. You can get an error message with the very latest version of V when you remove the unsafe wrapper (that I've added just to make it compile).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'll change that implementation asap to prevent bugs. For now the unsafe is good until I figure out a better way to solve that 👌🏻

Comment on lines 5864 to 5868
c.error('`$node.name` cannot be referenced since it might be on stack',
node.pos)
} else if c.table.get_type_symbol(obj.typ).kind == .array_fixed {
c.error('cannot reference fixed array `$node.name` as it might be on stack',
node.pos)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For many this might be a bit surprising in a "high-level" lang which claims to do memory management on it's own. Maybe we could ease the mental shift for them by extending the error messages with ... Wrap with unsafe{} if you're sure else use the [heap] attribute..

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taking the address of something is not a high level operation imho.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've improved the error message a bit. I think it should be clear now.

@dumblob
Copy link
Contributor

dumblob commented Apr 25, 2021

Cool idea - thanks for putting thorough thoughts in it!

is_tmp bool // for tmp for loop vars, so that autofree can skip them
is_auto_heap bool // value whoes address goes out of scope
is_heap_ref bool // *known* to be pointer to heap memory (ptr to [heap] struct)
is_stack_obj bool // may be pointer to stack value (`mut` or `&` arg and not [heap] struct)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, can there be a situation where .is_heap_ref = true and .is_stack_obj = true at the same time?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, but there can be a situation where both are false (the default).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps an enum with 3 states will be a less error prone representation instead of 2 bools that can get out of sync?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... I don't know. Then comparisons would be needed where now a simple meaningful bool is used. I think this would make the code a tiny little bit more clumsy. 🤔
And bool is just 1 byte whereas an enum is 4 bytes in size.

Copy link
Member Author

@UweKrueger UweKrueger Apr 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I'm thinking about resurrecting #8802. There the flag is_auto_heap (maybe renamed to is_auto_ref) could also be used for mut function arguments / loop variables. So it's better to keep it separate.

Copy link
Member

@spytheman spytheman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent work.

@medvednikov medvednikov merged commit 3c0a368 into vlang:master Apr 25, 2021
@medvednikov
Copy link
Member

Great job!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants