
sometimes there is an unwanted memcpy when passing large structs by-value #17580

Open
andrewrk opened this issue Oct 18, 2023 · 7 comments · Fixed by #18641
Labels
backend-llvm The LLVM backend outputs an LLVM IR Module. optimization regression It worked in a previous version of Zig, but stopped working.
Milestone

Comments

@andrewrk
Member

Zig Version: 0.12.0-dev.1103+7a9500fd80

This reduced code snippet results in an unwanted memcpy when passing a large struct by value:

export fn entry() bool {
    var foo: Foo = undefined;
    return foo.method(100);
}

const Foo = struct {
    array: [100]bool,

    fn method(foo: Foo, index: usize) bool {
        return foo.array[noop(index)];
    }

    fn noop(index: usize) usize {
        return index;
    }
};
$ zig build-obj test.zig -OReleaseFast --verbose-llvm-ir  -fstrip
; Function Attrs: nounwind uwtable
define internal fastcc i1 @test.Foo.method(ptr nonnull readonly align 1 %0, i64 %1) unnamed_addr #0 {
  %3 = alloca [100 x i1], align 1
  %4 = getelementptr inbounds %test.Foo, ptr %0, i32 0, i32 0
  call void @llvm.memcpy.p0.p0.i64(ptr align 1 %3, ptr align 1 %4, i64 100, i1 false)
  %5 = call fastcc i64 @test.Foo.noop(i64 %1)
  %6 = getelementptr inbounds [100 x i1], ptr %3, i64 0, i64 %5
  %7 = load i1, ptr %6, align 1
  ret i1 %7
}

The alloca and memcpy are undesirable. With this change, they go away:

-        return foo.array[noop(index)];
+        return foo.array[index];

Temporary workaround is to pass by const pointer instead. However, we don't want Zig users to get used to doing this, or they will never kick the habit even after this issue is fixed. That is why I am making this a high-priority issue and denying any further requests to apply the workaround in the standard library.
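
For reference, a minimal sketch of that workaround (an editor's illustration, not part of the original report), assuming only the parameter type of method changes; as a comment below notes, the (&foo.array) form from #15685 may also be needed before the memcpy actually disappears:

const Foo = struct {
    array: [100]bool,

    // Workaround sketch: take `foo` by const pointer instead of by value.
    fn method(foo: *const Foo, index: usize) bool {
        return foo.array[noop(index)];
    }

    fn noop(index: usize) usize {
        return index;
    }
};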

After fixing this, revert 90a877f.

Related:

@andrewrk andrewrk added optimization backend-llvm The LLVM backend outputs an LLVM IR Module. regression It worked in a previous version of Zig, but stopped working. labels Oct 18, 2023
@andrewrk andrewrk added this to the 0.12.0 milestone Oct 18, 2023
@IntegratedQuantum
Contributor

This code looks pretty similar to #15685; it might be the same issue. Glad to see that it gets priority.

Temporary workaround is to pass by const pointer instead.

I tried changing to a const pointer in your code, but that didn't appear to change the LLVM IR at all.
To avoid the memcpy I had to additionally apply the workaround from #15685:

fn method(foo: *const Foo, index: usize) bool {
    return (&foo.array)[noop(index)];
}

So I think this is also related to array access inside a struct, not purely to passing by value.

andrewrk added a commit that referenced this issue Jan 24, 2024
…ers"

This reverts commit 2ab7893.

Premature merge - apologies for the disruption.

Reopens #15685
Reopens #17580
@andrewrk
Member Author

Fix reverted in 9d5a133.

@andrewrk andrewrk reopened this Jan 24, 2024
silver-signal pushed a commit to silver-signal/zig that referenced this issue Jan 25, 2024
…ers"

This reverts commit 2ab7893.

Premature merge - apologies for the disruption.

Reopens ziglang#15685
Reopens ziglang#17580
bilaliscarioth pushed a commit to bilaliscarioth/zig that referenced this issue Jan 27, 2024
…ers"

This reverts commit 2ab7893.

Premature merge - apologies for the disruption.

Reopens ziglang#15685
Reopens ziglang#17580
@pkova
Contributor

pkova commented Feb 12, 2024

The example above can be further reduced to

var array: [100]bool = undefined;

fn noop(index: usize) usize {
    return index;
}

export fn method(index: usize) bool {
    return array[noop(index)];
}

which exhibits the problem, and

var array: [100]bool = undefined;

fn noop(index: usize) usize {
    return index;
}

export fn method(index: usize) bool {
    const hello = noop(index);
    return array[hello];
}

which does not.

The AIR emitted by the first example has the load before the call, which seems clearly wrong to me. This would indicate that the root cause is not in Liveness at all, but somewhere earlier in the compilation process.

@pkova
Contributor

pkova commented Feb 18, 2024

Okay, these are my findings from investigating this way too much. I was wrong in my post above.

Consider the following code snippet:

var array: [100]bool = undefined;

fn noop(index: usize) usize {
    return index;
}

export fn method(index: usize) bool {
    return array[noop(index)];
}

This corresponds to the following AIR

~/Desktop/zig/build/stage4/bin/zig build-obj test.zig -OReleaseFast --verbose-air -fstrip
# Begin Function AIR: test.method:
# Total AIR+Liveness bytes: 194B
# AIR Instructions:         6 (54B)
# AIR Extra Data:           9 (36B)
# Liveness tomb_bits:       8B
# Liveness Extra Data:      0 (0B)
# Liveness special table:   0 (0B)
  %0 = arg(usize, 0)
  %2 = load([100]bool, <*[100]bool, (variable)>)
  %3 = call(<fn (usize) usize, (function 'noop')>, [%0!])
  %4 = array_elem_val(%2!, %3!)
  %5!= ret(%4!)
# End Function AIR: test.method

# Begin Function AIR: test.noop:
# Total AIR+Liveness bytes: 147B
# AIR Instructions:         3 (27B)
# AIR Extra Data:           4 (16B)
# Liveness tomb_bits:       8B
# Liveness Extra Data:      0 (0B)
# Liveness special table:   0 (0B)
  %0 = arg(usize, 0)
  %2!= ret(%0!)
# End Function AIR: test.noop

llvm.zig includes the function canElideLoad, which uses categorizeOperand from Liveness.zig to determine whether an LLVM load can be optimized out. In this scenario, the load we are trying to remove is %2.

canElideLoad iterates forward in the current block and invokes categorizeOperand for each instruction it finds. When categorizeOperand encounters a call instruction, it analyzes the call's function parameters to see whether they die. If they do, the load can be elided, because function parameters in Zig are immutable. The array above is not a function parameter, though!

There's kind of an implicit pass-by-value going on here. If I understand the semantics of Zig correctly, the behavior we want is that the array inside the function method should not mutate even if the function noop mutates it. This is why canElideLoad returns false and we have to @memcpy. The LLVM load and call are exactly right.
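
For illustration, a hypothetical mutating variant (an editor's sketch, not from the thread) that makes this reading concrete, assuming the load-before-call ordering shown in the AIR above:

var array: [100]bool = [_]bool{true} ** 100;

fn clearAndReturn(index: usize) usize {
    // Mutates the global between the load of `array` and the indexing.
    array[index] = false;
    return index;
}

export fn method(index: usize) bool {
    // Under the reading above, the element is taken from the copy of `array`
    // made before the call, so this would return `true` rather than `false`.
    return array[clearAndReturn(index)];
}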

The only way I see to elide the load here would be for categorizeOperand to recursively analyze the function body itself when it encounters a call. As far as I can tell, this would be quite a sweeping compiler change, since AIR is currently mostly function-scoped.

@mlugg
Member

mlugg commented Feb 18, 2024

Good sleuthing - you've reached essentially the same conclusion I came to when I briefly looked at this. In essence, the logic is operating completely correctly by the current widely understood semantics. The way we solve this issue likely has wide-reaching implications for the language design surrounding aliasing, PRO, etc.

@andrewrk - since this issue is a high priority, it might be worth having a one-time compiler / language design meeting to discuss how to solve it? Whatever we decide will have spec implications.

@andrewrk andrewrk modified the milestones: 0.12.0, 0.13.0 Mar 20, 2024
@mlugg mlugg moved this to Optimize Machine Code in Performance Aug 22, 2024
@emneo-dev

emneo-dev commented Nov 1, 2024

I think I just stepped into that problem while trying to optimize code by using std.bit_set.
All the calls to isSet resulted in unwanted memcpys, making the code extremely slow (about 6x slower than just using a simple u8 slice).

I can try to give a reproducible example if needed, but right now std.bit_set is basically unusable for anything that needs performance :/ (which is pretty common)

PS: The bit_set was 100 bytes in size
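
For illustration, a minimal sketch (an editor's guess at the pattern described, not the commenter's actual code) using std.bit_set.StaticBitSet with roughly 100 bytes of bits; whether the memcpy shows up depends on the surrounding code and build mode:

const std = @import("std");

// ~100-byte bit set (800 bits). `isSet` takes the set by value, which is the
// by-value pattern this issue is about.
var set = std.bit_set.StaticBitSet(800).initEmpty();

export fn anySet(len: usize) bool {
    // Assumes len <= 800; isSet asserts the index is in range.
    var i: usize = 0;
    while (i < len) : (i += 1) {
        if (set.isSet(i)) return true;
    }
    return false;
}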

@thuiop

thuiop commented Jan 11, 2025

I think I just stepped into that problem while trying to optimize code by using std.bit_set. All the calls to isSet resulted in unwanted memcpys, making the code extremely slow (about 6x slower than just using a simple u8 slice).

I can try to give a reproducible example if needed, but right now std.bit_set is basically unusable for anything that needs performance :/ (which is pretty common)

PS: The bit_set was 100 bytes in size

I encountered the exact same problem here, which was unfortunate because I wanted to try using the bit set for performance reasons, and it was very puzzling where the memcpy came from.

@andrewrk andrewrk modified the milestones: 0.14.0, 0.15.0 Mar 1, 2025