Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initializing structs with Option fields set to None produces non-optimal assembly #104290

Closed
dotdash opened this issue Nov 11, 2022 · 6 comments · Fixed by #135335
Closed

Initializing structs with Option fields set to None produces non-optimal assembly #104290

dotdash opened this issue Nov 11, 2022 · 6 comments · Fixed by #135335
Labels
A-codegen Area: Code generation I-slow Issue: Problems and improvements with respect to performance of generated code.

Comments

@dotdash
Copy link
Contributor

dotdash commented Nov 11, 2022

Given a struct S like this:

pub struct S {
    pub f1: Option<u8>,
    pub f2: Option<u8>,
    pub f3: Option<u8>,
    pub f4: Option<u8>,
    pub f5: Option<u8>,
    pub f6: Option<u8>,
    pub f7: Option<u8>,
    pub f8: Option<u8>,
}

Initializing this struct with all fields set to None, gives non-optimal assembly code, as LLVM doesn't combine the stores into a memset, because the u8 fields within the Option fields are not set. Initializing all fields to Some(0) and then overwriting them with None gives optimal results.

pub struct S {
    pub f1: Option<u8>,
    pub f2: Option<u8>,
    pub f3: Option<u8>,
    pub f4: Option<u8>,
    pub f5: Option<u8>,
    pub f6: Option<u8>,
    pub f7: Option<u8>,
    pub f8: Option<u8>,
}

pub fn bad() -> S {
    S {
        f1: None,
        f2: None,
        f3: None,
        f4: None,
        f5: None,
        f6: None,
        f7: None,
        f8: None,
    }
}

pub fn good() -> S {
    let mut s = S {
        f1: Some(0),
        f2: Some(0),
        f3: Some(0),
        f4: Some(0),
        f5: Some(0),
        f6: Some(0),
        f7: Some(0),
        f8: Some(0),
    };

    s.f1 = None;
    s.f2 = None;
    s.f3 = None;
    s.f4 = None;
    s.f5 = None;
    s.f6 = None;
    s.f7 = None;
    s.f8 = None;

    s
}

Gives

	.text
	.file	"undefmemset.9474daa7-cgu.0"
	.section	.text._ZN11undefmemset3bad17h0bf7686eb09f8ffcE,"ax",@progbits
	.globl	_ZN11undefmemset3bad17h0bf7686eb09f8ffcE
	.p2align	4, 0x90
	.type	_ZN11undefmemset3bad17h0bf7686eb09f8ffcE,@function
_ZN11undefmemset3bad17h0bf7686eb09f8ffcE:
	.cfi_startproc
	movq	%rdi, %rax
	movb	$0, (%rdi)
	movb	$0, 2(%rdi)
	movb	$0, 4(%rdi)
	movb	$0, 6(%rdi)
	movb	$0, 8(%rdi)
	movb	$0, 10(%rdi)
	movb	$0, 12(%rdi)
	movb	$0, 14(%rdi)
	retq
.Lfunc_end0:
	.size	_ZN11undefmemset3bad17h0bf7686eb09f8ffcE, .Lfunc_end0-_ZN11undefmemset3bad17h0bf7686eb09f8ffcE
	.cfi_endproc

	.section	.text._ZN11undefmemset4good17hc418ddbabe9f4c4aE,"ax",@progbits
	.globl	_ZN11undefmemset4good17hc418ddbabe9f4c4aE
	.p2align	4, 0x90
	.type	_ZN11undefmemset4good17hc418ddbabe9f4c4aE,@function
_ZN11undefmemset4good17hc418ddbabe9f4c4aE:
	.cfi_startproc
	movq	%rdi, %rax
	xorps	%xmm0, %xmm0
	movups	%xmm0, (%rdi)
	retq
.Lfunc_end1:
	.size	_ZN11undefmemset4good17hc418ddbabe9f4c4aE, .Lfunc_end1-_ZN11undefmemset4good17hc418ddbabe9f4c4aE
	.cfi_endproc

	.section	".note.GNU-stack","",@progbits
@Rageking8
Copy link
Contributor

@rustbot label +A-codegen +I-slow

@rustbot rustbot added A-codegen Area: Code generation I-slow Issue: Problems and improvements with respect to performance of generated code. labels Nov 11, 2022
@TimNN
Copy link
Contributor

TimNN commented Nov 11, 2022

I investigated this issue a bit a few weeks ago but never got around to reporting it. It reproduces with clang as well (https://godbolt.org/z/8Yxnqrq6K):

#include <cstdint>

struct WithPadding {
    uint8_t a;
    uint16_t b;
};

void foo(WithPadding* p) {
    *p = {1, 2}; // Two `mov`s
}

void bar(WithPadding* p) {
    WithPadding t = {1, 2};
    *p = t; // One `mov`
}

AFAICT the issue is that early on in the optimization pipeline LLVM sees store undef to the "value" byte of the Option<u8> (or padding in the C++ example), but that gets optimized out at some point. Then, later, the MemCpyOptPass doesn't know that the "value" byte is undef / allowed to be written to, so generates multiple stores instead of just one.

Here is a smaller Rust repro (https://godbolt.org/z/xohrPs1rT):

pub fn with_hole(x: &mut [Option<u8>; 2]) {
    *x = [None, Some(2)]; // Two `mov`s
}

pub fn one_word(x: &mut [Option<u8>; 2]) {
    *x = [Some(1), Some(2)]; // One `mov`
}

@clubby789
Copy link
Contributor

clubby789 commented Dec 27, 2022

The two issues are

  • An early InstCombine pass eliminates all the store undef instructions before we reach MemCpyOpt
  • MemCpyOpt folds preceding undef stores into store 0, but not succeeding ones (https://reviews.llvm.org/D140697)

The second issue can be easily fixed, but inserting a MemCpyOpt early enough in the pipeline to fix this issue causes regressions in some other LLVM tests

@alex
Copy link
Member

alex commented Jan 3, 2025

In 1.83, these now generate the same code, with 8 repeated mov byte: https://rust.godbolt.org/z/3eoW5eoxM

@olivergillespie
Copy link

I'm not sure if this is the same or related - I saw unexpected poor performance in my application when initializing a large array of Option<u16>. It's still bad on 1.83.

pub fn init_plain_u16() -> [u16; LEN] {
    [0; LEN] // memset, fast
}

pub fn init_option_u16() -> [Option<u16>; LEN] {
    [None; LEN] // loop zeroing every other word, slow
}

I would hope that both cases compile to memset.

https://rust.godbolt.org/z/z7vn6hcv4

bors added a commit to rust-lang-ci/rust that referenced this issue Jan 10, 2025
Treat undef bytes as equal to any other byte

Basically since `undef` can be any byte, it can also be the byte(s) that are in the non-undef parts of a value. So we can just treat the `undef` at not being there and only look at the initialized bytes and memset over them

fixes rust-lang#104290

based on rust-lang#135258
bors added a commit to rust-lang-ci/rust that referenced this issue Jan 11, 2025
Treat undef bytes as equal to any other byte

Basically since `undef` can be any byte, it can also be the byte(s) that are in the non-undef parts of a value. So we can just treat the `undef` at not being there and only look at the initialized bytes and memset over them

fixes rust-lang#104290

based on rust-lang#135258
@RalfJung
Copy link
Member

This seems like something the backend should be able to do better at? Cc @nikic

#135335 is clever, but ultimately that PR reduces the amount of information we give to the backend, which could have other unintended effects.

bors added a commit to rust-lang-ci/rust that referenced this issue Jan 21, 2025
codegen: store ScalarPair via memset when one side is undef and the other side can be memset

Basically since `undef` can be any byte, it can also be the byte(s) that are in the non-undef parts of a value. So we can just treat the `undef` at not being there and only look at the initialized bytes and memset over them

fixes rust-lang#104290

based on rust-lang#135258
@bors bors closed this as completed in a7a6c64 Jan 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-codegen Area: Code generation I-slow Issue: Problems and improvements with respect to performance of generated code.
Projects
None yet
8 participants