Initializing structs with Option fields set to `None` produces non-optimal assembly #104290

dotdash · 2022-11-11T15:29:45Z

Given a struct S like this:

pub struct S {
    pub f1: Option<u8>,
    pub f2: Option<u8>,
    pub f3: Option<u8>,
    pub f4: Option<u8>,
    pub f5: Option<u8>,
    pub f6: Option<u8>,
    pub f7: Option<u8>,
    pub f8: Option<u8>,
}

Initializing this struct with all fields set to None, gives non-optimal assembly code, as LLVM doesn't combine the stores into a memset, because the u8 fields within the Option fields are not set. Initializing all fields to Some(0) and then overwriting them with None gives optimal results.

pub struct S {
    pub f1: Option<u8>,
    pub f2: Option<u8>,
    pub f3: Option<u8>,
    pub f4: Option<u8>,
    pub f5: Option<u8>,
    pub f6: Option<u8>,
    pub f7: Option<u8>,
    pub f8: Option<u8>,
}

pub fn bad() -> S {
    S {
        f1: None,
        f2: None,
        f3: None,
        f4: None,
        f5: None,
        f6: None,
        f7: None,
        f8: None,
    }
}

pub fn good() -> S {
    let mut s = S {
        f1: Some(0),
        f2: Some(0),
        f3: Some(0),
        f4: Some(0),
        f5: Some(0),
        f6: Some(0),
        f7: Some(0),
        f8: Some(0),
    };

    s.f1 = None;
    s.f2 = None;
    s.f3 = None;
    s.f4 = None;
    s.f5 = None;
    s.f6 = None;
    s.f7 = None;
    s.f8 = None;

    s
}

Gives

	.text
	.file	"undefmemset.9474daa7-cgu.0"
	.section	.text._ZN11undefmemset3bad17h0bf7686eb09f8ffcE,"ax",@progbits
	.globl	_ZN11undefmemset3bad17h0bf7686eb09f8ffcE
	.p2align	4, 0x90
	.type	_ZN11undefmemset3bad17h0bf7686eb09f8ffcE,@function
_ZN11undefmemset3bad17h0bf7686eb09f8ffcE:
	.cfi_startproc
	movq	%rdi, %rax
	movb	$0, (%rdi)
	movb	$0, 2(%rdi)
	movb	$0, 4(%rdi)
	movb	$0, 6(%rdi)
	movb	$0, 8(%rdi)
	movb	$0, 10(%rdi)
	movb	$0, 12(%rdi)
	movb	$0, 14(%rdi)
	retq
.Lfunc_end0:
	.size	_ZN11undefmemset3bad17h0bf7686eb09f8ffcE, .Lfunc_end0-_ZN11undefmemset3bad17h0bf7686eb09f8ffcE
	.cfi_endproc

	.section	.text._ZN11undefmemset4good17hc418ddbabe9f4c4aE,"ax",@progbits
	.globl	_ZN11undefmemset4good17hc418ddbabe9f4c4aE
	.p2align	4, 0x90
	.type	_ZN11undefmemset4good17hc418ddbabe9f4c4aE,@function
_ZN11undefmemset4good17hc418ddbabe9f4c4aE:
	.cfi_startproc
	movq	%rdi, %rax
	xorps	%xmm0, %xmm0
	movups	%xmm0, (%rdi)
	retq
.Lfunc_end1:
	.size	_ZN11undefmemset4good17hc418ddbabe9f4c4aE, .Lfunc_end1-_ZN11undefmemset4good17hc418ddbabe9f4c4aE
	.cfi_endproc

	.section	".note.GNU-stack","",@progbits

The text was updated successfully, but these errors were encountered:

Rageking8 · 2022-11-11T15:45:04Z

@rustbot label +A-codegen +I-slow

TimNN · 2022-11-11T15:46:10Z

I investigated this issue a bit a few weeks ago but never got around to reporting it. It reproduces with clang as well (https://godbolt.org/z/8Yxnqrq6K):

#include <cstdint>

struct WithPadding {
    uint8_t a;
    uint16_t b;
};

void foo(WithPadding* p) {
    *p = {1, 2}; // Two `mov`s
}

void bar(WithPadding* p) {
    WithPadding t = {1, 2};
    *p = t; // One `mov`
}

AFAICT the issue is that early on in the optimization pipeline LLVM sees store undef to the "value" byte of the Option<u8> (or padding in the C++ example), but that gets optimized out at some point. Then, later, the MemCpyOptPass doesn't know that the "value" byte is undef / allowed to be written to, so generates multiple stores instead of just one.

Here is a smaller Rust repro (https://godbolt.org/z/xohrPs1rT):

pub fn with_hole(x: &mut [Option<u8>; 2]) {
    *x = [None, Some(2)]; // Two `mov`s
}

pub fn one_word(x: &mut [Option<u8>; 2]) {
    *x = [Some(1), Some(2)]; // One `mov`
}

clubby789 · 2022-12-27T16:29:20Z

The two issues are

An early InstCombine pass eliminates all the store undef instructions before we reach MemCpyOpt
MemCpyOpt folds preceding undef stores into store 0, but not succeeding ones (https://reviews.llvm.org/D140697)

The second issue can be easily fixed, but inserting a MemCpyOpt early enough in the pipeline to fix this issue causes regressions in some other LLVM tests

alex · 2025-01-03T16:19:24Z

In 1.83, these now generate the same code, with 8 repeated mov byte: https://rust.godbolt.org/z/3eoW5eoxM

olivergillespie · 2025-01-08T12:22:39Z

I'm not sure if this is the same or related - I saw unexpected poor performance in my application when initializing a large array of Option<u16>. It's still bad on 1.83.

pub fn init_plain_u16() -> [u16; LEN] {
    [0; LEN] // memset, fast
}

pub fn init_option_u16() -> [Option<u16>; LEN] {
    [None; LEN] // loop zeroing every other word, slow
}

I would hope that both cases compile to memset.

https://rust.godbolt.org/z/z7vn6hcv4

Treat undef bytes as equal to any other byte Basically since `undef` can be any byte, it can also be the byte(s) that are in the non-undef parts of a value. So we can just treat the `undef` at not being there and only look at the initialized bytes and memset over them fixes rust-lang#104290 based on rust-lang#135258

RalfJung · 2025-01-11T09:14:24Z

This seems like something the backend should be able to do better at? Cc @nikic

#135335 is clever, but ultimately that PR reduces the amount of information we give to the backend, which could have other unintended effects.

codegen: store ScalarPair via memset when one side is undef and the other side can be memset Basically since `undef` can be any byte, it can also be the byte(s) that are in the non-undef parts of a value. So we can just treat the `undef` at not being there and only look at the initialized bytes and memset over them fixes rust-lang#104290 based on rust-lang#135258

rustbot added A-codegen Area: Code generation I-slow Issue: Problems and improvements with respect to performance of generated code. labels Nov 11, 2022

oli-obk mentioned this issue Jan 10, 2025

codegen: store ScalarPair via memset when one side is undef and the other side can be memset #135335

Merged

bors closed this as completed in a7a6c64 Jan 21, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Initializing structs with Option fields set to `None` produces non-optimal assembly #104290

Initializing structs with Option fields set to `None` produces non-optimal assembly #104290

dotdash commented Nov 11, 2022

Rageking8 commented Nov 11, 2022

TimNN commented Nov 11, 2022

clubby789 commented Dec 27, 2022 •

edited

Loading

alex commented Jan 3, 2025

olivergillespie commented Jan 8, 2025

RalfJung commented Jan 11, 2025

Initializing structs with Option fields set to None produces non-optimal assembly #104290

Initializing structs with Option fields set to None produces non-optimal assembly #104290

Comments

dotdash commented Nov 11, 2022

Rageking8 commented Nov 11, 2022

TimNN commented Nov 11, 2022

clubby789 commented Dec 27, 2022 • edited Loading

alex commented Jan 3, 2025

olivergillespie commented Jan 8, 2025

RalfJung commented Jan 11, 2025

Initializing structs with Option fields set to `None` produces non-optimal assembly #104290

Initializing structs with Option fields set to `None` produces non-optimal assembly #104290

clubby789 commented Dec 27, 2022 •

edited

Loading