This repository was archived by the owner on Nov 15, 2023. It is now read-only.

cargo build --profile production fails with linker error when ran on the substrate repo #13376

Closed
athei opened this issue Feb 13, 2023 · 8 comments

@athei
Member

athei commented Feb 13, 2023

The production profile is used for benchmarking. I get the following error when linking node-template:

= note: Undefined symbols for architecture arm64:
            "wasmtime_runtime::libcalls::trampolines::impl_table_fill_funcref::heae340996f2a67ea", referenced from:
                _table_fill_funcref in node_template-b694b63f0433be47.node_template.30339334-cgu.0.rcgu.o
            "wasmtime_runtime::libcalls::trampolines::impl_table_grow_funcref::hd4e8b458d7fde540", referenced from:
                _table_grow_funcref in node_template-b694b63f0433be47.node_template.30339334-cgu.0.rcgu.o
          ld: symbol(s) not found for architecture arm64
          clang: error: linker command failed with exit code 1 (use -v to see invocation)

More crates hit this and similar errors, all of them related to wasmtime. Apparently it only happens when LTO is enabled (which is only the case with the production profile). The error was introduced with #13160: the commit before this PR works fine, but the commit that includes it does not.

Please note that this does not only happen for me locally but also when trying to bench with the benchmarking machine.

I suggest rolling back this PR until we have resolved this issue.

cc @koute @alvicsam @ggwpez

@athei changed the title from "cargo build --profile production ends with linker error when ran on the substrate repo" to "cargo build --profile production fails with linker error when ran on the substrate repo" on Feb 13, 2023
@koute
Contributor

koute commented Feb 13, 2023

I'll take a look at this.

@koute koute self-assigned this Feb 13, 2023
@koute
Contributor

koute commented Feb 13, 2023

The problem reproduces for me.

From a cursory look it looks like it might be a bug in rustc or LLVM. It can be trivially worked around inside of wasmtime though.

What's happening is that wasmtime defines a bunch of trampolines through which WASM code calls into native code, and those trampolines are called from a thin global inline assembly shim. For some reason LTO strips those symbols out, even though that's not supposed to happen AFAIK. The workaround is to mark them as #[no_mangle] inside of wasmtime, after which LTO no longer strips them out. (I've checked.)

I'll report this to wasmtime.
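
For readers unfamiliar with the pattern, here is a minimal sketch of a global assembly shim that forwards to a Rust function with a fixed symbol name. It is not wasmtime's actual code: the names `demo_impl_add_one`, `demo_shim_add_one` and `add_one_via_shim` are hypothetical, the assembly is only emitted on x86_64 Linux (other targets fall back to a direct call), and it assumes Rust 1.82 or newer, where the attribute is spelled `#[unsafe(no_mangle)]`. On older toolchains the same thing is written as plain `#[no_mangle]` and plain `extern "C"`.

```rust
use std::arch::global_asm;

// The implementation the shim forwards to. `no_mangle` gives it a fixed,
// exported symbol name, so the assembly below can refer to it by name.
// (Hypothetical stand-in for a trampoline target; not taken from wasmtime.)
#[unsafe(no_mangle)]
pub extern "C" fn demo_impl_add_one(x: u64) -> u64 {
    x.wrapping_add(1)
}

// Hand-written shim compiled as part of this crate via `global_asm!`,
// not as a separately assembled `.s` file. It tail-jumps to the
// implementation, leaving the argument register untouched.
#[cfg(all(target_arch = "x86_64", target_os = "linux"))]
global_asm!(
    ".pushsection .text",
    ".globl demo_shim_add_one",
    ".type demo_shim_add_one, @function",
    "demo_shim_add_one:",
    "    jmp demo_impl_add_one",
    ".size demo_shim_add_one, . - demo_shim_add_one",
    ".popsection",
);

// Declaration that lets Rust code call the assembly-defined symbol.
#[cfg(all(target_arch = "x86_64", target_os = "linux"))]
unsafe extern "C" {
    fn demo_shim_add_one(x: u64) -> u64;
}

// Safe wrapper: goes through the assembly shim where it exists.
#[cfg(all(target_arch = "x86_64", target_os = "linux"))]
pub fn add_one_via_shim(x: u64) -> u64 {
    // SAFETY: the shim follows the C ABI and only jumps to `demo_impl_add_one`.
    unsafe { demo_shim_add_one(x) }
}

// Fallback for targets this sketch has no assembly for: call directly.
#[cfg(not(all(target_arch = "x86_64", target_os = "linux")))]
pub fn add_one_via_shim(x: u64) -> u64 {
    demo_impl_add_one(x)
}
```

Calling `add_one_via_shim(41)` goes Rust → assembly shim → `demo_impl_add_one`. Nothing on the Rust side calls the implementation directly in that path, which is the kind of reference an optimizer has to learn about from the assembly alone.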

@koute
Contributor

koute commented Feb 13, 2023

wasmtime issue: bytecodealliance/wasmtime#5768

@athei
Member Author

athei commented Feb 13, 2023

Thanks a lot. Just out of curiosity: Shouldn't you always use #[no_mangle] when two different binaries interact and you don't need the mangling? To me it seems like a useless risk to mangle them (even if both sides use the same compiler).

@koute
Contributor

koute commented Feb 14, 2023

> Shouldn't you always use #[no_mangle] when two different binaries interact and you don't need the mangling? To me it seems like a useless risk to mangle them (even if both sides use the same compiler).

I'm not sure I completely understand the question, but in general it's the other way around: you essentially always want to mangle your symbols unless you're exposing and/or interacting with a stable API/ABI boundary.

In this case not mangling the symbols makes sense because those are just internal symbols (there are no two binaries here!) so they don't actually need to be externally exposed nor stable. They can be anything as long as they're unique. Without more digging I don't know exactly why this problem triggers, but it's most likely not due to mangling/not mangling itself. In Rust #[no_mangle] also has a secondary effect which affects symbol visibility, and that's probably why adding a #[no_mangle] "fixes" the issue. (Or in other words, whether the symbol is mangled or not is most likely just a red herring, and what actually prevents LTO from stripping the symbol out is #[no_mangle]'s secondary effect of making the symbol public.)

@athei
Member Author

athei commented Feb 14, 2023

I thought that symbols were called from asm. For me this sounded like you want a stable ABI as this is handwritten code.

@koute
Contributor

koute commented Feb 14, 2023

> I thought that symbols were called from asm. For me this sounded like you want a stable ABI as this is handwritten code.

Yes, but that's a global_asm! block (that is, a global inline assembly block in Rust), and not a separately compiled .s file.

@koute
Contributor

koute commented Feb 16, 2023

Closing, since this is now technically fixed. (Although we're not yet using an official wasmtime release with the fix, this should change next week once 6.0.0 is released.)

@koute koute closed this as completed Feb 16, 2023