Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

enable assert_instr on wasm32 vector splat operations #569

Closed
wants to merge 2 commits into from

Conversation

gnzlbg
Copy link
Contributor

@gnzlbg gnzlbg commented Sep 13, 2018

Note: codegen for this is already working in LLVM upstream. This PR currently fails on my machine. It should start passing once we update LLVM upstream.

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Sep 13, 2018

cc @alexcrichton for some reason this passes, I think the simd128 tests are not run on ci :(

on my machine this fails

CARGO_TARGET_WASM32_UNKNOWN_UNKNOWN_RUNNER=wasm-bindgen-test-runner CARGO_TARGET_WASM32_UNKNOWN_UNKNOWN_LINKER=../../ci/lld-shim cargo test --release --features=wasm_simd128 --target=wasm32-unknown-unknown

with

LLVM ERROR: Cannot select: 0x123020410: ch = store<(store 16 into %ir.0)> 0x122b17ad8, 0x123020270, 0x123020068, undef:i32
  0x123020270: v16i8 = BUILD_VECTOR 0x123020138, 0x123020138, 0x123020138, 0x123020138, 0x123020138, 0x123020138, 0x123020138, 0x123020138, 0x123020138, 0x123020138, 0x123020138, 0x123020138, 0x123020138, 0x123020138, 0x123020138, 0x123020138
    0x123020138: i32 = WebAssemblyISD::ARGUMENT TargetConstant:i32<1>
      0x1230200d0: i32 = TargetConstant<1>
    0x123020138: i32 = WebAssemblyISD::ARGUMENT TargetConstant:i32<1>
      0x1230200d0: i32 = TargetConstant<1>
    0x123020138: i32 = WebAssemblyISD::ARGUMENT TargetConstant:i32<1>
      0x1230200d0: i32 = TargetConstant<1>
    0x123020138: i32 = WebAssemblyISD::ARGUMENT TargetConstant:i32<1>
      0x1230200d0: i32 = TargetConstant<1>
    0x123020138: i32 = WebAssemblyISD::ARGUMENT TargetConstant:i32<1>
      0x1230200d0: i32 = TargetConstant<1>
    0x123020138: i32 = WebAssemblyISD::ARGUMENT TargetConstant:i32<1>
      0x1230200d0: i32 = TargetConstant<1>
    0x123020138: i32 = WebAssemblyISD::ARGUMENT TargetConstant:i32<1>
      0x1230200d0: i32 = TargetConstant<1>
    0x123020138: i32 = WebAssemblyISD::ARGUMENT TargetConstant:i32<1>
      0x1230200d0: i32 = TargetConstant<1>
    0x123020138: i32 = WebAssemblyISD::ARGUMENT TargetConstant:i32<1>
      0x1230200d0: i32 = TargetConstant<1>
    0x123020138: i32 = WebAssemblyISD::ARGUMENT TargetConstant:i32<1>
      0x1230200d0: i32 = TargetConstant<1>
    0x123020138: i32 = WebAssemblyISD::ARGUMENT TargetConstant:i32<1>
      0x1230200d0: i32 = TargetConstant<1>
    0x123020138: i32 = WebAssemblyISD::ARGUMENT TargetConstant:i32<1>
      0x1230200d0: i32 = TargetConstant<1>
    0x123020138: i32 = WebAssemblyISD::ARGUMENT TargetConstant:i32<1>
      0x1230200d0: i32 = TargetConstant<1>
    0x123020138: i32 = WebAssemblyISD::ARGUMENT TargetConstant:i32<1>
      0x1230200d0: i32 = TargetConstant<1>
    0x123020138: i32 = WebAssemblyISD::ARGUMENT TargetConstant:i32<1>
      0x1230200d0: i32 = TargetConstant<1>
    0x123020138: i32 = WebAssemblyISD::ARGUMENT TargetConstant:i32<1>
      0x1230200d0: i32 = TargetConstant<1>
  0x123020068: i32 = WebAssemblyISD::ARGUMENT TargetConstant:i32<0>
    0x123020000: i32 = TargetConstant<0>
  0x1230204e0: i32 = undef
In function: _ZN8coresimd8coresimd6wasm327simd1285i8x165splat17ha890e58e2edc746bE

which is the error I expect this to produce on CI.

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Sep 13, 2018

So I can reproduce, on my machine, the tests fail to build if I execute the command in crates/coresimd, but if I execute it in the root directory by adding -p coresimd -p stdsimd like ci scripts do, then the feature is not properly enabled.

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Sep 13, 2018

And indeed cargo -vv shows that the --features=wasm_simd128 is not passed to the crates being compiled when executing cargo test in the root directory using -p coresimd...

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Sep 13, 2018

This is a cargo bug: rust-lang/cargo#5364

@alexcrichton
Copy link
Member

Ah yeah I tried this awhile back but LLVM didn't have enough support yet unfortunately. We may need to wait a bit longer :(

(or maybe update LLVM again?)

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Sep 13, 2018

or maybe update LLVM again?)

This should already work with LLVM trunk since the LLVM bug for this has been closed with a merged patch. The next time LLVM is updated this should just work.

@alexcrichton
Copy link
Member

Ah unfortunately an LLVM update won't do it at the moment :(

Updating LLVM in rust-lang/rust yields:

unrecognized register class
UNREACHABLE executed at /home/alex/code/rust4/src/llvm/lib/Target/WebAssembly/WebAssemblyExplicitLocals.cpp:162!                                        

so a different error! Just still an error :(

@tlively
Copy link

tlively commented Sep 14, 2018

@alexcrichton That bug is good to know about. The fix is very simple, but there are a bunch of similar bugs elsewhere, so I'll try to land patches fixing all of them tomorrow.

@gnzlbg FYI, you can see all of the SIMD instructions and intrinsics that have lowering implemented in LLVM tip of tree by looking at llvm/test/CodeGen/WebAssembly/simd*.ll.

https://github.com/llvm-mirror/llvm/blob/master/test/CodeGen/WebAssembly/simd.ll
https://github.com/llvm-mirror/llvm/blob/master/test/CodeGen/WebAssembly/simd-arith.ll
https://github.com/llvm-mirror/llvm/blob/master/test/CodeGen/WebAssembly/simd-comparisons.ll

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Sep 14, 2018

so I'll try to land patches fixing all of them tomorrow

Tomorrow is saturday, please enjoy your weekend, this can wait till monday :P

@tlively
Copy link

tlively commented Sep 14, 2018

It was still Thursday in my timezone when I posted that!

@alexcrichton
Copy link
Member

FWIW I was toying around with this last night and was able to get past LLVM (yay!). Not to say I got everything working though! There were a few follow-up problems:

I wasn't able to emit everything and get LLVM to not error, after I enabled the splat operations and got a successful compile I then enabled extract_lane intrinsics and got an LLVM error :(

In any case we can't (and don't really need to?) make much progress here until Node or a browser implements the current version of the SIMD spec.

Wanted to at leat provide an update though!

@tlively would it be useful to get reduced test cases for LLVM errors? Or is it still too early for that?

@tlively
Copy link

tlively commented Oct 4, 2018

Yes, test cases would be very useful and welcome!

The latest one-stop shop to see what instructions are implemented is https://github.com/llvm-mirror/llvm/blob/master/test/MC/WebAssembly/simd-encodings.s.

There is also another opcode reorganization in the works here: WebAssembly/simd#42. Once that is settled, LLVM and V8 will finally agree on simd opcodes.

@alexcrichton
Copy link
Member

Ok sure! I've collected some files at https://gist.github.com/alexcrichton/f727348a2d547248b93ec0b8684996af which sould all trigger errors with llc -O2 foo.ll -filetype=obj -o /dev/null.

Now we may also have bad codegen in Rust, if we need to codegen various intrinsics differently just let me know! Also they're all reduced with bugpoint so they may look a little odd, I swear we don't normally deal with that much undef

@tlively
Copy link

tlively commented Oct 5, 2018

These are great, thank you. One codegen issue I saw was that for any_true and all_true you should use the new @llvm.wasm.anytrue and @llvm.wasm.alltrue intrinsic functions. The @llvm.experimental.vector.reduce.{or,and} intrinsics do not have quite the same semantics as the corresponding WebAssembly instructions.

@alexcrichton
Copy link
Member

Ok cool thanks for the tip! When using those I get, though:

While deleting: i32 (<16 x i8>)* %llvm.wasm.alltrue.v8i16
Use still stuck around after Def is destroyed:i32 (<8 x i16>)* bitcast (i32 (<16 x i8>)* @llvm.wasm.alltrue.v8i16 to i32 (<8 x i16>)*)
rustc: /home/alex/code/rust2/src/llvm/lib/IR/Value.cpp:92: llvm::Value::~Value(): Assertion `use_empty() && "Uses remain when a value is destroyed!"' failed.

and unfortunately rustc doesn't even get as far as emitting IR so I can't extract a test case :(

@tlively
Copy link

tlively commented Oct 5, 2018

Yikes! That will require some deeper poking around. Would you be able to get an LLVM stack trace for that by any chance?

@alexcrichton
Copy link
Member

Sure yeah, using gdb I get this stack trace

@tlively
Copy link

tlively commented Oct 5, 2018

Another thing I notice is that Rust's use of <1 x i128> is causing the emitted code to store the vector in memory to return it, rather than returning it directly. The best solution will be for me to add support for that type to the LLVM backend.

@alexcrichton
Copy link
Member

Ah that may be because our ABI definition for wasm is super bare bones compare to, for example x86_64 which is significantly more complicated. In that sense we may be pessimizing to storing out pointers as opposed to realizing it can be passed by value.

In any case though we'll want both strategies working in the long run!

@tlively
Copy link

tlively commented Oct 5, 2018

No, this is happening in LLVM. For example, if you look at bug15.ll, you'll see that there are no stores anywhere. But after I fix the codegen issue, LLVM lowers it to take an outparam pointer for the vector and return via a store.

@alexcrichton
Copy link
Member

Ok we're getting really close, thanks so much for all your awesome help here @tlively, the backend is progressing at a lightning pace! Remaining work items I've found are:

And... that may actually be it! I enabled basically all the other operations successfully and hand inspection of the wasm looks like it's all being generated as expected. I can start pushing work to this PR (or a separate PR, either way's fine) once we start getting closer to this all working.

@tlively
Copy link

tlively commented Oct 13, 2018

@alexcrichton I would add one more work item: There is no way to get min/max instructions from the IR yet, but a fix has been accepted and will land very soon.

I also want to make sure you're aware of and using all the wasm-specific LLVM intrinsics we have, including:

  • @wasm.add.saturate.signed.v16i8(<16 x i8>, <16 x i8>), etc.
  • @wasm.add.saturate.unsigned.v16i8(<16 x i8>, <16 x i8>), etc.
  • @wasm.sub.saturate.signed.v16i8(<16 x i8>, <16 x i8>), etc.
  • @wasm.sub.saturate.unsigned.v16i8(<16 x i8>, <16 x i8>), etc.
  • @wasm.bitselect.v16i8(<16 x i8>, <16 x i8>, <16 x i8>), etc. (control mask is the first argument)

And I know you're already using @wasm.anytrue.* and @wasm.alltrue.*.

Edit: Another work item is that instruction selection crashes when lane indices are `undef, although this may not be affecting Rust's codegen. A fix is out for review.

@alexcrichton
Copy link
Member

Oh heh good point! I wasn't able to test those because our implementation uses gt and lt under the hood which didn't work for other reasons. I'm also remembering now that I haven't tested the bitselect instruction yet. I suspect we're missing a small few here and there, but hopefully with the fixes above we've got the vast majority working and we'll be able to test them as well!

I don't think we've bound the saturating arithmetic ones just yet, but I'll be sure to hook into those intrinsics for that and bitselect. The current code doesn't use the intrinsic for bitselect, but it won't be hard to use it!

@alexcrichton
Copy link
Member

Oh also FWIW the undef in extractlane indices was probably related to bugpoint being a bit too aggressive about inserting undef everywhere, I agree that it probably shouldn't come up in our codegen naturally or through optimizations otherwise.

@alexcrichton
Copy link
Member

Ok I'm gonna try to keep making progress on this at #620, so closing in favor of that

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants