VectorUtils: mark lrint, llrint as trivially vectorizable #69945

artagnon · 2023-10-23T16:47:19Z

With the recent change 98c90a1 (ISel: introduce vector ISD::LRINT, ISD::LLRINT; custom RISCV lowering), it is now possible for SLPVectorizer, LoopVectorize, and Scalarizer to operate on llvm.lrint and llvm.llrint, with vector codegen for the RISC-V target. Make a trivial change to VectorUtils, and update the corresponding tests.

llvmbot · 2023-10-23T16:48:27Z

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-llvm-analysis

Author: Ramkumar Ramachandra (artagnon)

Changes

With the recent change 98c90a (ISel: introduce vector ISD::LRINT, ISD::LLRINT; custom RISCV lowering), it is now possible for
SLPVectorizer to vectorize llvm.lrint and llvm.llrint, with vector codegen for the RISC-V target. Make a trivial change to VectorUtils, to demonstrate that SLPVectorizer indeed vectorizes the relevant tests under the RISC-V target.

Fixes #55208.

-- 8< --
Based on #69940. Please review only the second patch.

Full diff: https://github.com/llvm/llvm-project/pull/69945.diff

2 Files Affected:

(modified) llvm/lib/Analysis/VectorUtils.cpp (+4)
(modified) llvm/test/Transforms/SLPVectorizer/RISCV/fround.ll (+152-13)

diff --git a/llvm/lib/Analysis/VectorUtils.cpp b/llvm/lib/Analysis/VectorUtils.cpp
index 9893e23468e177d..628c82e611faea1 100644
--- a/llvm/lib/Analysis/VectorUtils.cpp
+++ b/llvm/lib/Analysis/VectorUtils.cpp
@@ -92,6 +92,8 @@ bool llvm::isTriviallyVectorizable(Intrinsic::ID ID) {
   case Intrinsic::canonicalize:
   case Intrinsic::fptosi_sat:
   case Intrinsic::fptoui_sat:
+  case Intrinsic::lrint:
+  case Intrinsic::llrint:
     return true;
   default:
     return false;
@@ -123,6 +125,8 @@ bool llvm::isVectorIntrinsicWithOverloadTypeAtArg(Intrinsic::ID ID,
   switch (ID) {
   case Intrinsic::fptosi_sat:
   case Intrinsic::fptoui_sat:
+  case Intrinsic::lrint:
+  case Intrinsic::llrint:
     return OpdIdx == -1 || OpdIdx == 0;
   case Intrinsic::is_fpclass:
     return OpdIdx == 0;
diff --git a/llvm/test/Transforms/SLPVectorizer/RISCV/fround.ll b/llvm/test/Transforms/SLPVectorizer/RISCV/fround.ll
index 9206f529cbfd368..30d6f877ce177a3 100644
--- a/llvm/test/Transforms/SLPVectorizer/RISCV/fround.ll
+++ b/llvm/test/Transforms/SLPVectorizer/RISCV/fround.ll
@@ -29,24 +29,32 @@ entry:
   ret <4 x float> %vecins.3
 }
 
+define <2 x i64> @lrint_v2i64f32(ptr %a) {
+; CHECK-LABEL: define <2 x i64> @lrint_v2i64f32(
+; CHECK-SAME: ptr [[A:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[TMP0:%.*]] = load <2 x float>, ptr [[A]], align 8
+; CHECK-NEXT:    [[TMP1:%.*]] = call <2 x i64> @llvm.lrint.v2i64.v2f32(<2 x float> [[TMP0]])
+; CHECK-NEXT:    ret <2 x i64> [[TMP1]]
+;
+entry:
+  %0 = load <2 x float>, ptr %a
+  %vecext = extractelement <2 x float> %0, i64 0
+  %1 = call i64 @llvm.lrint.i64.f32(float %vecext)
+  %vecins = insertelement <2 x i64> undef, i64 %1, i64 0
+  %vecext.1 = extractelement <2 x float> %0, i64 1
+  %2 = call i64 @llvm.lrint.i64.f32(float %vecext.1)
+  %vecins.1 = insertelement <2 x i64> %vecins, i64 %2, i64 1
+  ret <2 x i64> %vecins.1
+}
+
 define <4 x i64> @lrint_v4i64f32(ptr %a) {
 ; CHECK-LABEL: define <4 x i64> @lrint_v4i64f32(
 ; CHECK-SAME: ptr [[A:%.*]]) #[[ATTR0]] {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[TMP0:%.*]] = load <4 x float>, ptr [[A]], align 16
-; CHECK-NEXT:    [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i64 0
-; CHECK-NEXT:    [[TMP1:%.*]] = call i64 @llvm.lrint.i64.f32(float [[VECEXT]])
-; CHECK-NEXT:    [[VECINS:%.*]] = insertelement <4 x i64> undef, i64 [[TMP1]], i64 0
-; CHECK-NEXT:    [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i64 1
-; CHECK-NEXT:    [[TMP2:%.*]] = call i64 @llvm.lrint.i64.f32(float [[VECEXT_1]])
-; CHECK-NEXT:    [[VECINS_1:%.*]] = insertelement <4 x i64> [[VECINS]], i64 [[TMP2]], i64 1
-; CHECK-NEXT:    [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i64 2
-; CHECK-NEXT:    [[TMP3:%.*]] = call i64 @llvm.lrint.i64.f32(float [[VECEXT_2]])
-; CHECK-NEXT:    [[VECINS_2:%.*]] = insertelement <4 x i64> [[VECINS_1]], i64 [[TMP3]], i64 2
-; CHECK-NEXT:    [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i64 3
-; CHECK-NEXT:    [[TMP4:%.*]] = call i64 @llvm.lrint.i64.f32(float [[VECEXT_3]])
-; CHECK-NEXT:    [[VECINS_3:%.*]] = insertelement <4 x i64> [[VECINS_2]], i64 [[TMP4]], i64 3
-; CHECK-NEXT:    ret <4 x i64> [[VECINS_3]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call <4 x i64> @llvm.lrint.v4i64.v4f32(<4 x float> [[TMP0]])
+; CHECK-NEXT:    ret <4 x i64> [[TMP1]]
 ;
 entry:
   %0 = load <4 x float>, ptr %a
@@ -65,5 +73,136 @@ entry:
   ret <4 x i64> %vecins.3
 }
 
+define <8 x i64> @lrint_v8i64f32(ptr %a) {
+; CHECK-LABEL: define <8 x i64> @lrint_v8i64f32(
+; CHECK-SAME: ptr [[A:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[TMP0:%.*]] = load <8 x float>, ptr [[A]], align 32
+; CHECK-NEXT:    [[TMP1:%.*]] = shufflevector <8 x float> [[TMP0]], <8 x float> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
+; CHECK-NEXT:    [[TMP2:%.*]] = call <4 x i64> @llvm.lrint.v4i64.v4f32(<4 x float> [[TMP1]])
+; CHECK-NEXT:    [[TMP3:%.*]] = shufflevector <4 x i64> [[TMP2]], <4 x i64> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 poison, i32 poison, i32 poison, i32 poison>
+; CHECK-NEXT:    [[TMP4:%.*]] = shufflevector <8 x float> [[TMP0]], <8 x float> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
+; CHECK-NEXT:    [[TMP5:%.*]] = call <4 x i64> @llvm.lrint.v4i64.v4f32(<4 x float> [[TMP4]])
+; CHECK-NEXT:    [[TMP6:%.*]] = shufflevector <4 x i64> [[TMP5]], <4 x i64> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 poison, i32 poison, i32 poison, i32 poison>
+; CHECK-NEXT:    [[VECINS_71:%.*]] = shufflevector <8 x i64> [[TMP3]], <8 x i64> [[TMP6]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 10, i32 11>
+; CHECK-NEXT:    ret <8 x i64> [[VECINS_71]]
+;
+entry:
+  %0 = load <8 x float>, ptr %a
+  %vecext = extractelement <8 x float> %0, i64 0
+  %1 = call i64 @llvm.lrint.i64.f32(float %vecext)
+  %vecins = insertelement <8 x i64> undef, i64 %1, i64 0
+  %vecext.1 = extractelement <8 x float> %0, i64 1
+  %2 = call i64 @llvm.lrint.i64.f32(float %vecext.1)
+  %vecins.1 = insertelement <8 x i64> %vecins, i64 %2, i64 1
+  %vecext.2 = extractelement <8 x float> %0, i64 2
+  %3 = call i64 @llvm.lrint.i64.f32(float %vecext.2)
+  %vecins.2 = insertelement <8 x i64> %vecins.1, i64 %3, i64 2
+  %vecext.3 = extractelement <8 x float> %0, i64 3
+  %4 = call i64 @llvm.lrint.i64.f32(float %vecext.3)
+  %vecins.3 = insertelement <8 x i64> %vecins.2, i64 %4, i64 3
+  %vecext.4 = extractelement <8 x float> %0, i64 4
+  %5 = call i64 @llvm.lrint.i64.f32(float %vecext.4)
+  %vecins.4 = insertelement <8 x i64> %vecins.3, i64 %5, i64 4
+  %vecext.5 = extractelement <8 x float> %0, i64 5
+  %6 = call i64 @llvm.lrint.i64.f32(float %vecext.5)
+  %vecins.5 = insertelement <8 x i64> %vecins.4, i64 %6, i64 5
+  %vecext.6 = extractelement <8 x float> %0, i64 6
+  %7 = call i64 @llvm.lrint.i64.f32(float %vecext.6)
+  %vecins.6 = insertelement <8 x i64> %vecins.5, i64 %7, i64 6
+  %vecext.7 = extractelement <8 x float> %0, i64 7
+  %8 = call i64 @llvm.lrint.i64.f32(float %vecext.7)
+  %vecins.7 = insertelement <8 x i64> %vecins.6, i64 %8, i64 7
+  ret <8 x i64> %vecins.7
+}
+
+define <2 x i64> @llrint_v2i64f32(ptr %a) {
+; CHECK-LABEL: define <2 x i64> @llrint_v2i64f32(
+; CHECK-SAME: ptr [[A:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[TMP0:%.*]] = load <2 x float>, ptr [[A]], align 8
+; CHECK-NEXT:    [[TMP1:%.*]] = call <2 x i64> @llvm.llrint.v2i64.v2f32(<2 x float> [[TMP0]])
+; CHECK-NEXT:    ret <2 x i64> [[TMP1]]
+;
+entry:
+  %0 = load <2 x float>, ptr %a
+  %vecext = extractelement <2 x float> %0, i64 0
+  %1 = call i64 @llvm.llrint.i64.f32(float %vecext)
+  %vecins = insertelement <2 x i64> undef, i64 %1, i64 0
+  %vecext.1 = extractelement <2 x float> %0, i64 1
+  %2 = call i64 @llvm.llrint.i64.f32(float %vecext.1)
+  %vecins.1 = insertelement <2 x i64> %vecins, i64 %2, i64 1
+  ret <2 x i64> %vecins.1
+}
+
+define <4 x i64> @llrint_v4i64f32(ptr %a) {
+; CHECK-LABEL: define <4 x i64> @llrint_v4i64f32(
+; CHECK-SAME: ptr [[A:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[TMP0:%.*]] = load <4 x float>, ptr [[A]], align 16
+; CHECK-NEXT:    [[TMP1:%.*]] = call <4 x i64> @llvm.llrint.v4i64.v4f32(<4 x float> [[TMP0]])
+; CHECK-NEXT:    ret <4 x i64> [[TMP1]]
+;
+entry:
+  %0 = load <4 x float>, ptr %a
+  %vecext = extractelement <4 x float> %0, i64 0
+  %1 = call i64 @llvm.llrint.i64.f32(float %vecext)
+  %vecins = insertelement <4 x i64> undef, i64 %1, i64 0
+  %vecext.1 = extractelement <4 x float> %0, i64 1
+  %2 = call i64 @llvm.llrint.i64.f32(float %vecext.1)
+  %vecins.1 = insertelement <4 x i64> %vecins, i64 %2, i64 1
+  %vecext.2 = extractelement <4 x float> %0, i64 2
+  %3 = call i64 @llvm.llrint.i64.f32(float %vecext.2)
+  %vecins.2 = insertelement <4 x i64> %vecins.1, i64 %3, i64 2
+  %vecext.3 = extractelement <4 x float> %0, i64 3
+  %4 = call i64 @llvm.llrint.i64.f32(float %vecext.3)
+  %vecins.3 = insertelement <4 x i64> %vecins.2, i64 %4, i64 3
+  ret <4 x i64> %vecins.3
+}
+
+define <8 x i64> @llrint_v8i64f32(ptr %a) {
+; CHECK-LABEL: define <8 x i64> @llrint_v8i64f32(
+; CHECK-SAME: ptr [[A:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[TMP0:%.*]] = load <8 x float>, ptr [[A]], align 32
+; CHECK-NEXT:    [[TMP1:%.*]] = shufflevector <8 x float> [[TMP0]], <8 x float> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
+; CHECK-NEXT:    [[TMP2:%.*]] = call <4 x i64> @llvm.llrint.v4i64.v4f32(<4 x float> [[TMP1]])
+; CHECK-NEXT:    [[TMP3:%.*]] = shufflevector <4 x i64> [[TMP2]], <4 x i64> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 poison, i32 poison, i32 poison, i32 poison>
+; CHECK-NEXT:    [[TMP4:%.*]] = shufflevector <8 x float> [[TMP0]], <8 x float> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
+; CHECK-NEXT:    [[TMP5:%.*]] = call <4 x i64> @llvm.llrint.v4i64.v4f32(<4 x float> [[TMP4]])
+; CHECK-NEXT:    [[TMP6:%.*]] = shufflevector <4 x i64> [[TMP5]], <4 x i64> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 poison, i32 poison, i32 poison, i32 poison>
+; CHECK-NEXT:    [[VECINS_71:%.*]] = shufflevector <8 x i64> [[TMP3]], <8 x i64> [[TMP6]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 10, i32 11>
+; CHECK-NEXT:    ret <8 x i64> [[VECINS_71]]
+;
+entry:
+  %0 = load <8 x float>, ptr %a
+  %vecext = extractelement <8 x float> %0, i64 0
+  %1 = call i64 @llvm.llrint.i64.f32(float %vecext)
+  %vecins = insertelement <8 x i64> undef, i64 %1, i64 0
+  %vecext.1 = extractelement <8 x float> %0, i64 1
+  %2 = call i64 @llvm.llrint.i64.f32(float %vecext.1)
+  %vecins.1 = insertelement <8 x i64> %vecins, i64 %2, i64 1
+  %vecext.2 = extractelement <8 x float> %0, i64 2
+  %3 = call i64 @llvm.llrint.i64.f32(float %vecext.2)
+  %vecins.2 = insertelement <8 x i64> %vecins.1, i64 %3, i64 2
+  %vecext.3 = extractelement <8 x float> %0, i64 3
+  %4 = call i64 @llvm.llrint.i64.f32(float %vecext.3)
+  %vecins.3 = insertelement <8 x i64> %vecins.2, i64 %4, i64 3
+  %vecext.4 = extractelement <8 x float> %0, i64 4
+  %5 = call i64 @llvm.llrint.i64.f32(float %vecext.4)
+  %vecins.4 = insertelement <8 x i64> %vecins.3, i64 %5, i64 4
+  %vecext.5 = extractelement <8 x float> %0, i64 5
+  %6 = call i64 @llvm.llrint.i64.f32(float %vecext.5)
+  %vecins.5 = insertelement <8 x i64> %vecins.4, i64 %6, i64 5
+  %vecext.6 = extractelement <8 x float> %0, i64 6
+  %7 = call i64 @llvm.llrint.i64.f32(float %vecext.6)
+  %vecins.6 = insertelement <8 x i64> %vecins.5, i64 %7, i64 6
+  %vecext.7 = extractelement <8 x float> %0, i64 7
+  %8 = call i64 @llvm.llrint.i64.f32(float %vecext.7)
+  %vecins.7 = insertelement <8 x i64> %vecins.6, i64 %8, i64 7
+  ret <8 x i64> %vecins.7
+}
+
 declare float @llvm.rint.f32(float)
 declare i64 @llvm.lrint.i64.f32(float)
+declare i64 @llvm.llrint.i64.f32(float)

RKSimon · 2023-10-23T16:51:48Z

Any chance you could add some Analysis/CostModel/XXX/ test coverage please?

artagnon · 2023-10-23T16:55:40Z

Any chance you could add some Analysis/CostModel/XXX/ test coverage please?

I already added it in #66924. See https://github.com/llvm/llvm-project/pull/66924/files#diff-7a095b9cb54a5f064065524b233fea89606804a2426e81365757a410c6441b53.

davemgreen

My understanding is that the changes in isTriviallyVectorizable will alter the LoopVectorizer and Scalarization too.

Can you add some tests to make sure they work, similar to https://reviews.llvm.org/D124358.

With the recent change 98c90a1 (ISel: introduce vector ISD::LRINT, ISD::LLRINT; custom RISCV lowering), it is now possible for SLPVectorizer, LoopVectorize, and Scalarizer to operate on llvm.lrint and llvm.llrint, with vector codegen for the RISC-V target. Make a trivial change to VectorUtils, and update the corresponding tests.

artagnon · 2023-10-31T13:50:17Z

As the pre-commit tests have all landed, it should now be possible to review this patch independently.

@davemgreen Thanks; fixed now.

RKSimon

LGTM - cheers

artagnon · 2023-10-31T15:23:27Z

Sorry, I forgot to mention: I have some concerns with the custom RISC-V lowering, that I raised here, even though Craig approved the patch. I'm worried that I might break [l]lrint on RISC-V by merging this patch: could someone kindly let me know if my concern is real?

artagnon · 2023-10-31T16:07:45Z

Update: I attempted to break something by writing variants of programs with std::lrint and std::llrint that SLP would vectorize, and running it through clang. It seems that nothing is vectorized, and there are still libcalls to lrint and llrint. It seems that there is more work to be done, and #55208 won't be fixed with this patch. The good news is that this patch doesn't break anything, and should be okay to merge.

RKSimon · 2023-10-31T20:19:18Z

The patch still LGTM to me, the missing libcall to Intrinsic conversion seems to be entirely independent of this.

mstorsjo · 2023-11-01T08:12:23Z

This commit caused failed asserts when building ffmpeg for x86. The issue is reproducible in a reduced form like this:

short a[];
int b;
long lrint();
void c() {
  for (; b; b++)
    a[b] = lrint(b);
}

$ clang -target i686-linux-gnu -c repro.c -O2 -fno-math-errno
clang: /home/martin/code/llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h:160: const llvm::SDValue& llvm::DAGTypeLegalizer::getSDValue(llvm::DAGTypeLegalizer::TableId&): Assertion `Id && "TableId should be non-zero"' failed.

(The same also reproduces with a x86_64-w64-mingw32 target triple, but apparently not with x86_64-linux-gnu.)

Please have a look, or revert if fixing takes a while.

…9945)" This reverts commit 5bfd89b. It was causing build failures on ffmpeg on i686.

artagnon · 2023-11-01T09:59:28Z

Thanks for the report and the reduced test case! Change reverted in ac7c816, and requires further investigation to fix underlying issue.

artagnon requested review from preames, RKSimon, lukel97, alexey-bataev, davemgreen and topperc October 23, 2023 16:47

llvmbot added llvm:analysis llvm:transforms labels Oct 23, 2023

davemgreen reviewed Oct 23, 2023

View reviewed changes

artagnon changed the title ~~SLPVectorizer: vectorize lrint, llrint~~ VectorUtils: mark lrint, llrint as trivially vectorizable Oct 31, 2023

artagnon force-pushed the slp-lrint branch from c37b42d to 7f7dee5 Compare October 31, 2023 13:49

artagnon requested a review from davemgreen October 31, 2023 13:50

RKSimon approved these changes Oct 31, 2023

View reviewed changes

artagnon merged commit 5bfd89b into llvm:main Oct 31, 2023

artagnon deleted the slp-lrint branch October 31, 2023 21:29

artagnon added a commit that referenced this pull request Nov 1, 2023

Revert "VectorUtils: mark lrint, llrint as trivially vectorizable (#6…

ac7c816

…9945)" This reverts commit 5bfd89b. It was causing build failures on ffmpeg on i686.

artagnon mentioned this pull request Nov 3, 2023

[X86] llvm.lrint.vxi32.vxf64 fails to codegen #71187

Closed

artagnon mentioned this pull request Nov 6, 2023

clang/CodeGen/RISCV: test lowering of math builtins #71399

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

VectorUtils: mark lrint, llrint as trivially vectorizable #69945

VectorUtils: mark lrint, llrint as trivially vectorizable #69945

artagnon commented Oct 23, 2023 •

edited

Loading

llvmbot commented Oct 23, 2023 •

edited

Loading

RKSimon commented Oct 23, 2023

artagnon commented Oct 23, 2023

davemgreen left a comment

artagnon commented Oct 31, 2023

RKSimon left a comment

artagnon commented Oct 31, 2023

artagnon commented Oct 31, 2023

RKSimon commented Oct 31, 2023

mstorsjo commented Nov 1, 2023 •

edited

Loading

artagnon commented Nov 1, 2023

VectorUtils: mark lrint, llrint as trivially vectorizable #69945

VectorUtils: mark lrint, llrint as trivially vectorizable #69945

Conversation

artagnon commented Oct 23, 2023 • edited Loading

llvmbot commented Oct 23, 2023 • edited Loading

RKSimon commented Oct 23, 2023

artagnon commented Oct 23, 2023

davemgreen left a comment

Choose a reason for hiding this comment

artagnon commented Oct 31, 2023

RKSimon left a comment

Choose a reason for hiding this comment

artagnon commented Oct 31, 2023

artagnon commented Oct 31, 2023

RKSimon commented Oct 31, 2023

mstorsjo commented Nov 1, 2023 • edited Loading

artagnon commented Nov 1, 2023

artagnon commented Oct 23, 2023 •

edited

Loading

llvmbot commented Oct 23, 2023 •

edited

Loading

mstorsjo commented Nov 1, 2023 •

edited

Loading