-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
VectorUtils: mark lrint, llrint as trivially vectorizable #69945
Conversation
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-llvm-analysis Author: Ramkumar Ramachandra (artagnon) ChangesWith the recent change 98c90a (ISel: introduce vector ISD::LRINT, ISD::LLRINT; custom RISCV lowering), it is now possible for Fixes #55208. -- 8< -- Full diff: https://github.com/llvm/llvm-project/pull/69945.diff 2 Files Affected:
diff --git a/llvm/lib/Analysis/VectorUtils.cpp b/llvm/lib/Analysis/VectorUtils.cpp
index 9893e23468e177d..628c82e611faea1 100644
--- a/llvm/lib/Analysis/VectorUtils.cpp
+++ b/llvm/lib/Analysis/VectorUtils.cpp
@@ -92,6 +92,8 @@ bool llvm::isTriviallyVectorizable(Intrinsic::ID ID) {
case Intrinsic::canonicalize:
case Intrinsic::fptosi_sat:
case Intrinsic::fptoui_sat:
+ case Intrinsic::lrint:
+ case Intrinsic::llrint:
return true;
default:
return false;
@@ -123,6 +125,8 @@ bool llvm::isVectorIntrinsicWithOverloadTypeAtArg(Intrinsic::ID ID,
switch (ID) {
case Intrinsic::fptosi_sat:
case Intrinsic::fptoui_sat:
+ case Intrinsic::lrint:
+ case Intrinsic::llrint:
return OpdIdx == -1 || OpdIdx == 0;
case Intrinsic::is_fpclass:
return OpdIdx == 0;
diff --git a/llvm/test/Transforms/SLPVectorizer/RISCV/fround.ll b/llvm/test/Transforms/SLPVectorizer/RISCV/fround.ll
index 9206f529cbfd368..30d6f877ce177a3 100644
--- a/llvm/test/Transforms/SLPVectorizer/RISCV/fround.ll
+++ b/llvm/test/Transforms/SLPVectorizer/RISCV/fround.ll
@@ -29,24 +29,32 @@ entry:
ret <4 x float> %vecins.3
}
+define <2 x i64> @lrint_v2i64f32(ptr %a) {
+; CHECK-LABEL: define <2 x i64> @lrint_v2i64f32(
+; CHECK-SAME: ptr [[A:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[TMP0:%.*]] = load <2 x float>, ptr [[A]], align 8
+; CHECK-NEXT: [[TMP1:%.*]] = call <2 x i64> @llvm.lrint.v2i64.v2f32(<2 x float> [[TMP0]])
+; CHECK-NEXT: ret <2 x i64> [[TMP1]]
+;
+entry:
+ %0 = load <2 x float>, ptr %a
+ %vecext = extractelement <2 x float> %0, i64 0
+ %1 = call i64 @llvm.lrint.i64.f32(float %vecext)
+ %vecins = insertelement <2 x i64> undef, i64 %1, i64 0
+ %vecext.1 = extractelement <2 x float> %0, i64 1
+ %2 = call i64 @llvm.lrint.i64.f32(float %vecext.1)
+ %vecins.1 = insertelement <2 x i64> %vecins, i64 %2, i64 1
+ ret <2 x i64> %vecins.1
+}
+
define <4 x i64> @lrint_v4i64f32(ptr %a) {
; CHECK-LABEL: define <4 x i64> @lrint_v4i64f32(
; CHECK-SAME: ptr [[A:%.*]]) #[[ATTR0]] {
; CHECK-NEXT: entry:
; CHECK-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[A]], align 16
-; CHECK-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i64 0
-; CHECK-NEXT: [[TMP1:%.*]] = call i64 @llvm.lrint.i64.f32(float [[VECEXT]])
-; CHECK-NEXT: [[VECINS:%.*]] = insertelement <4 x i64> undef, i64 [[TMP1]], i64 0
-; CHECK-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i64 1
-; CHECK-NEXT: [[TMP2:%.*]] = call i64 @llvm.lrint.i64.f32(float [[VECEXT_1]])
-; CHECK-NEXT: [[VECINS_1:%.*]] = insertelement <4 x i64> [[VECINS]], i64 [[TMP2]], i64 1
-; CHECK-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i64 2
-; CHECK-NEXT: [[TMP3:%.*]] = call i64 @llvm.lrint.i64.f32(float [[VECEXT_2]])
-; CHECK-NEXT: [[VECINS_2:%.*]] = insertelement <4 x i64> [[VECINS_1]], i64 [[TMP3]], i64 2
-; CHECK-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i64 3
-; CHECK-NEXT: [[TMP4:%.*]] = call i64 @llvm.lrint.i64.f32(float [[VECEXT_3]])
-; CHECK-NEXT: [[VECINS_3:%.*]] = insertelement <4 x i64> [[VECINS_2]], i64 [[TMP4]], i64 3
-; CHECK-NEXT: ret <4 x i64> [[VECINS_3]]
+; CHECK-NEXT: [[TMP1:%.*]] = call <4 x i64> @llvm.lrint.v4i64.v4f32(<4 x float> [[TMP0]])
+; CHECK-NEXT: ret <4 x i64> [[TMP1]]
;
entry:
%0 = load <4 x float>, ptr %a
@@ -65,5 +73,136 @@ entry:
ret <4 x i64> %vecins.3
}
+define <8 x i64> @lrint_v8i64f32(ptr %a) {
+; CHECK-LABEL: define <8 x i64> @lrint_v8i64f32(
+; CHECK-SAME: ptr [[A:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[TMP0:%.*]] = load <8 x float>, ptr [[A]], align 32
+; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[TMP0]], <8 x float> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
+; CHECK-NEXT: [[TMP2:%.*]] = call <4 x i64> @llvm.lrint.v4i64.v4f32(<4 x float> [[TMP1]])
+; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x i64> [[TMP2]], <4 x i64> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 poison, i32 poison, i32 poison, i32 poison>
+; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <8 x float> [[TMP0]], <8 x float> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
+; CHECK-NEXT: [[TMP5:%.*]] = call <4 x i64> @llvm.lrint.v4i64.v4f32(<4 x float> [[TMP4]])
+; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <4 x i64> [[TMP5]], <4 x i64> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 poison, i32 poison, i32 poison, i32 poison>
+; CHECK-NEXT: [[VECINS_71:%.*]] = shufflevector <8 x i64> [[TMP3]], <8 x i64> [[TMP6]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 10, i32 11>
+; CHECK-NEXT: ret <8 x i64> [[VECINS_71]]
+;
+entry:
+ %0 = load <8 x float>, ptr %a
+ %vecext = extractelement <8 x float> %0, i64 0
+ %1 = call i64 @llvm.lrint.i64.f32(float %vecext)
+ %vecins = insertelement <8 x i64> undef, i64 %1, i64 0
+ %vecext.1 = extractelement <8 x float> %0, i64 1
+ %2 = call i64 @llvm.lrint.i64.f32(float %vecext.1)
+ %vecins.1 = insertelement <8 x i64> %vecins, i64 %2, i64 1
+ %vecext.2 = extractelement <8 x float> %0, i64 2
+ %3 = call i64 @llvm.lrint.i64.f32(float %vecext.2)
+ %vecins.2 = insertelement <8 x i64> %vecins.1, i64 %3, i64 2
+ %vecext.3 = extractelement <8 x float> %0, i64 3
+ %4 = call i64 @llvm.lrint.i64.f32(float %vecext.3)
+ %vecins.3 = insertelement <8 x i64> %vecins.2, i64 %4, i64 3
+ %vecext.4 = extractelement <8 x float> %0, i64 4
+ %5 = call i64 @llvm.lrint.i64.f32(float %vecext.4)
+ %vecins.4 = insertelement <8 x i64> %vecins.3, i64 %5, i64 4
+ %vecext.5 = extractelement <8 x float> %0, i64 5
+ %6 = call i64 @llvm.lrint.i64.f32(float %vecext.5)
+ %vecins.5 = insertelement <8 x i64> %vecins.4, i64 %6, i64 5
+ %vecext.6 = extractelement <8 x float> %0, i64 6
+ %7 = call i64 @llvm.lrint.i64.f32(float %vecext.6)
+ %vecins.6 = insertelement <8 x i64> %vecins.5, i64 %7, i64 6
+ %vecext.7 = extractelement <8 x float> %0, i64 7
+ %8 = call i64 @llvm.lrint.i64.f32(float %vecext.7)
+ %vecins.7 = insertelement <8 x i64> %vecins.6, i64 %8, i64 7
+ ret <8 x i64> %vecins.7
+}
+
+define <2 x i64> @llrint_v2i64f32(ptr %a) {
+; CHECK-LABEL: define <2 x i64> @llrint_v2i64f32(
+; CHECK-SAME: ptr [[A:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[TMP0:%.*]] = load <2 x float>, ptr [[A]], align 8
+; CHECK-NEXT: [[TMP1:%.*]] = call <2 x i64> @llvm.llrint.v2i64.v2f32(<2 x float> [[TMP0]])
+; CHECK-NEXT: ret <2 x i64> [[TMP1]]
+;
+entry:
+ %0 = load <2 x float>, ptr %a
+ %vecext = extractelement <2 x float> %0, i64 0
+ %1 = call i64 @llvm.llrint.i64.f32(float %vecext)
+ %vecins = insertelement <2 x i64> undef, i64 %1, i64 0
+ %vecext.1 = extractelement <2 x float> %0, i64 1
+ %2 = call i64 @llvm.llrint.i64.f32(float %vecext.1)
+ %vecins.1 = insertelement <2 x i64> %vecins, i64 %2, i64 1
+ ret <2 x i64> %vecins.1
+}
+
+define <4 x i64> @llrint_v4i64f32(ptr %a) {
+; CHECK-LABEL: define <4 x i64> @llrint_v4i64f32(
+; CHECK-SAME: ptr [[A:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[A]], align 16
+; CHECK-NEXT: [[TMP1:%.*]] = call <4 x i64> @llvm.llrint.v4i64.v4f32(<4 x float> [[TMP0]])
+; CHECK-NEXT: ret <4 x i64> [[TMP1]]
+;
+entry:
+ %0 = load <4 x float>, ptr %a
+ %vecext = extractelement <4 x float> %0, i64 0
+ %1 = call i64 @llvm.llrint.i64.f32(float %vecext)
+ %vecins = insertelement <4 x i64> undef, i64 %1, i64 0
+ %vecext.1 = extractelement <4 x float> %0, i64 1
+ %2 = call i64 @llvm.llrint.i64.f32(float %vecext.1)
+ %vecins.1 = insertelement <4 x i64> %vecins, i64 %2, i64 1
+ %vecext.2 = extractelement <4 x float> %0, i64 2
+ %3 = call i64 @llvm.llrint.i64.f32(float %vecext.2)
+ %vecins.2 = insertelement <4 x i64> %vecins.1, i64 %3, i64 2
+ %vecext.3 = extractelement <4 x float> %0, i64 3
+ %4 = call i64 @llvm.llrint.i64.f32(float %vecext.3)
+ %vecins.3 = insertelement <4 x i64> %vecins.2, i64 %4, i64 3
+ ret <4 x i64> %vecins.3
+}
+
+define <8 x i64> @llrint_v8i64f32(ptr %a) {
+; CHECK-LABEL: define <8 x i64> @llrint_v8i64f32(
+; CHECK-SAME: ptr [[A:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[TMP0:%.*]] = load <8 x float>, ptr [[A]], align 32
+; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[TMP0]], <8 x float> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
+; CHECK-NEXT: [[TMP2:%.*]] = call <4 x i64> @llvm.llrint.v4i64.v4f32(<4 x float> [[TMP1]])
+; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x i64> [[TMP2]], <4 x i64> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 poison, i32 poison, i32 poison, i32 poison>
+; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <8 x float> [[TMP0]], <8 x float> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
+; CHECK-NEXT: [[TMP5:%.*]] = call <4 x i64> @llvm.llrint.v4i64.v4f32(<4 x float> [[TMP4]])
+; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <4 x i64> [[TMP5]], <4 x i64> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 poison, i32 poison, i32 poison, i32 poison>
+; CHECK-NEXT: [[VECINS_71:%.*]] = shufflevector <8 x i64> [[TMP3]], <8 x i64> [[TMP6]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 10, i32 11>
+; CHECK-NEXT: ret <8 x i64> [[VECINS_71]]
+;
+entry:
+ %0 = load <8 x float>, ptr %a
+ %vecext = extractelement <8 x float> %0, i64 0
+ %1 = call i64 @llvm.llrint.i64.f32(float %vecext)
+ %vecins = insertelement <8 x i64> undef, i64 %1, i64 0
+ %vecext.1 = extractelement <8 x float> %0, i64 1
+ %2 = call i64 @llvm.llrint.i64.f32(float %vecext.1)
+ %vecins.1 = insertelement <8 x i64> %vecins, i64 %2, i64 1
+ %vecext.2 = extractelement <8 x float> %0, i64 2
+ %3 = call i64 @llvm.llrint.i64.f32(float %vecext.2)
+ %vecins.2 = insertelement <8 x i64> %vecins.1, i64 %3, i64 2
+ %vecext.3 = extractelement <8 x float> %0, i64 3
+ %4 = call i64 @llvm.llrint.i64.f32(float %vecext.3)
+ %vecins.3 = insertelement <8 x i64> %vecins.2, i64 %4, i64 3
+ %vecext.4 = extractelement <8 x float> %0, i64 4
+ %5 = call i64 @llvm.llrint.i64.f32(float %vecext.4)
+ %vecins.4 = insertelement <8 x i64> %vecins.3, i64 %5, i64 4
+ %vecext.5 = extractelement <8 x float> %0, i64 5
+ %6 = call i64 @llvm.llrint.i64.f32(float %vecext.5)
+ %vecins.5 = insertelement <8 x i64> %vecins.4, i64 %6, i64 5
+ %vecext.6 = extractelement <8 x float> %0, i64 6
+ %7 = call i64 @llvm.llrint.i64.f32(float %vecext.6)
+ %vecins.6 = insertelement <8 x i64> %vecins.5, i64 %7, i64 6
+ %vecext.7 = extractelement <8 x float> %0, i64 7
+ %8 = call i64 @llvm.llrint.i64.f32(float %vecext.7)
+ %vecins.7 = insertelement <8 x i64> %vecins.6, i64 %8, i64 7
+ ret <8 x i64> %vecins.7
+}
+
declare float @llvm.rint.f32(float)
declare i64 @llvm.lrint.i64.f32(float)
+declare i64 @llvm.llrint.i64.f32(float)
|
Any chance you could add some Analysis/CostModel/XXX/ test coverage please? |
I already added it in #66924. See https://github.com/llvm/llvm-project/pull/66924/files#diff-7a095b9cb54a5f064065524b233fea89606804a2426e81365757a410c6441b53. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that the changes in isTriviallyVectorizable will alter the LoopVectorizer and Scalarization too.
Can you add some tests to make sure they work, similar to https://reviews.llvm.org/D124358.
With the recent change 98c90a1 (ISel: introduce vector ISD::LRINT, ISD::LLRINT; custom RISCV lowering), it is now possible for SLPVectorizer, LoopVectorize, and Scalarizer to operate on llvm.lrint and llvm.llrint, with vector codegen for the RISC-V target. Make a trivial change to VectorUtils, and update the corresponding tests.
As the pre-commit tests have all landed, it should now be possible to review this patch independently. @davemgreen Thanks; fixed now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - cheers
Sorry, I forgot to mention: I have some concerns with the custom RISC-V lowering, that I raised here, even though Craig approved the patch. I'm worried that I might break [l]lrint on RISC-V by merging this patch: could someone kindly let me know if my concern is real? |
Update: I attempted to break something by writing variants of programs with std::lrint and std::llrint that SLP would vectorize, and running it through clang. It seems that nothing is vectorized, and there are still libcalls to lrint and llrint. It seems that there is more work to be done, and #55208 won't be fixed with this patch. The good news is that this patch doesn't break anything, and should be okay to merge. |
The patch still LGTM to me, the missing libcall to Intrinsic conversion seems to be entirely independent of this. |
This commit caused failed asserts when building ffmpeg for x86. The issue is reproducible in a reduced form like this: short a[];
int b;
long lrint();
void c() {
for (; b; b++)
a[b] = lrint(b);
} $ clang -target i686-linux-gnu -c repro.c -O2 -fno-math-errno
clang: /home/martin/code/llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h:160: const llvm::SDValue& llvm::DAGTypeLegalizer::getSDValue(llvm::DAGTypeLegalizer::TableId&): Assertion `Id && "TableId should be non-zero"' failed. (The same also reproduces with a Please have a look, or revert if fixing takes a while. |
Thanks for the report and the reduced test case! Change reverted in ac7c816, and requires further investigation to fix underlying issue. |
With the recent change 98c90a1 (ISel: introduce vector ISD::LRINT, ISD::LLRINT; custom RISCV lowering), it is now possible for SLPVectorizer, LoopVectorize, and Scalarizer to operate on llvm.lrint and llvm.llrint, with vector codegen for the RISC-V target. Make a trivial change to VectorUtils, and update the corresponding tests.