Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AMDGPU] Add patterns for V_CMP_O/U #69157

Merged
merged 2 commits into from
Oct 16, 2023
Merged

Conversation

Pierre-vh
Copy link
Contributor

Fixes SWDEV-427162

@llvmbot
Copy link
Member

llvmbot commented Oct 16, 2023

@llvm/pr-subscribers-backend-amdgpu

Author: Pierre van Houtryve (Pierre-vh)

Changes

Fixes SWDEV-427162


Patch is 37.05 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/69157.diff

3 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/VOPCInstructions.td (+8)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fcmp.w32.ll (+346)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fcmp.w64.ll (+460)
diff --git a/llvm/lib/Target/AMDGPU/VOPCInstructions.td b/llvm/lib/Target/AMDGPU/VOPCInstructions.td
index 6fc3d0957dce191..63e75a2382366e1 100644
--- a/llvm/lib/Target/AMDGPU/VOPCInstructions.td
+++ b/llvm/lib/Target/AMDGPU/VOPCInstructions.td
@@ -1081,6 +1081,8 @@ multiclass FCMP_Pattern <PatFrags cond, Instruction inst, ValueType vt> {
   }
 }
 
+defm : FCMP_Pattern <COND_O, V_CMP_O_F32_e64, f32>;
+defm : FCMP_Pattern <COND_UO, V_CMP_U_F32_e64, f32>;
 defm : FCMP_Pattern <COND_OEQ, V_CMP_EQ_F32_e64, f32>;
 defm : FCMP_Pattern <COND_ONE, V_CMP_NEQ_F32_e64, f32>;
 defm : FCMP_Pattern <COND_OGT, V_CMP_GT_F32_e64, f32>;
@@ -1088,6 +1090,8 @@ defm : FCMP_Pattern <COND_OGE, V_CMP_GE_F32_e64, f32>;
 defm : FCMP_Pattern <COND_OLT, V_CMP_LT_F32_e64, f32>;
 defm : FCMP_Pattern <COND_OLE, V_CMP_LE_F32_e64, f32>;
 
+defm : FCMP_Pattern <COND_O, V_CMP_O_F64_e64, f64>;
+defm : FCMP_Pattern <COND_UO, V_CMP_U_F64_e64, f64>;
 defm : FCMP_Pattern <COND_OEQ, V_CMP_EQ_F64_e64, f64>;
 defm : FCMP_Pattern <COND_ONE, V_CMP_NEQ_F64_e64, f64>;
 defm : FCMP_Pattern <COND_OGT, V_CMP_GT_F64_e64, f64>;
@@ -1110,6 +1114,8 @@ defm : FCMP_Pattern <COND_ULT, V_CMP_NGE_F64_e64, f64>;
 defm : FCMP_Pattern <COND_ULE, V_CMP_NGT_F64_e64, f64>;
 
 let OtherPredicates = [HasTrue16BitInsts] in {
+defm : FCMP_Pattern <COND_O, V_CMP_U_F16_t16_e64, f16>;
+defm : FCMP_Pattern <COND_UO, V_CMP_O_F16_t16_e64, f16>;
 defm : FCMP_Pattern <COND_OEQ, V_CMP_EQ_F16_t16_e64, f16>;
 defm : FCMP_Pattern <COND_ONE, V_CMP_NEQ_F16_t16_e64, f16>;
 defm : FCMP_Pattern <COND_OGT, V_CMP_GT_F16_t16_e64, f16>;
@@ -1126,6 +1132,8 @@ defm : FCMP_Pattern <COND_ULE, V_CMP_NGT_F16_t16_e64, f16>;
 } // End OtherPredicates = [HasTrue16BitInsts]
 
 let OtherPredicates = [NotHasTrue16BitInsts] in {
+defm : FCMP_Pattern <COND_O, V_CMP_U_F16_e64, f16>;
+defm : FCMP_Pattern <COND_UO, V_CMP_O_F16_e64, f16>;
 defm : FCMP_Pattern <COND_OEQ, V_CMP_EQ_F16_e64, f16>;
 defm : FCMP_Pattern <COND_ONE, V_CMP_NEQ_F16_e64, f16>;
 defm : FCMP_Pattern <COND_OGT, V_CMP_GT_F16_e64, f16>;
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fcmp.w32.ll b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fcmp.w32.ll
index 007b52fa3a0c6f2..16f90296b98908b 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fcmp.w32.ll
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fcmp.w32.ll
@@ -494,6 +494,121 @@ define amdgpu_kernel void @v_fcmp_f32_ole(ptr addrspace(1) %out, float %src) {
   ret void
 }
 
+define amdgpu_kernel void @v_fcmp_f32_o(ptr addrspace(1) %out, float %src) {
+; SDAG-GFX11-LABEL: v_fcmp_f32_o:
+; SDAG-GFX11:       ; %bb.0:
+; SDAG-GFX11-NEXT:    s_clause 0x1
+; SDAG-GFX11-NEXT:    s_load_b32 s2, s[0:1], 0x2c
+; SDAG-GFX11-NEXT:    s_load_b64 s[0:1], s[0:1], 0x24
+; SDAG-GFX11-NEXT:    s_waitcnt lgkmcnt(0)
+; SDAG-GFX11-NEXT:    v_cmp_o_f32_e64 s2, 0x42c80000, s2
+; SDAG-GFX11-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; SDAG-GFX11-NEXT:    v_dual_mov_b32 v0, 0 :: v_dual_mov_b32 v1, s2
+; SDAG-GFX11-NEXT:    global_store_b32 v0, v1, s[0:1]
+; SDAG-GFX11-NEXT:    s_nop 0
+; SDAG-GFX11-NEXT:    s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; SDAG-GFX11-NEXT:    s_endpgm
+;
+; SDAG-GFX10-LABEL: v_fcmp_f32_o:
+; SDAG-GFX10:       ; %bb.0:
+; SDAG-GFX10-NEXT:    s_clause 0x1
+; SDAG-GFX10-NEXT:    s_load_dword s4, s[0:1], 0x2c
+; SDAG-GFX10-NEXT:    s_load_dwordx2 s[2:3], s[0:1], 0x24
+; SDAG-GFX10-NEXT:    v_mov_b32_e32 v0, 0
+; SDAG-GFX10-NEXT:    s_waitcnt lgkmcnt(0)
+; SDAG-GFX10-NEXT:    v_cmp_o_f32_e64 s0, 0x42c80000, s4
+; SDAG-GFX10-NEXT:    v_mov_b32_e32 v1, s0
+; SDAG-GFX10-NEXT:    global_store_dword v0, v1, s[2:3]
+; SDAG-GFX10-NEXT:    s_endpgm
+;
+; GISEL-GFX11-LABEL: v_fcmp_f32_o:
+; GISEL-GFX11:       ; %bb.0:
+; GISEL-GFX11-NEXT:    s_clause 0x1
+; GISEL-GFX11-NEXT:    s_load_b32 s2, s[0:1], 0x2c
+; GISEL-GFX11-NEXT:    s_load_b64 s[0:1], s[0:1], 0x24
+; GISEL-GFX11-NEXT:    v_mov_b32_e32 v1, 0
+; GISEL-GFX11-NEXT:    s_waitcnt lgkmcnt(0)
+; GISEL-GFX11-NEXT:    v_cmp_o_f32_e64 s2, 0x42c80000, s2
+; GISEL-GFX11-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GISEL-GFX11-NEXT:    v_mov_b32_e32 v0, s2
+; GISEL-GFX11-NEXT:    global_store_b32 v1, v0, s[0:1]
+; GISEL-GFX11-NEXT:    s_nop 0
+; GISEL-GFX11-NEXT:    s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; GISEL-GFX11-NEXT:    s_endpgm
+;
+; GISEL-GFX10-LABEL: v_fcmp_f32_o:
+; GISEL-GFX10:       ; %bb.0:
+; GISEL-GFX10-NEXT:    s_clause 0x1
+; GISEL-GFX10-NEXT:    s_load_dword s4, s[0:1], 0x2c
+; GISEL-GFX10-NEXT:    s_load_dwordx2 s[2:3], s[0:1], 0x24
+; GISEL-GFX10-NEXT:    v_mov_b32_e32 v1, 0
+; GISEL-GFX10-NEXT:    s_waitcnt lgkmcnt(0)
+; GISEL-GFX10-NEXT:    v_cmp_o_f32_e64 s0, 0x42c80000, s4
+; GISEL-GFX10-NEXT:    v_mov_b32_e32 v0, s0
+; GISEL-GFX10-NEXT:    global_store_dword v1, v0, s[2:3]
+; GISEL-GFX10-NEXT:    s_endpgm
+  %result = call i32 @llvm.amdgcn.fcmp.f32(float %src, float 100.00, i32 7)
+  store i32 %result, ptr addrspace(1) %out
+  ret void
+}
+
+define amdgpu_kernel void @v_fcmp_f32_uo(ptr addrspace(1) %out, float %src) {
+; SDAG-GFX11-LABEL: v_fcmp_f32_uo:
+; SDAG-GFX11:       ; %bb.0:
+; SDAG-GFX11-NEXT:    s_clause 0x1
+; SDAG-GFX11-NEXT:    s_load_b32 s2, s[0:1], 0x2c
+; SDAG-GFX11-NEXT:    s_load_b64 s[0:1], s[0:1], 0x24
+; SDAG-GFX11-NEXT:    s_waitcnt lgkmcnt(0)
+; SDAG-GFX11-NEXT:    v_cmp_u_f32_e64 s2, 0x42c80000, s2
+; SDAG-GFX11-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; SDAG-GFX11-NEXT:    v_dual_mov_b32 v0, 0 :: v_dual_mov_b32 v1, s2
+; SDAG-GFX11-NEXT:    global_store_b32 v0, v1, s[0:1]
+; SDAG-GFX11-NEXT:    s_nop 0
+; SDAG-GFX11-NEXT:    s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; SDAG-GFX11-NEXT:    s_endpgm
+;
+; SDAG-GFX10-LABEL: v_fcmp_f32_uo:
+; SDAG-GFX10:       ; %bb.0:
+; SDAG-GFX10-NEXT:    s_clause 0x1
+; SDAG-GFX10-NEXT:    s_load_dword s4, s[0:1], 0x2c
+; SDAG-GFX10-NEXT:    s_load_dwordx2 s[2:3], s[0:1], 0x24
+; SDAG-GFX10-NEXT:    v_mov_b32_e32 v0, 0
+; SDAG-GFX10-NEXT:    s_waitcnt lgkmcnt(0)
+; SDAG-GFX10-NEXT:    v_cmp_u_f32_e64 s0, 0x42c80000, s4
+; SDAG-GFX10-NEXT:    v_mov_b32_e32 v1, s0
+; SDAG-GFX10-NEXT:    global_store_dword v0, v1, s[2:3]
+; SDAG-GFX10-NEXT:    s_endpgm
+;
+; GISEL-GFX11-LABEL: v_fcmp_f32_uo:
+; GISEL-GFX11:       ; %bb.0:
+; GISEL-GFX11-NEXT:    s_clause 0x1
+; GISEL-GFX11-NEXT:    s_load_b32 s2, s[0:1], 0x2c
+; GISEL-GFX11-NEXT:    s_load_b64 s[0:1], s[0:1], 0x24
+; GISEL-GFX11-NEXT:    v_mov_b32_e32 v1, 0
+; GISEL-GFX11-NEXT:    s_waitcnt lgkmcnt(0)
+; GISEL-GFX11-NEXT:    v_cmp_u_f32_e64 s2, 0x42c80000, s2
+; GISEL-GFX11-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GISEL-GFX11-NEXT:    v_mov_b32_e32 v0, s2
+; GISEL-GFX11-NEXT:    global_store_b32 v1, v0, s[0:1]
+; GISEL-GFX11-NEXT:    s_nop 0
+; GISEL-GFX11-NEXT:    s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; GISEL-GFX11-NEXT:    s_endpgm
+;
+; GISEL-GFX10-LABEL: v_fcmp_f32_uo:
+; GISEL-GFX10:       ; %bb.0:
+; GISEL-GFX10-NEXT:    s_clause 0x1
+; GISEL-GFX10-NEXT:    s_load_dword s4, s[0:1], 0x2c
+; GISEL-GFX10-NEXT:    s_load_dwordx2 s[2:3], s[0:1], 0x24
+; GISEL-GFX10-NEXT:    v_mov_b32_e32 v1, 0
+; GISEL-GFX10-NEXT:    s_waitcnt lgkmcnt(0)
+; GISEL-GFX10-NEXT:    v_cmp_u_f32_e64 s0, 0x42c80000, s4
+; GISEL-GFX10-NEXT:    v_mov_b32_e32 v0, s0
+; GISEL-GFX10-NEXT:    global_store_dword v1, v0, s[2:3]
+; GISEL-GFX10-NEXT:    s_endpgm
+  %result = call i32 @llvm.amdgcn.fcmp.f32(float %src, float 100.00, i32 8)
+  store i32 %result, ptr addrspace(1) %out
+  ret void
+}
 
 define amdgpu_kernel void @v_fcmp_f32_ueq(ptr addrspace(1) %out, float %src) {
 ; SDAG-GFX11-LABEL: v_fcmp_f32_ueq:
@@ -1249,6 +1364,122 @@ define amdgpu_kernel void @v_fcmp_f64_ueq(ptr addrspace(1) %out, double %src) {
   ret void
 }
 
+define amdgpu_kernel void @v_fcmp_f64_o(ptr addrspace(1) %out, double %src) {
+; SDAG-GFX11-LABEL: v_fcmp_f64_o:
+; SDAG-GFX11:       ; %bb.0:
+; SDAG-GFX11-NEXT:    s_load_b128 s[0:3], s[0:1], 0x24
+; SDAG-GFX11-NEXT:    s_mov_b32 s4, 0
+; SDAG-GFX11-NEXT:    s_mov_b32 s5, 0x40590000
+; SDAG-GFX11-NEXT:    s_waitcnt lgkmcnt(0)
+; SDAG-GFX11-NEXT:    v_cmp_o_f64_e64 s2, s[2:3], s[4:5]
+; SDAG-GFX11-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; SDAG-GFX11-NEXT:    v_dual_mov_b32 v0, 0 :: v_dual_mov_b32 v1, s2
+; SDAG-GFX11-NEXT:    global_store_b32 v0, v1, s[0:1]
+; SDAG-GFX11-NEXT:    s_nop 0
+; SDAG-GFX11-NEXT:    s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; SDAG-GFX11-NEXT:    s_endpgm
+;
+; SDAG-GFX10-LABEL: v_fcmp_f64_o:
+; SDAG-GFX10:       ; %bb.0:
+; SDAG-GFX10-NEXT:    s_load_dwordx4 s[0:3], s[0:1], 0x24
+; SDAG-GFX10-NEXT:    s_mov_b32 s4, 0
+; SDAG-GFX10-NEXT:    s_mov_b32 s5, 0x40590000
+; SDAG-GFX10-NEXT:    v_mov_b32_e32 v0, 0
+; SDAG-GFX10-NEXT:    s_waitcnt lgkmcnt(0)
+; SDAG-GFX10-NEXT:    v_cmp_o_f64_e64 s2, s[2:3], s[4:5]
+; SDAG-GFX10-NEXT:    v_mov_b32_e32 v1, s2
+; SDAG-GFX10-NEXT:    global_store_dword v0, v1, s[0:1]
+; SDAG-GFX10-NEXT:    s_endpgm
+;
+; GISEL-GFX11-LABEL: v_fcmp_f64_o:
+; GISEL-GFX11:       ; %bb.0:
+; GISEL-GFX11-NEXT:    s_load_b128 s[0:3], s[0:1], 0x24
+; GISEL-GFX11-NEXT:    s_mov_b32 s4, 0
+; GISEL-GFX11-NEXT:    s_mov_b32 s5, 0x40590000
+; GISEL-GFX11-NEXT:    v_mov_b32_e32 v1, 0
+; GISEL-GFX11-NEXT:    s_waitcnt lgkmcnt(0)
+; GISEL-GFX11-NEXT:    v_cmp_o_f64_e64 s2, s[2:3], s[4:5]
+; GISEL-GFX11-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GISEL-GFX11-NEXT:    v_mov_b32_e32 v0, s2
+; GISEL-GFX11-NEXT:    global_store_b32 v1, v0, s[0:1]
+; GISEL-GFX11-NEXT:    s_nop 0
+; GISEL-GFX11-NEXT:    s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; GISEL-GFX11-NEXT:    s_endpgm
+;
+; GISEL-GFX10-LABEL: v_fcmp_f64_o:
+; GISEL-GFX10:       ; %bb.0:
+; GISEL-GFX10-NEXT:    s_load_dwordx4 s[0:3], s[0:1], 0x24
+; GISEL-GFX10-NEXT:    s_mov_b32 s4, 0
+; GISEL-GFX10-NEXT:    s_mov_b32 s5, 0x40590000
+; GISEL-GFX10-NEXT:    v_mov_b32_e32 v1, 0
+; GISEL-GFX10-NEXT:    s_waitcnt lgkmcnt(0)
+; GISEL-GFX10-NEXT:    v_cmp_o_f64_e64 s2, s[2:3], s[4:5]
+; GISEL-GFX10-NEXT:    v_mov_b32_e32 v0, s2
+; GISEL-GFX10-NEXT:    global_store_dword v1, v0, s[0:1]
+; GISEL-GFX10-NEXT:    s_endpgm
+  %result = call i32 @llvm.amdgcn.fcmp.f64(double %src, double 100.00, i32 7)
+  store i32 %result, ptr addrspace(1) %out
+  ret void
+}
+
+define amdgpu_kernel void @v_fcmp_f64_uo(ptr addrspace(1) %out, double %src) {
+; SDAG-GFX11-LABEL: v_fcmp_f64_uo:
+; SDAG-GFX11:       ; %bb.0:
+; SDAG-GFX11-NEXT:    s_load_b128 s[0:3], s[0:1], 0x24
+; SDAG-GFX11-NEXT:    s_mov_b32 s4, 0
+; SDAG-GFX11-NEXT:    s_mov_b32 s5, 0x40590000
+; SDAG-GFX11-NEXT:    s_waitcnt lgkmcnt(0)
+; SDAG-GFX11-NEXT:    v_cmp_u_f64_e64 s2, s[2:3], s[4:5]
+; SDAG-GFX11-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; SDAG-GFX11-NEXT:    v_dual_mov_b32 v0, 0 :: v_dual_mov_b32 v1, s2
+; SDAG-GFX11-NEXT:    global_store_b32 v0, v1, s[0:1]
+; SDAG-GFX11-NEXT:    s_nop 0
+; SDAG-GFX11-NEXT:    s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; SDAG-GFX11-NEXT:    s_endpgm
+;
+; SDAG-GFX10-LABEL: v_fcmp_f64_uo:
+; SDAG-GFX10:       ; %bb.0:
+; SDAG-GFX10-NEXT:    s_load_dwordx4 s[0:3], s[0:1], 0x24
+; SDAG-GFX10-NEXT:    s_mov_b32 s4, 0
+; SDAG-GFX10-NEXT:    s_mov_b32 s5, 0x40590000
+; SDAG-GFX10-NEXT:    v_mov_b32_e32 v0, 0
+; SDAG-GFX10-NEXT:    s_waitcnt lgkmcnt(0)
+; SDAG-GFX10-NEXT:    v_cmp_u_f64_e64 s2, s[2:3], s[4:5]
+; SDAG-GFX10-NEXT:    v_mov_b32_e32 v1, s2
+; SDAG-GFX10-NEXT:    global_store_dword v0, v1, s[0:1]
+; SDAG-GFX10-NEXT:    s_endpgm
+;
+; GISEL-GFX11-LABEL: v_fcmp_f64_uo:
+; GISEL-GFX11:       ; %bb.0:
+; GISEL-GFX11-NEXT:    s_load_b128 s[0:3], s[0:1], 0x24
+; GISEL-GFX11-NEXT:    s_mov_b32 s4, 0
+; GISEL-GFX11-NEXT:    s_mov_b32 s5, 0x40590000
+; GISEL-GFX11-NEXT:    v_mov_b32_e32 v1, 0
+; GISEL-GFX11-NEXT:    s_waitcnt lgkmcnt(0)
+; GISEL-GFX11-NEXT:    v_cmp_u_f64_e64 s2, s[2:3], s[4:5]
+; GISEL-GFX11-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GISEL-GFX11-NEXT:    v_mov_b32_e32 v0, s2
+; GISEL-GFX11-NEXT:    global_store_b32 v1, v0, s[0:1]
+; GISEL-GFX11-NEXT:    s_nop 0
+; GISEL-GFX11-NEXT:    s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; GISEL-GFX11-NEXT:    s_endpgm
+;
+; GISEL-GFX10-LABEL: v_fcmp_f64_uo:
+; GISEL-GFX10:       ; %bb.0:
+; GISEL-GFX10-NEXT:    s_load_dwordx4 s[0:3], s[0:1], 0x24
+; GISEL-GFX10-NEXT:    s_mov_b32 s4, 0
+; GISEL-GFX10-NEXT:    s_mov_b32 s5, 0x40590000
+; GISEL-GFX10-NEXT:    v_mov_b32_e32 v1, 0
+; GISEL-GFX10-NEXT:    s_waitcnt lgkmcnt(0)
+; GISEL-GFX10-NEXT:    v_cmp_u_f64_e64 s2, s[2:3], s[4:5]
+; GISEL-GFX10-NEXT:    v_mov_b32_e32 v0, s2
+; GISEL-GFX10-NEXT:    global_store_dword v1, v0, s[0:1]
+; GISEL-GFX10-NEXT:    s_endpgm
+  %result = call i32 @llvm.amdgcn.fcmp.f64(double %src, double 100.00, i32 8)
+  store i32 %result, ptr addrspace(1) %out
+  ret void
+}
+
 define amdgpu_kernel void @v_fcmp_f64_une(ptr addrspace(1) %out, double %src) {
 ; SDAG-GFX11-LABEL: v_fcmp_f64_une:
 ; SDAG-GFX11:       ; %bb.0:
@@ -2348,6 +2579,121 @@ define amdgpu_kernel void @v_fcmp_f16_ult(ptr addrspace(1) %out, half %src) {
   ret void
 }
 
+define amdgpu_kernel void @v_fcmp_f16_o(ptr addrspace(1) %out, half %src) {
+; SDAG-GFX11-LABEL: v_fcmp_f16_o:
+; SDAG-GFX11:       ; %bb.0:
+; SDAG-GFX11-NEXT:    s_clause 0x1
+; SDAG-GFX11-NEXT:    s_load_b32 s2, s[0:1], 0x2c
+; SDAG-GFX11-NEXT:    s_load_b64 s[0:1], s[0:1], 0x24
+; SDAG-GFX11-NEXT:    s_waitcnt lgkmcnt(0)
+; SDAG-GFX11-NEXT:    v_cmp_u_f16_e64 s2, 0x5640, s2
+; SDAG-GFX11-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; SDAG-GFX11-NEXT:    v_dual_mov_b32 v0, 0 :: v_dual_mov_b32 v1, s2
+; SDAG-GFX11-NEXT:    global_store_b32 v0, v1, s[0:1]
+; SDAG-GFX11-NEXT:    s_nop 0
+; SDAG-GFX11-NEXT:    s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; SDAG-GFX11-NEXT:    s_endpgm
+;
+; SDAG-GFX10-LABEL: v_fcmp_f16_o:
+; SDAG-GFX10:       ; %bb.0:
+; SDAG-GFX10-NEXT:    s_clause 0x1
+; SDAG-GFX10-NEXT:    s_load_dword s4, s[0:1], 0x2c
+; SDAG-GFX10-NEXT:    s_load_dwordx2 s[2:3], s[0:1], 0x24
+; SDAG-GFX10-NEXT:    v_mov_b32_e32 v0, 0
+; SDAG-GFX10-NEXT:    s_waitcnt lgkmcnt(0)
+; SDAG-GFX10-NEXT:    v_cmp_u_f16_e64 s0, 0x5640, s4
+; SDAG-GFX10-NEXT:    v_mov_b32_e32 v1, s0
+; SDAG-GFX10-NEXT:    global_store_dword v0, v1, s[2:3]
+; SDAG-GFX10-NEXT:    s_endpgm
+;
+; GISEL-GFX11-LABEL: v_fcmp_f16_o:
+; GISEL-GFX11:       ; %bb.0:
+; GISEL-GFX11-NEXT:    s_clause 0x1
+; GISEL-GFX11-NEXT:    s_load_b32 s2, s[0:1], 0x2c
+; GISEL-GFX11-NEXT:    s_load_b64 s[0:1], s[0:1], 0x24
+; GISEL-GFX11-NEXT:    v_mov_b32_e32 v1, 0
+; GISEL-GFX11-NEXT:    s_waitcnt lgkmcnt(0)
+; GISEL-GFX11-NEXT:    v_cmp_o_f16_e64 s2, 0x5640, s2
+; GISEL-GFX11-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GISEL-GFX11-NEXT:    v_mov_b32_e32 v0, s2
+; GISEL-GFX11-NEXT:    global_store_b32 v1, v0, s[0:1]
+; GISEL-GFX11-NEXT:    s_nop 0
+; GISEL-GFX11-NEXT:    s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; GISEL-GFX11-NEXT:    s_endpgm
+;
+; GISEL-GFX10-LABEL: v_fcmp_f16_o:
+; GISEL-GFX10:       ; %bb.0:
+; GISEL-GFX10-NEXT:    s_clause 0x1
+; GISEL-GFX10-NEXT:    s_load_dword s4, s[0:1], 0x2c
+; GISEL-GFX10-NEXT:    s_load_dwordx2 s[2:3], s[0:1], 0x24
+; GISEL-GFX10-NEXT:    v_mov_b32_e32 v1, 0
+; GISEL-GFX10-NEXT:    s_waitcnt lgkmcnt(0)
+; GISEL-GFX10-NEXT:    v_cmp_o_f16_e64 s0, 0x5640, s4
+; GISEL-GFX10-NEXT:    v_mov_b32_e32 v0, s0
+; GISEL-GFX10-NEXT:    global_store_dword v1, v0, s[2:3]
+; GISEL-GFX10-NEXT:    s_endpgm
+  %result = call i32 @llvm.amdgcn.fcmp.f16(half %src, half 100.00, i32 7)
+  store i32 %result, ptr addrspace(1) %out
+  ret void
+}
+
+define amdgpu_kernel void @v_fcmp_f16_uo(ptr addrspace(1) %out, half %src) {
+; SDAG-GFX11-LABEL: v_fcmp_f16_uo:
+; SDAG-GFX11:       ; %bb.0:
+; SDAG-GFX11-NEXT:    s_clause 0x1
+; SDAG-GFX11-NEXT:    s_load_b32 s2, s[0:1], 0x2c
+; SDAG-GFX11-NEXT:    s_load_b64 s[0:1], s[0:1], 0x24
+; SDAG-GFX11-NEXT:    s_waitcnt lgkmcnt(0)
+; SDAG-GFX11-NEXT:    v_cmp_o_f16_e64 s2, 0x5640, s2
+; SDAG-GFX11-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; SDAG-GFX11-NEXT:    v_dual_mov_b32 v0, 0 :: v_dual_mov_b32 v1, s2
+; SDAG-GFX11-NEXT:    global_store_b32 v0, v1, s[0:1]
+; SDAG-GFX11-NEXT:    s_nop 0
+; SDAG-GFX11-NEXT:    s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; SDAG-GFX11-NEXT:    s_endpgm
+;
+; SDAG-GFX10-LABEL: v_fcmp_f16_uo:
+; SDAG-GFX10:       ; %bb.0:
+; SDAG-GFX10-NEXT:    s_clause 0x1
+; SDAG-GFX10-NEXT:    s_load_dword s4, s[0:1], 0x2c
+; SDAG-GFX10-NEXT:    s_load_dwordx2 s[2:3], s[0:1], 0x24
+; SDAG-GFX10-NEXT:    v_mov_b32_e32 v0, 0
+; SDAG-GFX10-NEXT:    s_waitcnt lgkmcnt(0)
+; SDAG-GFX10-NEXT:    v_cmp_o_f16_e64 s0, 0x5640, s4
+; SDAG-GFX10-NEXT:    v_mov_b32_e32 v1, s0
+; SDAG-GFX10-NEXT:    global_store_dword v0, v1, s[2:3]
+; SDAG-GFX10-NEXT:    s_endpgm
+;
+; GISEL-GFX11-LABEL: v_fcmp_f16_uo:
+; GISEL-GFX11:       ; %bb.0:
+; GISEL-GFX11-NEXT:    s_clause 0x1
+; GISEL-GFX11-NEXT:    s_load_b32 s2, s[0:1], 0x2c
+; GISEL-GFX11-NEXT:    s_load_b64 s[0:1], s[0:1], 0x24
+; GISEL-GFX11-NEXT:    v_mov_b32_e32 v1, 0
+; GISEL-GFX11-NEXT:    s_waitcnt lgkmcnt(0)
+; GISEL-GFX11-NEXT:    v_cmp_u_f16_e64 s2, 0x5640, s2
+; GISEL-GFX11-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GISEL-GFX11-NEXT:    v_mov_b32_e32 v0, s2
+; GISEL-GFX11-NEXT:    global_store_b32 v1, v0, s[0:1]
+; GISEL-GFX11-NEXT:    s_nop 0
+; GISEL-GFX11-NEXT:    s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; GISEL-GFX11-NEXT:    s_endpgm
+;
+; GISEL-GFX10-LABEL: v_fcmp_f16_uo:
+; GISEL-GFX10:       ; %bb.0:
+; GISEL-GFX10-NEXT:    s_clause 0x1
+; GISEL-GFX10-NEXT:    s_load_dword s4, s[0:1], 0x2c
+; GISEL-GFX10-NEXT:    s_load_dwordx2 s[2:3], s[0:1], 0x24
+; GISEL-GFX10-NEXT:    v_mov_b32_e32 v1, 0
+; GISEL-GFX10-NEXT:    s_waitcnt lgkmcnt(0)
+; GISEL-GFX10-NEXT:    v_cmp_u_f16_e64 s0, 0x5640, s4
+; GISEL-GFX10-NEXT:    v_mov_b32_e32 v0, s0
+; GISEL-GFX10-NEXT:    global_store_dword v1, v0, s[2:3]
+; GISEL-GFX10-NEXT:    s_endpgm
+  %result = call i32 @llvm.amdgcn.fcmp.f16(half %src, half 100.00, i32 8)
+  store i32 %result, ptr addrspace(1) %out
+  ret void
+}
 
 define amdgpu_kernel void @v_fcmp_f16_ule(ptr addrspace(1) %out, half %src) {
 ; SDAG-GFX11-LABEL: v_fcmp_f16_ule:
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fcmp.w64.ll b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fcmp.w64.ll
index eeff0c57bb46152..b62512ae05e6e3f 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fcmp.w64.ll
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fcmp.w64.ll
@@ -546,6 +546,129 @@ define amdgpu_kernel void @v_fcmp_f32_ole(ptr addrspace(1) %out, float %src) {
   ret void
 }
 
+define amdgpu_kernel void @v_fcmp_f32_o(ptr addrspace(1) %out, float %src) {
+; GFX11-LABEL: v_fcmp_f32_o:
+; GFX11:       ; %bb.0:
+; GFX11-NEXT:    s_clause 0x1
+; GFX11-NEXT:    s_load_b32 s2, s[0:1], 0x2c
+; GFX11-NEXT:    s_load_b64 s[0:1], s[0:1], 0x24
+; GFX11-NEXT:    v_mov_b32_e32 v2, 0
+; GFX11-NEXT:    s_waitcnt lgkmcnt(0)
+; GFX11-NEXT:    v_cmp_o_f32_e64 s[2:3], 0x42c80000, s2
+; GFX11-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_2)
+; GFX11-NEXT:    v_mov_b32_e32 v0, s2
+; GFX11-NEXT:    v_mov_b32_e32 v1, s3
+; GFX11-NEXT:    global_store_b64 v2, v[0:1], s[0:1]
+; GFX11-NEXT:    s_nop 0
+; GFX11-NEXT:    s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
+; GFX11-NEXT:    s_endpgm
+;
+; GFX9-LABEL: v_fcmp_f32_o:
+; GFX9:       ; %bb.0:
+; GFX9-NEXT:    s_load_dword s4, s[0:1], 0x2c
+; GFX9-NEXT:    s_load_dwordx2 s[2:3], s[0:1], 0x24
+; GFX9-NEXT:    v_mov_b32_e32 v0, 0x42c80000
+; GFX9-NEXT:    v_mov_b32_e32 v2, 0
+; GFX9-NEXT:    s_waitcnt lgkmcnt(0)
+; GFX9-NEXT:    v_cmp_o_f32_e64 s[0:1], s4, v0
+; GFX9-NEXT:    v_mov_b32_e32 v0, s0
+; GFX9-NEXT:    v_mov_b32_e32 v1, s1
+; GFX9-NEXT:    global_store_dwordx2 v2, v[0:1], s[2:3]
+; GFX9-NEXT:    s_endpgm
+;
+; VI-SDAG-LABEL: v_fcmp_f32_o:
+; VI-SDAG:       ; %bb.0:
+; VI-SDAG-NEXT:    s_load_dword s2, s[0:1], 0x2c
+; VI-SDAG-NEXT:    s_load_dwordx2 s[0:1], s[0:1], 0x24
+; VI-SDAG-NEXT:    v_mov_b32_e32 v0, 0x42c80000
+; VI-SDAG-NEXT:    s_waitcnt lgkmcnt(0)
+; VI-SDAG-NEXT:    v_cmp_o_f32_e64 s[2:3], s2, v0
+; VI-SDAG-NEXT:    v_mov_b32_e32 v0, s0
+; VI-SDAG-NEXT:    v_mov_b32_e32 v2, s2
+; VI-SDAG-NEXT:    v_mov_b32_e32 v1, s1
+; VI-SDAG-NEXT:    v_mov_b32_e32 v3, s3
+; VI-SDAG-NEXT:    flat_store_dwordx2 v[0:1], v[2:3]
+; VI-SDAG-NEXT:    s_endpgm
+;
+; VI-GISEL-LABEL: v_fcmp_f32_o:
+; VI-GISEL:       ; %bb.0:
+; VI-GISEL-NEXT:    s_load_dword s2, s[0:1], 0x2c
+; VI-GISEL-NEXT:    s_load_dwordx2 s[0:1], s[0:1], 0x24
+; VI-GISEL-NEXT:    v_mov_b32_e32 v0, 0x42c80000
+; VI-GISEL-NEXT:    s_waitcnt lgkmcnt(0)
+; VI-GISEL-NEXT:    v_cmp_o_f32_e64 s[2:3], s2, v0
+; VI-GISEL-NEXT:    v_mov_b32_e32 v0, s2
+; VI-GISEL-NEXT:    v_mov_b32_e32 v3, s1
+; VI-GISEL-NEXT:    v_mov_b32_e32 v1, s3
+; VI-GISEL-NEXT:    v_mov_b32_e32 ...
[truncated]

; GISEL-GFX10-NEXT: v_mov_b32_e32 v0, s0
; GISEL-GFX10-NEXT: global_store_dword v1, v0, s[2:3]
; GISEL-GFX10-NEXT: s_endpgm
%result = call i32 @llvm.amdgcn.fcmp.f32(float %src, float 100.00, i32 8)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the problem also occur when you use regular fcmp instructions (possibly combined with a ballot intrinsic)? Or does it only affect the llvm.amdgcn.fcmp intrinsic?

The llvm.amdgcn.fcmp intrinsic should be deprecated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure really, I observed this due to a fcmp being generated from InstCombineIntrinsics by combining a icmp + fcmp
In this case the setcc setuo is emitted from FCMP lowering. Maybe the actual fcmp inst goes through a different path and the o/uo gets legalized away?

@Pierre-vh Pierre-vh merged commit 4d6fc88 into llvm:main Oct 16, 2023
@Pierre-vh Pierre-vh deleted the additional_vcmp_pats branch October 18, 2023 12:03
Guzhu-AMD pushed a commit to GPUOpen-Drivers/llvm-project that referenced this pull request Oct 20, 2023
Local branch amd-gfx c0764b1 Merged main:160e8eb44961 into amd-gfx:da11a8d21864
Remote branch main 4d6fc88 [AMDGPU] Add patterns for V_CMP_O/U (llvm#69157)
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Nov 27, 2023
Fixes SWDEV-427162

Change-Id: Ib7decbb7a18f6631d659e2f84752367741289704
rocm-ci pushed a commit to ROCm/llvm-project that referenced this pull request Dec 15, 2023
Fixes SWDEV-427162

Change-Id: Ib7decbb7a18f6631d659e2f84752367741289704
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants