We read every piece of feedback, and take your input very seriously.
To see all available qualifiers, see our documentation.
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Hi. I'm having an issue with softmax when computing it first for FP16 and then for FP32 arrays (or vice versa).
If I first compute softmax for FP16, then when I invoke it for FP32 it passes the arguments as if they were FP16 thus producing invalid results.
Here's MWE (for Julia language) with MIOPEN_LOG_LEVEL=7 AMD_LOG_LEVEL=7 logs.
MIOPEN_LOG_LEVEL=7 AMD_LOG_LEVEL=7
As can be seen, when invoking softmax for FP32 even though the tensor descriptor shows dtype as FP32, the Arg 0 & Arg 1 are displayed as half*.
Arg 0
Arg 1
half*
Same happens when first computing FP32 and then FP16, but in this case if uses float* for FP16.
float*
I'm using ROCm 6.1.1. Could it be because problem descriptor does not take into account the dtype and thus returns FP16 kernel for FP32 input?
julia> using AMDGPU julia> x1 = AMDGPU.rand(Float16, 5); julia> x2 = AMDGPU.rand(Float32, 5); julia> y1 = AMDGPU.MIOpen.softmax(x1; dims=:) :3:hip_device_runtime.cpp :623 : 74316958870 us: [pid:419303 tid:0x7cb967986d00] hipGetDevice ( 0x7cb95cef4670 ) :3:hip_device_runtime.cpp :631 : 74316958885 us: [pid:419303 tid:0x7cb967986d00] hipGetDevice: Returned hipSuccess : :3:hip_memory.cpp :777 : 74316958895 us: [pid:419303 tid:0x7cb967986d00] hipMemGetInfo ( 0x7cb95cef4640, 0x7cb95cef4660 ) :3:hip_memory.cpp :801 : 74316958905 us: [pid:419303 tid:0x7cb967986d00] hipMemGetInfo: Returned hipSuccess : :3:hip_device_runtime.cpp :623 : 74316958908 us: [pid:419303 tid:0x7cb967986d00] hipGetDevice ( 0x7cb95cef46a0 ) :3:hip_device_runtime.cpp :631 : 74316958912 us: [pid:419303 tid:0x7cb967986d00] hipGetDevice: Returned hipSuccess : :3:hip_mempool.cpp :167 : 74316958918 us: [pid:419303 tid:0x7cb967986d00] hipMemPoolGetAttribute ( 0x2fbc420, 5, 0x7cb95cef4680 ) :3:hip_mempool.cpp :172 : 74316958923 us: [pid:419303 tid:0x7cb967986d00] hipMemPoolGetAttribute: Returned hipSuccess : :3:hip_device_runtime.cpp :623 : 74316958928 us: [pid:419303 tid:0x7cb967986d00] hipGetDevice ( 0x7cb95cef46c0 ) :3:hip_device_runtime.cpp :631 : 74316958932 us: [pid:419303 tid:0x7cb967986d00] hipGetDevice: Returned hipSuccess : :3:hip_mempool.cpp :167 : 74316958938 us: [pid:419303 tid:0x7cb967986d00] hipMemPoolGetAttribute ( 0x2fbc420, 7, 0x7cb95cef46b0 ) :3:hip_mempool.cpp :172 : 74316958942 us: [pid:419303 tid:0x7cb967986d00] hipMemPoolGetAttribute: Returned hipSuccess : :3:hip_device_runtime.cpp :623 : 74316958948 us: [pid:419303 tid:0x7cb967986d00] hipGetDevice ( 0x7cb95cef46e0 ) :3:hip_device_runtime.cpp :631 : 74316958951 us: [pid:419303 tid:0x7cb967986d00] hipGetDevice: Returned hipSuccess : :3:hip_memory.cpp :599 : 74316958956 us: [pid:419303 tid:0x7cb967986d00] hipMalloc ( 0x7cb95cef46d0, 10 ) :4:rocdevice.cpp :2310: 74316958970 us: [pid:419303 tid:0x7cb967986d00] Allocate hsa device memory 0x7cb76d402000, size 0xa :3:rocdevice.cpp :2349: 74316958974 us: [pid:419303 tid:0x7cb967986d00] device=0x237f2d0, freeMem_ = 0x5fecfffd8 :3:hip_memory.cpp :601 : 74316958980 us: [pid:419303 tid:0x7cb967986d00] hipMalloc: Returned hipSuccess : 0x7cb76d402000: duration: 24 us typeof(y) = ROCArray{Float16, 1, AMDGPU.Runtime.Mem.HIPBuffer} typeof(x) = ROCArray{Float16, 1, AMDGPU.Runtime.Mem.HIPBuffer} :3:hip_device_runtime.cpp :623 : 74316964432 us: [pid:419303 tid:0x7cb967986d00] hipGetDevice ( 0x7cb95cef7fd0 ) :3:hip_device_runtime.cpp :631 : 74316964438 us: [pid:419303 tid:0x7cb967986d00] hipGetDevice: Returned hipSuccess : :3:hip_device_runtime.cpp :623 : 74316984219 us: [pid:419303 tid:0x7cb967986d00] hipGetDevice ( 0x7cb95cef7fe0 ) :3:hip_device_runtime.cpp :631 : 74316984227 us: [pid:419303 tid:0x7cb967986d00] hipGetDevice: Returned hipSuccess : :3:hip_device_runtime.cpp :623 : 74316984244 us: [pid:419303 tid:0x7cb967986d00] hipGetDevice ( 0x7cb95ce54020 ) :3:hip_device_runtime.cpp :631 : 74316984248 us: [pid:419303 tid:0x7cb967986d00] hipGetDevice: Returned hipSuccess : :3:hip_device_runtime.cpp :623 : 74316984250 us: [pid:419303 tid:0x7cb967986d00] hipGetDevice ( 0x7cb95ce54030 ) :3:hip_device_runtime.cpp :631 : 74316984253 us: [pid:419303 tid:0x7cb967986d00] hipGetDevice: Returned hipSuccess : xdesc = AMDGPU.MIOpen.TensorDescriptor(Ptr{AMDGPU.MIOpen.miopenTensorDescriptor} @0x00000000037d6c50, AMDGPU.MIOpen.miopenHalf) ydesc = AMDGPU.MIOpen.TensorDescriptor(Ptr{AMDGPU.MIOpen.miopenTensorDescriptor} @0x0000000003f087d0, AMDGPU.MIOpen.miopenHalf) :3:hip_device_runtime.cpp :623 : 74317020908 us: [pid:419303 tid:0x7cb967986d00] hipGetDevice ( 0x7cb95bdc0260 ) :3:hip_device_runtime.cpp :631 : 74317020920 us: [pid:419303 tid:0x7cb967986d00] hipGetDevice: Returned hipSuccess : :3:hip_context.cpp :137 : 74317020935 us: [pid:419303 tid:0x7cb967986d00] hipInit ( 0 ) :3:hip_context.cpp :143 : 74317020938 us: [pid:419303 tid:0x7cb967986d00] hipInit: Returned hipSuccess : :3:hip_device_runtime.cpp :623 : 74317020944 us: [pid:419303 tid:0x7cb967986d00] hipGetDevice ( 0x7ffff82c84a8 ) :3:hip_device_runtime.cpp :631 : 74317020947 us: [pid:419303 tid:0x7cb967986d00] hipGetDevice: Returned hipSuccess : :3:hip_device_runtime.cpp :623 : 74317020953 us: [pid:419303 tid:0x7cb967986d00] hipGetDevice ( 0x7ffff82c7e88 ) :3:hip_device_runtime.cpp :631 : 74317020956 us: [pid:419303 tid:0x7cb967986d00] hipGetDevice: Returned hipSuccess : :3:hip_device.cpp :471 : 74317020961 us: [pid:419303 tid:0x7cb967986d00] hipGetDevicePropertiesR0600 ( 0x7ffff82c7e88, 0 ) :3:hip_device.cpp :473 : 74317020966 us: [pid:419303 tid:0x7cb967986d00] hipGetDevicePropertiesR0600: Returned hipSuccess : :3:hip_memory.cpp :599 : 74317020978 us: [pid:419303 tid:0x7cb967986d00] hipMalloc ( 0x2d43868, 33554432 ) :4:rocdevice.cpp :2310: 74317021338 us: [pid:419303 tid:0x7cb967986d00] Allocate hsa device memory 0x7cb6f3e00000, size 0x2000000 :3:rocdevice.cpp :2349: 74317021344 us: [pid:419303 tid:0x7cb967986d00] device=0x237f2d0, freeMem_ = 0x5fccfffd8 :3:hip_memory.cpp :601 : 74317021350 us: [pid:419303 tid:0x7cb967986d00] hipMalloc: Returned hipSuccess : 0x7cb6f3e00000: duration: 372 us :3:hip_device.cpp :471 : 74317021361 us: [pid:419303 tid:0x7cb967986d00] hipGetDevicePropertiesR0600 ( 0x7ffff82c7ad8, 0 ) :3:hip_device.cpp :473 : 74317021365 us: [pid:419303 tid:0x7cb967986d00] hipGetDevicePropertiesR0600: Returned hipSuccess : MIOpen(HIP): Info [get_device_name] Raw device name: gfx1100 MIOpen(HIP): Info [Handle] stream: 0, device_id: 0 :3:hip_device_runtime.cpp :623 : 74317059767 us: [pid:419303 tid:0x7cb967986d00] hipGetDevice ( 0x7cb95be1ad20 ) :3:hip_device_runtime.cpp :631 : 74317059775 us: [pid:419303 tid:0x7cb967986d00] hipGetDevice: Returned hipSuccess : :3:hip_stream.cpp :617 : 74317059789 us: [pid:419303 tid:0x7cb967986d00] hipStreamQuery ( stream:0x34d94e0 ) :3:rocdevice.cpp :2840: 74317059797 us: [pid:419303 tid:0x7cb967986d00] No HW event :3:hip_stream.cpp :618 : 74317059802 us: [pid:419303 tid:0x7cb967986d00] hipStreamQuery: Returned hipSuccess : :3:hip_device.cpp :471 : 74317059808 us: [pid:419303 tid:0x7cb967986d00] hipGetDevicePropertiesR0600 ( 0x7ffff82c7d78, 0 ) :3:hip_device.cpp :473 : 74317059814 us: [pid:419303 tid:0x7cb967986d00] hipGetDevicePropertiesR0600: Returned hipSuccess : MIOpen(HIP): Info [get_device_name] Raw device name: gfx1100 MIOpen(HIP): Info [SetStream] stream: 0x34d94e0, device_id: 0 :3:hip_device_runtime.cpp :623 : 74317059832 us: [pid:419303 tid:0x7cb967986d00] hipGetDevice ( 0x7cb95be1ad50 ) :3:hip_device_runtime.cpp :631 : 74317059836 us: [pid:419303 tid:0x7cb967986d00] hipGetDevice: Returned hipSuccess : MIOpen(HIP): Info2 [GetInvoker] Returning an invoker for problem sfmfwd-n_x1c_x5h_x1w_x1n_y1c_y5h_y1w_y1xpk1ypk1a1.000000b0.000000algo0mode1 and algorithm Softmax MIOpen(HIP): Info2 [GetFound1_0] No invokers found for sfmfwd-n_x1c_x5h_x1w_x1n_y1c_y5h_y1w_y1xpk1ypk1a1.000000b0.000000algo0mode1 MIOpen(HIP): Info [AmdRocmMetadataVersionDetect] ROCm MD version AMDHSA_COv3, HIP version 6.1.40092, MIOpen version 3.1.0.bd953f0e9-dirty MIOpen(HIP): Info2 [SearchForSolutions] Softmax: Success. MIOpen(HIP): Info2 [PrepareInvoker] Preparing kernel: SoftmaxForward :3:hip_device_runtime.cpp :653 : 74317059942 us: [pid:419303 tid:0x7cb967986d00] hipSetDevice ( 0 ) :3:hip_device_runtime.cpp :657 : 74317059947 us: [pid:419303 tid:0x7cb967986d00] hipSetDevice: Returned hipSuccess : :3:hip_device_runtime.cpp :161 : 74317059955 us: [pid:419303 tid:0x7cb967986d00] hipDeviceGetAttribute ( 0x7ffff82c7904, 63, 0 ) :3:hip_device_runtime.cpp :449 : 74317059959 us: [pid:419303 tid:0x7cb967986d00] hipDeviceGetAttribute: Returned hipSuccess : MIOpen(HIP): Info [IsNetworkedFilesystem] Filesystem type at '/home/pxlth/.cache/miopen/3.1.0.bd953f0e9-dirty' is: 0xef53 'EXT2/3/4_SUPER_MAGIC' MIOpen(HIP): Info2 [GetLibPath] Lib Path: /opt/rocm-6.1.1/lib/libMIOpen.so.1.0.60101 MIOpen(HIP): Info2 [SQLiteBase] Initializing system database file MIOpen(HIP): Info [KernDb] database not present MIOpen(HIP): Info2 [SQLiteBase] Initializing user database file /home/pxlth/.cache/miopen/3.1.0.bd953f0e9-dirty/gfx1100_48.ukdb MIOpen(HIP): Trace [Exec] 137135748836608:PRAGMA journal_mode=WAL; MIOpen(HIP): Trace [Exec] 137135748836608:CREATE TABLE IF NOT EXISTS `kern_db` (`id` INTEGER PRIMARY KEY ASC,`kernel_name` TEXT NOT NULL,`kernel_args` TEXT NOT NULL,`kernel_blob` BLOB NOT NULL,`kernel_hash` TEXT NOT NULL,`uncompressed_size` INT NOT NULL);CREATE UNIQUE INDEX IF NOT EXISTS `idx_kern_db` ON kern_db(kernel_name, kernel_args); MIOpen(HIP): Info2 [KernDb] Database created successfully MIOpen(HIP): Trace [Exec] 137135748836608:PRAGMA table_info(kern_db); MIOpen(HIP): Info2 [LoadBinary] Loading binary for: MIOpenSoftmax.cl.o; args: -DNUM_BATCH=64 -DBATCH_SIZE=4 -DU_BATCH_SIZE=2 -DMIOPEN_USE_FP16=1 -DMIOPEN_USE_FP32=0 -DUSE_SOFTMAX_FAST=1 -DUSE_SOFTMAX_MODE_CHANNEL=1 -DRUN_FORWARD=1 -DIS_INPUT_PACKED=1 -DIS_OUTPUT_PACKED=1 -mcpu=gfx1100 MIOpen(HIP): Info2 [Prepare] SELECT kernel_blob, kernel_hash, uncompressed_size FROM kern_db WHERE (kernel_name = 'MIOpenSoftmax.cl.o') AND (kernel_args = '-DNUM_BATCH=64 -DBATCH_SIZE=4 -DU_BATCH_SIZE=2 -DMIOPEN_USE_FP16=1 -DMIOPEN_USE_FP32=0 -DUSE_SOFTMAX_FAST=1 -DUSE_SOFTMAX_MODE_CHANNEL=1 -DRUN_FORWARD=1 -DIS_INPUT_PACKED=1 -DIS_OUTPUT_PACKED=1 -mcpu=gfx1100'); MIOpen(HIP): Info2 [Measure] Db::FindRecord time: 0.261839 ms MIOpen(HIP): Info2 [LoadBinary] Successfully loaded binary for: MIOpenSoftmax.cl.o; args: -DNUM_BATCH=64 -DBATCH_SIZE=4 -DU_BATCH_SIZE=2 -DMIOPEN_USE_FP16=1 -DMIOPEN_USE_FP32=0 -DUSE_SOFTMAX_FAST=1 -DUSE_SOFTMAX_MODE_CHANNEL=1 -DRUN_FORWARD=1 -DIS_INPUT_PACKED=1 -DIS_OUTPUT_PACKED=1 -mcpu=gfx1100 :3:hip_module.cpp :58 : 74317060838 us: [pid:419303 tid:0x7cb967986d00] hipModuleLoadData ( 0x7ffff82c7888, 0x3f96160 ) :3:devprogram.cpp :2679: 74317060961 us: [pid:419303 tid:0x7cb967986d00] Using Code Object V5. :3:hip_module.cpp :59 : 74317061669 us: [pid:419303 tid:0x7cb967986d00] hipModuleLoadData: Returned hipSuccess : :3:hip_module.cpp :74 : 74317061679 us: [pid:419303 tid:0x7cb967986d00] hipModuleGetFunction ( 0x7ffff82c7cb0, 0x3896370, SoftmaxForward ) :3:hip_module.cpp :88 : 74317061684 us: [pid:419303 tid:0x7cb967986d00] hipModuleGetFunction: Returned hipSuccess : MIOpen(HIP): Info2 [Register] Invoker registered for algorithm sfmfwd-n_x1c_x5h_x1w_x1n_y1c_y5h_y1w_y1xpk1ypk1a1.000000b0.000000algo0mode1 and solver Softmax MIOpen(HIP): Info2 [SetAsFound1_0] Solver Softmax registered as find 1.0 best for Softmax in sfmfwd-n_x1c_x5h_x1w_x1n_y1c_y5h_y1w_y1xpk1ypk1a1.000000b0.000000algo0mode1 :3:hip_device_runtime.cpp :653 : 74317061707 us: [pid:419303 tid:0x7cb967986d00] hipSetDevice ( 0 ) :3:hip_device_runtime.cpp :657 : 74317061711 us: [pid:419303 tid:0x7cb967986d00] hipSetDevice: Returned hipSuccess : MIOpen(HIP): Info2 [run] kernel_name = SoftmaxForward, global_work_dim = { 256, 1, 1 }, local_work_dim = { 256, 1, 1 } :3:hip_module.cpp :470 : 74317061729 us: [pid:419303 tid:0x7cb967986d00] hipExtModuleLaunchKernel ( 0x0x3f0ad90, 256, 1, 1, 256, 1, 1, 0, stream:0x34d94e0, char array:<null>, 0x7ffff82c7e50, event:0, event:0, 0 ) :4:command.cpp :346 : 74317061737 us: [pid:419303 tid:0x7cb967986d00] Command (KernelExecution) enqueued: 0x33e3d20 :3:rocvirtual.cpp :723 : 74317061742 us: [pid:419303 tid:0x7cb967986d00] Arg0: half* x = ptr:0x7cb76d400000 obj:[0x7cb76d400000-0x7cb76d40000a] :3:rocvirtual.cpp :723 : 74317061745 us: [pid:419303 tid:0x7cb967986d00] Arg1: half* y = ptr:0x7cb76d402000 obj:[0x7cb76d402000-0x7cb76d40200a] :3:rocvirtual.cpp :798 : 74317061749 us: [pid:419303 tid:0x7cb967986d00] Arg2: int vector_size = val:5 :3:rocvirtual.cpp :798 : 74317061754 us: [pid:419303 tid:0x7cb967986d00] Arg3: int grid_size = val:1 :3:rocvirtual.cpp :798 : 74317061758 us: [pid:419303 tid:0x7cb967986d00] Arg4: int spatial_dim = val:1 :3:rocvirtual.cpp :798 : 74317061760 us: [pid:419303 tid:0x7cb967986d00] Arg5: int input_h = val:1 :3:rocvirtual.cpp :798 : 74317061763 us: [pid:419303 tid:0x7cb967986d00] Arg6: int input_w = val:1 :3:rocvirtual.cpp :798 : 74317061765 us: [pid:419303 tid:0x7cb967986d00] Arg7: int in_nstr = val:5 :3:rocvirtual.cpp :798 : 74317061767 us: [pid:419303 tid:0x7cb967986d00] Arg8: int in_cstr = val:1 :3:rocvirtual.cpp :798 : 74317061770 us: [pid:419303 tid:0x7cb967986d00] Arg9: int in_hstr = val:1 :3:rocvirtual.cpp :798 : 74317061774 us: [pid:419303 tid:0x7cb967986d00] Arg10: int out_nstr = val:5 :3:rocvirtual.cpp :798 : 74317061778 us: [pid:419303 tid:0x7cb967986d00] Arg11: int out_cstr = val:1 :3:rocvirtual.cpp :798 : 74317061782 us: [pid:419303 tid:0x7cb967986d00] Arg12: int out_hstr = val:1 :3:rocvirtual.cpp :798 : 74317061786 us: [pid:419303 tid:0x7cb967986d00] Arg13: int x_offset = val:0 :3:rocvirtual.cpp :798 : 74317061790 us: [pid:419303 tid:0x7cb967986d00] Arg14: int y_offset = val:0 :3:rocvirtual.cpp :798 : 74317061794 us: [pid:419303 tid:0x7cb967986d00] Arg15: float alpha = val:1065353216 :3:rocvirtual.cpp :798 : 74317061797 us: [pid:419303 tid:0x7cb967986d00] Arg16: float beta = val:0 :3:rocvirtual.cpp :3016: 74317061800 us: [pid:419303 tid:0x7cb967986d00] ShaderName : SoftmaxForward :4:rocvirtual.cpp :898 : 74317061809 us: [pid:419303 tid:0x7cb967986d00] HWq=0x7cb77c200000, Dispatch Header = 0xb02 (type=2, barrier=1, acquire=1, release=1), setup=3, grid=[256, 1, 1], workgroup=[256, 1, 1], private_seg_size=0, group_seg_size=1536, kernel_obj=0x7cb8e32a4940, kernarg_address=0x7cb76e000000, completion_signal=0x0 :3:hip_module.cpp :482 : 74317061813 us: [pid:419303 tid:0x7cb967986d00] hipExtModuleLaunchKernel: Returned hipSuccess : :3:hip_device_runtime.cpp :623 : 74317061834 us: [pid:419303 tid:0x7cb967986d00] hipGetDevice ( 0x7cb95be1ad60 ) :3:hip_device_runtime.cpp :631 : 74317061838 us: [pid:419303 tid:0x7cb967986d00] hipGetDevice: Returned hipSuccess : :3:hip_device_runtime.cpp :608 : 74317061843 us: [pid:419303 tid:0x7cb967986d00] hipDeviceSynchronize ( ) :4:commandqueue.cpp :150 : 74317061847 us: [pid:419303 tid:0x7cb967986d00] HW Event not ready, awaiting completion instead :4:command.cpp :285 : 74317061852 us: [pid:419303 tid:0x7cb967986d00] Queue marker to command queue: 0x34d94e0 :4:command.cpp :346 : 74317061856 us: [pid:419303 tid:0x7cb967986d00] Command (InternalMarker) enqueued: 0x3080930 :4:rocvirtual.cpp :1071: 74317061862 us: [pid:419303 tid:0x7cb967986d00] HWq=0x7cb77c200000, BarrierAND Header = 0x1503 (type=3, barrier=1, acquire=2, release=2), dep_signal=[0x0, 0x0, 0x0, 0x0, 0x0], completion_signal=0x7cb8861ff680 :4:rocvirtual.cpp :570 : 74317061866 us: [pid:419303 tid:0x7cb967986d00] Host wait on completion_signal=0x7cb8861ff680 :3:rocvirtual.hpp :66 : 74317061870 us: [pid:419303 tid:0x7cb967986d00] Host active wait for Signal = (0x7cb8861ff680) for -1 ns :4:command.cpp :175 : 74317061887 us: [pid:419303 tid:0x7cb967986d00] Command 0x33e3d20 complete :4:command.cpp :169 : 74317061890 us: [pid:419303 tid:0x7cb967986d00] Command 0x3080930 complete (Wall: 74317061889, CPU: 0, GPU: 0 us) :4:command.cpp :249 : 74317061895 us: [pid:419303 tid:0x7cb967986d00] Waiting for event 0x33e3d20 to complete, current status 0 :4:command.cpp :264 : 74317061899 us: [pid:419303 tid:0x7cb967986d00] Event 0x33e3d20 wait completed :4:commandqueue.cpp :163 : 74317061902 us: [pid:419303 tid:0x7cb967986d00] All commands finished :3:hip_device_runtime.cpp :611 : 74317061907 us: [pid:419303 tid:0x7cb967986d00] hipDeviceSynchronize: Returned hipSuccess : julia> y2 = AMDGPU.MIOpen.softmax(x2; dims=:); :3:hip_device_runtime.cpp :623 : 74320653959 us: [pid:419303 tid:0x7cb967986d00] hipGetDevice ( 0x7cb95d3d18c0 ) :3:hip_device_runtime.cpp :631 : 74320653971 us: [pid:419303 tid:0x7cb967986d00] hipGetDevice: Returned hipSuccess : :3:hip_mempool.cpp :70 : 74320653982 us: [pid:419303 tid:0x7cb967986d00] hipMallocAsync ( 0x7cb95d3d18b0, 20, stream:0x34d94e0 ) :4:rocdevice.cpp :2310: 74320653997 us: [pid:419303 tid:0x7cb967986d00] Allocate hsa device memory 0x7cb76d403000, size 0x14 :3:rocdevice.cpp :2349: 74320654001 us: [pid:419303 tid:0x7cb967986d00] device=0x237f2d0, freeMem_ = 0x5fccfffc4 :3:hip_mempool_impl.cpp :213 : 74320654007 us: [pid:419303 tid:0x7cb967986d00] Pool AllocMem: 0x7cb76d403000, 0x38ab4f0 :3:hip_mempool.cpp :89 : 74320654010 us: [pid:419303 tid:0x7cb967986d00] hipMallocAsync: Returned hipSuccess : typeof(y) = ROCArray{Float32, 1, AMDGPU.Runtime.Mem.HIPBuffer} typeof(x) = ROCArray{Float32, 1, AMDGPU.Runtime.Mem.HIPBuffer} :3:hip_device_runtime.cpp :623 : 74320654170 us: [pid:419303 tid:0x7cb967986d00] hipGetDevice ( 0x7cb95d3d1a80 ) :3:hip_device_runtime.cpp :631 : 74320654175 us: [pid:419303 tid:0x7cb967986d00] hipGetDevice: Returned hipSuccess : :3:hip_device_runtime.cpp :623 : 74320654182 us: [pid:419303 tid:0x7cb967986d00] hipGetDevice ( 0x7cb95d3d1a90 ) :3:hip_device_runtime.cpp :631 : 74320654186 us: [pid:419303 tid:0x7cb967986d00] hipGetDevice: Returned hipSuccess : :3:hip_device_runtime.cpp :623 : 74320654194 us: [pid:419303 tid:0x7cb967986d00] hipGetDevice ( 0x7cb95d3d1ab0 ) :3:hip_device_runtime.cpp :631 : 74320654198 us: [pid:419303 tid:0x7cb967986d00] hipGetDevice: Returned hipSuccess : :3:hip_device_runtime.cpp :623 : 74320654202 us: [pid:419303 tid:0x7cb967986d00] hipGetDevice ( 0x7cb95d3d1ac0 ) :3:hip_device_runtime.cpp :631 : 74320654205 us: [pid:419303 tid:0x7cb967986d00] hipGetDevice: Returned hipSuccess : xdesc = AMDGPU.MIOpen.TensorDescriptor(Ptr{AMDGPU.MIOpen.miopenTensorDescriptor} @0x00000000036863d0, AMDGPU.MIOpen.miopenFloat) ydesc = AMDGPU.MIOpen.TensorDescriptor(Ptr{AMDGPU.MIOpen.miopenTensorDescriptor} @0x000000000351ef80, AMDGPU.MIOpen.miopenFloat) :3:hip_device_runtime.cpp :623 : 74320654926 us: [pid:419303 tid:0x7cb967986d00] hipGetDevice ( 0x7cb95d3d1c00 ) :3:hip_device_runtime.cpp :631 : 74320654931 us: [pid:419303 tid:0x7cb967986d00] hipGetDevice: Returned hipSuccess : MIOpen(HIP): Info2 [GetInvoker] Returning an invoker for problem sfmfwd-n_x1c_x5h_x1w_x1n_y1c_y5h_y1w_y1xpk1ypk1a1.000000b0.000000algo0mode1 and algorithm Softmax :3:hip_device_runtime.cpp :653 : 74320654959 us: [pid:419303 tid:0x7cb967986d00] hipSetDevice ( 0 ) :3:hip_device_runtime.cpp :657 : 74320654963 us: [pid:419303 tid:0x7cb967986d00] hipSetDevice: Returned hipSuccess : MIOpen(HIP): Info2 [run] kernel_name = SoftmaxForward, global_work_dim = { 256, 1, 1 }, local_work_dim = { 256, 1, 1 } :3:hip_module.cpp :470 : 74320654982 us: [pid:419303 tid:0x7cb967986d00] hipExtModuleLaunchKernel ( 0x0x3f0ad90, 256, 1, 1, 256, 1, 1, 0, stream:0x34d94e0, char array:<null>, 0x7ffff82c7e50, event:0, event:0, 0 ) :4:command.cpp :346 : 74320654991 us: [pid:419303 tid:0x7cb967986d00] Command (KernelExecution) enqueued: 0x2b63750 :3:rocvirtual.cpp :723 : 74320654996 us: [pid:419303 tid:0x7cb967986d00] Arg0: half* x = ptr:0x7cb76d401000 obj:[0x7cb76d401000-0x7cb76d401014] :3:rocvirtual.cpp :723 : 74320655001 us: [pid:419303 tid:0x7cb967986d00] Arg1: half* y = ptr:0x7cb76d403000 obj:[0x7cb76d403000-0x7cb76d403014] :3:rocvirtual.cpp :798 : 74320655004 us: [pid:419303 tid:0x7cb967986d00] Arg2: int vector_size = val:5 :3:rocvirtual.cpp :798 : 74320655008 us: [pid:419303 tid:0x7cb967986d00] Arg3: int grid_size = val:1 :3:rocvirtual.cpp :798 : 74320655012 us: [pid:419303 tid:0x7cb967986d00] Arg4: int spatial_dim = val:1 :3:rocvirtual.cpp :798 : 74320655016 us: [pid:419303 tid:0x7cb967986d00] Arg5: int input_h = val:1 :3:rocvirtual.cpp :798 : 74320655020 us: [pid:419303 tid:0x7cb967986d00] Arg6: int input_w = val:1 :3:rocvirtual.cpp :798 : 74320655025 us: [pid:419303 tid:0x7cb967986d00] Arg7: int in_nstr = val:5 :3:rocvirtual.cpp :798 : 74320655029 us: [pid:419303 tid:0x7cb967986d00] Arg8: int in_cstr = val:1 :3:rocvirtual.cpp :798 : 74320655032 us: [pid:419303 tid:0x7cb967986d00] Arg9: int in_hstr = val:1 :3:rocvirtual.cpp :798 : 74320655036 us: [pid:419303 tid:0x7cb967986d00] Arg10: int out_nstr = val:5 :3:rocvirtual.cpp :798 : 74320655041 us: [pid:419303 tid:0x7cb967986d00] Arg11: int out_cstr = val:1 :3:rocvirtual.cpp :798 : 74320655044 us: [pid:419303 tid:0x7cb967986d00] Arg12: int out_hstr = val:1 :3:rocvirtual.cpp :798 : 74320655049 us: [pid:419303 tid:0x7cb967986d00] Arg13: int x_offset = val:0 :3:rocvirtual.cpp :798 : 74320655052 us: [pid:419303 tid:0x7cb967986d00] Arg14: int y_offset = val:0 :3:rocvirtual.cpp :798 : 74320655056 us: [pid:419303 tid:0x7cb967986d00] Arg15: float alpha = val:1065353216 :3:rocvirtual.cpp :798 : 74320655059 us: [pid:419303 tid:0x7cb967986d00] Arg16: float beta = val:0 :3:rocvirtual.cpp :3016: 74320655062 us: [pid:419303 tid:0x7cb967986d00] ShaderName : SoftmaxForward :4:rocvirtual.cpp :898 : 74320655070 us: [pid:419303 tid:0x7cb967986d00] HWq=0x7cb77c200000, Dispatch Header = 0xb02 (type=2, barrier=1, acquire=1, release=1), setup=3, grid=[256, 1, 1], workgroup=[256, 1, 1], private_seg_size=0, group_seg_size=1536, kernel_obj=0x7cb8e32a4940, kernarg_address=0x7cb76e000000, completion_signal=0x0 :3:hip_module.cpp :482 : 74320655077 us: [pid:419303 tid:0x7cb967986d00] hipExtModuleLaunchKernel: Returned hipSuccess : :3:hip_device_runtime.cpp :623 : 74320655090 us: [pid:419303 tid:0x7cb967986d00] hipGetDevice ( 0x7cb95d3d1c10 ) :3:hip_device_runtime.cpp :631 : 74320655094 us: [pid:419303 tid:0x7cb967986d00] hipGetDevice: Returned hipSuccess : :3:hip_device_runtime.cpp :608 : 74320655099 us: [pid:419303 tid:0x7cb967986d00] hipDeviceSynchronize ( ) :4:commandqueue.cpp :150 : 74320655104 us: [pid:419303 tid:0x7cb967986d00] HW Event not ready, awaiting completion instead :4:command.cpp :285 : 74320655108 us: [pid:419303 tid:0x7cb967986d00] Queue marker to command queue: 0x34d94e0 :4:command.cpp :346 : 74320655112 us: [pid:419303 tid:0x7cb967986d00] Command (InternalMarker) enqueued: 0x3082900 :4:rocvirtual.cpp :1071: 74320655119 us: [pid:419303 tid:0x7cb967986d00] HWq=0x7cb77c200000, BarrierAND Header = 0x1503 (type=3, barrier=1, acquire=2, release=2), dep_signal=[0x0, 0x0, 0x0, 0x0, 0x0], completion_signal=0x7cb8861ff600 :4:rocvirtual.cpp :570 : 74320655123 us: [pid:419303 tid:0x7cb967986d00] Host wait on completion_signal=0x7cb8861ff600 :3:rocvirtual.hpp :66 : 74320655127 us: [pid:419303 tid:0x7cb967986d00] Host active wait for Signal = (0x7cb8861ff600) for -1 ns :4:command.cpp :175 : 74320655543 us: [pid:419303 tid:0x7cb967986d00] Command 0x2b63750 complete :4:command.cpp :169 : 74320655547 us: [pid:419303 tid:0x7cb967986d00] Command 0x3082900 complete (Wall: 74320655546, CPU: 0, GPU: 0 us) :4:command.cpp :249 : 74320655551 us: [pid:419303 tid:0x7cb967986d00] Waiting for event 0x2b63750 to complete, current status 0 :4:command.cpp :264 : 74320655555 us: [pid:419303 tid:0x7cb967986d00] Event 0x2b63750 wait completed :4:commandqueue.cpp :163 : 74320655559 us: [pid:419303 tid:0x7cb967986d00] All commands finished :3:hip_device_runtime.cpp :611 : 74320655564 us: [pid:419303 tid:0x7cb967986d00] hipDeviceSynchronize: Returned hipSuccess :
The text was updated successfully, but these errors were encountered:
use datatype for softmax problem descriptor, fix #2966
953a4b9
Fix multiple softmax issues (#2992)
6a46e1d
* use datatype for softmax problem descriptor, fix #2966 * use strides for softmax problem descriptor, fix #2813
CAHEK7
Successfully merging a pull request may close this issue.
Hi. I'm having an issue with softmax when computing it first for FP16 and then for FP32 arrays (or vice versa).
If I first compute softmax for FP16, then when I invoke it for FP32 it passes the arguments as if they were FP16 thus producing invalid results.
Here's MWE (for Julia language) with
MIOPEN_LOG_LEVEL=7 AMD_LOG_LEVEL=7
logs.As can be seen, when invoking softmax for FP32 even though the tensor descriptor shows dtype as FP32, the
Arg 0
&Arg 1
are displayed ashalf*
.Same happens when first computing FP32 and then FP16, but in this case if uses
float*
for FP16.I'm using ROCm 6.1.1. Could it be because problem descriptor does not take into account the dtype and thus returns FP16 kernel for FP32 input?
The text was updated successfully, but these errors were encountered: