
Should FMA optimizations be implemented for SCALAR/EMU128 on PPC/RISC-V/GPU? #2542

Open · johnplatts opened this issue Mar 21, 2025 · 1 comment

@johnplatts (Contributor)

Some ISAs have hardware FMA instructions that are guaranteed to be available whenever hardware floating-point support is present, including PPC, RISC-V (on CPUs implementing the "F" and "D" extensions), NVIDIA GPUs, and AMD GPUs.

GCC and Clang also provide the __builtin_fma and __builtin_fmaf builtins, which are guaranteed to compile down to a single FMA instruction on ISAs whose hardware floating point can perform FMA in one instruction, even with optimizations disabled (-O0).
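For reference, a minimal standalone illustration of these builtins (not part of the issue itself, just an example of their use):

```cpp
#include <cstdio>

int main() {
  // __builtin_fma(a, b, c) computes a * b + c with a single rounding step.
  // On ISAs with hardware FMA (e.g. PPC, RISC-V with F/D), GCC and Clang
  // lower these builtins to one FMA instruction even at -O0.
  const double d = __builtin_fma(2.0, 3.0, 1.0);     // 7.0
  const float f = __builtin_fmaf(2.0f, 3.0f, 1.0f);  // 7.0f
  std::printf("%f %f\n", d, static_cast<double>(f));
  return 0;
}
```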

There are use cases for implementing MulAdd/NegMulAdd/MulSub/NegMulSub with __builtin_fma and __builtin_fmaf for SCALAR/EMU128 on RISC-V CPUs that have the F and D extensions (but do not necessarily support the "V" SIMD extension), NVIDIA GPUs, and AMD GPUs; a sketch of such overloads follows below.
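A minimal sketch of what such SCALAR-target overloads might look like, assuming Highway's Vec1<T> scalar wrapper (with its .raw member) and the HWY_API / HWY_COMPILER_* macros from the Highway headers. This is an illustration under those assumptions, not an actual proposed patch:

```cpp
// Sketch only: hypothetical float overloads for the SCALAR target,
// guarded so compilers without the builtins keep the existing fallback.
#if HWY_COMPILER_GCC_ACTUAL || HWY_COMPILER_CLANG
HWY_API Vec1<float> MulAdd(Vec1<float> mul, Vec1<float> x, Vec1<float> add) {
  return Vec1<float>(__builtin_fmaf(mul.raw, x.raw, add.raw));   // mul*x + add
}
HWY_API Vec1<float> NegMulAdd(Vec1<float> mul, Vec1<float> x, Vec1<float> add) {
  return Vec1<float>(__builtin_fmaf(-mul.raw, x.raw, add.raw));  // add - mul*x
}
HWY_API Vec1<float> MulSub(Vec1<float> mul, Vec1<float> x, Vec1<float> sub) {
  return Vec1<float>(__builtin_fmaf(mul.raw, x.raw, -sub.raw));  // mul*x - sub
}
HWY_API Vec1<float> NegMulSub(Vec1<float> mul, Vec1<float> x, Vec1<float> sub) {
  return Vec1<float>(__builtin_fmaf(-mul.raw, x.raw, -sub.raw)); // -(mul*x) - sub
}
// double overloads would use __builtin_fma analogously.
#endif
```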

@jan-wassenberg (Member)

Hm, personally I would think it is much more interesting to use the vector form of FMA instructions, as opposed to just the scalar fmaf().

We could add such a scalar form if someone says they would use it?

FYI, #2536 was a small part of a use case where someone just wanted convenient access to a float16 type via the Highway headers.
