
Should FMA optimizations be implemented for SCALAR/EMU128 on PPC/RISC-V/GPU? #2542

Open · johnplatts opened this issue Mar 21, 2025 · 1 comment

@johnplatts (Contributor)

Some ISAs have hardware FMA instructions that are guaranteed to be available whenever hardware floating-point support is present, including PPC, RISC-V (on CPUs implementing the "F" and "D" extensions), NVIDIA GPUs, and AMD GPUs.

GCC and Clang also provide the __builtin_fma and __builtin_fmaf builtins, which are guaranteed to compile down to a single FMA instruction on ISAs whose hardware floating point can perform FMA in one instruction, even with optimizations disabled (-O0).
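For reference, a minimal standalone illustration of these builtins (not part of the issue itself, just an example of their use):

```cpp
#include <cstdio>

int main() {
  // __builtin_fma(a, b, c) computes a * b + c with a single rounding step.
  // On ISAs with hardware FMA (e.g. PPC, RISC-V with F/D), GCC and Clang
  // lower these builtins to one FMA instruction even at -O0.
  const double d = __builtin_fma(2.0, 3.0, 1.0);     // 7.0
  const float f = __builtin_fmaf(2.0f, 3.0f, 1.0f);  // 7.0f
  std::printf("%f %f\n", d, static_cast<double>(f));
  return 0;
}
```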

There are use cases for implementing MulAdd/NegMulAdd/MulSub/NegMulSub with __builtin_fma and __builtin_fmaf for SCALAR/EMU128 on RISC-V CPUs that have the F and D extensions (but do not necessarily support the "V" SIMD extension), NVIDIA GPUs, and AMD GPUs; a sketch of such overloads follows below.
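A minimal sketch of what such SCALAR-target overloads might look like, assuming Highway's Vec1<T> scalar wrapper (with its .raw member) and the HWY_API / HWY_COMPILER_* macros from the Highway headers. This is an illustration under those assumptions, not an actual proposed patch:

```cpp
// Sketch only: hypothetical float overloads for the SCALAR target,
// guarded so compilers without the builtins keep the existing fallback.
#if HWY_COMPILER_GCC_ACTUAL || HWY_COMPILER_CLANG
HWY_API Vec1<float> MulAdd(Vec1<float> mul, Vec1<float> x, Vec1<float> add) {
  return Vec1<float>(__builtin_fmaf(mul.raw, x.raw, add.raw));   // mul*x + add
}
HWY_API Vec1<float> NegMulAdd(Vec1<float> mul, Vec1<float> x, Vec1<float> add) {
  return Vec1<float>(__builtin_fmaf(-mul.raw, x.raw, add.raw));  // add - mul*x
}
HWY_API Vec1<float> MulSub(Vec1<float> mul, Vec1<float> x, Vec1<float> sub) {
  return Vec1<float>(__builtin_fmaf(mul.raw, x.raw, -sub.raw));  // mul*x - sub
}
HWY_API Vec1<float> NegMulSub(Vec1<float> mul, Vec1<float> x, Vec1<float> sub) {
  return Vec1<float>(__builtin_fmaf(-mul.raw, x.raw, -sub.raw)); // -(mul*x) - sub
}
// double overloads would use __builtin_fma analogously.
#endif
```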

@jan-wassenberg (Member)

Hm, personally I would think it is much more interesting to use the vector form of FMA instructions, as opposed to just the scalar fmaf().

We could add such a scalar form if someone says they would use it?

FYI, #2536 was a small part of a use case where someone just wanted convenient access to a float16 type via the Highway headers.
