Enable POWER9 fp32 and fp16 SIMD code #366

fitzsim · 2023-01-03T06:43:20Z

With the FP32 base model and this patch set, the jfk example takes about 3.2 seconds to transcribe. This is another data point for #300, and it is about one second faster than the current FP16 SIMD code.

ggerganov · 2023-01-03T20:05:58Z

ggml.c

+#define GGML_F32x4_REDUCE(sumf, sum)                            \
+  sum[0] = vec_add(sum[0], sum[1]);                             \
+  sum[2] = vec_add(sum[2], sum[3]);                             \
+  sum[4] = vec_add(sum[4], sum[5]);                             \
+  sum[6] = vec_add(sum[6], sum[7]);                             \
+  sum[0] = vec_add(sum[0], sum[2]);                             \
+  sum[4] = vec_add(sum[4], sum[6]);                             \
+  sum[0] = vec_add(sum[0], sum[4]);                             \
+  sumf = vec_extract(sum[0], 0) + vec_extract(sum[0], 1)        \
+    + vec_extract(sum[0], 2) + vec_extract(sum[0], 3);


Is there a reason to use this version instead of the for-based version?
The advantage of the latter is that it works for GGML_F32_ARR == 1, 2, 4, 8, or 16, whereas this unrolled version only works for GGML_F32_ARR == 8.
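To illustrate the trade-off raised above, here is a minimal, portable sketch of a loop-based pairwise reduction. It is not the ggml macro: `f32x4`, `f32x4_add`, and `f32x4_reduce` are hypothetical names, and the plain struct stands in for a POWER9 `vector float` so the example compiles without AltiVec headers. It assumes the number of accumulators is a power of two.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a 4-lane float vector such as `vector float`. */
typedef struct { float lane[4]; } f32x4;

/* Lane-wise addition; on POWER9 this role would be played by vec_add. */
static inline f32x4 f32x4_add(f32x4 a, f32x4 b) {
    f32x4 r;
    for (int i = 0; i < 4; ++i) {
        r.lane[i] = a.lane[i] + b.lane[i];
    }
    return r;
}

/* Reduce n accumulators to one scalar by pairwise (tree) addition.
 * n must be a power of two (1, 2, 4, 8, 16, ...). The array is
 * modified in place; the combined vector ends up in sum[0]. */
float f32x4_reduce(f32x4 *sum, size_t n) {
    assert(n > 0 && (n & (n - 1)) == 0);
    for (size_t stride = 1; stride < n; stride *= 2) {
        for (size_t i = 0; i + stride < n; i += 2 * stride) {
            sum[i] = f32x4_add(sum[i], sum[i + stride]);
        }
    }
    /* Horizontal add of the four lanes, as the macro does with vec_extract. */
    return sum[0].lane[0] + sum[0].lane[1] + sum[0].lane[2] + sum[0].lane[3];
}
```

With n == 8 the loops perform the same seven vector additions, in the same order, as the unrolled macro in the diff. With n == 1 the loops do not execute and only the horizontal add runs. Whether a compiler unrolls such loops as well as the hand-written version is something the speed comparison would have to confirm.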

fitzsim · 2023-01-03T22:34:46Z

I'll try reverting that and compare speed. Also, I think I can get rid of the load/store argument changes, but it'll take some more work. And the F32 implementation seems to still use some F16 operations, so I'll investigate that. I'll make a new pull request for all this later. For now I'll close this one.

fitzsim added 3 commits January 3, 2023 00:48

ggml : change GGML_F16_VEC_LOAD, GGML_F16_VEC_STORE arguments

63cf29c

ggml : macroize POWER9 ppc64le fp16 SIMD code

cdbe556

ggml : enable f32 SIMD for POWER9 ppc64le

eb87ee5

fitzsim mentioned this pull request Jan 3, 2023

Very slow - any way to speed up? #300

Closed

ggerganov reviewed Jan 3, 2023

fitzsim closed this Jan 3, 2023

fitzsim mentioned this pull request Jan 4, 2023

Reorganize POWER9 SIMD code #369

Merged
