Add SSE3 and fp16 conversion lookup table #368

Merged
merged 13 commits into from
Jan 6, 2023

Conversation

abitofevrything
Contributor

Adds SSE3 SIMD support and support for using Imath for fp16-fp32 conversions. Because Imath uses a lookup table, it can be faster on systems where whisper.cpp doesn't already have a native method for the conversion; on my system it gives a ~3.5x speed increase.

abitofevrything marked this pull request as draft January 3, 2023 21:15
@abitofevrything
Contributor Author

Drafting as I am unsure what value to put for GGML_F32_STEP and GGML_F16_STEP - guidance on this would be appreciated.

@abitofevrything
Contributor Author

A quick test seems to show that 32 leads to better performance than 16 or 64
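For readers unfamiliar with what the step value controls, here is a minimal portable sketch. It assumes that the step is the number of floats consumed per loop iteration and that, with 4 floats per 128-bit SSE register, a step of 32 means 8 independent accumulators. The names `F32_EPR`, `F32_STEP`, `F32_ARR` and `dot_f32_stepped` are hypothetical, and each register is modelled as a plain array rather than with real intrinsics, so this is an illustration of the blocking scheme and not the code from this PR.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for illustration; the real macros in ggml.c may differ. */
#define F32_EPR  4                     /* floats per 128-bit register */
#define F32_STEP 32                    /* floats consumed per loop iteration */
#define F32_ARR  (F32_STEP / F32_EPR)  /* independent accumulators: 8 */

/* Dot product of x and y. In this sketch n must be a multiple of F32_STEP;
 * a full implementation would also handle the leftover elements. */
static float dot_f32_stepped(const float *x, const float *y, size_t n) {
    assert(n % F32_STEP == 0);

    /* One 4-lane accumulator per modelled register. */
    float acc[F32_ARR][F32_EPR] = {{0}};

    for (size_t i = 0; i < n; i += F32_STEP) {
        for (int r = 0; r < F32_ARR; r++) {
            for (int l = 0; l < F32_EPR; l++) {
                size_t k = i + (size_t) r * F32_EPR + (size_t) l;
                acc[r][l] += x[k] * y[k];
            }
        }
    }

    /* Reduce all accumulators and lanes to a single scalar. */
    float sum = 0.0f;
    for (int r = 0; r < F32_ARR; r++) {
        for (int l = 0; l < F32_EPR; l++) {
            sum += acc[r][l];
        }
    }
    return sum;
}
```

A larger step gives the CPU more independent accumulation chains to overlap but needs more registers, which is one plausible reason the best value has to be found by measurement.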

@ggerganov
Owner

> A quick test seems to show that 32 leads to better performance than 16 or 64

Yes, that's what I do - trial and error to find the best value :)

This is a great contribution.
Before merging, I would like to avoid the Imath dependency.
We can simply generate a lookup table in ggml.c and use it instead of relying on Imath.
Take a look at the existing lookup tables for gelu and exp:

whisper.cpp/ggml.c

Lines 246 to 250 in a0d4f8e

// precomputed gelu table for f16 (128 KB)
static ggml_fp16_t table_gelu_f16[1 << 16];
// precomputed exp table for f16 (128 KB)
static ggml_fp16_t table_exp_f16[1 << 16];

I'm very curious to see if this F16 LUT will speed up the WASM examples, because WASM does not have an intrinsic for FP16 <-> FP32 conversion, so it falls back to the naive conversion method.
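The lookup-table idea suggested above can be sketched as follows. Since an fp16 value has only 65536 possible bit patterns, every conversion result can be precomputed once with a scalar routine and later fetched with a single array index. This is a minimal sketch under stated assumptions: the names `fp16_bits_to_fp32`, `table_f32_f16`, `init_table_f32_f16` and `lookup_fp16_to_fp32` are hypothetical, fp16 values are represented as raw `uint16_t` bit patterns, and the scalar conversion shown is a generic IEEE 754 half-to-single routine, not necessarily the one used in ggml.c.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Scalar conversion of an IEEE 754 half (raw bits) to a float. */
static float fp16_bits_to_fp32(uint16_t h) {
    uint32_t sign = (uint32_t) (h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t man  = h & 0x3FFu;
    uint32_t bits;

    if (exp == 0) {
        if (man == 0) {
            bits = sign;                       /* signed zero */
        } else {
            /* Subnormal half: shift until the implicit bit appears. */
            uint32_t shift = 0;
            while ((man & 0x400u) == 0) {
                man <<= 1;
                shift++;
            }
            man &= 0x3FFu;
            bits = sign | ((113u - shift) << 23) | (man << 13);
        }
    } else if (exp == 0x1Fu) {
        bits = sign | 0x7F800000u | (man << 13); /* infinity or NaN */
    } else {
        /* Normal number: rebias the exponent from 15 to 127. */
        bits = sign | ((exp + 112u) << 23) | (man << 13);
    }

    float f;
    memcpy(&f, &bits, sizeof f);               /* well-defined type punning */
    return f;
}

/* Hypothetical table: one float per fp16 bit pattern (256 KB). */
static float table_f32_f16[1 << 16];

/* Fill the table once at startup. */
static void init_table_f32_f16(void) {
    for (uint32_t i = 0; i < (1u << 16); i++) {
        table_f32_f16[i] = fp16_bits_to_fp32((uint16_t) i);
    }
}

/* After initialisation, a conversion is a single indexed load. */
static inline float lookup_fp16_to_fp32(uint16_t h) {
    return table_f32_f16[h];
}
```

The trade-off is 256 KB of memory and a one-time initialisation pass in exchange for replacing the branchy scalar conversion with a load on targets that lack a native conversion instruction.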

@abitofevrything
Contributor Author

Leaving as a draft for now as I want to see if I can get rid of some of the memcpy calls in the ggml_lookup_fp16_to_fp32 function.

A review would be appreciated as I am almost done with this though.

abitofevrything changed the title from "Add SSE3 and Imath support" to "Add SSE3 and fp16 conversion lookup table" Jan 6, 2023
@abitofevrything
Contributor Author

Turns out the memcpy calls are optimised out by the compiler anyway :) Marking this as ready.
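As background on why such memcpy calls usually cost nothing: copying a fixed, small number of bytes between two local objects is the standard well-defined way to reinterpret bits in C, and mainstream optimizing compilers typically lower it to a plain register move rather than a library call. The sketch below shows the pattern on 32-bit floats; the helper names `fp32_to_bits` and `fp32_from_bits` are hypothetical and are not taken from this PR.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Reinterpret a float as its raw 32-bit pattern without aliasing issues. */
static inline uint32_t fp32_to_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);   /* fixed-size copy, typically a register move */
    return u;
}

/* Reinterpret a raw 32-bit pattern as a float. */
static inline float fp32_from_bits(uint32_t u) {
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}
```

Whether a particular compiler and optimisation level removes the call can be confirmed by inspecting the generated assembly.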

abitofevrything marked this pull request as ready for review January 6, 2023 12:34
@ggerganov
Owner

@abitofevrything
Good news! As expected, the lookup table improves the WASM performance.
On a MacBook M1 Pro, I observe a 25% speedup in Firefox and a 35% speedup in Chrome.

ggerganov merged commit a62170c into ggerganov:master Jan 6, 2023
rock3125 pushed a commit to rock3125/whisper.cpp that referenced this pull request Feb 21, 2023
* Improves WASM performance:
  On MacBook M1 Pro, I observe 25% faster using Firefox and 35% faster using Chrome

* Add support for SSE3 SIMD

* Add SSE3 to system information

* Add Imath support for fp16-fp32 conversions

* Add Imath to system information

* Wrap Imath calls to avoid static function warnings

* Drop Imath; Add lookup table for f16 -> f32 conversions

* Remove TODO comments

* Update SSE3 to new macro arguments

* Correct updated macro definitions

* Prefer static inline where possible

* ggml : static inlines + add public f16 <-> f32 conversions

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
anandijain pushed a commit to anandijain/whisper.cpp that referenced this pull request Apr 28, 2023
jacobwu-b pushed a commit to jacobwu-b/Transcriptify-by-whisper.cpp that referenced this pull request Oct 24, 2023