[Idea]: Use Android NNAPI to accelerate inference on Android Devices #88
Comments
Would love to see this as well!
If there is community help, we can try to add support for NNAPI. Currently I don't have enough capacity to investigate this, but I think it is interesting and could unlock many applications. I will probably look into this in the future and hope there are some contributions in the meantime.
I'm trying to write an NNAPI backend (don't expect too much from my work, since I'm a complete newbie and will most likely not succeed). After some document reading, I found that unlike CL or VK, NNAPI doesn't provide a way to use accelerated matrix multiplication or shader-like compute on the GPU. The only thing you can do with it is upload a graph describing how layers are connected (including operands and weights). So it seems it doesn't really match the architecture llama.cpp currently has? If I'm wrong about that, please point me to a backend that uses a similar architecture so I can use it as a reference. A rough sketch of what the graph-building API looks like is below.
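For anyone curious, here is a minimal, untested sketch of what describing a single FULLY_CONNECTED (matmul + bias) node looks like with the NNAPI C API. The function name, dimensions and data pointers are placeholders for illustration only; a real backend would have to translate a whole ggml graph this way before compilation and execution.

```cpp
// Hypothetical sketch: build an NNAPI model containing one FULLY_CONNECTED op.
// Dimensions and weight/bias buffers are placeholders.
#include <android/NeuralNetworks.h>

bool build_fc_graph(const float* weights, const float* bias,
                    uint32_t rows, uint32_t cols) {
    ANeuralNetworksModel* model = nullptr;
    if (ANeuralNetworksModel_create(&model) != ANEURALNETWORKS_NO_ERROR) return false;

    uint32_t in_dims[2]  = {1, cols};        // operand 0: activations
    uint32_t w_dims[2]   = {rows, cols};     // operand 1: weights
    uint32_t b_dims[1]   = {rows};           // operand 2: bias
    uint32_t out_dims[2] = {1, rows};        // operand 4: output

    ANeuralNetworksOperandType in_type  = {ANEURALNETWORKS_TENSOR_FLOAT32, 2, in_dims,  0.0f, 0};
    ANeuralNetworksOperandType w_type   = {ANEURALNETWORKS_TENSOR_FLOAT32, 2, w_dims,   0.0f, 0};
    ANeuralNetworksOperandType b_type   = {ANEURALNETWORKS_TENSOR_FLOAT32, 1, b_dims,   0.0f, 0};
    ANeuralNetworksOperandType act_type = {ANEURALNETWORKS_INT32,          0, nullptr,  0.0f, 0};
    ANeuralNetworksOperandType out_type = {ANEURALNETWORKS_TENSOR_FLOAT32, 2, out_dims, 0.0f, 0};

    ANeuralNetworksModel_addOperand(model, &in_type);   // index 0
    ANeuralNetworksModel_addOperand(model, &w_type);    // index 1
    ANeuralNetworksModel_addOperand(model, &b_type);    // index 2
    ANeuralNetworksModel_addOperand(model, &act_type);  // index 3: fused activation
    ANeuralNetworksModel_addOperand(model, &out_type);  // index 4

    // Constant data (weights, bias, activation code) is baked into the graph.
    int32_t fuse_none = ANEURALNETWORKS_FUSED_NONE;
    ANeuralNetworksModel_setOperandValue(model, 1, weights,    rows * cols * sizeof(float));
    ANeuralNetworksModel_setOperandValue(model, 2, bias,       rows * sizeof(float));
    ANeuralNetworksModel_setOperandValue(model, 3, &fuse_none, sizeof(fuse_none));

    // One FULLY_CONNECTED node: inputs {0,1,2,3} -> output {4}.
    uint32_t op_inputs[4]  = {0, 1, 2, 3};
    uint32_t op_outputs[1] = {4};
    ANeuralNetworksModel_addOperation(model, ANEURALNETWORKS_FULLY_CONNECTED,
                                      4, op_inputs, 1, op_outputs);

    // Only the graph inputs/outputs are visible to the runtime, which then
    // decides on which accelerator (if any) the compiled graph runs.
    uint32_t graph_inputs[1]  = {0};
    uint32_t graph_outputs[1] = {4};
    ANeuralNetworksModel_identifyInputsAndOutputs(model, 1, graph_inputs, 1, graph_outputs);

    bool ok = ANeuralNetworksModel_finish(model) == ANEURALNETWORKS_NO_ERROR;
    ANeuralNetworksModel_free(model);
    return ok;
}
```

That graph-level interface is exactly why it feels like a mismatch: there is no way to hand NNAPI a single matmul buffer-to-buffer the way the CL/Vulkan backends do, so the whole compute graph would have to be mapped to NNAPI operations up front.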
@ggerganov maybe it's worth checking NNAPI via ONNX Runtime? WhisperRN runs smoothly with CoreML, but on Android even the tiny model is way too laggy to be usable on a budget device (for example, a Samsung A14 with 4 GB RAM).
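To illustrate the idea, this is a rough, untested sketch of attaching the NNAPI execution provider to an ONNX Runtime session in C++ (it assumes an Android build of ONNX Runtime with the NNAPI EP compiled in; the model path is a placeholder):

```cpp
// Hypothetical sketch: ONNX Runtime session with the NNAPI execution provider.
#include <onnxruntime_cxx_api.h>
#include <nnapi_provider_factory.h>

Ort::Session create_nnapi_session(Ort::Env& env, const char* model_path) {
    Ort::SessionOptions options;
    // 0 = default flags; options like NNAPI_FLAG_USE_FP16 can be OR-ed in.
    Ort::ThrowOnError(OrtSessionOptionsAppendExecutionProvider_Nnapi(options, 0));
    // Operators NNAPI cannot handle fall back to the default CPU provider.
    return Ort::Session(env, model_path, options);
}
```

This would of course mean exporting the model to ONNX rather than running the GGML graph directly, so it is more of a comparison point for how much the accelerator actually helps.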
@pax-k how do you define "laggy"? I am also investigating performance on the Android side. My Samsung S22 can transcribe a 30-second German voice message in about 3 seconds with the small Whisper model. I am also optimistic about the future, because I am quite sure Google's strong AI focus will improve the AI hardware in the next generations of Android phones. By running in profile mode I could reduce the inference time by almost a second, which I think is acceptable.
This is just an idea for you. Most modern smartphones come with some form of AI accelerator. I am aware that GGML-based projects like llama.cpp can compile and run on mobile devices, but there is probably performance left on the table. I think there is currently a gap for a mobile-optimized AI inference library with quantization support and the other tricks present in GGML. For reference: https://developer.android.com/ndk/guides/neuralnetworks