Speech-to-text, text-to-speech, speaker diarization, speech enhancement, and VAD using next-gen Kaldi with onnxruntime, without an Internet connection. Supports embedded systems, Android, iOS, HarmonyOS, Raspberry Pi, RISC-V, x86_64 servers, and websocket server/client, with APIs for 11 programming languages.

k2-fsa/sherpa-onnx

Supported functions

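| Speech recognition | Speech synthesis | Speaker verification | Speaker identification |
|--------------------|------------------|----------------------|------------------------|
| ✔️ | ✔️ | ✔️ | ✔️ |

| Spoken language identification | Audio tagging | Voice activity detection | Keyword spotting |
|--------------------------------|---------------|--------------------------|------------------|
| ✔️ | ✔️ | ✔️ | ✔️ |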

Supported platforms

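| Architecture | Android | iOS | Windows | macOS | Linux |
|--------------|---------|-----|---------|-------|-------|
| x64          | ✔️      |     | ✔️      | ✔️    | ✔️    |
| x86          | ✔️      |     | ✔️      |       |       |
| arm64        | ✔️      | ✔️  | ✔️      | ✔️    | ✔️    |
| arm32        | ✔️      |     |         |       | ✔️    |
| riscv64      |         |     |         |       | ✔️    |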

Supported programming languages

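| C++ | C | Python | C# | Java | JavaScript | Kotlin | Swift | Go | Dart |
|-----|---|--------|----|------|------------|--------|-------|----|------|
| ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |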

It also supports WebAssembly.

Introduction

This repository supports running the following functions locally:

  • Speech-to-text (i.e., ASR); both streaming and non-streaming are supported
  • Text-to-speech (i.e., TTS)
  • Speaker identification
  • Speaker verification
  • Spoken language identification
  • Audio tagging
  • VAD (e.g., silero-vad)
  • Keyword spotting

on the platforms and operating systems listed under "Supported platforms" above, with the following APIs:

  • C++, C, Python, Go, C#
  • Java, Kotlin, JavaScript
  • Swift
  • Dart
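
As an illustration of the Python API, here is a minimal sketch of non-streaming speech recognition. It assumes the `sherpa_onnx` and `numpy` packages are installed and that a pre-trained transducer model has been downloaded. The file names `encoder.onnx`, `decoder.onnx`, `joiner.onnx`, and `tokens.txt` are hypothetical placeholders, as are the helper names `read_wave` and `transcribe`; real model archives use their own file names, so adjust the paths accordingly.

```python
import struct
import sys
import wave
from pathlib import Path


def read_wave(path):
    """Read a 16-bit mono PCM WAV file.

    Returns (samples, sample_rate), with samples scaled to [-1, 1].
    """
    with wave.open(str(path), "rb") as f:
        if f.getnchannels() != 1 or f.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono PCM audio")
        sample_rate = f.getframerate()
        raw = f.readframes(f.getnframes())
    ints = struct.unpack(f"<{len(raw) // 2}h", raw)
    return [x / 32768.0 for x in ints], sample_rate


def transcribe(wav_path, model_dir):
    """Decode one WAV file with a non-streaming transducer model."""
    # Imported lazily so that read_wave() works without these packages.
    import numpy as np
    import sherpa_onnx

    model_dir = Path(model_dir)
    # Hypothetical file names: replace them with those of your model.
    recognizer = sherpa_onnx.OfflineRecognizer.from_transducer(
        encoder=str(model_dir / "encoder.onnx"),
        decoder=str(model_dir / "decoder.onnx"),
        joiner=str(model_dir / "joiner.onnx"),
        tokens=str(model_dir / "tokens.txt"),
    )
    samples, sample_rate = read_wave(wav_path)
    stream = recognizer.create_stream()
    stream.accept_waveform(sample_rate, np.asarray(samples, dtype=np.float32))
    recognizer.decode_stream(stream)
    return stream.result.text


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python transcribe.py /path/to/audio.wav /path/to/model-dir
    print(transcribe(sys.argv[1], sys.argv[2]))
```

Everything runs on the local machine: the model files are loaded from disk and no network access is needed at inference time.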

Links for pre-built Android APKs

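| Description | URL | For users in China |
|-------------|-----|--------------------|
| Streaming speech recognition | Address | Click here |
| Text-to-speech | Address | Click here |
| Voice activity detection (VAD) | Address | Click here |
| VAD + non-streaming speech recognition | Address | Click here |
| Two-pass speech recognition | Address | Click here |
| Audio tagging | Address | Click here |
| Audio tagging (WearOS) | Address | Click here |
| Speaker identification | Address | Click here |
| Spoken language identification | Address | Click here |
| Keyword spotting | Address | Click here |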

Links for pre-built Flutter APPs

Real-time speech recognition

| Description | URL | For users in China |
|-------------|-----|--------------------|
| Streaming speech recognition | Address | Click here |

Text-to-speech

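| Description | URL | For users in China |
|-------------|-----|--------------------|
| Android (arm64-v8a, armeabi-v7a, x86_64) | Address | Click here |
| Linux (x64) | Address | Click here |
| macOS (x64) | Address | Click here |
| macOS (arm64) | Address | Click here |
| Windows (x64) | Address | Click here |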

Note: You need to build from source for iOS.

Links for pre-trained models

| Description | URL |
|-------------|-----|
| Speech recognition (speech to text, ASR) | Address |
| Text-to-speech (TTS) | Address |
| VAD | Address |
| Keyword spotting | Address |
| Audio tagging | Address |
| Speaker identification (Speaker ID) | Address |
| Spoken language identification (Language ID) | See multi-lingual Whisper ASR models from Speech recognition |
| Punctuation | Address |

Useful links

How to reach us

Please see https://k2-fsa.github.io/sherpa/social-groups.html for the next-gen Kaldi WeChat and QQ discussion groups.