Add Streaming zipformer #267
Thanks! Looks great! Just left some minor comments.
@@ -68,7 +68,19 @@ To use fast_beam_search with an LG, use
foo.wav \
bar.wav

(4) To decode wav.scp
(4) To use an streaming Zipformer model for recognition
(4) To use an streaming Zipformer model for recognition
(4) To use a streaming Zipformer model for recognition
int32_t chunk_size = config.chunk_size;
// It is used after feature embedding, that does (T-7)//2
int32_t model_chunk_size = encoder.attr("decode_chunk_size").toInt();
SHERPA_CHECK_EQ(chunk_size / 2, model_chunk_size);
If we can get model_chunk_size from the model, is it still necessary to require the user to specify it?
The check is just to ensure that the given chunk_size is equal to the exported one. I will update the code in streaming zipformer. It supports using different chunk sizes.
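The relationship being checked here can be sketched in isolation. This is a minimal sketch, not the code in the PR: `FramesAfterEmbedding` and `ChunkSizeMatchesModel` are hypothetical helper names, the frame formula is taken only from the `(T-7)//2` comment in the diff, and the comparison mirrors the `SHERPA_CHECK_EQ` line but returns a bool instead of aborting.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: frames remaining after the feature embedding,
// following the "(T-7)//2" comment in the diff. Assumes num_frames >= 7 so
// that C++ integer division agrees with Python-style floor division.
int32_t FramesAfterEmbedding(int32_t num_frames) {
  assert(num_frames >= 7);
  return (num_frames - 7) / 2;
}

// Hypothetical helper: the same comparison as
// SHERPA_CHECK_EQ(chunk_size / 2, model_chunk_size), reported as a bool.
bool ChunkSizeMatchesModel(int32_t chunk_size, int32_t model_chunk_size) {
  return chunk_size / 2 == model_chunk_size;
}
```

Under these assumptions, a user-supplied chunk_size of 32 is accepted only when the exported model reports a decode_chunk_size of 16.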
}

int32_t num_encoders = num_elements / 7;
int32_t batch_size = static_cast<const torch::Tensor &>(states[0]).size(1);
int32_t batch_size = static_cast<const torch::Tensor &>(states[0]).size(1);
int32_t batch_size = states[0].size(1);
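For readers unfamiliar with the state layout, the two lines in this hunk can be illustrated without libtorch. The sketch below is an assumption-laden stand-in: `Shape`, `StateLayout`, and `InspectStates` are hypothetical names, each state tensor is represented only by its shape, `num_elements` is assumed to be the length of the state list, the list is assumed to hold seven entries per encoder (inferred from `num_elements / 7`), and dimension 1 of the first entry is assumed to be the batch dimension (inferred from `.size(1)`).

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Stand-in for a tensor's sizes; the real code uses torch::Tensor.
using Shape = std::vector<int64_t>;

struct StateLayout {
  int32_t num_encoders;
  int32_t batch_size;
};

// Hypothetical helper: derives the encoder count and batch size from a
// flat state list, assuming seven state entries per encoder.
StateLayout InspectStates(const std::vector<Shape> &states) {
  constexpr int32_t kStatesPerEncoder = 7;
  if (states.empty() || states.size() % kStatesPerEncoder != 0) {
    throw std::invalid_argument("state list must be a non-empty multiple of 7");
  }
  if (states[0].size() < 2) {
    throw std::invalid_argument("first state needs at least two dimensions");
  }
  StateLayout layout;
  layout.num_encoders =
      static_cast<int32_t>(states.size()) / kStatesPerEncoder;
  layout.batch_size = static_cast<int32_t>(states[0][1]);  // like .size(1)
  assert(layout.num_encoders > 0);
  return layout;
}
```

With 14 entries whose first shape is (2, 3, 4), this reports two encoders and a batch size of 3.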
@@ -20,6 +20,49 @@ This sections lists models trained using `icefall`_.
English
^^^^^^^

icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29
Could you also update
https://github.com/k2-fsa/sherpa/blob/master/.github/scripts/run-online-transducer.sh
@@ -68,7 +68,19 @@ To use fast_beam_search with an LG, use
foo.wav \
bar.wav

(4) To decode wav.scp
(4) To use an streaming Zipformer model for recognition
Could you also update the comment in
https://github.com/k2-fsa/sherpa/blob/master/sherpa/cpp_api/bin/online-recognizer-microphone.cc
OK
git lfs pull --include "exp/decoder_jit_trace.pt"
git lfs pull --include "exp/joiner_jit_trace.pt"
git lfs pull --include "data/lang_bpe_500/LG.pt"
popd
sherpa/cpp_api/online-recognizer.cc
Outdated
}
}
if (!is_supported) {
if (!is_supported) {
if (!model_) {
and remove is_supported.
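The idea behind this suggestion, as I read it, is that a null model pointer already signals an unsupported model, so a separate flag is redundant. The following is only a sketch of that pattern: `OnlineModel`, `StreamingZipformer`, `CreateModel`, `Recognizer`, `CanConstruct`, and the `"streaming_zipformer"` type string are all hypothetical and are not taken from `online-recognizer.cc`.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical model hierarchy, for illustration only.
struct OnlineModel {
  virtual ~OnlineModel() = default;
};
struct StreamingZipformer : OnlineModel {};

// Hypothetical factory: returns nullptr when no model class matches.
std::unique_ptr<OnlineModel> CreateModel(const std::string &type) {
  if (type == "streaming_zipformer") {
    return std::make_unique<StreamingZipformer>();
  }
  return nullptr;
}

class Recognizer {
 public:
  explicit Recognizer(const std::string &type) : model_(CreateModel(type)) {
    // The pointer doubles as the "is supported" signal; no extra flag.
    if (!model_) {
      throw std::invalid_argument("Unsupported model type: " + type);
    }
  }

  const OnlineModel &model() const {
    assert(model_);  // Guaranteed by the constructor.
    return *model_;
  }

 private:
  std::unique_ptr<OnlineModel> model_;
};

// Small helper so the behaviour can be checked without a try/catch at the
// call site.
bool CanConstruct(const std::string &type) {
  try {
    Recognizer recognizer(type);
    return true;
  } catch (const std::invalid_argument &) {
    return false;
  }
}
```

Checking the pointer directly keeps a single source of truth, so the flag and the pointer cannot disagree.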
This PR adds streaming zipformer (see k2-fsa/icefall#787) as an online model.