
Add Streaming zipformer #267

Merged

merged 11 commits into k2-fsa:master on Jan 6, 2023
Conversation

yaozengwei (Collaborator):

This PR adds streaming zipformer (see k2-fsa/icefall#787) as an online model.

csukuangfj (Collaborator) left a comment:

Thanks! Looks great! Just left some minor comments.

@@ -68,7 +68,19 @@ To use fast_beam_search with an LG, use
foo.wav \
bar.wav

(4) To decode wav.scp
(4) To use an streaming Zipformer model for recognition
Collaborator:

Suggested change
(4) To use an streaming Zipformer model for recognition
(4) To use a streaming Zipformer model for recognition

int32_t chunk_size = config.chunk_size;
// It is applied after the feature embedding, which does (T-7)//2
int32_t model_chunk_size = encoder.attr("decode_chunk_size").toInt();
SHERPA_CHECK_EQ(chunk_size / 2, model_chunk_size);
Collaborator:

If we can get model_chunk_size from the model, is it still necessary to require the user to specify it?

yaozengwei (Collaborator, Author):

It is just to ensure that the given chunk_size equals the exported one. I will update the code in the streaming zipformer; it supports using a different chunk size.

}

int32_t num_encoders = num_elements / 7;
int32_t batch_size = static_cast<const torch::Tensor &>(states[0]).size(1);
Collaborator:

Suggested change
int32_t batch_size = static_cast<const torch::Tensor &>(states[0]).size(1);
int32_t batch_size = states[0].size(1);

@@ -20,6 +20,49 @@ This sections lists models trained using `icefall`_.
English
^^^^^^^

icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29

@@ -68,7 +68,19 @@ To use fast_beam_search with an LG, use
foo.wav \
bar.wav

(4) To decode wav.scp
(4) To use an streaming Zipformer model for recognition

yaozengwei (Author):

OK

git lfs pull --include "exp/decoder_jit_trace.pt"
git lfs pull --include "exp/joiner_jit_trace.pt"
git lfs pull --include "data/lang_bpe_500/LG.pt"

Collaborator:

Suggested change
popd

}
}
if (!is_supported) {
Collaborator:

Suggested change
if (!is_supported) {
if (!model_) {

and remove is_supported.
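The pattern the reviewer suggests, using the null model pointer itself as the "unsupported" signal instead of a separate `is_supported` flag, can be sketched like this; the `CreateModel` function and the list of model type names here are illustrative, not sherpa's actual API:

```cpp
#include <memory>
#include <string>
#include <vector>

// Minimal stand-in for an online recognizer model.
struct OnlineModel {
  std::string type;
};

// Try each known model type in turn; `model_` stays null when nothing
// matches, so `!model_` doubles as the "unsupported type" check and no
// separate is_supported flag is needed.
std::unique_ptr<OnlineModel> CreateModel(const std::string& model_type) {
  std::unique_ptr<OnlineModel> model_;  // null until a loader matches
  const std::vector<std::string> supported = {
      "conv_emformer", "emformer", "lstm", "streaming_zipformer"};
  for (const auto& t : supported) {
    if (t == model_type) {
      model_ = std::make_unique<OnlineModel>(OnlineModel{t});
      break;
    }
  }
  if (!model_) {
    return nullptr;  // caller reports an unsupported model type
  }
  return model_;
}
```

This keeps one source of truth: the model pointer either holds a loaded model or signals failure, so the two can never disagree.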

@yaozengwei yaozengwei added ready and removed ready labels Jan 6, 2023
@yaozengwei yaozengwei merged commit f59887b into k2-fsa:master Jan 6, 2023