
Use Emformer as RNN-T encoder. #278

Merged 6 commits on Apr 2, 2022

Conversation

csukuangfj (Collaborator)
At present we import Emformer directly from torchaudio and use it to replace the Conformer model in the current RNN-T setup.

csukuangfj (Collaborator, Author) commented Apr 1, 2022

Here is a recording of streaming decoding using RNN-T greedy search with the Emformer model.

Screen.Recording.2022-04-01.at.7.43.04.PM.mov

The number of model parameters is: 65390556

The WERs using --epoch 29 --avg 6 --decoding_method modified_beam_search are 4.58/11.54 (test-clean/test-other),
which is very close to the result listed in https://github.com/pytorch/audio/tree/main/examples/asr/emformer_rnnt, i.e., 4.56 for test-clean (note that that model was trained for 120 epochs on 32 GPUs, while this one was trained on 8 GPUs).

pzelasko (Collaborator) commented Apr 1, 2022

Great news!

csukuangfj changed the base branch from master to streaming on Apr 2, 2022 at 05:35
csukuangfj (Collaborator, Author)

I am merging it into the streaming branch first.

@yaozengwei will work on the streaming decoding part with beam search (--max-sym-per-frame=1).
