
Use Emformer as RNN-T encoder. #278

Merged 6 commits on Apr 2, 2022

Conversation

csukuangfj (Collaborator)
At present we import Emformer directly from torchaudio and use it to replace the Conformer model in the current RNN-T setup.

csukuangfj (Collaborator, Author) commented Apr 1, 2022

Here is a recording of streaming decoding using RNN-T greedy search with the Emformer model.

Screen.Recording.2022-04-01.at.7.43.04.PM.mov

The number of model parameters is: 65390556

The WERs using --epoch 29 --avg 6 --decoding_method modified_beam_search are 4.58/11.54 (test-clean/test-other),
which is very close to the result listed in https://github.com/pytorch/audio/tree/main/examples/asr/emformer_rnnt, i.e., 4.56 for test-clean (note that that model was trained for 120 epochs on 32 GPUs, while this one was trained on 8 GPUs).

pzelasko (Collaborator) commented Apr 1, 2022

Great news!

csukuangfj changed the base branch from master to streaming on Apr 2, 2022 at 05:35
csukuangfj (Collaborator, Author)

I am merging it into the streaming branch first.

@yaozengwei will work on the streaming decoding part with beam search (--max-sym-per-frame=1).
