Limit the number of symbols per frame in RNN-T decoding. #151

Merged: 1 commit merged on Dec 18, 2021

Conversation

csukuangfj (Collaborator)

Fix the issue mentioned in #143 (comment)

For a model trained using

./transducer_stateless/train.py \
  --world-size 1 \
  --num-epochs 20 \
  --start-epoch 0 \
  --exp-dir transducer_stateless/exp \
  --full-libri 0 \
  --max-duration 250 \
  --lr-factor 3

and decoded with

./transducer_stateless/decode.py --epoch 5 --avg 3 --exp-dir ./transducer_stateless/exp --max-duration 100

Before this PR

The decoding log is

2021-12-18 09:58:43,173 INFO [decode.py:316] batch 0/?, cuts processed until now is 20
2021-12-18 09:59:41,255 INFO [decode.py:316] batch 100/?, cuts processed until now is 1406
2021-12-18 10:00:37,971 INFO [decode.py:316] batch 200/?, cuts processed until now is 2563
2021-12-18 10:00:43,287 INFO [decode.py:333] The transcripts are stored in transducer_stateless/exp/greedy_search/recogs-test-clean-greedy_search-epoch-5-avg-3.txt
2021-12-18 10:00:43,382 INFO [utils.py:404] [test-clean-greedy_search] %WER 57.28% [30118 / 52576, 22560 ins, 1855 del, 5703 sub ]
2021-12-18 10:00:43,862 INFO [decode.py:346] Wrote detailed error stats to transducer_stateless/exp/greedy_search/errs-test-clean-greedy_search-epoch-5-avg-3.txt
2021-12-18 10:00:43,872 INFO [decode.py:363]
For test-clean, WER of different settings are:
greedy_search   57.28   best for test-clean

2021-12-18 10:00:45,167 INFO [decode.py:316] batch 0/?, cuts processed until now is 23
2021-12-18 10:01:40,751 INFO [decode.py:316] batch 100/?, cuts processed until now is 1614
2021-12-18 10:02:35,445 INFO [decode.py:316] batch 200/?, cuts processed until now is 2899
2021-12-18 10:02:38,633 INFO [decode.py:333] The transcripts are stored in transducer_stateless/exp/greedy_search/recogs-test-other-greedy_search-epoch-5-avg-3.txt
2021-12-18 10:02:38,737 INFO [utils.py:404] [test-other-greedy_search] %WER 69.45% [36351 / 52343, 20869 ins, 3021 del, 12461 sub ]
2021-12-18 10:02:39,317 INFO [decode.py:346] Wrote detailed error stats to transducer_stateless/exp/greedy_search/errs-test-other-greedy_search-epoch-5-avg-3.txt
2021-12-18 10:02:39,327 INFO [decode.py:363]
For test-other, WER of different settings are:
greedy_search   69.45   best for test-other

2021-12-18 10:02:39,327 INFO [decode.py:449] Done!

With this PR

2021-12-18 10:05:38,976 INFO [decode.py:316] batch 0/?, cuts processed until now is 20
2021-12-18 10:06:35,527 INFO [decode.py:316] batch 100/?, cuts processed until now is 1406
2021-12-18 10:07:31,099 INFO [decode.py:316] batch 200/?, cuts processed until now is 2563
2021-12-18 10:07:36,654 INFO [decode.py:333] The transcripts are stored in transducer_stateless/exp/greedy_search/recogs-test-clean-greedy_search-epoch-5-avg-3.txt
2021-12-18 10:07:36,758 INFO [utils.py:404] [test-clean-greedy_search] %WER 13.06% [6868 / 52576, 766 ins, 672 del, 5430 sub ]
2021-12-18 10:07:37,009 INFO [decode.py:346] Wrote detailed error stats to transducer_stateless/exp/greedy_search/errs-test-clean-greedy_search-epoch-5-avg-3.txt
2021-12-18 10:07:37,023 INFO [decode.py:363]
For test-clean, WER of different settings are:
greedy_search   13.06   best for test-clean

2021-12-18 10:07:38,123 INFO [decode.py:316] batch 0/?, cuts processed until now is 23
2021-12-18 10:08:34,661 INFO [decode.py:316] batch 100/?, cuts processed until now is 1614
2021-12-18 10:09:30,827 INFO [decode.py:316] batch 200/?, cuts processed until now is 2899
2021-12-18 10:09:34,202 INFO [decode.py:333] The transcripts are stored in transducer_stateless/exp/greedy_search/recogs-test-other-greedy_search-epoch-5-avg-3.txt
2021-12-18 10:09:34,298 INFO [utils.py:404] [test-other-greedy_search] %WER 30.69% [16065 / 52343, 1643 ins, 1879 del, 12543 sub ]
2021-12-18 10:09:34,579 INFO [decode.py:346] Wrote detailed error stats to transducer_stateless/exp/greedy_search/errs-test-other-greedy_search-epoch-5-avg-3.txt
2021-12-18 10:09:34,593 INFO [decode.py:363]
For test-other, WER of different settings are:
greedy_search   30.69   best for test-other

2021-12-18 10:09:34,593 INFO [decode.py:449] Done!

You can see that limiting the number of symbols per frame mitigates the issue: the insertion errors drop sharply (from 22560 to 766 on test-clean and from 20869 to 1643 on test-other).
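To illustrate the idea, here is a minimal sketch (not the code in this PR) of greedy RNN-T search with a cap on the number of non-blank symbols emitted per acoustic frame. The decoder/joiner call signatures, the blank-seeded context, and the name max_sym_per_frame are illustrative assumptions rather than the exact icefall API.

import torch

def greedy_search(encoder_out, decoder, joiner, blank_id=0, max_sym_per_frame=3):
    """Sketch of greedy RNN-T decoding for a single utterance.
    encoder_out: (T, C) tensor of acoustic frames."""
    hyp = [blank_id]  # assumed: seed the decoder with a blank context
    decoder_out = decoder(torch.tensor([hyp], dtype=torch.long))
    t = 0
    sym_per_frame = 0  # non-blank symbols already emitted on frame t
    while t < encoder_out.size(0):
        logits = joiner(encoder_out[t : t + 1], decoder_out)
        y = int(logits.argmax(dim=-1).item())
        if y != blank_id and sym_per_frame < max_sym_per_frame:
            hyp.append(y)
            sym_per_frame += 1
            decoder_out = decoder(torch.tensor([hyp], dtype=torch.long))
        else:
            # Blank was predicted, or the per-frame limit was reached:
            # force the search to advance to the next acoustic frame.
            t += 1
            sym_per_frame = 0
    return hyp[1:]  # drop the initial blank context

Without such a limit, a poorly trained model can keep emitting non-blank symbols on the same frame, which is where the huge insertion counts in the "Before this PR" log come from; forcing the loop to advance after max_sym_per_frame symbols bounds the number of emissions per frame.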

@csukuangfj (Collaborator, Author)

Note: I only changed the greedy search. Beam search is too slow when the decoder is an RNN; I will update beam search once we remove the recurrent connections from the decoder.

csukuangfj merged commit cb04c8a into k2-fsa:master on Dec 18, 2021
csukuangfj deleted the fix-greedy-search branch on December 18, 2021 03:00
@danpovey (Collaborator)

Cool!
Would be interesting to see whether it has any impact on decoding results for the original LSTM-based model.

@csukuangfj (Collaborator, Author)

Would be interesting to see whether it has any impact on decoding results for the original LSTM-based model.

I guess you mean the decoding results for the stateless decoder, right?


As for the change that limits the number of symbols per frame, I just verified that it does not affect the results for the model trained using the code from master.

@danpovey (Collaborator) commented Dec 20, 2021

Would be interesting to see whether it has any impact on decoding results for the original LSTM-based model.

I guess you mean the decoding results for the stateless decoder, right?

Hm, I think I meant the LSTM-based decoder: whether having a maximum number of symbols per frame would make a difference (the real question is: at what max-symbols-per-frame does it start to impact results, and how?)... but really the question is more general, for whatever kinds of models we currently favor. My plan/hope for FST-based decoding was to have the decoding algorithm take a fixed number of steps per frame.

As for the change that limits the number of symbols per frame, I just verified that it does not affect the results for the model trained using the code from master.

OK cool. But it must affect results at some value...
