Limit the number of symbols per frame in RNN-T decoding. #151

Merged: 1 commit merged on Dec 18, 2021

Conversation

csukuangfj (Collaborator)

Fix the issue mentioned in #143 (comment)

For a model trained using

./transducer_stateless/train.py \
  --world-size 1 \
  --num-epochs 20 \
  --start-epoch 0 \
  --exp-dir transducer_stateless/exp \
  --full-libri 0 \
  --max-duration 250 \
  --lr-factor 3

and decoded with

./transducer_stateless/decode.py --epoch 5 --avg 3 --exp-dir ./transducer_stateless/exp --max-duration 100

Before this PR

The decoding log is

2021-12-18 09:58:43,173 INFO [decode.py:316] batch 0/?, cuts processed until now is 20
2021-12-18 09:59:41,255 INFO [decode.py:316] batch 100/?, cuts processed until now is 1406
2021-12-18 10:00:37,971 INFO [decode.py:316] batch 200/?, cuts processed until now is 2563
2021-12-18 10:00:43,287 INFO [decode.py:333] The transcripts are stored in transducer_stateless/exp/greedy_search/recogs-test-clean-greedy_search-epoch-5-avg-3.txt
2021-12-18 10:00:43,382 INFO [utils.py:404] [test-clean-greedy_search] %WER 57.28% [30118 / 52576, 22560 ins, 1855 del, 5703 sub ]
2021-12-18 10:00:43,862 INFO [decode.py:346] Wrote detailed error stats to transducer_stateless/exp/greedy_search/errs-test-clean-greedy_search-epoch-5-avg-3.txt
2021-12-18 10:00:43,872 INFO [decode.py:363]
For test-clean, WER of different settings are:
greedy_search   57.28   best for test-clean

2021-12-18 10:00:45,167 INFO [decode.py:316] batch 0/?, cuts processed until now is 23
2021-12-18 10:01:40,751 INFO [decode.py:316] batch 100/?, cuts processed until now is 1614
2021-12-18 10:02:35,445 INFO [decode.py:316] batch 200/?, cuts processed until now is 2899
2021-12-18 10:02:38,633 INFO [decode.py:333] The transcripts are stored in transducer_stateless/exp/greedy_search/recogs-test-other-greedy_search-epoch-5-avg-3.txt
2021-12-18 10:02:38,737 INFO [utils.py:404] [test-other-greedy_search] %WER 69.45% [36351 / 52343, 20869 ins, 3021 del, 12461 sub ]
2021-12-18 10:02:39,317 INFO [decode.py:346] Wrote detailed error stats to transducer_stateless/exp/greedy_search/errs-test-other-greedy_search-epoch-5-avg-3.txt
2021-12-18 10:02:39,327 INFO [decode.py:363]
For test-other, WER of different settings are:
greedy_search   69.45   best for test-other

2021-12-18 10:02:39,327 INFO [decode.py:449] Done!

With this PR

2021-12-18 10:05:38,976 INFO [decode.py:316] batch 0/?, cuts processed until now is 20
2021-12-18 10:06:35,527 INFO [decode.py:316] batch 100/?, cuts processed until now is 1406
2021-12-18 10:07:31,099 INFO [decode.py:316] batch 200/?, cuts processed until now is 2563
2021-12-18 10:07:36,654 INFO [decode.py:333] The transcripts are stored in transducer_stateless/exp/greedy_search/recogs-test-clean-greedy_search-epoch-5-avg-3.txt
2021-12-18 10:07:36,758 INFO [utils.py:404] [test-clean-greedy_search] %WER 13.06% [6868 / 52576, 766 ins, 672 del, 5430 sub ]
2021-12-18 10:07:37,009 INFO [decode.py:346] Wrote detailed error stats to transducer_stateless/exp/greedy_search/errs-test-clean-greedy_search-epoch-5-avg-3.txt
2021-12-18 10:07:37,023 INFO [decode.py:363]
For test-clean, WER of different settings are:
greedy_search   13.06   best for test-clean

2021-12-18 10:07:38,123 INFO [decode.py:316] batch 0/?, cuts processed until now is 23
2021-12-18 10:08:34,661 INFO [decode.py:316] batch 100/?, cuts processed until now is 1614
2021-12-18 10:09:30,827 INFO [decode.py:316] batch 200/?, cuts processed until now is 2899
2021-12-18 10:09:34,202 INFO [decode.py:333] The transcripts are stored in transducer_stateless/exp/greedy_search/recogs-test-other-greedy_search-epoch-5-avg-3.txt
2021-12-18 10:09:34,298 INFO [utils.py:404] [test-other-greedy_search] %WER 30.69% [16065 / 52343, 1643 ins, 1879 del, 12543 sub ]
2021-12-18 10:09:34,579 INFO [decode.py:346] Wrote detailed error stats to transducer_stateless/exp/greedy_search/errs-test-other-greedy_search-epoch-5-avg-3.txt
2021-12-18 10:09:34,593 INFO [decode.py:363]
For test-other, WER of different settings are:
greedy_search   30.69   best for test-other

2021-12-18 10:09:34,593 INFO [decode.py:449] Done!

You can see that limiting the number of symbols per frame mitigates the issue: the insertion errors drop sharply (from 22560 to 766 on test-clean and from 20869 to 1643 on test-other).
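To illustrate the idea, here is a minimal sketch (not the code in this PR) of greedy RNN-T search with a cap on the number of non-blank symbols emitted per acoustic frame. The decoder/joiner call signatures, the blank-seeded context, and the name max_sym_per_frame are illustrative assumptions rather than the exact icefall API.

import torch

def greedy_search(encoder_out, decoder, joiner, blank_id=0, max_sym_per_frame=3):
    """Sketch of greedy RNN-T decoding for a single utterance.
    encoder_out: (T, C) tensor of acoustic frames."""
    hyp = [blank_id]  # assumed: seed the decoder with a blank context
    decoder_out = decoder(torch.tensor([hyp], dtype=torch.long))
    t = 0
    sym_per_frame = 0  # non-blank symbols already emitted on frame t
    while t < encoder_out.size(0):
        logits = joiner(encoder_out[t : t + 1], decoder_out)
        y = int(logits.argmax(dim=-1).item())
        if y != blank_id and sym_per_frame < max_sym_per_frame:
            hyp.append(y)
            sym_per_frame += 1
            decoder_out = decoder(torch.tensor([hyp], dtype=torch.long))
        else:
            # Blank was predicted, or the per-frame limit was reached:
            # force the search to advance to the next acoustic frame.
            t += 1
            sym_per_frame = 0
    return hyp[1:]  # drop the initial blank context

Without such a limit, a poorly trained model can keep emitting non-blank symbols on the same frame, which is where the huge insertion counts in the "Before this PR" log come from; forcing the loop to advance after max_sym_per_frame symbols bounds the number of emissions per frame.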

@csukuangfj (Collaborator, Author)

Note: I only changed the greedy search. Beam search is too slow when the decoder is an RNN; I will update beam search once we remove the recurrent connections from the decoder.

csukuangfj merged commit cb04c8a into k2-fsa:master on Dec 18, 2021
csukuangfj deleted the fix-greedy-search branch on December 18, 2021 03:00
@danpovey (Collaborator)

Cool!
Would be interesting to see whether it has any impact on decoding results for the original LSTM-based model.

@csukuangfj (Collaborator, Author)

Would be interesting to see whether it has any impact on decoding results for the original LSTM-based model.

I guess you mean the decoding results for the stateless decoder, right?


As for the change that limits the number of symbols per frame, I just verified that it does not affect the results for the model trained using the code from master.

@danpovey (Collaborator) commented Dec 20, 2021

Would be interesting to see whether it has any impact on decoding results for the original LSTM-based model.

I guess you mean the decoding results for the stateless decoder, right?

Hm, I think I meant the LSTM-based decoder: whether having a maximum number of symbols per frame would make a difference (the real question is: at what max-symbols-per-frame does it start to impact results, and how?)... but really the question is more general, for whatever kinds of models we currently favor. My plan/hope for FST-based decoding was to have the decoding algorithm take a fixed number of steps per frame.

As for the change that limits the number of symbols per frame, I just verified that it does not affect the results for the model trained using the code from master.

OK cool. But it must affect results at some value...
