Streaming Zipformer with multi-dataset #984

marcoyang1998 · 2023-04-03T10:42:31Z

This PR adds a multi-dataset setup for the streaming Zipformer. Unlike older recipes (e.g., pruned_transducer_stateless3 and pruned_transducer_stateless8), we use only one head for both LibriSpeech and GigaSpeech.

The model achieves the following WERs:

| decoding method | chunk size | test-clean | test-other | comment | decoding mode |
|---|---|---|---|---|---|
| greedy search | 320ms | 2.43 | 6.0 | --epoch 20 --avg 4 | simulated streaming |
| greedy search | 320ms | 2.47 | 6.13 | --epoch 20 --avg 4 | chunk-wise |
| fast beam search | 320ms | 2.43 | 5.99 | --epoch 20 --avg 4 | simulated streaming |
| fast beam search | 320ms | 2.8 | 6.46 | --epoch 20 --avg 4 | chunk-wise |
| modified beam search | 320ms | 2.4 | 5.96 | --epoch 20 --avg 4 | simulated streaming |
| modified beam search | 320ms | 2.42 | 6.03 | --epoch 20 --avg 4 | chunk-wise |
| greedy search | 640ms | 2.26 | 5.58 | --epoch 20 --avg 4 | simulated streaming |
| greedy search | 640ms | 2.33 | 5.76 | --epoch 20 --avg 4 | chunk-wise |
| fast beam search | 640ms | 2.27 | 5.54 | --epoch 20 --avg 4 | simulated streaming |
| fast beam search | 640ms | 2.37 | 5.75 | --epoch 20 --avg 4 | chunk-wise |
| modified beam search | 640ms | 2.22 | 5.5 | --epoch 20 --avg 4 | simulated streaming |
| modified beam search | 640ms | 2.25 | 5.69 | --epoch 20 --avg 4 | chunk-wise |
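
To make the difference between the two decoding modes easier to read, here is a small sketch that computes the absolute WER increase of chunk-wise decoding over simulated streaming for each setting. The numbers are copied from the table above; the names `RESULTS` and `chunkwise_gap` are made up for this illustration and are not part of the recipe.

```python
# WERs (%) copied from the table above, keyed by (decoding method, chunk size).
# Each value maps decoding mode -> (test-clean, test-other).
# Assumption: every row not marked "simulated streaming" is a chunk-wise run.
RESULTS = {
    ("greedy search", "320ms"): {
        "simulated streaming": (2.43, 6.0),
        "chunk-wise": (2.47, 6.13),
    },
    ("fast beam search", "320ms"): {
        "simulated streaming": (2.43, 5.99),
        "chunk-wise": (2.8, 6.46),
    },
    ("modified beam search", "320ms"): {
        "simulated streaming": (2.4, 5.96),
        "chunk-wise": (2.42, 6.03),
    },
    ("greedy search", "640ms"): {
        "simulated streaming": (2.26, 5.58),
        "chunk-wise": (2.33, 5.76),
    },
    ("fast beam search", "640ms"): {
        "simulated streaming": (2.27, 5.54),
        "chunk-wise": (2.37, 5.75),
    },
    ("modified beam search", "640ms"): {
        "simulated streaming": (2.22, 5.5),
        "chunk-wise": (2.25, 5.69),
    },
}


def chunkwise_gap(results):
    """Return the absolute WER increase of chunk-wise over simulated streaming."""
    gaps = {}
    for key, modes in results.items():
        simulated = modes["simulated streaming"]
        chunkwise = modes["chunk-wise"]
        # Round to two decimals to hide floating-point noise in the subtraction.
        gaps[key] = tuple(round(c - s, 2) for s, c in zip(simulated, chunkwise))
    return gaps


gaps = chunkwise_gap(RESULTS)
for (method, chunk), (clean, other) in gaps.items():
    print(f"{method:22s} {chunk}: +{clean:.2f} test-clean, +{other:.2f} test-other")
```

With these numbers, the gap is largest for fast beam search at a 320ms chunk size (0.37 absolute on test-clean, 0.47 on test-other).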

I will upload the pre-trained models to Hugging Face later.


ezerhouni · 2023-04-04T12:02:30Z

@marcoyang1998 Thank you for this PR.
Did you try a multi-head streaming Zipformer?

marcoyang1998 · 2023-04-04T13:14:57Z

> Did you try a multi-head streaming Zipformer?

No. We did some internal testing and found no performance difference between one head and two heads. Using one head also simplifies inference, because the single head can be used for both LibriSpeech and GigaSpeech.
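
As a rough illustration of why one head simplifies inference, here is a minimal, purely hypothetical sketch. None of these class or function names come from icefall, and the "heads" are dummy callables rather than real decoder and joiner networks. The point is only the calling convention: a two-headed model needs the caller to pick a head, while a one-headed model does not.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence

# Hypothetical stand-in for a head: maps encoder output to a hypothesis string.
Head = Callable[[Sequence[float]], str]


@dataclass
class TwoHeadModel:
    """Toy model with one head per dataset, e.g. {"libri": ..., "giga": ...}."""

    heads: Dict[str, Head]

    def decode(self, encoder_out: Sequence[float], domain: str) -> str:
        # The caller has to know which dataset the audio resembles.
        if domain not in self.heads:
            raise KeyError(f"unknown head: {domain!r}")
        return self.heads[domain](encoder_out)


@dataclass
class OneHeadModel:
    """Toy model with a single head shared by all datasets."""

    head: Head

    def decode(self, encoder_out: Sequence[float]) -> str:
        # No domain argument: the same head serves every input.
        return self.head(encoder_out)


def dummy_head(tag: str) -> Head:
    """Build a placeholder head that only reports its tag and the input length."""
    return lambda encoder_out: f"<{tag} hypothesis for {len(encoder_out)} frames>"


frames = [0.0, 0.1, 0.2]  # placeholder encoder output
two = TwoHeadModel({"libri": dummy_head("libri"), "giga": dummy_head("giga")})
one = OneHeadModel(dummy_head("shared"))

print(two.decode(frames, domain="giga"))
print(one.decode(frames))
```

In the two-headed case, choosing the wrong `domain` is a silent source of degraded accuracy, which is the usability cost that a single shared head avoids.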

marcoyang1998 · 2023-04-07T08:27:41Z

A WER comparison on GigaSpeech with our previous best-performing streaming model, `lstm_transducer_stateless2` (two heads):

| model | head | dev | test |
|---|---|---|---|
| `lstm_transducer_stateless2` | libri | 15.07 | 15.45 |
| `lstm_transducer_stateless2` | giga | 12.45 | 12.39 |
| this PR, `pruned_transducer_stateless7_streaming_multi` | one head | 12.08 | 11.98 |

As the table shows, the two-headed `lstm_transducer_stateless2` only achieves good WERs on GigaSpeech when decoding with its Giga head, which complicates usage. A single head resolves this problem, because the one-head model achieves good WERs on LibriSpeech and GigaSpeech at the same time.
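
The size of these differences can be expressed as relative WER reductions. The sketch below uses only the GigaSpeech numbers from the table above; the helper name `relative_wer_reduction` and the dictionary keys are chosen for this illustration.

```python
def relative_wer_reduction(baseline: float, new: float) -> float:
    """Relative WER reduction in percent; positive means `new` is better."""
    return 100.0 * (baseline - new) / baseline


# GigaSpeech WERs (%) copied from the comparison table above.
GIGA_WER = {
    "lstm libri head": {"dev": 15.07, "test": 15.45},
    "lstm giga head": {"dev": 12.45, "test": 12.39},
    "this PR, one head": {"dev": 12.08, "test": 11.98},
}

for split in ("dev", "test"):
    # Gain of the one-head model over the best (Giga) head of the older model.
    gain = relative_wer_reduction(
        GIGA_WER["lstm giga head"][split], GIGA_WER["this PR, one head"][split]
    )
    # Cost of decoding GigaSpeech with the wrong (Libri) head of the older model,
    # measured as the reduction obtained by switching to the Giga head.
    head_effect = relative_wer_reduction(
        GIGA_WER["lstm libri head"][split], GIGA_WER["lstm giga head"][split]
    )
    print(f"{split}: one head vs giga head {gain:.2f}% relative, "
          f"giga head vs libri head {head_effect:.2f}% relative")
```

On these numbers, the one-head model is roughly 3% relative better than the Giga head of `lstm_transducer_stateless2`, while picking the Libri head instead of the Giga head costs considerably more than that.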

marcoyang1998 added 8 commits February 14, 2023 17:05

copy files

469b2ad

change data type

c5ae2e7

update files

bddd499

modify train.py

d3145cd

Merge branch 'master' of github.com:marcoyang1998/icefall into zipformer_libri_small_models

139b7ec

add files

3347056

remove files

7bb94a0

update files

951319d

marcoyang1998 changed the title ~~Zipformer libri small models~~ Streaming Zipformer with multi-dataset Apr 3, 2023

marcoyang1998 added 9 commits April 3, 2023 23:17

fix code style

1a059bd

fix decode.py

0994afb

fix decoding bugs

475430b

refactor import

8bb2917

add right padding option in decode.py

8f33080

fix datamodule

daab1d0

update RESULTS.md

267cdf3

fix bug

3099e40

reformat code

b52e7ae

marcoyang1998 added 2 commits April 6, 2023 14:52

fix style

2089aa1

fix bug in streaming_decode.py

f359fe3

marcoyang1998 added 5 commits April 7, 2023 16:29

update suffix

81b72ce

update results and readme

56a14d9

fix style

c9c7340

resolve conflict

e4e999b

update RESULTS.md

170c68b

marcoyang1998 merged commit 57d6482 into k2-fsa:master Apr 21, 2023

csukuangfj mentioned this pull request Jun 18, 2023

Hi! is there any example implementation of streaming for this model: https://huggingface.co/marcoyang/icefall-libri-giga-pruned-transducer-stateless7-streaming-2023-04-04 k2-fsa/sherpa-onnx#178

Closed

csukuangfj mentioned this pull request Jun 21, 2023

add a pre-trained streaming zipformer for sherpa-onnx (English) k2-fsa/sherpa#410

Merged
