Gradient filter for training lstm model #564

Merged: 20 commits, Sep 29, 2022
Changes from 18 commits
2 changes: 1 addition & 1 deletion .flake8
@@ -9,7 +9,7 @@ per-file-ignores =
egs/*/ASR/pruned_transducer_stateless*/*.py: E501,
egs/*/ASR/*/optim.py: E501,
egs/*/ASR/*/scaling.py: E501,
-egs/librispeech/ASR/lstm_transducer_stateless/*.py: E501, E203
+egs/librispeech/ASR/lstm_transducer_stateless*/*.py: E501, E203
egs/librispeech/ASR/conv_emformer_transducer_stateless*/*.py: E501, E203
egs/librispeech/ASR/conformer_ctc2/*py: E501,
egs/librispeech/ASR/RESULTS.md: E999,
94 changes: 92 additions & 2 deletions egs/librispeech/ASR/RESULTS.md
@@ -1,12 +1,100 @@
## Results

#### LibriSpeech BPE training results (Pruned Stateless LSTM RNN-T + gradient filter)

[lstm_transducer_stateless3](./lstm_transducer_stateless3)

It implements an LSTM model with mechanisms from the reworked model for streaming ASR.
A gradient filter is applied inside each LSTM module to stabilize training.

See <https://github.com/k2-fsa/icefall/pull/564> for more details.
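
As a rough sketch of the idea (illustrative names and tensor layout, not the exact icefall implementation), the filter can be an identity in the forward pass that, in the backward pass, zeroes the gradient of any batch element whose gradient norm is abnormally large relative to the rest of the batch; the `--grad-norm-threshold` option used below sets the threshold:

```python
# Minimal sketch of a gradient filter, assuming PyTorch; class and argument
# names here are illustrative, not icefall's exact API.
import torch


class GradientFilterFunction(torch.autograd.Function):
    """Identity in the forward pass; in the backward pass, zero the
    gradient of any batch element whose gradient norm exceeds
    `threshold` times the median gradient norm over the batch."""

    @staticmethod
    def forward(ctx, x: torch.Tensor, threshold: float) -> torch.Tensor:
        ctx.threshold = threshold
        return x

    @staticmethod
    def backward(ctx, grad_output: torch.Tensor):
        # Assumes (T, N, C) layout as for an LSTM input: one norm per utterance.
        norms = grad_output.norm(dim=(0, 2))  # shape: (N,)
        keep = (norms <= ctx.threshold * norms.median()).to(grad_output.dtype)
        return grad_output * keep.view(1, -1, 1), None


def gradient_filter(x: torch.Tensor, threshold: float = 25.0) -> torch.Tensor:
    return GradientFilterFunction.apply(x, threshold)
```

Zeroing (rather than clipping) discards the pathological gradients entirely, so a few bad utterances cannot destabilize the LSTM updates.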

#### Training on full LibriSpeech

This model contains 12 encoder layers (each consisting of an LSTM module and a feedforward module). The number of model parameters is 84,689,496.
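
A hedged sketch of one such layer (module names and the feedforward dimension are illustrative; `--num-encoder-layers 12` and `--rnn-hidden-size 1024` in the training command below correspond to the layer count and LSTM hidden size):

```python
import torch.nn as nn


class EncoderLayerSketch(nn.Module):
    """Illustrative encoder layer: an LSTM module followed by a
    feedforward module, each wrapped in a residual connection."""

    def __init__(self, d_model: int = 512, rnn_hidden_size: int = 1024):
        super().__init__()
        # proj_size keeps the LSTM output at d_model for the residual add.
        self.lstm = nn.LSTM(d_model, rnn_hidden_size, proj_size=d_model)
        self.feed_forward = nn.Sequential(
            nn.Linear(d_model, 2048), nn.ReLU(), nn.Linear(2048, d_model)
        )

    def forward(self, x, states=None):
        # x: (T, N, d_model)
        lstm_out, states = self.lstm(x, states)
        x = x + lstm_out
        x = x + self.feed_forward(x)
        return x, states
```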

The WERs are:

| | test-clean | test-other | comment | decoding mode |
|-------------------------------------|------------|------------|----------------------|----------------------|
| greedy search (max sym per frame 1) | 3.66 | 9.51 | --epoch 40 --avg 15 | simulated streaming |
| greedy search (max sym per frame 1) | 3.66 | 9.48 | --epoch 40 --avg 15 | streaming |
| fast beam search | 3.55 | 9.33 | --epoch 40 --avg 15 | simulated streaming |
| fast beam search | 3.57 | 9.25 | --epoch 40 --avg 15 | streaming |
| modified beam search | 3.55 | 9.28 | --epoch 40 --avg 15 | simulated streaming |
| modified beam search | 3.54 | 9.25 | --epoch 40 --avg 15 | streaming |

Note: `simulated streaming` indicates feeding the full utterance during decoding, while `streaming` indicates feeding a fixed number of frames at a time.
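
A sketch of the difference (hypothetical `encoder` interface returning outputs plus LSTM states; the real scripts are `decode.py` and `streaming_decode.py`):

```python
import torch


def simulated_streaming(encoder, features):
    # Feed the whole utterance at once; the model is causal, so this
    # approximates streaming behavior without chunked input.
    out, _ = encoder(features, states=None)
    return out


def streaming(encoder, features, chunk_size):
    # Feed a fixed number of frames at a time, carrying the LSTM
    # states across chunks as in true online decoding.
    states, outputs = None, []
    for start in range(0, features.size(0), chunk_size):
        out, states = encoder(features[start:start + chunk_size], states)
        outputs.append(out)
    return torch.cat(outputs, dim=0)
```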


The training command is:

```bash
./lstm_transducer_stateless3/train.py \
  --world-size 4 \
  --num-epochs 40 \
  --start-epoch 1 \
  --exp-dir lstm_transducer_stateless3/exp \
  --full-libri 1 \
  --max-duration 500 \
  --master-port 12325 \
  --num-encoder-layers 12 \
  --grad-norm-threshold 25.0 \
  --rnn-hidden-size 1024
```

The tensorboard log can be found at
<https://tensorboard.dev/experiment/caNPyr5lT8qAl9qKsXEeEQ/>

The simulated streaming decoding command using greedy search, fast beam search, and modified beam search is:
```bash
for decoding_method in greedy_search fast_beam_search modified_beam_search; do
  ./lstm_transducer_stateless3/decode.py \
    --epoch 40 \
    --avg 15 \
    --exp-dir lstm_transducer_stateless3/exp \
    --max-duration 600 \
    --num-encoder-layers 12 \
    --rnn-hidden-size 1024 \
    --decoding-method $decoding_method \
    --use-averaged-model True \
    --beam 4 \
    --max-contexts 4 \
    --max-states 8 \
    --beam-size 4
done
```
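
With `--use-averaged-model True`, `--epoch 40 --avg 15` decodes with weights averaged over the final 15 epochs. A minimal sketch of plain checkpoint averaging (icefall's averaged-model computation differs in detail; the checkpoint paths and the `"model"` key are assumptions):

```python
import torch


def average_checkpoints(paths):
    """Average the model weights stored in the given checkpoint files."""
    avg = None
    for path in paths:
        state = torch.load(path, map_location="cpu")["model"]
        if avg is None:
            avg = {k: v.clone().float() for k, v in state.items()}
        else:
            for k in avg:
                avg[k] += state[k].float()
    return {k: v / len(paths) for k, v in avg.items()}


# e.g. epochs 26-40 for "--epoch 40 --avg 15":
# model.load_state_dict(average_checkpoints(
#     [f"exp/epoch-{i}.pt" for i in range(26, 41)]))
```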

The streaming decoding command using greedy search, fast beam search, and modified beam search is:
```bash
for decoding_method in greedy_search fast_beam_search modified_beam_search; do
  ./lstm_transducer_stateless3/streaming_decode.py \
    --epoch 40 \
    --avg 15 \
    --exp-dir lstm_transducer_stateless3/exp \
    --max-duration 600 \
    --num-encoder-layers 12 \
    --rnn-hidden-size 1024 \
    --decoding-method $decoding_method \
    --use-averaged-model True \
    --beam 4 \
    --max-contexts 4 \
    --max-states 8 \
    --beam-size 4
done
```

Pretrained models, training logs, decoding logs, and decoding results
are available at
<https://huggingface.co/Zengwei/icefall-asr-librispeech-lstm-transducer-stateless3-2022-09-28>


#### LibriSpeech BPE training results (Pruned Stateless LSTM RNN-T + multi-dataset)

[lstm_transducer_stateless2](./lstm_transducer_stateless2)

See <https://github.com/k2-fsa/icefall/pull/558> for more details.


The WERs are:

| | test-clean | test-other | comment |
@@ -18,9 +106,10 @@ The WERs are:
| modified_beam_search | 2.75 | 7.08 | --iter 472000 --avg 18 |
| fast_beam_search | 2.77 | 7.29 | --iter 472000 --avg 18 |


The training command is:

-```bash
+```
Collaborator (review comment): Please add bash here.

#!/usr/bin/env bash

export CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7"
@@ -70,6 +159,7 @@ Pretrained models, training logs, decoding logs, and decoding results
are available at
<https://huggingface.co/csukuangfj/icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03>


#### LibriSpeech BPE training results (Pruned Stateless LSTM RNN-T)

[lstm_transducer_stateless](./lstm_transducer_stateless)
1 change: 1 addition & 0 deletions egs/librispeech/ASR/lstm_transducer_stateless3/__init__.py