GigaSpeech recipe #120

Merged 48 commits on Apr 14, 2022

Commits (48)
b7bda9e
initial commit
wgb14 Nov 9, 2021
7586015
support download, data prep, and fbank
wgb14 Nov 12, 2021
1d58765
on-the-fly feature extraction by default
wgb14 Nov 13, 2021
3dbb15b
support BPE based lang
wgb14 Nov 14, 2021
16f1799
support HLG for BPE
wgb14 Nov 14, 2021
9d08b44
small fix
wgb14 Nov 14, 2021
89c0e2e
small fix
wgb14 Nov 14, 2021
fa734e0
chunked feature extraction by default
wgb14 Nov 17, 2021
317f5ec
Compute features for GigaSpeech by splitting the manifest.
csukuangfj Nov 28, 2021
4351e1e
Fixes after review.
csukuangfj Nov 28, 2021
ee7c56c
Merge pull request #1 from csukuangfj/fix-giga
wgb14 Nov 28, 2021
8109c2b
Split manifests into 2000 pieces.
csukuangfj Nov 30, 2021
b8beb00
Merge pull request #2 from csukuangfj/fix-giga
wgb14 Nov 30, 2021
64bd3f7
set audio duration mismatch tolerance to 0.01
wgb14 Dec 1, 2021
4316ec4
small fix
wgb14 Dec 3, 2021
71ef6a9
Merge remote-tracking branch 'upstream/master' into gigaspeech_recipe
wgb14 Dec 17, 2021
76a2891
add conformer training recipe
wgb14 Dec 17, 2021
532309b
Add conformer.py without pre-commit checking
wgb14 Dec 17, 2021
bea78f6
lazy loading and use SingleCutSampler
wgb14 Dec 17, 2021
6e5b189
DynamicBucketingSampler
wgb14 Dec 29, 2021
72abd38
use KaldifeatFbank to compute fbank for musan
wgb14 Jan 17, 2022
652646a
use pretrained language model and lexicon
wgb14 Jan 18, 2022
e6017ba
Merge remote-tracking branch 'upstream/master' into gigaspeech_recipe
wgb14 Jan 20, 2022
c62e0b7
use 3gram to decode, 4gram to rescore
wgb14 Feb 15, 2022
b429efa
Add decode.py
wgb14 Feb 15, 2022
c3993a5
Merge branch 'k2-fsa:master' into gigaspeech_recipe
wgb14 Mar 21, 2022
55e3019
Update .flake8
wgb14 Mar 21, 2022
3ddcc79
Delete compute_fbank_gigaspeech.py
wgb14 Apr 6, 2022
7921163
Use BucketingSampler for valid and test dataloader
wgb14 Apr 6, 2022
64bb39b
Update params in train.py
wgb14 Apr 6, 2022
9a5340b
Use bpe_500
wgb14 Apr 6, 2022
d9addb7
update params in decode.py
wgb14 Apr 6, 2022
a4e1471
Decrease num_paths while CUDA OOM
wgb14 Apr 6, 2022
f857d5a
Added README
wgb14 Apr 7, 2022
e56d327
Update RESULTS
wgb14 Apr 7, 2022
3e5436e
Merge branch 'k2-fsa:master' into gigaspeech_recipe
wgb14 Apr 7, 2022
3d2c261
black
wgb14 Apr 7, 2022
6d07cf9
Decrease num_paths while CUDA OOM
wgb14 Apr 11, 2022
f485b66
Decode with post-processing
wgb14 Apr 11, 2022
22f011e
Update results
wgb14 Apr 11, 2022
36ec10c
Merge remote-tracking branch 'upstream/master' into gigaspeech_recipe
wgb14 Apr 11, 2022
4079982
Remove lazy_load option
wgb14 Apr 11, 2022
ba245aa
Use default `storage_type`
wgb14 Apr 12, 2022
6a425ed
Keep the original tolerance
wgb14 Apr 12, 2022
e83b703
Use split-lazy
wgb14 Apr 13, 2022
0986b8f
Merge branch 'k2-fsa:master' into gigaspeech_recipe
wgb14 Apr 13, 2022
2ec8b06
black
wgb14 Apr 13, 2022
00fa309
Update pretrained model
wgb14 Apr 14, 2022
1 change: 1 addition & 0 deletions .flake8
@@ -7,6 +7,7 @@ per-file-ignores =
 egs/librispeech/ASR/*/conformer.py: E501,
 egs/aishell/ASR/*/conformer.py: E501,
 egs/tedlium3/ASR/*/conformer.py: E501,
+egs/gigaspeech/ASR/*/conformer.py: E501,
 egs/librispeech/ASR/pruned_transducer_stateless2/*.py: E501,

 # invalid escape sequence (cause by tex formular), W605
2 changes: 2 additions & 0 deletions .gitignore
@@ -6,6 +6,8 @@ exp
 exp*/
 *.pt
 download
+dask-worker-space
+log
 *.bak
 *-bak
 *bak.py
1 change: 1 addition & 0 deletions egs/gigaspeech/ASR/.gitignore
@@ -0,0 +1 @@
log-*
20 changes: 20 additions & 0 deletions egs/gigaspeech/ASR/README.md
@@ -0,0 +1,20 @@
# GigaSpeech
GigaSpeech is an evolving, multi-domain English speech recognition corpus
with 10,000 hours of high-quality labeled audio collected from audiobooks,
podcasts, and YouTube. It covers both read and spontaneous speaking styles
and a variety of topics, such as arts, science, and sports. More details can
be found at https://github.com/SpeechColab/GigaSpeech.

## Download

Apply for the download credentials and download the dataset by following the instructions at https://github.com/SpeechColab/GigaSpeech#download. Then create a symlink:
```bash
ln -sfv /path/to/GigaSpeech download/GigaSpeech
```

## Performance Record
| | Dev | Test |
|-----|-------|-------|
| WER | 10.47 | 10.58 |

See [RESULTS](/egs/gigaspeech/ASR/RESULTS.md) for details.
79 changes: 79 additions & 0 deletions egs/gigaspeech/ASR/RESULTS.md
@@ -0,0 +1,79 @@
## Results

### GigaSpeech BPE training results (Conformer-CTC)

#### 2022-04-06

The best WER for GigaSpeech, as of 2022-04-06, is given below.

Results using HLG decoding + n-gram LM rescoring + attention decoder rescoring:

| | Dev | Test |
|-----|-------|-------|
| WER | 10.47 | 10.58 |

Scale values used in n-gram LM rescoring and attention rescoring for the best WERs are:
| ngram_lm_scale | attention_scale |
|----------------|-----------------|
| 0.5 | 1.3 |
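
These two scales control how heavily the n-gram LM scores and the attention-decoder scores are weighted against the lattice scores when ranking n-best hypotheses. The sketch below illustrates the kind of linear combination involved; the names are illustrative, and icefall's actual `decode.py` works on k2 lattices rather than plain floats.

```python
# Illustrative per-hypothesis score combination during rescoring.
# All names here are hypothetical; this is not icefall's implementation.
def combined_score(
    lattice_score: float,    # score of the path in the HLG decoding lattice
    ngram_lm_score: float,   # n-gram LM score of the hypothesis
    attention_score: float,  # attention-decoder score of the hypothesis
    ngram_lm_scale: float = 0.5,
    attention_scale: float = 1.3,
) -> float:
    return (
        lattice_score
        + ngram_lm_scale * ngram_lm_score
        + attention_scale * attention_score
    )
```

The hypothesis with the highest combined score is selected; the (0.5, 1.3) pair in the table is the setting that gave the lowest WER here.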


To reproduce the above result, use the following commands for training:

```bash
cd egs/gigaspeech/ASR
./prepare.sh
export CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7"
./conformer_ctc/train.py \
--max-duration 120 \
--num-workers 1 \
--world-size 8 \
--exp-dir conformer_ctc/exp_500 \
--lang-dir data/lang_bpe_500
```
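
Note that `--max-duration` is the maximum total amount of audio, in seconds, that each GPU packs into one batch, so the effective batch size of this run can be estimated as in the rough sketch below (assuming every batch is filled to the limit):

```python
# Back-of-the-envelope estimate of audio processed per optimizer step for the
# training command above; assumes each DDP process fills batches to the limit.
max_duration_s = 120  # --max-duration: seconds of audio per batch, per GPU
world_size = 8        # --world-size: number of GPUs (DDP processes)

audio_per_step_s = max_duration_s * world_size
print(f"~{audio_per_step_s} s (~{audio_per_step_s / 60:.0f} min) of audio per step")
# prints: ~960 s (~16 min) of audio per step
```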

Then use the following command for decoding:

```bash
./conformer_ctc/decode.py \
--epoch 18 \
--avg 6 \
--method attention-decoder \
--num-paths 1000 \
--exp-dir conformer_ctc/exp_500 \
--lang-dir data/lang_bpe_500 \
--max-duration 20 \
--num-workers 1
```
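
Two commits in this PR are titled "Decrease num_paths while CUDA OOM": if attention-decoder rescoring with `--num-paths 1000` exhausts GPU memory, the number of sampled paths is reduced and decoding is retried. A generic version of that retry pattern might look like the sketch below; the wrapper and its arguments are hypothetical, not icefall's actual code.

```python
import torch


def rescore_with_fallback(rescore_fn, num_paths: int = 1000, min_paths: int = 100):
    """Run `rescore_fn(num_paths)` and halve `num_paths` on CUDA OOM.

    `rescore_fn` is a hypothetical callable wrapping the n-best rescoring step.
    """
    while num_paths >= min_paths:
        try:
            return rescore_fn(num_paths)
        except RuntimeError as e:
            if "out of memory" not in str(e):
                raise  # unrelated error: re-raise
            torch.cuda.empty_cache()
            num_paths //= 2  # retry with fewer sampled paths
    raise RuntimeError("rescoring ran out of memory even with the smallest num_paths")
```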

Results using HLG decoding + whole lattice rescoring:

| | Dev | Test |
|-----|-------|-------|
| WER | 10.51 | 10.62 |

The LM scale value used in whole-lattice rescoring for the best WERs is:
| lm_scale |
|----------|
| 0.2 |
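
For whole-lattice rescoring there is a single knob: how strongly the 4-gram LM scores are weighted relative to the lattice scores. A best value like the 0.2 above is typically found by a small sweep over candidate scales on the Dev set, roughly as sketched below with a hypothetical `decode_dev` helper (illustrative only, not icefall's search code):

```python
# Hypothetical sweep: decode_dev(lm_scale) is assumed to run whole-lattice
# rescoring on the Dev set with that scale and return the resulting WER (%).
def pick_best_lm_scale(decode_dev, scales=(0.1, 0.2, 0.3, 0.5, 0.7, 1.0)):
    wers = {scale: decode_dev(scale) for scale in scales}
    best_scale = min(wers, key=wers.get)
    return best_scale, wers[best_scale]
```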

To reproduce the above result, use the training commands above, and the following command for decoding:

```bash
./conformer_ctc/decode.py \
--epoch 18 \
--avg 6 \
--method whole-lattice-rescoring \
--num-paths 1000 \
--exp-dir conformer_ctc/exp_500 \
--lang-dir data/lang_bpe_500 \
--max-duration 20 \
--num-workers 1
```
Note: the `whole-lattice-rescoring` method is about twice as fast as the `attention-decoder` method, with slightly worse WER.

The pretrained model is available at
<https://huggingface.co/wgb14/icefall-asr-gigaspeech-conformer-ctc>

The tensorboard log for training is available at
<https://tensorboard.dev/experiment/rz63cmJXSK2fV9GceJtZXQ/>