GigaSpeech recipe #120

Merged 48 commits on Apr 14, 2022

Commits (48)
b7bda9e
initial commit
wgb14 Nov 9, 2021
7586015
support download, data prep, and fbank
wgb14 Nov 12, 2021
1d58765
on-the-fly feature extraction by default
wgb14 Nov 13, 2021
3dbb15b
support BPE based lang
wgb14 Nov 14, 2021
16f1799
support HLG for BPE
wgb14 Nov 14, 2021
9d08b44
small fix
wgb14 Nov 14, 2021
89c0e2e
small fix
wgb14 Nov 14, 2021
fa734e0
chunked feature extraction by default
wgb14 Nov 17, 2021
317f5ec
Compute features for GigaSpeech by splitting the manifest.
csukuangfj Nov 28, 2021
4351e1e
Fixes after review.
csukuangfj Nov 28, 2021
ee7c56c
Merge pull request #1 from csukuangfj/fix-giga
wgb14 Nov 28, 2021
8109c2b
Split manifests into 2000 pieces.
csukuangfj Nov 30, 2021
b8beb00
Merge pull request #2 from csukuangfj/fix-giga
wgb14 Nov 30, 2021
64bd3f7
set audio duration mismatch tolerance to 0.01
wgb14 Dec 1, 2021
4316ec4
small fix
wgb14 Dec 3, 2021
71ef6a9
Merge remote-tracking branch 'upstream/master' into gigaspeech_recipe
wgb14 Dec 17, 2021
76a2891
add conformer training recipe
wgb14 Dec 17, 2021
532309b
Add conformer.py without pre-commit checking
wgb14 Dec 17, 2021
bea78f6
lazy loading and use SingleCutSampler
wgb14 Dec 17, 2021
6e5b189
DynamicBucketingSampler
wgb14 Dec 29, 2021
72abd38
use KaldifeatFbank to compute fbank for musan
wgb14 Jan 17, 2022
652646a
use pretrained language model and lexicon
wgb14 Jan 18, 2022
e6017ba
Merge remote-tracking branch 'upstream/master' into gigaspeech_recipe
wgb14 Jan 20, 2022
c62e0b7
use 3gram to decode, 4gram to rescore
wgb14 Feb 15, 2022
b429efa
Add decode.py
wgb14 Feb 15, 2022
c3993a5
Merge branch 'k2-fsa:master' into gigaspeech_recipe
wgb14 Mar 21, 2022
55e3019
Update .flake8
wgb14 Mar 21, 2022
3ddcc79
Delete compute_fbank_gigaspeech.py
wgb14 Apr 6, 2022
7921163
Use BucketingSampler for valid and test dataloader
wgb14 Apr 6, 2022
64bb39b
Update params in train.py
wgb14 Apr 6, 2022
9a5340b
Use bpe_500
wgb14 Apr 6, 2022
d9addb7
update params in decode.py
wgb14 Apr 6, 2022
a4e1471
Decrease num_paths while CUDA OOM
wgb14 Apr 6, 2022
f857d5a
Added README
wgb14 Apr 7, 2022
e56d327
Update RESULTS
wgb14 Apr 7, 2022
3e5436e
Merge branch 'k2-fsa:master' into gigaspeech_recipe
wgb14 Apr 7, 2022
3d2c261
black
wgb14 Apr 7, 2022
6d07cf9
Decrease num_paths while CUDA OOM
wgb14 Apr 11, 2022
f485b66
Decode with post-processing
wgb14 Apr 11, 2022
22f011e
Update results
wgb14 Apr 11, 2022
36ec10c
Merge remote-tracking branch 'upstream/master' into gigaspeech_recipe
wgb14 Apr 11, 2022
4079982
Remove lazy_load option
wgb14 Apr 11, 2022
ba245aa
Use default `storage_type`
wgb14 Apr 12, 2022
6a425ed
Keep the original tolerance
wgb14 Apr 12, 2022
e83b703
Use split-lazy
wgb14 Apr 13, 2022
0986b8f
Merge branch 'k2-fsa:master' into gigaspeech_recipe
wgb14 Apr 13, 2022
2ec8b06
black
wgb14 Apr 13, 2022
00fa309
Update pretrained model
wgb14 Apr 14, 2022
1 change: 1 addition & 0 deletions .flake8
@@ -7,6 +7,7 @@ per-file-ignores =
 egs/librispeech/ASR/*/conformer.py: E501,
 egs/aishell/ASR/*/conformer.py: E501,
 egs/tedlium3/ASR/*/conformer.py: E501,
+egs/gigaspeech/ASR/*/conformer.py: E501,
 egs/librispeech/ASR/pruned_transducer_stateless2/*.py: E501,

 # invalid escape sequence (cause by tex formular), W605
2 changes: 2 additions & 0 deletions .gitignore
@@ -6,6 +6,8 @@ exp
 exp*/
 *.pt
 download
+dask-worker-space
+log
 *.bak
 *-bak
 *bak.py
1 change: 1 addition & 0 deletions egs/gigaspeech/ASR/.gitignore
@@ -0,0 +1 @@
log-*
20 changes: 20 additions & 0 deletions egs/gigaspeech/ASR/README.md
@@ -0,0 +1,20 @@
# GigaSpeech
GigaSpeech is an evolving, multi-domain English speech recognition corpus
with 10,000 hours of high-quality labeled audio collected from audiobooks,
podcasts, and YouTube. It covers both read and spontaneous speaking styles
and a variety of topics, such as arts, science, and sports. More details can
be found at https://github.com/SpeechColab/GigaSpeech.

## Download

Apply for the download credentials and download the dataset by following the instructions at https://github.com/SpeechColab/GigaSpeech#download. Then create a symlink:
```bash
ln -sfv /path/to/GigaSpeech download/GigaSpeech
```

## Performance Record
| | Dev | Test |
|-----|-------|-------|
| WER | 10.47 | 10.58 |

See [RESULTS](/egs/gigaspeech/ASR/RESULTS.md) for details.
79 changes: 79 additions & 0 deletions egs/gigaspeech/ASR/RESULTS.md
@@ -0,0 +1,79 @@
## Results

### GigaSpeech BPE training results (Conformer-CTC)

#### 2022-04-06

The best WER for GigaSpeech, as of 2022-04-06, is given below.

Results using HLG decoding + n-gram LM rescoring + attention decoder rescoring:

| | Dev | Test |
|-----|-------|-------|
| WER | 10.47 | 10.58 |

Scale values used in n-gram LM rescoring and attention rescoring for the best WERs are:
| ngram_lm_scale | attention_scale |
|----------------|-----------------|
| 0.5 | 1.3 |
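
These two scales control how heavily the n-gram LM scores and the attention-decoder scores are weighted against the lattice scores when ranking n-best hypotheses. The sketch below illustrates the kind of linear combination involved; the names are illustrative, and icefall's actual `decode.py` works on k2 lattices rather than plain floats.

```python
# Illustrative per-hypothesis score combination during rescoring.
# All names here are hypothetical; this is not icefall's implementation.
def combined_score(
    lattice_score: float,    # score of the path in the HLG decoding lattice
    ngram_lm_score: float,   # n-gram LM score of the hypothesis
    attention_score: float,  # attention-decoder score of the hypothesis
    ngram_lm_scale: float = 0.5,
    attention_scale: float = 1.3,
) -> float:
    return (
        lattice_score
        + ngram_lm_scale * ngram_lm_score
        + attention_scale * attention_score
    )
```

The hypothesis with the highest combined score is selected; the (0.5, 1.3) pair in the table is the setting that gave the lowest WER here.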


To reproduce the above result, use the following commands for training:

```bash
cd egs/gigaspeech/ASR
./prepare.sh
export CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7"
./conformer_ctc/train.py \
--max-duration 120 \
--num-workers 1 \
--world-size 8 \
--exp-dir conformer_ctc/exp_500 \
--lang-dir data/lang_bpe_500
```
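
Note that `--max-duration` is the maximum total amount of audio, in seconds, that each GPU packs into one batch, so the effective batch size of this run can be estimated as in the rough sketch below (assuming every batch is filled to the limit):

```python
# Back-of-the-envelope estimate of audio processed per optimizer step for the
# training command above; assumes each DDP process fills batches to the limit.
max_duration_s = 120  # --max-duration: seconds of audio per batch, per GPU
world_size = 8        # --world-size: number of GPUs (DDP processes)

audio_per_step_s = max_duration_s * world_size
print(f"~{audio_per_step_s} s (~{audio_per_step_s / 60:.0f} min) of audio per step")
# prints: ~960 s (~16 min) of audio per step
```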

Then use the following command for decoding:

```bash
./conformer_ctc/decode.py \
--epoch 18 \
--avg 6 \
--method attention-decoder \
--num-paths 1000 \
--exp-dir conformer_ctc/exp_500 \
--lang-dir data/lang_bpe_500 \
--max-duration 20 \
--num-workers 1
```
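
Two commits in this PR are titled "Decrease num_paths while CUDA OOM": if attention-decoder rescoring with `--num-paths 1000` exhausts GPU memory, the number of sampled paths is reduced and decoding is retried. A generic version of that retry pattern might look like the sketch below; the wrapper and its arguments are hypothetical, not icefall's actual code.

```python
import torch


def rescore_with_fallback(rescore_fn, num_paths: int = 1000, min_paths: int = 100):
    """Run `rescore_fn(num_paths)` and halve `num_paths` on CUDA OOM.

    `rescore_fn` is a hypothetical callable wrapping the n-best rescoring step.
    """
    while num_paths >= min_paths:
        try:
            return rescore_fn(num_paths)
        except RuntimeError as e:
            if "out of memory" not in str(e):
                raise  # unrelated error: re-raise
            torch.cuda.empty_cache()
            num_paths //= 2  # retry with fewer sampled paths
    raise RuntimeError("rescoring ran out of memory even with the smallest num_paths")
```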

Results using HLG decoding + whole lattice rescoring:

| | Dev | Test |
|-----|-------|-------|
| WER | 10.51 | 10.62 |

The LM scale value used in whole-lattice rescoring for the best WERs is:
| lm_scale |
|----------|
| 0.2 |
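
For whole-lattice rescoring there is a single knob: how strongly the 4-gram LM scores are weighted relative to the lattice scores. A best value like the 0.2 above is typically found by a small sweep over candidate scales on the Dev set, roughly as sketched below with a hypothetical `decode_dev` helper (illustrative only, not icefall's search code):

```python
# Hypothetical sweep: decode_dev(lm_scale) is assumed to run whole-lattice
# rescoring on the Dev set with that scale and return the resulting WER (%).
def pick_best_lm_scale(decode_dev, scales=(0.1, 0.2, 0.3, 0.5, 0.7, 1.0)):
    wers = {scale: decode_dev(scale) for scale in scales}
    best_scale = min(wers, key=wers.get)
    return best_scale, wers[best_scale]
```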

To reproduce the above result, use the training commands above, and the following command for decoding:

```bash
./conformer_ctc/decode.py \
--epoch 18 \
--avg 6 \
--method whole-lattice-rescoring \
--num-paths 1000 \
--exp-dir conformer_ctc/exp_500 \
--lang-dir data/lang_bpe_500 \
--max-duration 20 \
--num-workers 1
```
Note: the `whole-lattice-rescoring` method is about twice as fast as the `attention-decoder` method, with slightly worse WER.

The pretrained model is available at
<https://huggingface.co/wgb14/icefall-asr-gigaspeech-conformer-ctc>

The tensorboard log for training is available at
<https://tensorboard.dev/experiment/rz63cmJXSK2fV9GceJtZXQ/>