
Commit 2332ba3

Begin to use multiple datasets in training (#213)
* Begin to use multiple datasets.
* Finish preparing training datasets.
* Minor fixes
* Copy files.
* Finish training code.
* Display losses for gigaspeech and librispeech separately.
* Fix decode.py
* Make the probability to select a batch from GigaSpeech configurable.
* Update results.
* Minor fixes.
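The item about a configurable probability for drawing a batch from GigaSpeech can be illustrated with a small sketch. It assumes two batch iterators and picks one per step with weights `1 - giga_prob` and `giga_prob`, stopping when the chosen source is exhausted. The function and variable names are hypothetical, and this is not the code from `train.py`.

```python
import random
from typing import Iterable, Iterator, Tuple


def mix_batches(
    libri_batches: Iterable[str],
    giga_batches: Iterable[str],
    giga_prob: float,
    seed: int = 0,
) -> Iterator[Tuple[str, str]]:
    """Yield (source, batch) pairs, choosing GigaSpeech with probability giga_prob.

    Iteration stops as soon as the selected source has no batches left.
    """
    if not 0.0 <= giga_prob <= 1.0:
        raise ValueError("giga_prob must be in [0, 1]")
    rng = random.Random(seed)
    sources = [("libri", iter(libri_batches)), ("giga", iter(giga_batches))]
    weights = [1.0 - giga_prob, giga_prob]
    while True:
        # Pick the source for this step according to the configured weights.
        name, batch_iter = rng.choices(sources, weights=weights, k=1)[0]
        try:
            batch = next(batch_iter)
        except StopIteration:
            return
        yield name, batch


if __name__ == "__main__":
    libri = [f"L{i}" for i in range(10)]
    giga = [f"G{i}" for i in range(10)]
    for source, batch in mix_batches(libri, giga, giga_prob=0.2):
        print(source, batch)
```

Tagging each batch with its source is one simple way to keep per-dataset statistics, such as separate loss totals, in the caller.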
1 parent 1c35ae1 commit 2332ba3

26 files changed, +5342 -9 lines changed
@@ -0,0 +1,152 @@
+# Copyright 2021 Fangjun Kuang (csukuangfj@gmail.com)
+
+# See ../../LICENSE for clarification regarding multiple authors
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+name: run-pre-trained-transducer-stateless-multi-datasets-librispeech-100h
+
+on:
+  push:
+    branches:
+      - master
+  pull_request:
+    types: [labeled]
+
+jobs:
+  run_pre_trained_transducer_stateless_multi_datasets_librispeech_100h:
+    if: github.event.label.name == 'ready' || github.event_name == 'push'
+    runs-on: ${{ matrix.os }}
+    strategy:
+      matrix:
+        os: [ubuntu-18.04]
+        python-version: [3.7, 3.8, 3.9]
+        torch: ["1.10.0"]
+        torchaudio: ["0.10.0"]
+        k2-version: ["1.9.dev20211101"]
+
+      fail-fast: false
+
+    steps:
+      - uses: actions/checkout@v2
+        with:
+          fetch-depth: 0
+
+      - name: Setup Python ${{ matrix.python-version }}
+        uses: actions/setup-python@v1
+        with:
+          python-version: ${{ matrix.python-version }}
+
+      - name: Install Python dependencies
+        run: |
+          python3 -m pip install --upgrade pip pytest
+          # numpy 1.20.x does not support python 3.6
+          pip install numpy==1.19
+          pip install torch==${{ matrix.torch }}+cpu torchaudio==${{ matrix.torchaudio }}+cpu -f https://download.pytorch.org/whl/cpu/torch_stable.html
+          pip install k2==${{ matrix.k2-version }}+cpu.torch${{ matrix.torch }} -f https://k2-fsa.org/nightly/
+
+          python3 -m pip install git+https://github.com/lhotse-speech/lhotse
+          python3 -m pip install kaldifeat
+          # We are in ./icefall and there is a file: requirements.txt in it
+          pip install -r requirements.txt
+
+      - name: Install graphviz
+        shell: bash
+        run: |
+          python3 -m pip install -qq graphviz
+          sudo apt-get -qq install graphviz
+
+      - name: Download pre-trained model
+        shell: bash
+        run: |
+          sudo apt-get -qq install git-lfs tree sox
+          cd egs/librispeech/ASR
+          mkdir tmp
+          cd tmp
+          git lfs install
+          git clone https://huggingface.co/csukuangfj/icefall-asr-librispeech-100h-transducer-stateless-multi-datasets-bpe-500-2022-02-21
+
+          cd ..
+          tree tmp
+          soxi tmp/icefall-asr-librispeech-100h-transducer-stateless-multi-datasets-bpe-500-2022-02-21/test_wavs/*.wav
+          ls -lh tmp/icefall-asr-librispeech-100h-transducer-stateless-multi-datasets-bpe-500-2022-02-21/test_wavs/*.wav
+
+      - name: Run greedy search decoding (max-sym-per-frame 1)
+        shell: bash
+        run: |
+          export PYTHONPATH=$PWD:$PYTHONPATH
+          cd egs/librispeech/ASR
+          ./transducer_stateless_multi_datasets/pretrained.py \
+            --method greedy_search \
+            --max-sym-per-frame 1 \
+            --checkpoint ./tmp/icefall-asr-librispeech-100h-transducer-stateless-multi-datasets-bpe-500-2022-02-21/exp/pretrained.pt \
+            --bpe-model ./tmp/icefall-asr-librispeech-100h-transducer-stateless-multi-datasets-bpe-500-2022-02-21/data/lang_bpe_500/bpe.model \
+            ./tmp/icefall-asr-librispeech-100h-transducer-stateless-multi-datasets-bpe-500-2022-02-21/test_wavs/1089-134686-0001.wav \
+            ./tmp/icefall-asr-librispeech-100h-transducer-stateless-multi-datasets-bpe-500-2022-02-21/test_wavs/1221-135766-0001.wav \
+            ./tmp/icefall-asr-librispeech-100h-transducer-stateless-multi-datasets-bpe-500-2022-02-21/test_wavs/1221-135766-0002.wav
+
+      - name: Run greedy search decoding (max-sym-per-frame 2)
+        shell: bash
+        run: |
+          export PYTHONPATH=$PWD:$PYTHONPATH
+          cd egs/librispeech/ASR
+          ./transducer_stateless_multi_datasets/pretrained.py \
+            --method greedy_search \
+            --max-sym-per-frame 2 \
+            --checkpoint ./tmp/icefall-asr-librispeech-100h-transducer-stateless-multi-datasets-bpe-500-2022-02-21/exp/pretrained.pt \
+            --bpe-model ./tmp/icefall-asr-librispeech-100h-transducer-stateless-multi-datasets-bpe-500-2022-02-21/data/lang_bpe_500/bpe.model \
+            ./tmp/icefall-asr-librispeech-100h-transducer-stateless-multi-datasets-bpe-500-2022-02-21/test_wavs/1089-134686-0001.wav \
+            ./tmp/icefall-asr-librispeech-100h-transducer-stateless-multi-datasets-bpe-500-2022-02-21/test_wavs/1221-135766-0001.wav \
+            ./tmp/icefall-asr-librispeech-100h-transducer-stateless-multi-datasets-bpe-500-2022-02-21/test_wavs/1221-135766-0002.wav
+
+      - name: Run greedy search decoding (max-sym-per-frame 3)
+        shell: bash
+        run: |
+          export PYTHONPATH=$PWD:$PYTHONPATH
+          cd egs/librispeech/ASR
+          ./transducer_stateless_multi_datasets/pretrained.py \
+            --method greedy_search \
+            --max-sym-per-frame 3 \
+            --checkpoint ./tmp/icefall-asr-librispeech-100h-transducer-stateless-multi-datasets-bpe-500-2022-02-21/exp/pretrained.pt \
+            --bpe-model ./tmp/icefall-asr-librispeech-100h-transducer-stateless-multi-datasets-bpe-500-2022-02-21/data/lang_bpe_500/bpe.model \
+            ./tmp/icefall-asr-librispeech-100h-transducer-stateless-multi-datasets-bpe-500-2022-02-21/test_wavs/1089-134686-0001.wav \
+            ./tmp/icefall-asr-librispeech-100h-transducer-stateless-multi-datasets-bpe-500-2022-02-21/test_wavs/1221-135766-0001.wav \
+            ./tmp/icefall-asr-librispeech-100h-transducer-stateless-multi-datasets-bpe-500-2022-02-21/test_wavs/1221-135766-0002.wav
+
+      - name: Run beam search decoding
+        shell: bash
+        run: |
+          export PYTHONPATH=$PWD:$PYTHONPATH
+          cd egs/librispeech/ASR
+          ./transducer_stateless_multi_datasets/pretrained.py \
+            --method beam_search \
+            --beam-size 4 \
+            --checkpoint ./tmp/icefall-asr-librispeech-100h-transducer-stateless-multi-datasets-bpe-500-2022-02-21/exp/pretrained.pt \
+            --bpe-model ./tmp/icefall-asr-librispeech-100h-transducer-stateless-multi-datasets-bpe-500-2022-02-21/data/lang_bpe_500/bpe.model \
+            ./tmp/icefall-asr-librispeech-100h-transducer-stateless-multi-datasets-bpe-500-2022-02-21/test_wavs/1089-134686-0001.wav \
+            ./tmp/icefall-asr-librispeech-100h-transducer-stateless-multi-datasets-bpe-500-2022-02-21/test_wavs/1221-135766-0001.wav \
+            ./tmp/icefall-asr-librispeech-100h-transducer-stateless-multi-datasets-bpe-500-2022-02-21/test_wavs/1221-135766-0002.wav
+
+      - name: Run modified beam search decoding
+        shell: bash
+        run: |
+          export PYTHONPATH=$PWD:$PYTHONPATH
+          cd egs/librispeech/ASR
+          ./transducer_stateless_multi_datasets/pretrained.py \
+            --method modified_beam_search \
+            --beam-size 4 \
+            --checkpoint ./tmp/icefall-asr-librispeech-100h-transducer-stateless-multi-datasets-bpe-500-2022-02-21/exp/pretrained.pt \
+            --bpe-model ./tmp/icefall-asr-librispeech-100h-transducer-stateless-multi-datasets-bpe-500-2022-02-21/data/lang_bpe_500/bpe.model \
+            ./tmp/icefall-asr-librispeech-100h-transducer-stateless-multi-datasets-bpe-500-2022-02-21/test_wavs/1089-134686-0001.wav \
+            ./tmp/icefall-asr-librispeech-100h-transducer-stateless-multi-datasets-bpe-500-2022-02-21/test_wavs/1221-135766-0001.wav \
+            ./tmp/icefall-asr-librispeech-100h-transducer-stateless-multi-datasets-bpe-500-2022-02-21/test_wavs/1221-135766-0002.wav
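
The three greedy-search steps in this workflow differ only in `--max-sym-per-frame`. A minimal sketch of what such a cap does in transducer greedy search is given below, assuming the usual rule that a blank moves decoding to the next frame and that reaching the cap forces the same move. The `score` callable, `toy_score`, `BLANK = 0`, and the 4-token vocabulary are hypothetical stand-ins for illustration, not the code in `pretrained.py`.

```python
from typing import Callable, List, Sequence, Tuple

BLANK = 0  # assumed blank token ID


def greedy_search(
    frames: Sequence[object],
    score: Callable[[object, Sequence[int]], Sequence[float]],
    max_sym_per_frame: int,
    context_size: int = 2,
) -> List[int]:
    """Transducer-style greedy search with a per-frame symbol cap.

    `score(frame, context)` is a hypothetical stand-in for decoder + joiner:
    it returns one score per token, given the current frame and the last
    `context_size` emitted tokens.
    """
    if max_sym_per_frame < 1:
        raise ValueError("max_sym_per_frame must be at least 1")
    hyp = [BLANK] * context_size  # left padding for the decoder context
    t = 0
    emitted = 0  # symbols emitted on the current frame
    while t < len(frames):
        if emitted >= max_sym_per_frame:
            # Cap reached: force a move to the next frame.
            t += 1
            emitted = 0
            continue
        scores = score(frames[t], hyp[-context_size:])
        best = max(range(len(scores)), key=scores.__getitem__)
        if best == BLANK:
            t += 1
            emitted = 0
        else:
            hyp.append(best)
            emitted += 1
    return hyp[context_size:]


def toy_score(frame: Tuple[int, int], context: Sequence[int]) -> List[float]:
    """Toy scorer: each frame 'wants' to emit its two tokens in order."""
    first, second = frame
    last = context[-1]
    if last == second:
        target = BLANK
    elif last == first:
        target = second
    else:
        target = first
    return [1.0 if tok == target else 0.0 for tok in range(4)]


if __name__ == "__main__":
    frames = [(1, 2), (3, 3)]
    for cap in (1, 2, 3):
        print(cap, greedy_search(frames, toy_score, max_sym_per_frame=cap))
```

In this toy setup a cap of 1 drops the second token of the first frame, while caps of 2 and 3 produce the same hypothesis.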

egs/librispeech/ASR/README.md

+6 -5
@@ -9,11 +9,12 @@ for how to run models in this recipe.
 There are various folders containing the name `transducer` in this folder.
 The following table lists the differences among them.

-|                        | Encoder   | Decoder            |
-|------------------------|-----------|--------------------|
-| `transducer`           | Conformer | LSTM               |
-| `transducer_stateless` | Conformer | Embedding + Conv1d |
-| `transducer_lstm `     | LSTM      | LSTM               |
+|                                       | Encoder   | Decoder            | Comment                                           |
+|---------------------------------------|-----------|--------------------|---------------------------------------------------|
+| `transducer`                          | Conformer | LSTM               |                                                   |
+| `transducer_stateless`                | Conformer | Embedding + Conv1d |                                                   |
+| `transducer_lstm`                     | LSTM      | LSTM               |                                                   |
+| `transducer_stateless_multi_datasets` | Conformer | Embedding + Conv1d | Using data from GigaSpeech as extra training data |

 The decoder in `transducer_stateless` is modified from the paper
 [Rnn-Transducer with Stateless Prediction Network](https://ieeexplore.ieee.org/document/9054419/).
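
The `Embedding + Conv1d` decoder listed in the table carries no recurrent state: its output depends only on the most recent tokens. A toy, framework-free sketch of that idea follows, assuming a depthwise 1-D convolution without bias over the last `context_size` embeddings. The function name, the weights, and the depthwise choice are illustrative assumptions, not the recipe's implementation.

```python
from typing import List, Sequence


def stateless_decoder_step(
    tokens: Sequence[int],
    embedding: Sequence[Sequence[float]],
    kernel: Sequence[Sequence[float]],
) -> List[float]:
    """Embed the last len(kernel) tokens and mix them with a 1-D convolution.

    embedding: vocab_size x dim lookup table (hypothetical weights).
    kernel:    context_size x dim depthwise convolution weights (hypothetical).
    """
    context_size = len(kernel)
    if len(tokens) < context_size:
        raise ValueError("need at least context_size tokens (pad with blanks)")
    context = tokens[-context_size:]
    dim = len(embedding[0])
    out = [0.0] * dim
    for k, tok in enumerate(context):
        for d in range(dim):
            # Depthwise: each embedding dimension is convolved independently.
            out[d] += kernel[k][d] * embedding[tok][d]
    return out


if __name__ == "__main__":
    embedding = [[0.0, 0.0], [1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    kernel = [[0.5, 0.5], [1.0, -1.0]]  # context_size = 2
    # Histories that share the last two tokens give the same output.
    print(stateless_decoder_step([3, 1, 2], embedding, kernel))
    print(stateless_decoder_step([0, 1, 2], embedding, kernel))
```

Because no hidden state is carried between steps, hypotheses that share the same recent context can share decoder outputs, which is what makes this kind of decoder cheap during search.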
+75
@@ -0,0 +1,75 @@
+# Results for train-clean-100
+
+This page shows the WERs on test-clean/test-other when train-clean-100 is the only
+LibriSpeech subset used as training data (GigaSpeech serves as extra training data).
+
+## Conformer encoder + embedding decoder
+
+### 2022-02-21
+
+|                                     | test-clean | test-other | comment                                  |
+|-------------------------------------|------------|------------|------------------------------------------|
+| greedy search (max sym per frame 1) | 6.34       | 16.7       | --epoch 57, --avg 17, --max-duration 100 |
+| greedy search (max sym per frame 2) | 6.34       | 16.7       | --epoch 57, --avg 17, --max-duration 100 |
+| greedy search (max sym per frame 3) | 6.34       | 16.7       | --epoch 57, --avg 17, --max-duration 100 |
+| modified beam search (beam size 4)  | 6.31       | 16.3       | --epoch 57, --avg 17, --max-duration 100 |
+
+
+The training command to reproduce these results is given below:
+
+```bash
+cd egs/librispeech/ASR/
+./prepare.sh
+./prepare_giga_speech.sh
+
+export CUDA_VISIBLE_DEVICES="0,1"
+
+./transducer_stateless_multi_datasets/train.py \
+  --world-size 2 \
+  --num-epochs 60 \
+  --start-epoch 0 \
+  --exp-dir transducer_stateless_multi_datasets/exp-100-2 \
+  --full-libri 0 \
+  --max-duration 300 \
+  --lr-factor 1 \
+  --bpe-model data/lang_bpe_500/bpe.model \
+  --modified-transducer-prob 0.25 \
+  --giga-prob 0.2
+```
+
+The decoding command is given below:
+
+```bash
+for epoch in 57; do
+  for avg in 17; do
+    for sym in 1 2 3; do
+      ./transducer_stateless_multi_datasets/decode.py \
+        --epoch $epoch \
+        --avg $avg \
+        --exp-dir transducer_stateless_multi_datasets/exp-100-2 \
+        --bpe-model ./data/lang_bpe_500/bpe.model \
+        --max-duration 100 \
+        --context-size 2 \
+        --max-sym-per-frame $sym
+    done
+  done
+done
+
+epoch=57
+avg=17
+./transducer_stateless_multi_datasets/decode.py \
+  --epoch $epoch \
+  --avg $avg \
+  --exp-dir transducer_stateless_multi_datasets/exp-100-2 \
+  --bpe-model ./data/lang_bpe_500/bpe.model \
+  --max-duration 100 \
+  --context-size 2 \
+  --decoding-method modified_beam_search \
+  --beam-size 4
+```
+
+The tensorboard log is available at
+<https://tensorboard.dev/experiment/qUEKzMnrTZmOz1EXPda9RA/>
+
+A pre-trained model and decoding logs can be found at
+<https://huggingface.co/csukuangfj/icefall-asr-librispeech-100h-transducer-stateless-multi-datasets-bpe-500-2022-02-21>
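
The results and the decoding command use `--epoch 57 --avg 17`. Assuming this means averaging the parameters of the 17 checkpoints that end at epoch 57 (a reading of the flags, not something this diff states), the selection and averaging can be sketched as follows. Plain floats stand in for tensors, and both function names are hypothetical.

```python
from typing import Dict, List, Sequence


def epochs_to_average(epoch: int, avg: int) -> List[int]:
    """Return the epoch indices of the `avg` checkpoints ending at `epoch`."""
    start = epoch - avg + 1
    if avg < 1 or start < 0:
        raise ValueError("need avg >= 1 and epoch - avg + 1 >= 0")
    return list(range(start, epoch + 1))


def average_state_dicts(state_dicts: Sequence[Dict[str, float]]) -> Dict[str, float]:
    """Average parameters key by key; all dicts must share the same keys."""
    if not state_dicts:
        raise ValueError("no checkpoints given")
    n = len(state_dicts)
    return {key: sum(sd[key] for sd in state_dicts) / n for key in state_dicts[0]}


if __name__ == "__main__":
    epochs = epochs_to_average(epoch=57, avg=17)
    print(epochs[0], epochs[-1], len(epochs))
    # Toy "checkpoints": one scalar parameter per epoch.
    checkpoints = [{"w": float(e)} for e in epochs]
    print(average_state_dicts(checkpoints))
```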

egs/librispeech/ASR/local/compute_fbank_librispeech.py

+2 -2
@@ -28,7 +28,7 @@
 from pathlib import Path

 import torch
-from lhotse import CutSet, Fbank, FbankConfig, LilcomHdf5Writer
+from lhotse import ChunkedLilcomHdf5Writer, CutSet, Fbank, FbankConfig
 from lhotse.recipes.utils import read_manifests_if_cached

 from icefall.utils import get_executor
@@ -85,7 +85,7 @@ def compute_fbank_librispeech():
                 # when an executor is specified, make more partitions
                 num_jobs=num_jobs if ex is None else 80,
                 executor=ex,
-                storage_type=LilcomHdf5Writer,
+                storage_type=ChunkedLilcomHdf5Writer,
             )
             cut_set.to_json(output_dir / f"cuts_{partition}.json.gz")

egs/librispeech/ASR/local/compute_fbank_musan.py

+2 -2
@@ -28,7 +28,7 @@
 from pathlib import Path

 import torch
-from lhotse import CutSet, Fbank, FbankConfig, LilcomHdf5Writer, combine
+from lhotse import ChunkedLilcomHdf5Writer, CutSet, Fbank, FbankConfig, combine
 from lhotse.recipes.utils import read_manifests_if_cached

 from icefall.utils import get_executor
@@ -82,7 +82,7 @@ def compute_fbank_musan():
                     storage_path=f"{output_dir}/feats_musan",
                     num_jobs=num_jobs if ex is None else 80,
                     executor=ex,
-                    storage_type=LilcomHdf5Writer,
+                    storage_type=ChunkedLilcomHdf5Writer,
                 )
             )
             musan_cuts.to_json(musan_cuts_path)
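
Both feature scripts switch `storage_type` to `ChunkedLilcomHdf5Writer`. Assuming a chunked writer stores each feature matrix as fixed-size blocks of frames, so that a sub-segment can be read without loading the whole matrix, the index arithmetic looks roughly like this. The function name and the `chunk_size` of 100 frames are assumptions for illustration, and no lhotse API is called.

```python
from typing import List


def chunks_for_range(start_frame: int, end_frame: int, chunk_size: int = 100) -> List[int]:
    """Return indices of the chunks covering frames [start_frame, end_frame).

    chunk_size is an assumed value, not taken from the diff.
    """
    if chunk_size <= 0 or start_frame < 0 or end_frame < start_frame:
        raise ValueError("invalid frame range or chunk size")
    if end_frame == start_frame:
        return []
    first = start_frame // chunk_size
    last = (end_frame - 1) // chunk_size
    return list(range(first, last + 1))


if __name__ == "__main__":
    # Frames 250..429 touch only three chunks of a possibly much longer recording.
    print(chunks_for_range(250, 430))
```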
