Commit c4e14ad

Rename "pipeline" to "cleaners"
No need to introduce new terminology.
1 parent 9e1ea7a commit c4e14ad

4 files changed: +18 -18 lines changed

TRAINING_DATA.md

+5 -5
@@ -50,21 +50,21 @@ following the example of the other preprocessors in that file.
 ### Non-English Data
 
 If your training data is in a language other than English, you will probably want to change the
-text cleaning pipeline by setting the `cleaners` hyperparameter.
+text cleaners by setting the `cleaners` hyperparameter.
 
 * If your text is in a Latin script or can be transliterated to ASCII using the
 [Unidecode](https://pypi.python.org/pypi/Unidecode) library, you can use the transliteration
-pipeline by setting the hyperparameter `cleaners=transliteration_pipeline`.
+cleaners by setting the hyperparameter `cleaners=transliteration_cleaners`.
 
 * If you don't want to transliterate, you can define a custom character set.
 This allows you to train directly on the character set used in your data.
 
 To do so, edit [symbols.py](text/symbols.py) and change the `_characters` variable to be a
-string containing the UTF-8 characters in your data. Then set the hyperparameter `cleaners=basic_pipeline`.
+string containing the UTF-8 characters in your data. Then set the hyperparameter `cleaners=basic_cleaners`.
 
-* If you're not sure which option to use, you can evaluate the transliteration pipeline like so:
+* If you're not sure which option to use, you can evaluate the transliteration cleaners like this:
 
 ```python
 from text import cleaners
-cleaners.transliteration_pipeline('Здравствуйте') # Replace with the text you want to try
+cleaners.transliteration_cleaners('Здравствуйте') # Replace with the text you want to try
 ```
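
For the custom character set option described in this file, it can help to check which characters in the training text are missing from the `_characters` string before training. The sketch below is one possible way to do that. `missing_characters` and `example_characters` are hypothetical names used only for illustration and are not part of the repository.

```python
# Hypothetical helper (not part of the repository): report characters that
# appear in the training text but are absent from a candidate character set.
def missing_characters(corpus_lines, characters):
    """Return the sorted characters found in corpus_lines but not in characters."""
    allowed = set(characters)
    seen = set()
    for line in corpus_lines:
        # basic_cleaners lowercases text, so compare lowercase forms.
        seen.update(line.lower())
    return sorted(seen - allowed)


# Example character set (an assumption); a real `_characters` string would
# list every symbol that occurs in your data.
example_characters = 'abcdefghijklmnopqrstuvwxyz .,!?'

if __name__ == '__main__':
    print(missing_characters(['Hello, World!', 'Route 66'], example_characters))
```

Any character the helper reports would need to be added to `_characters`, or removed from the data, before training with `cleaners=basic_cleaners`.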

hparams.py

+2 -2
@@ -4,8 +4,8 @@
 # Default hyperparameters:
 hparams = tf.contrib.training.HParams(
   # Comma-separated list of cleaners to run on text prior to training and eval. For non-English
-  # text, you may want to use "basic_pipeline" or "transliteration_pipeline" See TRAINING_DATA.md.
-  cleaners='english_pipeline',
+  # text, you may want to use "basic_cleaners" or "transliteration_cleaners" See TRAINING_DATA.md.
+  cleaners='english_cleaners',
 
   # Audio:
   num_mels=80,
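
The comment in this hunk describes `cleaners` as a comma-separated list of cleaner names. A minimal sketch of how such a string could be split and applied in order is shown below. The `CLEANERS` registry and the `apply_cleaners` function are assumptions made for illustration, and the two cleaner functions are simplified stand-ins; the repository's own lookup may work differently.

```python
import re


def lowercase(text):
    return text.lower()


def collapse_whitespace(text):
    # Replace any run of whitespace with a single space.
    return re.sub(r'\s+', ' ', text)


# Hypothetical registry mapping cleaner names to functions (an assumption,
# not the repository's mechanism).
CLEANERS = {
    'lowercase': lowercase,
    'collapse_whitespace': collapse_whitespace,
}


def apply_cleaners(text, cleaners_hparam):
    """Apply each cleaner named in a comma-separated string, left to right."""
    names = [name.strip() for name in cleaners_hparam.split(',') if name.strip()]
    for name in names:
        if name not in CLEANERS:
            raise ValueError('Unknown cleaner: %s' % name)
        text = CLEANERS[name](text)
    return text


if __name__ == '__main__':
    print(apply_cleaners('Hello   World', 'lowercase, collapse_whitespace'))
```

Failing loudly on an unknown name is a design choice in this sketch: after a rename such as this commit's, a stale value like `english_pipeline` would raise an error instead of being silently skipped.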

tests/text_test.py

+5 -5
@@ -14,7 +14,7 @@ def test_text_to_sequence():
   assert text_to_sequence('"A"_B', []) == [2, 3, 1]
   assert text_to_sequence('A {AW1 S} B', []) == [2, 64, 83, 132, 64, 3, 1]
   assert text_to_sequence('Hi', ['lowercase']) == [35, 36, 1]
-  assert text_to_sequence('A {AW1 S} B', ['english_pipeline']) == [28, 64, 83, 132, 64, 29, 1]
+  assert text_to_sequence('A {AW1 S} B', ['english_cleaners']) == [28, 64, 83, 132, 64, 29, 1]
 
 
 def test_sequence_to_text():
@@ -52,9 +52,9 @@ def test_expand_numbers():
   assert cleaners.expand_numbers('$3.50 for gas.') == 'three dollars, fifty cents for gas.'
 
 
-def test_pipelines():
+def test_cleaner_pipelines():
   text = 'Mr. Müller ate 2 Apples'
-  assert cleaners.english_pipeline(text) == 'mister muller ate two apples'
-  assert cleaners.transliteration_pipeline(text) == 'mr. muller ate 2 apples'
-  assert cleaners.basic_pipeline(text) == 'mr. müller ate 2 apples'
+  assert cleaners.english_cleaners(text) == 'mister muller ate two apples'
+  assert cleaners.transliteration_cleaners(text) == 'mr. muller ate 2 apples'
+  assert cleaners.basic_cleaners(text) == 'mr. müller ate 2 apples'
 

text/cleaners.py

+6 -6
@@ -3,10 +3,10 @@
 
 Cleaners can be selected by passing a comma-delimited list of cleaner names as the "cleaners"
 hyperparameter. Some cleaners are English-specific. You'll typically want to use:
-1. "english_pipeline" for English text
-2. "transliteration_pipeline" for non-English text that can be transliterated to ASCII using
+1. "english_cleaners" for English text
+2. "transliteration_cleaners" for non-English text that can be transliterated to ASCII using
 the Unidecode library (https://pypi.python.org/pypi/Unidecode)
-3. "basic_pipeline" if you do not want to transliterate (in this case, you should also update
+3. "basic_cleaners" if you do not want to transliterate (in this case, you should also update
 the symbols in symbols.py to match your data).
 '''
 
@@ -63,22 +63,22 @@ def convert_to_ascii(text):
   return unidecode(text)
 
 
-def basic_pipeline(text):
+def basic_cleaners(text):
   '''Basic pipeline that lowercases and collapses whitespace without transliteration.'''
   text = lowercase(text)
   text = collapse_whitespace(text)
   return text
 
 
-def transliteration_pipeline(text):
+def transliteration_cleaners(text):
   '''Pipeline for non-English text that transliterates to ASCII.'''
   text = convert_to_ascii(text)
   text = lowercase(text)
   text = collapse_whitespace(text)
   return text
 
 
-def english_pipeline(text):
+def english_cleaners(text):
   '''Pipeline for English text, including number and abbreviation expansion.'''
   text = convert_to_ascii(text)
   text = lowercase(text)
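
To see how the renamed `basic_cleaners` and `transliteration_cleaners` differ, the following self-contained sketch reproduces their visible steps from this hunk. It assumes a standard-library stand-in for `convert_to_ascii`, because the real function calls the third-party Unidecode library. The stand-in only strips accents from Latin letters and silently drops other non-ASCII characters, so it does not transliterate scripts such as Cyrillic the way Unidecode does. The helper bodies for `lowercase` and `collapse_whitespace` are likewise assumptions.

```python
import re
import unicodedata


def convert_to_ascii(text):
    # Stand-in for unidecode(text) (an assumption): decompose accented
    # letters, then drop every character that is not ASCII.
    decomposed = unicodedata.normalize('NFKD', text)
    return decomposed.encode('ascii', 'ignore').decode('ascii')


def lowercase(text):
    return text.lower()


def collapse_whitespace(text):
    return re.sub(r'\s+', ' ', text)


def basic_cleaners(text):
    """Lowercase and collapse whitespace, keeping non-ASCII characters."""
    text = lowercase(text)
    text = collapse_whitespace(text)
    return text


def transliteration_cleaners(text):
    """Convert to ASCII first, then lowercase and collapse whitespace."""
    text = convert_to_ascii(text)
    text = lowercase(text)
    text = collapse_whitespace(text)
    return text


if __name__ == '__main__':
    sample = 'Mr. Müller ate 2 Apples'
    print(basic_cleaners(sample))
    print(transliteration_cleaners(sample))
```

With the sample sentence from `tests/text_test.py`, the only difference between the two results is whether the "ü" survives.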
