use scale=1.0 in floats_tensor called in speech model testers #17007

ydshieh · 2022-04-29T10:28:06Z

What does this PR do?

Fix the failure of Speech2TextModelTest.test_pt_tf_model_equivalence. This is caused by

transformers/tests/speech_to_text/test_modeling_speech_to_text.py

Lines 134 to 136 in e6f00a1

    
    input_features = floats_tensor(
        [self.batch_size, self.seq_length, self.input_feat_per_channel], self.vocab_size
    )

where input_features ends up with a magnitude on the order of 1e2 (from self.vocab_size=99).

(probably this happens because we just copied the input_ids = ids_tensor([self.batch_size, self.seq_length], self.vocab_size) from NLP models?)

I changed it to scale=1.0, but need @patrickvonplaten's expertise to make sure there was no particular reason to use self.vocab_size.

Details

Current speech model testers have

def prepare_config_and_inputs(self):
    input_values = floats_tensor([self.batch_size, self.seq_length], self.vocab_size)

The self.vocab_size argument is passed as the scale, so the generated dummy input_values have a magnitude of up to self.vocab_size.
For Speech2TextModelTester, we have vocab_size=99.
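To illustrate why that positional argument matters, here is a minimal pure-Python sketch of how a helper like floats_tensor applies its scale. floats_list is a hypothetical stand-in written for this illustration: it returns a flat list rather than a torch tensor, and it assumes values are drawn uniformly from [0, 1) and then multiplied by scale. It is not the actual test utility.

```python
import random


def floats_list(shape, scale=1.0, rng=None):
    """Hypothetical stand-in for floats_tensor: flat list of floats in [0, scale)."""
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is reproducible
    total = 1
    for dim in shape:
        total *= dim
    return [rng.random() * scale for _ in range(total)]


batch_size, seq_length, vocab_size = 2, 8, 99

# Passing vocab_size as the second positional argument makes it the scale.
large = floats_list([batch_size, seq_length], vocab_size)

# The fix: ask for unit-scale inputs explicitly.
small = floats_list([batch_size, seq_length], scale=1.0)

print(max(large), max(small))
```

With the same seed, every value in large is 99 times the matching value in small, which is the extra factor the model then has to carry through its layers.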

Furthermore, Speech2TextEncoder has

transformers/src/transformers/models/speech_to_text/modeling_speech_to_text.py

Line 705 in e6f00a1

self.embed_scale = math.sqrt(embed_dim) if config.scale_embedding else 1.0

and from the tester's hidden_size=16, we get embed_scale=4.

The input_features go through the conv layer(s) and are then scaled:

transformers/src/transformers/models/speech_to_text/modeling_speech_to_text.py

Lines 767 to 768 in e6f00a1

    
    inputs_embeds = self.conv(input_features)
    inputs_embeds = self.embed_scale * inputs_embeds

On CPU, however, the PT and TF conv layers give outputs that differ by about 1e-7 for input values of magnitude around 1. With the two scalings above (99 from the input scale and 4 from embed_scale), this difference grows to about 4e-5, and the PT/TF equivalence test fails.
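The arithmetic behind that estimate can be sketched as follows. This is a rough back-of-the-envelope calculation, assuming the PT/TF difference grows about linearly with the input magnitude. The tolerance value assumed_tol is an assumption made for illustration and is not stated in this PR.

```python
import math

base_diff = 1e-7   # PT/TF conv difference for inputs of order 1 (from the description)
input_scale = 99   # self.vocab_size passed as the scale
hidden_size = 16   # tester's hidden_size

# Same formula as in Speech2TextEncoder when config.scale_embedding is set.
embed_scale = math.sqrt(hidden_size)

# Estimated difference before and after the fix (scale=1.0).
diff_before = base_diff * input_scale * embed_scale
diff_after = base_diff * 1.0 * embed_scale

assumed_tol = 1e-5  # assumption: tolerance used by the equivalence check

print(f"embed_scale={embed_scale}")
print(f"before fix: {diff_before:.2e}, exceeds tolerance: {diff_before > assumed_tol}")
print(f"after fix:  {diff_after:.2e}, exceeds tolerance: {diff_after > assumed_tol}")
```

Under these assumptions the estimate drops from about 4e-5 to about 4e-7 once the inputs are generated with scale=1.0.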

HuggingFaceDocBuilderDev · 2022-04-29T10:44:41Z

The documentation is not available anymore as the PR was closed or merged.

patrickvonplaten · 2022-04-29T11:15:17Z

tests/data2vec/test_modeling_data2vec_audio.py

@@ -116,7 +116,7 @@ def __init__(
        self.adapter_output_seq_length = (self.output_seq_length - 1) // adapter_stride + 1

    def prepare_config_and_inputs(self):
-        input_values = floats_tensor([self.batch_size, self.seq_length], self.vocab_size)
+        input_values = floats_tensor([self.batch_size, self.seq_length], scale=1.0)


wow good catch!

patrickvonplaten

You're 100% right - this was indeed a bad copy paste!

patrickvonplaten · 2022-04-29T11:16:23Z

Thanks for fixing all the tests!

sgugger

Nice fix! Thanks a lot!

use scale=1.0 in floats_tensor called in speech model testers

9a22702

ydshieh requested review from patrickvonplaten and sgugger April 29, 2022 10:28

patrickvonplaten reviewed Apr 29, 2022

patrickvonplaten approved these changes Apr 29, 2022

sgugger approved these changes Apr 29, 2022

ydshieh merged commit e952e04 into huggingface:main Apr 29, 2022

ydshieh deleted the fix_speech_to_text_ci_failure branch April 29, 2022 12:41

stevhliu pushed a commit to stevhliu/transformers that referenced this pull request May 3, 2022

use scale=1.0 in floats_tensor called in speech model testers (huggin…

db85031

…gface#17007) Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

elusenji pushed a commit to elusenji/transformers that referenced this pull request Jun 12, 2022

use scale=1.0 in floats_tensor called in speech model testers (huggin…

30034dc

…gface#17007) Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
