Add support for HubertForCTC #347

jimypbr · 2023-04-20T09:40:06Z

What does this PR do?

Adds support for the HubertForCTC model.

It can be fine-tuned on LibriSpeech with the following command:

python examples/speech-recognition/run_speech_recognition_ctc.py \
    --dataset_name "librispeech_asr" \
    --dataset_config_name "clean" \
    --train_split_name "train.100" \
    --eval_split_name "validation" \
    --model_name_or_path facebook/hubert-base-ls960 \
    --ipu_config_name Graphcore/wav2vec2-ctc-base-ipu \
    --ipu_config_overrides "device_iterations=16,inference_device_iterations=16" \
    --mask_time_prob 0.0 \
    --mask_feature_prob 0.0 \
    --output_dir "./hubert-base-960h" \
    --overwrite_output_dir \
    --length_column_name "input_length" \
    --num_train_epochs 5 \
    --learning_rate "3e-4" \
    --warmup_steps 400 \
    --evaluation_strategy "steps" \
    --text_column_name "text" \
    --save_steps 400 \
    --eval_steps 400 \
    --logging_steps 10 \
    --save_total_limit 1 \
    --freeze_feature_encoder \
    --do_train \
    --do_eval \
    --layerdrop 0.0 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 2 \
    --pod_type pod8 \
    --adam_beta1 0.9 \
    --adam_beta2 0.98 \
    --adam_epsilon 0.0001 \
    --report_to wandb \
    --dataloader_drop_last \
    --dataloader_mode "async_rebatched" \
    --dataloader_num_workers 8
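
The `--ipu_config_overrides` value in the command above is a comma-separated list of `key=value` pairs. As a rough illustration of how such a string maps onto config fields, here is a minimal sketch. `parse_overrides` is a hypothetical helper written for this example, not the parser that optimum-graphcore actually uses, and it assumes values are either integers or plain strings.

```python
def parse_overrides(overrides: str) -> dict:
    """Split "k1=v1,k2=v2" into a dict, converting integer-looking values to int."""
    parsed = {}
    for pair in overrides.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate empty segments such as a trailing comma
        key, sep, value = pair.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"expected key=value, got {pair!r}")
        value = value.strip()
        try:
            parsed[key.strip()] = int(value)
        except ValueError:
            # Assumption: anything that is not an integer is kept as a string.
            parsed[key.strip()] = value
    return parsed


overrides = parse_overrides("device_iterations=16,inference_device_iterations=16")
print(overrides)
```

With the string from the command, both `device_iterations` and `inference_device_iterations` come out as the integer 16.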

Fixes # (issue)

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you make sure to update the documentation with your changes?
Did you write any new necessary tests?

HuggingFaceDocBuilderDev · 2023-04-20T09:49:58Z

The documentation is not available anymore as the PR was closed or merged.

katalinic-gc · 2023-04-20T11:51:58Z

optimum/graphcore/models/hubert/modeling_hubert.py

+        Undo the changes to the model done by `parallelize`.
+        """
+        super().deparallelize()
+        self.change_hubert_encoder_class(True)


In parallelize, the feature encoder gets frozen; does any "unfreezing" need to happen here?
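
To make the question concrete: if parallelize freezes the feature encoder by turning off gradients on its parameters, a symmetric deparallelize could restore whatever state those parameters had beforehand. The sketch below shows that snapshot-and-restore pattern on plain Python stand-ins. `Param`, `ToyModel`, and `_saved_requires_grad` are hypothetical names used only for illustration; this is not the code in this PR, and whether the real model needs this is exactly what the comment asks.

```python
from dataclasses import dataclass, field


@dataclass
class Param:
    # Stand-in for a model parameter; only the trainable flag matters here.
    requires_grad: bool = True


@dataclass
class ToyModel:
    # Hypothetical feature encoder with one trainable and one already-frozen parameter.
    feature_encoder: dict = field(
        default_factory=lambda: {"conv0.weight": Param(), "conv0.bias": Param(False)}
    )
    _saved_requires_grad: dict = field(default_factory=dict)

    def parallelize(self):
        # Record each flag before freezing so the change can be undone later.
        self._saved_requires_grad = {
            name: p.requires_grad for name, p in self.feature_encoder.items()
        }
        for p in self.feature_encoder.values():
            p.requires_grad = False
        return self

    def deparallelize(self):
        # Restore the flags recorded by parallelize, if any were recorded.
        for name, flag in self._saved_requires_grad.items():
            self.feature_encoder[name].requires_grad = flag
        self._saved_requires_grad = {}
        return self


model = ToyModel().parallelize()
frozen = [p.requires_grad for p in model.feature_encoder.values()]
model.deparallelize()
restored = [p.requires_grad for p in model.feature_encoder.values()]
print(frozen, restored)
```

Restoring the recorded flags, rather than setting everything back to trainable, keeps parameters that were frozen before parallelize frozen afterwards.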

jimypbr mentioned this pull request Apr 20, 2023

HuBERT integration available? #325

Closed

katalinic-gc reviewed Apr 20, 2023

jimypbr added 5 commits April 24, 2023 15:48

Add support for HubertForCTC

66fdcff

style fix

275fc8f

Update README table

26b602d

clean up code

baf7437

Fix tests for Hubert ctc

70f3456

jimypbr force-pushed the hubert-ctc branch from 43e91ef to 70f3456 April 24, 2023 15:48

jimypbr merged commit 869cd06 into main Apr 24, 2023

jimypbr deleted the hubert-ctc branch April 24, 2023 19:33
