
HPU support #3378

Open · IlyasMoutawwakil wants to merge 116 commits into base: main
Conversation

@IlyasMoutawwakil (Member):

What does this PR do?

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guidelines, Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@muellerzr (Collaborator) left a comment:

Thanks a bunch! From the accelerate side this looks fine; do you want to do big model inference while you're at it? (If not, all good.)

I assume `accelerate test` etc. went well? 🤗

@muellerzr muellerzr requested a review from SunMarc February 5, 2025 21:38
@IlyasMoutawwakil IlyasMoutawwakil marked this pull request as draft February 6, 2025 07:50
@IlyasMoutawwakil (Member, Author):

@muellerzr I forgot to mark it as a draft 😅.
I'm still debugging some issues on both the accelerate and optimum-habana sides; it should be ready by next week.

@SunMarc (Member) left a comment:

Nice, thanks for the PR! +1 for big model inference as well, possibly in a follow-up PR.

@muellerzr (Collaborator):

No worries @IlyasMoutawwakil! Just let us know when you're all set to go 🫡

@@ -28,7 +28,7 @@ test_big_modeling:

test_core:
python -m pytest -s -v ./tests/ --ignore=./tests/test_examples.py --ignore=./tests/deepspeed --ignore=./tests/test_big_modeling.py \
-    --ignore=./tests/fsdp --ignore=./tests/test_cli.py $(if $(IS_GITHUB_CI),--report-log "$(PYTORCH_VERSION)_core.log",)
+    --ignore=./tests/fsdp --ignore=./tests/tp --ignore=./tests/test_cli.py $(if $(IS_GITHUB_CI),--report-log "$(PYTORCH_VERSION)_core.log",)
@IlyasMoutawwakil (Member, Author):

Not sure TP should be part of test_core; tell me if you want me to revert this.

@SunMarc (Member):

Yeah, I don't think we want that. cc @muellerzr

@regisss (Contributor) left a comment:

LGTM!

@@ -295,7 +298,7 @@ class FSDPIntegrationTest(TempDirTestCase):

     def setUp(self):
         super().setUp()
-        self.performance_lower_bound = 0.82
+        self.performance_lower_bound = 0.82 if torch.cuda.is_available() else 0.70
@regisss (Contributor):

I guess 0.70 is specific to HPU? Maybe this should rather be `0.70 if is_hpu_available() else 0.82`?
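
A minimal sketch of what that suggestion could look like in the test setup (the import paths are assumptions, not taken from the diff):

```python
from accelerate.test_utils.testing import TempDirTestCase  # assumed import path
from accelerate.utils import is_hpu_available  # assumed helper location


class FSDPIntegrationTest(TempDirTestCase):
    def setUp(self):
        super().setUp()
        # Key the looser bound off HPU availability rather than the absence of CUDA.
        self.performance_lower_bound = 0.70 if is_hpu_available() else 0.82
```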

@IlyasMoutawwakil IlyasMoutawwakil marked this pull request as ready for review February 25, 2025 09:46
@SunMarc (Member) left a comment:

Thanks for this nice integration. Left a couple of comments


Comment on lines 1524 to 1525
else:
    device_ids, output_device = [self.device.index], self.device.index
@SunMarc (Member):

Why?

@IlyasMoutawwakil (Member, Author):

Ah yes, this should only be done in the case of HPU. With HPU, each process accesses its respective device as hpu:0; device allocation for processes is managed at a lower level. @regisss
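
A minimal sketch of the behavior being described, as a hypothetical helper (not the PR's actual code; the name is illustrative):

```python
import torch


def ddp_device_kwargs(device: torch.device) -> dict:
    # Each HPU process acquires exactly one card and always sees it as index 0,
    # since the card-to-process assignment happens at a lower level; other
    # backends address their card by its local index.
    index = 0 if device.type == "hpu" else device.index
    return {"device_ids": [index], "output_device": index}
```

These kwargs would then be forwarded to `torch.nn.parallel.DistributedDataParallel`.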

Comment on lines +1900 to +1914
if self.device.type == "hpu":
    # This env variable is initialized here to make sure it is set to "true"
    # It should be done by the launcher but it does not work for multi-node runs
    os.environ["DEEPSPEED_USE_HPU"] = "true"

    # This should be verified in the config validation and not here
    if (
        self.deepspeed_config["zero_optimization"].get("offload_optimizer", {}).get("device", "none")
        != "none"
        and os.environ.get("PT_HPU_LAZY_MODE", "1") == "1"
    ):
        raise ValueError(
            "You can't use an Offload Optimizer with HPU in Lazy Mode. "
            "Please set the environment variable `PT_HPU_LAZY_MODE` to `0`."
        )
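
As the inline comment in the diff itself notes, this check could live in config validation instead; a hypothetical standalone version might look like this (the function name and placement are assumptions):

```python
import os


def validate_hpu_deepspeed_config(deepspeed_config: dict) -> None:
    # Optimizer offload on HPU requires eager mode, i.e. PT_HPU_LAZY_MODE=0.
    offload_device = (
        deepspeed_config.get("zero_optimization", {})
        .get("offload_optimizer", {})
        .get("device", "none")
    )
    if offload_device != "none" and os.environ.get("PT_HPU_LAZY_MODE", "1") == "1":
        raise ValueError(
            "You can't use an Offload Optimizer with HPU in Lazy Mode. "
            "Please set the environment variable `PT_HPU_LAZY_MODE` to `0`."
        )
```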

@@ -453,6 +471,13 @@ def require_torchdata_stateful_dataloader(test_case):
)(test_case)


def launches_subprocesses(test_case):
@SunMarc (Member):

I think it would be better to rename it `run_first` and explain in the docstring in which cases this might be useful.
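
One possible shape for that rename, with the reasoning spelled out in the docstring (this sketch assumes the pytest-order plugin; the PR may implement it differently):

```python
import pytest


def run_first(test_case):
    """Run this test before the others collected in the session.

    Useful for tests that launch subprocesses (e.g. distributed runs), which can
    be disturbed by state left behind by earlier tests, such as busy ports,
    environment variables, or device handles.
    """
    return pytest.mark.order("first")(test_case)
```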

@require_huggingface_suite
@require_multi_device
@slow
def test_train_multiple_models(self):
    self.test_file_path = self.test_scripts_folder / "test_ds_multiple_model.py"
    args = ["--num_processes=2", "--num_machines=1", "--main_process_port=0", str(self.test_file_path)]
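
For context on the `--main_process_port=0` argument above: port 0 conventionally asks the OS for any free port, which helps concurrently launched distributed tests avoid port collisions. A small standalone illustration of that OS behavior (not code from the PR):

```python
import socket


def get_free_port() -> int:
    # Binding to port 0 lets the OS pick an unused port; reading the socket name
    # back shows which port was assigned.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]
```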

On the same `performance_lower_bound` change quoted earlier:
@SunMarc (Member):

+1 for regis suggestion

Comment on lines +742 to +746

if is_hpu_available():
    device_map = {"linear1": "cpu", "linear2": "disk", "batchnorm": "cpu", "linear3": 0, "linear4": 0}
else:
    device_map = {"linear1": "cpu", "linear2": "disk", "batchnorm": "cpu", "linear3": 0, "linear4": 1}
@SunMarc (Member):

Is it because you don't have access to multiple HPUs?

@IlyasMoutawwakil (Member, Author) commented on Feb 25, 2025:

One process can only acquire one HPU device, and within the context of that process, the id of that device is 0. If you put something on 'hpu:1', you just get a warning and the tensor is placed on 'hpu:0'.
But it seems like 'hpu:x' will be supported at some point (based on a comment in the optimum-habana repo), @regisss?

@regisss (Contributor):

Hmm yeah, I guess XD
But there is no ETA AFAIK, so I wouldn't expect to be able to use hpu:x in the near future.
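
A tiny illustration of the behavior Ilyas describes (this assumes a Gaudi machine with `habana_frameworks` installed; the exact warning text is not guaranteed):

```python
import torch
import habana_frameworks.torch.core  # noqa: F401  # registers the "hpu" device

# Requesting a non-zero HPU index from a process that has acquired a single card
# is expected to warn and fall back to hpu:0.
t = torch.ones(2, 2, device="hpu:1")
print(t.device)  # expected: hpu:0
```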
