-
Notifications
You must be signed in to change notification settings - Fork 572
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[jobs] two-hop file_mounts upload #4708
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for quickly adding this @cg505! Would it be possible for us to add a smoke test for this?
sky/jobs/controller.py
Outdated
if os.path.isdir(path): | ||
shutil.rmtree(path) | ||
else: | ||
os.remove(path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we catch any potential file not exist issue to avoid going into FAILED_CONTROLLER
?
sky/jobs/server/core.py
Outdated
for task_ in dag.tasks: | ||
controller_utils.maybe_translate_local_file_mounts_and_sync_up( | ||
task_, task_type='jobs') | ||
if storage_lib.get_cached_enabled_storage_clouds_or_refresh(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This call can be moved outside the loop, i.e. only need to be called once.
sky/jobs/server/core.py
Outdated
'storage is available. Please specify local ' | ||
'file_mounts only.') | ||
|
||
first_hop_file_mounts = ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This variable name is a bit confusing. How about user_file_mounts_to_controller
?
sky/jobs/server/core.py
Outdated
@@ -119,6 +144,7 @@ def launch( | |||
vars_to_fill = { | |||
'remote_user_yaml_path': remote_user_yaml_path, | |||
'user_yaml_path': f.name, | |||
'two_hop_file_mounts': first_hop_file_mounts, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'two_hop_file_mounts': first_hop_file_mounts, | |
'user_file_mounts_to_controller': user_file_mounts_to_controller, |
sky/utils/controller_utils.py
Outdated
Local machine/API server Controller Job cluster | ||
------------------------ ----------------------- -------------------- | ||
| local path -----|--|-> controller path --|--|-> job dst path | | ||
------------------------ ----------------------- -------------------- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor
Local machine/API server Controller Job cluster | |
------------------------ ----------------------- -------------------- | |
| local path -----|--|-> controller path --|--|-> job dst path | | |
------------------------ ----------------------- -------------------- | |
Local machine/API server Controller Job cluster | |
------------------------ ----------------------- -------------------- | |
| local path --|--|-> controller path --|--|-> job dst path | | |
------------------------ ----------------------- -------------------- |
sky/utils/controller_utils.py
Outdated
This strategy will upload the local files to the controller first, using | ||
a normal rsync as part of sky.launch(). Then, when the controller launches | ||
the task, it will also use local file_mounts from the destination path of | ||
the first hop. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This strategy will upload the local files to the controller first, using | |
a normal rsync as part of sky.launch(). Then, when the controller launches | |
the task, it will also use local file_mounts from the destination path of | |
the first hop. | |
This strategy will upload the local files to the controller first, using | |
a normal rsync as part of sky.launch() for the controller. Then, when the | |
controller launches the task, it will also use local file_mounts from the | |
destination path of the first hop. |
We can probably hold off this PR from being merged until the client-server branch is merged. Just to make a clean cut for the different functionality and easier for maintanence later. : ) |
Co-authored-by: Zhanghao Wu <zhanghao.wu@outlook.com>
/quicktest-core |
/smoke-tests --managed-jobs |
It's very difficult because we need the jobs controller to disable all the clouds except k8s, but it probably already exists. |
Avoids the need for a cloud bucket for simple file mounts in managed jobs, by syncing files over two hops:
See
translate_local_file_mounts_to_two_hop()
:Tested (run the relevant ones):
bash format.sh
pytest tests/test_smoke.py
pytest tests/test_smoke.py::test_fill_in_the_name
conda deactivate; bash -i tests/backward_compatibility_tests.sh