Remote caching should be a strategy #18245
Comments
For point #2, would changing AbstractSpawnStrategy.java:134 do it?
I have a similar use case where I would like to configure different endpoints for the remote cache and the remote executor. For example, I have a remote cache service A but a remote executor B, and remote executor B has its own remote cache implementation. What I want is for Bazel to check remote cache A for a hit; if there is none, Bazel should fall back to remote execution, upload all inputs, and let the remote executor perform the job. Is this possible in the current version?
This is possible today. The caveat is that your remote executor needs to upload its execution results to your remote cache service; otherwise, Bazel will fail to fetch them after remote execution.
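For reference, a minimal .bazelrc sketch of such a setup, assuming gRPC endpoints; the hostnames below are placeholders:

  # Placeholder endpoints: cache service A and remote executor B.
  build --remote_cache=grpcs://cache-a.example.com
  build --remote_executor=grpcs://executor-b.example.com

With both flags set, Bazel performs cache lookups against the --remote_cache endpoint and falls back to remote execution on a miss, which is the behavior the comments above describe.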
This new flag is similar in spirit to --remote_accept_cached but allows being more selective about what's accepted and what's not.

The specific problem I face is the following: we have a setup where we want to use dynamic execution for performance reasons. However, we know some actions in our build (those run by rules_foreign_cc) are not deterministic. To mitigate this, we force the actions that we know are not deterministic to run remotely, without dynamic execution, as this will prevent exposing the non-determinism for as long as they are cached and until we can fix their problems.

However, we still observe non-deterministic actions in the build and we need to diagnose what those are. To do this, I need to run two builds and compare their execlogs. And I need these builds to continue to reuse the non-deterministic artifacts we _already_ know about from the cache, but to rerun other local actions from scratch. Unfortunately, the fact that "remote-cache" is not a strategy (see bazelbuild#18245) makes this very difficult to do because, even if I configure certain actions to run locally unconditionally, the spawn strategy insists on checking the remote cache for them. With this new flag, I can run a build where the remote actions remain remote but where I disable the dynamic scheduler and force the remaining actions to re-run locally.

I'm marking the flag as experimental because this feels like a huge kludge to paper over the fact that the remote cache should really be a strategy, but isn't. In other words: this flag should go away with a better rearchitecting of the remote caching interface.

Upstream PR: bazelbuild#18944 (Julio Merino <julio.merino+oss@snowflake.com>, Fri Jul 14 10:32:41 2023 -0700)
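For illustration only, a .bazelrc sketch of the setup described above, under these assumptions: CMakeConfigure is a placeholder for the real rules_foreign_cc mnemonics, dynamic execution is enabled via --internal_spawn_scheduler, and the experimental flag from the PR itself is omitted since its name is not stated here.

  # Force the known non-deterministic actions (placeholder mnemonic) to
  # run remotely only, outside the dynamic scheduler.
  build --strategy=CMakeConfigure=remote

  # Normal builds: dynamic execution for everything else.
  build:normal --internal_spawn_scheduler
  build:normal --spawn_strategy=dynamic

  # Diagnostic builds: no dynamic scheduler; everything else runs locally.
  # Without the new flag, Bazel still consults the remote cache for these
  # local actions, which is the limitation described above.
  build:diagnose --spawn_strategy=local

The diagnostic build would then be invoked with --config=diagnose, and two such builds compared via their execution logs.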
Thank you for contributing to the Bazel repository! This issue has been marked as stale since it has not had any activity for more than a year. It will be closed in the next 90 days unless any other activity occurs. If you think this issue is still relevant and should stay open, please post a comment here and the issue will no longer be marked as stale.
Just wanted to add that I think this is an important feature. For projects with smaller codebases but large dependencies, remote caching is only a win when it is limited to the large dependencies; the small projects can be built much faster locally, and incremental runtimes become dominated by artifact uploading. Today, the only real solution is to manually tag everything with "no-remote-cache-upload", which is not practical.
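As an aside, a sketch of a less manual way to apply that execution requirement broadly, assuming your Bazel version honors no-remote-cache-upload as an execution-info key and supports --modify_execution_info; the mnemonics in the regex are placeholders:

  # Keep reading from the remote cache, but skip uploading outputs for the
  # selected (placeholder) mnemonics instead of tagging each target by hand.
  build --modify_execution_info=^(Javac|CppCompile)$=+no-remote-cache-upload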
Description of the feature request:
When configuring Bazel to use a remote cache without remote execution, the remote cache is not exposed as a strategy as far as I can tell. (Bazel does print remote-cache in the list of strategies when scheduling an action, but this value cannot be explicitly selected via the strategy flags, so it is misleading.) The use of a remote cache should be a strategy that works well with any other strategy, such as dynamic, and that can be selectively enabled for individual actions.
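To make the request concrete, here is a sketch contrasting the strategy selection that works today with the kind of composition this issue asks for; the second form is hypothetical and not accepted by current Bazel:

  # Works today: pick among existing strategies, globally or per mnemonic.
  build --spawn_strategy=remote,worker,sandboxed,local
  build --strategy=Genrule=local

  # Requested (hypothetical, rejected by current Bazel): compose the
  # remote cache as a strategy that can be enabled per action.
  # build --strategy=Genrule=remote-cache,local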
What underlying problem are you trying to solve with this feature?
We are facing two issues when remote caching is enabled: the remote-cache magic identifier cannot be passed to --strategy (more context in Expose workspace provenance for strategy selection #18244).
Which operating system are you running Bazel on?
N/A
What is the output of bazel info release?
bazel-6.1.1
If bazel info release returns development version or (@non-git), tell us how you built Bazel.
No response
What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD?
No response
Have you found anything relevant by searching the web?
No response
Any other information, logs, or outputs that you want to share?
No response