Fix GPU CI Test - Remove bazelversion file in docker image #2318

Merged
4 commits merged on Dec 29, 2020


seanpmorgan
Member

Description

The GPU test is currently failing on master with what looks like a spurious error. It appears to be related to bazelbuild/bazel#10356.

@google-cla google-cla bot added the cla: yes label Dec 24, 2020
@WindQAQ
Member

WindQAQ commented Dec 28, 2020

Maybe try bumping our Bazel version to 3.7.2? TF bumped its Bazel version to 3.7.2 four days ago, and I think they also updated their CI environment accordingly.

@seanpmorgan
Member Author

> Maybe try bumping our Bazel version to 3.7.2? TF bumped its Bazel version to 3.7.2 four days ago, and I think they also updated their CI environment accordingly.

Hmm, no luck. Checking the container we build with, 3.1.0 is the default Bazel installation anyway:

(base) spmorgan@DESKTOP-SB3:/mnt/c/Users/SeanM/code/addons$ docker run --rm -it -v ${PWD}:/addons -w /addons gcr.io/tensorflow-testing/nosla-cuda11.0-cudnn8-ubuntu18.04-manylinux2010-multipython
root@46c6fb16da61:/addons# bazel --version
bazel 3.1.0

And switching the version in .bazelversion just yields:

 bazel build --crosstool_top=//build_deps/toolchains/gcc7_manylinux2010-nvcc-cuda11:toolchain //tensorflow_addons/...
ERROR: The project you're trying to build requires Bazel 3.7.2 (specified in /addons/.bazelversion), but it wasn't found in /usr/local/lib/bazel/bin.

Will look into this when time allows later today
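The message suggests a version-aware `bazel` launcher that reads `.bazelversion` and looks for a matching binary in `/usr/local/lib/bazel/bin`, rather than downloading one the way bazelisk would. Below is a simplified Python sketch of that lookup under those assumptions; it is not the actual launcher, and the function names plus the `bazel-<version>` and `bazel-real` file names are assumptions for illustration.

```python
from pathlib import Path
from typing import Iterable, Optional

# Directory named in the error message above; other installs may differ.
DEFAULT_BIN_DIR = "/usr/local/lib/bazel/bin"
# Assumed (hypothetical) name of the container's default, unpinned binary.
DEFAULT_BINARY = "bazel-real"


def read_pinned_version(workspace: Path) -> Optional[str]:
    """Return the version in <workspace>/.bazelversion, or None if absent or empty."""
    pin = workspace / ".bazelversion"
    if not pin.is_file():
        return None
    return pin.read_text().strip() or None


def pick_bazel_binary(
    pinned_version: Optional[str],
    installed_versions: Iterable[str],
    bin_dir: str = DEFAULT_BIN_DIR,
) -> str:
    """Return the path a version-aware launcher would run.

    Without a pin, the default install is used. With a pin, that exact
    version must already be installed; this sketch never downloads anything.
    """
    if pinned_version is None:
        return f"{bin_dir}/{DEFAULT_BINARY}"
    if pinned_version not in set(installed_versions):
        raise RuntimeError(
            f"The project you're trying to build requires Bazel {pinned_version} "
            f"(specified in .bazelversion), but it wasn't found in {bin_dir}."
        )
    return f"{bin_dir}/bazel-{pinned_version}"


# The situation in this thread: the image ships only Bazel 3.1.0.
installed = ["3.1.0"]
print(pick_bazel_binary(None, installed))  # no pin: default install
try:
    pick_bazel_binary("3.7.2", installed)  # pin to a version the image lacks
except RuntimeError as err:
    print(err)
```

Under these assumptions, removing the pin (or pinning the version the image already has) is what lets the preinstalled Bazel run.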

@bhack
Contributor

bhack commented Dec 28, 2020

Is our image still derived from the published custom_ops image?

@bhack
Contributor

bhack commented Dec 28, 2020

Because Bazel 3.7.2 is only available in the nightly custom-op images.

@seanpmorgan seanpmorgan changed the title [WIP] Fix GPU CI Test Fix GPU CI Test - Remove bazelversion file in docker image Dec 29, 2020
@seanpmorgan
Member Author

seanpmorgan commented Dec 29, 2020

@WindQAQ @bhack
It appears this issue originates from a script inside Kokoro that we have no visibility into:

+ echo KOKORO_FOUNDRY_BACKEND_ADDRESS: ''
KOKORO_FOUNDRY_BACKEND_ADDRESS:
+ source /tmpfs/src/piper/google3/learning/brain/testing/tensorflow_addons/kokoro/common.sh
++ LATEST_BAZEL_VERSION=3.7.2
+ update_bazel_linux

Ultimately we don't need a .bazelversion inside Docker containers that already have Bazel installed (though our bazelisk installs should still match that version), so this change will fix the issue.
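A hypothetical sketch of the two ideas in this comment, assuming a small Python helper run in CI: delete `.bazelversion` inside a container that already ships Bazel, and, where bazelisk reads the pin instead, check that the pinned version matches the Bazel preinstalled in the image. The helper names are illustrative and this is not the change merged in this PR.

```python
import re
from pathlib import Path


def parse_bazel_version(output: str) -> str:
    """Extract X.Y.Z from `bazel --version` output such as 'bazel 3.1.0'."""
    match = re.search(r"\bbazel\s+(\d+\.\d+\.\d+)", output)
    if match is None:
        raise ValueError(f"unrecognised `bazel --version` output: {output!r}")
    return match.group(1)


def pin_matches_installed(pinned: str, version_output: str) -> bool:
    """True if a .bazelversion pin equals the Bazel already in the image."""
    return pinned.strip() == parse_bazel_version(version_output)


def drop_bazelversion(workspace: Path) -> bool:
    """Delete <workspace>/.bazelversion so the preinstalled Bazel runs as-is.

    Returns True if a file was removed, False if there was nothing to remove.
    """
    pin = workspace / ".bazelversion"
    if pin.is_file():
        pin.unlink()
        return True
    return False


# Using the version the container reported earlier in this thread.
print(pin_matches_installed("3.1.0\n", "bazel 3.1.0"))
print(pin_matches_installed("3.7.2", "bazel 3.1.0"))
```

Keeping the version check separate from the deletion lets jobs that run outside the container still keep a pin, while container jobs rely on whatever Bazel the image provides.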

> Is our image still derived from the published custom_ops image?

Currently we're using the no-SLA image that (AFAIK) builds the TF pip package and is also used by tensorflow-io. The custom-op image for TF 2.4 should work, but it was built after we were already testing release candidates. I've set aside time tomorrow to open an issue on the current state of those containers and see if we can get the TF build team to consolidate on a modular set of Dockerfiles.

@seanpmorgan seanpmorgan changed the title Fix GPU CI Test - Remove bazelversion file in docker image [WIP]Fix GPU CI Test - Remove bazelversion file in docker image Dec 29, 2020
@seanpmorgan seanpmorgan changed the title [WIP]Fix GPU CI Test - Remove bazelversion file in docker image Fix GPU CI Test - Remove bazelversion file in docker image Dec 29, 2020
@WindQAQ WindQAQ self-requested a review December 29, 2020 04:50
Member

@WindQAQ WindQAQ left a comment


Thanks for the investigation!

@WindQAQ WindQAQ merged commit 2dc6681 into tensorflow:master Dec 29, 2020
@seanpmorgan seanpmorgan deleted the fix-gpu-cloud-test branch December 29, 2020 05:13