GCloud tf-gpu Image and tfa compatibility? #676
Comments
To investigate whether this error was somehow related to the host machine, I re-ran the same container on a GCloud Compute instance with Ubuntu 18.04, a P100 GPU, and CUDA 10.0 installed. I get the same import error on that machine as on my non-CUDA Mac desktop.
I saw the same error on Kaggle when running "pip install tensorflow-addons" in a non-GPU Kaggle notebook. If you enable the GPU, the installation succeeds, probably because the underlying image is different.
Thanks for the info! Yes, this is generally because TensorFlow has an ABI incompatibility with the way we package our pip wheel. This issue will one day be resolved by an RFC that is already underway. In the meantime, we can look at that docker image and see how it was compiling TF. Does anyone know where we can access the Dockerfiles for these?
An update: I've changed my setup to use the TensorFlow/CUDA "Deep Learning VM" boot disk from GCP directly (i.e. without Docker) and I get the same error. I assume the VM is created the same way as the Dockerfile. (This was motivated by another op issue in TensorFlow itself, tensorflow/tensorflow#34112 - I'm not having much luck!)
Got the same error from a SageMaker instance with TF 2.0.0.
So I just wanted to comment on this so it's known that it is still being looked at. At a high level, the issue is that we can only upload a single package to pip, so we make that wheel compatible with the pip version of TensorFlow. Many docker images choose to compile TF from source (or install it from conda) so that TF will be most performant, but that means ABI incompatibilities with the wheel we publish. For example I checked
The best solution we can provide at the moment is that for custom TF installs (as found in these docker images) you'll want to compile TFA from source so that it matches the installed version of TF. You can see in our configure script that we compile TFA based on the installed TF compile flags. We have some other build issues (look for a refactor issue to be posted today/tomorrow), so the GPU version built from source might still run into problems. Once we get ABI-stable custom ops (see the RFC mentioned previously) this might be a non-issue, but at a minimum we would like compiling from source to always be a workable solution.
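A minimal diagnostic sketch, assuming a TF 2.x install: the flags below are what the installed TF reports about its own build (and what TFA's configure step matches when compiling from source); the -D_GLIBCXX_USE_CXX11_ABI value is the one that typically differs between the pip wheel and a source or conda build.
Python:
import tensorflow as tf

# Compile flags the installed TF was built with; TFA's configure step
# reuses these so the custom ops are built against the same ABI.
for flag in tf.sysconfig.get_compile_flags():
    print(flag)

# Link flags pointing at libtensorflow_framework, which must export the
# symbols that TFA's shared objects expect at import time.
for flag in tf.sysconfig.get_link_flags():
    print(flag)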
Is RFC #133 planned to be included in TF 2.1?
Unfortunately, no. It's being actively worked on; I believe a stable API has been built for CPU-only kernels, but they're still working on the CUDA kernels.
Consolidating in #987.
System information
Describe the bug
I'm trying to use tfa on the GCloud tensorflow-gpu image via Docker and I'm having trouble getting it to import. The import fails with the error:
tensorflow.python.framework.errors_impl.NotFoundError: /root/miniconda3/lib/python3.5/site-packages/tensorflow_addons/custom_ops/activations/_activation_ops.so: undefined symbol: _ZN10tensorflow12OpDefBuilder4AttrESs
Looks like some gnarly incompatibility between the TF and TFA installs/versions, but the error is way above my head.
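A rough diagnostic sketch, assuming a Linux environment with nm available and a TF 2.x package layout: check whether the installed TensorFlow actually exports the symbol the TFA op expects. The trailing "Ss" in the mangled name is the pre-C++11 std::string, so a TF built with the new C++11 ABI will not provide it.
Python:
import os
import subprocess
import tensorflow as tf

# Directory that holds libtensorflow_framework*.so for the installed TF.
lib_dir = tf.sysconfig.get_lib()
framework_libs = [f for f in os.listdir(lib_dir)
                  if f.startswith("libtensorflow_framework")]

if framework_libs:
    # List the dynamic symbols and look for the one TFA's custom op needs.
    symbols = subprocess.check_output(
        ["nm", "-D", os.path.join(lib_dir, framework_libs[0])],
        universal_newlines=True)
    print("_ZN10tensorflow12OpDefBuilder4AttrESs" in symbols)
else:
    print("libtensorflow_framework not found in", lib_dir)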
Code to reproduce the issue
Python:
import tensorflow_addons as tfa