GCloud tf-gpu Image and tfa compatability? #676

mwalmsley · 2019-11-06T16:02:10Z

System information

OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04 within Docker on Mac
TensorFlow version and how it was installed (source or binary): 2.0-gpu via GCloud image (see below)
TensorFlow-Addons version and how it was installed (source or binary): 0.6.0 via pip
Python version: 3.5
Is GPU used? (yes/no): no

Describe the bug

I'm trying to use tfa on the GCloud tensorflow-gpu image via Docker and I'm having trouble getting it to import. The import fails with the error tensorflow.python.framework.errors_impl.NotFoundError: /root/miniconda3/lib/python3.5/site-packages/tensorflow_addons/custom_ops/activations/_activation_ops.so: undefined symbol: _ZN10tensorflow12OpDefBuilder4AttrESs.

Looks like some gnarly incompatibility between TF and TFA installs/versions, but the error is way above my head.

Code to reproduce the issue

Dockerfile:
FROM gcr.io/deeplearning-platform-release/tf2-gpu.2-0
...
RUN pip install --upgrade pip
RUN pip install -r requirements.txt # including tensorflow-addons==0.6.0

Python:
import tensorflow_addons as tfa

Other info / logs

Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.

2019-11-06 15:29:43.940868: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/root/miniconda3/lib/python3.5/site-packages/tensorflow_addons/init.py", line 21, in <module>
from tensorflow_addons import activations
File "/root/miniconda3/lib/python3.5/site-packages/tensorflow_addons/activations/init.py", line 21, in <module>
from tensorflow_addons.activations.gelu import gelu
File "/root/miniconda3/lib/python3.5/site-packages/tensorflow_addons/activations/gelu.py", line 25, in <module>
get_path_to_datafile("custom_ops/activations/_activation_ops.so"))
File "/root/miniconda3/lib/python3.5/site-packages/tensorflow_core/python/framework/load_library.py", line 61, in load_op_library
lib_handle = py_tf.TF_LoadLibrary(library_filename)
tensorflow.python.framework.errors_impl.NotFoundError: /root/miniconda3/lib/python3.5/site-packages/tensorflow_addons/custom_ops/activations/_activation_ops.so: undefined symbol: _ZN10tensorflow12OpDefBuilder4AttrESs

The text was updated successfully, but these errors were encountered:

mwalmsley · 2019-11-06T16:18:00Z

To investigate if this error was somehow related to the host machine, I re-ran the same container on a GCloud Compute instance with Ubuntu 18.04, P100 GPU and CUDA 10.0 installed. I get the same import error on that machine as my non-CUDA Mac desktop.

martin-gorner · 2019-11-07T17:56:11Z

I saw the same error on Kaggle when running "pip install tensorflow-addons" in a non-GPU Kaggle notebook. If you enable the GPU, the installation succeeds, probably because the underlying image is different.

seanpmorgan · 2019-11-07T18:16:47Z

Thanks for the info! Yes this is generally because the TensorFlow has a an ABI incompatibility with the way we package our pip wheel. This issue will one day be resolved by this RFC which is already underway:
tensorflow/community#133

In the mean time though we can look at that docker image and see how it was compiling TF. Does anyone know where we can access the Dockerfiles for these?

mwalmsley · 2019-11-11T12:58:50Z

An update - I've changed my setup to use the Tensorflow/CUDA "Deep Learning VM" boot disk from GCP directly (i.e. without docker) and I get the same error. I assume the VM is created the same way as the Dockerfile.

(this was motivated by another op issue in TensorFlow itself, tensorflow/tensorflow#34112 - I'm not having much luck!)

diesendruck · 2019-11-25T23:39:52Z

Got the same error from a SageMaker instance with TF2.0.0.

seanpmorgan · 2019-12-01T21:14:22Z

So I just wanted to comment on this so it's known that this is still being looked at:

At a high level the issue is that we can only upload a single package to pip, so we make that whl compatible with the pip version of tensorflow. Many docker images will choose to compile TF from source (or install from conda) so that TF will be most performant, but that means ABI incompatibilities with the whl we publish.

For example I checked gcr.io/deeplearning-platform-release/tf2-gpu.2-0 image and the default installed tensorflow uses CXX11_ABI_FLAG=1, but the tensorflow pip package / tfa pip package use CXX11_ABI_FLAG=0

docker run --rm gcr.io/deeplearning-platform-release/tf2-gpu.2-0 python -c "import tensorflow as tf; print(tf.sysconfig.CXX11_ABI_FLAG)"
>> 1

The best solution we can provide atm is that for custom TF installs (as found in these docker images) you'll want to compile TFA from source so that it matches the installed version of TF. You can see in our configure script that we compile TFA based on the installed TF compile flags. There are some other build issues we have (look for a refactor issue to be posted today/tomorrow), so this might still run into issues for the GPU version built from source.

Once we get ABI stable custom-ops (see RFC mentioned previously) this might be a non-issue, but at a minimum we would like it so compiling from source is always a workable solution.

martin-gorner · 2019-12-10T20:26:45Z

is the RFC #133 planned to be included in TF 2.1 ?

seanpmorgan · 2019-12-10T20:52:02Z

is the RFC #133 planned to be included in TF 2.1 ?

Unfortunately no. It's being actively worked on and I believe a stable API has been built for CPU Only kernels but they're working on CUDA kernels.

seanpmorgan · 2020-01-31T14:54:30Z

Consolidating in #987

seanpmorgan added bug Something isn't working build labels Nov 6, 2019

seanpmorgan mentioned this issue Dec 10, 2019

_activation_ops.so: undefined symbol error when importing tensorflow_addons #753

Closed

seanpmorgan mentioned this issue Dec 20, 2019

File not found during import #792

Closed

martham93 mentioned this issue Dec 20, 2019

[question] driver support for cuda 10? kubeflow/kubeflow#2573

Closed

seanpmorgan mentioned this issue Jan 10, 2020

Lazy Load Custom Ops #855

Closed

Om-Pandey mentioned this issue Jan 28, 2020

Seq2seq NMT tutorial needs refactoring #935

Closed

seanpmorgan mentioned this issue Jan 31, 2020

Custom Op Linux ABI Incompatibility: Undefined Symbol #987

Closed

seanpmorgan closed this as completed Jan 31, 2020

alexblnn mentioned this issue Dec 27, 2020

NotFoundError when running example faustomorales/vit-keras#5

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GCloud tf-gpu Image and tfa compatability? #676

GCloud tf-gpu Image and tfa compatability? #676

mwalmsley commented Nov 6, 2019 •

edited by Squadrick

Loading

mwalmsley commented Nov 6, 2019

martin-gorner commented Nov 7, 2019

seanpmorgan commented Nov 7, 2019

mwalmsley commented Nov 11, 2019

diesendruck commented Nov 25, 2019

seanpmorgan commented Dec 1, 2019

martin-gorner commented Dec 10, 2019

seanpmorgan commented Dec 10, 2019

seanpmorgan commented Jan 31, 2020

GCloud tf-gpu Image and tfa compatability? #676

GCloud tf-gpu Image and tfa compatability? #676

Comments

mwalmsley commented Nov 6, 2019 • edited by Squadrick Loading

mwalmsley commented Nov 6, 2019

martin-gorner commented Nov 7, 2019

seanpmorgan commented Nov 7, 2019

mwalmsley commented Nov 11, 2019

diesendruck commented Nov 25, 2019

seanpmorgan commented Dec 1, 2019

martin-gorner commented Dec 10, 2019

seanpmorgan commented Dec 10, 2019

seanpmorgan commented Jan 31, 2020

mwalmsley commented Nov 6, 2019 •

edited by Squadrick

Loading