Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AIRFLOW-3203] Fix DockerOperator & some operator test #4049

Merged
merged 1 commit into from
Oct 20, 2018
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
24 changes: 9 additions & 15 deletions airflow/operators/docker_operator.py
Original file line number Diff line number Diff line change
Expand Up @@ -43,6 +43,7 @@ class DockerOperator(BaseOperator):
be provided with the parameter ``docker_conn_id``.
:param image: Docker image from which to create the container.
If image tag is omitted, "latest" will be used.
:type image: str
:param api_version: Remote API version. Set to ``auto`` to automatically
detect the server's version.
Expand All @@ -62,7 +63,7 @@ class DockerOperator(BaseOperator):
:type docker_url: str
:param environment: Environment variables to set in the container. (templated)
:type environment: dict
:param force_pull: Pull the docker image on every run. Default is false.
:param force_pull: Pull the docker image on every run. Default is False.
:type force_pull: bool
:param mem_limit: Maximum amount of memory the container can use.
Either a float value, which represents the limit in bytes,
Expand Down Expand Up @@ -187,35 +188,28 @@ def execute(self, context):
tls=tls_config
)

if ':' not in self.image:
image = self.image + ':latest'
else:
image = self.image
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why were these remove? (I can't work out from the diff alone.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ashb , this is because I found out that this will be handled by docker package itself.

Here this operator is using docker.APIClient.create_container(image=...) to create the container under the hood. If the image is just the name withOUT tag, say "fake_image", then the program will go search for "fake_image:latest" by default (of course it will search for the tag given by user if user has specified the tag).

You can try the code below to validate:

import docker
client = docker.APIClient(base_url='unix://var/run/docker.sock')
client.create_container(image="fake_image")

It will tell you something like "'fake_image:latest' can't be found" (I don't have Docker on the laptop I'm using at this moment so can't provide the exact exception). Please help double-validate.

Cheers

Copy link
Member Author

@XD-DENG XD-DENG Oct 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ashb , I have checked on a machine with Docker installed.

Without Tag

import docker
client = docker.APIClient(base_url='unix://var/run/docker.sock')
client.create_container(image="fake_image")

Result:

ImageNotFound: 404 Client Error: Not Found ("No such image: fake_image:latest")

With Tag

import docker
client = docker.APIClient(base_url='unix://var/run/docker.sock')
client.create_container(image="fake_image:version_1")

Result:

ImageNotFound: 404 Client Error: Not Found ("No such image: fake_image:version_1")

Conclusion

So as I shared earlier: if tag is omitted for image, "latest" will be used by default; if tag is provided by user, then that tag will be used.

So here in DockerOperator, we don't need to explicitly check & add ":latest".

Copy link
Member

@feluelle feluelle Apr 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @XD-DENG were you really certain about that? Because I noticed something weird when I use for example just r-base as an image name.

The DockerOperator task pulled every image.

r-base                                                     latest              62c848eeb175        2 weeks ago         649MB
r-base                                                     3.5.3               62c848eeb175        2 weeks ago         649MB
r-base                                                     3.5.2               880eb7da671b        5 weeks ago         655MB
r-base                                                     3.5.1               bd9edc1a85ed        4 months ago        712MB
r-base                                                     3.5.0               190658892827        9 months ago        692MB
r-base                                                     3.4.4               d1325eaa28ad        11 months ago       667MB
r-base                                                     3.4.3               d1e1c25485af        13 months ago       670MB
r-base                                                     3.4.2               02d3b7e00020        17 months ago       651MB
r-base                                                     3.4.1               0ab131e275e4        19 months ago       1.18GB
r-base                                                     3.4.0               5a6c58403310        22 months ago       656MB
r-base                                                     3.3.3               88436550cddc        2 years ago         635MB
r-base                                                     3.3.2               38270dc4745b        2 years ago         635MB
r-base                                                     3.3.1               7ba1baf9d8bb        2 years ago         657MB
r-base                                                     3.3.0               8f62a54a58ab        2 years ago         970MB
r-base                                                     3.2.5               ee4ea743f431        2 years ago         1.05GB
r-base                                                     3.2.4               9d9c50e41475        3 years ago         1.07GB
r-base                                                     3.2.3               1844ae3ec8d8        3 years ago         1.03GB
r-base                                                     3.2.2               7201dfdf7e21        3 years ago         1.01GB
r-base                                                     3.2.1               62e43f6f9a91        3 years ago         563MB
r-base                                                     3.2.0               9adf6ba7ea84        3 years ago         552MB
r-base                                                     3.1.3               8b37253aa5ea        4 years ago         517MB
r-base                                                     3.1.2               803d5e0e0278        4 years ago         483MB

pip list shows me docker 3.7.2 in airflow 1.10.2

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@feluelle let me have a quick check and get back here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@feluelle may you share your log? You may have logs like Pulling image (xxx) from xxx. Thanks

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @feluelle the change was made based on the facts below:

  1. For pull: https://docker-py.readthedocs.io/en/stable/api.html#docker.api.image.ImageApiMixin.pull In the example, you can find that the latest tag is used if no tag is provided together with the repo.

  2. For create_container: you can try to code below

import docker
client = docker.APIClient(base_url='unix://var/run/docker.sock')
client.create_container(image="fake_image")

It will give you error "ImageNotFound: 404 Client Error: Not Found ("No such image: fake_image:latest")"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(That's not all) I think it is for like 3 images


if self.force_pull or len(self.cli.images(name=image)) == 0:
self.log.info('Pulling docker image %s', image)
for l in self.cli.pull(image, stream=True):
if self.force_pull or len(self.cli.images(name=self.image)) == 0:
self.log.info('Pulling docker image %s', self.image)
for l in self.cli.pull(self.image, stream=True):
output = json.loads(l.decode('utf-8'))
self.log.info("%s", output['status'])

cpu_shares = int(round(self.cpus * 1024))

with TemporaryDirectory(prefix='airflowtmp') as host_tmp_dir:
self.environment['AIRFLOW_TMP_DIR'] = self.tmp_dir
self.volumes.append('{0}:{1}'.format(host_tmp_dir, self.tmp_dir))

self.container = self.cli.create_container(
command=self.get_command(),
cpu_shares=cpu_shares,
environment=self.environment,
host_config=self.cli.create_host_config(
binds=self.volumes,
network_mode=self.network_mode,
shm_size=self.shm_size,
dns=self.dns,
dns_search=self.dns_search),
image=image,
mem_limit=self.mem_limit,
dns_search=self.dns_search,
cpu_shares=int(round(self.cpus * 1024)),
mem_limit=self.mem_limit),
image=self.image,
user=self.user,
working_dir=self.working_dir
)
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -64,20 +64,22 @@ def test_execute(self, client_class_mock, mkdtemp_mock):
client_class_mock.assert_called_with(base_url='unix://var/run/docker.sock', tls=None,
version='1.19')

client_mock.create_container.assert_called_with(command='env', cpu_shares=1024,
client_mock.create_container.assert_called_with(command='env',
environment={
'AIRFLOW_TMP_DIR': '/tmp/airflow',
'UNIT': 'TEST'
},
host_config=host_config,
image='ubuntu:latest',
mem_limit=None, user=None,
user=None,
working_dir='/container/path'
)
client_mock.create_host_config.assert_called_with(binds=['/host/path:/container/path',
'/mkdtemp:/tmp/airflow'],
network_mode='bridge',
shm_size=1000,
cpu_shares=1024,
mem_limit=None,
dns=None,
dns_search=None)
client_mock.images.assert_called_with(name='ubuntu:latest')
Expand Down
File renamed without changes.