This repository has been archived by the owner on Mar 17, 2021. It is now read-only.

Need to compare Che 7 workspace startup time when using images from docker registry deployed in the user namespace in comparison with dockerhub / quay #1014

Closed
ibuziuk opened this issue Oct 24, 2018 · 12 comments
Comments

@ibuziuk
Member

ibuziuk commented Oct 24, 2018

The idea is to deploy a docker registry to the user namespace (a similar approach is used for deploying a custom che-plugin-registry [1] in the user namespace and using it instead of the default plugin registry deployed on OSD) and start a Che 7 workspace based on the images from this custom che-docker-registry. After that, we need to compare workspace startup time against the setup where Che 7 images are pulled from dockerhub / quay.

NOTE: we need to compare not only image pulling time, but pod startup time in general. There is an assumption that image extraction could be much faster when using a custom docker registry than with dockerhub / quay.

[1] https://youtu.be/v92h4JV-bS8

@ibuziuk
Member Author

ibuziuk commented Oct 24, 2018

@l0rd maybe you have some hints on how to deploy a custom docker registry to the user namespace on oso?

@ibuziuk
Member Author

ibuziuk commented Nov 5, 2018

@amisevsk could you please also provide the list of operations during workspace startup that take the most time?

@riuvshin
Contributor

riuvshin commented Nov 8, 2018

@ibuziuk Can you please elaborate on why you think that extraction of an image pulled from a custom registry would be faster? AFAIK, once image layers are downloaded from the registry, extraction is a local process, and I doubt it communicates back to the registry during extraction.


Another question: don't you think that having a proxy registry would slow down ws startup?
I mean the proxy registry would need to make sure that the image is up to date first and only then allow it to be pulled; otherwise, how would you keep the intermediate registry up to date?
I can imagine you could have some task that periodically runs to pull images from quay / docker.io,
but that hack would work only for images that we know about, and will not help for any user's custom image.
So I have doubts that this approach can speed up ws startup, and from what I see it may even make things worse.

Maybe I'm missing something, so I would be really happy if you could explain how that is supposed to speed up general WS startup time.

@amisevsk
Collaborator

amisevsk commented Nov 8, 2018

@riuvshin The answer is that it doesn't, after all :). I think the idea was to try to avoid some of the issues caused by the AlwaysPullImages admission controller.

In short: there's less than 5% difference between the two cases.

@amisevsk
Collaborator

amisevsk commented Nov 8, 2018

A general breakdown of start time:

Che 7 Workspaces (ephemeral, but without eclipse-che/che#11786 so normal PVC is still used for broker):

  • Broker start: ~7.5 seconds average
  • Pulling images: ~100 seconds on average (che-dev, che-machine-exec, che-hello, che-theia).
  • Start Theia: ~10 seconds, sometimes quite a bit longer
    [Graph: Che7 startup time]

Che 6 workspaces (ephemeral):

  • Setting up container: ~3 seconds
  • Pulling image: ~20 seconds average (image is centos_jdk8)
  • Starting agents: ~25 seconds (exec, terminal, wsagent), sometimes much longer for unclear reasons
    [Graph: Che6 startup time]

Also note that sometimes pulling takes no time at all. It's not clear why this happens, as it definitely should not be the case.
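To make the breakdown above easier to compare, here is a minimal Python sketch that turns the quoted averages into per-phase shares. The numbers are the approximate figures from this comment; the dictionary and function names are made up for illustration, and summing the phases is only a rough proxy for total startup time.

```python
# Approximate averages quoted in the breakdown above, in seconds.
CHE7_PHASES = {"broker start": 7.5, "pulling images": 100.0, "start Theia": 10.0}
CHE6_PHASES = {"setting up container": 3.0, "pulling image": 20.0, "starting agents": 25.0}


def phase_shares(phases):
    """Return each phase's share of the summed phase time, as a percentage."""
    total = sum(phases.values())
    return {name: 100.0 * seconds / total for name, seconds in phases.items()}


def summarize(label, phases):
    """Build a short text summary: summed time plus the share of each phase."""
    total = sum(phases.values())
    lines = [f"{label}: ~{total:.1f} s across the listed phases"]
    for name, share in phase_shares(phases).items():
        lines.append(f"  {name}: {share:.0f}%")
    return "\n".join(lines)


if __name__ == "__main__":
    print(summarize("Che 7", CHE7_PHASES))
    print(summarize("Che 6", CHE6_PHASES))
```

With these averages, image pulling accounts for roughly 85% of the listed Che 7 phases and roughly 42% of the Che 6 ones.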

@amisevsk
Collaborator

amisevsk commented Nov 8, 2018

Closing, as I think all the information we're going to get at the moment is here, and there's no PR to be made.

@amisevsk amisevsk closed this as completed Nov 8, 2018
@amisevsk
Collaborator

amisevsk commented Nov 8, 2018

Also, as an example of how the AlwaysPullImages admission controller affects Che startup: the quickest Che 7 startup time was 28 seconds (where no actual pull seems to have happened). The slowest Che 7 start (with the pull) was 138 seconds. In both cases, merging eclipse-che/che#11786 should decrease this time by ~5-6 seconds, for an even more dramatic difference.
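The arithmetic behind that comparison can be sketched as follows. The 28 s and 138 s figures and the ~5-6 s saving come from the comment above; the function names are illustrative, and applying the saving equally to both runs is an assumption.

```python
FASTEST_START_S = 28.0       # no actual pull seems to have happened
SLOWEST_START_S = 138.0      # with the pull
SAVING_RANGE_S = (5.0, 6.0)  # expected saving from eclipse-che/che#11786


def pull_overhead(fast, slow):
    """Extra seconds attributable to the pull (slow run minus fast run)."""
    return slow - fast


def slowdown_ratio(fast, slow):
    """How many times longer the slow run is than the fast run."""
    return slow / fast


if __name__ == "__main__":
    print(f"pull overhead: {pull_overhead(FASTEST_START_S, SLOWEST_START_S):.0f} s")
    print(f"ratio today: {slowdown_ratio(FASTEST_START_S, SLOWEST_START_S):.2f}x")
    for saving in SAVING_RANGE_S:
        # Assumption: the saving is subtracted equally from both runs.
        ratio = slowdown_ratio(FASTEST_START_S - saving, SLOWEST_START_S - saving)
        print(f"ratio with {saving:.0f} s saved: {ratio:.2f}x")
```

Under that assumption the absolute gap stays at 110 seconds, while the relative gap grows from about 4.9x to somewhere between 5.8x and 6.0x, which is one way to read "even more dramatic".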

@ibuziuk
Member Author

ibuziuk commented Nov 8, 2018

@riuvshin when we discussed this with @l0rd, there was just an assumption that image extraction
could potentially be much faster, and we wanted to check it (but this does not seem to be the case).

@amisevsk thanks for the data provided, but as I understand it, those graphs represent the startup time when images are pulled from an external registry, not from a local registry deployed in the user namespace, right? Would it be possible to provide some data about workspace startup for local vs external registries, or is there no difference at all?

@ibuziuk
Member Author

ibuziuk commented Nov 8, 2018

Also, as an example of how the AlwaysPullImages admission controller affects Che startup: the quickest Che 7 startup time was 28 seconds (where no actual pull seems to have happened)

I also noticed that pulling is sometimes blazingly fast and sometimes really slow. I commented about it in the HK issue [1], but there has been no answer so far. @riuvshin maybe you can clarify how pulling times can vary so dramatically for the same image (the workspace is started by the same user, in the same namespace)?

[1] https://gitlab.cee.redhat.com/dtsd/housekeeping/issues/2350#note_432591

@gorkem
Contributor

gorkem commented Nov 8, 2018

I think the pull variance is because of the node-level caches. If a workspace starts on a node that has pulled an image earlier, the image is served from that node's cache.

@l0rd
Contributor

l0rd commented Nov 9, 2018

@gorkem what is the node-level cache? Do you mean the local docker repository of a node? If this is the case, we should be able to prepopulate it. @amisevsk could you try to verify this hypothesis somehow? The images pulled are always the same, so if they can be cached somehow, we need to do it aggressively on all tenant nodes.
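One common way to prepopulate a node's image cache is a DaemonSet that references the images on every node. The Python sketch below only builds such a manifest as a dictionary; it is an illustration under stated assumptions, not something confirmed in this thread. The DaemonSet name, the image references passed in, the pause image, and the assumption that each image ships a `sh` binary are all hypothetical, and whether such an object may be created on the tenant nodes is a separate question.

```python
import json


def prepull_daemonset(images, name="che-image-prepuller",
                      pause_image="k8s.gcr.io/pause:3.1"):
    """Build a DaemonSet manifest whose init containers reference each image.

    Scheduling the pod on a node makes that node pull every listed image.
    The name, pause image, and no-op command are assumptions for this sketch.
    """
    init_containers = [
        {
            "name": f"prepull-{index}",
            "image": image,
            # No-op command; assumes the image contains a shell.
            "command": ["sh", "-c", "true"],
        }
        for index, image in enumerate(images)
    ]
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": name},
        "spec": {
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "initContainers": init_containers,
                    # A tiny long-running container keeps the pod alive.
                    "containers": [{"name": "pause", "image": pause_image}],
                },
            },
        },
    }


if __name__ == "__main__":
    # Placeholder image references; substitute the real workspace images.
    manifest = prepull_daemonset(["example.org/che-theia:tag",
                                  "example.org/che-machine-exec:tag"])
    print(json.dumps(manifest, indent=2))
```

The printed JSON could then be applied with the usual cluster tooling to check whether pull times on prepopulated nodes match the fast case.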

@ibuziuk
Member Author

ibuziuk commented Nov 13, 2018

A separate issue for making the image pulling phase blazingly fast for predefined stacks has been created, so we can continue the discussion there: #1056
