This repository has been archived by the owner on Mar 17, 2021. It is now read-only.

Che 7 startup should be faster on che.openshift.io #1011

Closed
8 of 20 tasks
ibuziuk opened this issue Oct 24, 2018 · 12 comments


@ibuziuk
Member

ibuziuk commented Oct 24, 2018

Info about Che 7 workspace startup (ephemeral, but without eclipse-che/che#11786 so normal PVC is still used for broker):

  • Broker start: ~7.5 seconds average
  • Pulling images: ~100 seconds on average (che-dev, che-machine-exec, che-hello, che-theia).
  • Start Theia: ~10 seconds, sometimes quite a bit longer
[Image: Che7]
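
To put the three averages above side by side, here is a small Python sketch that sums them and computes each phase's share. The phase labels and the assumption that the phases run strictly one after another are illustrative; the figures are the approximate averages quoted above.

```python
# Approximate average durations (seconds) quoted in the comment above.
# Assumption: the phases run sequentially, so the total is a plain sum.
PHASES = {
    "broker start": 7.5,
    "image pull": 100.0,
    "theia start": 10.0,
}


def breakdown(phases):
    """Return (total_seconds, {phase: share_of_total})."""
    total = sum(phases.values())
    shares = {name: seconds / total for name, seconds in phases.items()}
    return total, shares


if __name__ == "__main__":
    total, shares = breakdown(PHASES)
    print(f"total: {total:.1f}s")
    for name, share in shares.items():
        print(f"{name}: {share:.0%}")
```

Under these assumptions, image pulls account for roughly 85% of a start that takes about 117 seconds in total.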

Areas for improvement:

Noteworthy comment from @amisevsk:

To add to the Che 7 part: Without image pull times, starting the back end components could be done in around 15 seconds without changing anything -- the charts here show that without the pulls, the only significant action is Theia start. Note also that the pinkish block (plugin broker) should already be much shorter since the majority of the wait there is for the PVC mount, which is fixed by eclipse-che/che#11657

@l0rd
Contributor

l0rd commented Nov 9, 2018

@ibuziuk @garagatyi @gorkem @amisevsk copying here a comment from #1012 (comment) about what I think should be the next things to look at. We could use this epic to discuss new ideas to improve Che 7 startup on OSIO:

  1. Image pulling: @amisevsk's analysis shows that this is by far our biggest bottleneck. The good news is that pulling an image is, sometimes, really fast. If we understand why pulling is fast we may be able to make it always fast! We need to continue looking at it: it's critical.
  2. Load plain Theia first: we currently start Theia only after all brokers have been pulled and run, all plugins have been pulled, and the containers created. We need to make it faster: Theia should load in the user's browser first, and everything else should start in a second step. One way that comes to mind is to create a first k8s deployment with only one pod/container/service (Theia) and, once it has started, trigger a rolling update of it with a new deployment definition that has all the pods/containers/services for the plugins. That's just an idea and I am sure there are other ways to achieve it, but the goal should be clear: starting a workspace should not take more than starting standalone Theia (5s or so).
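
A minimal sketch of the two-step idea in point 2, written as plain Python dictionaries that mirror Kubernetes `apps/v1` Deployment manifests. The deployment name, labels, image names, and the sidecar list are hypothetical placeholders, not Che's actual configuration. The point is only that both definitions share one name and selector, so applying the second is an update to the first rather than a new deployment.

```python
import copy

# Hypothetical names and images, used only for illustration.
DEPLOYMENT_NAME = "workspace-example"
LABELS = {"app": "workspace-example"}


def editor_only_deployment():
    """Phase 1: a Deployment whose pod runs only the editor container."""
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": DEPLOYMENT_NAME},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": dict(LABELS)},
            # RollingUpdate is already the default; spelled out for clarity.
            "strategy": {"type": "RollingUpdate"},
            "template": {
                "metadata": {"labels": dict(LABELS)},
                "spec": {
                    "containers": [
                        {"name": "theia", "image": "example/theia:latest"}
                    ]
                },
            },
        },
    }


def with_sidecars(deployment, sidecars):
    """Phase 2: same Deployment identity, pod template extended with sidecars."""
    updated = copy.deepcopy(deployment)
    updated["spec"]["template"]["spec"]["containers"].extend(
        copy.deepcopy(sidecars)
    )
    return updated


if __name__ == "__main__":
    first = editor_only_deployment()
    second = with_sidecars(
        first, [{"name": "plugin-sidecar", "image": "example/plugin:latest"}]
    )
    names = [c["name"] for c in second["spec"]["template"]["spec"]["containers"]]
    print(first["metadata"]["name"] == second["metadata"]["name"], names)
```

Because the pod template changes, Kubernetes rolls out a new pod for the second definition; how the browser session survives that switch is outside this sketch.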

Point 1. is specific to OSIO, 2. is not. It's an upstream improvement and would dramatically improve Che UX.

Another thing that we may still look at is reducing image size. This can be achieved using distroless images (or FROM scratch) for brokers, editors and plugins. This has lower priority because if we fix 1. and 2. we would not download the images on OSIO. But it's a quick win anyway, could serve as a stopgap until 1. and 2. get addressed, and would greatly benefit local Che bootstrap.

@garagatyi

@l0rd loading the IDE first can indeed improve the experience, but we should implement it in a natural way that is described by the tooling configs. Otherwise we would end up with a lot of hardcoded behavior that doesn't respect other editor implementations.

Here are other ideas on how we can improve WS start time.
We still use a broker to evaluate the Theia sidecar and we can actually improve that. If a plugin/editor has only che-plugin.yaml in the archive, we can point to it in meta.yaml instead of pointing to the archive that contains this file. In this case, Che master can resolve the sidecar configuration even before it launches brokers.
Another improvement would be to add a flag to meta.yaml indicating whether the plugin supports running as an init container, i.e. it doesn't change the workspace configuration and doesn't need to be processed before IDE start (the IDE might or might not allow adding plugins after it starts). This is the case with plain Theia plugins.
With these improvements:

  • plain Theia plugins are brokered in an init container broker
  • Theia meta.yaml points to a che-plugin.yaml and doesn't need a running broker to evaluate its config

What's left is remote plugins, such as JDT.LS. JDT.LS:

  • adds a sidecar to the workspace config
  • changes environment variables set in all the containers in the workspace, so it requires that these env vars be applied to all the sidecars
  • requires unarchiving the .theia zip archive to get workspace config changes

To run it after Theia we would indeed need to restart deployments after the broker evaluates its influence on a workspace.
But this part would be hard to implement and can be more complex than implementing syncing.
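
The two meta.yaml ideas above amount to a small decision rule. The sketch below is one way to express it in Python; the `url` key, the `init_container_ok` flag, and the three category names are assumptions made for illustration and are not taken from the Che plugin registry format.

```python
def brokering_mode(meta):
    """Classify a plugin by how much brokering it needs before the IDE starts.

    `meta` is a dict standing in for a parsed meta.yaml (hypothetical fields).
    """
    url = meta.get("url", "")
    # Idea 1: meta.yaml points straight at a che-plugin.yaml, so the
    # sidecar configuration can be resolved without downloading an archive.
    if url.endswith("che-plugin.yaml"):
        return "resolved-by-master"
    # Idea 2: the plugin declares that it neither changes the workspace
    # configuration nor has to be processed before the IDE starts.
    if meta.get("init_container_ok", False):
        return "init-container-broker"
    # Everything else (e.g. remote plugins that add sidecars or env vars)
    # still needs the broker before the workspace is started.
    return "broker-before-start"


if __name__ == "__main__":
    # Hypothetical URLs, for illustration only.
    print(brokering_mode({"url": "https://example.org/theia/che-plugin.yaml"}))
    print(brokering_mode({"url": "https://example.org/p.theia", "init_container_ok": True}))
    print(brokering_mode({"url": "https://example.org/jdtls.tar.gz"}))
```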

The simplest implementation of a sync sidecar can be:

  • implement a new volume strategy similar to the ephemeral one, but one that respects a flag from the sync sidecar saying that it needs a real volume
  • add the sidecar as a container in the single deployment and share the ephemeral volumes, so it can sync them with a real gluster volume using local rsync
  • this doesn't block us from moving sidecars to separate deployments if they do not require sharing project sources or other volumes
  • this doesn't allow us to split everything into deployments instead of containers in a single pod, but we have this limitation now

The next step would be a sync sidecar that can run in a separate deployment and sync over the network.
To implement that we could add a sidecar container with an rsync daemon to every tooling or user pod that needs a volume, and let a separate master rsync sidecar connect to those slaves to sync files to a gluster volume attached to the master rsync sidecar.
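
For the simplest variant (sync container in the same pod, local rsync), the sync step could look roughly like the sketch below. The mount paths are hypothetical; `-a` and `--delete` are standard rsync options, and the function only builds the command unless `run=True` is passed.

```python
import subprocess

# Hypothetical mount points inside the sync container.
EPHEMERAL_PROJECTS = "/projects"
PERSISTENT_BACKUP = "/persistent/projects"


def rsync_command(source, destination):
    """Build an rsync invocation that mirrors `source` into `destination`."""
    # The trailing slash on the source makes rsync copy the directory's
    # contents rather than the directory itself.
    return ["rsync", "-a", "--delete", source.rstrip("/") + "/", destination]


def sync(source, destination, run=False):
    """Return the command; execute it only when run=True."""
    command = rsync_command(source, destination)
    if run:
        subprocess.run(command, check=True)
    return command


if __name__ == "__main__":
    # Back up the ephemeral volume to the persistent one.
    print(sync(EPHEMERAL_PROJECTS, PERSISTENT_BACKUP))
    # Restore on the next workspace start by swapping the arguments.
    print(sync(PERSISTENT_BACKUP, EPHEMERAL_PROJECTS))
```

The network variant described above would replace the local destination with an rsync daemon address; that part is not sketched here.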

@benoitf
Contributor

benoitf commented Nov 12, 2018

@garagatyi about init container and broker.
If I run the same workspace again without having changed anything in my workspace config, will the broker redo everything? Or is the computed information stored somewhere and re-used directly (no need to run a broker)?

@garagatyi

For the time being, we do re-run brokering on each restart of a workspace. We can implement this approach to speed up the start, but there are things we need to consider before we start coding it. Since an archive can change without its URL changing, we need to handle that case. Checking a checksum (or even the archive size) is probably not the best approach because it would require downloading the whole archive to evaluate it.
We can declare that any change to a plugin must be made by updating the plugin in the Che registry, otherwise we would not re-run brokering. Does this policy make sense?
Another problem is that something (the user?!) can corrupt plugin files, so a user needs a way to trigger a brokering re-run to fix that.
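
One possible reading of the policy proposed above, as a Python sketch: brokering is skipped only when the plugin identifiers recorded at the previous start match the requested ones, and a user can always force a re-run. The `id:version` identifier format and the `force` flag are assumptions.

```python
def should_rerun_brokering(previous_plugins, requested_plugins, force=False):
    """Decide whether the brokering phase must run again.

    previous_plugins: identifiers (e.g. "id:version") brokered at the last
        successful start, or None if nothing was recorded.
    requested_plugins: identifiers in the current workspace configuration.
    force: set by the user to repair corrupted plugin files.
    """
    if force or previous_plugins is None:
        return True
    # Archive contents are deliberately not inspected: under the proposed
    # policy a changed plugin must appear under a new registry version.
    return set(previous_plugins) != set(requested_plugins)


if __name__ == "__main__":
    cached = ["example-plugin:1.0.0"]
    print(should_rerun_brokering(cached, ["example-plugin:1.0.0"]))
    print(should_rerun_brokering(cached, ["example-plugin:1.1.0"]))
    print(should_rerun_brokering(cached, ["example-plugin:1.0.0"], force=True))
```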

@l0rd
Contributor

l0rd commented Nov 12, 2018

@garagatyi I think we agree that starting the editor as the first container (even before brokers) is the important goal. And your idea to have meta.yaml point to che-plugin.yaml directly is a good one. But I am a little bit lost with your other proposals: the sync sidecar and the init container don't look simple.

I am still convinced that adding a "fast startup" phase where wsmaster starts the workspace pod with only the editor container (no plugins, no brokers, no sidecars) makes things really simple. Kubernetes rolling update allows a seamless transition from the first workspace pod (with a bare theia) to the second pod (with theia, theia plugins and sidecars).

@garagatyi

@l0rd Starting the editor first would probably involve a major refactoring of the code and flow, because it would make workspace start a 3-phase flow rather than a 2-phase one:

  1. Start editor configuration
  2. Start brokers
  3. Start workspace

Even though updating a deployment is simple from a k8s standpoint, the Che workspace start logic has no such flow, and we would need some time to implement it. I don't think it is a trivial task.

@l0rd
Contributor

l0rd commented Nov 12, 2018

@garagatyi but you don't need to change the current 2 phases right? You just need to add a new one. The existing phases should remain untouched.

@garagatyi

@l0rd If we want to roll out the workspace properly we would need to change the code. To change the Theia container config we either need to roll it out or delete the deployment, wait until it is deleted, and create a new one.

@l0rd
Contributor

l0rd commented Nov 12, 2018

"To change Theia container config we either need to roll it out or delete deployment, wait it gets deleted, create the new one."

RollingUpdate is the default StrategyType of a Kubernetes Deployment. You should not delete the previous deployment, you should not wait and create a new one, Kubernetes does it for you.

@garagatyi

@l0rd unfortunately for that we need to edit the deployment, and Che master code has no notion of this. So we would still have to rework some code to allow this editing capability. We may try to save just the IDs and integrate them into a new deployment config, which would be similar to editing the deployment. But in this case our autogenerated k8s service names and OpenShift routes would probably change their names, which might not be tolerated by the client side. In any case, we will have to figure out how to deal with it without rewriting a lot of code.

@l0rd
Contributor

l0rd commented Nov 12, 2018

@garagatyi I guess that re-using the same name for the deployment and theia routes will do the trick.
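
A sketch of what re-using names could mean in practice: derive the resource name from stable inputs instead of generating a random one at each start. The `workspace_id` and `server` inputs, the prefix, and the hashing scheme are hypothetical; the only hard constraint relied on is that Kubernetes Service names are lowercase DNS labels of at most 63 characters.

```python
import hashlib


def stable_name(workspace_id, server, prefix="server"):
    """Return the same DNS-label-safe name for the same inputs every time.

    The caller is expected to pass a short, lowercase prefix that starts
    with a letter.
    """
    digest = hashlib.sha256(f"{workspace_id}/{server}".encode("utf-8")).hexdigest()
    # prefix + "-" + 12 hex chars stays well below the 63-character limit
    # as long as the prefix itself is short.
    name = f"{prefix}-{digest[:12]}"
    if len(name) > 63:
        raise ValueError("prefix too long for a DNS label")
    return name


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    first_start = stable_name("workspace-example", "theia")
    second_start = stable_name("workspace-example", "theia")
    print(first_start == second_start)
```

With names derived this way, the editor-only definition and the full definition would resolve to the same service and route names, so the client-side URLs would not change between the two phases.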

@ibuziuk
Member Author

ibuziuk commented Nov 8, 2019

Closing in favor of the upstream epic - eclipse-che/che#11476
Current data for the last month on Hosted Che:

average workspace startup: 34 seconds / 98.4% of workspaces started faster than 60 seconds

ibuziuk closed this as completed Nov 8, 2019

4 participants