
[WIP] Activator daemonset #2557

Closed
greghaynes wants to merge 2 commits

Conversation

greghaynes
Contributor

Fixes #2497
Fixes #2498

Proposed Changes

  • Run activator as a daemonset
  • Create a handler for the activator which creates a revision pod using the static-pod kubelet API.

Release Note

Local scheduling of pods on cold start for a faster cold-start path.

@knative-prow-robot knative-prow-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 27, 2018
@knative-prow-robot knative-prow-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Nov 27, 2018
@knative-prow-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: greghaynes
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: mattmoor

If they are not already assigned, you can assign the PR to them by writing /assign @mattmoor in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Contributor

@knative-prow-robot left a comment

@greghaynes: 2 warnings.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

"github.com/knative/serving/pkg/localscheduler"
)

type LocalSchedulerHandler struct {

Contributor
Golint comments: exported type LocalSchedulerHandler should have comment or be unexported. More info.

}
}

func Schedule(rev *servingv1.Revision, logger *zap.SugaredLogger) error {

Contributor

Golint comments: exported function Schedule should have comment or be unexported. More info.
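
Both golint warnings boil down to adding doc comments on the exported identifiers. A minimal sketch, reusing the signatures from the snippets above (the comment wording is illustrative, not from this PR):

```go
// LocalSchedulerHandler handles activator traffic by locally scheduling a
// revision pod via the kubelet static-pod directory before proxying to it.
type LocalSchedulerHandler struct {
	// fields elided
}

// Schedule writes a static-pod manifest for the given revision so that the
// node-local kubelet starts it immediately.
func Schedule(rev *servingv1.Revision, logger *zap.SugaredLogger) error {
	// body elided
	return nil
}
```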

@greghaynes
Contributor Author

This still has a ways to go - meant as a strawman POC to explore the implementation.

    name: config-observability
- name: localpods
  hostPath:
    path: /etc/kubernetes/manifests

Contributor

Is this safe? What I mean is, would this be considered a security risk? Don't we have the k8s control-plane manifests and all that jazz in that directory?

Contributor

Another question, most likely due to my ignorance here, but do we need a local-mount PV for this?

Contributor Author

I am not sure we need a PV, since we don't really need any magic to persist the data - the hypervisor does it for us. There are definitely some issues that need to be worked out here though:

  • This path isn't something we can depend on (it's operator-specific / set via kubelet config).
  • As you point out, this allows us to create (and read) arbitrary static-pod definitions on the cluster - there are potential security implications here.


f, err := os.Create(fmt.Sprintf("/etc/localpods/%s", podName))
if err != nil {
	logger.Errorf("Opening local pod definition (%q): %v", rev.Name, err)

Contributor

If there was an error in creating the file, shouldn't we bail out here?
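
For reference, a minimal sketch of the suggested bail-out, continuing the snippet above (the bare return of err is an assumption about the surrounding function's signature):

```go
f, err := os.Create(fmt.Sprintf("/etc/localpods/%s", podName))
if err != nil {
	logger.Errorf("Opening local pod definition (%q): %v", rev.Name, err)
	// Bail out here; continuing would operate on a nil *os.File.
	return err
}
defer f.Close()
```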

The localscheduler creates a pod using the static-pod kubelet API
immediately upon the activator receiving a request for a revision.
@knative-prow-robot
Contributor

@greghaynes: The following tests failed, say /retest to rerun them all:

Test name | Commit | Details | Rerun command
--- | --- | --- | ---
pull-knative-serving-go-coverage | 3187edc | link | /test pull-knative-serving-go-coverage
pull-knative-serving-build-tests | 3187edc | link | /test pull-knative-serving-build-tests
pull-knative-serving-upgrade-tests | 3187edc | link | /test pull-knative-serving-upgrade-tests
pull-knative-serving-unit-tests | 3187edc | link | /test pull-knative-serving-unit-tests
pull-knative-serving-integration-tests | 3187edc | link | /test pull-knative-serving-integration-tests

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@greghaynes
Contributor Author

greghaynes commented Nov 29, 2018

I did some testing of this locally on a 1 worker cluster (with a separate controller node) using calico and got cold-start times of about 2.5 seconds. On this same cluster using knative master with istio sidecar injection disabled I got a cold start time of about 6 seconds. This test was performed by running time curl <hello-world-go endpoint> after the application had scaled down to 0.

Raw data:

Daemonset + static-pod cold start times (seconds):

  1. 2.364
  2. 2.426
  3. 2.304

Using master (25656e4), in seconds:

  1. 5.966
  2. 5.605
  3. 6.372

Some details of what this patch does currently:

This runs the activator as a daemonset. Once a request is received, it writes out a pod using the static-pod definition API (https://kubernetes.io/docs/tasks/administer-cluster/static-pod/). This pod contains only a user container (no queue-proxy nor istio-proxy). A watch is also created for this pod; once it is seen with Status.Phase == "Running", a request is made to the Status.PodIP of the created pod.
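
A rough sketch of the watch half of that flow, assuming client-go of that era; the function name, namespace handling, and field selector are illustrative, not the exact code in this PR:

```go
package localscheduler

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/fields"
	"k8s.io/client-go/kubernetes"
)

// waitForLocalPod watches the API server for the statically created pod and
// returns its IP once it reports Running, so the activator can proxy the
// pending request straight to it.
func waitForLocalPod(kube kubernetes.Interface, ns, podName string) (string, error) {
	w, err := kube.CoreV1().Pods(ns).Watch(metav1.ListOptions{
		FieldSelector: fields.OneTermEqualSelector("metadata.name", podName).String(),
	})
	if err != nil {
		return "", err
	}
	defer w.Stop()
	for ev := range w.ResultChan() {
		pod, ok := ev.Object.(*corev1.Pod)
		if !ok {
			continue
		}
		if pod.Status.Phase == corev1.PodRunning && pod.Status.PodIP != "" {
			return pod.Status.PodIP, nil
		}
	}
	return "", fmt.Errorf("watch for pod %q ended before it was Running", podName)
}
```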

@tcnghia
Contributor

tcnghia commented Nov 29, 2018

@greghaynes can you please elaborate on where the gain in cold start latency would come from?

@greghaynes
Contributor Author

@tcnghia Off the top of my head - we eliminate: the autoscaler waiting to detect the request via stats, the knative controllers acting on the revision being set active, the k8s ReplicaSet controller, the k8s scheduler, and any potential queue-proxy startup cost.

It's very possible I'm missing some other things we're avoiding here too, if anyone can think of others...

@markusthoemmes
Contributor

You said the 6 seconds is with istio-injection? What's that number without istio injection for comparison?

@greghaynes
Contributor Author

@markusthoemmes The 6 seconds is without istio injection

@markusthoemmes
Contributor

@greghaynes Whoops, sorry. I shall not read complicated issues before 8am.

That's a fairly big shave off the cold-start time indeed. Interesting stuff, thanks for the prototyping 🎉

@mattmoor
Member

A watch is also created

I assume you mean via API Server. Couldn't the pod be ready before it's reflected in the API?

I know the kubelet handles readiness probes, perhaps we could have our local daemonset find/probe the user container directly (as an approximation of what a kubelet-level API might give).

no queue-proxy nor istio-proxy

Why no queue-proxy?

@mattmoor
Member

Also 🎉

@greghaynes
Contributor Author

@mattmoor Yes, I expect the pod is ready earlier than we see via the API, since our first request to the pod succeeds. I'm not sure what you have in mind about finding/probing the pod directly: I don't know of an API the kubelet exposes for us to read from directly, and any kind of check against the underlying container system will, I think, be a pain to support across the various CRI backends. Maybe we could add another container to the pod which calls out to the activator on startup to signal that it is ready...

The main reason for not having queue-proxy is that I just didn't bother to add it to the pod definition :). I doubt there's much cost to having it, so this isn't a big issue, but I'm not sure it buys us anything here, as our activator is/can be essentially doing the same job.

@mattmoor
Member

@greghaynes Having the queue-proxy would enable us to have cooperative readiness without too much CRI-incompatible hackery. If we have the queue-proxy report readiness to the DaemonSet on startup (after doing its own readiness probing of the user container), then the activator would effectively know in ~realtime that the pod is ready for traffic.
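
A hypothetical sketch of that cooperative-readiness loop; the endpoint path, and how the queue-proxy learns the node-local activator's address, are made up for illustration rather than taken from this PR:

```go
package queue

import (
	"fmt"
	"net/http"
	"time"
)

// notifyActivatorReady would be called by the queue-proxy once its own
// readiness probe of the user container has succeeded. Instead of waiting
// for the pod status to round-trip through the API server, it tells the
// node-local activator directly that the revision can take traffic.
func notifyActivatorReady(activatorHostPort, revisionKey string) error {
	url := fmt.Sprintf("http://%s/ready?revision=%s", activatorHostPort, revisionKey)
	client := &http.Client{Timeout: time.Second}
	resp, err := client.Post(url, "text/plain", nil)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("activator readiness callback returned %s", resp.Status)
	}
	return nil
}
```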

@greghaynes
Contributor Author

@mattmoor Ah, good idea - I was thinking of a similar pattern but putting it in queue proxy makes a lot of sense.

@josephburnett
Contributor

@greghaynes, #2557 (comment) this is really great to see. At the end of the day, the work that needs to be done (if the container is disk warm) is two network hops and starting a process. Client to ingress, ingress to activator. This should be 100s of ms. Anything more is an opportunity for optimization.

So we should keep digging until we understand exactly where all this extra time comes from (docker, k8s controllers, retries ...). If we find the k8s control plane is getting in our way, which is our current theory, then we may indeed need local scheduling support in k8s. But we'll want to run it by SIG Arch first.

@greghaynes
Contributor Author

@josephburnett I agree we should isolate this down to where it's being introduced - in us or upstream. I think docker is out of the picture here, as all of that happens within the kubelet (although it is likely a large part of the remaining ~2.5s cold-start delay). AFAIK the potential sources here are: autoscaler delay detecting request stat > 0; knative controllers (I think just revision); k8s controllers (deployment, replicaset, scheduler). I can try whipping up a test to get a good measurement for just scaling a deployment from 0, and then we can see what part of this time savings is knative vs k8s.

Once we get some validation that there's k8s control-plane delay here, I think we'll need to do our own local workaround in parallel to driving better API support upstream - I worry that the timeline for being able to get and rely on any fix we drive into k8s is going to be very long (not only do we have to actually get the changes accepted/released, we also have to wait for k8s platforms to roll out the new k8s). ISTM that while we are driving these changes upstream, local scheduling should be an optional/feature-flagged thing we support - since we know it's not a great API to rely on, we can make it opt-in + marked as unstable?

@josephburnett
Contributor

@greghaynes sounds good.

FYI: there is no reason for the autoscaler to wait for the activator to update the stat. The activator should push a stat immediately upon getting a request for a revision scaled to zero (like this), and the autoscaler should immediately recompute the desired scale for that revision (like this).

Some of this was just added latency from moving activation to the autoscaler. Previously the activator flipped the serving state, which immediately triggered scale-up.
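
A hypothetical illustration of that push-based flow; the Stat struct and StatSink interface below are stand-ins rather than knative/serving's actual types, and how the stat reaches the autoscaler is elided:

```go
package activator

import "time"

// Stat and StatSink are illustrative stand-ins for the autoscaler's metric
// plumbing.
type Stat struct {
	RevisionKey        string
	ConcurrentRequests int
	Time               time.Time
}

type StatSink interface {
	Send(Stat) error
}

// onColdStartRequest pushes a non-zero concurrency stat the moment a request
// arrives for a scaled-to-zero revision, so the autoscaler can recompute the
// desired scale immediately instead of waiting for the next reporting tick.
func onColdStartRequest(sink StatSink, revisionKey string) error {
	return sink.Send(Stat{
		RevisionKey:        revisionKey,
		ConcurrentRequests: 1,
		Time:               time.Now(),
	})
}
```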

@josephburnett
Contributor

@greghaynes another middle-ground option you might try is to have the Activator create the Pod by calling the Master, but populate Spec.NodeName with the current node and set Spec.SchedulerName="Activator". So the Activator does the scheduling but loops in the master, so the master has a chance to do Envoy injection. Then the Activator can just wait for the Kubelet to pick up and start the Pod, then proxy traffic to it.

This might reduce the difference between the two serving paths while maintaining the local-scheduling decision.
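
A sketch of what that could look like, assuming client-go of that era; the function name and pod metadata are illustrative:

```go
package activator

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// createLocallyScheduledPod creates the revision pod through the API server
// (so admission webhooks such as sidecar injection still run), but pre-binds
// it to this node and tags it with a scheduler name the default scheduler
// will not act on.
func createLocallyScheduledPod(kube kubernetes.Interface, ns, name, nodeName string, spec corev1.PodSpec) (*corev1.Pod, error) {
	spec.NodeName = nodeName         // kubelet on this node picks the pod up directly
	spec.SchedulerName = "Activator" // keeps the default scheduler from also binding it
	pod := &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: name, Namespace: ns},
		Spec:       spec,
	}
	return kube.CoreV1().Pods(ns).Create(pod)
}
```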

@greghaynes
Contributor Author

I finally got some time to dig on where the delay is coming from. I upgraded my cluster and reran the previous testing and things are overall a bit faster afterwards. I was seeing avg cold start times (using plain master) of about 3.6s and with this patch rebased on master was seeing about 1.8s (so still ~50% or 1.8s gains). Looking at logs of the plain master run, out of the 3 test runs an average of 1.6s was spent between the time when ingressgateway saw the request and when autoscaler attempted to scale from 0 to 1. This means #2659 and #2615 should get us ~90% of the way to the perf we see here.

@markusthoemmes
Contributor

Neato, thanks for the insights @greghaynes. Glad to hear we can drive things into the existing architecture without having to work around too much.

@mattmoor
Member

@greghaynes Were you able to prototype the tighter feedback loop from the queue-proxy?

@greghaynes
Contributor Author

@mattmoor We created #2615 for that issue, and @imikushin mentioned he'd like to poke at it.

@markusthoemmes
Contributor

@greghaynes could you give a status update on this one? What state is it in? Is it being worked on actively?

@greghaynes
Contributor Author

@markusthoemmes Hey, sure. My plan was to get tracing (#2726 and #3491) merged first, then get some new data on what this actually buys us in terms of performance, to send upstream and ask sig-arch/k8s if anyone had any thoughts. Based on my one-off measurements this doesn't buy us a lot of performance in the no-load case, but it is needed because control-plane latency is extremely variable under load. Beyond that planning there hasn't been any progress on this, as tracing has taken (much) longer than expected to get merged...

@googlebot googlebot added the cla: yes Indicates the PR's author has signed the CLA. label Apr 15, 2019
@markusthoemmes
Contributor

@greghaynes are we still following through on this? We have activator scalability coming up on the scaling roadmap and I was wondering about this PR in relation to that.

@markusthoemmes
Contributor

I feel like this is vastly outdated, and the approach is kinda accepted as one of autoscaling's feature streams going forward. I expect a lot of PRs to fall out of that work that'll supersede this one, so I'll naively close it for now.

/close

If you feel differently, please feel free to reopen @greghaynes

@knative-prow-robot
Contributor

@markusthoemmes: Closed this PR.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Labels
cla: yes Indicates the PR's author has signed the CLA.
do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress.
size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Prototype local scheduling of pods
Remove k8s controlplane performance variance from our cold-start time
8 participants