
add 100-node higher QPS limit scheduler job #20023

Merged
merged 2 commits into from
Dec 21, 2020

Conversation

adtac
Member

@adtac adtac commented Nov 23, 2020

/hold until kubernetes/kubernetes#96813 is merged

sig/scheduling has made numerous improvements to the scheduler over the past few releases, but these improvements have not been realised in practice due to the scheduler's QPS limits. With API Priority and Fairness graduating to beta in 1.20 (kubernetes/kubernetes#96527), we now have a path towards safely removing the scheduler's rate limiting in 1.21+.

Of course, this would need to be done slowly. @ahg-g and I discussed this offline and we believe that increasing the scheduler's QPS limit (not removing it) in 1.21 is the first step. Once things are observed to be stable over a couple of releases, the client-side rate limiting will be removed entirely and kube-scheduler will depend on APF to make sure it doesn't overload the API server.
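For reference, the scheduler's client-side limits live in its component config (`clientConnection.qps` and `clientConnection.burst` in `KubeSchedulerConfiguration`). A minimal sketch with illustrative values, not the exact numbers used by this job:

```yaml
apiVersion: kubescheduler.config.k8s.io/v1beta1
kind: KubeSchedulerConfiguration
clientConnection:
  # Illustrative values only; the job's actual QPS/burst settings
  # are defined in the test-infra job config, not here.
  qps: 200
  burst: 200
```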

While I have done some preliminary benchmarking of APF with higher scheduler QPS limits (not public yet, will share with everyone in a short while), there would be more confidence in this approach if there were public, periodic benchmarks that run on test-infra. The results could be displayed in perf-dash. These benchmarks should run with the higher QPS limit, and the job should be carefully observed for signs of instability. The job would be temporary and is only expected to run for 3-4 releases; once the scheduler's client-side QPS limits are removed, the regular job (ci-kubernetes-kubemark-100-gce) would be sufficient on its own.

A separate job is needed because all of the changes happen during cluster creation; as a result, existing jobs cannot be reused for this.

/sig scheduling
cc @ahg-g

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. area/config Issues or PRs related to code in /config area/jobs sig/scalability Categorizes an issue or PR as relevant to SIG Scalability. sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels Nov 23, 2020
@adtac adtac force-pushed the scheduler-200qps branch 2 times, most recently from 920b451 to ebe13eb Compare December 1, 2020 14:33
Signed-off-by: Adhityaa Chandrasekar <adtac@google.com>
Signed-off-by: Adhityaa Chandrasekar <adtac@google.com>
@adtac
Member Author

adtac commented Dec 7, 2020

@wojtek-t updated this PR to use --env=CONTROLLER_MANAGER_TEST_ARGS and --env=SCHEDULER_TEST_ARGS (ref: kubernetes/kubernetes#96813 (comment))
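For illustration, those kubetest flags pass environment variables through to the kube-up scripts, which append the given flags to the component command lines. A hedged sketch of how they might appear in the job's args (the flag values here are hypothetical, not copied from this PR):

```yaml
args:
# Hypothetical values for illustration; see the PR diff for the real ones.
- --env=SCHEDULER_TEST_ARGS=--kube-api-qps=200 --kube-api-burst=200
- --env=CONTROLLER_MANAGER_TEST_ARGS=--kube-api-qps=200 --kube-api-burst=200
```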

@adtac
Member Author

adtac commented Dec 14, 2020

/assign @wojtek-t

preset-service-account: "true"
preset-k8s-ssh: "true"
preset-dind-enabled: "true"
preset-e2e-kubemark-common: "true"
Member

I'm fine with starting with kubemark, but eventually we should consider migrating to real clusters. Kubemark is still visibly different than real clusters (easier from the point-of-view of control-plane).

@wojtek-t
Member

@adtac - next time please ping me if I'm not responding for a week or more.

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 16, 2020
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: adtac, wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 16, 2020
@adtac
Member Author

adtac commented Dec 21, 2020

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 21, 2020
@k8s-ci-robot k8s-ci-robot merged commit aafaaeb into kubernetes:master Dec 21, 2020
@k8s-ci-robot k8s-ci-robot added this to the v1.21 milestone Dec 21, 2020
@k8s-ci-robot
Contributor

@adtac: Updated the job-config configmap in namespace default at cluster test-infra-trusted using the following files:

  • key sig-scalability-periodic-jobs.yaml using file config/jobs/kubernetes/sig-scalability/sig-scalability-periodic-jobs.yaml

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

- --scenario=kubernetes_e2e
- --
- --cluster=kubemark-100-scheduler-highqps
- --build=bazel
Member

is there a reason to be checking out kubernetes and doing a source build here?
most CI jobs consume our existing builds (--extract=ci/latest or similar)

this is cheaper and faster
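Under that suggestion, the job's args above would swap the source build for a prebuilt CI release. A sketch, assuming the conventional kubetest extract channel mentioned in the comment:

```yaml
- --scenario=kubernetes_e2e
- --
- --cluster=kubemark-100-scheduler-highqps
# Consume an existing CI build instead of building from source.
- --extract=ci/latest
```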

Member Author

not really, but IIRC to test the PR I had to build from my fork and I must have forgotten to change back to --extract=ci/latest. I'll open a PR to switch back to reusing existing builds.

Member

Ack thanks!
I'm working on switching over how we build (bazel=>quick) but suspected this one shouldn't be building anyhow

Member

#21062 covers this

Member Author

thanks @BenTheElder!
