add 100-node higher QPS limit scheduler job #20023
Conversation
@wojtek-t updated this PR to use
/assign @wojtek-t
preset-service-account: "true" | ||
preset-k8s-ssh: "true" | ||
preset-dind-enabled: "true" | ||
preset-e2e-kubemark-common: "true" |
I'm fine with starting with kubemark, but eventually we should consider migrating to real clusters. Kubemark is still visibly different from real clusters (easier from the control plane's point of view).
@adtac - next time please ping me if I'm not responding for a week or more. /lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: adtac, wojtek-t

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment.
/hold cancel
@adtac: Updated the
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
- --scenario=kubernetes_e2e
- --
- --cluster=kubemark-100-scheduler-highqps
- --build=bazel
Is there a reason to be checking out kubernetes and doing a source build here?
Most CI jobs consume our existing builds (--extract=ci/latest or similar), which is cheaper and faster.
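For illustration, a minimal sketch of what that switch might look like in the job args, assuming --extract=ci/latest simply replaces --build=bazel (the actual follow-up change, #21062 mentioned below, may use different flags):

```yaml
      args:
      - --scenario=kubernetes_e2e
      - --
      - --cluster=kubemark-100-scheduler-highqps
      # consume an existing CI build instead of building from source with --build=bazel
      - --extract=ci/latest
```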
Not really, but IIRC to test the PR I had to build from my fork, and I must have forgotten to change it back to --extract=ci/latest. I'll open a PR to switch back to reusing existing builds.
Ack, thanks! I'm working on switching over how we build (bazel => quick), but I suspected this one shouldn't be building anyhow.
#21062 covers this
thanks @BenTheElder!
/hold until kubernetes/kubernetes#96813 is merged
So sig/scheduling has made numerous improvements to the scheduler over the past few releases, but these improvements have not been realised in practice due to the scheduler's QPS limits. With API priority and fairness graduating to beta in 1.20 (kubernetes/kubernetes#96527), we now have a path towards safely removing the scheduler's rate limiting in 1.21+.
Of course, this would need to be done slowly. @ahg-g and I discussed this offline and we believe that increasing the scheduler's QPS limit (not removing it) in 1.21 is the first step. Once things are observed to be stable over a couple of releases, the client-side rate limiting will be removed entirely and kube-scheduler will depend on APF to make sure it doesn't overload the API server.
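For illustration only (not part of this PR's diff): a minimal sketch of what "a higher QPS limit" could look like, assuming the scheduler's client connection QPS/burst settings are the knob being raised; the values and mechanism shown here are hypothetical, not what the job necessarily uses:

```yaml
apiVersion: kubescheduler.config.k8s.io/v1beta1
kind: KubeSchedulerConfiguration
clientConnection:
  # Hypothetical values for illustration; kube-scheduler's defaults are much lower (50/100).
  qps: 500
  burst: 1000
```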
While I have done some preliminary benchmarking of APF with higher scheduler QPS limits (not public yet, will share with everyone in a short while), there would be more confidence in this approach if there were public, periodic benchmarks running on test-infra. The results could be displayed in perf-dash. These benchmarks should run with the higher QPS limit, and the job should be carefully observed for signs of instability. The job would be temporary and is only expected to run for 3-4 releases; once the scheduler's client-side QPS limits are removed, the regular job (ci-kubernetes-kubemark-100-gce) would be sufficient.

A separate job is needed because all of the changes happen during cluster creation; as a result, existing jobs cannot be reused for this. A rough sketch of the new job's shape follows below.
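For readers unfamiliar with the config under review, a rough skeleton of a periodic kubemark job of this shape, combining the labels and args quoted above (the name, interval, and image tag below are hypothetical; the real definition lives in sig-scalability-periodic-jobs.yaml in this PR):

```yaml
periodics:
- name: ci-kubernetes-kubemark-100-scheduler-highqps   # hypothetical name
  interval: 24h                                        # hypothetical interval
  labels:
    preset-service-account: "true"
    preset-k8s-ssh: "true"
    preset-dind-enabled: "true"
    preset-e2e-kubemark-common: "true"
  spec:
    containers:
    - image: gcr.io/k8s-testimages/kubekins-e2e:latest-master  # hypothetical tag
      args:
      - --scenario=kubernetes_e2e
      - --
      - --cluster=kubemark-100-scheduler-highqps
      - --build=bazel
```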
/sig scheduling
cc @ahg-g