Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Report RPS for autoscaler metrics #5238

Merged
merged 1 commit into from
Aug 26, 2019

Conversation

taragu
Copy link
Contributor

@taragu taragu commented Aug 21, 2019

/lint

Part of #5228

Proposed Changes

  • Add reporting for stable and panic RPS metrics

Release Note

NONE

/cc @yanweiguo

@googlebot googlebot added the cla: yes Indicates the PR's author has signed the CLA. label Aug 21, 2019
@knative-prow-robot knative-prow-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. area/autoscale area/monitoring labels Aug 21, 2019
Copy link
Contributor

@knative-prow-robot knative-prow-robot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@taragu: 0 warnings.

In response to this:

/lint

Part of #5228

Proposed Changes

  • Add reporting for stable and panic RPS metrics

Release Note

NONE

/cc @yanweiguo

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@knative-prow-robot
Copy link
Contributor

Hi @taragu. Thanks for your PR.

I'm waiting for a knative member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@knative-prow-robot knative-prow-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Aug 21, 2019
@taragu
Copy link
Contributor Author

taragu commented Aug 21, 2019

/hold
because knative/pkg#589 should be merged before this one

@knative-prow-robot knative-prow-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 21, 2019
Copy link
Contributor

@yanweiguo yanweiguo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please revert knative/pkg#589.

panicRPSM = stats.Float64(
"panic_requests_per_second",
"Average requests-per-second per observed pod over the panic window",
stats.UnitDimensionless)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add targetRPS as well?

@@ -53,6 +53,14 @@ var (
"panic_request_concurrency",
"Average of requests count per observed pod over the panic window",
stats.UnitDimensionless)
stableRPSM = stats.Float64(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to register these metrics in func register.

@knative-prow-robot knative-prow-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. area/API API objects and controllers and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Aug 21, 2019
@taragu
Copy link
Contributor Author

taragu commented Aug 21, 2019

/hold cancel

@knative-prow-robot knative-prow-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 21, 2019
@taragu taragu force-pushed the autoscaler-rps-metrics branch from e3cf049 to ba70dd1 Compare August 21, 2019 19:03
@@ -48,6 +48,8 @@ type DeciderSpec struct {
// The value of scaling metric per pod that we target to maintain.
// TargetValue <= TotalValue.
TargetValue float64
// The value of requests-per-second per pod that we target to maintain.
TargetRPS float64
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need this. TargetValue is intended to be used for any metric.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh got it!

"expr": "sum(autoscaler_target_requests_per_second{namespace_name=\"$namespace\", configuration_name=\"$configuration\", revision_name=\"$revision\"})",
"format": "time_series",
"interval": "1s",
"intervalFactor": 1,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to change the title for this dashboard as well or create a new one for RPS.

@@ -65,6 +65,7 @@ func MakeDecider(ctx context.Context, pa *v1alpha1.PodAutoscaler, config *autosc
MaxScaleUpRate: config.MaxScaleUpRate,
ScalingMetric: pa.Metric(),
TargetValue: target,
TargetRPS: config.RPSTargetDefault,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As @yanweiguo mentioned, this shouldn't be needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still here.

@yanweiguo
Copy link
Contributor

Hi @taragu, thanks for this PR.

Could you test it end to end to get a screenshot of the Grafana dashboard? I'm not confident with what it will look like when we put concurrency and RPS into one table.

@taragu taragu force-pushed the autoscaler-rps-metrics branch 2 times, most recently from 5d3d452 to 0f4a00c Compare August 26, 2019 14:21
@taragu
Copy link
Contributor Author

taragu commented Aug 26, 2019

@yanweiguo I've updated the PR to put the RPS metrics in a separate graph. Would you please review the PR again? This is what the dashboard looks like
2019-08-26_10-19-10

@taragu taragu force-pushed the autoscaler-rps-metrics branch from 0f4a00c to ad886da Compare August 26, 2019 14:29
@vagababov
Copy link
Contributor

/ok-to-test

@knative-prow-robot knative-prow-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Aug 26, 2019
@knative-test-reporter-robot

The following jobs failed due to test flakiness:

Test name Triggers Retries
pull-knative-serving-integration-tests 0/3

Failed non-flaky tests preventing automatic retry of pull-knative-serving-integration-tests:

test/e2e.TestDestroyPodTimely

@taragu taragu force-pushed the autoscaler-rps-metrics branch from ad886da to aead62f Compare August 26, 2019 15:14
@knative-metrics-robot
Copy link

The following is the coverage report on pkg/.
Say /test pull-knative-serving-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/autoscaler/autoscaler.go 96.1% 96.2% 0.1
pkg/autoscaler/stats_reporter.go 87.2% 88.1% 0.9

@yanweiguo
Copy link
Contributor

/lgtm

@knative-prow-robot knative-prow-robot added the lgtm Indicates that a PR is ready to be merged. label Aug 26, 2019
@taragu
Copy link
Contributor Author

taragu commented Aug 26, 2019

/assign @mdemirhan

@yanweiguo
Copy link
Contributor

/approve
For dashboard part.

Copy link
Contributor

@vagababov vagababov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

@knative-prow-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: taragu, vagababov, yanweiguo

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@knative-prow-robot knative-prow-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 26, 2019
@knative-prow-robot knative-prow-robot merged commit 6591f41 into knative:master Aug 26, 2019
@taragu taragu deleted the autoscaler-rps-metrics branch November 12, 2019 16:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/API API objects and controllers area/autoscale area/monitoring cla: yes Indicates the PR's author has signed the CLA. lgtm Indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants