-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Translate to tcp probe in queue-proxy when exec probe was used for readinessprobe in user-container #5712
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nak3: 0 warnings.
In response to this:
Proposed Changes
When users defined exec probe for readinessprobe, it becomes
nil
and
get unexpected error. To make matters worse, the pod never becomes ready.This patch changes to translates exec probe in user-container to tcp
probe as per following code comment.serving/pkg/queue/readiness/probe.go
Lines 65 to 66 in 3174fce
// Should never be reachable. Exec probes to be translated to // TCP probes when container is built. /lint
Fixes #5711
Release Note
NONE
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@nak3 What's the unexpected error you see? Would it be possible to create an e2e test that exhibits the behavior you describe? I can think of ways we could abuse the relationship and the exec probe's wait period to create a solid window of 5xx errors. |
dc8b2c1
to
56b52ba
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nak3 Does "no probe found" mean this never becomes ready?
for _, tc := range testCases { | ||
tc := tc | ||
t.Run(tc.name, func(t *testing.T) { | ||
t.Parallel() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will detach from the logstream, I'd move that in here as part of the change to t.Run
corev1.Handler{ | ||
Exec: &corev1.ExecAction{ | ||
Command: []string{"/ko-app/runtime", "probe"}, | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a case for TCP while you are in here?
I feel like for conformance we should have variants of these tests that FAIL the readiness probes. Could you either add that or open an issue to follow up?
cc @dgerd
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've sort of been here before, but the runtime contract suggests that only httpGet
and tcpSocket
probes should be used. We've quietly left exec probes in, but do we want to codify their allowance into our conformance tests?
@@ -355,7 +355,12 @@ func applyReadinessProbeDefaults(p *corev1.Probe, port int32) { | |||
p.TCPSocket.Host = localAddress | |||
p.TCPSocket.Port = intstr.FromInt(int(port)) | |||
case p.Exec != nil: | |||
//User-defined ExecProbe will still be run on user-container. | |||
// User-defined ExecProbe will still be run on user-container. | |||
// Translate to TCP probes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we initially had something like this on #4731 but it was decided to leave it out since it didn't necessarily tell us what we needed to know. See: #4731 (comment)
It looks like something with an execProbe will never become ready. There's an (outdated) comment in the readinessProbe code that suggests that an execProbe will be translated to a tcp probe, but I think we just missed pulling that out when we reverted the functionality linked above. I think we missed changing A possible fix that lets us maintain the undocumented support of execProbes would be to succeed in cases where the queue-proxy does not find an |
It looks like the runtime contract forbids execProbe, so we should just refuse
https://github.com/knative/serving/blob/master/docs/runtime-contract.md#meta-requests |
Yeah, but how can I probe the gRPC server? Current widespread strategy is to use execProbe with grpc-health-probe bundled in the image. |
I brought up the case of the runtime contract vs. execProbes a few months ago and the consensus was to leave them be. Is it time to either revisit this or consider loosening the runtime contract? |
/cc @evankanderson |
/test pull-knative-serving-unit-tests |
So, there are three options for the change. When exec probe is used in user container, a. Translate it to TCP probe in queue-proxy (current PR) Even if we do not support exec probe, I would like to change the current error message. The current error #5711 is very unclear error. For a and b, I can open an issue to track it if it is difficult to make the decision. |
FWIW I do prefer the current implementation (option A above). It makes sure that we test for general connectivity on the configured ports regardless on the user's probes. I do wonder why we forbid exec probes or at least discourage them. @mattmoor any idea on that? Do we need to take this to the API WG in that regard? |
As per @dgerd 's #4086 (comment), I think option B above was the current expected workaround discussed in #4086 actually. @dgerd any thoughts for this? |
/test pull-knative-serving-smoke-tests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve
/assign @dgerd |
/hold cancel |
/test pull-knative-serving-upgrade-tests |
2 similar comments
/test pull-knative-serving-upgrade-tests |
/test pull-knative-serving-upgrade-tests |
/test pull-knative-serving-upgrade-tests
I think this is not related to this change. I will rebase if next re-test failed. |
…adinessprobe in user-container When users defined exec probe for readinessprobe, it becomes `nil` and get unexpected error. To make matters worse, the pod never becomes ready. This patch changes to translates exec probe in user-container to tcp probe.
/retest |
/test pull-knative-serving-upgrade-tests It looks like Let's confirm it once again and will file the issue if it fail. |
/retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: markusthoemmes, nak3 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
The following jobs failed:
Failed non-flaky tests preventing automatic retry of pull-knative-serving-integration-tests:
|
/test pull-knative-serving-integration-tests |
Proposed Changes
When users defined exec probe for readinessprobe, it becomes
nil
andget unexpected error. To make matters worse, the pod never becomes ready.
This patch changes to translates exec probe in user-container to tcp
probe as per following code comment.
serving/pkg/queue/readiness/probe.go
Lines 65 to 66 in 3174fce
/lint
Fixes #5711
Release Note