-
Notifications
You must be signed in to change notification settings - Fork 175
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement timeouts in checks and check hooks #893
Conversation
Signed-off-by: Nikki Attea <nikki@sensu.io>
Signed-off-by: Nikki Attea <nikki@sensu.io>
…u-go into feature/check-timeout Signed-off-by: Nikki Attea <nikki@sensu.io>
Signed-off-by: Nikki Attea <nikki@sensu.io>
Signed-off-by: Nikki Attea <nikki@sensu.io>
…u-go into feature/check-timeout
Signed-off-by: Nikki Attea <nikki@sensu.io>
Signed-off-by: Nikki Attea <nikki@sensu.io>
Signed-off-by: Nikki Attea <nikki@sensu.io>
Signed-off-by: Nikki Attea <nikki@sensu.io>
Signed-off-by: Nikki Attea <nikki@sensu.io>
Signed-off-by: Nikki Attea <nikki@sensu.io>
Signed-off-by: Nikki Attea <nikki@sensu.io>
Signed-off-by: Nikki Attea <nikki@sensu.io>
Signed-off-by: Nikki Attea <nikki@sensu.io>
Signed-off-by: Nikki Attea <nikki@sensu.io>
Signed-off-by: Nikki Attea <nikki@sensu.io>
…u-go into feature/check-timeout Signed-off-by: Nikki Attea <nikki@sensu.io>
Signed-off-by: Nikki Attea <nikki@sensu.io>
Signed-off-by: Nikki Attea <nikki@sensu.io>
Signed-off-by: Nikki Attea <nikki@sensu.io>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome work!
Two small things, otherwise it seems we might have to prevent a race condition (see travis unit tests)
command/command.go
Outdated
// Kill process and all of its children when the timeout has expired. | ||
// context.WithTimeout will not kill child/grandchild processes | ||
// (see issues tagged in https://github.com/sensu/sensu-go/issues/781), | ||
// rather we will use a timer and utility_os package to perform full cleanup. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can remove the reference to utility_os
😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doh, yeah!
command/command.go
Outdated
SetProcessGroup(cmd) | ||
time.AfterFunc(time.Duration(execution.Timeout)*time.Second, func() { | ||
if err := KillProcess(cmd); err != nil { | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we log something here, in case we are not able to kill the process?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that something should happen here. We would return an error if the process is already done or if a failure writing to the process file occurs. Perhaps we should return immediately from the ExecuteCommand function, since this is an unexpected case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm ambivalent about whether or not we should return immediately, since the command executed successfully and it's more about cleaning up our mess 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe it would be something similar to https://github.com/enkessler/childprocess/blob/master/lib/childprocess/unix/process.rb#L22-L26?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thought about it some more, and on the rare instance that the process cleanly, or unexpectedly exits after the timeout but before we kill the process, we should still continue as normal with logging the error. 👍
Signed-off-by: Nikki Attea <nikki@sensu.io>
Signed-off-by: Nikki Attea <nikki@sensu.io>
Signed-off-by: Nikki Attea <nikki@sensu.io>
Signed-off-by: Nikki Attea <nikki@sensu.io>
Signed-off-by: Nikki Attea <nikki@sensu.io>
Signed-off-by: Nikki Attea <nikki@sensu.io>
Signed-off-by: Nikki Attea <nikki@sensu.io>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, if possible, try to squash the commits when merging into master? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🍕 🌮 🍨 🍔 🌯 this looks good! a squash for some of the commits would be 💯
What is this change?
Implements a timeout in check and check hook execution, adds the timeout field to CheckConfig.
Why is this change necessary?
Closes #781
Closes #879
Does your change need a Changelog entry?
Yas.
Do you need clarification on anything?
Nah.
Were there any complications while making this change?
There were several. Most of them are documented in #781.