Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Receiving a SIGUSR2 makes Vault log the running goroutines' stacks. #6240

Merged
merged 4 commits into from
Mar 15, 2019

Conversation

ncabatoff
Copy link
Collaborator

Fixes #6180.

signalCh := make(chan os.Signal, 1)
signal.Notify(signalCh, syscall.SIGUSR2)
go func() {
var buf = make([]byte, 65536)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems extremely small to me given panics we've seen from live running systems. I'd suggest it needs to be 1MB at the very minimum and should possibly be significantly more, up to 32 or 64MB.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. How would you feel about a new config option? Since we recommend disabling swap I'm reluctant to pre-allocate enough to handle the biggest systems, and then foist that on the smaller ones.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really see a one-time allocation of somewhere between 1 and say 32MB that can be GC'd almost immediately and is a result of a direct user action as a real burden. Especially since this will almost never be used outside of a direct debugging context with our team.

I forgot you're doing the allocation outside of the signal handler. Just put it inside the for loop and you don't have to have it preallocated if it's not being used.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put it outside the for loop thinking that if the user is in a situation where they need to look at stack traces, there might not be memory available to allocate.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If they're in a state where there is no memory and no swap available they're probably edging on OOM anyways so at that point they can probably SIGQUIT :-)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case 64MB or whatever isn't enough.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess if you want. I'd be shocked if 64MB isn't enough. I think in the majority of cases 1MB will be plenty. I think 32MB is a reasonable upper limit in all cases.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, let's start with 64MB then and we'll see whether it's an issue in practice before adding complexity.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

64 is fine, personally I'd go 32.

@jefferai
Copy link
Member

Is there a reason to put this goroutine where it is instead of simply adding this as a case at https://github.com/hashicorp/vault/pull/6240/files#diff-593cd0d199cb70fe727e6b64eb150baaL1175 ?

@ncabatoff
Copy link
Collaborator Author

Is there a reason to put this goroutine where it is instead of simply adding this as a case at https://github.com/hashicorp/vault/pull/6240/files#diff-593cd0d199cb70fe727e6b64eb150baaL1175 ?

No, that sounds like a good idea.

@ncabatoff
Copy link
Collaborator Author

Actually, now I remember why I didn't put it in the place you suggested. It's hard to get a logger from there. Any suggestions, or should I just leave it where I had it?

@jefferai
Copy link
Member

I would just use c.logger.

signals are handled.  Use a 32MB buffer allocated on demand instead of a
fixed 64KB buffer.
@ncabatoff
Copy link
Collaborator Author

I'd misunderstood, I thought you wanted me to follow the pattern used for the other signal handlers.

@jefferai
Copy link
Member

Now I'm confused. I see c.logger being used in the sighup channel case. Isn't that following the same pattern?

@ncabatoff
Copy link
Collaborator Author

Apologies for confusing you with my confusion. Fixed, I hope.

jefferai
jefferai previously approved these changes Feb 15, 2019
Copy link
Member

@jefferai jefferai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Co-Authored-By: ncabatoff <nick.cabatoff@gmail.com>
@@ -1238,6 +1239,11 @@ CLUSTER_SYNTHESIS_COMPLETE:
if err := c.Reload(c.reloadFuncsLock, c.reloadFuncs, c.flagConfigs); err != nil {
c.UI.Error(fmt.Sprintf("Error(s) were encountered during reload: %s", err))
}

case <-c.SigUSR2Ch:
buf := make([]byte, 32*1024*1024)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could allocate this only the first time SigUSR2 is called and then just reslice on each call, but I don't think it's a super important optimization.

@ncabatoff ncabatoff merged commit ccfeef6 into master Mar 15, 2019
@ncabatoff ncabatoff deleted the dump-goroutines-nondestructively branch August 23, 2019 18:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants