-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
logging: fix delegating log sink races #12298
Conversation
This fixes two different issues: 1) Previously there was no locking around log sink replacement, so it was possibles for a link sink to get removed by one thread while getting written to be another thread. 2) Even with locking, the base class destructor pattern would do the swap after the derived class was destroyed, leading to undefined behavior. This was easy to reproduce in cx_limit_integration_test but is an issue anywhere the log expectations are used or death test stderr workaround for coverage which has been removed because it no longer logs to a file. Fixes #11841 Signed-off-by: Matt Klein <mklein@lyft.com>
cc @jmarantz |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch - thank you for debugging all of these!
Signed-off-by: Matt Klein <mklein@lyft.com>
Signed-off-by: Matt Klein <mklein@lyft.com>
@alyssawilk updated |
I'm not sure why my clang-tidy fixes are not working. cc @lizan |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks very much for digging into this. Is there more of a story. you can put into the description about why EXPECT_DEATH_LOG_TO_STDERR
is lo longer needed?
// protection is really only needed in tests. It would be nice to figure out a test-only | ||
// mechanism for this that does not require extra locking that we don't explicitly need in the | ||
// prod code. | ||
absl::ReaderMutexLock sink_lock(&sink_mutex_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I was doing benchmarks around ReaderMutexLock for symbol-tables, I found that the overall performance was worse than just using normal locks.
I tried this with a co-workers 72-core machine as well. I wouldn't say my tests were conclusive but it made me shy away from these if performance is the goal.
I do have an idea for test-only locks: you could use BasicLockable and have an empty implementation of that, which you could pass into LoggerContext in prod, and use a real one when instantiating the logger in test/test_runner.cc. You might need to update LoggerContext to take two mutexes rather than one, for this purpose.
It's would need careful commenting/variable-naming as a mutex lock which is a no-op in production could be dangerous if misinterpreted by a later change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried this with a co-workers 72-core machine as well. I wouldn't say my tests were conclusive but it made me shy away from these if performance is the goal.
I guess my feeling on this is that a) logging performance is not the most critical thing since this is already after log level checks and b) it seems like the read mutex should be faster here, though if it isn't that seems like a larger problem that I Would rather not tackle right now as part of this change, so my inclination is to just leave this for now as it's technically correct and we can revisit later if needed.
It's would need careful commenting/variable-naming as a mutex lock which is a no-op in production could be dangerous if misinterpreted by a later change.
Per above since it's not 100% critical that we maximize perf in this path I think I would rather get this flake fix in and possibly try out your idea as part of the TODO later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM modulo @jmarantz comments
Updated |
This fixes two different issues: 1) Previously there was no locking around log sink replacement, so it was possibles for a log sink to get removed by one thread while getting written to by another thread. 2) Even with locking, the base class destructor pattern would do the swap after the derived class was destroyed, leading to undefined behavior. This was easy to reproduce in cx_limit_integration_test but is an issue anywhere the log expectations are used, or previously in the death test stderr workaround (EXPECT_DEATH_LOG_TO_STDERR) for coverage which has been removed because coverage no longer logs to a file and instead logs to stderr like the rest of the tests. Fixes envoyproxy#11841 Risk Level: Medium, code is a bit scary, though only really in tests Testing: Existing tests Docs Changes: N/A Release Notes: N/A Signed-off-by: Matt Klein <mklein@lyft.com> Signed-off-by: Kevin Baichoo <kbaichoo@google.com>
This fixes two different issues: 1) Previously there was no locking around log sink replacement, so it was possibles for a log sink to get removed by one thread while getting written to by another thread. 2) Even with locking, the base class destructor pattern would do the swap after the derived class was destroyed, leading to undefined behavior. This was easy to reproduce in cx_limit_integration_test but is an issue anywhere the log expectations are used, or previously in the death test stderr workaround (EXPECT_DEATH_LOG_TO_STDERR) for coverage which has been removed because coverage no longer logs to a file and instead logs to stderr like the rest of the tests. Fixes envoyproxy#11841 Risk Level: Medium, code is a bit scary, though only really in tests Testing: Existing tests Docs Changes: N/A Release Notes: N/A Signed-off-by: Matt Klein <mklein@lyft.com>
This fixes two different issues: 1) Previously there was no locking around log sink replacement, so it was possibles for a log sink to get removed by one thread while getting written to by another thread. 2) Even with locking, the base class destructor pattern would do the swap after the derived class was destroyed, leading to undefined behavior. This was easy to reproduce in cx_limit_integration_test but is an issue anywhere the log expectations are used, or previously in the death test stderr workaround (EXPECT_DEATH_LOG_TO_STDERR) for coverage which has been removed because coverage no longer logs to a file and instead logs to stderr like the rest of the tests. Fixes envoyproxy#11841 Risk Level: Medium, code is a bit scary, though only really in tests Testing: Existing tests Docs Changes: N/A Release Notes: N/A Signed-off-by: Matt Klein <mklein@lyft.com> Signed-off-by: chaoqinli <chaoqinli@google.com>
This fixes two different issues:
so it was possibles for a log sink to get removed by one
thread while getting written to by another thread.
do the swap after the derived class was destroyed, leading to
undefined behavior.
This was easy to reproduce in cx_limit_integration_test but is
an issue anywhere the log expectations are used, or previously in the death test
stderr workaround (EXPECT_DEATH_LOG_TO_STDERR) for coverage which has
been removed because coverage no longer logs to a file and instead logs to stderr
like the rest of the tests.
Fixes #11841
Risk Level: Medium, code is a bit scary, though only really in tests
Testing: Existing tests
Docs Changes: N/A
Release Notes: N/A