logging: fix delegating log sink races #12298

mattklein123 · 2020-07-26T05:09:18Z

This fixes two different issues:

1. Previously there was no locking around log sink replacement,
   so it was possible for a log sink to get removed by one
   thread while getting written to by another thread.
2. Even with locking, the base class destructor pattern would
   do the swap after the derived class was destroyed, leading to
   undefined behavior.
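
For readers unfamiliar with the pattern, here is a minimal sketch of both points. It is not the code in this PR: `Sink`, `Registry`, and `RecordingSink` are hypothetical stand-ins, and a plain `std::mutex` is used for simplicity. The registry guards both logging and sink replacement with the same lock, and the most-derived sink class restores the previous sink in its own destructor, so no writer can reach the sink after its members are gone.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical sink interface; a stand-in, not Envoy's actual class.
class Sink {
public:
  virtual ~Sink() = default;
  virtual void log(const std::string& msg) = 0;
};

// Hypothetical registry that owns the "current sink" pointer.
class Registry {
public:
  // Swap the active sink under the lock and return the previous one.
  Sink* setSink(Sink* sink) {
    std::lock_guard<std::mutex> lock(mutex_);
    Sink* previous = sink_;
    sink_ = sink;
    return previous;
  }

  // Writers take the same lock, so a sink cannot be swapped out mid-write.
  void log(const std::string& msg) {
    std::lock_guard<std::mutex> lock(mutex_);
    if (sink_ != nullptr) {
      sink_->log(msg);
    }
  }

private:
  std::mutex mutex_;
  Sink* sink_{nullptr};
};

// The most-derived class registers and unregisters itself. Doing the restore
// in a base class destructor instead would run after messages_ has already
// been destroyed, while other threads could still call log() on this object.
class RecordingSink final : public Sink {
public:
  explicit RecordingSink(Registry& registry) : registry_(registry) {
    previous_ = registry_.setSink(this);
  }
  ~RecordingSink() override { registry_.setSink(previous_); }

  void log(const std::string& msg) override { messages_.push_back(msg); }
  const std::vector<std::string>& messages() const { return messages_; }

private:
  Registry& registry_;
  Sink* previous_{nullptr};
  std::vector<std::string> messages_;
};
```

Because the destructor body runs before any member is destroyed, the call to `setSink` there blocks until any in-flight `Registry::log` call has finished, and later calls see the restored sink.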

This was easy to reproduce in cx_limit_integration_test, but it is
an issue anywhere the log expectations are used. It previously also
affected the death test stderr workaround for coverage
(EXPECT_DEATH_LOG_TO_STDERR), which has been removed because coverage
no longer logs to a file and instead logs to stderr like the rest of
the tests.

Fixes #11841

Risk Level: Medium, code is a bit scary, though only really in tests
Testing: Existing tests
Docs Changes: N/A
Release Notes: N/A

mattklein123 · 2020-07-26T05:10:21Z

cc @jmarantz

alyssawilk

Nice catch - thank you for debugging all of these!

source/common/common/logger.cc

mattklein123 · 2020-07-27T22:53:56Z

@alyssawilk updated

mattklein123 · 2020-07-28T00:12:50Z

I'm not sure why my clang-tidy fixes are not working. cc @lizan

jmarantz

Thanks very much for digging into this. Is there more of a story you can put into the description about why EXPECT_DEATH_LOG_TO_STDERR is no longer needed?

jmarantz · 2020-07-28T12:16:30Z

source/common/common/logger.cc

+  // protection is really only needed in tests. It would be nice to figure out a test-only
+  // mechanism for this that does not require extra locking that we don't explicitly need in the
+  // prod code.
+  absl::ReaderMutexLock sink_lock(&sink_mutex_);


When I was doing benchmarks around ReaderMutexLock for symbol-tables, I found that the overall performance was worse than just using normal locks.

I tried this with a co-worker's 72-core machine as well. I wouldn't say my tests were conclusive, but it made me shy away from these if performance is the goal.

I do have an idea for test-only locks: you could use BasicLockable and have an empty implementation of that, which you could pass into LoggerContext in prod, and use a real one when instantiating the logger in test/test_runner.cc. You might need to update LoggerContext to take two mutexes rather than one, for this purpose.

It would need careful commenting and variable naming, as a mutex lock which is a no-op in production could be dangerous if misinterpreted by a later change.
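
A rough sketch of how the test-only lock idea could look, under assumptions: `Lockable`, `NoOpLock`, `RealLock`, and `SinkHolder` are hypothetical names invented for illustration and are not Envoy's actual interfaces. The point is only that the call site stays the same while the injected lock decides whether anything is really locked.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical minimal lock interface (lock/unlock only).
class Lockable {
public:
  virtual ~Lockable() = default;
  virtual void lock() = 0;
  virtual void unlock() = 0;
};

// Intentionally does nothing. Only safe where the sink is never replaced
// concurrently with logging; the name says so to avoid later misuse.
class NoOpLock final : public Lockable {
public:
  void lock() override {}
  void unlock() override {}
};

// Real lock for tests. It also counts acquisitions so this example can
// observe that locking happened (the counter is read single-threaded here).
class RealLock final : public Lockable {
public:
  void lock() override {
    mutex_.lock();
    ++acquisitions_;
  }
  void unlock() override { mutex_.unlock(); }
  int acquisitions() const { return acquisitions_; }

private:
  std::mutex mutex_;
  int acquisitions_{0};
};

// Hypothetical stand-in for the code path that writes to the sink.
class SinkHolder {
public:
  explicit SinkHolder(Lockable& sink_lock) : sink_lock_(sink_lock) {}

  void log() {
    // std::lock_guard works with any type providing lock() and unlock().
    std::lock_guard<Lockable> guard(sink_lock_);
    ++lines_logged_;
  }
  int linesLogged() const { return lines_logged_; }

private:
  Lockable& sink_lock_;
  int lines_logged_{0};
};
```

In this sketch production code would construct `SinkHolder` with a `NoOpLock` and the test runner with a `RealLock`. The virtual call per log line is its own cost, so whether this beats a reader lock would still need measuring.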

mattklein123

> I tried this with a co-worker's 72-core machine as well. I wouldn't say my tests were conclusive, but it made me shy away from these if performance is the goal.

I guess my feeling on this is that a) logging performance is not the most critical thing, since this is already after log level checks, and b) it seems like the read mutex should be faster here. If it isn't, that seems like a larger problem that I would rather not tackle as part of this change. My inclination is to leave this for now, as it's technically correct, and we can revisit later if needed.

> It would need careful commenting and variable naming, as a mutex lock which is a no-op in production could be dangerous if misinterpreted by a later change.

Per above, since it's not 100% critical that we maximize perf in this path, I think I would rather get this flake fix in and possibly try out your idea as part of the TODO later.

alyssawilk

LGTM modulo @jmarantz comments

mattklein123 · 2020-07-28T16:47:56Z

> Thanks very much for digging into this. Is there more of a story you can put into the description about why EXPECT_DEATH_LOG_TO_STDERR is no longer needed?

Updated

mattklein123 requested review from alyssawilk and junr03 as code owners July 26, 2020 05:09

mattklein123 assigned jmarantz and alyssawilk Jul 26, 2020

mattklein123 changed the title ~~logging: fix delegating lock sink races~~ logging: fix delegating log sink races Jul 26, 2020

alyssawilk reviewed Jul 27, 2020

mattklein123 added 2 commits July 27, 2020 22:30

Merge remote-tracking branch 'origin/master' into cx_limit_flake_fixes

d55e271

Signed-off-by: Matt Klein <mklein@lyft.com>

comments and tidy fixes

e9de895

Signed-off-by: Matt Klein <mklein@lyft.com>

jmarantz reviewed Jul 28, 2020

alyssawilk approved these changes Jul 28, 2020

mattklein123 merged commit dbbcc69 into master Jul 28, 2020

mattklein123 deleted the cx_limit_flake_fixes branch July 28, 2020 16:48
