-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cx_limit_integration_test flakes #11841
Comments
Fails consistently on windows too. have a PR out for that just to get windows CI green |
I'm wondering if there's a race, where the client picks up the disconnect before the server decrements the overload counter, so the next connection fails. If so I'm not sure if there's any counters we should explicitly wait for, or if we just try connecting until one gets through? cc @tonya11en |
Another failure on ASAN 2020-07-01T14:08:07.8198111Z [ RUN ] IpVersions/ConnectionLimitIntegrationTest.TestGlobalLimit/IPv4 I think this is the double close so I'll fix this one |
Risk Level: n/a Testing: n/a Docs Changes: n/a Release Notes: n/a part of #11841 Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
2020-07-21T00:06:09.0956342Z [ RUN ] IpVersions/ConnectionLimitIntegrationTest.TestBothLimits/IPv6 |
I was looking into this yesterday and I will continue to look into it today. I have the debug logs from a repro and I have uploaded them here The repro is from a windows machine but I dont think it should matter |
This is still broken. I am looking into this further. |
This fixes two different issues: 1) Previously there was no locking around log sink replacement, so it was possibles for a link sink to get removed by one thread while getting written to be another thread. 2) Even with locking, the base class destructor pattern would do the swap after the derived class was destroyed, leading to undefined behavior. This was easy to reproduce in cx_limit_integration_test but is an issue anywhere the log expectations are used or death test stderr workaround for coverage which has been removed because it no longer logs to a file. Fixes #11841 Signed-off-by: Matt Klein <mklein@lyft.com>
This fixes two different issues: 1) Previously there was no locking around log sink replacement, so it was possibles for a log sink to get removed by one thread while getting written to by another thread. 2) Even with locking, the base class destructor pattern would do the swap after the derived class was destroyed, leading to undefined behavior. This was easy to reproduce in cx_limit_integration_test but is an issue anywhere the log expectations are used, or previously in the death test stderr workaround (EXPECT_DEATH_LOG_TO_STDERR) for coverage which has been removed because coverage no longer logs to a file and instead logs to stderr like the rest of the tests. Fixes #11841 Risk Level: Medium, code is a bit scary, though only really in tests Testing: Existing tests Docs Changes: N/A Release Notes: N/A Signed-off-by: Matt Klein <mklein@lyft.com>
This fixes two different issues: 1) Previously there was no locking around log sink replacement, so it was possibles for a log sink to get removed by one thread while getting written to by another thread. 2) Even with locking, the base class destructor pattern would do the swap after the derived class was destroyed, leading to undefined behavior. This was easy to reproduce in cx_limit_integration_test but is an issue anywhere the log expectations are used, or previously in the death test stderr workaround (EXPECT_DEATH_LOG_TO_STDERR) for coverage which has been removed because coverage no longer logs to a file and instead logs to stderr like the rest of the tests. Fixes envoyproxy#11841 Risk Level: Medium, code is a bit scary, though only really in tests Testing: Existing tests Docs Changes: N/A Release Notes: N/A Signed-off-by: Matt Klein <mklein@lyft.com> Signed-off-by: Kevin Baichoo <kbaichoo@google.com>
This fixes two different issues: 1) Previously there was no locking around log sink replacement, so it was possibles for a log sink to get removed by one thread while getting written to by another thread. 2) Even with locking, the base class destructor pattern would do the swap after the derived class was destroyed, leading to undefined behavior. This was easy to reproduce in cx_limit_integration_test but is an issue anywhere the log expectations are used, or previously in the death test stderr workaround (EXPECT_DEATH_LOG_TO_STDERR) for coverage which has been removed because coverage no longer logs to a file and instead logs to stderr like the rest of the tests. Fixes envoyproxy#11841 Risk Level: Medium, code is a bit scary, though only really in tests Testing: Existing tests Docs Changes: N/A Release Notes: N/A Signed-off-by: Matt Klein <mklein@lyft.com>
This fixes two different issues: 1) Previously there was no locking around log sink replacement, so it was possibles for a log sink to get removed by one thread while getting written to by another thread. 2) Even with locking, the base class destructor pattern would do the swap after the derived class was destroyed, leading to undefined behavior. This was easy to reproduce in cx_limit_integration_test but is an issue anywhere the log expectations are used, or previously in the death test stderr workaround (EXPECT_DEATH_LOG_TO_STDERR) for coverage which has been removed because coverage no longer logs to a file and instead logs to stderr like the rest of the tests. Fixes envoyproxy#11841 Risk Level: Medium, code is a bit scary, though only really in tests Testing: Existing tests Docs Changes: N/A Release Notes: N/A Signed-off-by: Matt Klein <mklein@lyft.com> Signed-off-by: chaoqinli <chaoqinli@google.com>
This appears to be resolved now on Windows with the work on simtime, proposing to reenable the test on windows with #12695 |
2020-07-01T13:04:39.4991615Z [ RUN ] IpVersions/ConnectionLimitIntegrationTest.TestBothLimits/IPv4
2020-07-01T13:04:39.4992248Z test/integration/cx_limit_integration_test.cc:74: Failure
2020-07-01T13:04:39.4993196Z Value of: fake_upstreams_[0]->waitForRawConnection(raw_conns.back())
2020-07-01T13:04:39.4994095Z Actual: false (Timed out waiting for raw connection)
2020-07-01T13:04:39.4994618Z Expected: true
2020-07-01T13:04:39.4995531Z Stack trace:
2020-07-01T13:04:39.4996224Z 0x1d3caf5: __llvm_coverage_mapping
2020-07-01T13:04:39.4999246Z 0x1d43958: __llvm_coverage_mapping
2020-07-01T13:04:39.4999853Z 0x59ff1b6: testing::internal::HandleSehExceptionsInMethodIfSupported<>()
2020-07-01T13:04:39.5000495Z 0x59ea751: testing::internal::HandleExceptionsInMethodIfSupported<>()
2020-07-01T13:04:39.5001236Z 0x59d4d82: testing::Test::Run()
2020-07-01T13:04:39.5001716Z 0x59d5888: testing::TestInfo::Run()
2020-07-01T13:04:39.5002321Z ... Google Test internal frames ...
2020-07-01T13:04:39.5002653Z
2020-07-01T13:04:39.5003721Z [ FAILED ] IpVersions/ConnectionLimitIntegrationTest.TestBothLimits/IPv4, where GetParam() = 4-byte object <00-00 00-00> (16148 ms)
The text was updated successfully, but these errors were encountered: