wasm: onLog is not always called for network streams on upstream failure #13806
Comments
Note: stream contexts are also leaking (they are not destroyed in the null VM after the stream is gone).
We chatted briefly and believe that we should probably simplify the ABI: stop trying to figure out who closed the connection (upstream or downstream) and then waiting for both. Instead, we should just emit a closure event if either upstream or downstream closes.
You can do this right now; there is nothing at the ABI level that is stopping you from performing actions on downstream or upstream connection close. It looks like the guard to wait for both connections being closed was added in envoyproxy/envoy-wasm#453, though I don't see anything wrong there. Note that the "upstream connection close" event implementation in Envoy is a bit hacky right now, and it's triggered when the downstream connection sees
cc @gargnupur
At the time of envoyproxy/envoy-wasm#453 the lifecycles were not very clear, hence the guard was added to make sure we don't close preemptively. But it looks like it's now OK to close once the downstream close event is reached.
From what @kyessenov mentioned yesterday, I was under the impression that downstream close events are getting lost, and if that's the case, then relying on them might not solve anything.
I think we might be lost in the terminology here. Upstream with respect to the downstream connection is the proxy, so upstream close really is downstream local close from the client perspective. From reading other network filters, either Local or Remote close is guaranteed for a connection (but not both). I suggest we follow this practice and initiate termination once any close event is received. I don't think we can reliably detect whether the connection to an upstream host is closed (it's not even always there for direct response, or could be tunnelled).
The issue is that the upstream close event is not raised if the connection to upstream was never established (e.g. connection timeout, connection refused), but the context destruction is gated on that event, so it leaks in case of failures.
This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.
Fixed / worked around in #13836.
Ref: istio/istio#24720
Validated with a flaky upstream that becomes unavailable at an interval.
Recording the number of calls to onLog and onNewConnection, we see that onLog is not always called:
connection_count: 11030 log_count: 10662
/cc @PiotrSikora